RAG Query

POST /api/r2r/v3/retrieval/rag
curl --request POST \
  --url https://api.intelligence.io.solutions/api/r2r/v3/retrieval/rag \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "query": "<string>",
  "search_mode": "custom",
  "search_settings": {
    "use_hybrid_search": false,
    "use_semantic_search": true,
    "use_fulltext_search": false,
    "filters": "<string>",
    "limit": 10,
    "offset": "0",
    "include_metadatas": true,
    "include_scores": true,
    "search_strategy": "vanilla",
    "hybrid_settings": {
      "full_text_weight": 1,
      "semantic_weight": 5,
      "full_text_limit": 200,
      "rrf_k": 50
    },
    "chunk_settings": {
      "index_measure": "l2_distance",
      "probes": 10,
      "ef_search": 40,
      "enabled": true
    },
    "graph_settings": {
      "limits": [
        "<any>"
      ],
      "enabled": true
    },
    "num_sub_queries": 5
  },
  "rag_generation_config": {
    "model": "<string>",
    "temperature": 123,
    "top_p": 123,
    "max_tokens_to_sample": 123,
    "stream": true,
    "functions": [
      "<any>"
    ],
    "tools": [
      "<any>"
    ],
    "add_generation_kwargs": [
      "<any>"
    ],
    "api_base": "<string>",
    "response_format": [
      {
        "Base Model": {}
      }
    ],
    "extended_thinking": false,
    "thinking_budget": 123,
    "reasoning_effort": "<string>"
  },
  "task_prompt": "<string>",
  "include_title_if_available": false,
  "include_web_search": false
}'
Example response (200):

{
  "key": "value"
}
The RAG Query endpoint executes a Retrieval-Augmented Generation (RAG) workflow by combining semantic search, optional knowledge graph integration, and large language model (LLM) generation. It returns contextually grounded, source-cited responses derived from your document corpus and external web content (if enabled). This endpoint is ideal for applications that require explainable AI answers, document-grounded responses, and real-time contextual reasoning.

Key Features

  • Combined retrieval and generation: Merges vector search, optional graph traversal, and LLM output generation in one request.
  • Automatic source citation: Each referenced document includes a unique citation identifier.
  • Streaming and non-streaming modes: Supports token-level updates or full-response delivery.
  • Provider flexibility: Compatible with OpenAI, Anthropic, Ollama, and other LiteLLM-supported models.
  • Web search integration: Optionally augments internal context with real-time external data.

Model Support

Provider    Description
OpenAI      Default provider supporting GPT-based models (gpt-4o, gpt-4o-mini, etc.).
Anthropic   Supports Claude models (requires ANTHROPIC_API_KEY).
Ollama      Enables local model execution via the Ollama runtime.
LiteLLM     Provides access to additional supported model providers.
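
The provider is chosen through the model identifier passed in rag_generation_config, which is typically prefixed with the provider name. The identifier below is illustrative; use whichever models your deployment exposes:

{
  "rag_generation_config": {
    "model": "openai/gpt-4o-mini"
  }
}

The same pattern applies for other providers (e.g. anthropic/... or ollama/...).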

Request Body

The request body combines search configuration (for retrieval) and generation configuration (for LLM behavior).
All search parameters available in the /search endpoint can be reused here, including filters, hybrid search, and graph-enhanced retrieval.
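
For illustration, here is a minimal non-streaming request using Python's requests library. The URL and field names come from the examples on this page; the query text and token are placeholders:

import requests

API_TOKEN = "<token>"  # placeholder: your bearer token

payload = {
    "query": "What does the report say about Q3 revenue?",  # illustrative query
    "search_mode": "basic",
    "rag_generation_config": {
        "model": "openai/gpt-4o-mini",
        "temperature": 0.7,
        "stream": False,
    },
}

response = requests.post(
    "https://api.intelligence.io.solutions/api/r2r/v3/retrieval/rag",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json=payload,
    timeout=120,
)
response.raise_for_status()
print(response.json())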

Generation Configuration

Control model behavior using the rag_generation_config object. Example:
{
    "model": "openai/gpt-4o-mini",
    "temperature": 0.7,
    "max_tokens_to_sample": 1500,
    "stream": true
}
Parameters:
  • model: Specifies the model used for generation.
  • temperature: Controls output randomness (values near 0 are more deterministic; values near 1 are more creative).
  • max_tokens_to_sample: Sets the maximum output length in tokens.
  • stream: Enables or disables token streaming for real-time responses.

Streaming Responses

When stream is set to true, the API emits Server-Sent Events (SSE) during processing.
Each event type corresponds to a distinct phase of the retrieval and generation workflow.
Event Type       Description
search_results   Contains the initial search results from your documents.
message          Streams partial tokens as the model generates them.
citation         Emits citation metadata when a source is referenced.
final_answer     Contains the complete generated response with structured citations.
Example Response:
{
  "generated_answer": "DeepSeek-R1 is a model that demonstrates impressive performance...[1]",
  "search_results": { ... },
  "citations": [
    {
      "id": "cit.123456",
      "object": "citation",
      "payload": { ... }
    }
  ]
}
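
A minimal sketch of consuming the stream in Python follows. It assumes the standard SSE wire format (event: / data: lines) and that each event's payload fits on a single data: line; the event names match the table above, and the final_answer payload is assumed to mirror the example response:

import json
import requests

API_TOKEN = "<token>"  # placeholder: your bearer token

payload = {
    "query": "What does the report say about Q3 revenue?",
    "rag_generation_config": {"model": "openai/gpt-4o-mini", "stream": True},
}

with requests.post(
    "https://api.intelligence.io.solutions/api/r2r/v3/retrieval/rag",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json=payload,
    stream=True,
    timeout=300,
) as response:
    response.raise_for_status()
    event_type = None
    for line in response.iter_lines(decode_unicode=True):
        if not line:
            continue  # blank lines separate SSE events
        if line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data = line[len("data:"):].strip()
            if event_type == "message":
                # partial tokens; payload shape may vary by deployment
                print(data, flush=True)
            elif event_type == "final_answer":
                answer = json.loads(data)
                print(answer.get("generated_answer"))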

Authorizations

Authorization (string, header, required)

The access token received from the authorization server in the OAuth 2.0 flow.

Body (application/json)

query (string)

The user's question.

search_mode (enum<string>, default: custom)

Default value of custom allows full control over search settings. Pre-configured search modes:

  • basic: A simple semantic-based search.
  • advanced: A more powerful hybrid search combining semantic and full-text search.
  • custom: Full control via search_settings.

If filters or limit are provided alongside basic or advanced, they will override the default settings for that mode.

Available options: basic, advanced, custom
search_settings (object)

The search configuration object. If search_mode is custom, these settings are used as-is. For basic or advanced, these settings will override the default mode configuration. Common overrides include filters to narrow results and limit to control how many results are returned.
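
For example, to run a basic search while restricting results to a single document and capping the result count (the $eq filter syntax shown here is illustrative; check which filter operators your deployment supports):

{
  "query": "<string>",
  "search_mode": "basic",
  "search_settings": {
    "filters": { "document_id": { "$eq": "<document-id>" } },
    "limit": 5
  }
}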

rag_generation_config (object)

Configuration for RAG generation.

task_prompt (string)

Optional custom prompt to override the default.

include_title_if_available (boolean, default: false)

Include document titles in responses when available.

include_web_search (boolean, default: false)

Include web search results provided to the LLM.

Response

200

key (string)

Example: "value"