POST /api/r2r/v3/retrieval/rag
RAG Query
curl --request POST \
  --url https://api.intelligence.io.solutions/api/r2r/v3/retrieval/rag \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "query": "<string>",
  "search_mode": "custom",
  "search_settings": {
    "use_hybrid_search": false,
    "use_semantic_search": true,
    "use_fulltext_search": false,
    "filters": "<string>",
    "limit": 10,
    "offset": 0,
    "include_metadatas": true,
    "include_scores": true,
    "search_strategy": "vanilla",
    "hybrid_settings": {
      "full_text_weight": 1,
      "semantic_weight": 5,
      "full_text_limit": 200,
      "rrf_k": 50
    },
    "chunk_settings": {
      "index_measure": "l2_distance",
      "probes": 10,
      "ef_search": 40,
      "enabled": true
    },
    "graph_settings": {
      "limits": [
        "<any>"
      ],
      "enabled": true
    },
    "num_sub_queries": 5
  },
  "rag_generation_config": {
    "model": "<string>",
    "temperature": 123,
    "top_p": 123,
    "max_tokens_to_sample": 123,
    "stream": true,
    "functions": [
      "<any>"
    ],
    "tools": [
      "<any>"
    ],
    "add_generation_kwargs": [
      "<any>"
    ],
    "api_base": "<string>",
    "response_format": [
      {
        "Base Model": {}
      }
    ],
    "extended_thinking": false,
    "thinking_budget": 123,
    "reasoning_effort": "<string>"
  },
  "task_prompt": "<string>",
  "include_title_if_available": false,
  "include_web_search": false
}'
Example 200 response:
{
  "key": "value"
}
Execute a RAG (Retrieval-Augmented Generation) query. This endpoint combines search results with language model generation to produce accurate, contextually-relevant responses based on your document corpus. Features:
  • Combines vector search, optional knowledge graph integration, and LLM generation
  • Automatically cites sources with unique citation identifiers
  • Supports both streaming and non-streaming responses
  • Compatible with various LLM providers (OpenAI, Anthropic, etc.)
  • Web search integration for up-to-date information
Search Configuration: All search parameters from the search endpoint apply here, including filters, hybrid search, and graph-enhanced search.
Generation Configuration: Fine-tune the language model’s behavior with rag_generation_config:
{
    "model": "openai/gpt-4o-mini",  // Model to use
    "temperature": 0.7,              // Control randomness (0-1)
    "max_tokens_to_sample": 1500,    // Maximum output length
    "stream": true                   // Enable token streaming
}
Model Support:
  • OpenAI models (default)
  • Anthropic Claude models (requires ANTHROPIC_API_KEY)
  • Local models via Ollama
  • Any provider supported by LiteLLM
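Models are addressed with LiteLLM-style provider-prefixed identifiers. As an illustrative sketch (the specific model names below are assumptions and depend on which providers your deployment has configured):
```json
{
  "rag_generation_config": {
    "model": "anthropic/claude-3-5-sonnet-20241022",
    "temperature": 0.3,
    "max_tokens_to_sample": 1024
  }
}
```
For a local Ollama model, the same model field would take a value such as "ollama/llama3.1".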
Streaming Responses: When stream: true is set, the endpoint returns Server-Sent Events with the following types:
  • search_results: Initial search results from your documents
  • message: Partial tokens as they’re generated
  • citation: Citation metadata when sources are referenced
  • final_answer: Complete answer with structured citations
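When streaming, the event types above arrive over a standard Server-Sent Events connection. A hedged sketch of what a stream might look like (the payload fields shown are illustrative, not a guaranteed wire format):
```
event: search_results
data: {"chunk_search_results": [...]}

event: message
data: {"delta": {"content": "DeepSeek-R1 is"}}

event: citation
data: {"id": "cit.123456", "payload": {...}}

event: final_answer
data: {"generated_answer": "...", "citations": [...]}
```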
Example Response:
{
"generated_answer": "DeepSeek-R1 is a model that demonstrates impressive performance...[1]",
"search_results": { ... },
"citations": [
    {
        "id": "cit.123456",
        "object": "citation",
        "payload": { ... }
    }
]
}

Authorizations

Authorization
string
header
required

The access token received from the authorization server in the OAuth 2.0 flow.

Body

application/json
query
string

The user's question

search_mode
enum<string>
default:custom

Default value of custom allows full control over search settings. Pre-configured search modes:
  • basic: A simple semantic-based search.
  • advanced: A more powerful hybrid search combining semantic and full-text search.
  • custom: Full control via search_settings.
If filters or limit are provided alongside basic or advanced, they override the default settings for that mode.

Available options:
basic,
advanced,
custom
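For example, a request that uses the pre-configured advanced mode while overriding the result limit might look like this (a sketch; the query text is a placeholder):
```json
{
  "query": "What does the quarterly report say about revenue?",
  "search_mode": "advanced",
  "search_settings": {
    "limit": 25
  }
}
```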
search_settings
object

The search configuration object. If search_mode is custom, these settings are used as-is. For basic or advanced, these settings will override the default mode configuration. Common overrides include filters to narrow results and limit to control how many results are returned.
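As a sketch of a fully custom configuration, the fragment below enables hybrid search and reweights its full-text and semantic components (the weights mirror the defaults in the request skeleton above and are illustrative, not tuning recommendations):
```json
{
  "search_mode": "custom",
  "search_settings": {
    "use_hybrid_search": true,
    "use_semantic_search": true,
    "use_fulltext_search": true,
    "hybrid_settings": {
      "full_text_weight": 1,
      "semantic_weight": 5,
      "full_text_limit": 200,
      "rrf_k": 50
    },
    "limit": 10
  }
}
```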

rag_generation_config
object

Configuration for RAG generation

task_prompt
string

Optional custom prompt to override default

include_title_if_available
boolean
default:false

Include document titles in responses when available

include_web_search
boolean
default:false

Include web search results provided to the LLM.

Response

200

key
string
Example:

"value"
