RAG Query

POST /api/r2r/v3/retrieval/rag
curl --request POST \
  --url https://api.intelligence.io.solutions/api/r2r/v3/retrieval/rag \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "query": "<string>",
  "search_mode": "custom",
  "search_settings": {
    "use_hybrid_search": false,
    "use_semantic_search": true,
    "use_fulltext_search": false,
    "filters": "<string>",
    "limit": 10,
    "offset": "0",
    "include_metadatas": true,
    "include_scores": true,
    "search_strategy": "vanilla",
    "hybrid_settings": {
      "full_text_weight": 1,
      "semantic_weight": 5,
      "full_text_limit": 200,
      "rrf_k": 50
    },
    "chunk_settings": {
      "index_measure": "l2_distance",
      "probes": 10,
      "ef_search": 40,
      "enabled": true
    },
    "graph_settings": {
      "limits": [
        "<any>"
      ],
      "enabled": true
    },
    "num_sub_queries": 5
  },
  "rag_generation_config": {
    "model": "<string>",
    "temperature": 123,
    "top_p": 123,
    "max_tokens_to_sample": 123,
    "stream": true,
    "functions": [
      "<any>"
    ],
    "tools": [
      "<any>"
    ],
    "add_generation_kwargs": [
      "<any>"
    ],
    "api_base": "<string>",
    "response_format": [
      {
        "Base Model": {}
      }
    ],
    "extended_thinking": false,
    "thinking_budget": 123,
    "reasoning_effort": "<string>"
  },
  "task_prompt": "<string>",
  "include_title_if_available": false,
  "include_web_search": false
}'
Example response (200):

{
  "key": "value"
}
The RAG Query endpoint executes a Retrieval-Augmented Generation (RAG) workflow by combining semantic search, optional knowledge graph integration, and large language model (LLM) generation. It returns contextually grounded, source-cited responses derived from your document corpus and external web content (if enabled). This endpoint is ideal for applications that require explainable AI answers, document-grounded responses, and real-time contextual reasoning.

Key Features

  • Combined retrieval and generation: Merges vector search, optional graph traversal, and LLM output generation in one request.
  • Automatic source citation: Each referenced document includes a unique citation identifier.
  • Streaming and non-streaming modes: Supports token-level updates or full-response delivery.
  • Provider flexibility: Compatible with OpenAI, Anthropic, Ollama, and other LiteLLM-supported models.
  • Web search integration: Optionally augments internal context with real-time external data.

Model Support

Provider    Description
OpenAI      Default provider supporting GPT-based models (gpt-4o, gpt-4o-mini, etc.).
Anthropic   Supports Claude models (requires ANTHROPIC_API_KEY).
Ollama      Enables local model execution via the Ollama runtime.
LiteLLM     Provides access to additional supported model providers.
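
The provider is chosen through the model identifier passed in rag_generation_config, which is typically prefixed with the provider name. The identifier below is illustrative; use whichever models your deployment exposes:

{
  "rag_generation_config": {
    "model": "openai/gpt-4o-mini"
  }
}

The same pattern applies for other providers (e.g. anthropic/... or ollama/...).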

Request Body

The request body combines search configuration (for retrieval) and generation configuration (for LLM behavior).
All search parameters available in the /search endpoint can be reused here, including filters, hybrid search, and graph-enhanced retrieval.
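
For illustration, here is a minimal non-streaming request using Python's requests library. The URL and field names come from the examples on this page; the query text and token are placeholders:

import requests

API_TOKEN = "<token>"  # placeholder: your bearer token

payload = {
    "query": "What does the report say about Q3 revenue?",  # illustrative query
    "search_mode": "basic",
    "rag_generation_config": {
        "model": "openai/gpt-4o-mini",
        "temperature": 0.7,
        "stream": False,
    },
}

response = requests.post(
    "https://api.intelligence.io.solutions/api/r2r/v3/retrieval/rag",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json=payload,
    timeout=120,
)
response.raise_for_status()
print(response.json())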

Generation Configuration

Control model behavior using the rag_generation_config object. Example:
{
    "model": "openai/gpt-4o-mini",
    "temperature": 0.7,
    "max_tokens_to_sample": 1500,
    "stream": true
}
Parameters:
  • model: Specifies the model used for generation.
  • temperature: Controls output randomness (values near 0 are more deterministic; values near 1 are more creative).
  • max_tokens_to_sample: Sets the maximum output length in tokens.
  • stream: Enables or disables token streaming for real-time responses.

Streaming Responses

When stream is set to true, the API emits Server-Sent Events (SSE) during processing.
Each event type corresponds to a distinct phase of the retrieval and generation workflow.
Event Type       Description
search_results   Contains the initial search results from your documents.
message          Streams partial tokens as the model generates them.
citation         Emits citation metadata when a source is referenced.
final_answer     Contains the complete generated response with structured citations.
Example Response:
{
  "generated_answer": "DeepSeek-R1 is a model that demonstrates impressive performance...[1]",
  "search_results": { ... },
  "citations": [
    {
      "id": "cit.123456",
      "object": "citation",
      "payload": { ... }
    }
  ]
}
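
A minimal sketch of consuming the stream in Python follows. It assumes the standard SSE wire format (event: / data: lines) and that each event's payload fits on a single data: line; the event names match the table above, and the final_answer payload is assumed to mirror the example response:

import json
import requests

API_TOKEN = "<token>"  # placeholder: your bearer token

payload = {
    "query": "What does the report say about Q3 revenue?",
    "rag_generation_config": {"model": "openai/gpt-4o-mini", "stream": True},
}

with requests.post(
    "https://api.intelligence.io.solutions/api/r2r/v3/retrieval/rag",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json=payload,
    stream=True,
    timeout=300,
) as response:
    response.raise_for_status()
    event_type = None
    for line in response.iter_lines(decode_unicode=True):
        if not line:
            continue  # blank lines separate SSE events
        if line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data = line[len("data:"):].strip()
            if event_type == "message":
                # partial tokens; payload shape may vary by deployment
                print(data, flush=True)
            elif event_type == "final_answer":
                answer = json.loads(data)
                print(answer.get("generated_answer"))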

Authorizations

Authorization (string, header, required)

The access token received from the authorization server in the OAuth 2.0 flow.

Body (application/json)

query (string)

The user's question.

search_mode (enum<string>, default: custom)

Default value of custom allows full control over search settings. Pre-configured search modes:

  • basic: A simple semantic-based search.
  • advanced: A more powerful hybrid search combining semantic and full-text search.
  • custom: Full control via search_settings.

If filters or limit are provided alongside basic or advanced, they will override the default settings for that mode.

Available options: basic, advanced, custom
search_settings (object)

The search configuration object. If search_mode is custom, these settings are used as-is. For basic or advanced, these settings will override the default mode configuration. Common overrides include filters to narrow results and limit to control how many results are returned.
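
For example, to run a basic search while restricting results to a single document and capping the result count (the $eq filter syntax shown here is illustrative; check which filter operators your deployment supports):

{
  "query": "<string>",
  "search_mode": "basic",
  "search_settings": {
    "filters": { "document_id": { "$eq": "<document-id>" } },
    "limit": 5
  }
}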

rag_generation_config (object)

Configuration for RAG generation.

task_prompt (string)

Optional custom prompt to override the default.

include_title_if_available (boolean, default: false)

Include document titles in responses when available.

include_web_search (boolean, default: false)

Include web search results provided to the LLM.

Response

200

key (string)

Example: "value"