Key Features
- Combined retrieval and generation: Merges vector search, optional graph traversal, and LLM output generation in one request.
- Automatic source citation: Each referenced document includes a unique citation identifier.
- Streaming and non-streaming modes: Supports token-level updates or full-response delivery.
- Provider flexibility: Compatible with OpenAI, Anthropic, Ollama, and other LiteLLM-supported models.
- Web search integration: Optionally augments internal context with real-time external data.
 
Model Support
| Provider | Description | 
|---|---|
| OpenAI | Default provider supporting GPT-based models (gpt-4o, gpt-4o-mini, etc.). | 
| Anthropic | Supports Claude models (requires ANTHROPIC_API_KEY). | 
| Ollama | Enables local model execution via Ollama runtime. | 
| LiteLLM | Provides access to additional supported model providers. | 
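The provider is typically selected through the model identifier passed in the generation configuration. A minimal sketch, assuming LiteLLM-style "provider/model" strings; the specific model names below are illustrative, not an exhaustive or guaranteed list:

```python
# Illustrative model identifiers, assuming LiteLLM-style "provider/model" strings.
# Substitute the models actually available in your deployment.
OPENAI_MODEL = "openai/gpt-4o-mini"                     # GPT-based models (default provider)
ANTHROPIC_MODEL = "anthropic/claude-3-haiku-20240307"   # Claude models; requires ANTHROPIC_API_KEY
OLLAMA_MODEL = "ollama/llama3.1"                        # local execution via the Ollama runtime
```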
Request Body
The request body combines search configuration (for retrieval) and generation configuration (for LLM behavior). All search parameters available in the /search endpoint can be reused here, including filters, hybrid search, and graph-enhanced retrieval.
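A minimal sketch of the two halves of the body; the field names inside search_settings (such as use_hybrid_search) are assumptions based on the parameter descriptions in this page, not a definitive schema:

```python
# Sketch of a combined retrieval + generation body (retrieval field names are assumptions).
request_body = {
    "search_settings": {                        # reused /search-style parameters
        "filters": {"document_type": {"$eq": "report"}},  # illustrative metadata filter
        "use_hybrid_search": True,              # assumed flag enabling hybrid retrieval
        "limit": 10,                            # number of results to retrieve
    },
    "rag_generation_config": {                  # LLM behavior (see next section)
        "model": "openai/gpt-4o-mini",
        "temperature": 0.2,
    },
}
```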
Generation Configuration
Control model behavior using the rag_generation_config object.
Key fields:
- model: Specifies the model used for generation.
- temperature: Controls output randomness (0 for deterministic, 1 for creative).
- max_tokens: Sets the maximum output length.
- stream: Enables or disables token streaming for real-time responses.
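For example, a minimal configuration might look like the following sketch (the model name and values are illustrative):

```python
# Illustrative rag_generation_config values; adjust to your deployment.
rag_generation_config = {
    "model": "openai/gpt-4o-mini",  # model used for generation
    "temperature": 0.7,             # 0 = deterministic, 1 = more creative
    "max_tokens": 1024,             # maximum output length
    "stream": False,                # set True for token-level streaming
}
```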
Streaming Responses
When stream: true is enabled, the API emits Server-Sent Events (SSE) during processing. Each event type corresponds to a distinct phase of the retrieval and generation workflow, as summarized in the table below; a consumption sketch follows the table.
| Event Type | Description | 
|---|---|
| search_results | Contains the initial search results from your documents. |
| message | Streams partial tokens as the model generates them. |
| citation | Emits citation metadata when a source is referenced. |
| final_answer | Contains the complete generated response with structured citations. |
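A minimal sketch of consuming the stream with plain HTTP and SSE parsing; the endpoint path, body fields, and payload shapes are assumptions, and an official client library may handle this parsing for you:

```python
import json
import requests

# Hypothetical endpoint path and request body; adjust to your deployment.
url = "https://api.example.com/v3/retrieval/rag"
headers = {"Authorization": "Bearer <access_token>"}
body = {
    "query": "What were the main findings of the Q3 report?",
    "rag_generation_config": {"model": "openai/gpt-4o-mini", "stream": True},
}

with requests.post(url, json=body, headers=headers, stream=True) as resp:
    event_type = None
    for raw_line in resp.iter_lines(decode_unicode=True):
        if not raw_line:                     # blank line marks the end of one SSE event
            event_type = None
            continue
        if raw_line.startswith("event:"):
            event_type = raw_line[len("event:"):].strip()
        elif raw_line.startswith("data:"):
            data = raw_line[len("data:"):].strip()
            if event_type == "message":
                print("token payload:", data)     # partial tokens as they are generated
            elif event_type == "citation":
                print("citation payload:", data)  # citation metadata for a referenced source
            elif event_type == "final_answer":
                final = json.loads(data)          # complete response, assuming JSON-encoded data
```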
Authorizations
The access token received from the authorization server in the OAuth 2.0 flow.
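A minimal sketch of attaching the token to a request, assuming a standard Bearer scheme in the Authorization header:

```python
# Standard OAuth 2.0 bearer-token header (the token value is a placeholder).
headers = {"Authorization": "Bearer <access_token>"}
```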
Body
The user's question
Defaults to custom, which allows full control over the search settings. Pre-configured search modes:
- basic: A simple semantic-based search.
- advanced: A more powerful hybrid search combining semantic and full-text search.
- custom: Full control via search_settings.
If filters or limit are provided alongside basic or advanced, they will override the default settings for that mode.

The search configuration object. If search_mode is custom, these settings are used as-is. For basic or advanced, these settings will override the default mode configuration. Common overrides include filters to narrow results and limit to control how many results are returned.
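For instance, a sketch of using a pre-configured mode with overrides; the field names follow the descriptions above and should be treated as assumptions:

```python
# "advanced" mode with overrides: filters narrow results, limit caps how many are returned.
body_fragment = {
    "search_mode": "advanced",
    "search_settings": {
        "filters": {"collection_id": {"$eq": "marketing"}},  # illustrative filter
        "limit": 5,
    },
}
```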
Configuration for RAG generation
Optional custom prompt to override the default.
Include document titles in responses when available.
Include web search results provided to the LLM.
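Putting the body together, a sketch of a complete non-streaming request; the endpoint path and the exact field names (query, task_prompt, and the two include flags) are assumptions drawn from the parameter descriptions above:

```python
import requests

# Hypothetical endpoint path; field names are assumptions based on this section.
url = "https://api.example.com/v3/retrieval/rag"
headers = {"Authorization": "Bearer <access_token>"}
body = {
    "query": "Summarize our refund policy.",             # the user's question
    "search_mode": "custom",                             # full control via search_settings
    "search_settings": {
        "filters": {"category": {"$eq": "policies"}},    # illustrative metadata filter
        "limit": 8,
    },
    "rag_generation_config": {"model": "openai/gpt-4o-mini", "temperature": 0.3},
    # "task_prompt": "...",                              # optional custom prompt override
    "include_title_if_available": True,                  # include document titles when available
    "include_web_search": False,                         # augment context with web search results
}

response = requests.post(url, json=body, headers=headers)
response.raise_for_status()
print(response.json())                                   # generated answer with citations
```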
Response
200
"value"