Overview

A Document in R2R is the system’s digital representation of any ingested content—including PDFs, text files, web pages, images, and audio. It serves as the central container for all downstream data objects such as Chunks, Entities, and Relationships, forming the foundation for R2R’s knowledge processing pipeline. Documents transform raw content into structured, searchable, and analyzable knowledge that powers Retrieval-Augmented Generation (RAG) and agentic workflows.

Key Processes

Documents in R2R support several key stages of processing:

Ingestion — Accepts multiple input formats (.pdf, .docx, .txt, .png, .mp3, etc.) via file upload, raw text, or predefined chunks.
Chunking — Splits document content into smaller, retrievable Chunks for semantic search and analysis.
Metadata & Collections — Associates documents with descriptive metadata (e.g., title, source) and organizes them into Collections for access control and sharing.
Enrichment (Optional) — Extracts Entities and Relationships to build knowledge graphs or generates embeddings for semantic search.
Status Tracking — Monitors ingestion, enrichment, and extraction progress for transparency and error handling.
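The chunking stage can be illustrated with a minimal sketch. This is not R2R's actual chunking strategy, which the text above does not specify; it assumes a simple fixed-size character window with overlap, and the function name `chunk_text` and its parameters are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (illustrative only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text,
        # so no trailing chunk is a pure repeat of the overlap.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(piece)
```

The overlap keeps a little shared context between neighbouring chunks, so a sentence that straddles a boundary is still partly visible from both sides at retrieval time.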

API Endpoints

Method	Endpoint	Description
`POST`	/documents	Ingest new information (file, text, or chunks) as a document.
`GET`	/documents	List existing documents with pagination and filtering.
`GET`	/documents/{id}	Retrieve metadata, ingestion status, or details for a specific document.
`GET`	/documents/{id}/download	Download the original source file of a document.
`GET`	/documents/{id}/chunks	List the text Chunks generated from a document’s content.
`PATCH`	/documents/{id}/metadata	Add or update metadata for a document.
`PUT`	/documents/{id}/metadata	Replace all metadata for a document.
`DELETE`	/documents/{id}	Delete a document and its associated data.
`DELETE`	/documents/by-filter	Delete multiple documents that match a filter.
`POST`	/documents/search	Search across generated document summaries.
`GET`	/documents/download_zip	Download multiple original document files as a zip archive.
`POST`	/documents/{id}/extract	Start entity and relationship extraction for a document.
`GET`	/documents/{id}/entities	List Entities identified within a document.
`GET`	/documents/{id}/relationship	List Relationships identified within a document.
`POST`	/documents/{id}/deduplicate	Start entity deduplication for a document.
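As a rough sketch of how a client might address a few of these endpoints, the example below builds (but does not send) HTTP requests with Python's standard library. Several details are assumptions not stated in the table: the base URL `https://api.example.com`, the bearer-token `Authorization` header, the `offset` and `limit` query parameter names, and the convention that a document's identifier fills the path segment after `/documents/`. The helper names are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://api.example.com"  # assumption: placeholder host
API_KEY = "YOUR_API_KEY"              # assumption: bearer-token auth


def build_request(method, path, params=None, body=None):
    """Build a urllib Request for a documents endpoint without sending it."""
    url = BASE_URL + path
    if params:
        url += "?" + urlencode(params)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Authorization": f"Bearer {API_KEY}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return Request(url, data=data, headers=headers, method=method)


def list_documents(offset=0, limit=10):
    # GET /documents with pagination; parameter names are assumed.
    return build_request("GET", "/documents",
                         params={"offset": offset, "limit": limit})


def list_chunks(document_id):
    # GET the Chunks of one document; the id is percent-encoded for safety.
    return build_request("GET", f"/documents/{quote(document_id, safe='')}/chunks")


def delete_document(document_id):
    # DELETE one document and its associated data.
    return build_request("DELETE", f"/documents/{quote(document_id, safe='')}")


if __name__ == "__main__":
    for req in (list_documents(), list_chunks("doc-123"), delete_document("doc-123")):
        print(req.get_method(), req.full_url)
```

To actually issue a request, the returned object can be passed to `urllib.request.urlopen`; building and sending are kept separate here so the URLs and methods can be inspected without network access.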
