A Document in R2R is the system’s digital representation of any ingested content, including PDFs, text files, web pages, images, and audio. It is the central container for all downstream data objects, such as Chunks, Entities, and Relationships, and forms the foundation of R2R’s knowledge processing pipeline. Through Documents, raw content becomes structured, searchable, and analyzable knowledge that powers Retrieval-Augmented Generation (RAG) and agentic workflows.

Key Processes

Documents in R2R support several key stages of processing:
  • Ingestion — Accepts multiple input formats (.pdf, .docx, .txt, .png, .mp3, etc.) via file upload, raw text, or predefined chunks.
  • Chunking — Splits document content into smaller, retrievable Chunks for semantic search and analysis.
  • Metadata & Collections — Associates documents with descriptive metadata (e.g., title, source) and organizes them into Collections for access control and sharing.
  • Enrichment (Optional) — Extracts Entities and Relationships to build knowledge graphs or generates embeddings for semantic search.
  • Status Tracking — Monitors ingestion, enrichment, and extraction progress for transparency and error handling.
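
To make the Chunking stage concrete, here is a minimal sketch of fixed-size chunking with overlap. It is an illustration only and does not reproduce R2R's actual chunking strategy; the function name `chunk_text` and the `chunk_size` and `overlap` parameters are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into character windows of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    boundary still appears whole in at least one chunk (hypothetical strategy).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window reaches the end of the text, so the last
        # chunk is never a fragment already covered by the previous one.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


if __name__ == "__main__":
    for i, chunk in enumerate(chunk_text("abcdefghij", chunk_size=4, overlap=2)):
        print(i, chunk)
```

A larger overlap preserves more context across chunk boundaries at the cost of storing and embedding more redundant text.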

API Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /documents | Ingest new information (file, text, or chunks) as a document. |
| GET | /documents | List existing documents with pagination and filtering. |
| GET | /documents/{id} | Retrieve metadata, ingestion status, or details for a specific document. |
| GET | /documents/{id}/download | Download the original source file of a document. |
| GET | /documents/{id}/chunks | List the text Chunks generated from a document’s content. |
| PATCH | /documents/{id}/metadata | Add or update metadata for a document. |
| PUT | /documents/{id}/metadata | Replace all metadata for a document. |
| DELETE | /documents/{id} | Delete a document and its associated data. |
| DELETE | /documents/by-filter | Delete multiple documents that match a filter. |
| POST | /documents/search | Search across generated document summaries. |
| GET | /documents/download_zip | Download multiple original document files as a zip archive. |
| POST | /documents/{id}/extract | Start entity and relationship extraction for a document. |
| GET | /documents/{id}/entities | List Entities identified within a document. |
| GET | /documents/{id}/relationship | List Relationships identified within a document. |
| POST | /documents/{id}/deduplicate | Start entity deduplication for a document. |
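
As a rough sketch of how these paths fit together, the helper below assembles endpoint URLs from a document ID and an optional sub-resource. It assumes the path segment after /documents/ is the document ID. The base URL (a local server on port 7272 with a `/v3` prefix), the helper name `document_url`, and the sample ID are assumptions for illustration, so adjust them to your deployment. No request is sent.

```python
from typing import Optional
from urllib.parse import quote

# Assumption: address of a locally running server; not taken from this page.
BASE_URL = "http://localhost:7272/v3"


def document_url(doc_id: Optional[str] = None, action: Optional[str] = None) -> str:
    """Build the URL for a document endpoint (hypothetical helper).

    doc_id -- document identifier, percent-encoded before use
    action -- trailing path segment such as "chunks", "metadata", or "search"
    """
    parts = [BASE_URL, "documents"]
    if doc_id is not None:
        # safe="" also encodes "/" so an ID can never add extra path segments.
        parts.append(quote(doc_id, safe=""))
    if action is not None:
        parts.append(action)
    return "/".join(parts)


if __name__ == "__main__":
    doc_id = "example-document-id"  # hypothetical ID
    print("GET   ", document_url())                    # list documents
    print("GET   ", document_url(doc_id))              # one document's details
    print("GET   ", document_url(doc_id, "chunks"))    # its chunks
    print("PATCH ", document_url(doc_id, "metadata"))  # add or update metadata
    print("POST  ", document_url(action="search"))     # search summaries
```

The same URL serves different operations depending on the HTTP method: for example, PATCH on the metadata path adds or updates keys, while PUT on that path replaces the metadata entirely.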