Retrieval-Augmented Generation
lexigram-ai-rag provides a configurable RAG pipeline that ingests documents, chunks them, generates embeddings, retrieves relevant context, and synthesizes grounded answers. The pipeline is protocol-driven: swap chunking strategies, retrieval backends, and synthesis models without changing application code.
For the full configuration reference and advanced features (HyDE, reranking, evaluation), see the lexigram-ai-rag package docs.
1. The Contracts
The RAG system is built on protocols from lexigram.contracts.ai.rag. Every pipeline operation returns a Result so failures are explicit:

    from typing import Any, Protocol, runtime_checkable

    from lexigram.result import Result
    from lexigram.contracts.ai.rag import RAGContext, RAGResponse, RAGError
    from lexigram.contracts.ai.vector import SearchResultProtocol


    class RAGPipelineProtocol(Protocol):
        async def execute(self, context: RAGContext) -> Result[RAGResponse, RAGError]: ...


    class RetrievalStrategyProtocol(Protocol):
        async def retrieve(
            self,
            query: str,
            candidates: list[SearchResultProtocol],
            *,
            top_k: int = 5,
            **kwargs: Any,
        ) -> list[SearchResultProtocol]: ...

RAGContext carries the query and optional filters, and RAGResponse contains the synthesized answer with sources and citations:

    from dataclasses import dataclass
    from typing import Any

    from lexigram.contracts.ai.vector import SearchResultProtocol


    @dataclass(frozen=True)
    class RAGContext:
        query: str
        config: dict[str, Any] | None = None
        filters: dict[str, Any] | None = None
        session_id: str | None = None


    @dataclass(frozen=True)
    class RAGResponse:
        answer: str
        sources: list[SearchResultProtocol]
        citations: list[Any] | None = None
        confidence: float | None = None

2. Configuration
Add the RAGModule and configure chunking, retrieval, and synthesis:

    from lexigram import Application
    from lexigram.ai.rag import RAGModule, RAGConfig

    app = Application(name="my-app")
    app.add_module(RAGModule.configure(
        RAGConfig(
            chunk_size=512,
            chunk_overlap=50,
            embedding_provider="openai",
            top_k=5,
            enable_citations=True,
            collection_name="knowledge_base",
        ),
    ))

The module can also be configured in YAML under the ai_rag key:

    ai_rag:
      enabled: true
      vector_store_type: pgvector
      collection_name: knowledge_base
      top_k: 5
      chunk_size: 512
      chunk_overlap: 50
      chunking_strategy: recursive
      embedding_provider: openai
      embedding_model: text-embedding-3-small
      enable_citations: true
      enable_hyde: false
      enable_query_expansion: true
      use_hybrid_search: true
      similarity_threshold: 0.7
      enable_caching: true
      cache_ttl: 3600

3. Chunking Strategies
The pipeline supports multiple chunking strategies, selected via chunking_strategy:
| Strategy | Description |
|---|---|
| recursive | Recursive character splitting with overlap (default) |
| token | Token-aware splitting at model boundaries |
| semantic | Semantic boundary detection using embeddings |
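
To see how chunk_size and chunk_overlap interact, here is a minimal sketch of a fixed-window character splitter. It is a simplification for illustration, not the library's recursive strategy (which presumably also prefers natural separators), and the function name chunk_with_overlap is hypothetical:

```python
def chunk_with_overlap(
    text: str, chunk_size: int = 512, chunk_overlap: int = 50
) -> list[str]:
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters, so content that
    straddles a boundary appears in both neighbouring chunks.
    Hypothetical helper for illustration; not part of lexigram.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks
```

With chunk_size=512 and chunk_overlap=50, each window advances 462 characters, so a 1,000-character document yields three chunks. A larger overlap reduces the chance of cutting a relevant passage in half, at the cost of more chunks to embed and store.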
Use the create_chunker factory for programmatic access:

    from lexigram.ai.rag import create_chunker, ChunkingConfig

    chunker = create_chunker(
        ChunkingConfig(
            strategy="recursive",
            chunk_size=512,
            chunk_overlap=50,
        )
    )
    chunks = await chunker.chunk(document_text)

4. Indexing Documents
Ingest documents into the vector store through the pipeline. Documents are chunked, embedded, and stored automatically:

    from lexigram import Application
    from lexigram.ai.rag import RAGModule, RAGConfig
    from lexigram.contracts.ai.rag import RAGPipelineProtocol, RAGContext


    async def index_documents() -> None:
        async with Application.boot(
            modules=[RAGModule.configure(RAGConfig(collection_name="kb"))]
        ) as app:
            pipeline = await app.container.resolve(RAGPipelineProtocol)

            # Documents are chunked, embedded, and indexed automatically
            result = await pipeline.execute(
                RAGContext(query="seed document", config={"index_only": True})
            )

5. Querying
Run a RAG query to retrieve relevant documents and synthesize an answer:

    from lexigram import Application
    from lexigram.ai.rag import RAGModule, RAGConfig
    from lexigram.contracts.ai.rag import RAGPipelineProtocol, RAGContext


    async def ask(query: str) -> None:
        async with Application.boot(
            modules=[RAGModule.configure(RAGConfig(top_k=5))]
        ) as app:
            pipeline = await app.container.resolve(RAGPipelineProtocol)

            result = await pipeline.execute(RAGContext(query=query))
            if result.is_ok():
                response = result.unwrap()
                print(f"Answer: {response.answer}")
                for source in response.sources:
                    print(f"  Source: {source}")
                if response.citations:
                    for citation in response.citations:
                        print(f"  Citation: {citation}")
            else:
                error = result.unwrap_err()
                print(f"RAG failed: {error}")

When enable_citations is True, sources are cited inline in the response. The sources list contains SearchResultProtocol objects with metadata about each retrieved document.
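
The is_ok() / unwrap() / unwrap_err() branching in the query example can be tried without a running pipeline. The sketch below uses hypothetical stand-ins named Ok and Err that mirror only the three methods used above. They are assumptions for illustration and are not the classes exported by lexigram.result, whose actual shape may differ:

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E")


@dataclass(frozen=True)
class Ok(Generic[T]):
    """Hypothetical success variant; not the lexigram.result type."""
    value: T

    def is_ok(self) -> bool:
        return True

    def unwrap(self) -> T:
        return self.value

    def unwrap_err(self) -> None:
        raise ValueError("called unwrap_err() on an Ok value")


@dataclass(frozen=True)
class Err(Generic[E]):
    """Hypothetical failure variant; not the lexigram.result type."""
    error: E

    def is_ok(self) -> bool:
        return False

    def unwrap(self) -> None:
        raise ValueError(f"called unwrap() on an Err value: {self.error}")

    def unwrap_err(self) -> E:
        return self.error


def describe(result: Union[Ok[str], Err[str]]) -> str:
    """Branch on the result instead of catching exceptions."""
    if result.is_ok():
        return f"Answer: {result.unwrap()}"
    return f"RAG failed: {result.unwrap_err()}"
```

Because the failure case is a value, not a raised exception, callers have to choose what to do with it at the call site, which is the point of returning a Result from every pipeline operation.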
6. Retrieval Strategies
The pipeline uses RetrievalStrategyProtocol for pluggable retrieval. The strategy registry, created with RetrievalStrategyRegistry.with_defaults(), provides these built-in strategies:
| Strategy | Description |
|---|---|
| similarity | Pure vector similarity search |
| hybrid | Combined vector + keyword (default) |
| mmr | Maximum marginal relevance for diversity |
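
To make the mmr row concrete, here is a minimal sketch of maximum marginal relevance selection in plain Python. The Candidate type, the mmr_select function, and the lambda_mult parameter are assumptions made for this example; the built-in strategy may be implemented and parameterized differently:

```python
from dataclasses import dataclass
from math import sqrt
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for a search result; not the lexigram type."""
    doc_id: str
    embedding: tuple[float, ...]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query_embedding: Sequence[float],
    candidates: Sequence[Candidate],
    *,
    top_k: int = 5,
    lambda_mult: float = 0.5,
) -> list[Candidate]:
    """Greedily pick up to top_k candidates.

    Each round scores a candidate as
        lambda_mult * relevance - (1 - lambda_mult) * redundancy
    where relevance is similarity to the query and redundancy is the
    highest similarity to anything already selected.
    """
    remaining = list(candidates)
    selected: list[Candidate] = []

    def score(c: Candidate) -> float:
        relevance = cosine(query_embedding, c.embedding)
        redundancy = max(
            (cosine(c.embedding, s.embedding) for s in selected), default=0.0
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < top_k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Setting lambda_mult=1.0 reduces this to plain similarity ranking. Lower values penalize near-duplicates more heavily, so the selected context covers more distinct material.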
Reranking is handled by RerankingStrategyProtocol implementations registered in RerankingStrategyRegistry. Retrieval strategies are looked up by name from RetrievalStrategyRegistry:

    from lexigram.ai.rag import RetrievalStrategyRegistry

    registry = RetrievalStrategyRegistry.with_defaults()
    strategy = registry.get("hybrid")

7. Testing
Use RAGModule.stub() for isolated tests:

    from lexigram import Application
    from lexigram.ai.rag import RAGModule
    from lexigram.contracts.ai.rag import RAGPipelineProtocol


    async def test_pipeline_resolves() -> None:
        async with Application.boot(modules=[RAGModule.stub()]) as app:
            pipeline = await app.container.resolve(RAGPipelineProtocol)
            assert pipeline is not None

Next Steps
- Vector Stores: configuring pgvector, Qdrant, Pinecone, or in-memory backends
- AI Agents: connecting RAG to agent tool use
- AI Memory: episodic and semantic memory for conversation context
- Dependency Injection: binding protocols to implementations
- lexigram-ai-rag package: HyDE, reranking, evaluation, reasoning