| Package | Required | Purpose |
| --- | --- | --- |
| lexigram | Yes | Core framework |
| lexigram-contracts | Yes | Protocol definitions |
| lexigram-ai-llm | Optional | LLM-based synthesis |
| lexigram-vector | Optional | Vector store integration |

lexigram-ai-rag provides a modular, async RAG pipeline that retrieves relevant context from a knowledge base and synthesises it into grounded answers with citations.

LLMs hallucinate. Their training data is static and doesn’t include your documents. A RAG pipeline solves this by:

  1. Indexing — chunking and embedding your documents into a vector store.
  2. Retrieving — given a query, finding the most relevant chunks.
  3. Synthesising — feeding the retrieved context to an LLM to produce a grounded, cited answer.
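
The three steps above can be illustrated with a dependency-free toy. This is only a sketch under loud assumptions: the bag-of-words `embed` function stands in for a real embedding model, an in-memory list stands in for a vector store, and `synthesise` stitches text together instead of prompting an LLM. None of these function names come from lexigram-ai-rag.

```python
import math
import re
from collections import Counter


def chunk(text: str) -> list[str]:
    """Naive chunking: one chunk per sentence."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a real system uses a model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs: dict[str, str]) -> list[tuple[str, str, Counter]]:
    """1. Indexing: store (source, chunk, vector) triples."""
    return [(name, c, embed(c)) for name, text in docs.items() for c in chunk(text)]


def retrieve(index, query: str, top_k: int = 2):
    """2. Retrieving: rank chunks by similarity to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda row: cosine(q, row[2]), reverse=True)
    return ranked[:top_k]


def synthesise(hits) -> str:
    """3. Synthesising: a real pipeline would prompt an LLM with these hits."""
    return " ".join(f"{text} [{i}]" for i, (_, text, _) in enumerate(hits, start=1))


docs = {
    "cats.md": "Cats sleep most of the day. Cats purr when content.",
    "dogs.md": "Dogs bark at strangers. Dogs enjoy long walks.",
}
index = build_index(docs)
hits = retrieve(index, "Why do dogs bark?", top_k=1)
answer = synthesise(hits)
print(answer)
```

The bracketed number appended to each chunk plays the role of a citation marker, so every statement in the output can be traced back to an entry in `hits`.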

The pipeline is a series of composable stages, each swappable via strategy registries:

Query → Query Processing → Retrieval → Reranking → Synthesis → Quality Assurance → Answer

Each stage is optional and configured via PipelineConfig.stages. The default stages are RETRIEVAL, SYNTHESIS, and QUALITY_ASSURANCE.


RAGConfig is the top-level configuration for the pipeline: vector store backend, chunking parameters, retrieval settings, citation style, and caching. It is injected automatically when the provider has config_key = "ai.rag".

Each stage is a named step in the execution graph. Stages are ordered, so RETRIEVAL always runs before SYNTHESIS. Available stages:

| Stage | Purpose |
| --- | --- |
| INGESTION | Document loading, chunking, and indexing |
| QUERY_PROCESSING | Query expansion, HyDE, routing |
| RETRIEVAL | Vector + keyword search, multi-hop |
| CONTEXT_OPTIMIZATION | Reranking, compression, deduplication |
| SYNTHESIS | Answer generation from retrieved context |
| QUALITY_ASSURANCE | Faithfulness, relevance, hallucination checks |
| POST_PROCESSING | Caching, metrics, logging |
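
One way to picture ordered, optional stages is an enum whose values fix the execution order. The sketch below is illustrative only: the `Stage` enum and the `plan` helper are hypothetical, the ordering is assumed to follow the table above, and nothing here reflects how PipelineConfig.stages is actually implemented.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Hypothetical stage enum; the integer values encode execution order."""
    INGESTION = 1
    QUERY_PROCESSING = 2
    RETRIEVAL = 3
    CONTEXT_OPTIMIZATION = 4
    SYNTHESIS = 5
    QUALITY_ASSURANCE = 6
    POST_PROCESSING = 7


# The defaults named in this guide.
DEFAULT_STAGES = (Stage.RETRIEVAL, Stage.SYNTHESIS, Stage.QUALITY_ASSURANCE)


def plan(enabled=DEFAULT_STAGES) -> list[Stage]:
    """Return the enabled stages, de-duplicated and in execution order."""
    return sorted(set(enabled))


# Stages can be listed in any order; the plan is always sorted.
order = plan([Stage.SYNTHESIS, Stage.QUERY_PROCESSING, Stage.RETRIEVAL])
print([stage.name for stage in order])
```

Because the order lives in the enum rather than in the caller's list, enabling an extra stage never changes the relative order of the others.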

Plug in custom behaviour by registering strategies. Each registry has a with_defaults() constructor that returns a registry with the built-in strategies already registered:

  • ChunkingStrategyRegistry: recursive, semantic, token, fixed_size
  • RetrievalStrategyRegistry: retrieval strategies
  • RerankingStrategyRegistry: cross-encoder, FlashRank (optional)
  • CompressionStrategyRegistry: LLMLingua-2 (optional)
  • HyDEStrategyRegistry: hypothetical document embedding generators
  • ReasoningStrategyRegistry: multi-step reasoning strategies
  • SynthesisStrategyRegistry: direct, extractive, abstractive, hybrid
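
To make the registry idea concrete, here is a minimal sketch of a name-to-strategy registry with a with_defaults() constructor. The `StrategyRegistry` class, its `get` and `names` methods, and the toy strategies are all hypothetical; the real registries may expose a different interface.

```python
from typing import Callable, Dict, List

Chunker = Callable[[str], List[str]]


class StrategyRegistry:
    """Hypothetical name-to-strategy registry (not the lexigram class)."""

    def __init__(self) -> None:
        self._strategies: Dict[str, Chunker] = {}

    @classmethod
    def with_defaults(cls) -> "StrategyRegistry":
        """Build a registry pre-populated with one built-in strategy."""
        registry = cls()
        registry.register(
            "fixed_size",
            lambda text: [text[i:i + 20] for i in range(0, len(text), 20)],
        )
        return registry

    def register(self, name: str, strategy: Chunker) -> None:
        """Add a strategy, replacing any existing one with the same name."""
        self._strategies[name] = strategy

    def get(self, name: str) -> Chunker:
        """Look up a strategy, failing with a list of the known names."""
        try:
            return self._strategies[name]
        except KeyError:
            known = ", ".join(sorted(self._strategies))
            raise KeyError(f"unknown strategy {name!r}; known: {known}") from None

    def names(self) -> List[str]:
        return sorted(self._strategies)


registry = StrategyRegistry.with_defaults()
registry.register("paragraph", lambda text: [p for p in text.split("\n\n") if p])
print(registry.names())
print(registry.get("paragraph")("first\n\nsecond"))
```

Selecting strategies by name keeps configuration declarative: the config only needs a string, and the registry decides which object that string maps to.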

The pipeline’s execute() method returns Result[RAGResponse, RAGError]. Always check both cases:

    result = await pipeline.execute(RAGContext(query="What is Lexigram?"))
    if result.is_ok():
        response = result.unwrap()
        print(f"Answer: {response.answer}")
        print(f"Sources: {len(response.sources)} documents")
        print(f"Confidence: {response.confidence}")
    else:
        error = result.unwrap_err()
        match error:
            case RetrievalError():
                logger.error("retrieval_failed", error=str(error))
            case SynthesisError():
                logger.error("synthesis_failed", error=str(error))
            case _:
                logger.error("rag_failed", error=str(error))
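
For readers unfamiliar with the Result pattern, the following stand-in shows why both branches matter. The `Ok` and `Err` classes are hypothetical and only mirror the three methods used above (is_ok, unwrap, unwrap_err); the Result type that execute() actually returns may be implemented differently.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E")


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T

    def is_ok(self) -> bool:
        return True

    def unwrap(self) -> T:
        return self.value

    def unwrap_err(self):
        raise RuntimeError("called unwrap_err() on Ok")


@dataclass(frozen=True)
class Err(Generic[E]):
    error: E

    def is_ok(self) -> bool:
        return False

    def unwrap(self):
        # Unwrapping a failure is a programming error, so fail loudly.
        raise RuntimeError(f"called unwrap() on Err: {self.error!r}")

    def unwrap_err(self) -> E:
        return self.error


Result = Union[Ok[T], Err[E]]


def answer(query: str) -> Result[str, ValueError]:
    """Toy operation that returns an Err for an empty query instead of raising."""
    if not query.strip():
        return Err(ValueError("empty query"))
    return Ok(f"echo: {query}")


def describe(result) -> str:
    """Handle both cases explicitly, as recommended above."""
    if result.is_ok():
        return result.unwrap()
    return f"failed: {result.unwrap_err()}"


print(describe(answer("What is Lexigram?")))
print(describe(answer("   ")))
```

The point of the pattern is that failure is part of the return type, so a caller who skips the is_ok() check gets an immediate error rather than a silently wrong answer.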

    import asyncio

    from lexigram import Application
    from lexigram.ai.rag import RAGModule, RAGConfig, PipelineConfig

    # RAGContext and RAGPipelineProtocol must be imported as well; this guide
    # does not show which modules export them.


    async def main() -> None:
        config = RAGConfig(
            vector_store_type="pgvector",
            collection_name="docs",
            top_k=10,
            chunk_size=1024,
            chunk_overlap=128,
            enable_citations=True,
            citation_style="numbered",
            enable_hallucination_detection=True,
            embedding_model="text-embedding-3-small",
        )
        async with Application.boot(
            name="rag-app",
            modules=[RAGModule.configure(config)],
            config_path="application.yaml",
        ) as app:
            pipeline = await app.container.resolve(RAGPipelineProtocol)
            result = await pipeline.execute(
                RAGContext(
                    query="How do I configure RAG?",
                    filters={"department": "engineering"},
                )
            )
            if result.is_ok():
                response = result.unwrap()
                print(f"Confidence: {response.confidence:.2f}")
                for i, source in enumerate(response.citations or [], start=1):
                    print(f" [{i}] {source}")
            else:
                # Handle the failure case too instead of dropping it silently.
                print(f"RAG failed: {result.unwrap_err()}")


    if __name__ == "__main__":
        asyncio.run(main())

    from lexigram.ai.rag import RAGConfig
    from lexigram.ai.rag.chunking.strategy_registry import ChunkingStrategyRegistry

    registry = ChunkingStrategyRegistry.with_defaults()

    # Add a custom chunker (MarkdownHeaderChunker is your own class, not shown here)
    registry.register("markdown", MarkdownHeaderChunker())

    # Pass to the provider via config, selecting the strategy by its registered name
    config = RAGConfig(chunking_strategy="markdown")
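
MarkdownHeaderChunker is not defined in the snippet above. A minimal sketch of what such a class might look like follows. The `chunk(text) -> list[str]` method name is an assumption, since this guide does not show the chunker interface that ChunkingStrategyRegistry expects, so adapt it to the real protocol before registering it.

```python
import re


class MarkdownHeaderChunker:
    """Hypothetical chunker: one chunk per Markdown heading section."""

    # A heading is 1-6 '#' characters followed by a space at line start.
    _HEADING = re.compile(r"^#{1,6} ", re.MULTILINE)

    def chunk(self, text: str) -> list[str]:
        # Collect the offset of every heading; any text before the first
        # heading becomes its own chunk.
        starts = [m.start() for m in self._HEADING.finditer(text)]
        if not starts or starts[0] != 0:
            starts.insert(0, 0)
        bounds = zip(starts, starts[1:] + [len(text)])
        # Drop sections that are empty after trimming whitespace.
        return [piece for a, b in bounds if (piece := text[a:b].strip())]


doc = "Intro line.\n\n# Setup\nInstall it.\n\n## Options\nSet top_k.\n"
chunks = MarkdownHeaderChunker().chunk(doc)
for piece in chunks:
    print(repr(piece))
```

This simple version treats any line starting with `#` as a heading, including lines inside fenced code samples, so documents with embedded shell or Python comments would need a more careful parser.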

lexigram-ai-rag communicates with other packages through contracts in lexigram-contracts:

| Protocol | Used for | Resolved by |
| --- | --- | --- |
| RAGPipelineProtocol | Pipeline execution (this package) | RAGProvider |
| RetrievalStrategyProtocol | Retrieval/reranking strategies | RAGProvider |
| SynthesizerProtocol | LLM-based answer generation | lexigram-ai-llm (optional) |
| WorkingMemoryProtocol | Conversation memory context | lexigram-ai-memory (optional) |
| GraphStoreProtocol | Knowledge graph enhancement | lexigram-graph (optional) |

The provider's boot phase resolves the optional bindings. If an optional binding is missing, the pipeline stages that depend on it fall back gracefully.
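
The fallback idea can be pictured as a lookup that tolerates missing bindings. This sketch uses a plain dict as a stand-in for the container. The helpers `resolve_optional`, `build_synthesizer`, and `extractive_fallback` are hypothetical, and the choice of an extractive fallback is an assumption: the guide does not describe what each stage actually falls back to.

```python
from typing import Callable, Dict, List, Optional

Synthesizer = Callable[[str, List[str]], str]


def extractive_fallback(query: str, chunks: List[str]) -> str:
    """Assumed fallback when no LLM synthesizer is bound: return the top chunk."""
    return chunks[0] if chunks else "No relevant context found."


def resolve_optional(bindings: Dict[str, object], key: str) -> Optional[object]:
    """Return the binding for `key`, or None instead of raising."""
    return bindings.get(key)


def build_synthesizer(bindings: Dict[str, object]) -> Synthesizer:
    """Prefer a bound synthesizer; otherwise degrade to the fallback."""
    bound = resolve_optional(bindings, "SynthesizerProtocol")
    return bound if callable(bound) else extractive_fallback


# No optional package installed: the stage still produces an answer.
synth = build_synthesizer({})
print(synth("What is RAG?", ["RAG retrieves context before answering."]))

# With a binding present, it takes precedence over the fallback.
synth = build_synthesizer({"SynthesizerProtocol": lambda q, c: f"LLM answer to {q!r}"})
print(synth("What is RAG?", []))
```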


  • ✅ Set embedding_model explicitly — it has no vendor-specific default.
  • ✅ Set enable_hallucination_detection=True in production.
  • ✅ Use require_citations=True in PipelineConfig for regulated domains.
  • ✅ Match vector_dimension to your embedding model’s output dimension.
  • ✅ Configure cache_ttl based on how often your document set changes.
  • ❌ Don’t use result.unwrap() without checking is_ok() — retrieval can fail.
  • ❌ Don’t set vector_store_type="mock" in production.
  • ❌ Don’t exceed top_k=50 without performance testing.
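
Several of these checks can be automated before deployment. The sketch below is a hypothetical pre-flight validator over a plain dict. The field names mirror the ones mentioned in this guide, but the `validate` function and the dimension table are assumptions and not part of lexigram-ai-rag. The 1536 figure is the default output size of OpenAI's text-embedding-3-small model.

```python
# Known embedding output sizes; extend this for the models you use.
EMBEDDING_DIMENSIONS = {"text-embedding-3-small": 1536}


def validate(config: dict, production: bool = True) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    model = config.get("embedding_model")
    if not model:
        problems.append("embedding_model must be set explicitly")

    # Only check the dimension when the model's output size is known.
    expected = EMBEDDING_DIMENSIONS.get(model)
    if expected is not None and config.get("vector_dimension") != expected:
        problems.append(f"vector_dimension should be {expected} for {model}")

    if production and config.get("vector_store_type") == "mock":
        problems.append("vector_store_type='mock' is not for production")
    if production and not config.get("enable_hallucination_detection", False):
        problems.append("enable_hallucination_detection should be True")
    if config.get("top_k", 0) > 50:
        problems.append("top_k > 50: run performance tests first")
    return problems


good = {
    "embedding_model": "text-embedding-3-small",
    "vector_dimension": 1536,
    "vector_store_type": "pgvector",
    "enable_hallucination_detection": True,
    "top_k": 10,
}
bad = {"vector_store_type": "mock", "top_k": 100}

print(validate(good))
for problem in validate(bad):
    print("-", problem)
```

Running a check like this in CI catches a mismatched vector dimension before any documents are indexed, which is much cheaper than re-embedding a collection later.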