Guide
Requirements
Section titled “Requirements”| Package | Required | Purpose |
|---|---|---|
lexigram | Yes | Core framework |
lexigram-contracts | Yes | Protocol definitions |
lexigram-ai-llm | Optional | LLM tracing |
lexigram-ai-rag | Optional | RAG tracing |
lexigram-ai-agents | Optional | Agent tracing |
lexigram-vector | Optional | Vector store tracing |
The Problem
Section titled “The Problem”LLM calls and vector searches are opaque — they cross process boundaries to external APIs and databases. When a response is slow, costs spike, or errors rise, you need visibility into what happened. Without instrumentation, debugging AI pipelines is guesswork.
lexigram-ai-observability solves this by wrapping your AI clients with distributed tracing, metrics collection, and health monitoring — automatically.
Mental Model
Section titled “Mental Model”The package has three pillars:
| Pillar | Class | What It Tracks |
|---|---|---|
| Tracing | AITracer | Per-call spans for LLM completions, vector operations, RAG stages, and embeddings |
| Metrics | AIMetrics | Counters, histograms, and gauges for tokens, costs, latency, request volume, and cache hit rates |
| Health | AIHealthMonitor | Registered health checks for LLM endpoints, vector stores, and embedding services |
These three work together via ObservabilityProvider, which auto-wires them around any LLMClientProtocol or VectorStoreProtocol registered in the container.
Core Concepts
Section titled “Core Concepts”Trace Spans
Section titled “Trace Spans”AITracer creates OpenTelemetry-compatible spans for every AI operation:
from lexigram.ai.observability import AITracer
# Typical usage — spans are created automatically by ObservableLLMClientwith tracer.trace_llm_call("openai", "gpt-4o") as span: response = await client.complete(messages) span.set_attribute("llm.tokens.total", response.usage.total_tokens)Available span helpers:
| Method | Creates Span Named |
|---|---|
trace_llm_call(provider, model) | llm.{provider}.{model} |
trace_vector_operation(op, provider, collection) | vector.{op}.{provider} |
trace_embedding_operation(model) | embedding.{model} |
trace_rag_stage(stage, pipeline) | rag.{stage} |
trace_rag_query(query) | rag.query |
Metrics
Section titled “Metrics”AIMetrics registers pre-defined instruments through MetricsCollectorProtocol. All metric names use the intelligence_ prefix:
from lexigram.ai.observability import AIMetrics
# Count a successful LLM callmetrics.llm_requests_total.increment( labels={"provider": "openai", "model": "gpt-4o", "status": "success"})# Record latencymetrics.llm_duration_seconds.observe(value=0.8, labels={"provider": "openai", "model": "gpt-4o"})Auto-Wrapping
Section titled “Auto-Wrapping”The killer feature: ObservabilityProvider.boot() detects existing LLMClientProtocol and VectorStoreProtocol registrations in the container and replaces them with ObservableLLMClient / ObservableVectorStore proxies. The proxy delegates every call to the original while recording spans and metrics.
# After boot, this call is automatically traced and metered:result = await llm_client.complete(messages)# No code changes needed — the wrapping was transparent.Health Monitoring
Section titled “Health Monitoring”AIHealthMonitor manages health checks for AI infrastructure:
from lexigram.ai.observability import AIHealthMonitor
monitor = AIHealthMonitor()monitor.add_llm_check("openai", check_openai_connectivity)monitor.add_vector_check("pgvector", check_pgvector_connectivity)
all_healthy = await monitor.is_ready()Typical Usage
Section titled “Typical Usage”Full Application Wiring
Section titled “Full Application Wiring”from lexigram import Application, LexigramConfigfrom lexigram.ai.observability import ObservabilityModulefrom lexigram.ai.llm import LLMModulefrom lexigram.ai.observability.config import ObservabilityConfig
config = LexigramConfig.from_yaml({ "ai_observability": { "enabled": True, "tracing_enabled": True, "metrics_enabled": True, "health_checks_enabled": True, }})
app = Application(name="observable-app", config=config)app.add_module(LLMModule.configure())app.add_module(ObservabilityModule.configure())await app.start()# Your AI calls are now instrumentedUsing the Decorators
Section titled “Using the Decorators”For custom functions that aren’t auto-wrapped, use the decorator API:
from lexigram.ai.observability import trace_llm, track_llm_call
@trace_llm(provider="openai", model="gpt-4o", tracer=tracer)@track_llm_call(provider="openai", model="gpt-4o", metrics=metrics)async def my_completion(messages): return await client.complete(messages)Common Patterns
Section titled “Common Patterns”Selective Disabling
Section titled “Selective Disabling”config = ObservabilityConfig( enabled=True, metrics_enabled=False, # disable metrics, keep tracing tracing_enabled=True,)Custom Metric Labels
Section titled “Custom Metric Labels”Labels flow through AIMetrics to MetricsCollectorProtocol — add them everywhere for granular breakdowns:
metrics.llm_requests_total.increment(labels={ "provider": "openai", "model": "gpt-4o", "status": "success", "deployment": "prod-eu-west-1",})Hook Integration
Section titled “Hook Integration”The package emits lifecycle hooks you can subscribe to:
from lexigram.ai.observability.hooks import LLMCallTracedHook# Hook payload fired after each traced LLM callBest Practices
Section titled “Best Practices”- Enable tracing and metrics in production — the overhead is minimal (
MetricsCollectorProtocolis designed for hot paths). - Register health checks for every external AI dependency (LLM provider, vector store).
- Use
tracing_enabled=falseduring local development if you don’t need spans. - Attach meaningful span attributes (
tokens.total,cost,error.type) via the wrapper or decorator callbacks. - Set up OpenTelemetry export with
[opentelemetry]extras for Jaeger, Zipkin, or cloud traces.
Next Steps
Section titled “Next Steps”- How-Tos — practical recipes
- Architecture — internal design
- Configuration — every config key