Troubleshooting
MetricsError — Metrics recording failed
Section titled “MetricsError — Metrics recording failed”Exception: lexigram.ai.observability.exceptions.MetricsError
Cause: The MetricsCollectorProtocol backend is unavailable or raised an error during increment() / observe().
Fix: Verify lexigram-monitor or your custom MetricsCollectorProtocol implementation is registered and its backend (Prometheus, StatsD) is reachable. Check the collector’s initialize() completed without errors.
TracingError — Trace span creation failed
Section titled “TracingError — Trace span creation failed”Exception: lexigram.ai.observability.exceptions.TracingError
Cause: The TracerProtocol backend (e.g. OpenTelemetry) is not initialized or the exporter endpoint is unreachable.
Fix: Ensure a TracerProtocol implementation is registered in the container before ObservabilityProvider.boot(). If using OpenTelemetry, confirm the OTLPSpanExporter endpoint is correct.
HealthCheckError — Health check infrastructure operation failed
Section titled “HealthCheckError — Health check infrastructure operation failed”Exception: lexigram.ai.observability.exceptions.HealthCheckError
Cause: Internal error in the health check system itself (not a component being unhealthy).
Fix: Check the exception details — this indicates a bug in the health check registration or execution code. Ensure all registered check callables are async and handle their own errors gracefully.
LLM calls are not traced
Section titled “LLM calls are not traced”Symptom: No spans appear in the trace backend. LLM completions work normally.
Cause: LLMClientProtocol was registered after ObservabilityProvider.boot() ran, or the provider’s enabled flag is false.
Fix:
# Ensure correct wiring orderapp = Application(name="my-app")app.add_module(LLMModule.configure())app.add_module(ObservabilityModule.configure())await app.start()Check observability_enabled in your config:
ai_observability: enabled: trueMetrics are not appearing
Section titled “Metrics are not appearing”Symptom: Prometheus or your metrics backend shows no intelligence_* metrics.
Cause 1: metrics_enabled is false.
ai_observability: metrics_enabled: true # must be trueCause 2: AIMetrics requires a MetricsCollectorProtocol — if lexigram-monitor is not installed or its provider didn’t register the collector, AIMetrics raises a ValueError.
Fix: Install and wire lexigram-monitor:
uv add lexigram-monitorContainer resolution fails with ObservabilityProvider
Section titled “Container resolution fails with ObservabilityProvider”Symptom: ValueError or KeyError during boot().
Cause: The provider tries to gracefully handle missing AITracer/AIMetrics — it logs a debug message and skips wrapping.
Fix: This is not an error. Check debug-level logs for observability_tracer_metrics_unavailable or observability_no_llm_to_wrap. If you expect tracing/metrics, ensure AITracer and AIMetrics are registered (they are, if enabled is true — the provider registers them itself).
Wrapped LLM client not capturing traces
Section titled “Wrapped LLM client not capturing traces”Symptom: LLM calls work normally but no trace spans appear in the backend.
Cause: ObservabilityProvider.boot() runs before the LLM client is registered in the container. The provider only wraps clients that exist at boot time — if an LLM sub-provider registers its client after the observability provider boots, wrapping is skipped.
Fix: Ensure the LLM provider is registered before the observability provider:
app.add_module(LLMModule.configure(config))app.add_module(ObservabilityModule.configure()) # must come after LLMCheck boot-time logs for observability_llm_wrapped (success) vs observability_no_llm_to_wrap (skipped).
HealthCheckError during health check registration
Section titled “HealthCheckError during health check registration”HealthCheckError: Failed to register health check 'vector_store'Cause: The health check callable raised an exception during registration. Health checks must be async functions or callables that accept no required arguments.
Fix: Wrap the health check logic to handle errors internally:
from lexigram.ai.observability.health import AIHealthMonitor
monitor = await container.resolve(AIHealthMonitor)await monitor.register_check("vector_store", check_fn=lambda: store_health_check())Ensure all registered checks return a HealthCheckResult or a boolean.
Metrics recording not reflected in backend
Section titled “Metrics recording not reflected in backend”Symptom: AIMetrics.increment() returns successfully but no data appears in Prometheus or StatsD.
Cause: The MetricsCollectorProtocol backend wasn’t initialized or was registered after AIMetrics was created. AIMetrics holds a reference to the collector at construction time — late-registered collectors are not used.
Fix: Ensure MetricsCollectorProtocol is registered before AIMetrics is resolved:
# Register the collector firstcontainer.singleton(MetricsCollectorProtocol, PrometheusCollector())
# Then AIMetrics picks it up automaticallymetrics = await container.resolve(AIMetrics)Verify the collector’s initialize() method completed without errors.
Tracing context lost across async boundaries
Section titled “Tracing context lost across async boundaries”Symptom: Trace spans appear disconnected — parent-child relationships are missing.
Cause: The AITracer uses contextvars to propagate the active span. When work is dispatched to a thread pool or a new asyncio.Task created with asyncio.create_task(), the context variable is not automatically propagated.
Fix: Use the framework’s tracked task helper to preserve tracing context:
from lexigram.concurrency.task_utils import create_tracked_task
# Instead of asyncio.create_task()task = await create_tracked_task(self._process_job(job))