Guide
Requirements
| Package | Required | Purpose |
|---|---|---|
| `lexigram` | Yes | Core framework |
| `lexigram-contracts` | Yes | Protocol definitions |
| `lexigram-cache` | Optional | Response caching |
| `lexigram-resilience` | Optional | Retry and rate limiting |
What problem does it solve?
`lexigram-ai-llm` provides a unified, async LLM client interface across multiple providers (OpenAI, Anthropic, Ollama, Groq, Cohere, Mistral, OpenRouter, Gemini, and more). It handles:
- Client creation and lifecycle via `LLMProvider`
- Provider-specific authentication and connection management
- Response caching, rate limiting, and token counting
- Multi-provider routing (via `LLMRoutingProvider`)
- Structured output extraction
- Streaming with thinking/reasoning support
- Provider registry for custom model providers
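
The idea of a unified interface can be illustrated without the library itself. The sketch below is not lexigram's API: `ChatClient`, `UpperClient`, `ReverseClient`, and `run` are hypothetical names, and the stub clients only transform the prompt locally. It shows calling code that depends on one shared async interface rather than on a specific provider.

```python
import asyncio
from typing import Protocol


class ChatClient(Protocol):
    """Hypothetical minimal interface shared by every provider client."""

    async def complete(self, prompt: str) -> str: ...


class UpperClient:
    """Stub provider: returns the prompt in upper case."""

    async def complete(self, prompt: str) -> str:
        return prompt.upper()


class ReverseClient:
    """Stub provider: returns the prompt reversed."""

    async def complete(self, prompt: str) -> str:
        return prompt[::-1]


async def run(client: ChatClient, prompt: str) -> str:
    # The caller depends only on the shared interface, not on a provider.
    return await client.complete(prompt)
```

Swapping `UpperClient()` for `ReverseClient()` changes the behaviour without touching `run`, which is the property a provider-agnostic client interface is meant to give.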
Mental model
```
LLMModule.configure(config)
    │
LLMProvider
    ├── ProviderRegistry → create_llm_client(config, registry)
    ├── TokenCounterRegistry → token counting per model
    ├── ParserRegistry → output parsing
    ├── LLMModelManager (optional) → local model lifecycle
    └── LLMCache (optional) → response caching
    │
    ▼
LLMClientProtocol ← injectable via container
    ├── complete() → Result[CompletionProtocol, LLMError]
    └── stream_chat() → AsyncStream[StreamChunk, LLMError]
```

Core concepts
LLMClientProtocol

The central protocol. Every provider client implements this:

```python
class LLMClientProtocol(Protocol):
    async def complete(
        self,
        messages: Sequence[ChatMessageProtocol],
        *,
        model: str | None = None,
        temperature: float | None = None,
        max_tokens: int | None = None,
        tools: Sequence[ToolDefinition] | None = None,
        stop_sequences: Sequence[str] | None = None,
        **kwargs: Any,
    ) -> Result[CompletionProtocol, LLMError]: ...

    def stream_chat(
        self,
        messages: list[ChatMessageProtocol],
        ...
    ) -> AsyncStream[StreamChunk, LLMError]: ...
```

`complete()` returns a `Result`; check `is_ok()`/`is_err()` to handle expected failures (rate limits, content filters, model not found).
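
The `Result`-checking pattern can be sketched in isolation. The `Ok`, `Err`, `fake_complete`, and `ask` names below are hypothetical stand-ins written for this illustration, not lexigram's actual types; they only mirror the `is_ok()`/`is_err()`/`unwrap()` shape mentioned in this guide.

```python
import asyncio
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E")


@dataclass(frozen=True)
class Ok(Generic[T]):
    """Hypothetical stand-in for a successful Result."""

    value: T

    def is_ok(self) -> bool:
        return True

    def is_err(self) -> bool:
        return False

    def unwrap(self) -> T:
        return self.value


@dataclass(frozen=True)
class Err(Generic[E]):
    """Hypothetical stand-in for a failed Result."""

    error: E

    def is_ok(self) -> bool:
        return False

    def is_err(self) -> bool:
        return True

    def unwrap(self):
        raise RuntimeError(f"called unwrap() on an error: {self.error}")


async def fake_complete(prompt: str) -> Union[Ok[str], Err[str]]:
    """Stand-in for llm.complete(): returns an error value instead of raising."""
    if not prompt:
        return Err("empty prompt")
    return Ok(f"echo: {prompt}")


async def ask(prompt: str) -> str:
    result = await fake_complete(prompt)
    # Branch on the outcome rather than calling unwrap() blindly.
    if result.is_err():
        return f"Failed: {result.error}"
    return result.unwrap()
```

Because expected failures come back as values, the caller decides how to degrade (retry, fall back, or report) at the call site instead of in a distant exception handler.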
ClientConfig
Typed configuration with `SecretStr` for API keys:

```python
# ThinkingConfig is assumed to be importable from the same package as ClientConfig.
from lexigram.ai.llm import ClientConfig, ThinkingConfig

config = ClientConfig(
    provider="openai",
    model="gpt-4o",
    api_key="sk-...",
    temperature=0.7,
    max_tokens=2000,
    timeout=60,
    thinking=ThinkingConfig(budget_tokens=5000),
)
```

Provider Registry
A central `ProviderRegistry` maps provider names to client classes. Built-in providers (OpenAI, Anthropic, Ollama, etc.) are registered by default. Custom providers can be added at runtime.
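
The name-to-class mapping described above can be sketched as follows. This is an illustration of the registry pattern, not the `ProviderRegistry` API: `SimpleProviderRegistry`, `EchoClient`, and the `register`/`create` method names are all hypothetical.

```python
from typing import Callable, Dict


class EchoClient:
    """Hypothetical client; a real one would call a provider's API."""

    def __init__(self, model: str) -> None:
        self.model = model

    def complete(self, prompt: str) -> str:
        return f"[{self.model}] {prompt}"


class SimpleProviderRegistry:
    """Maps provider names to client factories (illustrative only)."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[str], object]] = {}

    def register(self, name: str, factory: Callable[[str], object]) -> None:
        # Refuse silent overwrites so a typo cannot replace a built-in provider.
        if name in self._factories:
            raise ValueError(f"provider already registered: {name}")
        self._factories[name] = factory

    def create(self, name: str, model: str) -> object:
        try:
            factory = self._factories[name]
        except KeyError:
            raise KeyError(f"unknown provider: {name}") from None
        return factory(model)


registry = SimpleProviderRegistry()
registry.register("echo", EchoClient)  # add a custom provider at runtime
client = registry.create("echo", model="demo-1")
```

Keeping the lookup in one place means client construction can be driven entirely by a provider name from configuration.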
Conversation Manager
For multi-turn conversations, use `ConversationManager`:

```python
from lexigram.ai.llm import ConversationManager, ConversationConfig

# `llm` is an LLMClientProtocol instance, e.g. resolved from the container.
manager = ConversationManager(
    config=ConversationConfig(max_turns=10),
    llm_client=llm,
)
await manager.add_message({"role": "user", "content": "Hello"})
response = await manager.get_response()
```

Typical usage
1. Basic completion with error handling

```python
# The import location of CompletionProtocol is an assumption.
from lexigram.contracts.ai import CompletionProtocol, LLMClientProtocol
from lexigram.contracts.ai.llm import LLMError
from lexigram.result import Result

llm = await container.resolve(LLMClientProtocol)
result: Result[CompletionProtocol, LLMError] = await llm.complete(
    [{"role": "user", "content": "Tell me a joke"}],
)

# match() maps both outcomes to a single value, so no unwrap() is needed.
reply = result.match(
    ok=lambda c: c.content,
    err=lambda e: f"Failed: {e}",
)
print(reply)
```

2. Streaming
```python
stream = llm.stream_chat([{"role": "user", "content": "Write a poem"}])

# Print each text fragment as it arrives; skip chunks without text.
async for chunk in stream:
    if chunk.delta:
        print(chunk.delta, end="")
```

3. Structured output (JSON)
```python
from lexigram.ai.llm import JSONExtractor

extractor = JSONExtractor(llm_client=llm)
schema = {"type": "object", "properties": {"name": {"type": "string"}}}
result = await extractor.extract("Extract a name", schema=schema)
if result.is_ok():
    print(result.unwrap())
```

4. Multi-provider routing
```python
from lexigram.ai.llm import LLMModule
from lexigram.ai.llm.routing import LLMConfig

module = LLMModule.configure(routing=LLMConfig())
```

Best practices
- Always handle the `Result`: `complete()` reports expected failures as error values, so don't call `unwrap()` without checking `is_ok()`.
- Set `api_key` via an environment variable: use `LEX_AI_LLM__API_KEY` instead of hardcoding the key.
- Install only the provider extras you need: `pip install lexigram-ai-llm[openai]` keeps dependencies minimal.
- Use streaming for long responses: `stream_chat()` returns an `AsyncStream` that yields tokens incrementally.
- Enable caching for repeated queries: set `enable_cache=True` in `ClientConfig` and provide a `CacheBackendProtocol` implementation.
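
As a minimal sketch of the environment-variable advice: the helper below reads `LEX_AI_LLM__API_KEY` explicitly and fails fast when it is missing. `load_api_key` is a hypothetical helper, and this guide does not confirm whether the library also picks the variable up on its own, so treat this as one possible approach.

```python
import os

ENV_VAR = "LEX_AI_LLM__API_KEY"  # variable name taken from this guide


def load_api_key(environ=None) -> str:
    """Return the API key from the environment, failing fast if it is missing."""
    env = os.environ if environ is None else environ
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set")
    return key
```

The returned value could then be passed as `api_key=load_api_key()` when building a `ClientConfig`, which keeps the secret out of source control.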
Next steps
- Architecture: client classes, caching, model management
- Configuration: all config keys
- How-tos: streaming, conversation management, structured output
- Troubleshooting: common errors and fixes