API Reference
Protocols
LLMCacheProtocol
Protocol for LLM cache implementations.
Get value from cache.
| Parameter | Type | Description |
|---|---|---|
| `key` | str | dict[str, Any] | Cache key (string or structured dict). |
| Type | Description |
|---|---|
| Any | None | Cached value, or ``None`` if not present. |
Set value in cache.
| Parameter | Type | Description |
|---|---|---|
| `key` | str | dict[str, Any] | Cache key (string or structured dict). |
| `value` | Any | Value to store. |
| `ttl` | float | None | Optional time-to-live in seconds. |
Delete entry from cache.
| Parameter | Type | Description |
|---|---|---|
| `key` | str | dict[str, Any] | Cache key to remove. |
| Type | Description |
|---|---|
| bool | ``True`` if the key existed and was removed, ``False`` otherwise. |
Clear all entries.
Return cache statistics.
| Type | Description |
|---|---|
| dict[str, Any] | Mapping of statistic name to value. |
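A minimal in-memory sketch of a class satisfying this protocol. The method names `get`, `set`, `delete`, `clear`, and `stats` are inferred from the operation descriptions above and should be treated as assumptions, not confirmed signatures:

```python
import time
from typing import Any

class DictLLMCache:
    """Toy cache satisfying LLMCacheProtocol; method names are assumed
    from the operation descriptions above."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[Any, float | None]] = {}
        self._hits = 0
        self._misses = 0

    def _normalize(self, key: str | dict[str, Any]) -> str:
        # Structured dict keys are serialized to a stable string form.
        return key if isinstance(key, str) else repr(sorted(key.items()))

    async def get(self, key: str | dict[str, Any]) -> Any | None:
        entry = self._data.get(self._normalize(key))
        if entry is None or (entry[1] is not None and entry[1] < time.time()):
            self._misses += 1
            return None
        self._hits += 1
        return entry[0]

    async def set(self, key: str | dict[str, Any], value: Any, ttl: float | None = None) -> None:
        expires = time.time() + ttl if ttl is not None else None
        self._data[self._normalize(key)] = (value, expires)

    async def delete(self, key: str | dict[str, Any]) -> bool:
        return self._data.pop(self._normalize(key), None) is not None

    async def clear(self) -> None:
        self._data.clear()

    async def stats(self) -> dict[str, Any]:
        return {"hits": self._hits, "misses": self._misses, "total_entries": len(self._data)}
```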
Classes
APIPricingSource
Pricing source from HTTP API endpoint.
Fetches pricing data from a remote API. Useful for getting the latest pricing updates, but requires network connectivity.
Attributes:
- endpoint: API endpoint URL.
- timeout: Request timeout in seconds.
Example
source = APIPricingSource(
    "https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json"
)
pricing = await source.get_pricing("gpt-4")
Initialize API pricing source.
| Parameter | Type | Description |
|---|---|---|
| `endpoint` | str | URL to fetch pricing from. |
| `timeout` | float | Request timeout in seconds (default: 10). |
async def get_pricing(model: str) -> ModelPricing | None
Get pricing for a specific model.
| Parameter | Type | Description |
|---|---|---|
| `model` | str | Model identifier. |
| Type | Description |
|---|---|
| ModelPricing | None | ModelPricing if found, None otherwise. |
async def get_all_pricing() -> dict[str, ModelPricing]
Get all pricing data.
| Type | Description |
|---|---|
| dict[str, ModelPricing] | All pricing data from API. |
Get source name.
Clear cached pricing data to force refresh.
AbstractPricingSource
Abstract base class for pricing data sources.
All pricing sources must implement get_pricing() to return ModelPricing for a given model name, or None if not found.
async def get_pricing(model: str) -> ModelPricing | None
Get pricing for a specific model.
| Parameter | Type | Description |
|---|---|---|
| `model` | str | Model identifier (e.g., "gpt-4-turbo"). |
| Type | Description |
|---|---|
| ModelPricing | None | ModelPricing if found, None otherwise. |
async def get_all_pricing() -> dict[str, ModelPricing]
Get all available pricing data.
| Type | Description |
|---|---|
| dict[str, ModelPricing] | Dictionary mapping model names to pricing. |
Get the name of this pricing source.
| Type | Description |
|---|---|
| str | Human-readable source name. |
AnthropicClient
Anthropic Claude LLM client implementation.
Conforms to: LLMClientProtocol protocol via structural typing.
Supports Claude 3 (Opus, Sonnet, Haiku) models with:
- Streaming responses
- Tool calling
- Vision capabilities
- Automatic retry and error handling
Example
from lexigram.ai import ClientConfig

config = ClientConfig(provider="anthropic", model="claude-3-sonnet-20240229")
client = AnthropicClient(config)
completion = await client.complete([
    ChatMessage(role="user", content="Hello!")
])
def __init__(config: ClientConfig)
Initialize Anthropic client.
| Parameter | Type | Description |
|---|---|---|
| `config` | ClientConfig | LLM configuration |
| Exception | Description |
|---|---|
| ImportError | If anthropic package is not installed |
Close the Anthropic client.
Perform health check.
| Type | Description |
|---|---|
| HealthCheckResult | Structured health check result. |
CacheEntry
Cache entry with metadata.
Attributes:
- key: Cache key.
- value: Cached value.
- created_at: When entry was created.
- expires_at: When entry expires (Unix timestamp).
- hits: Number of cache hits.
- size_bytes: Approximate size in bytes.
CacheStats
Cache statistics.
Attributes:
- hits: Number of cache hits.
- misses: Number of cache misses.
- evictions: Number of evictions.
- total_entries: Current number of entries.
- total_size_bytes: Total cache size in bytes.
CharEstimateCounter
Character-based token count estimator (~4 chars per token).
Always available without any optional dependencies. Suitable as a safe fallback counter.
| Parameter | Type | Description |
|---|---|---|
| `model` | str | Model name (used for identification only). |
Initialize CharEstimateCounter.
| Parameter | Type | Description |
|---|---|---|
| `model` | str | Model name for identification. |
The model this counter is calibrated for.
Count tokens using character estimation.
def count_messages(messages: list[ChatMessage]) -> int
Count tokens in a list of chat messages.
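A hedged usage sketch of the ~4 chars/token heuristic. The single-text method name `count` is an assumption, since this page describes the text-counting operation but only spells out the `count_messages` signature:

```python
counter = CharEstimateCounter(model="gpt-4-turbo")

# `count` is an assumed method name for the documented text-counting operation.
text = "The quick brown fox jumps over the lazy dog."  # 44 characters
print(counter.count(text))  # roughly 44 / 4 = 11 tokens under the heuristic
```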
ChatMessage
A single chat message.
Implements ChatMessageProtocol with DomainModel semantics for validation.
Example
msg = ChatMessage(role="user", content="Hello, how are you?")
ClientConfig
Configuration for LLM clients.
Example
config = ClientConfig(
    provider="openai",
    model="gpt-4-turbo",
    api_key="sk-...",
    temperature=0.7,
    max_tokens=2000,
)
CohereClient
Client for Cohere's enterprise NLP API.
Conforms to: LLMClientProtocol protocol via structural typing.
Supports Chat, Embeddings, and Reranking with:
- RAG-optimized models (Command R/R+)
- High-performance embeddings
- Native reranking support
def __init__(config: ClientConfig) -> None
Initialize Cohere client.
| Parameter | Type | Description |
|---|---|---|
| `config` | ClientConfig | LLM configuration |
Get API key from config.
Get base URL from config.
async def embed( texts: list[str] | str, model: str = 'embed-english-v3.0', input_type: str = 'search_document', **kwargs: Any ) -> list[list[float]]
Generate embeddings.
| Parameter | Type | Description |
|---|---|---|
| `texts` | list[str] | str | Text or list of texts to embed. |
| `model` | str | Model ID (default: "embed-english-v3.0"). |
| `input_type` | str | Type of input ("search_document", "search_query", "classification", "clustering"). **kwargs: Additional parameters. |
| Type | Description |
|---|---|
| list[list[float]] | List of embedding vectors. |
Example
# Embed documents
doc_embeddings = await client.embed(
    texts=["Doc 1", "Doc 2"],
    input_type="search_document"
)

# Embed query
query_embedding = await client.embed(
    texts="What is AI?",
    input_type="search_query"
)
async def rerank( query: str, documents: list[str] | list[dict[str, str]], model: str = 'rerank-english-v3.0', top_n: int | None = None, **kwargs: Any ) -> list[dict[str, Any]]
Rerank documents for a query.
| Parameter | Type | Description |
|---|---|---|
| `query` | str | Search query. |
| `documents` | list[str] | list[dict[str, str]] | List of documents (strings or dicts with 'text' key). |
| `model` | str | Reranking model (default: "rerank-english-v3.0"). |
| `top_n` | int | None | Return top N results (default: all). **kwargs: Additional parameters. |
| Type | Description |
|---|---|
| list[dict[str, Any]] | List of ranked documents with scores. |
Example
results = await client.rerank(
    query="What is machine learning?",
    documents=[
        "ML is a subset of AI...",
        "Unrelated document...",
        "Deep learning uses neural networks..."
    ],
    top_n=2
)
for result in results:
    print(f"Score: {result['relevance_score']:.3f} - {result['document']['text']}")
Perform a lightweight health check against the provider.
Close the HTTP client.
Completion
LLM completion response.
Implements completion semantics with DomainModel for validation and additional fields.
Example
completion = Completion(
    content="Hello! I'm doing well, thank you.",
    model="gpt-4-turbo",
    usage=TokenUsage(prompt_tokens=10, completion_tokens=8, total_tokens=18)
)
ConversationConfig
Configuration for conversation management.
Example
config = ConversationConfig(
    max_tokens=4096,
    reserve_tokens=1000,
    trim_strategy="oldest"
)
ConversationManager
Manage multi-turn conversations with automatic context window management.
This class handles:
- Message history management
- Automatic token counting
- Context window trimming
- System prompt handling
- Conversation statistics
Example
from lexigram.ai.llm import OpenAIClient, ConversationManager
client = OpenAIClient(api_key="sk-...", model="gpt-4")
manager = ConversationManager(
    client=client,
    system_prompt="You are a helpful assistant.",
    max_tokens=4096
)

# Add user message and get response
response = await manager.chat("What is Python?")
print(response.content)

# Continue conversation
response = await manager.chat("Tell me more about it")
print(response.content)

# Get conversation history
history = manager.get_history()
stats = manager.get_stats()
print(f"Total messages: {stats.total_messages}")
print(f"Total tokens: {stats.total_tokens}")
def __init__( client: AbstractLLMClient, system_prompt: str | None = None, max_tokens: int = 4096, reserve_tokens: int = 1000, trim_strategy: str = 'oldest', metadata: Metadata | None = None, token_counter: TokenCounterProtocol | None = None ) -> None
Initialize conversation manager.
| Parameter | Type | Description |
|---|---|---|
| `client` | AbstractLLMClient | LLM client for completions |
| `system_prompt` | str | None | Optional system prompt (prepended to all conversations) |
| `max_tokens` | int | Maximum context window size |
| `reserve_tokens` | int | Tokens to reserve for completion |
| `trim_strategy` | str | Message trimming strategy ('oldest', 'middle', 'summary') |
| `metadata` | Metadata | None | Additional metadata for the conversation |
| `token_counter` | TokenCounterProtocol | None | Optional TokenCounterProtocol implementation. If not provided, uses CharEstimateCounter. |
async def chat( message: str, role: Role = Role.USER, **completion_kwargs: Any ) -> Completion
Send a message and get a response.
| Parameter | Type | Description |
|---|---|---|
| `message` | str | Message content |
| `role` | Role | Message role (default: USER) **completion_kwargs: Additional kwargs for completion |
| Type | Description |
|---|---|
| Completion | Completion response from LLM |
Example
response = await manager.chat("Hello!")
print(response.content)
async def add_message( role: Role, content: str, update_stats: bool = True ) -> None
Add a message to conversation history without getting a response.
| Parameter | Type | Description |
|---|---|---|
| `role` | Role | Message role |
| `content` | str | Message content |
| `update_stats` | bool | Whether to update statistics |
Example
await manager.add_message(Role.USER, "Hello")
await manager.add_message(Role.ASSISTANT, "Hi there!")
def get_history( include_system: bool = True, limit: int | None = None ) -> list[ChatMessage]
Get conversation history.
| Parameter | Type | Description |
|---|---|---|
| `include_system` | bool | Include system message in history |
| `limit` | int | None | Maximum number of messages to return (most recent) |
| Type | Description |
|---|---|
| list[ChatMessage] | List of chat messages |
Example
history = manager.get_history(limit=10)
for msg in history:
    print(f"{msg.role}: {msg.content}")
def get_stats() -> ConversationStats
Get conversation statistics.
| Type | Description |
|---|---|
| ConversationStats | Conversation statistics |
Example
stats = manager.get_stats()
print(f"Total tokens: {stats.total_tokens}")
Clear conversation history.
| Parameter | Type | Description |
|---|---|---|
| `keep_system` | bool | Keep system message when clearing |
Example
manager.clear_history()
Update the system prompt.
| Parameter | Type | Description |
|---|---|---|
| `system_prompt` | str | New system prompt |
Example
manager.update_system_prompt(“You are a Python expert.”)
Get current total token count.
| Type | Description |
|---|---|
| int | Total tokens in conversation |
Example
tokens = manager.get_token_count()
print(f"Current tokens: {tokens}")
Get available tokens for completion.
| Type | Description |
|---|---|
| int | Available tokens (max_tokens - current_tokens - reserve_tokens) Can be negative if context window is exceeded |
Example
available = manager.get_available_tokens()
print(f"Available for completion: {available}")
Export conversation history to dictionary.
| Type | Description |
|---|---|
| dict[str, Any] | Dictionary with conversation data (JSON-serializable) |
Example
data = manager.export_history()
from lexigram import serialization as json
with open("conversation.json", "w") as f:
    json.dump(data, f)
def from_history( cls, client: AbstractLLMClient, history_data: dict[str, Any] ) -> ConversationManager
Create conversation manager from exported history.
| Parameter | Type | Description |
|---|---|---|
| `client` | AbstractLLMClient | LLM client |
| `history_data` | dict[str, Any] | Exported history data |
| Type | Description |
|---|---|
| ConversationManager | ConversationManager instance |
Example
from lexigram import serialization as json
with open("conversation.json") as f:
    data = json.load(f)
manager = ConversationManager.from_history(client, data)
ConversationStats
Statistics for a conversation.
Example
stats = ConversationStats(
    total_messages=10,
    total_tokens=2048,
    user_messages=5,
    assistant_messages=5
)
CostEstimate
Cost estimation result.
Attributes:
- prompt_cost: Cost for prompt tokens.
- completion_cost: Cost for completion tokens.
- total_cost: Total estimated cost.
- currency: Currency code (default: USD).
- model: Model name.
- rate_per_1k_prompt: Rate per 1000 prompt tokens.
- rate_per_1k_completion: Rate per 1000 completion tokens.
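As a quick sanity check on the rate fields, here is the arithmetic these attributes imply. The field names come from the list above; the rate values are made-up numbers for illustration:

```python
# Hypothetical rates, for illustration only.
rate_per_1k_prompt = 0.01      # USD per 1000 prompt tokens
rate_per_1k_completion = 0.03  # USD per 1000 completion tokens

prompt_tokens, completion_tokens = 1200, 400
prompt_cost = prompt_tokens / 1000 * rate_per_1k_prompt              # 0.012
completion_cost = completion_tokens / 1000 * rate_per_1k_completion  # 0.012
total_cost = prompt_cost + completion_cost                           # 0.024 USD
```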
FunctionCall
Function call request from LLM.
GenerationDefaults
Default generation parameters applied to every routing attempt.
Example
defaults = GenerationDefaults(temperature=0.3, max_tokens=2048)
GroqClient
Client for Groq's ultra-fast LLM inference API.
Conforms to: LLMClientProtocol protocol via structural typing.
Supports Chat, Stream, and Vision with:
- Ultra-fast inference on LPU hardware
- OpenAI-compatible API surface
- Blazing-fast token generation
def __init__(config: ClientConfig)
Initialize Groq client.
| Parameter | Type | Description |
|---|---|---|
| `config` | ClientConfig | LLM configuration |
Get API key from config.
Get base URL from config.
Run a lightweight provider health probe.
List models available from the Groq API.
Close the HTTP client.
Example
await client.close()
HuggingFaceCounter
Token counter using HuggingFace AutoTokenizer (lazy-loaded).
When constructed without a model, uses character estimation (~4 chars/token). When constructed with a model name, lazy-loads that model’s tokenizer on first use.
| Parameter | Type | Description |
|---|---|---|
| `model` | str | None | Optional HuggingFace model name. If None, uses char estimation fallback. |
Initialize HuggingFaceCounter.
| Parameter | Type | Description |
|---|---|---|
| `model` | str | None | Optional HuggingFace model name for tokenizer loading. |
Backend identifier.
Count tokens in a text string.
def count_messages(messages: list[ChatMessage]) -> int
Count tokens in a list of chat messages.
ImageBase64Part
An image pre-encoded as base64 in a multimodal message.
Attributes:
data: Raw base64-encoded bytes (no data: prefix).
media_type: MIME type, e.g. "image/jpeg".
type: Discriminator field, always "image_base64".
ImageUrlPart
An image specified by URL in a multimodal message.
The framework passes the URL through to providers that support it natively (OpenAI, Anthropic, Gemini). For providers that require base64 (Ollama, Bedrock), the client fetches and converts.
Attributes:
url: Public or data-URI URL of the image.
detail: OpenAI vision detail level ("auto", "low", "high").
type: Discriminator field, always "image_url".
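A hedged sketch of constructing both part types, assuming keyword constructors that mirror the attribute lists above; how the parts attach to a ChatMessage is not specified on this page:

```python
import base64

# Image referenced by URL; passed through to providers with native URL support.
url_part = ImageUrlPart(url="https://example.com/cat.jpg", detail="auto")

# Image supplied as raw base64-encoded bytes (no "data:" prefix), per the notes above.
with open("cat.jpg", "rb") as f:
    b64_part = ImageBase64Part(data=base64.b64encode(f.read()), media_type="image/jpeg")
```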
InstructorExtractor
Structured extraction from LLM completions using the instructor library.
Extracts typed Pydantic models from LLM responses by:
- Building a ChatMessage list with extraction instructions
- Calling llm_client.complete() to get a Completion
- Parsing the completion text as JSON
- Validating against the response_model
- Retrying on validation/parse failures up to max_retries
Unlike direct instructor usage, this implementation uses the standard
LLMClientProtocol.complete() method, avoiding coupling to provider-specific
client patching mechanisms.
Example
from pydantic import BaseModel
class UserInfo(BaseModel):
    name: str
    age: int

extractor = InstructorExtractor(llm_client)
result = await extractor.extract(
    prompt="Extract user info from: 'John is 30 years old'",
    response_model=UserInfo,
)
if result.is_ok():
    user = result.unwrap()
    print(user.name, user.age)
else:
    error = result.unwrap_err()  # handle ExtractionError

def __init__( llm_client: LLMClientProtocol, mode: str = 'json', max_retries: int = 3 ) -> None
Initialize InstructorExtractor.
| Parameter | Type | Description |
|---|---|---|
| `llm_client` | LLMClientProtocol | LLMClientProtocol instance for making LLM calls. |
| `mode` | str | Instructor patching mode (reserved for future provider-level integration; currently unused). |
| `max_retries` | int | Maximum number of retries on validation/parse failure. |
async def extract( prompt: str, response_model: type[T], context: list | None = None, **kwargs: Any ) -> Result[T, ExtractionError]
Extract a structured response_model instance from an LLM call.
| Parameter | Type | Description |
|---|---|---|
| `prompt` | str | User prompt for extraction. |
| `response_model` | type[T] | Pydantic BaseModel class to extract and validate. |
| `context` | list | None | Optional list of additional ChatMessage objects for context. **kwargs: Additional parameters passed to llm_client.complete(). |
| Type | Description |
|---|---|
| Result[T, ExtractionError] | ``Ok(instance)`` on successful extraction and validation. ``Err(ExtractionError)`` on parse, validation, or max retries failure. |
JSONExtractor
Extract and parse JSON from LLM responses.
Extract JSON from text.
Extract JSON array from text.
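A hedged usage sketch; `extract` and `extract_array` are assumed method names, since this reference lists the two operations without signatures:

```python
extractor = JSONExtractor()

# Assumed method names for the two documented operations.
obj = extractor.extract('Here is the result: {"name": "Ada", "age": 36}')
items = extractor.extract_array("Top picks: [1, 2, 3] -- enjoy!")
```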
JSONFilePricingSource
Pricing source from local JSON file.
This is the fastest and most reliable source as it doesn’t require network calls and works offline.
Attributes:
- file_path: Path to JSON pricing file.
- cache: In-memory cache of loaded pricing.
Example
source = JSONFilePricingSource(Path("custom_pricing.json"))
pricing = await source.get_pricing("gpt-4-turbo")
Initialize JSON file pricing source.
| Parameter | Type | Description |
|---|---|---|
| `file_path` | Path | Path to JSON file containing pricing data. |
async def get_pricing(model: str) -> ModelPricing | None
Get pricing for a specific model.
| Parameter | Type | Description |
|---|---|---|
| `model` | str | Model identifier. |
| Type | Description |
|---|---|
| ModelPricing | None | ModelPricing if found, None otherwise. |
async def get_all_pricing() -> dict[str, ModelPricing]
Get all pricing data.
| Type | Description |
|---|---|
| dict[str, ModelPricing] | All pricing data from JSON file. |
Get source name.
Clear cached pricing data to force reload.
LLMCache
In-memory cache for LLM responses with TTL.
Implements LRU eviction when max_size is reached.
| Parameter | Type | Description |
|---|---|---|
| `ttl` | float | Time-to-live in seconds (default: 1 hour). |
| `max_size` | int | Maximum number of entries (default: 1000). |
| `max_size_bytes` | int | Maximum cache size in bytes (default: 100MB). |
Example
cache = LLMCache(ttl=3600, max_size=500)
result = await cache.get("key")
await cache.set("key", "value")
Initialize LLM cache.
| Parameter | Type | Description |
|---|---|---|
| `ttl` | float | Time-to-live in seconds. |
| `max_size` | int | Maximum number of entries. |
| `max_size_bytes` | int | Maximum total size in bytes. |
Get value from cache.
| Parameter | Type | Description |
|---|---|---|
| `key` | str | dict[str, Any] | Cache key (string or dict). |
| Type | Description |
|---|---|
| Any | None | Cached value or None if not found/expired. |
Set value in cache.
| Parameter | Type | Description |
|---|---|---|
| `key` | str | dict[str, Any] | Cache key (string or dict). |
| `value` | Any | Value to cache. |
| `ttl` | float | None | Optional TTL override. |
async def get_or_compute( key: str | dict[str, Any], compute_fn: Callable[[], Any], ttl: float | None = None ) -> Any
Get from cache or compute and cache result.
| Parameter | Type | Description |
|---|---|---|
| `key` | str | dict[str, Any] | Cache key. |
| `compute_fn` | Callable[[], Any] | Function to compute value if cache miss. |
| `ttl` | float | None | Optional TTL override. |
| Type | Description |
|---|---|
| Any | Cached or computed value. |
Example
result = await cache.get_or_compute(
    key="greeting",
    compute_fn=lambda: llm.complete("Say hello")
)
Delete entry from cache.
| Parameter | Type | Description |
|---|---|---|
| `key` | str | dict[str, Any] | Cache key to delete. |
| Type | Description |
|---|---|
| bool | True if entry was deleted. |
Clear all cache entries.
Get cache statistics.
| Type | Description |
|---|---|
| CacheStats | CacheStats object. |
LLMCompletionEvent
Emitted when an LLM completion is received.
Distinct from LLMCallStartedHook (which intercepts); this is the immutable record that a completion happened.
Consumed by: cost accounting, audit, safety review.
LLMConfig
Root configuration object for the LLM routing system.
All providers are opt-in: a provider joins the cascade only when its
credential environment variable is set. Use from_env to build
from LEX_AI_LLM__ environment variables.
Example
config = LLMConfig(
    providers=[
        ProviderConfig(name="groq", model="llama-3.3-70b-versatile", api_key="gsk_..."),
        ProviderConfig(name="gemini", model="gemini-2.5-flash", api_key="AIza..."),
    ],
    defaults=GenerationDefaults(temperature=0.3),
)

Environment variables (prefix LEX_AI_LLM__)
Global:
LEX_AI_LLM__STRATEGY               sequential | parallel_race | cost_optimized | latency_optimized
LEX_AI_LLM__DEFAULTS__TEMPERATURE  float (default 0.2)
LEX_AI_LLM__DEFAULTS__MAX_TOKENS   int (default: provider default)
LEX_AI_LLM__QUOTA__BACKEND         memory | database (default memory)
LEX_AI_LLM__LOG__BACKEND           memory | database (default memory)
LEX_AI_LLM__LOG__MAX_ENTRIES       int (default 1000)
Per-provider (pattern: LEX_AI_LLM__PROVIDERS__{NAME}__{FIELD}):
__{NAME}__API_KEY   str   API key -- activates key-auth providers
__{NAME}__BASE_URL  str   Endpoint -- activates local/custom providers
__{NAME}__MODEL     str   Model override (has per-provider defaults)
__{NAME}__TIMEOUT   int   Request timeout in seconds (default 30)
__{NAME}__ENABLED   bool  Explicit enable/disable (default true)
Supported provider names and their activation:
OPENAI      API_KEY required   default model: gpt-4o
ANTHROPIC   API_KEY required   default model: claude-3-5-sonnet-20241022
GROQ        API_KEY required   default model: llama-3.3-70b-versatile
GEMINI      API_KEY required   default model: gemini-2.5-flash
MISTRAL     API_KEY required   default model: mistral-large-latest
COHERE      API_KEY required   default model: command-r-plus
OPENROUTER  API_KEY required   default model: openai/gpt-4o-mini
DEEPSEEK    API_KEY required   default model: deepseek-chat
TOGETHER    API_KEY required   default model: meta-llama/Llama-3-8b-chat-hf
FIREWORKS   API_KEY required   default model: accounts/fireworks/models/llama-v3-70b-instruct
OLLAMA      BASE_URL required  default model: llama3.2 (default base: http://localhost:11434)
LOCAL       BASE_URL + MODEL required (generic OpenAI-compatible: LM Studio, VLLM, etc.)
Azure-specific extras (activated by AZURE__API_KEY + AZURE__BASE_URL):
LEX_AI_LLM__PROVIDERS__AZURE__EXTRAS__AZURE_RESOURCE
LEX_AI_LLM__PROVIDERS__AZURE__EXTRAS__AZURE_DEPLOYMENT
LEX_AI_LLM__PROVIDERS__AZURE__EXTRAS__AZURE_API_VERSION
Cloudflare-specific extras (activated by CLOUDFLARE__EXTRAS__CF_ACCOUNT_ID):
LEX_AI_LLM__PROVIDERS__CLOUDFLARE__EXTRAS__CF_ACCOUNT_ID  <- activates
LEX_AI_LLM__PROVIDERS__CLOUDFLARE__EXTRAS__CF_API_TOKEN
LEX_AI_LLM__PROVIDERS__CLOUDFLARE__MODEL
AWS Bedrock extras (activated by BEDROCK__EXTRAS__AWS_REGION):
LEX_AI_LLM__PROVIDERS__BEDROCK__EXTRAS__AWS_REGION  <- activates
LEX_AI_LLM__PROVIDERS__BEDROCK__MODEL
LEX_AI_LLM__PROVIDERS__BEDROCK__EXTRAS__AWS_ACCESS_KEY_ID
LEX_AI_LLM__PROVIDERS__BEDROCK__EXTRAS__AWS_SECRET_ACCESS_KEY
Google Vertex AI extras (activated by VERTEX__EXTRAS__VERTEX_PROJECT):
LEX_AI_LLM__PROVIDERS__VERTEX__EXTRAS__VERTEX_PROJECT  <- activates
LEX_AI_LLM__PROVIDERS__VERTEX__EXTRAS__VERTEX_LOCATION
LEX_AI_LLM__PROVIDERS__VERTEX__MODEL
LEX_AI_LLM__PROVIDERS__VERTEX__EXTRAS__VERTEX_CREDENTIALS_FILE

def from_env(cls) -> LLMConfig
Build a routing config from LEX_AI_LLM__ environment variables.
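A hedged sketch of environment-driven configuration using only variables from the tables above; the values are placeholders set in-process purely for illustration:

```python
import os

# Opt two providers into the cascade by setting their activation variables.
os.environ["LEX_AI_LLM__PROVIDERS__GROQ__API_KEY"] = "gsk_placeholder"
os.environ["LEX_AI_LLM__PROVIDERS__OLLAMA__BASE_URL"] = "http://localhost:11434"
os.environ["LEX_AI_LLM__STRATEGY"] = "sequential"
os.environ["LEX_AI_LLM__DEFAULTS__TEMPERATURE"] = "0.3"

config = LLMConfig.from_env()
```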
LLMModule
LLM client and model-management integration.
Call configure to register an LLMClientProtocol implementation and optional model manager for injection.
Usage
from lexigram.ai.llm.config import ClientConfig
@module(
    imports=[
        LLMModule.configure(
            ClientConfig(provider="openai", model="gpt-4o")
        )
    ]
)
class AppModule(Module):
    pass

Multi-provider routing
from lexigram.ai.llm import LLMModule
@module(
    imports=[LLMModule.configure(routing=LLMConfig())]
)
class AppModule(Module):
    pass

def configure( cls, config: ClientConfig | Any | None = None, *, routing: LLMConfig | Any | None = None, enable_model_manager: bool = False, enable_streaming: bool = True ) -> DynamicModule
Create an LLMModule with a single configured provider.
| Parameter | Type | Description |
|---|---|---|
| `config` | ClientConfig | Any | None | ClientConfig or ``None`` to read configuration from environment variables. |
| `routing` | LLMConfig | Any | None | Optional LLMConfig enabling the multi-provider routing layer instead of the single-provider client. |
| `enable_model_manager` | bool | Register LLMModelManager for local model lifecycle control. |
| `enable_streaming` | bool | Enable streaming response support. Defaults to ``True``; set to ``False`` to restrict to non-streaming clients only. |
| Type | Description |
|---|---|
| DynamicModule | A DynamicModule descriptor. |
def stub( cls, config: ClientConfig | Any | None = None ) -> DynamicModule
Create an LLMModule suitable for unit and integration testing.
Uses a no-op or stub LLM client with minimal external dependencies. Streaming is disabled by default to simplify test assertions.
| Parameter | Type | Description |
|---|---|---|
| `config` | ClientConfig | Any | None | Optional ClientConfig override. Uses safe test defaults when ``None``. |
| Type | Description |
|---|---|
| DynamicModule | A DynamicModule descriptor. |
LLMProvider
Provider that registers LLM services with the Lexigram DI container.
Registers an LLMClientProtocol, optional LLM response cache, and an LLMModelManager so all three are injectable throughout the application.
Example
from lexigram.ai.llm.di.provider import LLMProvider from lexigram.ai.llm.config import ClientConfig
app.use(LLMProvider(ClientConfig(provider="openai", model="gpt-4o")))
LLMClientProtocol is now injectable:
class MyService:
    def __init__(self, llm: LLMClientProtocol) -> None:
        self.llm = llm
def __init__( config: ClientConfig | None = None, enable_model_manager: bool = False, enable_streaming: bool = True, name: str = 'llm', cache_backend: CacheBackendProtocol | None = None ) -> None
Initialize the LLM Provider.
| Parameter | Type | Description |
|---|---|---|
| `config` | ClientConfig | None | LLM client configuration; defaults to ClientConfig() (reads env). |
| `enable_model_manager` | bool | Register LLMModelManager for local model control. |
| `enable_streaming` | bool | Enable streaming response support. |
| `name` | str | Provider name used for identification. |
| `cache_backend` | CacheBackendProtocol | None | Injected cache backend for optional response caching. |
async def register(container: ContainerRegistrarProtocol) -> None
Register LLM services with the DI container.
| Parameter | Type | Description |
|---|---|---|
| `container` | ContainerRegistrarProtocol | The Lexigram DI container registrar. |
async def boot(container: ContainerResolverProtocol) -> None
Boot the LLM provider — validates API key presence and format.
| Parameter | Type | Description |
|---|---|---|
| `container` | ContainerResolverProtocol | The DI container resolver. |
Close client connections on application shutdown.
Return basic health information for the registered LLM client.
LLMProviderRegisteredHook
Payload fired when an LLM provider is registered in the provider registry.
Attributes:
- provider: Identifier of the provider that was registered.
LLMRequestSentHook
Payload fired when an LLM request is dispatched to a provider.
Attributes:
provider: Provider identifier (e.g. "openai").
model: Model name targeted by the request (e.g. "gpt-4o").
LLMResponseReceivedHook
Payload fired when a complete LLM response is received from a provider.
Attributes:
- provider: Provider identifier that returned the response.
- model: Model name that produced the response.
LLMRoutingProvider
Provider that registers the multi-provider LLM router with the DI container.
Builds the LLMRouter from an LLMConfig, chooses the appropriate quota backend and inference logger, and registers everything as singletons.
Example
from lexigram.ai.llm.module import LLMModule from lexigram.ai.llm.routing import LLMConfig
app.use(LLMModule.configure(routing=LLMConfig.from_env()))
LLMRouterProtocol is now injectable:
class MyService:
    def __init__(self, router: LLMRouterProtocol) -> None:
        self.router = router
def __init__( config: LLMConfig | None = None, database_provider: DatabaseProviderProtocol | None = None, model_selector: ModelSelector | None = None ) -> None
Initialize the LLM routing provider.
| Parameter | Type | Description |
|---|---|---|
| `config` | LLMConfig | None | Routing configuration; defaults to ``LLMConfig.from_env()``. |
| `database_provider` | DatabaseProviderProtocol | None | Injected DB provider used when ``quota.backend`` or ``logging.backend`` is ``database``. |
| `model_selector` | ModelSelector | None | Optional model selector for capability-based routing. When provided, ``required_capabilities`` in route kwargs will filter providers whose models lack the requested capabilities. |
async def register(container: ContainerRegistrarProtocol) -> None
Build and register the LLMRouter with the DI container.
| Parameter | Type | Description |
|---|---|---|
| `container` | ContainerRegistrarProtocol | The Lexigram DI container registrar. |
async def boot(container: ContainerResolverProtocol) -> None
Boot phase — no-op for this provider.
| Parameter | Type | Description |
|---|---|---|
| `container` | ContainerResolverProtocol | The DI container resolver. |
Close all routing clients on application shutdown.
Return basic health information for the router.
| Parameter | Type | Description |
|---|---|---|
| `timeout` | float | Unused; retained for interface compatibility. |
| Type | Description |
|---|---|
| dict[str, Any] | A dict with ``status`` and ``providers`` keys. |
LogConfig
Configuration for inference attempt logging.
Example
cfg = LogConfig(backend="database", max_entries=5000)
MistralClient
Client for Mistral AI's LLM API.
Conforms to: LLMClientProtocol protocol via structural typing.
Supports Chat, Stream, and Embeddings with:
- High-performance European LLMs
- GDPR compliance and data sovereignty
- Function calling and JSON mode
def __init__(config: ClientConfig)
Initialize Mistral client.
| Parameter | Type | Description |
|---|---|---|
| `config` | ClientConfig | LLM configuration |
Get API key from config.
Get base URL from config.
Perform a lightweight health check against the Mistral API.
Calls the models endpoint to verify the API key is valid and the service is reachable.
| Parameter | Type | Description |
|---|---|---|
| `timeout` | float | Maximum seconds to wait for the response. |
| Type | Description |
|---|---|
| HealthCheckResult | HealthCheckResult. |
async def embed( model: str = 'mistral-embed', input_texts: list[str] | str | None = None, **kwargs ) -> list[list[float]]
Generate embeddings.
| Parameter | Type | Description |
|---|---|---|
| `model` | str | Model ID (default: "mistral-embed"). |
| `input_texts` | list[str] | str | None | Text or list of texts to embed. **kwargs: Additional parameters. |
| Type | Description |
|---|---|
| list[list[float]] | List of embedding vectors. |
Example
embeddings = await client.embed(
    input_texts=["Hello world", "Bonjour monde"]
)
print(f"Embedding dimension: {len(embeddings[0])}")
Close the HTTP client.
Example
await client.close()
MistralCounter
Token counter using mistral-common tokenizer (lazy-loaded).
Tokenizer is loaded on first use, not at construction time.
Initialize MistralCounter.
Backend identifier.
Count tokens in a text string.
def count_messages(messages: list[ChatMessage]) -> int
Count tokens in a list of chat messages.
ModelCapabilities
Model capabilities and constraints.
ModelPricing
Pricing information for a specific LLM model.
Attributes:
- model: Model identifier (e.g., "gpt-4-turbo", "claude-3-opus").
- prompt_per_1m: Cost per 1 million prompt tokens in USD.
- completion_per_1m: Cost per 1 million completion tokens in USD.
- provider: Provider name (e.g., "openai", "anthropic").
- last_updated: When pricing was last updated.
- source: Where pricing data came from (e.g., "json", "api", "static").
Example
pricing = ModelPricing(
    model="gpt-4-turbo",
    prompt_per_1m=10.00,
    completion_per_1m=30.00,
    provider="openai"
)
print(f"${pricing.prompt_per_1m} per 1M prompt tokens")
ModelSelector
Intelligent model selector with fallback support.
Automatically selects the best model based on prompt characteristics and provides fallback chains for reliability.
Example
selector = ModelSelector(
    default_model="gpt-3.5-turbo",
    strategies=[
        SelectionStrategy(
            name="complex",
            model="gpt-4-turbo",
            conditions={"min_tokens": 1000}
        ),
        SelectionStrategy(
            name="simple",
            model="claude-3-haiku-20240307",
            conditions={"max_tokens": 500}
        )
    ],
    fallback_chain=["gpt-4-turbo", "gpt-3.5-turbo"]
)
# Select model for a prompt
model = selector.select("Long prompt here...")
print(model)  # 'gpt-4-turbo'

# Get next fallback on error
fallback = selector.get_fallback("gpt-4-turbo")
print(fallback)  # 'gpt-3.5-turbo'
def __init__( default_model: str | None = None, strategies: list[SelectionStrategy] | None = None, fallback_chain: list[str] | None = None, model_capabilities: dict[str, ModelCapabilities] | None = None, token_counter: TokenCounterProtocol | None = None )
Initialize model selector.
| Parameter | Type | Description |
|---|---|---|
| `default_model` | str | None | Default model to use |
| `strategies` | list[SelectionStrategy] | None | List of selection strategies |
| `fallback_chain` | list[str] | None | Ordered list of fallback models |
| `model_capabilities` | dict[str, ModelCapabilities] | None | Custom model capabilities |
| `token_counter` | TokenCounterProtocol | None | Token counter for prompt analysis |
Example
selector = ModelSelector(
    default_model="gpt-3.5-turbo",
    fallback_chain=["gpt-4", "claude-3-sonnet-20240229"]
)
def select( prompt: str, context: dict[str, Any] | None = None, required_capabilities: list[str] | None = None ) -> str
Select the best model for the given prompt.
| Parameter | Type | Description |
|---|---|---|
| `prompt` | str | The prompt text |
| `context` | dict[str, Any] | None | Additional context for selection |
| `required_capabilities` | list[str] | None | Required capabilities (e.g., ["supports_functions"]) |
| Type | Description |
|---|---|
| str | Selected model name |
Example
model = selector.select(
    "Analyze this image...",
    required_capabilities=["supports_vision"]
)
print(model)  # 'gpt-4-turbo'
Get the next model in the fallback chain.
| Parameter | Type | Description |
|---|---|---|
| `failed_model` | str | The model that failed |
| Type | Description |
|---|---|
| str | None | Next fallback model, or None if no fallback available |
Example
fallback = selector.get_fallback("gpt-4-turbo")
print(fallback)  # 'gpt-3.5-turbo'
def get_capabilities(model: str) -> ModelCapabilities | None
Get capabilities for a model.
| Parameter | Type | Description |
|---|---|---|
| `model` | str | Model name |
| Type | Description |
|---|---|
| ModelCapabilities | None | Model capabilities or None if unknown |
Example
caps = selector.get_capabilities("gpt-4-turbo")
print(caps.max_tokens)  # 128000
Estimate cost for a model call.
| Parameter | Type | Description |
|---|---|---|
| `model` | str | Model name |
| `input_tokens` | int | Number of input tokens |
| `output_tokens` | int | Number of output tokens |
| Type | Description |
|---|---|
| float | Estimated cost in USD |
Example
cost = selector.estimate_cost("gpt-4-turbo", 1000, 500)
print(f"${cost:.4f}")  # $0.0250
OllamaClient
Ollama LLM client for local models.
Conforms to: LLMClientProtocol protocol via structural typing.
Supports running LLMs locally with Ollama:
- Llama 3, Mistral, Phi, and other open models
- Streaming responses
- Zero API costs
- Full data privacy
Example
from lexigram.ai import ClientConfig

config = ClientConfig(
    provider="ollama",
    model="llama3:8b",
    api_base="http://localhost:11434"
)
client = OllamaClient(config)
completion = await client.complete([
    ChatMessage(role="user", content="Hello!")
])
def __init__(config: ClientConfig)
Initialize Ollama client.
| Parameter | Type | Description |
|---|---|---|
| `config` | ClientConfig | LLM configuration |
| Exception | Description |
|---|---|
| ImportError | If ollama package is not installed |
Perform a lightweight health check against the Ollama daemon.
Calls list() to verify the daemon is running and reachable.
| Parameter | Type | Description |
|---|---|---|
| `timeout` | float | Maximum seconds to wait for the response. |
| Type | Description |
|---|---|
| HealthCheckResult | HealthCheckResult. |
Close Ollama client.
OpenAIClient
OpenAI LLM client implementation.
Conforms to: LLMClientProtocol protocol via structural typing.
Supports GPT-4, GPT-3.5-Turbo, and other OpenAI models with:
- Streaming responses
- Function/tool calling
- Vision models
- Automatic retry with exponential backoff
- Error handling and rate limit management
Example
from lexigram.ai import ClientConfig

config = ClientConfig(provider="openai", model="gpt-4-turbo")
client = OpenAIClient(config)
completion = await client.complete([
    ChatMessage(role="user", content="Hello!")
])
def __init__(config: ClientConfig)
Initialize OpenAI client.
| Parameter | Type | Description |
|---|---|---|
| `config` | ClientConfig | LLM configuration |
| Exception | Description |
|---|---|
| ImportError | If openai package is not installed |
Close the OpenAI client and cleanup resources.
Perform health check.
| Type | Description |
|---|---|
| HealthCheckResult | Structured health check result. |
OpenRouterClient
Client for OpenRouter (OpenAI-compatible) API.
Conforms to: LLMClientProtocol protocol via structural typing.
def __init__(config: ClientConfig)
Initialize OpenRouter client.
| Parameter | Type | Description |
|---|---|---|
| `config` | ClientConfig | LLM configuration |
Get API key from config.
Get base URL from config.
Get default model from config.
Perform a lightweight health check against the OpenRouter API.
Calls the models listing endpoint to verify the API key is valid and the service is reachable.
| Parameter | Type | Description |
|---|---|---|
| `timeout` | float | Maximum seconds to wait for the response. |
| Type | Description |
|---|---|
| HealthCheckResult | HealthCheckResult. |
OutputFilter
Filter LLM output for sensitive information.
Prevents leaking of system prompts, internal data, etc.
Filter LLM output for leaks.
| Parameter | Type | Description |
|---|---|---|
| `output` | str | LLM output |
| `system_prompt` | str | System prompt (check if leaked) |
| Type | Description |
|---|---|
| str | Filtered output |
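A hedged usage sketch; the filtering method's name is not given on this page, so `filter` below is an assumption, with parameters mirroring the table above:

```python
output_filter = OutputFilter()
raw_output = "Sure! For context, my instructions say: You are a helpful assistant."

# `filter` is an assumed method name; `output` and `system_prompt` come from the table above.
safe_output = output_filter.filter(
    output=raw_output,
    system_prompt="You are a helpful assistant.",
)
```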
PricingManager
Manages pricing data from multiple sources with caching.
Sources are queried in order until pricing is found. Typical hierarchy:
- JSON file (fastest, most reliable)
- API endpoints (for updates)
- Static fallback (hardcoded)
Attributes:
- sources: List of pricing sources in priority order.
- cache: Pricing cache instance.
- enable_fuzzy_match: Whether to enable fuzzy model name matching.
Example
# Use defaults
manager = PricingManager.from_defaults()

# Custom configuration
manager = (
    PricingManager.builder()
    .add_json_source("pricing.json")
    .add_api_source("https://api.example.com/pricing")
    .with_cache_ttl(3600)
    .enable_fuzzy_matching()
    .build()
)

pricing = await manager.get_pricing("gpt-4-turbo")
def __init__( sources: Sequence[AbstractPricingSource], cache_ttl: int = 86400, enable_fuzzy_match: bool = True )
Initialize pricing manager.
| Parameter | Type | Description |
|---|---|---|
| `sources` | Sequence[AbstractPricingSource] | List of pricing sources in priority order. |
| `cache_ttl` | int | Cache TTL in seconds (default: 24 hours). |
| `enable_fuzzy_match` | bool | Enable fuzzy model name matching (default: True). |
async def get_pricing( model: str, force_refresh: bool = False ) -> ModelPricing
Get pricing for a specific model.
Queries sources in order:
- Cache (if not force_refresh)
- Each source in priority order
- Fuzzy match if enabled
- Default fallback
| Parameter | Type | Description |
|---|---|---|
| `model` | str | Model identifier (e.g., "gpt-4-turbo"). |
| `force_refresh` | bool | Bypass cache and fetch fresh data. |
| Type | Description |
|---|---|
| ModelPricing | ModelPricing for the model. |
| Exception | Description |
|---|---|
| ValueError | If model not found in any source. |
List all available models.
| Parameter | Type | Description |
|---|---|---|
| `provider` | str | None | Filter by provider (optional). |
| Type | Description |
|---|---|
| list[str] | List of model names. |
Clear pricing cache.
def from_defaults(cls) -> PricingManager
Create manager with default configuration.
Uses LiteLLM API for dynamic, up-to-date pricing data. No static pricing files - always fetches current data.
| Type | Description |
|---|---|
| PricingManager | PricingManager with API source. |
Example
manager = PricingManager.from_defaults()
pricing = await manager.get_pricing("gpt-4")
def from_json( cls, file_path: str | Path, cache_ttl: int = 86400 ) -> PricingManager
Create manager from JSON file only.
Useful for offline applications or when you want full control over pricing data.
| Parameter | Type | Description |
|---|---|---|
| `file_path` | str | Path | Path to JSON pricing file. |
| `cache_ttl` | int | Cache TTL in seconds (default: 24 hours). |
| Type | Description |
|---|---|
| PricingManager | PricingManager with JSON source only. |
Example
manager = PricingManager.from_json("my_pricing.json")
pricing = await manager.get_pricing("custom-model")
def from_api( cls, endpoint: str, cache_ttl: int = 86400 ) -> PricingManager
Create manager from API endpoint only.
| Parameter | Type | Description |
|---|---|---|
| `endpoint` | str | API endpoint URL. |
| `cache_ttl` | int | Cache TTL in seconds (default: 24 hours). |
| Type | Description |
|---|---|
| PricingManager | PricingManager with API source only. |
Example
manager = PricingManager.from_api("https://api.example.com/pricing")
pricing = await manager.get_pricing("gpt-4")
def builder(cls) -> PricingManagerBuilder
Create a builder for custom configuration.
| Type | Description |
|---|---|
| PricingManagerBuilder | PricingManagerBuilder instance. |
Example
manager = (
    PricingManager.builder()
    .add_json_source("custom.json")
    .add_api_source("https://api.example.com")
    .with_cache_ttl(3600)
    .build()
)
PricingManagerBuilder
Builder for PricingManager with validation.
Provides a fluent API for configuring pricing sources safely.
Example
manager = (
    PricingManager.builder()
    .add_json_source("pricing.json")
    .add_api_source("https://api.example.com/pricing")
    .add_fallback({"custom-model": ModelPricing(...)})
    .with_cache_ttl(3600)
    .enable_fuzzy_matching()
    .build()
)
Initialize builder.
def add_json_source(file_path: str | Path) -> PricingManagerBuilder
Add JSON file pricing source.
| Parameter | Type | Description |
|---|---|---|
| `file_path` | str | Path | Path to JSON file. |
| Type | Description |
|---|---|
| PricingManagerBuilder | Self for chaining. |
def add_api_source( endpoint: str, timeout: float = 10.0 ) -> PricingManagerBuilder
Add API endpoint pricing source.
| Parameter | Type | Description |
|---|---|---|
| `endpoint` | str | API endpoint URL. |
| `timeout` | float | Request timeout in seconds (default: 10). |
| Type | Description |
|---|---|
| PricingManagerBuilder | Self for chaining. |
def add_fallback(pricing_map: dict[str, ModelPricing]) -> PricingManagerBuilder
Add static fallback pricing.
| Parameter | Type | Description |
|---|---|---|
| `pricing_map` | dict[str, ModelPricing] | Dictionary of model to pricing. |
| Type | Description |
|---|---|
| PricingManagerBuilder | Self for chaining. |
def add_source(source: AbstractPricingSource) -> PricingManagerBuilder
Add custom pricing source.
| Parameter | Type | Description |
|---|---|---|
| `source` | AbstractPricingSource | Custom AbstractPricingSource implementation. |
| Type | Description |
|---|---|
| PricingManagerBuilder | Self for chaining. |
def with_cache_ttl(seconds: int) -> PricingManagerBuilder
Set cache TTL.
| Parameter | Type | Description |
|---|---|---|
| `seconds` | int | Cache TTL in seconds. |
| Type | Description |
|---|---|
| PricingManagerBuilder | Self for chaining. |
| Exception | Description |
|---|---|
| ValueError | If seconds is negative. |
def enable_fuzzy_matching(enabled: bool = True) -> PricingManagerBuilder
Enable or disable fuzzy model name matching.
| Parameter | Type | Description |
|---|---|---|
| `enabled` | bool | Whether to enable fuzzy matching (default: True). |
| Type | Description |
|---|---|
| PricingManagerBuilder | Self for chaining. |
def build() -> PricingManager
Build PricingManager instance.
| Type | Description |
|---|---|
| PricingManager | Configured PricingManager. |
| Exception | Description |
|---|---|
| ValueError | If no sources were added. |
ProviderConfig
Configuration for a single provider in the routing cascade.
Every provider in the cascade has the same shape regardless of type.
Provider-specific fields (Azure deployment, Cloudflare account ID,
Bedrock region, Vertex project) go in extras.
Example
cfg = ProviderConfig(
    name="groq",
    model="llama-3.3-70b-versatile",
    api_key="gsk_...",
)
ProviderInfo
Information about an LLM provider.
Attributes:
- name: Provider identifier (e.g., "openai", "anthropic").
- client_class: LLMClientProtocol implementation class.
- default_models: List of default/recommended models.
- supports_streaming: Whether streaming is supported.
- supports_tools: Whether function/tool calling is supported.
- supports_vision: Whether vision/image inputs are supported.
- base_url: Default base URL for API (optional).
- docs_url: Documentation URL (optional).
- pricing_url: Pricing page URL (optional).
- description: Human-readable description.
ProviderRegistry
Registry for LLM providers.
Singleton registry that maintains information about all available LLM providers, both built-in and custom.
Initialize provider registry.
def register( name: str, client_class: type[object], default_models: list[str] | None = None, supports_streaming: bool = True, supports_tools: bool = False, supports_vision: bool = False, base_url: str | None = None, docs_url: str | None = None, pricing_url: str | None = None, description: str = '' ) -> ProviderInfo
Register a new LLM provider.
def get_provider(name: str) -> ProviderInfo
Get provider information.
List all registered provider names.
def search_providers( supports_streaming: bool | None = None, supports_tools: bool | None = None, supports_vision: bool | None = None ) -> list[ProviderInfo]
Search providers by capabilities.
Unregister a provider.
async def register_provider( name: str, client: LLMClientProtocol, models: list[ModelInfo] ) -> None
Register a provider following the ProviderRegistryProtocol.
async def get_client(provider: str) -> LLMClientProtocol | None
Get an initialized client for a provider.
List all models matching capabilities.
Get information about a specific model.
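A hedged sketch of capability-based lookup using the methods listed above; constructor arguments, if any, are not shown on this page:

```python
registry = ProviderRegistry()

# Find registered providers that support both tool calling and vision.
for info in registry.search_providers(supports_tools=True, supports_vision=True):
    print(info.name, info.default_models)
```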
QuotaConfig
Configuration for the quota tracking backend.
Example
cfg = QuotaConfig(backend="database")
RateLimiter
Rate limiter for LLM requests (RPM and TPM).
Manages multiple buckets for different models and providers.
Initialize rate limiter.
async def check( provider: str, model: str, tpm_limit: int | None = None, rpm_limit: int | None = None, estimated_tokens: int = 0 ) -> bool
Check if request is allowed under current limits.
| Parameter | Type | Description |
|---|---|---|
| `provider` | str | AI provider name |
| `model` | str | Model name |
| `tpm_limit` | int | None | Tokens Per Minute limit |
| `rpm_limit` | int | None | Requests Per Minute limit |
| `estimated_tokens` | int | Estimated tokens in request |
| Type | Description |
|---|---|
| bool | True if allowed, False if blocked |
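A hedged usage sketch built from the check signature above; the limit values are illustrative, not recommended defaults:

```python
limiter = RateLimiter()

allowed = await limiter.check(
    provider="openai",
    model="gpt-4-turbo",
    tpm_limit=90_000,       # illustrative tokens-per-minute cap
    rpm_limit=60,           # illustrative requests-per-minute cap
    estimated_tokens=1_200,
)
if not allowed:
    raise RuntimeError("Rate limit exceeded; retry later")
```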
RedisLLMCache
Redis-backed cache for distributed deployments.
Requires redis package to be installed.
| Parameter | Type | Description |
|---|---|---|
| `redis_url` | str | Redis connection URL. |
| `ttl` | float | Time-to-live in seconds. |
| `key_prefix` | str | Prefix for all cache keys. |
Example
cache = RedisLLMCache(redis_url="redis://localhost:6379")
await cache.connect()
result = await cache.get("key")
def __init__( cache_backend: CacheBackendProtocol, ttl: float = 3600, key_prefix: str = 'llm_cache:' )
Initialize Redis cache.
| Parameter | Type | Description |
|---|---|---|
| `cache_backend` | CacheBackendProtocol | The platform's cache backend. |
| `ttl` | float | Time-to-live in seconds. |
| `key_prefix` | str | Prefix for cache keys. |
Compatibility method for lifecycle-managed cache.
Compatibility method for lifecycle-managed cache.
Get value from Redis cache.
| Parameter | Type | Description |
|---|---|---|
| `key` | str | dict[str, Any] | Cache key. |
| Type | Description |
|---|---|
| Any | None | Cached value or None. |
Set value in Redis cache.
| Parameter | Type | Description |
|---|---|---|
| `key` | str | dict[str, Any] | Cache key. |
| `value` | Any | Value to cache. |
| `ttl` | float | None | Optional TTL override. |
async def get_or_compute( key: str | dict[str, Any], compute_fn: Callable[[], Any], ttl: float | None = None ) -> Any
Get from cache or compute and cache result.
| Parameter | Type | Description |
|---|---|---|
| `key` | str | dict[str, Any] | Cache key. |
| `compute_fn` | Callable[[], Any] | Function to compute value if cache miss. |
| `ttl` | float | None | Optional TTL override. |
| Type | Description |
|---|---|
| Any | Cached or computed value. |
Delete entry from cache.
| Parameter | Type | Description |
|---|---|---|
| `key` | str | dict[str, Any] | Cache key. |
| Type | Description |
|---|---|
| bool | True if deleted. |
Clear all cache entries (Warning: clears entire backend if not namespaced).
Get cache statistics.
| Type | Description |
|---|---|
| CacheStats | CacheStats object. |
ResponseFormatter
Format and convert LLM responses to various types.
Example
formatter = ResponseFormatter()
completion = Completion(content="42", ...)
num = formatter.to_int(completion)
print(num)  # 42
def to_json(completion: Completion) -> JSON
Convert response to JSON.
| Parameter | Type | Description |
|---|---|---|
| `completion` | Completion | LLM completion |
| Type | Description |
|---|---|
| JSON | Parsed JSON |
Example
data = formatter.to_json(completion)
def to_string( completion: Completion, strip: bool = True ) -> str
Convert response to string.
| Parameter | Type | Description |
|---|---|---|
| `completion` | Completion | LLM completion |
| `strip` | bool | Whether to strip whitespace |
| Type | Description |
|---|---|
| str | Response string |
Example
text = formatter.to_string(completion)
def to_int(completion: Completion) -> int
Convert response to integer.
| Parameter | Type | Description |
|---|---|---|
| `completion` | Completion | LLM completion |
| Type | Description |
|---|---|
| int | Parsed integer |
| Exception | Description |
|---|---|
| ParseError | If conversion fails |
Example
num = formatter.to_int(completion)
def to_float(completion: Completion) -> float
Convert response to float.
| Parameter | Type | Description |
|---|---|---|
| `completion` | Completion | LLM completion |
| Type | Description |
|---|---|
| float | Parsed float |
| Exception | Description |
|---|---|
| ParseError | If conversion fails |
Example
num = formatter.to_float(completion)
def to_bool(completion: Completion) -> bool
Convert response to boolean.
| Parameter | Type | Description |
|---|---|---|
| `completion` | Completion | LLM completion |
| Type | Description |
|---|---|
| bool | Parsed boolean |
Example
result = formatter.to_bool(completion)
def to_list( completion: Completion, separator: str = '\n' ) -> list[str]
Convert response to list of strings.
| Parameter | Type | Description |
|---|---|---|
| `completion` | Completion | LLM completion |
| `separator` | str | String separator (default: newline) |
| Type | Description |
|---|---|
| list[str] | List of strings |
Example
items = formatter.to_list(completion)
Role
Concrete chat message role constants shared across AI packages.
SecureLLMClient
LLM client with injection protection and safety features.
def __init__( llm_provider: Annotated[LLMClientProtocol, Inject], system_prompt: str = 'You are a helpful assistant.', enable_output_filtering: bool = True, rate_limiter: Annotated[RateLimiter | None, Inject] = None, rpm_limit: int = 60 ) -> None
Initialize secure LLM client.
| Parameter | Type | Description |
|---|---|---|
| `llm_provider` | Annotated[LLMClientProtocol, Inject] | Underlying LLM provider (injected) |
| `system_prompt` | str | System prompt template |
| `enable_output_filtering` | bool | Enable output filtering |
async def chat( user_input: str, user_id: str, context: Sequence[dict[str, str]] | None = None, strict_validation: bool = True ) -> str
Send chat message with safety protections.
| Parameter | Type | Description |
|---|---|---|
| `user_input` | str | User message |
| `user_id` | str | User identifier (for rate limiting) |
| `context` | Sequence[dict[str, str]] | None | Previous conversation context |
| `strict_validation` | bool | Reject invalid input vs sanitize |
| Type | Description |
|---|---|
| str | LLM response |
| Exception | Description |
|---|---|
| ValueError | If input invalid (strict mode) |
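A hedged sketch of the chat flow, assuming an LLMClientProtocol instance is available as `llm` (normally injected by the DI container):

```python
client = SecureLLMClient(
    llm_provider=llm,
    system_prompt="You are a helpful assistant.",
)

try:
    reply = await client.chat(
        user_input="Summarize our refund policy.",
        user_id="user-123",
        strict_validation=True,
    )
except ValueError:
    reply = "Request rejected by input validation."
```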
Update system prompt.
| Parameter | Type | Description |
|---|---|---|
| `system_prompt` | str | New system prompt |
SecurePromptTemplate
Structured prompt template with injection protection.
Uses clear delimiters to separate system instructions from user input. Implements multi-layered injection detection.
Multi-layered injection detection.
| Parameter | Type | Description |
|---|---|---|
| `prompt` | str | Input to analyze |
| Type | Description |
|---|---|
| tuple[bool, list[str]] | Tuple of (is_malicious, reasons) |
Validate user input for injection attempts.
| Parameter | Type | Description |
|---|---|---|
| `user_input` | str | User input to validate |
| Type | Description |
|---|---|
| tuple[bool, str | None] | Tuple of (is_valid, error_message) |
Sanitize user input by removing dangerous patterns.
| Parameter | Type | Description |
|---|---|---|
| `user_input` | str | User input to sanitize |
| Type | Description |
|---|---|
| str | Sanitized input |
Format prompt with user input.
| Parameter | Type | Description |
|---|---|---|
| `user_input` | str | User input |
| `strict` | bool | If True, reject invalid input. If False, sanitize. |
| Type | Description |
|---|---|
| str | Formatted prompt |
| Exception | Description |
|---|---|
| ValueError | If input invalid and strict=True |
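A sketch of the strict-versus-sanitize flow. The formatting method's exact name is not shown in this reference, so `format` below is a hypothetical stand-in; create_data_extraction_template is documented further down:
template = create_data_extraction_template()
try:
    prompt = template.format(user_input=raw_input, strict=True)  # method name assumed
except ValueError:
    # strict mode rejected the input; fall back to sanitizing it instead
    prompt = template.format(user_input=raw_input, strict=False)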
SelectionCriteria
Section titled “SelectionCriteria”Criteria for model selection.
SelectionStrategy
Section titled “SelectionStrategy”Strategy for selecting models based on conditions.
Example
strategy = SelectionStrategy(
    name="long_context",
    model="gpt-4-turbo-preview",
    conditions={
        "min_tokens": 2000,
        "max_tokens": 100000
    }
)
Check if this strategy matches the given context.
| Parameter | Type | Description |
|---|---|---|
| `context` | dict[str, Any] | Context dictionary with prompt info |
| Type | Description |
|---|---|
| bool | True if all conditions are met |
Example
context = {"token_count": 2500, "has_code": True}
strategy.matches(context)  # True
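Reading the example above, the min_tokens/max_tokens conditions evidently bound the context's token_count; this sketch assumes that semantics:
strategy = SelectionStrategy(
    name="long_context",
    model="gpt-4-turbo-preview",
    conditions={"min_tokens": 2000, "max_tokens": 100000},
)
strategy.matches({"token_count": 2500})  # True: 2000 <= 2500 <= 100000
strategy.matches({"token_count": 500})   # False (assumed): below min_tokens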
StaticPricingSource
Section titled “StaticPricingSource”Pricing source from static dictionary.
Hardcoded pricing data as a fallback when other sources are unavailable. Useful for custom internal models or as ultimate fallback.
Attributes: pricing_map: Dictionary of model name to pricing.
Example
source = StaticPricingSource({
    "my-model": ModelPricing(
        model="my-model",
        prompt_per_1m=5.0,
        completion_per_1m=10.0,
        provider="custom"
    )
})
def __init__(pricing_map: dict[str, ModelPricing])
Initialize static pricing source.
| Parameter | Type | Description |
|---|---|---|
| `pricing_map` | dict[str, ModelPricing] | Dictionary mapping model names to pricing. |
async def get_pricing(model: str) -> ModelPricing | None
Get pricing for a specific model.
| Parameter | Type | Description |
|---|---|---|
| `model` | str | Model identifier. |
| Type | Description |
|---|---|
| ModelPricing | None | ModelPricing if found, None otherwise. |
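Continuing the constructor example above, lookups for registered models return their ModelPricing, while unknown models return None:
pricing = await source.get_pricing("my-model")   # ModelPricing(prompt_per_1m=5.0, ...)
missing = await source.get_pricing("unknown")    # None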
async def get_all_pricing() -> dict[str, ModelPricing]
Get all pricing data.
| Type | Description |
|---|---|
| dict[str, ModelPricing] | All static pricing data. |
Get source name.
StreamChunk
Section titled “StreamChunk”A chunk of streamed completion.
Implements streaming semantics with DomainModel for validation.
Example
chunk = StreamChunk(delta="Hello", model="gpt-4-turbo", finish_reason=None)
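A consumption sketch, assuming the streaming call ultimately yields StreamChunk values; the exact shape of the client-side streaming API is not shown in this reference:
async for chunk in client.stream_chat(messages):
    print(chunk.delta, end="", flush=True)
    if chunk.finish_reason is not None:
        break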
StructuredOutputParser
Section titled “StructuredOutputParser”Schema-aware parser that validates LLM responses against a model.
Wraps extract_json_block, validate_against_model, and build_json_schema into a convenient class-based API.
| Parameter | Type | Description |
|---|---|---|
| `output_model` | type | Model class for validation. |
| `strict` | bool | Whether to enforce strict validation (default ``True``). |
Initialize with model class.
Parse and validate a completion into an output_model instance.
| Parameter | Type | Description |
|---|---|---|
| `completion` | Any | Completion object with ``.content`` attribute, or a string. |
| Type | Description |
|---|---|
| Any | Validated model instance. |
| Exception | Description |
|---|---|
| ParseError | When JSON cannot be extracted. |
| SchemaValidationError | When validation fails. |
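A usage sketch; `Person` is a hypothetical model class, and `parse` is assumed as the method name from the description above:
parser = StructuredOutputParser(output_model=Person, strict=True)
try:
    person = parser.parse(completion)   # validated Person instance
except (ParseError, SchemaValidationError):
    ...  # handle unparseable or non-conforming output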
Parse and validate an array of output_model instances.
| Parameter | Type | Description |
|---|---|---|
| `completion` | Any | Completion object with ``.content`` attribute. |
| Type | Description |
|---|---|
| list[Any] | List of validated model instances. |
| Exception | Description |
|---|---|
| ParseError | When JSON is not an array. |
| SchemaValidationError | When validation fails. |
Return JSON Schema dict for the output model.
Return a human-readable schema prompt string.
TextPart
Section titled “TextPart”A plain-text content part in a multimodal message.
Attributes:
text: The text content.
type: Discriminator field, always "text".
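A minimal construction example; since type is a discriminator that is always "text", it is presumably defaulted and need not be passed:
part = TextPart(text="Describe the attached image.")
part.type  # "text"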
TiktokenCounter
Section titled “TiktokenCounter”Token counter using tiktoken (OpenAI/compatible models).
Implements TokenCounterProtocol using tiktoken for precise counting. tiktoken is a required dependency for this counter.
| Parameter | Type | Description |
|---|---|---|
| `model` | str | Model name (e.g. 'gpt-4', 'gpt-3.5-turbo'). |
| `encoding_name` | str | None | Optional tiktoken encoding name override. |
Initialize TiktokenCounter.
| Parameter | Type | Description |
|---|---|---|
| `model` | str | Model name for token counting. |
| `encoding_name` | str | None | Optional tiktoken encoding name override. |
| Exception | Description |
|---|---|
| ImportError | If tiktoken is not installed. |
The model this counter is calibrated for.
Count tokens in a text string.
def count_messages(messages: list[ChatMessage]) -> int
Count tokens in a list of chat messages, including overhead.
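A usage sketch; `count` is the string-counting method shown in the registry usage below, and ChatMessage construction follows the client examples elsewhere in this reference:
counter = TiktokenCounter(model="gpt-4")
tokens = counter.count("Hello, world!")            # precise tiktoken count
overhead = counter.count_messages([
    ChatMessage(role="user", content="Hello!"),    # includes per-message overhead
])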
TokenCount
Section titled “TokenCount”Token count result with metadata.
Attributes:
total: Total number of tokens.
prompt_tokens: Number of tokens in the prompt.
completion_tokens: Number of tokens in the completion (if applicable).
model: Model name used for counting.
timestamp: When the count was performed.
TokenCounterRegistry
Section titled “TokenCounterRegistry”Registry mapping model-name patterns to TokenCounterProtocol backends.
Uses named backend keys and regex patterns for flexible model mapping.
Usage
registry = TokenCounterRegistry.with_defaults()
counter = registry.for_model("gpt-4o")
tokens = counter.count("Hello!")
Create an empty registry.
def with_defaults(cls) -> TokenCounterRegistry
Create registry with all available tokenizer backends.
Registers:
- char_estimate (always available, fallback)
- tiktoken (if installed, for OpenAI/Anthropic models)
- huggingface (if installed, for HuggingFace models)
- mistral (if installed, for Mistral models)
| Type | Description |
|---|---|
| TokenCounterRegistry | TokenCounterRegistry pre-populated with default backends. |
def register( key: str, counter: TokenCounterProtocol ) -> None
Register a counter backend under a named key.
| Parameter | Type | Description |
|---|---|---|
| `key` | str | Backend name (e.g., 'tiktoken', 'huggingface', 'char_estimate'). |
| `counter` | TokenCounterProtocol | Counter implementing TokenCounterProtocol. |
Map a regex pattern of model names to a backend key.
| Parameter | Type | Description |
|---|---|---|
| `pattern` | str | Regex pattern matching model names (case-insensitive). |
| `counter_key` | str | Backend key (must be registered). |
def for_model(model: str) -> TokenCounterProtocol
Get the best counter for the given model name.
Tries a regex match against the registered patterns in _patterns first, then falls back to 'char_estimate'.
| Parameter | Type | Description |
|---|---|---|
| `model` | str | Model name. |
| Type | Description |
|---|---|
| TokenCounterProtocol | TokenCounterProtocol implementation. |
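A wiring sketch for a custom backend. The pattern-mapping method's name is not given in this reference, so `map_pattern` below is a hypothetical stand-in for the method described above:
registry = TokenCounterRegistry.with_defaults()
registry.register("tiktoken_gpt4", TiktokenCounter(model="gpt-4"))
registry.map_pattern(r"^gpt-4", "tiktoken_gpt4")   # method name hypothetical
counter = registry.for_model("gpt-4-turbo")        # regex hit routes to the tiktoken backend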
TokenUsage
Section titled “TokenUsage”Token usage statistics.
ToolCall
Section titled “ToolCall”Tool call request from LLM.
Functions
Section titled “Functions”complete_with_json
Section titled “complete_with_json”
async def complete_with_json( client: LLMClientProtocol, prompt: str, system_prompt: str | None = None, **kwargs: Any ) -> JSON
Complete and parse response as JSON.
| Parameter | Type | Description |
|---|---|---|
| `client` | LLMClientProtocol | LLM client |
| `prompt` | str | User prompt |
| `system_prompt` | str | None | Optional system prompt |
| `**kwargs` | Any | Additional completion arguments |
| Type | Description |
|---|---|
| JSON | Parsed JSON |
Example
data = await complete_with_json(
    client,
    "Generate a config with 3 fields"
)
complete_with_schema
Section titled “complete_with_schema”
async def complete_with_schema( client: LLMClientProtocol, prompt: str, schema: type[T], system_prompt: str | None = None, **kwargs: Any ) -> T
Complete with automatic schema parsing and validation.
| Parameter | Type | Description |
|---|---|---|
| `client` | LLMClientProtocol | LLM client |
| `prompt` | str | User prompt |
| `schema` | type[T] | Pydantic model for validation |
| `system_prompt` | str | None | Optional system prompt |
| `**kwargs` | Any | Additional completion arguments |
| Type | Description |
|---|---|
| T | Validated schema instance |
Example
from lexigram.ai.llm import OpenAIClient

client = OpenAIClient(api_key="sk-...")
person = await complete_with_schema(
    client,
    "Extract person from: John Doe, age 30",
    schema=Person
)
create_assistant_template
Section titled “create_assistant_template”
def create_assistant_template() -> SecurePromptTemplate
Create template for general assistant.
| Type | Description |
|---|---|
| SecurePromptTemplate | Configured template |
create_balanced_selector
Section titled “create_balanced_selector”
def create_balanced_selector() -> ModelSelector
Create a balanced model selector.
create_cost_optimized_selector
Section titled “create_cost_optimized_selector”
def create_cost_optimized_selector(budget_per_1k_tokens: float = 2.0) -> ModelSelector
Create a cost-optimized model selector.
create_data_extraction_template
Section titled “create_data_extraction_template”
def create_data_extraction_template() -> SecurePromptTemplate
Create template for data extraction (high security).
| Type | Description |
|---|---|
| SecurePromptTemplate | Configured template |
create_json_mode_messages
Section titled “create_json_mode_messages”
def create_json_mode_messages( prompt: str, schema: type[DomainModel] | None = None, system_prompt: str | None = None ) -> list[dict[str, str]]
Create messages for JSON mode with optional schema.
| Parameter | Type | Description |
|---|---|---|
| `prompt` | str | User prompt |
| `schema` | type[DomainModel] | None | Optional Pydantic model for schema |
| `system_prompt` | str | None | Optional system prompt (default: JSON instruction) |
| Type | Description |
|---|---|
| list[dict[str, str]] | Messages list for LLM |
Example
messages = create_json_mode_messages(
    "Extract person info",
    schema=Person
)
create_quality_optimized_selector
Section titled “create_quality_optimized_selector”
def create_quality_optimized_selector() -> ModelSelector
Create a quality-optimized model selector.
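All three selector factories return a ModelSelector, trading cost against quality; the budget unit follows the parameter name (per 1K tokens, presumably in dollars):
cheap = create_cost_optimized_selector(budget_per_1k_tokens=1.0)
balanced = create_balanced_selector()
best = create_quality_optimized_selector()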
create_token_counter
Section titled “create_token_counter”
def create_token_counter( model: str = 'gpt-3.5-turbo', encoding_name: str | None = None ) -> TiktokenCounter
Factory function for creating token counters.
| Parameter | Type | Description |
|---|---|---|
| `model` | str | Model name. |
| `encoding_name` | str | None | Optional encoding name override. |
| Type | Description |
|---|---|
| TiktokenCounter | TiktokenCounter instance. |
Example
from lexigram.ai.llm import create_token_counter
counter = create_token_counter("gpt-4")
count = counter.count("Hello!")
print(count)
normalize_thinking_text
Section titled “normalize_thinking_text”
Extract thinking text from raw LLM output.
Tries each pattern in THINKING_PATTERNS in order. Returns (clean_content, thinking_text_or_None): clean_content has the thinking block removed and is stripped; thinking_text is the raw thinking content (stripped), or None if not found.
Pattern matching is by substring presence of start_marker (and end_marker after it), NOT by model name. The bare-closing-tag pattern (an end_marker with no start_marker) matches only when the start marker is absent but the end marker IS present; this covers models that emit their thinking followed only by a closing tag (such as </think>) and then the response.
As a fallback: after removing a thinking block, if clean_content is empty but thinking text was found, the function tries to extract from the first { or [ in the original text to recover any JSON that may have been embedded.
| Parameter | Type | Description |
|---|---|---|
| `text` | str | Raw LLM response text, possibly containing inline thinking tags. |
| Type | Description |
|---|---|
| tuple[str, str | None] | A tuple of (clean_content, thinking_text_or_None). - clean_content: The response text with thinking stripped out, stripped of whitespace. - thinking_text_or_None: The thinking/reasoning text, or None if no thinking found. |
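An illustrative call; the concrete markers in THINKING_PATTERNS are not listed in this reference, so the <think> tag below is an assumption:
clean, thinking = normalize_thinking_text(
    '<think>Check the schema first.</think>\n{"name": "Ada"}'
)
# clean == '{"name": "Ada"}', thinking == 'Check the schema first.'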
Exceptions
Section titled “Exceptions”ExtractionError
Section titled “ExtractionError”Base class for structured extraction errors in lexigram-ai-llm.
ExtractionMaxRetriesError
Section titled “ExtractionMaxRetriesError”Error raised when extraction max retries are exhausted.
ExtractionParseError
Section titled “ExtractionParseError”Error raised when extraction response cannot be parsed as JSON.
ExtractionValidationError
Section titled “ExtractionValidationError”Error raised when parsed extraction response fails schema validation.
InvalidRequestError
Section titled “InvalidRequestError”Error raised when a request to an LLM provider is invalid.
LLMAuthenticationError
Section titled “LLMAuthenticationError”Invalid API key or credentials — infrastructure error, raised not wrapped.
Raised as an exception (NOT wrapped in Result).
Indicates a misconfiguration the application cannot route around.
LLMContentFilterError
Section titled “LLMContentFilterError”Content blocked by provider safety filter — recoverable via reformulation.
Returned as Err from LLMClientProtocol.complete() / stream_chat().
The caller should reformulate the prompt or inform the user.
LLMError
Section titled “LLMError”Base exception for all LLM-domain errors in lexigram-ai-llm.
LLMModelNotFoundError
Section titled “LLMModelNotFoundError”Model unavailable or not found — recoverable via fallback routing.
Returned as Err from LLMClientProtocol.complete() / stream_chat().
The caller should route to a different model or provider.
LLMQuotaExceededError
Section titled “LLMQuotaExceededError”API quota or billing limit exceeded — recoverable by routing elsewhere.
Returned as Err from LLMClientProtocol.complete() / stream_chat().
The caller should route the request to a different provider or account.
LLMRateLimitError
Section titled “LLMRateLimitError”Rate limit exceeded — recoverable via backoff/retry.
Returned as Err from LLMClientProtocol.complete() / stream_chat().
The caller should implement exponential backoff or route to another provider.
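A handling sketch for the Err-returning errors above. The Result accessors used here (is_err() / unwrap_err()) are assumptions, since the concrete Result API is not part of this reference:
import asyncio

result = await client.complete(messages)
if result.is_err():                      # accessor name assumed
    err = result.unwrap_err()            # accessor name assumed
    if isinstance(err, LLMRateLimitError):
        await asyncio.sleep(delay)       # exponential backoff, then retry
    elif isinstance(err, (LLMQuotaExceededError, LLMModelNotFoundError)):
        ...                              # route to a fallback provider or model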
ModelNotFoundError
Section titled “ModelNotFoundError”Model unavailable or not found — recoverable via fallback routing.
ParseError
Section titled “ParseError”Raised when response cannot be parsed.
ProviderConnectionError
Section titled “ProviderConnectionError”Error raised when connection to an LLM provider fails.
SchemaValidationError
Section titled “SchemaValidationError”Raised when parsed response fails validation.
StreamError
Section titled “StreamError”Error raised during LLM response streaming.
StructuredOutputError
Section titled “StructuredOutputError”Base exception for structured output errors.
TokenLimitError
Section titled “TokenLimitError”Error raised when the token limit for a request is exceeded.