API Reference
Protocols
Section titled “Protocols”LLMCacheProtocol
Section titled “LLMCacheProtocol”Protocol for LLM cache implementations.
Get value from cache.
| Parameter | Type | Description |
|---|---|---|
| `key` | str | dict[str, Any] | Cache key (string or structured dict). |
| Type | Description |
|---|---|
| Any | None | Cached value, or ``None`` if not present. |
Set value in cache.
| Parameter | Type | Description |
|---|---|---|
| `key` | str | dict[str, Any] | Cache key (string or structured dict). |
| `value` | Any | Value to store. |
| `ttl` | float | None | Optional time-to-live in seconds. |
Delete entry from cache.
| Parameter | Type | Description |
|---|---|---|
| `key` | str | dict[str, Any] | Cache key to remove. |
| Type | Description |
|---|---|
| bool | ``True`` if the key existed and was removed, ``False`` otherwise. |
Clear all entries.
Return cache statistics.
| Type | Description |
|---|---|
| dict[str, Any] | Mapping of statistic name to value. |
Classes
Section titled “Classes”APIPricingSource
Section titled “APIPricingSource”Pricing source from HTTP API endpoint.
Fetches pricing data from a remote API. Useful for getting the latest pricing updates, but requires network connectivity.
Attributes: endpoint: API endpoint URL. timeout: Request timeout in seconds.
Example
source = APIPricingSource("https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json")pricing = await source.get_pricing("gpt-4")Initialize API pricing source.
| Parameter | Type | Description |
|---|---|---|
| `endpoint` | str | URL to fetch pricing from. |
| `timeout` | float | Request timeout in seconds (default: 10). |
Get pricing for a specific model.
| Parameter | Type | Description |
|---|---|---|
| `model` | str | Model identifier. |
| Type | Description |
|---|---|
| ModelPricing | None | ModelPricing if found, None otherwise. |
Get all pricing data.
| Type | Description |
|---|---|
| dict[str, ModelPricing] | All pricing data from API. |
Get source name.
Clear cached pricing data to force refresh.
AbstractPricingSource
Section titled “AbstractPricingSource”Abstract base class for pricing data sources.
All pricing sources must implement get_pricing() to return ModelPricing for a given model name, or None if not found.
Get pricing for a specific model.
| Parameter | Type | Description |
|---|---|---|
| `model` | str | Model identifier (e.g., "gpt-4-turbo"). |
| Type | Description |
|---|---|
| ModelPricing | None | ModelPricing if found, None otherwise. |
Get all available pricing data.
| Type | Description |
|---|---|
| dict[str, ModelPricing] | Dictionary mapping model names to pricing. |
Get the name of this pricing source.
| Type | Description |
|---|---|
| str | Human-readable source name. |
AnthropicClient
Section titled “AnthropicClient”Anthropic Claude LLM client implementation.
Conforms to: LLMClientProtocol protocol via structural typing.
Supports Claude 3 (Opus, Sonnet, Haiku) models with:
- Streaming responses
- Tool calling
- Vision capabilities
- Automatic retry and error handling
Example
from lexigram.ai import ClientConfigconfig = ClientConfig(provider="anthropic", model="claude-3-sonnet-20240229")client = AnthropicClient(config)completion = await client.complete([ChatMessage(role="user", content="Hello!")])Initialize Anthropic client.
| Parameter | Type | Description |
|---|---|---|
| `config` | ClientConfig | LLM configuration |
| Exception | Description |
|---|---|
| ImportError | If anthropic package is not installed |
Close the Anthropic client.
Perform health check.
| Type | Description |
|---|---|
| HealthCheckResult | Structured health check result. |
CSVOutputParser
Section titled “CSVOutputParser”Parse LLM responses into lists of dictionaries (CSV format).
Extracts JSON array from the response and converts to list of dicts, where each dict represents a CSV row with column names as keys.
Example
parser = CSVOutputParser()result = parser.parse('[{"name": "John", "age": 30}, {"name": "Jane", "age": 25}]')assert len(result) == 2assert result[0]["name"] == "John"Parse text into a list of dictionaries.
| Parameter | Type | Description |
|---|---|---|
| `text` | str | Raw LLM response text that may contain JSON array. |
| Type | Description |
|---|---|
| list[dict[str, Any]] | List of dictionaries, each representing a CSV row. |
| Exception | Description |
|---|---|
| ParseError | When JSON cannot be extracted or is not an array. |
Parse raw CSV text (not JSON) into list of dictionaries.
| Parameter | Type | Description |
|---|---|---|
| `text` | str | Raw CSV text with header row. |
| Type | Description |
|---|---|
| list[dict[str, Any]] | List of dictionaries, each representing a CSV row. |
| Exception | Description |
|---|---|
| ParseError | When CSV cannot be parsed. |
Return format instructions for the LLM.
| Type | Description |
|---|---|
| str | Format instruction string telling the model to output a valid JSON array of objects. |
CacheEntry
Section titled “CacheEntry”Cache entry with metadata.
Attributes: key: Cache key. value: Cached value. created_at: When entry was created. expires_at: When entry expires (Unix timestamp). hits: Number of cache hits. size_bytes: Approximate size in bytes.
CacheStats
Section titled “CacheStats”Cache statistics.
Attributes: hits: Number of cache hits. misses: Number of cache misses. evictions: Number of evictions. total_entries: Current number of entries. total_size_bytes: Total cache size in bytes.
Calculate cache hit rate.
CharEstimateCounter
Section titled “CharEstimateCounter”Character-based token count estimator (~4 chars per token).
Always available without any optional dependencies. Suitable as a safe fallback counter.
| Parameter | Type | Description |
|---|---|---|
| `model` | Model name (used for identification only). |
Initialize CharEstimateCounter.
| Parameter | Type | Description |
|---|---|---|
| `model` | str | Model name for identification. |
The model this counter is calibrated for.
Count tokens using character estimation.
Count tokens in a list of chat messages.
ChatMessage
Section titled “ChatMessage”A single chat message.
Implements ChatMessageProtocol with DomainModel semantics for validation.
Example
msg = ChatMessage(role="user", content="Hello, how are you?")ClientConfig
Section titled “ClientConfig”Configuration for LLM clients.
Example
config = ClientConfig(provider="openai",model="gpt-4-turbo",api_key="sk-...",temperature=0.7,max_tokens=2000,)CohereClient
Section titled “CohereClient”Client for Cohere's enterprise NLP API.
Conforms to: LLMClientProtocol protocol via structural typing.
Supports Chat, Embeddings, and Reranking with:
- RAG-optimized models (Command R/R+)
- High-performance embeddings
- Native reranking support
Initialize Cohere client.
| Parameter | Type | Description |
|---|---|---|
| `config` | ClientConfig | LLM configuration |
Get API key from config.
Get base URL from config.
Generate embeddings.
| Parameter | Type | Description |
|---|---|---|
| `texts` | list[str] | str | Text or list of texts to embed. |
| `model` | str | Model ID (default: "embed-english-v3.0"). |
| `input_type` | str | Type of input ("search_document", "search_query", "classification", "clustering"). **kwargs: Additional parameters. |
| Type | Description |
|---|---|
| list[list[float]] | List of embedding vectors. |
Example
# Embed documentsdoc_embeddings = await client.embed(texts=["Doc 1", "Doc 2"],input_type="search_document")
# Embed queryquery_embedding = await client.embed(texts="What is AI?",input_type="search_query")Rerank documents for a query.
| Parameter | Type | Description |
|---|---|---|
| `query` | str | Search query. |
| `documents` | list[str] | list[dict[str, str]] | List of documents (strings or dicts with 'text' key). |
| `model` | str | Reranking model (default: "rerank-english-v3.0"). |
| `top_n` | int | None | Return top N results (default: all). **kwargs: Additional parameters. |
| Type | Description |
|---|---|
| list[dict[str, Any]] | List of ranked documents with scores. |
Example
results = await client.rerank(query="What is machine learning?",documents=["ML is a subset of AI...","Unrelated document...","Deep learning uses neural networks..."],top_n=2)for result in results:print(f"Score: {result['relevance_score']:.3f} - {result['document']['text']}")Perform a lightweight health check against the provider.
Close the HTTP client.
Completion
Section titled “Completion”LLM completion response.
Implements completion semantics with DomainModel for validation and additional fields.
Example
completion = Completion(content="Hello! I'm doing well, thank you.",model="gpt-4-turbo",usage=TokenUsage(prompt_tokens=10, completion_tokens=8, total_tokens=18))ConversationConfig
Section titled “ConversationConfig”Configuration for conversation management.
Example
config = ConversationConfig(max_tokens=4096,reserve_tokens=1000,trim_strategy="oldest")ConversationManager
Section titled “ConversationManager”Manage multi-turn conversations with automatic context window management.
This class handles:
- Message history management
- Automatic token counting
- Context window trimming
- System prompt handling
- Conversation statistics
Example
from lexigram.ai.llm import OpenAIClient, ConversationManager
client = OpenAIClient(api_key="sk-...", model="gpt-4")manager = ConversationManager(client=client,system_prompt="You are a helpful assistant.",max_tokens=4096)
# Add user message and get responseresponse = await manager.chat("What is Python?")print(response.content)
# Continue conversationresponse = await manager.chat("Tell me more about it")print(response.content)
# Get conversation historyhistory = manager.get_history()stats = manager.get_stats()print(f"Total messages: {stats.total_messages}")print(f"Total tokens: {stats.total_tokens}")Initialize conversation manager.
| Parameter | Type | Description |
|---|---|---|
| `client` | AbstractLLMClient | LLM client for completions |
| `system_prompt` | str | None | Optional system prompt (prepended to all conversations) |
| `max_tokens` | int | Maximum context window size |
| `reserve_tokens` | int | Tokens to reserve for completion |
| `trim_strategy` | str | Message trimming strategy ('oldest', 'middle', 'summary') |
| `metadata` | Metadata | None | Additional metadata for the conversation |
| `token_counter` | TokenCounterProtocol | None | Optional TokenCounterProtocol implementation. If not provided, uses CharEstimateCounter. |
Send a message and get a response.
| Parameter | Type | Description |
|---|---|---|
| `message` | str | Message content |
| `role` | Role | Message role (default: USER) **completion_kwargs: Additional kwargs for completion |
| Type | Description |
|---|---|
| Completion | Completion response from LLM |
Example
response = await manager.chat("Hello!")print(response.content)Add a message to conversation history without getting a response.
| Parameter | Type | Description |
|---|---|---|
| `role` | Role | Message role |
| `content` | str | Message content |
| `update_stats` | bool | Whether to update statistics |
Example
await manager.add_message(Role.USER, "Hello")await manager.add_message(Role.ASSISTANT, "Hi there!")Get conversation history.
| Parameter | Type | Description |
|---|---|---|
| `include_system` | bool | Include system message in history |
| `limit` | int | None | Maximum number of messages to return (most recent) |
| Type | Description |
|---|---|
| list[ChatMessage] | List of chat messages |
Example
history = manager.get_history(limit=10)for msg in history:print(f"{msg.role}: {msg.content}")Get conversation statistics.
| Type | Description |
|---|---|
| ConversationStats | Conversation statistics |
Example
stats = manager.get_stats()print(f"Total tokens: {stats.total_tokens}")Clear conversation history.
| Parameter | Type | Description |
|---|---|---|
| `keep_system` | bool | Keep system message when clearing |
Example
manager.clear_history()Update the system prompt.
| Parameter | Type | Description |
|---|---|---|
| `system_prompt` | str | New system prompt |
Example
manager.update_system_prompt("You are a Python expert.")Get current total token count.
| Type | Description |
|---|---|
| int | Total tokens in conversation |
Example
tokens = manager.get_token_count()print(f"Current tokens: {tokens}")Get available tokens for completion.
| Type | Description |
|---|---|
| int | Available tokens (max_tokens - current_tokens - reserve_tokens) Can be negative if context window is exceeded |
Example
available = manager.get_available_tokens()print(f"Available for completion: {available}")Export conversation history to dictionary.
| Type | Description |
|---|---|
| dict[str, Any] | Dictionary with conversation data (JSON-serializable) |
Example
data = manager.export_history()from lexigram import serialization as jsonwith open("conversation.json", "w") as f:json.dump(data, f)Create conversation manager from exported history.
| Parameter | Type | Description |
|---|---|---|
| `client` | AbstractLLMClient | LLM client |
| `history_data` | dict[str, Any] | Exported history data |
| Type | Description |
|---|---|
| ConversationManager | ConversationManager instance |
Example
from lexigram import serialization as jsonwith open("conversation.json") as f:data = json.load(f)manager = ConversationManager.from_history(client, data)ConversationStats
Section titled “ConversationStats”Statistics for a conversation.
Example
stats = ConversationStats(total_messages=10,total_tokens=2048,user_messages=5,assistant_messages=5)CostEstimate
Section titled “CostEstimate”Cost estimation result.
Attributes: prompt_cost: Cost for prompt tokens. completion_cost: Cost for completion tokens. total_cost: Total estimated cost. currency: Currency code (default: USD). model: Model name. rate_per_1k_prompt: Rate per 1000 prompt tokens. rate_per_1k_completion: Rate per 1000 completion tokens.
EnumOutputParser
Section titled “EnumOutputParser”Parse LLM responses into Enum members.
Extracts JSON from the response and maps it to an Enum member. Supports both string values and integer values.
Example
from enum import Enum
class Status(Enum):ACTIVE = "active"INACTIVE = "inactive"
parser = EnumOutputParser(Status)result = parser.parse('"active"')assert result == Status.ACTIVEInitialize with an Enum class.
| Parameter | Type | Description |
|---|---|---|
| `enum` | type[Enum] | Enum subclass to parse into. |
Parse text into an Enum member.
| Parameter | Type | Description |
|---|---|---|
| `text` | str | Raw LLM response text that may contain JSON with enum value. |
| Type | Description |
|---|---|
| Enum | Corresponding Enum member. |
| Exception | Description |
|---|---|
| ParseError | When JSON cannot be extracted or enum value is invalid. |
Return format instructions for the LLM.
| Type | Description |
|---|---|
| str | Format instruction string telling the model to output a valid enum value. |
FormatFixingParser
Section titled “FormatFixingParser”Parser that retries with LLM-assisted fixing on parse failure.
Wraps a base parser and, on parse failure, calls the LLM with a fixing prompt that includes the original output and the parse error. Retries are bounded by the retry_budget.
Example
parser = FormatFixingParser(base_parser=JSONOutputParser(),llm_client=llm_client,retry_budget=3)result = parser.parse('not valid json')Initialize the format fixing parser.
| Parameter | Type | Description |
|---|---|---|
| `base_parser` | Any | The underlying parser to use for parsing. |
| `llm_client` | Any | LLM client to use for fixing attempts. |
| `retry_budget` | int | Maximum number of fix attempts (default 3). |
| `guard_check` | Callable[[str], bool] | None | Optional guard function to validate malformed input before sending to LLM. Should return True if safe. |
Parse text, attempting fixes on failure.
| Parameter | Type | Description |
|---|---|---|
| `text` | str | Raw LLM response text to parse. |
| Type | Description |
|---|---|
| Any | Parsed output from the base parser. |
| Exception | Description |
|---|---|
| ParseError | When all fix attempts fail or guard check fails. |
Return format instructions from the base parser.
| Type | Description |
|---|---|
| str | Format instructions from the wrapped parser. |
FunctionCall
Section titled “FunctionCall”Function call request from LLM.
GenerationDefaults
Section titled “GenerationDefaults”Default generation parameters applied to every routing attempt.
Example
defaults = GenerationDefaults(temperature=0.3, max_tokens=2048)GroqClient
Section titled “GroqClient”Client for Groq's ultra-fast LLM inference API.
Conforms to: LLMClientProtocol protocol via structural typing.
Supports Chat, Stream, and Vision with:
- Ultra-fast LPU hardware synergy
- OpenAI-compatible API surface
- Blazing-fast token generation
Initialize Groq client.
| Parameter | Type | Description |
|---|---|---|
| `config` | ClientConfig | LLM configuration |
Get API key from config.
Get base URL from config.
Run a lightweight provider health probe.
List models available from the Groq API.
Close the HTTP client.
Example
await client.close()HuggingFaceCounter
Section titled “HuggingFaceCounter”Token counter using HuggingFace AutoTokenizer (lazy-loaded).
When constructed without a model, uses character estimation (~4 chars/token). When constructed with a model name, lazy-loads that model’s tokenizer on first use.
| Parameter | Type | Description |
|---|---|---|
| `model` | Optional HuggingFace model name. If None, uses char estimation fallback. |
Initialize HuggingFaceCounter.
| Parameter | Type | Description |
|---|---|---|
| `model` | str | None | Optional HuggingFace model name for tokenizer loading. |
Backend identifier.
Count tokens in a text string.
Count tokens in a list of chat messages.
ImageBase64Part
Section titled “ImageBase64Part”An image pre-encoded as base64 in a multimodal message.
Attributes:
data: Raw base64-encoded bytes (no data: prefix).
media_type: MIME type, e.g. "image/jpeg".
type: Discriminator field, always "image_base64".
ImageUrlPart
Section titled “ImageUrlPart”An image specified by URL in a multimodal message.
The framework passes the URL through to providers that support it natively (OpenAI, Anthropic, Gemini). For providers that require base64 (Ollama, Bedrock), the client fetches and converts.
Attributes:
url: Public or data-URI URL of the image.
detail: OpenAI vision detail level ("auto", "low", "high").
type: Discriminator field, always "image_url".
InstructorExtractor
Section titled “InstructorExtractor”Structured extraction from LLM completions using instructor library.
Extracts typed Pydantic models from LLM responses by:
- Building a ChatMessage list with extraction instructions
- Calling llm_client.complete() to get a Completion
- Parsing the completion text as JSON
- Validating against the response_model
- Retrying on validation/parse failures up to max_retries
Unlike direct instructor usage, this implementation uses the standard
LLMClientProtocol.complete() method, avoiding coupling to provider-specific
client patching mechanisms.
Example
from pydantic import BaseModel
class UserInfo(BaseModel): name: str age: int
extractor = InstructorExtractor(llm_client)result = await extractor.extract( prompt="Extract user info from: 'John is 30 years old'", response_model=UserInfo,)if result.is_ok(): user = result.unwrap() print(user.name, user.age)else: error = result.unwrap_err() # handle ExtractionErrorfrom pydantic import BaseModel
class UserInfo(BaseModel): name: str age: int
extractor = InstructorExtractor(llm_client)result = await extractor.extract( prompt="Extract user info from: 'John is 30 years old'", response_model=UserInfo,)if result.is_ok(): user = result.unwrap() print(user.name, user.age)else: error = result.unwrap_err() # handle ExtractionErrorInitialize InstructorExtractor.
| Parameter | Type | Description |
|---|---|---|
| `llm_client` | LLMClientProtocol | LLMClientProtocol instance for making LLM calls. |
| `mode` | str | Instructor patching mode (reserved for future provider-level integration; currently unused). |
| `max_retries` | int | Maximum number of retries on validation/parse failure. |
Extract a structured response_model instance from an LLM call.
| Parameter | Type | Description |
|---|---|---|
| `prompt` | str | User prompt for extraction. |
| `response_model` | type[T] | Pydantic BaseModel class to extract and validate. |
| `context` | list | None | Optional list of additional ChatMessage objects for context. **kwargs: Additional parameters passed to llm_client.complete(). |
| Type | Description |
|---|---|
| Result[T, ExtractionError] | ``Ok(instance)`` on successful extraction and validation. ``Err(ExtractionError)`` on parse, validation, or max retries failure. |
| Exception | Description |
|---|
JSONExtractor
Section titled “JSONExtractor”Extract and parse JSON from LLM responses.
JSONFilePricingSource
Section titled “JSONFilePricingSource”Pricing source from local JSON file.
This is the fastest and most reliable source as it doesn’t require network calls and works offline.
Attributes: file_path: Path to JSON pricing file. cache: In-memory cache of loaded pricing.
Example
source = JSONFilePricingSource(Path("custom_pricing.json"))pricing = await source.get_pricing("gpt-4-turbo")Initialize JSON file pricing source.
| Parameter | Type | Description |
|---|---|---|
| `file_path` | Path | Path to JSON file containing pricing data. |
Get pricing for a specific model.
| Parameter | Type | Description |
|---|---|---|
| `model` | str | Model identifier. |
| Type | Description |
|---|---|
| ModelPricing | None | ModelPricing if found, None otherwise. |
Get all pricing data.
| Type | Description |
|---|---|
| dict[str, ModelPricing] | All pricing data from JSON file. |
Get source name.
Clear cached pricing data to force reload.
JSONOutputParser
Section titled “JSONOutputParser”Parse LLM responses into JSON dicts.
Handles common LLM output patterns like markdown code fences, prose before/after JSON, and malformed JSON.
Example
parser = JSONOutputParser()result = parser.parse('{"key": "value"}')assert result == {"key": "value"}Parse text into a JSON dict.
| Parameter | Type | Description |
|---|---|---|
| `text` | str | Raw LLM response text that may contain JSON. |
| Type | Description |
|---|---|
| dict[str, Any] | Parsed JSON as a dict. |
| Exception | Description |
|---|---|
| ParseError | When JSON cannot be extracted or parsed. |
Return format instructions for the LLM.
| Type | Description |
|---|---|
| str | Format instruction string telling the model to output valid JSON. |
LLMCache
Section titled “LLMCache”In-memory cache for LLM responses with TTL.
Implements LRU eviction when max_size is reached.
| Parameter | Type | Description |
|---|---|---|
| `ttl` | Time-to-live in seconds (default: 1 hour). | |
| `max_size` | Maximum number of entries (default: 1000). | |
| `max_size_bytes` | Maximum cache size in bytes (default: 100MB). |
Example
cache = LLMCache(ttl=3600, max_size=500)result = await cache.get("key")await cache.set("key", "value")Initialize LLM cache.
| Parameter | Type | Description |
|---|---|---|
| `ttl` | float | Time-to-live in seconds. |
| `max_size` | int | Maximum number of entries. |
| `max_size_bytes` | int | Maximum total size in bytes. |
Get value from cache.
| Parameter | Type | Description |
|---|---|---|
| `key` | str | dict[str, Any] | Cache key (string or dict). |
| Type | Description |
|---|---|
| Any | None | Cached value or None if not found/expired. |
Set value in cache.
| Parameter | Type | Description |
|---|---|---|
| `key` | str | dict[str, Any] | Cache key (string or dict). |
| `value` | Any | Value to cache. |
| `ttl` | float | None | Optional TTL override. |
Get from cache or compute and cache result.
| Parameter | Type | Description |
|---|---|---|
| `key` | str | dict[str, Any] | Cache key. |
| `compute_fn` | Callable[[], Any] | Function to compute value if cache miss. |
| `ttl` | float | None | Optional TTL override. |
| Type | Description |
|---|---|
| Any | Cached or computed value. |
Example
result = await cache.get_or_compute(key="greeting",compute_fn=lambda: llm.complete("Say hello"))Delete entry from cache.
| Parameter | Type | Description |
|---|---|---|
| `key` | str | dict[str, Any] | Cache key to delete. |
| Type | Description |
|---|---|
| bool | True if entry was deleted. |
Clear all cache entries.
Get cache statistics.
| Type | Description |
|---|---|
| CacheStats | CacheStats object. |
LLMCompletionEvent
Section titled “LLMCompletionEvent”Emitted when an LLM completion is received.
Distinct from LLMCallStartedHook (which intercepts); this is the immutable record that a completion happened.
Consumed by: cost accounting, audit, safety review.
LLMConfig
Section titled “LLMConfig”Root configuration object for the LLM routing system.
All providers are opt-in: a provider joins the cascade only when its
credential environment variable is set. Use from_env to build
from LEX_AI_LLM__ environment variables.
Example
config = LLMConfig( providers=[ ProviderConfig(name="groq", model="llama-3.3-70b-versatile", api_key="gsk_..."), ProviderConfig(name="gemini", model="gemini-2.5-flash", api_key="AIza..."), ], defaults=GenerationDefaults(temperature=0.3),)config = LLMConfig( providers=[ ProviderConfig(name="groq", model="llama-3.3-70b-versatile", api_key="gsk_..."), ProviderConfig(name="gemini", model="gemini-2.5-flash", api_key="AIza..."), ], defaults=GenerationDefaults(temperature=0.3),)Environment variables (prefix LEX_AI_LLM__)
Global:
LEX_AI_LLM__STRATEGY sequential | parallel_race | cost_optimized | latency_optimized LEX_AI_LLM__DEFAULTS__TEMPERATURE float (default 0.2) LEX_AI_LLM__DEFAULTS__MAX_TOKENS int (default: provider default) LEX_AI_LLM__QUOTA__BACKEND memory | database (default memory) LEX_AI_LLM__LOG__BACKEND memory | database (default memory) LEX_AI_LLM__LOG__MAX_ENTRIES int (default 1000)
Per-provider (pattern: LEX_AI_LLM__PROVIDERS__{NAME}__{FIELD}):
__{NAME}__API_KEY str API key -- activates key-auth providers __{NAME}__BASE_URL str Endpoint -- activates local/custom providers __{NAME}__MODEL str Model override (has per-provider defaults) __{NAME}__TIMEOUT int Request timeout in seconds (default 30) __{NAME}__ENABLED bool Explicit enable/disable (default true)
Supported provider names and their activation:
OPENAI API_KEY required default model: gpt-4o ANTHROPIC API_KEY required default model: claude-3-5-sonnet-20241022 GROQ API_KEY required default model: llama-3.3-70b-versatile GEMINI API_KEY required default model: gemini-2.5-flash MISTRAL API_KEY required default model: mistral-large-latest COHERE API_KEY required default model: command-r-plus OPENROUTER API_KEY required default model: openai/gpt-4o-mini DEEPSEEK API_KEY required default model: deepseek-chat TOGETHER API_KEY required default model: meta-llama/Llama-3-8b-chat-hf FIREWORKS API_KEY required default model: accounts/fireworks/models/llama-v3-70b-instruct OLLAMA BASE_URL required default model: llama3.2 (default base: http://localhost:11434) OPENAI_COMPATIBLE BASE_URL + MODEL required (generic OpenAI-compatible: LM Studio, VLLM, etc.)
Azure-specific extras (activated by AZURE__API_KEY + AZURE__BASE_URL):
LEX_AI_LLM__PROVIDERS__AZURE__EXTRAS__AZURE_RESOURCE LEX_AI_LLM__PROVIDERS__AZURE__EXTRAS__AZURE_DEPLOYMENT LEX_AI_LLM__PROVIDERS__AZURE__EXTRAS__AZURE_API_VERSION
Cloudflare-specific extras (activated by CLOUDFLARE__EXTRAS__CF_ACCOUNT_ID):
LEX_AI_LLM__PROVIDERS__CLOUDFLARE__EXTRAS__CF_ACCOUNT_ID <- activates LEX_AI_LLM__PROVIDERS__CLOUDFLARE__EXTRAS__CF_API_TOKEN LEX_AI_LLM__PROVIDERS__CLOUDFLARE__MODEL
AWS Bedrock extras (activated by BEDROCK__EXTRAS__AWS_REGION):
LEX_AI_LLM__PROVIDERS__BEDROCK__EXTRAS__AWS_REGION <- activates LEX_AI_LLM__PROVIDERS__BEDROCK__MODEL LEX_AI_LLM__PROVIDERS__BEDROCK__EXTRAS__AWS_ACCESS_KEY_ID LEX_AI_LLM__PROVIDERS__BEDROCK__EXTRAS__AWS_SECRET_ACCESS_KEY
Google Vertex AI extras (activated by VERTEX__EXTRAS__VERTEX_PROJECT):
LEX_AI_LLM__PROVIDERS__VERTEX__EXTRAS__VERTEX_PROJECT <- activates LEX_AI_LLM__PROVIDERS__VERTEX__EXTRAS__VERTEX_LOCATION LEX_AI_LLM__PROVIDERS__VERTEX__MODEL LEX_AI_LLM__PROVIDERS__VERTEX__EXTRAS__VERTEX_CREDENTIALS_FILEGlobal:
LEX_AI_LLM__STRATEGY sequential | parallel_race | cost_optimized | latency_optimized LEX_AI_LLM__DEFAULTS__TEMPERATURE float (default 0.2) LEX_AI_LLM__DEFAULTS__MAX_TOKENS int (default: provider default) LEX_AI_LLM__QUOTA__BACKEND memory | database (default memory) LEX_AI_LLM__LOG__BACKEND memory | database (default memory) LEX_AI_LLM__LOG__MAX_ENTRIES int (default 1000)
Per-provider (pattern: LEX_AI_LLM__PROVIDERS__{NAME}__{FIELD}):
__{NAME}__API_KEY str API key -- activates key-auth providers __{NAME}__BASE_URL str Endpoint -- activates local/custom providers __{NAME}__MODEL str Model override (has per-provider defaults) __{NAME}__TIMEOUT int Request timeout in seconds (default 30) __{NAME}__ENABLED bool Explicit enable/disable (default true)
Supported provider names and their activation:
OPENAI API_KEY required default model: gpt-4o ANTHROPIC API_KEY required default model: claude-3-5-sonnet-20241022 GROQ API_KEY required default model: llama-3.3-70b-versatile GEMINI API_KEY required default model: gemini-2.5-flash MISTRAL API_KEY required default model: mistral-large-latest COHERE API_KEY required default model: command-r-plus OPENROUTER API_KEY required default model: openai/gpt-4o-mini DEEPSEEK API_KEY required default model: deepseek-chat TOGETHER API_KEY required default model: meta-llama/Llama-3-8b-chat-hf FIREWORKS API_KEY required default model: accounts/fireworks/models/llama-v3-70b-instruct OLLAMA BASE_URL required default model: llama3.2 (default base: http://localhost:11434) OPENAI_COMPATIBLE BASE_URL + MODEL required (generic OpenAI-compatible: LM Studio, VLLM, etc.)
Azure-specific extras (activated by AZURE__API_KEY + AZURE__BASE_URL):
LEX_AI_LLM__PROVIDERS__AZURE__EXTRAS__AZURE_RESOURCE LEX_AI_LLM__PROVIDERS__AZURE__EXTRAS__AZURE_DEPLOYMENT LEX_AI_LLM__PROVIDERS__AZURE__EXTRAS__AZURE_API_VERSION
Cloudflare-specific extras (activated by CLOUDFLARE__EXTRAS__CF_ACCOUNT_ID):
LEX_AI_LLM__PROVIDERS__CLOUDFLARE__EXTRAS__CF_ACCOUNT_ID <- activates LEX_AI_LLM__PROVIDERS__CLOUDFLARE__EXTRAS__CF_API_TOKEN LEX_AI_LLM__PROVIDERS__CLOUDFLARE__MODEL
AWS Bedrock extras (activated by BEDROCK__EXTRAS__AWS_REGION):
LEX_AI_LLM__PROVIDERS__BEDROCK__EXTRAS__AWS_REGION <- activates LEX_AI_LLM__PROVIDERS__BEDROCK__MODEL LEX_AI_LLM__PROVIDERS__BEDROCK__EXTRAS__AWS_ACCESS_KEY_ID LEX_AI_LLM__PROVIDERS__BEDROCK__EXTRAS__AWS_SECRET_ACCESS_KEY
Google Vertex AI extras (activated by VERTEX__EXTRAS__VERTEX_PROJECT):
LEX_AI_LLM__PROVIDERS__VERTEX__EXTRAS__VERTEX_PROJECT <- activates LEX_AI_LLM__PROVIDERS__VERTEX__EXTRAS__VERTEX_LOCATION LEX_AI_LLM__PROVIDERS__VERTEX__MODEL LEX_AI_LLM__PROVIDERS__VERTEX__EXTRAS__VERTEX_CREDENTIALS_FILEBuild a routing config from LEX_AI_LLM__ environment variables.
LLMModule
Section titled “LLMModule”LLM client and model-management integration.
Call configure to register an LLMClientProtocol implementation and optional model manager for injection.
Usage
from lexigram.ai.llm.config import ClientConfig
@module( imports=[ LLMModule.configure( ClientConfig(provider="openai", model="gpt-4o") ) ])class AppModule(Module): passfrom lexigram.ai.llm.config import ClientConfig
@module( imports=[ LLMModule.configure( ClientConfig(provider="openai", model="gpt-4o") ) ])class AppModule(Module): passMulti-provider routing
from lexigram.ai.llm import LLMModule
@module( imports=[LLMModule.configure(routing=LLMConfig())])class AppModule(Module): passfrom lexigram.ai.llm import LLMModule
@module( imports=[LLMModule.configure(routing=LLMConfig())])class AppModule(Module): passCreate an LLMModule with a single configured provider.
| Parameter | Type | Description |
|---|---|---|
| `config` | ClientConfig | Any | None | ClientConfig or ``None`` to read configuration from environment variables. |
| `routing` | LLMConfig | Any | None | Optional LLMConfig enabling the multi-provider routing layer instead of the single-provider client. |
| `enable_model_manager` | bool | Register LLMModelManager for local model lifecycle control. |
| `enable_streaming` | bool | Enable streaming response support. Defaults to ``True``; set to ``False`` to restrict to non-streaming clients only. |
| Type | Description |
|---|---|
| DynamicModule | A DynamicModule descriptor. |
Create an LLMModule suitable for unit and integration testing.
Uses a no-op or stub LLM client with minimal external dependencies. Streaming is disabled by default to simplify test assertions.
| Parameter | Type | Description |
|---|---|---|
| `config` | ClientConfig | Any | None | Optional ClientConfig override. Uses safe test defaults when ``None``. |
| Type | Description |
|---|---|
| DynamicModule | A DynamicModule descriptor. |
LLMProvider
Section titled “LLMProvider”Provider that registers LLM services with the Lexigram DI container.
Registers an LLMClientProtocol, optional LLM response cache, and an LLMModelManager so all three are injectable throughout the application.
Example
from lexigram.ai.llm.di.provider import LLMProviderfrom lexigram.ai.llm.config import ClientConfig
app.use(LLMProvider(ClientConfig(provider="openai", model="gpt-4o")))
# LLMClientProtocol is now injectable:class MyService:def __init__(self, llm: LLMClientProtocol) -> None:self.llm = llmInitialize the LLM Provider.
| Parameter | Type | Description |
|---|---|---|
| `config` | ClientConfig | None | LLM client configuration; defaults to ClientConfig() (reads env). |
| `enable_model_manager` | bool | Register LLMModelManager for local model control. |
| `enable_streaming` | bool | Enable streaming response support. |
| `name` | str | Provider name used for identification. |
| `cache_backend` | CacheBackendProtocol | None | Injected cache backend for optional response caching. |
Register LLM services with the DI container.
| Parameter | Type | Description |
|---|---|---|
| `container` | ContainerRegistrarProtocol | The Lexigram DI container registrar. |
Boot the LLM provider — validates API key presence and format.
| Parameter | Type | Description |
|---|---|---|
| `container` | ContainerResolverProtocol | The DI container resolver. |
Close client connections on application shutdown.
Return basic health information for the registered LLM client.
LLMProviderRegisteredHook
Section titled “LLMProviderRegisteredHook”Payload fired when an LLM provider is registered in the provider registry.
Attributes: provider: Identifier of the provider that was registered.
LLMRequestSentHook
Section titled “LLMRequestSentHook”Payload fired when an LLM request is dispatched to a provider.
Attributes:
provider: Provider identifier (e.g. "openai").
model: Model name targeted by the request (e.g. "gpt-4o").
LLMResponseReceivedHook
Section titled “LLMResponseReceivedHook”Payload fired when a complete LLM response is received from a provider.
Attributes: provider: Provider identifier that returned the response. model: Model name that produced the response.
LLMRoutingProvider
Section titled “LLMRoutingProvider”Provider that registers the multi-provider LLM router with the DI container.
Builds the LLMRouter from a LLMConfig, chooses the appropriate quota backend and inference logger, and registers everything as singletons.
Example
from lexigram.ai.llm.module import LLMModulefrom lexigram.ai.llm.routing import LLMConfig
app.use(LLMModule.configure(routing=LLMConfig.from_env()))
# LLMRouterProtocol is now injectable:class MyService:def __init__(self, router: LLMRouterProtocol) -> None:self.router = routerInitialise the LLM routing provider.
| Parameter | Type | Description |
|---|---|---|
| `config` | LLMConfig | None | Routing configuration; defaults to ``LLMConfig.from_env()``. |
| `database_provider` | DatabaseProviderProtocol | None | Injected DB provider used when ``quota.backend`` or ``logging.backend`` is ``database``. |
| `model_selector` | ModelSelector | None | Optional model selector for capability-based routing. When provided, ``required_capabilities`` in route kwargs will filter providers whose models lack the requested capabilities. |
Build and register the LLMRouter with the DI container.
| Parameter | Type | Description |
|---|---|---|
| `container` | ContainerRegistrarProtocol | The Lexigram DI container registrar. |
Boot phase — no-op for this provider.
| Parameter | Type | Description |
|---|---|---|
| `container` | ContainerResolverProtocol | The DI container resolver. |
Close all routing clients on application shutdown.
Return basic health information for the router.
| Parameter | Type | Description |
|---|---|---|
| `timeout` | float | Unused; retained for interface compatibility. |
| Type | Description |
|---|---|
| HealthCheckResult | A dict with ``status`` and ``providers`` keys. |
LogConfig
Section titled “LogConfig”Configuration for inference attempt logging.
Example
cfg = LogConfig(backend="database", max_entries=5000)MistralClient
Section titled “MistralClient”Client for Mistral AI's LLM API.
Conforms to: LLMClientProtocol protocol via structural typing.
Supports Chat, Stream, and Embeddings with:
- High-performance European LLMs
- GDPR compliance and data sovereignty
- Function calling and JSON mode
Initialize Mistral client.
| Parameter | Type | Description |
|---|---|---|
| `config` | ClientConfig | LLM configuration |
Get API key from config.
Get base URL from config.
Perform a lightweight health check against the Mistral API.
Calls the models endpoint to verify the API key is valid and the service is reachable.
| Parameter | Type | Description |
|---|---|---|
| `timeout` | float | Maximum seconds to wait for the response. |
| Type | Description |
|---|---|
| HealthCheckResult | HealthCheckResult. |
Generate embeddings.
| Parameter | Type | Description |
|---|---|---|
| `model` | str | Model ID (default: "mistral-embed"). |
| `input_texts` | list[str] | str | None | Text or list of texts to embed. **kwargs: Additional parameters. |
| Type | Description |
|---|---|
| list[list[float]] | List of embedding vectors. |
Example
embeddings = await client.embed(input_texts=["Hello world", "Bonjour monde"])print(f"Embedding dimension: {len(embeddings[0])}")Close the HTTP client.
Example
await client.close()MistralCounter
Section titled “MistralCounter”Token counter using mistral-common tokenizer (lazy-loaded).
Tokenizer is loaded on first use, not at construction time.
Initialize MistralCounter.
Backend identifier.
Count tokens in a text string.
Count tokens in a list of chat messages.
ModelCapabilities
Section titled “ModelCapabilities”Model capabilities and constraints.
ModelPricing
Section titled “ModelPricing”Pricing information for a specific LLM model.
Attributes: model: Model identifier (e.g., “gpt-4-turbo”, “claude-3-opus”). prompt_per_1m: Cost per 1 million prompt tokens in USD. completion_per_1m: Cost per 1 million completion tokens in USD. provider: Provider name (e.g., “openai”, “anthropic”). last_updated: When pricing was last updated. source: Where pricing data came from (e.g., “json”, “api”, “static”).
Example
pricing = ModelPricing(model="gpt-4-turbo",prompt_per_1m=10.00,completion_per_1m=30.00,provider="openai")print(f"${pricing.prompt_per_1m} per 1M prompt tokens")Custom serializer to handle datetime objects.
ModelSelector
Section titled “ModelSelector”Intelligent model selector with fallback support.
Automatically selects the best model based on prompt characteristics and provides fallback chains for reliability.
Example
selector = ModelSelector(default_model="gpt-3.5-turbo",strategies=[SelectionStrategy(name="complex",model="gpt-4-turbo",conditions={"min_tokens": 1000}),SelectionStrategy(name="simple",model="claude-3-haiku-20240307",conditions={"max_tokens": 500})],fallback_chain=["gpt-4-turbo", "gpt-3.5-turbo"])
# Select model for a promptmodel = selector.select("Long prompt here...")print(model)'gpt-4-turbo'>>>>>> # Get next fallback on error>>> fallback = selector.get_fallback("gpt-4-turbo")>>> print(fallback)'gpt-3.5-turbo'Initialize model selector.
| Parameter | Type | Description |
|---|---|---|
| `default_model` | str | None | Default model to use |
| `strategies` | list[SelectionStrategy] | None | List of selection strategies |
| `fallback_chain` | list[str] | None | Ordered list of fallback models |
| `model_capabilities` | dict[str, ModelCapabilities] | None | Custom model capabilities |
| `token_counter` | TokenCounterProtocol | None | Token counter for prompt analysis |
Example
selector = ModelSelector(default_model="gpt-3.5-turbo",fallback_chain=["gpt-4", "claude-3-sonnet-20240229"])Select the best model for the given prompt.
| Parameter | Type | Description |
|---|---|---|
| `prompt` | str | The prompt text |
| `context` | dict[str, Any] | None | Additional context for selection |
| `required_capabilities` | list[str] | None | Required capabilities (e.g., ["supports_functions"]) |
| Type | Description |
|---|---|
| str | Selected model name |
Example
model = selector.select("Analyze this image...",required_capabilities=["supports_vision"])print(model)'gpt-4-turbo'Get the next model in the fallback chain.
| Parameter | Type | Description |
|---|---|---|
| `failed_model` | str | The model that failed |
| Type | Description |
|---|---|
| str | None | Next fallback model, or None if no fallback available |
Example
fallback = selector.get_fallback("gpt-4-turbo")print(fallback)'gpt-3.5-turbo'Get capabilities for a model.
| Parameter | Type | Description |
|---|---|---|
| `model` | str | Model name |
| Type | Description |
|---|---|
| ModelCapabilities | None | Model capabilities or None if unknown |
Example
caps = selector.get_capabilities("gpt-4-turbo")print(caps.max_tokens)128000Estimate cost for a model call.
| Parameter | Type | Description |
|---|---|---|
| `model` | str | Model name |
| `input_tokens` | int | Number of input tokens |
| `output_tokens` | int | Number of output tokens |
| Type | Description |
|---|---|
| float | Estimated cost in USD |
Example
cost = selector.estimate_cost("gpt-4-turbo", 1000, 500)print(f"${cost:.4f}")$0.0250OllamaClient
Section titled “OllamaClient”Ollama LLM client for local models.
Conforms to: LLMClientProtocol protocol via structural typing.
Supports running LLMs locally with Ollama:
- Llama 3, Mistral, Phi, and other open models
- Streaming responses
- Zero API costs
- Full data privacy
Example
from lexigram.ai import ClientConfigconfig = ClientConfig(provider="ollama",model="llama3:8b",api_base="http://localhost:11434")client = OllamaClient(config)completion = await client.complete([ChatMessage(role="user", content="Hello!")])Initialize Ollama client.
| Parameter | Type | Description |
|---|---|---|
| `config` | ClientConfig | LLM configuration |
| Exception | Description |
|---|---|
| ImportError | If ollama package is not installed |
Perform a lightweight health check against the Ollama daemon.
Calls list() to verify the daemon is running and reachable.
| Parameter | Type | Description |
|---|---|---|
| `timeout` | float | Maximum seconds to wait for the response. |
| Type | Description |
|---|---|
| HealthCheckResult | HealthCheckResult. |
Close Ollama client.
OpenAIClient
Section titled “OpenAIClient”OpenAI LLM client implementation.
Conforms to: LLMClientProtocol protocol via structural typing.
Supports GPT-4, GPT-3.5-Turbo, and other OpenAI models with:
- Streaming responses
- Function/tool calling
- Vision models
- Automatic retry with exponential backoff
- Error handling and rate limit management
Example
from lexigram.ai import ClientConfigconfig = ClientConfig(provider="openai", model="gpt-4-turbo")client = OpenAIClient(config)completion = await client.complete([ChatMessage(role="user", content="Hello!")])Initialize OpenAI client.
| Parameter | Type | Description |
|---|---|---|
| `config` | ClientConfig | LLM configuration |
| Exception | Description |
|---|---|
| ImportError | If openai package is not installed |
Close the OpenAI client and cleanup resources.
Perform health check.
| Type | Description |
|---|---|
| HealthCheckResult | Structured health check result. |
OpenRouterClient
Section titled “OpenRouterClient”Client for OpenRouter (OpenAI-compatible) API.
Conforms to: LLMClientProtocol protocol via structural typing.
Initialize OpenRouter client.
| Parameter | Type | Description |
|---|---|---|
| `config` | ClientConfig | LLM configuration |
Get API key from config.
Get base URL from config.
Get default model from config.
Perform a lightweight health check against the OpenRouter API.
Calls the models listing endpoint to verify the API key is valid and the service is reachable.
| Parameter | Type | Description |
|---|---|---|
| `timeout` | float | Maximum seconds to wait for the response. |
| Type | Description |
|---|---|
| HealthCheckResult | HealthCheckResult. |
OutputFilter
Section titled “OutputFilter”Filter LLM output for sensitive information.
Prevents leaking of system prompts, internal data, etc.
Filter LLM output for leaks.
| Parameter | Type | Description |
|---|---|---|
| `output` | str | LLM output |
| `system_prompt` | str | System prompt (check if leaked) |
| Type | Description |
|---|---|
| str | Filtered output |
ParserRegistry
Section titled “ParserRegistry”Registry for managing output parsers by name.
Provides a central registry for looking up parsers by name, similar to LangChain’s parser registry.
Example
registry = ParserRegistry()registry.register("json", JSONOutputParser())parser = registry.get("json")assert parser is not NoneInitialize an empty registry.
Register a parser with a name.
| Parameter | Type | Description |
|---|---|---|
| `name` | str | Unique name for the parser. |
| `parser` | Any | Parser instance to register. |
Get a parser by name.
| Parameter | Type | Description |
|---|---|---|
| `name` | str | Name of the parser to retrieve. |
| Type | Description |
|---|---|
| Any | The registered parser. |
| Exception | Description |
|---|---|
| KeyError | If no parser is registered with that name. |
Get a parser by name, returning None if not found.
| Parameter | Type | Description |
|---|---|---|
| `name` | str | Name of the parser to retrieve. |
| Type | Description |
|---|---|
| Any | None | The registered parser, or None if not found. |
List all registered parser names.
| Type | Description |
|---|---|
| list[str] | List of registered parser names. |
Unregister a parser by name.
| Parameter | Type | Description |
|---|---|---|
| `name` | str | Name of the parser to unregister. |
| Exception | Description |
|---|---|
| KeyError | If no parser is registered with that name. |
Create a registry with default parsers pre-registered.
| Type | Description |
|---|---|
| ParserRegistry | A new ParserRegistry with default parsers. |
PricingManager
Section titled “PricingManager”Manages pricing data from multiple sources with caching.
Sources are queried in order until pricing is found. Typical hierarchy:
- JSON file (fastest, most reliable)
- API endpoints (for updates)
- Static fallback (hardcoded)
Attributes: sources: List of pricing sources in priority order. cache: Pricing cache instance. enable_fuzzy_match: Whether to enable fuzzy model name matching.
Example
# Use defaultsmanager = PricingManager.from_defaults()
# Custom configurationmanager = (PricingManager.builder().add_json_source("pricing.json").add_api_source("https://api.example.com/pricing").with_cache_ttl(3600).enable_fuzzy_matching().build())
pricing = await manager.get_pricing("gpt-4-turbo")Initialize pricing manager.
| Parameter | Type | Description |
|---|---|---|
| `sources` | Sequence[AbstractPricingSource] | List of pricing sources in priority order. |
| `cache_ttl` | int | Cache TTL in seconds (default: 24 hours). |
| `enable_fuzzy_match` | bool | Enable fuzzy model name matching (default: True). |
Get pricing for a specific model.
Queries sources in order:
- Cache (if not force_refresh)
- Each source in priority order
- Fuzzy match if enabled
- Default fallback
| Parameter | Type | Description |
|---|---|---|
| `model` | str | Model identifier (e.g., "gpt-4-turbo"). |
| `force_refresh` | bool | Bypass cache and fetch fresh data. |
| Type | Description |
|---|---|
| ModelPricing | ModelPricing for the model. |
| Exception | Description |
|---|---|
| ValueError | If model not found in any source. |
List all available models.
| Parameter | Type | Description |
|---|---|---|
| `provider` | str | None | Filter by provider (optional). |
| Type | Description |
|---|---|
| list[str] | List of model names. |
Clear pricing cache.
Create manager with default configuration.
Uses LiteLLM API for dynamic, up-to-date pricing data. No static pricing files - always fetches current data.
| Type | Description |
|---|---|
| PricingManager | PricingManager with API source. |
Example
manager = PricingManager.from_defaults()pricing = await manager.get_pricing("gpt-4")Create manager from JSON file only.
Useful for offline applications or when you want full control over pricing data.
| Parameter | Type | Description |
|---|---|---|
| `file_path` | str | Path | Path to JSON pricing file. |
| `cache_ttl` | int | Cache TTL in seconds (default: 24 hours). |
| Type | Description |
|---|---|
| PricingManager | PricingManager with JSON source only. |
Example
manager = PricingManager.from_json("my_pricing.json")pricing = await manager.get_pricing("custom-model")Create manager from API endpoint only.
| Parameter | Type | Description |
|---|---|---|
| `endpoint` | str | API endpoint URL. |
| `cache_ttl` | int | Cache TTL in seconds (default: 24 hours). |
| Type | Description |
|---|---|
| PricingManager | PricingManager with API source only. |
Example
manager = PricingManager.from_api("https://api.example.com/pricing")pricing = await manager.get_pricing("gpt-4")Create a builder for custom configuration.
| Type | Description |
|---|---|
| PricingManagerBuilder | PricingManagerBuilder instance. |
Example
manager = (PricingManager.builder().add_json_source("custom.json").add_api_source("https://api.example.com").with_cache_ttl(3600).build())PricingManagerBuilder
Section titled “PricingManagerBuilder”Builder for PricingManager with validation.
Provides a fluent API for configuring pricing sources safely.
Example
manager = (PricingManager.builder().add_json_source("pricing.json").add_api_source("https://api.example.com/pricing").add_fallback({"custom-model": ModelPricing(...)}).with_cache_ttl(3600).enable_fuzzy_matching().build())Initialize builder.
Add JSON file pricing source.
| Parameter | Type | Description |
|---|---|---|
| `file_path` | str | Path | Path to JSON file. |
| Type | Description |
|---|---|
| PricingManagerBuilder | Self for chaining. |
Add API endpoint pricing source.
| Parameter | Type | Description |
|---|---|---|
| `endpoint` | str | API endpoint URL. |
| `timeout` | float | Request timeout in seconds (default: 10). |
| Type | Description |
|---|---|
| PricingManagerBuilder | Self for chaining. |
Add static fallback pricing.
| Parameter | Type | Description |
|---|---|---|
| `pricing_map` | dict[str, ModelPricing] | Dictionary of model to pricing. |
| Type | Description |
|---|---|
| PricingManagerBuilder | Self for chaining. |
Add custom pricing source.
| Parameter | Type | Description |
|---|---|---|
| `source` | AbstractPricingSource | Custom AbstractPricingSource implementation. |
| Type | Description |
|---|---|
| PricingManagerBuilder | Self for chaining. |
Set cache TTL.
| Parameter | Type | Description |
|---|---|---|
| `seconds` | int | Cache TTL in seconds. |
| Type | Description |
|---|---|
| PricingManagerBuilder | Self for chaining. |
| Exception | Description |
|---|---|
| ValueError | If seconds is negative. |
Enable or disable fuzzy model name matching.
| Parameter | Type | Description |
|---|---|---|
| `enabled` | bool | Whether to enable fuzzy matching (default: True). |
| Type | Description |
|---|---|
| PricingManagerBuilder | Self for chaining. |
Build PricingManager instance.
| Type | Description |
|---|---|
| PricingManager | Configured PricingManager. |
| Exception | Description |
|---|---|
| ValueError | If no sources were added. |
ProviderConfig
Section titled “ProviderConfig”Configuration for a single provider in the routing cascade.
Every provider in the cascade has the same shape regardless of type.
Provider-specific fields (Azure deployment, Cloudflare account ID,
Bedrock region, Vertex project) go in extras.
Example
cfg = ProviderConfig(name="groq",model="llama-3.3-70b-versatile",api_key="gsk_...",)ProviderInfo
Section titled “ProviderInfo”Information about an LLM provider.
Attributes: name: Provider identifier (e.g., “openai”, “anthropic”). client_class: LLMClientProtocol implementation class. default_models: List of default/recommended models. supports_streaming: Whether streaming is supported. supports_tools: Whether function/tool calling is supported. supports_vision: Whether vision/image inputs are supported. base_url: Default base URL for API (optional). docs_url: Documentation URL (optional). pricing_url: Pricing page URL (optional). description: Human-readable description.
ProviderRegistry
Section titled “ProviderRegistry”Registry for LLM providers.
Singleton registry that maintains information about all available LLM providers, both built-in and custom.
Initialize provider registry.
Register a new LLM provider.
Get provider information.
List all registered provider names.
Search providers by capabilities.
Unregister a provider.
Register a provider following the ProviderRegistryProtocol.
Get an initialized client for a provider.
List all models matching capabilities.
Get information about a specific model.
PydanticOutputParser
Section titled “PydanticOutputParser”Parse LLM responses into Pydantic models.
Uses the existing structured parser’s validation logic to parse and validate against a Pydantic model.
Example
from pydantic import BaseModel
class User(BaseModel):name: strage: int
parser = PydanticOutputParser(User)result = parser.parse('{"name": "John", "age": 30}')assert result.name == "John"Initialize with a Pydantic model class.
| Parameter | Type | Description |
|---|---|---|
| `model` | type[BaseModel] | Pydantic BaseModel subclass to parse into. |
Parse text into a Pydantic model instance.
| Parameter | Type | Description |
|---|---|---|
| `text` | str | Raw LLM response text that may contain JSON. |
| Type | Description |
|---|---|
| BaseModel | Validated Pydantic model instance. |
| Exception | Description |
|---|---|
| ParseError | When JSON cannot be extracted. |
| SchemaValidationError | When validation fails. |
Return format instructions for the LLM.
| Type | Description |
|---|---|
| str | Format instruction string telling the model to output valid JSON that matches the Pydantic model schema. |
QuotaConfig
Section titled “QuotaConfig”Configuration for the quota tracking backend.
Example
cfg = QuotaConfig(backend="database")RateLimiter
Section titled “RateLimiter”Rate limiter for LLM requests (RPM and TPM).
Manages multiple buckets for different models and providers.
Initialize rate limiter.
Check if request is allowed under current limits.
| Parameter | Type | Description |
|---|---|---|
| `provider` | str | AI provider name |
| `model` | str | Model name |
| `tpm_limit` | int | None | Tokens Per Minute limit |
| `rpm_limit` | int | None | Requests Per Minute limit |
| `estimated_tokens` | int | Estimated tokens in request |
| Type | Description |
|---|---|
| bool | True if allowed, False if blocked |
RedisLLMCache
Section titled “RedisLLMCache”Redis-backed cache for distributed deployments.
Requires redis package to be installed.
| Parameter | Type | Description |
|---|---|---|
| `redis_url` | Redis connection URL. | |
| `ttl` | Time-to-live in seconds. | |
| `key_prefix` | Prefix for all cache keys. |
Example
cache = RedisLLMCache(redis_url="redis://localhost:6379")await cache.connect()result = await cache.get("key")Initialize Redis cache.
| Parameter | Type | Description |
|---|---|---|
| `cache_backend` | CacheBackendProtocol | The platform's cache backend. |
| `ttl` | float | Time-to-live in seconds. |
| `key_prefix` | str | Prefix for cache keys. |
Compatibility method for lifecycle-managed cache.
Compatibility method for lifecycle-managed cache.
Get value from Redis cache.
| Parameter | Type | Description |
|---|---|---|
| `key` | str | dict[str, Any] | Cache key. |
| Type | Description |
|---|---|
| Any | None | Cached value or None. |
Set value in Redis cache.
| Parameter | Type | Description |
|---|---|---|
| `key` | str | dict[str, Any] | Cache key. |
| `value` | Any | Value to cache. |
| `ttl` | float | None | Optional TTL override. |
Get from cache or compute and cache result.
| Parameter | Type | Description |
|---|---|---|
| `key` | str | dict[str, Any] | Cache key. |
| `compute_fn` | Callable[[], Any] | Function to compute value if cache miss. |
| `ttl` | float | None | Optional TTL override. |
| Type | Description |
|---|---|
| Any | Cached or computed value. |
Delete entry from cache.
| Parameter | Type | Description |
|---|---|---|
| `key` | str | dict[str, Any] | Cache key. |
| Type | Description |
|---|---|
| bool | True if deleted. |
Clear all cache entries (Warning: clears entire backend if not namespaced).
Get cache statistics.
| Type | Description |
|---|---|
| CacheStats | CacheStats object. |
ResponseFormatter
Section titled “ResponseFormatter”Format and convert LLM responses to various types.
Example
formatter = ResponseFormatter()completion = Completion(content="42", ...)num = formatter.to_int(completion)print(num)42Convert response to JSON.
| Parameter | Type | Description |
|---|---|---|
| `completion` | Completion | LLM completion |
| Type | Description |
|---|---|
| JSON | Parsed JSON |
Example
data = formatter.to_json(completion)Convert response to string.
| Parameter | Type | Description |
|---|---|---|
| `completion` | Completion | LLM completion |
| `strip` | bool | Whether to strip whitespace |
| Type | Description |
|---|---|
| str | Response string |
Example
text = formatter.to_string(completion)Convert response to integer.
| Parameter | Type | Description |
|---|---|---|
| `completion` | Completion | LLM completion |
| Type | Description |
|---|---|
| int | Parsed integer |
| Exception | Description |
|---|---|
| ParseError | If conversion fails |
Example
num = formatter.to_int(completion)Convert response to float.
| Parameter | Type | Description |
|---|---|---|
| `completion` | Completion | LLM completion |
| Type | Description |
|---|---|
| float | Parsed float |
| Exception | Description |
|---|---|
| ParseError | If conversion fails |
Example
num = formatter.to_float(completion)Convert response to boolean.
| Parameter | Type | Description |
|---|---|---|
| `completion` | Completion | LLM completion |
| Type | Description |
|---|---|
| bool | Parsed boolean |
Example
result = formatter.to_bool(completion)Convert response to list of strings.
| Parameter | Type | Description |
|---|---|---|
| `completion` | Completion | LLM completion |
| `separator` | str | String separator (default: newline) |
| Type | Description |
|---|---|
| list[str] | List of strings |
Example
items = formatter.to_list(completion)Concrete chat message role constants shared across AI packages.
RunnableBranch
Section titled “RunnableBranch”Route input to different runnables based on a predicate.
Like LangChain’s RunnableBranch, this evaluates predicates to select which branch to execute.
Synchronously route to matching branch.
| Parameter | Type | Description |
|---|---|---|
| `input` | Any | Input to evaluate against predicates. |
| Type | Description |
|---|---|
| Any | Output from the first matching branch, or default if no match. |
Asynchronously route to matching branch.
| Parameter | Type | Description |
|---|---|---|
| `input` | Any | Input to evaluate against predicates. |
| Type | Description |
|---|---|
| Any | Output from the first matching branch, or default if no match. |
RunnableLambda
Section titled “RunnableLambda”Wrap a function as a runnable.
Accepts sync or async functions and wraps them to satisfy RunnableProtocol. Failures become Err(RunnableError(…)).
| Parameter | Type | Description |
|---|---|---|
| `func` | A sync or async function to wrap. |
Synchronously invoke the wrapped function.
| Parameter | Type | Description |
|---|---|---|
| `input` | Any | Input to the function. |
| Type | Description |
|---|---|
| Any | Function output or Err on failure. |
Asynchronously invoke the wrapped function.
| Parameter | Type | Description |
|---|---|---|
| `input` | Any | Input to the function. |
| Type | Description |
|---|---|
| Any | Function output or Err on failure. |
RunnableMixin
Section titled “RunnableMixin”Mixin that adds pipe operator to runnables.
Provides the | operator that composes runnables into RunnableSequence.
Analogous to LangChain’s RunnableBinding.
Process input synchronously. Override in subclass.
Process input asynchronously. Override in subclass.
RunnableParallel
Section titled “RunnableParallel”Run multiple runnables concurrently.
Each runnable receives the same input and results are returned as a dict.
Synchronously invoke all runnables.
| Parameter | Type | Description |
|---|---|---|
| `input` | Any | Input to pass to all runnables. |
| Type | Description |
|---|---|
| dict[str, Any] | Dict mapping names to outputs. |
Asynchronously invoke all runnables concurrently.
| Parameter | Type | Description |
|---|---|---|
| `input` | Any | Input to pass to all runnables. |
| Type | Description |
|---|---|
| dict[str, Any] | Dict mapping names to outputs. |
RunnablePassthrough
Section titled “RunnablePassthrough”Pass input through with optional key assignment.
Returns the input unchanged but can assign it to a key in the output dict. Useful for combining with RunnableParallel.
Return input, optionally wrapped in dict with named key.
| Parameter | Type | Description |
|---|---|---|
| `input` | Any | Input to pass through. |
| Type | Description |
|---|---|
| Any | Input as-is, or dict with named key if name is set. |
Return input, optionally wrapped in dict with named key.
| Parameter | Type | Description |
|---|---|---|
| `input` | Any | Input to pass through. |
| Type | Description |
|---|---|
| Any | Input as-is, or dict with named key if name is set. |
RunnableSequence
Section titled “RunnableSequence”Chain multiple runnables in sequence.
The output of each runnable becomes the input to the next. Short-circuits on Err results.
Synchronously invoke the chain.
| Parameter | Type | Description |
|---|---|---|
| `input` | Any | Input to the first runnable. |
| Type | Description |
|---|---|
| Any | Output from the last runnable, or Err if any step fails. |
Asynchronously invoke the chain.
| Parameter | Type | Description |
|---|---|---|
| `input` | Any | Input to the first runnable. |
| Type | Description |
|---|---|
| Any | Output from the last runnable, or Err if any step fails. |
SecureLLMClient
Section titled “SecureLLMClient”LLM client with injection protection and safety features.
Initialize secure LLM client.
| Parameter | Type | Description |
|---|---|---|
| `llm_provider` | Annotated[LLMClientProtocol, Inject] | Underlying LLM provider (injected) |
| `system_prompt` | str | System prompt template |
| `enable_output_filtering` | bool | Enable output filtering |
Send chat message with safety protections.
| Parameter | Type | Description |
|---|---|---|
| `user_input` | str | User message |
| `user_id` | str | User identifier (for rate limiting) |
| `context` | Sequence[dict[str, str]] | None | Previous conversation context |
| `strict_validation` | bool | Reject invalid input vs sanitize |
| Type | Description |
|---|---|
| str | LLM response |
| Exception | Description |
|---|---|
| ValueError | If input invalid (strict mode) |
Update system prompt.
| Parameter | Type | Description |
|---|---|---|
| `system_prompt` | str | New system prompt |
SecurePromptTemplate
Section titled “SecurePromptTemplate”Structured prompt template with injection protection.
Uses clear delimiters to separate system instructions from user input. Implements multi-layered injection detection.
Multi-layered injection detection.
| Parameter | Type | Description |
|---|---|---|
| `prompt` | str | Input to analyze |
| Type | Description |
|---|---|
| tuple[bool, list[str]] | Tuple of (is_malicious, reasons) |
Validate user input for injection attempts.
| Parameter | Type | Description |
|---|---|---|
| `user_input` | str | User input to validate |
| Type | Description |
|---|---|
| tuple[bool, str | None] | Tuple of (is_valid, error_message) |
Sanitize user input by removing dangerous patterns.
| Parameter | Type | Description |
|---|---|---|
| `user_input` | str | User input to sanitize |
| Type | Description |
|---|---|
| str | Sanitized input |
Format prompt with user input.
| Parameter | Type | Description |
|---|---|---|
| `user_input` | str | User input |
| `strict` | bool | If True, reject invalid input. If False, sanitize. |
| Type | Description |
|---|---|
| str | Formatted prompt |
| Exception | Description |
|---|---|
| ValueError | If input invalid and strict=True |
SelectionCriteria
Section titled “SelectionCriteria”Criteria for model selection.
SelectionStrategy
Section titled “SelectionStrategy”Strategy for selecting models based on conditions.
Example
strategy = SelectionStrategy(name="long_context",model="gpt-4-turbo-preview",conditions={"min_tokens": 2000,"max_tokens": 100000})Check if this strategy matches the given context.
| Parameter | Type | Description |
|---|---|---|
| `context` | dict[str, Any] | Context dictionary with prompt info |
| Type | Description |
|---|---|
| bool | True if all conditions are met |
Example
context = {"token_count": 2500, "has_code": True}strategy.matches(context)TrueStaticPricingSource
Section titled “StaticPricingSource”Pricing source from static dictionary.
Hardcoded pricing data as a fallback when other sources are unavailable. Useful for custom internal models or as ultimate fallback.
Attributes: pricing_map: Dictionary of model name to pricing.
Example
source = StaticPricingSource({"my-model": ModelPricing(model="my-model",prompt_per_1m=5.0,completion_per_1m=10.0,provider="custom")})Initialize static pricing source.
| Parameter | Type | Description |
|---|---|---|
| `pricing_map` | dict[str, ModelPricing] | Dictionary mapping model names to pricing. |
Get pricing for a specific model.
| Parameter | Type | Description |
|---|---|---|
| `model` | str | Model identifier. |
| Type | Description |
|---|---|
| ModelPricing | None | ModelPricing if found, None otherwise. |
Get all pricing data.
| Type | Description |
|---|---|
| dict[str, ModelPricing] | All static pricing data. |
Get source name.
StreamChunk
Section titled “StreamChunk”A chunk of streamed completion.
Implements streaming semantics with DomainModel for validation.
Example
chunk = StreamChunk(delta="Hello", model="gpt-4-turbo", finish_reason=None)StructuredOutputParser
Section titled “StructuredOutputParser”Schema-aware parser that validates LLM responses against a model.
Wraps extract_json_block, validate_against_model, and build_json_schema into a convenient class-based API.
| Parameter | Type | Description |
|---|---|---|
| `output_model` | Model class for validation. | |
| `strict` | Whether to enforce strict validation (default ``True``). |
Initialise with model class.
Parse and validate a completion into an output_model instance.
| Parameter | Type | Description |
|---|---|---|
| `completion` | Any | Completion object with ``.content`` attribute, or a string. |
| Type | Description |
|---|---|
| Any | Validated model instance. |
| Exception | Description |
|---|---|
| ParseError | When JSON cannot be extracted. |
| SchemaValidationError | When validation fails. |
Parse and validate an array of output_model instances.
| Parameter | Type | Description |
|---|---|---|
| `completion` | Any | Completion object with ``.content`` attribute. |
| Type | Description |
|---|---|
| list[Any] | List of validated model instances. |
| Exception | Description |
|---|---|
| ParseError | When JSON is not an array. |
| SchemaValidationError | When validation fails. |
Return JSON Schema dict for the output model.
Return a human-readable schema prompt string.
TextPart
Section titled “TextPart”A plain-text content part in a multimodal message.
Attributes:
text: The text content.
type: Discriminator field, always "text".
TiktokenCounter
Section titled “TiktokenCounter”Token counter using tiktoken (OpenAI/compatible models).
Implements TokenCounterProtocol using tiktoken for precise counting. tiktoken is a required dependency for this counter.
| Parameter | Type | Description |
|---|---|---|
| `model` | Model name (e.g. 'gpt-4', 'gpt-3.5-turbo'). | |
| `encoding_name` | Optional tiktoken encoding name override. |
Initialize TiktokenCounter.
| Parameter | Type | Description |
|---|---|---|
| `model` | str | Model name for token counting. |
| `encoding_name` | str | None | Optional tiktoken encoding name override. |
| Exception | Description |
|---|---|
| ImportError | If tiktoken is not installed. |
The model this counter is calibrated for.
Count tokens in a text string.
Count tokens in a list of chat messages, including overhead.
TokenCount
Section titled “TokenCount”Token count result with metadata.
Attributes: total: Total number of tokens. prompt_tokens: Number of tokens in the prompt. completion_tokens: Number of tokens in the completion (if applicable). model: Model name used for counting. timestamp: When the count was performed.
TokenCounterRegistry
Section titled “TokenCounterRegistry”Registry mapping model-name patterns to TokenCounterProtocol backends.
Uses named backend keys and regex patterns for flexible model mapping.
Usage
registry = TokenCounterRegistry.with_defaults()counter = registry.for_model("gpt-4o")tokens = counter.count("Hello!")registry = TokenCounterRegistry.with_defaults()counter = registry.for_model("gpt-4o")tokens = counter.count("Hello!")Create an empty registry.
Create registry with all available tokenizer backends.
Registers:
- char_estimate (always available, fallback)
- tiktoken (if installed, for OpenAI/Anthropic models)
- huggingface (if installed, for HuggingFace models)
- mistral (if installed, for Mistral models)
| Type | Description |
|---|---|
| TokenCounterRegistry | TokenCounterRegistry pre-populated with default backends. |
Register a counter backend under a named key.
| Parameter | Type | Description |
|---|---|---|
| `key` | str | Backend name (e.g., 'tiktoken', 'huggingface', 'char_estimate'). |
| `counter` | TokenCounterProtocol | Counter implementing TokenCounterProtocol. |
Map a regex pattern of model names to a backend key.
| Parameter | Type | Description |
|---|---|---|
| `pattern` | str | Regex pattern matching model names (case-insensitive). |
| `counter_key` | str | Backend key (must be registered). |
Get the best counter for the given model name.
Tries exact regex match in _patterns first, falls back to ‘char_estimate’.
| Parameter | Type | Description |
|---|---|---|
| `model` | str | Model name. |
| Type | Description |
|---|---|
| TokenCounterProtocol | TokenCounterProtocol implementation. |
TokenUsage
Section titled “TokenUsage”Token usage statistics.
ToolCall
Section titled “ToolCall”Tool call request from LLM.
Functions
Section titled “Functions”complete_with_json
Section titled “complete_with_json”Complete and parse response as JSON.
| Parameter | Type | Description |
|---|---|---|
| `client` | LLMClientProtocol | LLM client |
| `prompt` | str | User prompt |
| `system_prompt` | str | None | Optional system prompt **kwargs: Additional completion arguments |
| Type | Description |
|---|---|
| JSON | Parsed JSON |
Example
data = await complete_with_json(client,"Generate a config with 3 fields")complete_with_schema
Section titled “complete_with_schema”Complete with automatic schema parsing and validation.
| Parameter | Type | Description |
|---|---|---|
| `client` | LLMClientProtocol | LLM client |
| `prompt` | str | User prompt |
| `schema` | type[T] | Pydantic model for validation |
| `system_prompt` | str | None | Optional system prompt **kwargs: Additional completion arguments |
| Type | Description |
|---|---|
| T | Validated schema instance |
Example
from lexigram.ai.llm import OpenAIClient
client = OpenAIClient(api_key="sk-...")person = await complete_with_schema(client,"Extract person from: John Doe, age 30",schema=Person)create_assistant_template
Section titled “create_assistant_template”Create template for general assistant.
| Type | Description |
|---|---|
| SecurePromptTemplate | Configured template |
create_balanced_selector
Section titled “create_balanced_selector”Create a balanced model selector.
create_cost_optimized_selector
Section titled “create_cost_optimized_selector”Create a cost-optimized model selector.
create_data_extraction_template
Section titled “create_data_extraction_template”Create template for data extraction (high security).
| Type | Description |
|---|---|
| SecurePromptTemplate | Configured template |
create_json_mode_messages
Section titled “create_json_mode_messages”Create messages for JSON mode with optional schema.
| Parameter | Type | Description |
|---|---|---|
| `prompt` | str | User prompt |
| `schema` | type[DomainModel] | None | Optional Pydantic model for schema |
| `system_prompt` | str | None | Optional system prompt (default: JSON instruction) |
| Type | Description |
|---|---|
| list[dict[str, str]] | Messages list for LLM |
Example
messages = create_json_mode_messages("Extract person info",schema=Person)create_quality_optimized_selector
Section titled “create_quality_optimized_selector”Create a quality-optimized model selector.
create_token_counter
Section titled “create_token_counter”Factory function for creating token counters.
| Parameter | Type | Description |
|---|---|---|
| `model` | str | Model name. |
| `encoding_name` | str | None | Optional encoding name override. |
| Type | Description |
|---|---|
| TiktokenCounter | TiktokenCounter instance. |
Example
from lexigram.ai.llm import create_token_counter
counter = create_token_counter("gpt-4")count = counter.count("Hello!")print(count)normalize_thinking_text
Section titled “normalize_thinking_text”Extract thinking text from raw LLM output.
Tries each pattern in THINKING_PATTERNS order. Returns (clean_content, thinking_text_or_None). clean_content has thinking block removed and is stripped. thinking_text is the raw thinking content (stripped), or None if not found.
Pattern matching is by substring presence of start_marker (and end_marker after it), NOT by model name. The bare-closing-tag pattern (end_marker="", no start) matches only when start_marker is NOT found but end_marker IS found — this covers models that output …thinking…\nresponse.
Falls back: after removing a thinking block, if clean_content is empty but thinking
text was found, tries to extract from the first { or [ in the original text to
recover any JSON that may have been embedded.
| Parameter | Type | Description |
|---|---|---|
| `text` | str | Raw LLM response text, possibly containing inline thinking tags. |
| Type | Description |
|---|---|
| tuple[str, str | None] | A tuple of (clean_content, thinking_text_or_None). - clean_content: The response text with thinking stripped out, stripped of whitespace. - thinking_text_or_None: The thinking/reasoning text, or None if no thinking found. |
Exceptions
Section titled “Exceptions”ExtractionError
Section titled “ExtractionError”Base class for structured extraction errors in lexigram-ai-llm.
ExtractionMaxRetriesError
Section titled “ExtractionMaxRetriesError”Error raised when extraction max retries are exhausted.
ExtractionParseError
Section titled “ExtractionParseError”Error raised when extraction response cannot be parsed as JSON.
ExtractionValidationError
Section titled “ExtractionValidationError”Error raised when parsed extraction response fails schema validation.
InvalidRequestError
Section titled “InvalidRequestError”Error raised when a request to an LLM provider is invalid.
LLMAuthenticationError
Section titled “LLMAuthenticationError”Invalid API key or credentials — infrastructure error, raised not wrapped.
Raised as an exception (NOT wrapped in Result).
Indicates a misconfiguration the application cannot route around.
LLMContentFilterError
Section titled “LLMContentFilterError”Content blocked by provider safety filter — recoverable via reformulation.
Returned as Err from LLMClientProtocol.complete() / stream_chat().
The caller should reformulate the prompt or inform the user.
LLMError
Section titled “LLMError”Base exception for all LLM-domain errors in lexigram-ai-llm.
LLMModelNotFoundError
Section titled “LLMModelNotFoundError”Model unavailable or not found — recoverable via fallback routing.
Returned as Err from LLMClientProtocol.complete() / stream_chat().
The caller should route to a different model or provider.
LLMQuotaExceededError
Section titled “LLMQuotaExceededError”API quota or billing limit exceeded — recoverable by routing elsewhere.
Returned as Err from LLMClientProtocol.complete() / stream_chat().
The caller should route the request to a different provider or account.
LLMRateLimitError
Section titled “LLMRateLimitError”Rate limit exceeded — recoverable via backoff/retry.
Returned as Err from LLMClientProtocol.complete() / stream_chat().
The caller should implement exponential backoff or route to another provider.
ModelNotFoundError
Section titled “ModelNotFoundError”Model unavailable or not found — recoverable via fallback routing.
ParseError
Section titled “ParseError”Raised when response cannot be parsed.
ProviderConnectionError
Section titled “ProviderConnectionError”Error raised when connection to an LLM provider fails.
SchemaValidationError
Section titled “SchemaValidationError”Raised when parsed response fails validation.
StreamError
Section titled “StreamError”Error raised during LLM response streaming.
StructuredOutputError
Section titled “StructuredOutputError”Base exception for structured output errors.
TokenLimitError
Section titled “TokenLimitError”Error raised when the token limit for a request is exceeded.