
AI LLM (lexigram-ai-llm)

LLM client layer for the Lexigram Framework — OpenAI, Anthropic, Ollama, Cohere, Groq, Mistral


LLM client layer for the Lexigram Framework. Provides typed, async-first clients for 18 providers, multi-provider routing, thinking/reasoning control, structured extraction, streaming, embeddings, and model management — all wired through the DI container via LLMModule. Zero-config usage starts with sensible defaults.

```sh
uv add lexigram-ai-llm
# Optional extras
uv add "lexigram-ai-llm[openai,anthropic,ollama]"
```
```python
from lexigram import Application
from lexigram.di.module import Module, module
from lexigram.ai.llm import LLMModule
from lexigram.ai.llm.config import ClientConfig


@module(imports=[
    LLMModule.configure(
        ClientConfig(provider="anthropic", model="claude-sonnet-4-6")
    )
])
class AppModule(Module):
    pass


app = Application(modules=[AppModule])

if __name__ == "__main__":
    app.run()
```

Zero-config usage: Call LLMModule.configure() with no arguments to use defaults.
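For example, the quick-start module above can drop the explicit ClientConfig and rely entirely on the defaults listed in the field table below:

```python
from lexigram.di.module import Module, module
from lexigram.ai.llm import LLMModule


# No arguments: provider, model, and the other fields fall back to their defaults
@module(imports=[LLMModule.configure()])
class AppModule(Module):
    pass
```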

application.yaml

```yaml
ai_llm:
  provider: "anthropic"
  model: "claude-sonnet-4-6"
  api_key: "${LEX_AI_LLM__API_KEY}"
  temperature: 0.7
  max_tokens: null
```
Option 2 — Profiles + Environment Variables (recommended)
```sh
# Environment variables for each field
export LEX_AI_LLM__PROVIDER=anthropic
```
Or configure programmatically:

```python
from lexigram.ai.llm import LLMModule
from lexigram.ai.llm.config import ClientConfig

config = ClientConfig(
    provider="anthropic",
    model="claude-sonnet-4-6",
)
LLMModule.configure(config)
```
| Field | Default | Env var | Description |
| --- | --- | --- | --- |
| `enabled` | `True` | `LEX_AI_LLM__ENABLED` | Enable the LLM subsystem |
| `provider` | `openai` | `LEX_AI_LLM__PROVIDER` | LLM provider |
| `model` | `gpt-4-turbo` | `LEX_AI_LLM__MODEL` | Model name |
| `api_key` | `None` | `LEX_AI_LLM__API_KEY` | Provider API key |
| `api_base` | `None` | `LEX_AI_LLM__API_BASE` | Custom endpoint (Azure, local, proxy) |
| `temperature` | `0.7` | `LEX_AI_LLM__TEMPERATURE` | Sampling temperature (0.0–2.0) |
| `max_tokens` | `None` | `LEX_AI_LLM__MAX_TOKENS` | Response token limit |
| `timeout` | `60.0` | `LEX_AI_LLM__TIMEOUT` | Request timeout in seconds |
| `enable_cache` | `False` | `LEX_AI_LLM__ENABLE_CACHE` | Cache responses |
| `cache_ttl` | `3600` | `LEX_AI_LLM__CACHE_TTL` | Cache TTL in seconds |
| `thinking` | `None` | | Reasoning/thinking control configuration |
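As a sketch, these fields can also be set in code, assuming they map one-to-one onto ClientConfig keyword arguments (src/lexigram/ai/llm/config.py has the authoritative definition):

```python
from lexigram.ai.llm.config import ClientConfig

# Field names taken from the table above; assumes they are ClientConfig
# keyword arguments as defined in src/lexigram/ai/llm/config.py.
config = ClientConfig(
    provider="anthropic",
    model="claude-sonnet-4-6",
    temperature=0.2,
    timeout=30.0,
    enable_cache=True,
    cache_ttl=600,
)
```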
| Method | Description |
| --- | --- |
| `LLMModule.configure(config)` | Single-provider client |
| `LLMModule.configure(routing=LLMConfig())` | Multi-provider routing cascade |
| `LLMModule.stub()` | No-op client for tests |
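A minimal routing sketch, assuming LLMConfig takes a list of ProviderConfig entries (the `providers=` field below is hypothetical; src/lexigram/ai/llm/routing/config.py defines the real schema):

```python
from lexigram.ai.llm import LLMModule
from lexigram.ai.llm.routing.config import LLMConfig, ProviderConfig

# Hypothetical field names; check routing/config.py for the actual ones.
routing = LLMConfig(
    providers=[
        ProviderConfig(provider="anthropic", model="claude-sonnet-4-6"),
        ProviderConfig(provider="openai", model="gpt-4-turbo"),
    ],
)

LLMModule.configure(routing=routing)
```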
  • 18 providers: OpenAI, Anthropic, Google Gemini, Azure, Ollama, Groq, Mistral, Cohere, and more
  • Multi-provider routing: Sequential, cost-optimized, and latency-optimized strategies
  • Thinking/reasoning control: Extended thinking with token budget and suppression (see the sketch after this list)
  • Structured extraction: JSON schema and Pydantic model extraction
  • Streaming: Async streaming response support
  • Embeddings: Text embedding client backed by the same provider
  • Caching: Response-level caching with configurable TTL
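A sketch of thinking control: the parameter names below are hypothetical, shown only to illustrate the shape of the `thinking` field; src/lexigram/ai/llm/thinking/ holds the actual ThinkingConfig:

```python
from lexigram.ai.llm.config import ClientConfig
from lexigram.ai.llm.thinking import ThinkingConfig  # assumed import path

# budget_tokens and suppress are hypothetical names; see
# src/lexigram/ai/llm/thinking/ for the real ThinkingConfig fields.
config = ClientConfig(
    provider="anthropic",
    model="claude-sonnet-4-6",
    thinking=ThinkingConfig(budget_tokens=2048, suppress=False),
)
```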
In tests, LLMModule.stub() provides a no-op client:

```python
from lexigram import Application
from lexigram.ai.llm import LLMModule

async def test_with_stub_client():
    async with Application.boot(modules=[LLMModule.stub()]) as app:
        # your test code
        ...
```
| File | What it contains |
| --- | --- |
| `src/lexigram/ai/llm/module.py` | `LLMModule.configure()` and `LLMModule.stub()` |
| `src/lexigram/ai/llm/config.py` | `ClientConfig` |
| `src/lexigram/ai/llm/routing/config.py` | `LLMConfig`, `ProviderConfig` for routing |
| `src/lexigram/ai/llm/di/provider.py` | `LLMProvider` — registers and boots the client |
| `src/lexigram/ai/llm/clients/` | Provider implementations |
| `src/lexigram/ai/llm/thinking/` | `ThinkingConfig` handling and suppression |
| `src/lexigram/ai/llm/exceptions.py` | Full exception hierarchy |