# Vector (lexigram-vector)
Vector storage infrastructure for the Lexigram Framework with Qdrant, ChromaDB, PGVector, Pinecone, and in-memory backends. Provides embedding clients, vector search, hybrid retrieval, reranking, and Named DI multi-store support.
## Install

```sh
uv add lexigram lexigram-vector

# With Qdrant support
uv add qdrant-client

# With ChromaDB support
uv add chromadb

# With PGVector support
uv add pgvector  # Requires lexigram-sql for database access

# With Pinecone support
uv add pinecone-client

# With embedding support
uv add openai  # or anthropic, cohere, etc.
```

## Quick Start

```python
from __future__ import annotations

import asyncio

from lexigram.app import Application
from lexigram.contracts.data.vector.protocols import VectorStoreProtocol
from lexigram.di.module import Module, module
from lexigram.vector import VectorModule
from lexigram.vector.config import QdrantConfig, VectorConfig


@module(
    imports=[
        VectorModule.configure(
            VectorConfig(
                backend="qdrant",
                qdrant=QdrantConfig(
                    url="http://localhost:6333",
                ),
            )
        )
    ]
)
class AppModule(Module):
    pass


async def main() -> None:
    async with Application.boot(modules=[AppModule]) as app:
        store = await app.container.resolve(VectorStoreProtocol)

        # Create a collection
        await store.create_collection(
            name="documents",
            dimension=1536,
        )

        # Upsert vectors
        await store.upsert(
            collection_name="documents",
            ids=["doc1", "doc2"],
            vectors=[[0.1] * 1536, [0.2] * 1536],
            metadata=[
                {"title": "Document 1", "category": "tech"},
                {"title": "Document 2", "category": "science"},
            ],
        )

        # Search
        results = await store.search(
            collection_name="documents",
            query_vector=[0.15] * 1536,
            limit=5,
        )

        for result in results:
            print(f"{result.id}: {result.metadata['title']} (score: {result.score})")


if __name__ == "__main__":
    asyncio.run(main())
```
## What It Provides

lexigram-vector ships with:

- **Multi-backend vector storage** — Qdrant, ChromaDB, PGVector, Pinecone, and in-memory backends
- **Embedding client** — OpenAI-compatible async client for generating embeddings (`OpenAICompatibleEmbeddingClient`)
- **Embedding cache** — In-memory and persistent caching to reduce embedding API calls (`EmbeddingCache`, `InMemoryEmbeddingCache`)
- **Vector search** — Similarity search with metadata filtering and distance metrics
- **Hybrid retrieval** — BM25 + vector search with reciprocal rank fusion (`HybridRetriever`, `BM25Retriever`, `RRFReranker`)
- **Reranking** — Cross-encoder, diversity, and similarity reranking for improved relevance (`CrossEncoderReranker`, `DiversityReranker`, `RerankerPipeline`)
- **Metadata filtering** — Structured filtering on metadata fields with backend-specific filter compilers
- **Named DI multi-store** — Multiple vector stores registered as `Annotated[VectorStoreProtocol, Named("rag")]`
- **Collection management** — Create, delete, and list collections with automatic schema inference
- **Batch operations** — Efficient batch upsert, delete, and search with configurable batch sizes
- **Distance metrics** — Cosine, Euclidean, and dot product similarity metrics
- **Index types** — HNSW, IVFFlat, and backend-specific index configuration
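The distance metrics above rank neighbors differently: cosine compares direction only, dot product also rewards magnitude, and Euclidean measures absolute offset. A quick, dependency-free sketch of the raw formulas (illustrative only, not the library's implementation):

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean_distance(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction, larger magnitude

print(cosine_similarity(a, b))   # 1.0 — identical direction, magnitude ignored
print(dot(a, b))                 # 2.0 — magnitude contributes
print(euclidean_distance(a, b))  # 1.0 — absolute offset between the points
```

Which metric to pick usually depends on the embedding model: most text-embedding providers recommend cosine, which is why it is the default here.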
> **Architecture note:** This package provides infrastructure and data-layer
> functionality for vector storage and retrieval. While it is commonly used by AI
> and RAG features (`lexigram-ai-rag`), it is a general-purpose vector database
> abstraction suitable for any use case requiring semantic search, similarity
> matching, or high-dimensional data storage.
## Configuration

**Zero-config usage:** Call `VectorModule.configure()` with no arguments to start with all built-in defaults — no config file or environment variables needed. See the Config reference below for all default values.

```python
from lexigram.vector import VectorModule

app.add_module(VectorModule.configure())  # all defaults
```
### Option 1 — YAML file (use when config lives in a single explicit file)

Declare config in a YAML file loaded at a fixed, explicit path. `LEX_*`
environment variables override YAML values at startup. Use this for local
development, simple self-hosted setups, or when you control exactly which
file is loaded. For multi-environment deployments (staging, production),
prefer Option 2, which automatically selects the right profile file.

`config_section = "vector"` is already set on this class — `section=` can be
omitted in all calls. Pass an explicit `section=` only to override the
default (e.g. when this config is nested under a non-standard key).

```yaml
# application.yaml — copy example.yaml for a fully-annotated starting point
vector:
  backend: "qdrant"        # memory, pgvector, pinecone, qdrant, chroma
  default_dimension: 1536  # 1536 = OpenAI text-embedding-3-small
  upsert_batch_size: 100
  qdrant:
    url: "http://localhost:6333"
    api_key: null          # LEX_VECTOR__QDRANT__API_KEY
```

Then load and wire it in your composition root:

```python
from lexigram.vector import VectorModule
from lexigram.vector.config import VectorConfig

config = VectorConfig.from_yaml("application.yaml")
app.add_module(VectorModule.configure(config))
```

Environment variables override YAML values and use the `LEX_VECTOR__` prefix:

```sh
LEX_VECTOR__BACKEND=qdrant
```
### Option 2 — Profiles + Environment Variables (recommended for production, staging, Docker, CI/CD)

Loads a base `application.yaml`, then overlays an environment-specific
file (`application.production.yaml`, `application.staging.yaml`, etc.)
based on the `LEX_PROFILE` environment variable. `LEX_*` env vars are
applied last as the final override layer. Use this in production,
staging, Docker, Kubernetes, and CI/CD pipelines — set
`LEX_PROFILE=production` and the right profile file loads automatically.

`section` is optional: specify it (e.g. `section="vector"`) when this
package's config is nested inside a shared `application.yaml`; omit it
when the file is dedicated to this package alone.

```sh
# Set LEX_VECTOR__* env vars before starting the process
export LEX_VECTOR__ENABLED=true
```

```python
from lexigram.vector import VectorModule
from lexigram.vector.config import VectorConfig

config = VectorConfig.from_env_profile()
app.add_module(VectorModule.configure(config))
```

Loading order: `application.yaml` (base) → `application.{profile}.yaml` (overlay, if `LEX_PROFILE` is set) → `LEX_*` environment variables (final override). Missing files are silently skipped, so this is safe to call in all environments.
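Conceptually, that loading order is three dict layers merged with "later wins" semantics. A toy sketch of the precedence (the `layer` helper below is illustrative, not the actual loader):

```python
# Toy illustration of base → profile → env-var layering. Not the real loader.
def layer(base: dict, *overlays: dict) -> dict:
    """Merge overlays onto base; later layers win, nested dicts merge key-by-key."""
    merged = dict(base)
    for overlay in overlays:
        for key, value in overlay.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = layer(merged[key], value)
            else:
                merged[key] = value
    return merged


base_yaml = {"backend": "memory", "qdrant": {"url": "http://localhost:6333"}}
profile_yaml = {"backend": "qdrant"}                  # application.production.yaml
env_vars = {"qdrant": {"url": "http://qdrant:6333"}}  # LEX_VECTOR__QDRANT__URL

config = layer(base_yaml, profile_yaml, env_vars)
print(config)  # {'backend': 'qdrant', 'qdrant': {'url': 'http://qdrant:6333'}}
```

Each later layer only overrides the keys it sets, so the base file can stay complete and documented while profiles and env vars carry only the deltas.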
### Option 3 — Python (use when config is dynamic or computed at boot)

Build config in code at boot time. Use this when settings are derived at runtime — e.g. secrets fetched from a vault, per-tenant configurations, or when you need multiple module instances with different settings.

```python
from lexigram.vector import VectorModule
from lexigram.vector.config import QdrantConfig, VectorConfig

app.add_module(VectorModule.configure(
    VectorConfig(
        backend="qdrant",
        qdrant=QdrantConfig(
            url="http://localhost:6333",
        ),
    )
))
```
## Config reference

### VectorConfig

Top-level configuration loaded from `application.yaml`'s `vector:` key or from
`LEX_VECTOR__*` environment variables.
| Field | Default | Env var | Description |
|---|---|---|---|
| `enabled` | `True` | `LEX_VECTOR__ENABLED` | Enable the vector store subsystem |
| `backend` | `"memory"` | `LEX_VECTOR__BACKEND` | Vector store backend (`"memory"`, `"qdrant"`, `"chroma"`, `"pgvector"`, `"pinecone"`) |
| `default_distance_metric` | `DistanceMetric.COSINE` | `LEX_VECTOR__DEFAULT_DISTANCE_METRIC` | Default distance metric for new collections |
| `default_index_type` | `IndexType.HNSW` | `LEX_VECTOR__DEFAULT_INDEX_TYPE` | Default index type for new collections |
| `default_dimension` | `1536` | `LEX_VECTOR__DEFAULT_DIMENSION` | Default vector dimension (matches OpenAI text-embedding-ada-002) |
| `upsert_batch_size` | `100` | `LEX_VECTOR__UPSERT_BATCH_SIZE` | Number of vectors per upsert batch |
| `max_retries` | `3` | `LEX_VECTOR__MAX_RETRIES` | Maximum number of retries for operations |
| `retry_delay` | `1.0` | `LEX_VECTOR__RETRY_DELAY` | Delay between retries in seconds |
| `pgvector` | `PgVectorConfig()` | — | PGVector-specific settings |
| `pinecone` | `PineconeConfig()` | — | Pinecone-specific settings |
| `qdrant` | `QdrantConfig()` | — | Qdrant-specific settings |
| `memory` | `MemoryConfig()` | — | In-memory-specific settings |
| `backends` | `[]` | — | List of `NamedVectorConfig` entries for multi-store support |
When `backends` is non-empty, each entry is registered under
`Annotated[VectorStoreProtocol, Named(entry.name)]`. The first entry (or the
one with `primary=True`) also receives the unnamed `VectorStoreProtocol`
binding for backward compatibility.
## Backend-Specific Configuration

### QdrantConfig

| Field | Default | Env var | Description |
|---|---|---|---|
| `url` | `"http://localhost:6333"` | `LEX_VECTOR__QDRANT__URL` | Qdrant server URL |
| `api_key` | `None` | `LEX_VECTOR__QDRANT__API_KEY` | Qdrant API key (optional) |
| `grpc_port` | `6334` | `LEX_VECTOR__QDRANT__GRPC_PORT` | gRPC port for Qdrant |
| `prefer_grpc` | `True` | `LEX_VECTOR__QDRANT__PREFER_GRPC` | Whether to prefer gRPC over HTTP |
| `timeout` | `30.0` | `LEX_VECTOR__QDRANT__TIMEOUT` | Request timeout in seconds |
### PgVectorConfig

| Field | Default | Env var | Description |
|---|---|---|---|
| `database` | `"primary"` | `LEX_VECTOR__PGVECTOR__DATABASE` | Name of the database backend from `db.backends` to use |
| `schema` | `"public"` | `LEX_VECTOR__PGVECTOR__SCHEMA` | Database schema for vector tables |
| `default_lists` | `100` | `LEX_VECTOR__PGVECTOR__DEFAULT_LISTS` | Default number of lists for the IVFFlat index |
| `default_probes` | `10` | `LEX_VECTOR__PGVECTOR__DEFAULT_PROBES` | Default number of probes for the IVFFlat index |
| `default_ef_search` | `40` | `LEX_VECTOR__PGVECTOR__DEFAULT_EF_SEARCH` | Default `ef_search` for the HNSW index |
| `table_prefix` | `"vec_"` | `LEX_VECTOR__PGVECTOR__TABLE_PREFIX` | Prefix for vector storage tables |
| `create_extension` | `True` | `LEX_VECTOR__PGVECTOR__CREATE_EXTENSION` | Whether to create the pgvector extension if missing |
> **Note:** PGVector requires `lexigram-sql` and a configured `DatabaseProviderProtocol`.
> The `database` field refers to a named database backend from `db.backends`.
### PineconeConfig

| Field | Default | Env var | Description |
|---|---|---|---|
| `api_key` | `""` | `LEX_VECTOR__PINECONE__API_KEY` | Pinecone API key (required) |
| `environment` | `""` | `LEX_VECTOR__PINECONE__ENVIRONMENT` | Pinecone environment (e.g. `"us-west1-gcp"`) |
| `index_name` | `""` | `LEX_VECTOR__PINECONE__INDEX_NAME` | Name of the Pinecone index |
| `namespace` | `""` | `LEX_VECTOR__PINECONE__NAMESPACE` | Default namespace for the index |
| `timeout` | `30.0` | `LEX_VECTOR__PINECONE__TIMEOUT` | Request timeout in seconds |
| `pool_threads` | `4` | `LEX_VECTOR__PINECONE__POOL_THREADS` | Number of threads for the connection pool |
### MemoryConfig

| Field | Default | Env var | Description |
|---|---|---|---|
| `max_collections` | `100` | `LEX_VECTOR__MEMORY__MAX_COLLECTIONS` | Maximum number of collections in memory |
| `max_vectors_per_collection` | `100000` | `LEX_VECTOR__MEMORY__MAX_VECTORS_PER_COLLECTION` | Maximum number of vectors per collection |
### NamedVectorConfig

Configuration for a single named vector store backend (used in multi-store setups):

| Field | Description |
|---|---|
| `name` | Unique backend identifier (used as the `Named()` DI key) |
| `primary` | Whether this backend also receives the unnamed `VectorStoreProtocol` binding |
| `backend` | Vector store driver for this named backend |
| `pgvector` | `PgVectorConfig` for this backend |
| `pinecone` | `PineconeConfig` for this backend |
| `qdrant` | `QdrantConfig` for this backend |
| `memory` | `MemoryConfig` for this backend |
Example multi-store setup:

```python
from lexigram.vector.config import (
    NamedVectorConfig,
    PgVectorConfig,
    QdrantConfig,
    VectorConfig,
)

VectorModule.configure(
    VectorConfig(
        backends=[
            NamedVectorConfig(
                name="primary",
                primary=True,
                backend="qdrant",
                qdrant=QdrantConfig(
                    url="http://qdrant-primary:6333",
                ),
            ),
            NamedVectorConfig(
                name="rag",
                backend="pgvector",
                pgvector=PgVectorConfig(
                    database="rag",
                    schema="vectors",
                ),
            ),
        ]
    )
)
```

Inject named stores:

```python
from typing import Annotated

from lexigram.contracts.data.vector.protocols import VectorStoreProtocol
from lexigram.di.named import Named


class MyService:
    def __init__(
        self,
        store: VectorStoreProtocol,  # primary
        rag: Annotated[VectorStoreProtocol, Named("rag")],
    ) -> None:
        self.store = store
        self.rag = rag
```
## Module Factory Methods

### VectorModule.configure(config=None, enable_reranking=False)

Create a `VectorModule` with explicit configuration.

Args:

- `config`: `VectorConfig` instance, `dict` of config values, or `None` to use environment variable defaults
- `enable_reranking`: Enable cross-encoder reranking of retrieval results (default: `False`)

Returns: `DynamicModule` that registers `VectorStoreProtocol` and `VectorCollectionProtocol`

Exports: `VectorStoreProtocol`, `VectorCollectionProtocol`

Example:

```python
from lexigram.di.module import Module, module
from lexigram.vector import VectorModule
from lexigram.vector.config import QdrantConfig, VectorConfig


@module(
    imports=[
        VectorModule.configure(
            VectorConfig(
                backend="qdrant",
                qdrant=QdrantConfig(url="http://localhost:6333"),
            ),
            enable_reranking=True,
        )
    ]
)
class AppModule(Module):
    pass
```
### VectorModule.stub(config=None)

Create a `VectorModule` suitable for unit and integration testing. Uses an
in-memory backend with no external service dependencies.

Args:

- `config`: Optional `VectorConfig` override (uses safe in-memory defaults when `None`)

Returns: `DynamicModule` that registers `VectorStoreProtocol` and `VectorCollectionProtocol`

Example:

```python
from lexigram.app import Application
from lexigram.contracts.data.vector.protocols import VectorStoreProtocol
from lexigram.vector import VectorModule


async def test_vector_search():
    async with Application.boot(
        modules=[VectorModule.stub()]
    ) as app:
        store = await app.container.resolve(VectorStoreProtocol)
        # Test with in-memory backend
```
## Vector Store Operations

### Create Collection

```python
from lexigram.contracts.data.vector.enums import DistanceMetric, IndexType

await store.create_collection(
    name="documents",
    dimension=1536,
    distance_metric=DistanceMetric.COSINE,
    index_type=IndexType.HNSW,
)
```

### Upsert Vectors

```python
await store.upsert(
    collection_name="documents",
    ids=["doc1", "doc2", "doc3"],
    vectors=[
        [0.1] * 1536,
        [0.2] * 1536,
        [0.3] * 1536,
    ],
    metadata=[
        {"title": "Doc 1", "category": "tech"},
        {"title": "Doc 2", "category": "science"},
        {"title": "Doc 3", "category": "tech"},
    ],
)
```

### Search with Filters

```python
results = await store.search(
    collection_name="documents",
    query_vector=[0.15] * 1536,
    limit=10,
    filter={"category": "tech"},  # Metadata filter
)

for result in results:
    print(f"{result.id}: score={result.score}, metadata={result.metadata}")
```

### Delete Vectors

```python
await store.delete(
    collection_name="documents",
    ids=["doc1", "doc2"],
)
```
## Embedding Client

Generate embeddings with an OpenAI-compatible client:

```python
from lexigram.vector.embedding.client import OpenAICompatibleEmbeddingClient
from lexigram.vector.embedding.config import EmbeddingClientConfig

client = OpenAICompatibleEmbeddingClient(
    config=EmbeddingClientConfig(
        api_key="your-api-key",
        model="text-embedding-ada-002",
    )
)

# Single text
embedding = await client.embed("Hello, world!")

# Batch
embeddings = await client.embed_batch(["Text 1", "Text 2", "Text 3"])
```
## Embedding Cache

Cache embeddings to reduce API calls:

```python
from lexigram.vector.embedding.cache import InMemoryEmbeddingCache

cache = InMemoryEmbeddingCache(max_size=10000)

# Embed once, cache the result, then serve later lookups from the cache
embedding1 = await client.embed("Hello, world!")
await cache.set("Hello, world!", embedding1)

embedding2 = await cache.get("Hello, world!")  # From cache
```
## Hybrid Retrieval

Combine BM25 and vector search with reciprocal rank fusion:

```python
from lexigram.vector.search.hybrid import HybridRetriever, HybridSearchConfig

retriever = HybridRetriever(
    vector_store=store,
    config=HybridSearchConfig(
        collection_name="documents",
        alpha=0.5,  # Weight between BM25 (0.0) and vector (1.0)
        k=60,       # RRF parameter
    ),
)

results = await retriever.search(
    query="machine learning",
    query_vector=[0.1] * 1536,
    limit=10,
)
```
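Reciprocal rank fusion itself is simple: each ranked list contributes 1 / (k + rank) per document, and the summed scores decide the fused order. A standalone sketch of the formula (illustrative; the package's `RRFReranker` handles this internally):

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists: score(id) = sum over lists of 1 / (k + rank)."""
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.__getitem__, reverse=True)


bm25_hits = ["doc3", "doc1", "doc2"]    # keyword ranking
vector_hits = ["doc1", "doc4", "doc3"]  # semantic ranking

print(rrf_fuse([bm25_hits, vector_hits]))  # → ['doc1', 'doc3', 'doc4', 'doc2']
```

Note that doc1 wins despite not topping either list: appearing near the top of both rankings beats topping only one, which is exactly the behavior hybrid retrieval is after.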
## Reranking

Rerank search results for improved relevance:

```python
from lexigram.vector.search.reranking import (
    CrossEncoderReranker,
    RerankerPipeline,
    RerankingConfig,
)

reranker = CrossEncoderReranker(
    config=RerankingConfig(
        model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
    )
)

# Rerank results
reranked = await reranker.rerank(
    query="machine learning",
    results=search_results,
    limit=5,
)
```

Compose multiple rerankers in a pipeline:

```python
from lexigram.vector.search.reranking import (
    CrossEncoderReranker,
    DiversityReranker,
    RerankerPipeline,
)

pipeline = RerankerPipeline(
    rerankers=[
        CrossEncoderReranker(config=cross_encoder_config),
        DiversityReranker(lambda_param=0.5),
    ]
)

reranked = await pipeline.rerank(query="machine learning", results=results)
```
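Diversity reranking trades relevance against redundancy: greedily pick the candidate maximizing λ·relevance − (1 − λ)·similarity-to-already-picked, as in maximal marginal relevance. A standalone sketch of that selection loop (illustrative; not necessarily how `DiversityReranker` is implemented):

```python
def mmr_select(
    candidates: list[str],
    relevance: dict[str, float],
    similarity: dict[tuple[str, str], float],
    lambda_param: float = 0.5,
    limit: int = 3,
) -> list[str]:
    """Greedy maximal-marginal-relevance selection over precomputed scores."""
    selected: list[str] = []
    remaining = list(candidates)
    while remaining and len(selected) < limit:
        def mmr_score(doc: str) -> float:
            # Redundancy = highest similarity to anything already selected
            redundancy = max(
                (similarity.get((doc, s), similarity.get((s, doc), 0.0)) for s in selected),
                default=0.0,
            )
            return lambda_param * relevance[doc] - (1 - lambda_param) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


ranked = mmr_select(
    candidates=["a", "b", "c"],
    relevance={"a": 0.9, "b": 0.8, "c": 0.5},
    similarity={("a", "b"): 0.95, ("a", "c"): 0.1, ("b", "c"): 0.1},
    limit=2,
)
print(ranked)  # → ['a', 'c']: "b" is nearly a duplicate of "a", so "c" takes the second slot
```

`lambda_param=1.0` reduces to pure relevance ordering; lower values push the tail of the result list toward novel content.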
## Adapters

Use adapters to bridge vector stores with other abstractions:

### VectorStoreAdapter

```python
from lexigram.vector.adapters.vector_store import VectorStoreAdapter

adapter = VectorStoreAdapter(store=store)

# Higher-level operations
await adapter.index_documents(
    collection_name="documents",
    documents=[
        {"id": "doc1", "text": "Document 1", "metadata": {...}},
        {"id": "doc2", "text": "Document 2", "metadata": {...}},
    ],
    embedding_fn=client.embed_batch,
)
```

### DocumentVectorStoreAdapter

```python
from lexigram.vector.adapters.document_store import DocumentVectorStoreAdapter

adapter = DocumentVectorStoreAdapter(
    vector_store=store,
    document_store=nosql_store,
)

# Store documents in NoSQL and vectors in the vector store
await adapter.index_with_metadata(
    collection_name="documents",
    documents=[...],
    embedding_fn=client.embed_batch,
)
```
## Integration with AI and RAG

While lexigram-vector is a general-purpose vector storage layer, it integrates
seamlessly with AI and RAG features:

- `lexigram-ai-rag` — Uses `VectorStoreProtocol` for retrieval-augmented generation
- `lexigram-ai` — Uses embedding clients and vector stores for semantic search and memory
- `lexigram-cache` — Can use `SemanticCacheProtocol` (backed by vector stores) for semantic caching

The Named DI system allows you to use different vector stores for different
purposes (e.g. `primary` for semantic search, `rag` for retrieval, `cache` for
semantic caching).
## Key Source Files

- `src/lexigram/vector/module.py` — `VectorModule.configure()`, `.stub()`
- `src/lexigram/vector/config.py` — `VectorConfig`, `NamedVectorConfig`, backend configs
- `src/lexigram/vector/di/provider.py` — `VectorProvider` boot and registration
- `src/lexigram/vector/di/factories.py` — Factory functions for creating vector stores
- `src/lexigram/vector/backends/qdrant/` — Qdrant backend implementation
- `src/lexigram/vector/backends/pgvector/` — PGVector backend implementation
- `src/lexigram/vector/backends/pinecone/` — Pinecone backend implementation
- `src/lexigram/vector/backends/chroma.py` — ChromaDB backend implementation
- `src/lexigram/vector/backends/memory.py` — In-memory backend implementation
- `src/lexigram/vector/embedding/client.py` — `OpenAICompatibleEmbeddingClient`
- `src/lexigram/vector/embedding/cache.py` — Embedding cache implementations
- `src/lexigram/vector/search/hybrid.py` — Hybrid retrieval and BM25
- `src/lexigram/vector/search/reranking.py` — Reranking strategies
- `src/lexigram/vector/adapters/vector_store.py` — `VectorStoreAdapter`
- `src/lexigram/vector/adapters/document_store.py` — `DocumentVectorStoreAdapter`