
Vector (lexigram-vector)

Vector storage infrastructure for the Lexigram Framework with Qdrant, ChromaDB, PGVector, Pinecone, and in-memory backends. Provides embedding clients, vector search, hybrid retrieval, reranking, and Named DI multi-store support.

Terminal window
uv add lexigram lexigram-vector
# With Qdrant support
uv add qdrant-client
# With ChromaDB support
uv add chromadb
# With PGVector support
uv add pgvector # Requires lexigram-sql for database access
# With Pinecone support
uv add pinecone-client
# With embedding support
uv add openai # or anthropic, cohere, etc.
from __future__ import annotations

import asyncio

from lexigram.app import Application
from lexigram.contracts.data.vector.protocols import VectorStoreProtocol
from lexigram.di.module import Module, module
from lexigram.vector import VectorModule
from lexigram.vector.config import QdrantConfig, VectorConfig


@module(
    imports=[
        VectorModule.configure(
            VectorConfig(
                backend="qdrant",
                qdrant=QdrantConfig(
                    url="http://localhost:6333",
                ),
            )
        )
    ]
)
class AppModule(Module):
    pass


async def main() -> None:
    async with Application.boot(modules=[AppModule]) as app:
        store = await app.container.resolve(VectorStoreProtocol)

        # Create a collection
        await store.create_collection(
            name="documents",
            dimension=1536,
        )

        # Upsert vectors
        await store.upsert(
            collection_name="documents",
            ids=["doc1", "doc2"],
            vectors=[[0.1] * 1536, [0.2] * 1536],
            metadata=[
                {"title": "Document 1", "category": "tech"},
                {"title": "Document 2", "category": "science"},
            ],
        )

        # Search
        results = await store.search(
            collection_name="documents",
            query_vector=[0.15] * 1536,
            limit=5,
        )
        for result in results:
            print(f"{result.id}: {result.metadata['title']} (score: {result.score})")


if __name__ == "__main__":
    asyncio.run(main())

lexigram-vector ships with:

  • Multi-backend vector storage — Qdrant, ChromaDB, PGVector, Pinecone, and in-memory backends
  • Embedding client — OpenAI-compatible async client for generating embeddings (OpenAICompatibleEmbeddingClient)
  • Embedding cache — In-memory and persistent caching to reduce embedding API calls (EmbeddingCache, InMemoryEmbeddingCache)
  • Vector search — Similarity search with metadata filtering and distance metrics
  • Hybrid retrieval — BM25 + vector search with reciprocal rank fusion (HybridRetriever, BM25Retriever, RRFReranker)
  • Reranking — Cross-encoder reranking, diversity reranking, and similarity reranking for improved relevance (CrossEncoderReranker, DiversityReranker, RerankerPipeline)
  • Metadata filtering — Structured filtering on metadata fields with backend-specific filter compilers
  • Named DI multi-store — Multiple vector stores registered as Annotated[VectorStoreProtocol, Named("rag")]
  • Collection management — Create, delete, list collections with automatic schema inference
  • Batch operations — Efficient batch upsert, delete, and search with configurable batch sizes
  • Distance metrics — Cosine, Euclidean, and dot product similarity metrics
  • Index types — HNSW, IVFFlat, and backend-specific index configuration

Architecture note: This package provides infrastructure and data-layer functionality for vector storage and retrieval. While it is commonly used by AI and RAG features (lexigram-ai-rag), it is a general-purpose vector database abstraction suitable for any use case requiring semantic search, similarity matching, or high-dimensional data storage.

Zero-config usage: Call VectorModule.configure() with no arguments to start with all built-in defaults — no config file or environment variables needed. See the Config reference below for all default values.

from lexigram.vector import VectorModule
app.add_module(VectorModule.configure()) # all defaults

Option 1 — YAML file (use when config lives in a single explicit file)


Declare config in a YAML file loaded at a fixed, explicit path. LEX_* environment variables override YAML values at startup. Use this for local development, simple self-hosted setups, or when you control exactly which file is loaded. For multi-environment deployments (staging, production) prefer Option 2, which automatically selects the right profile file.

config_section = "vector" is already set on this class — section= can be omitted in all calls. Pass an explicit section= only to override the default (e.g. when this config is nested under a non-standard key).

# application.yaml — copy example.yaml for a fully-annotated starting point
vector:
  backend: "qdrant" # memory, pgvector, pinecone, qdrant, chroma
  default_dimension: 1536 # 1536 = OpenAI text-embedding-3-small
  upsert_batch_size: 100
  qdrant:
    url: "http://localhost:6333"
    api_key: null # LEX_VECTOR__QDRANT__API_KEY

Then load and wire it in your composition root:

from lexigram.vector.config import VectorConfig
from lexigram.vector import VectorModule
config = VectorConfig.from_yaml("application.yaml")
app.add_module(VectorModule.configure(config))
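
If the vector settings live under a different top-level key, pass section= explicitly. A hypothetical example, assuming the YAML nests the config under vector_store: rather than the default vector: key:

# Hypothetical: settings live under `vector_store:` in application.yaml
config = VectorConfig.from_yaml("application.yaml", section="vector_store")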

Environment variables override YAML values and use the LEX_VECTOR__ prefix:

Terminal window
LEX_VECTOR__BACKEND=qdrant
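
Nested fields follow the same pattern, with a double underscore per nesting level; for example, the Qdrant API key from the YAML above:

Terminal window
LEX_VECTOR__QDRANT__API_KEY=your-qdrant-key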

Option 2 — Profiles + Environment Variables (recommended for production, staging, Docker, CI/CD)

Loads a base application.yaml, then overlays an environment-specific file (application.production.yaml, application.staging.yaml, etc.) based on the LEX_PROFILE environment variable. LEX_* env vars are applied last as the final override layer. Use this in production, staging, Docker, Kubernetes, and CI/CD pipelines — set LEX_PROFILE=production and the right profile file loads automatically.

section is optional: specify it (e.g. section="vector") when this package’s config is nested inside a shared application.yaml; omit it when the file is dedicated to this package alone.

Terminal window
# Set LEX_VECTOR__* env vars before starting the process
export LEX_VECTOR__ENABLED=true

from lexigram.vector.config import VectorConfig
from lexigram.vector import VectorModule

config = VectorConfig.from_env_profile()
app.add_module(VectorModule.configure(config))

Loading order: application.yaml (base) → application.{profile}.yaml (overlay, if LEX_PROFILE is set) → LEX_* environment variables (final override). Missing files are silently skipped so this is safe to call in all environments.
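
For example, to select the production overlay:

Terminal window
# from_env_profile() will now load application.yaml, then application.production.yaml
export LEX_PROFILE=production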


Option 3 — Python (use when config is dynamic or computed at boot)


Build config in code at boot time. Use this when settings are derived at runtime — e.g. secrets fetched from a vault, per-tenant configurations, or when you need multiple module instances with different settings.

from lexigram.vector import VectorModule
from lexigram.vector.config import QdrantConfig, VectorConfig

app.add_module(VectorModule.configure(
    VectorConfig(
        backend="qdrant",
        qdrant=QdrantConfig(
            url="http://localhost:6333",
        ),
    )
))
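
A sketch of the runtime-derived case, pulling Qdrant credentials from the process environment at boot (the variable names here are ours; any vault or secret manager resolved at startup works the same way):

import os

from lexigram.vector import VectorModule
from lexigram.vector.config import QdrantConfig, VectorConfig

# Hypothetical env vars standing in for a secrets backend
config = VectorConfig(
    backend="qdrant",
    qdrant=QdrantConfig(
        url=os.environ["QDRANT_URL"],
        api_key=os.environ.get("QDRANT_API_KEY"),
    ),
)
app.add_module(VectorModule.configure(config))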

Config reference

Top-level configuration is loaded from application.yaml’s vector: key or from LEX_VECTOR__* environment variables.

| Field | Default | Env var | Description |
| --- | --- | --- | --- |
| enabled | True | LEX_VECTOR__ENABLED | Enable the vector store subsystem |
| backend | "memory" | LEX_VECTOR__BACKEND | Vector store backend ("memory", "qdrant", "chroma", "pgvector", "pinecone") |
| default_distance_metric | DistanceMetric.COSINE | LEX_VECTOR__DEFAULT_DISTANCE_METRIC | Default distance metric for new collections |
| default_index_type | IndexType.HNSW | LEX_VECTOR__DEFAULT_INDEX_TYPE | Default index type for new collections |
| default_dimension | 1536 | LEX_VECTOR__DEFAULT_DIMENSION | Default vector dimension (matches OpenAI text-embedding-ada-002) |
| upsert_batch_size | 100 | LEX_VECTOR__UPSERT_BATCH_SIZE | Number of vectors per upsert batch |
| max_retries | 3 | LEX_VECTOR__MAX_RETRIES | Maximum number of retries for operations |
| retry_delay | 1.0 | LEX_VECTOR__RETRY_DELAY | Delay between retries in seconds |
| pgvector | PgVectorConfig() | — | PGVector-specific settings |
| pinecone | PineconeConfig() | — | Pinecone-specific settings |
| qdrant | QdrantConfig() | — | Qdrant-specific settings |
| memory | MemoryConfig() | — | In-memory-specific settings |
| backends | [] | — | List of NamedVectorConfig entries for multi-store support |

When backends is non-empty, each entry is registered under Annotated[VectorStoreProtocol, Named(entry.name)]. The first entry (or the one with primary=True) also receives the unnamed VectorStoreProtocol binding for backward compatibility.

QdrantConfig

| Field | Default | Env var | Description |
| --- | --- | --- | --- |
| url | "http://localhost:6333" | LEX_VECTOR__QDRANT__URL | Qdrant server URL |
| api_key | None | LEX_VECTOR__QDRANT__API_KEY | Qdrant API key (optional) |
| grpc_port | 6334 | LEX_VECTOR__QDRANT__GRPC_PORT | gRPC port for Qdrant |
| prefer_grpc | True | LEX_VECTOR__QDRANT__PREFER_GRPC | Whether to prefer gRPC over HTTP |
| timeout | 30.0 | LEX_VECTOR__QDRANT__TIMEOUT | Request timeout in seconds |
PgVectorConfig

| Field | Default | Env var | Description |
| --- | --- | --- | --- |
| database | "primary" | LEX_VECTOR__PGVECTOR__DATABASE | Name of the database backend from db.backends to use |
| schema | "public" | LEX_VECTOR__PGVECTOR__SCHEMA | Database schema for vector tables |
| default_lists | 100 | LEX_VECTOR__PGVECTOR__DEFAULT_LISTS | Default number of lists for the IVFFlat index |
| default_probes | 10 | LEX_VECTOR__PGVECTOR__DEFAULT_PROBES | Default number of probes for the IVFFlat index |
| default_ef_search | 40 | LEX_VECTOR__PGVECTOR__DEFAULT_EF_SEARCH | Default ef_search for the HNSW index |
| table_prefix | "vec_" | LEX_VECTOR__PGVECTOR__TABLE_PREFIX | Prefix for vector storage tables |
| create_extension | True | LEX_VECTOR__PGVECTOR__CREATE_EXTENSION | Whether to create the pgvector extension if missing |

Note: PGVector requires lexigram-sql and a configured DatabaseProviderProtocol. The database field refers to a named database backend from db.backends.
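
For example, pointing the backend at a lexigram-sql database named "rag" (the db.backends entry itself is configured by lexigram-sql and not shown here):

vector:
  backend: "pgvector"
  pgvector:
    database: "rag" # must match a named backend under db.backends
    schema: "vectors"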

PineconeConfig

| Field | Default | Env var | Description |
| --- | --- | --- | --- |
| api_key | "" | LEX_VECTOR__PINECONE__API_KEY | Pinecone API key (required) |
| environment | "" | LEX_VECTOR__PINECONE__ENVIRONMENT | Pinecone environment (e.g., "us-west1-gcp") |
| index_name | "" | LEX_VECTOR__PINECONE__INDEX_NAME | Name of the Pinecone index |
| namespace | "" | LEX_VECTOR__PINECONE__NAMESPACE | Default namespace for the index |
| timeout | 30.0 | LEX_VECTOR__PINECONE__TIMEOUT | Request timeout in seconds |
| pool_threads | 4 | LEX_VECTOR__PINECONE__POOL_THREADS | Number of threads for the connection pool |
MemoryConfig

| Field | Default | Env var | Description |
| --- | --- | --- | --- |
| max_collections | 100 | LEX_VECTOR__MEMORY__MAX_COLLECTIONS | Maximum number of collections in memory |
| max_vectors_per_collection | 100,000 | LEX_VECTOR__MEMORY__MAX_VECTORS_PER_COLLECTION | Maximum number of vectors per collection |

Configuration for a single named vector store backend (used in multi-store setups):

| Field | Description |
| --- | --- |
| name | Unique backend identifier (used as the Named() DI key) |
| primary | Whether this backend also receives the unnamed VectorStoreProtocol binding |
| backend | Vector store driver for this named backend |
| pgvector | PgVectorConfig for this backend |
| pinecone | PineconeConfig for this backend |
| qdrant | QdrantConfig for this backend |
| memory | MemoryConfig for this backend |

Example multi-store setup:

from lexigram.vector import VectorModule
from lexigram.vector.config import (
    NamedVectorConfig,
    PgVectorConfig,
    QdrantConfig,
    VectorConfig,
)

VectorModule.configure(
    VectorConfig(
        backends=[
            NamedVectorConfig(
                name="primary",
                primary=True,
                backend="qdrant",
                qdrant=QdrantConfig(
                    url="http://qdrant-primary:6333",
                ),
            ),
            NamedVectorConfig(
                name="rag",
                backend="pgvector",
                pgvector=PgVectorConfig(
                    database="rag",
                    schema="vectors",
                ),
            ),
        ]
    )
)
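
The same setup in YAML, assuming the YAML shape mirrors the Python config one-to-one:

vector:
  backends:
    - name: "primary"
      primary: true
      backend: "qdrant"
      qdrant:
        url: "http://qdrant-primary:6333"
    - name: "rag"
      backend: "pgvector"
      pgvector:
        database: "rag"
        schema: "vectors"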

Inject named stores:

from typing import Annotated

from lexigram.contracts.data.vector.protocols import VectorStoreProtocol
from lexigram.di.named import Named


class MyService:
    def __init__(
        self,
        store: VectorStoreProtocol,  # primary
        rag: Annotated[VectorStoreProtocol, Named("rag")],
    ) -> None:
        self.store = store
        self.rag = rag
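
Named stores can also be resolved straight from the container; this sketch assumes resolve() accepts the same Annotated alias used for constructor injection:

rag_store = await app.container.resolve(
    Annotated[VectorStoreProtocol, Named("rag")]
)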

VectorModule.configure(config=None, enable_reranking=False)


Create a VectorModule with explicit configuration.

Args:

  • config: VectorConfig instance, dict of config values, or None to use environment variable defaults
  • enable_reranking: Enable cross-encoder reranking of retrieval results (default: False)

Returns: DynamicModule that registers VectorStoreProtocol and VectorCollectionProtocol

Exports: VectorStoreProtocol, VectorCollectionProtocol

Example:

from lexigram.di.module import Module, module
from lexigram.vector import VectorModule
from lexigram.vector.config import QdrantConfig, VectorConfig


@module(
    imports=[
        VectorModule.configure(
            VectorConfig(
                backend="qdrant",
                qdrant=QdrantConfig(url="http://localhost:6333"),
            ),
            enable_reranking=True,
        )
    ]
)
class AppModule(Module):
    pass

VectorModule.stub(config=None)

Create a VectorModule suitable for unit and integration testing. Uses an in-memory backend with no external service dependencies.

Args:

  • config: Optional VectorConfig override (uses safe in-memory defaults when None)

Returns: DynamicModule that registers VectorStoreProtocol and VectorCollectionProtocol

Example:

from lexigram.app import Application
from lexigram.contracts.data.vector.enums import DistanceMetric, IndexType
from lexigram.contracts.data.vector.protocols import VectorStoreProtocol
from lexigram.vector import VectorModule


async def test_vector_search():
    async with Application.boot(
        modules=[VectorModule.stub()]
    ) as app:
        store = await app.container.resolve(VectorStoreProtocol)

        # Test with in-memory backend
        await store.create_collection(
            name="documents",
            dimension=1536,
            distance_metric=DistanceMetric.COSINE,
            index_type=IndexType.HNSW,
        )
        await store.upsert(
            collection_name="documents",
            ids=["doc1", "doc2", "doc3"],
            vectors=[
                [0.1] * 1536,
                [0.2] * 1536,
                [0.3] * 1536,
            ],
            metadata=[
                {"title": "Doc 1", "category": "tech"},
                {"title": "Doc 2", "category": "science"},
                {"title": "Doc 3", "category": "tech"},
            ],
        )
        results = await store.search(
            collection_name="documents",
            query_vector=[0.15] * 1536,
            limit=10,
            filter={"category": "tech"},  # Metadata filter
        )
        for result in results:
            print(f"{result.id}: score={result.score}, metadata={result.metadata}")

        await store.delete(
            collection_name="documents",
            ids=["doc1", "doc2"],
        )

Generate embeddings with an OpenAI-compatible client:

from lexigram.vector.embedding.client import OpenAICompatibleEmbeddingClient
from lexigram.vector.embedding.config import EmbeddingClientConfig

client = OpenAICompatibleEmbeddingClient(
    config=EmbeddingClientConfig(
        api_key="your-api-key",
        model="text-embedding-ada-002",
    )
)

# Single text
embedding = await client.embed("Hello, world!")

# Batch
embeddings = await client.embed_batch(["Text 1", "Text 2", "Text 3"])
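
To index real text, pair the client with the store; a brief sketch reusing the "documents" collection from earlier (the collection dimension must match the embedding model's output):

texts = ["Document 1", "Document 2"]
vectors = await client.embed_batch(texts)

await store.upsert(
    collection_name="documents",
    ids=["doc1", "doc2"],
    vectors=vectors,
    metadata=[{"text": t} for t in texts],
)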

Cache embeddings to reduce API calls:

from lexigram.vector.embedding.cache import InMemoryEmbeddingCache

cache = InMemoryEmbeddingCache(max_size=10000)

# Compute and store the embedding once...
embedding1 = await client.embed("Hello, world!")
await cache.set("Hello, world!", embedding1)

# ...then later reads come from the cache, with no API call
embedding2 = await cache.get("Hello, world!")
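
A minimal cache-aside helper over that get/set API (the embed_cached name is ours, not part of the package):

async def embed_cached(text: str) -> list[float]:
    # Serve from cache when present; otherwise embed and remember the result
    cached = await cache.get(text)
    if cached is not None:
        return cached
    embedding = await client.embed(text)
    await cache.set(text, embedding)
    return embedding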

Combine BM25 and vector search with reciprocal rank fusion:

from lexigram.vector.search.hybrid import HybridRetriever, HybridSearchConfig

retriever = HybridRetriever(
    vector_store=store,
    config=HybridSearchConfig(
        collection_name="documents",
        alpha=0.5,  # Weight between BM25 (0.0) and vector (1.0)
        k=60,  # RRF parameter
    ),
)

results = await retriever.search(
    query="machine learning",
    query_vector=[0.1] * 1536,
    limit=10,
)
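
In practice the query vector comes from the embedding client rather than a literal:

query = "machine learning"
results = await retriever.search(
    query=query,
    query_vector=await client.embed(query),
    limit=10,
)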

Rerank search results for improved relevance:

from lexigram.vector.search.reranking import (
    CrossEncoderReranker,
    RerankingConfig,
)

reranker = CrossEncoderReranker(
    config=RerankingConfig(
        model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
    )
)

# Rerank results
reranked = await reranker.rerank(
    query="machine learning",
    results=search_results,
    limit=5,
)

Compose multiple rerankers in a pipeline:

from lexigram.vector.search.reranking import (
    CrossEncoderReranker,
    DiversityReranker,
    RerankerPipeline,
)

pipeline = RerankerPipeline(
    rerankers=[
        CrossEncoderReranker(config=cross_encoder_config),
        DiversityReranker(lambda_param=0.5),
    ]
)

reranked = await pipeline.rerank(query="machine learning", results=results)

Use adapters to bridge vector stores with other abstractions:

from lexigram.vector.adapters.vector_store import VectorStoreAdapter

adapter = VectorStoreAdapter(store=store)

# Higher-level operations
await adapter.index_documents(
    collection_name="documents",
    documents=[
        {"id": "doc1", "text": "Document 1", "metadata": {...}},
        {"id": "doc2", "text": "Document 2", "metadata": {...}},
    ],
    embedding_fn=client.embed_batch,
)

from lexigram.vector.adapters.document_store import DocumentVectorStoreAdapter

adapter = DocumentVectorStoreAdapter(
    vector_store=store,
    document_store=nosql_store,
)

# Store documents in NoSQL and vectors in the vector store
await adapter.index_with_metadata(
    collection_name="documents",
    documents=[...],
    embedding_fn=client.embed_batch,
)

While lexigram-vector is a general-purpose vector storage layer, it integrates seamlessly with AI and RAG features:

  • lexigram-ai-rag — Uses VectorStoreProtocol for retrieval-augmented generation
  • lexigram-ai — Uses embedding clients and vector stores for semantic search and memory
  • lexigram-cache — Can use SemanticCacheProtocol (backed by vector stores) for semantic caching

The Named DI system allows you to use different vector stores for different purposes (e.g., primary for semantic search, rag for retrieval, cache for semantic caching).

  • src/lexigram/vector/module.py — VectorModule.configure(), .stub()
  • src/lexigram/vector/config.py — VectorConfig, NamedVectorConfig, backend configs
  • src/lexigram/vector/di/provider.py — VectorProvider boot and registration
  • src/lexigram/vector/di/factories.py — Factory functions for creating vector stores
  • src/lexigram/vector/backends/qdrant/ — Qdrant backend implementation
  • src/lexigram/vector/backends/pgvector/ — PGVector backend implementation
  • src/lexigram/vector/backends/pinecone/ — Pinecone backend implementation
  • src/lexigram/vector/backends/chroma.py — ChromaDB backend implementation
  • src/lexigram/vector/backends/memory.py — In-memory backend implementation
  • src/lexigram/vector/embedding/client.py — OpenAICompatibleEmbeddingClient
  • src/lexigram/vector/embedding/cache.py — Embedding cache implementations
  • src/lexigram/vector/search/hybrid.py — Hybrid retrieval and BM25
  • src/lexigram/vector/search/reranking.py — Reranking strategies
  • src/lexigram/vector/adapters/vector_store.py — VectorStoreAdapter
  • src/lexigram/vector/adapters/document_store.py — DocumentVectorStoreAdapter