rag_system.openrouter_embeddings module
Embeddings client for RAG system.
Uses the native Google Gemini API only, via the shared key pool in gemini_embed_pool. Provides both async (OpenRouterEmbeddings) and synchronous (SyncOpenRouterEmbeddings, ChromaDB-compatible) interfaces.
- class rag_system.openrouter_embeddings.OpenRouterEmbeddings(api_key=None, model='google/gemini-embedding-001', dimensions=None, timeout=30.0, gemini_api_key=None, gemini_only=True)
Bases: object
Async embeddings client that calls the Gemini API via the shared key pool.
Despite its historical name, this client targets Google's Gemini embeddings endpoint using keys drawn from the shared pool rather than OpenRouter. It batches inputs (bounded by MAX_BATCH_SIZE and MAX_BATCH_CHARS), retries with backoff, and exposes embed_text() and embed_texts(), which return dense numpy.ndarray vectors of width dimensions (default 3072). It is instantiated wherever embeddings are needed across the codebase: the vector tool classifier (classifiers.vector_classifier), the tool, skill, and dangerous-command embedding refreshers under classifiers/, and tools.search_tools. The file-RAG manager uses the synchronous sibling SyncOpenRouterEmbeddings instead.
- DEFAULT_MODEL = 'google/gemini-embedding-001'
- MAX_BATCH_SIZE = 50
- MAX_BATCH_CHARS = 50000
- __init__(api_key=None, model='google/gemini-embedding-001', dimensions=None, timeout=30.0, gemini_api_key=None, gemini_only=True)
Initialize the instance.
- async embed_text(text)
Embed a single string into one dense vector.
Thin convenience wrapper: it places text in a one-element list, delegates to embed_texts() (which handles batching, retries, and the Gemini-then-OpenRouter-then-paid fallback chain), and returns the single resulting vector. It performs no network I/O beyond what embed_texts() does. Called by the vector classifier (classifiers/vector_classifier.py) and the search tool (tools/search_tools.py) to embed an incoming query before a similarity lookup.
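The wrap, delegate, and unwrap pattern described above can be sketched as follows. TinyEmbedder is a hypothetical stand-in, not the real OpenRouterEmbeddings: its embed_texts() returns toy two-element lists instead of calling the Gemini API, and plain lists stand in for numpy arrays.

```python
import asyncio
from typing import List, Sequence


class TinyEmbedder:
    """Hypothetical stand-in illustrating the embed_text() delegation pattern."""

    async def embed_texts(self, texts: Sequence[str]) -> List[List[float]]:
        # Toy "embedding": (length, vowel count) per text; no network I/O.
        return [[float(len(t)), float(sum(c in "aeiou" for c in t))] for t in texts]

    async def embed_text(self, text: str) -> List[float]:
        # Wrap in a one-element list, delegate, then unwrap the single vector.
        vectors = await self.embed_texts([text])
        return vectors[0]


vec = asyncio.run(TinyEmbedder().embed_text("hello"))
print(vec)  # [5.0, 2.0]
```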
- async embed_texts(texts)
Embed one or more texts into dense vectors, batching as needed.
Top-level async entry point for embedding. It coerces texts to a list via _normalize_embed_texts_input() (so a bare string is treated as a single document rather than iterated character by character), splits the input into batches bounded by item count and total characters with _create_batches(), and embeds each batch via _embed_batch(), which drives the Gemini API through the shared key pool and falls back to OpenRouter and the paid Gemini key on sustained rate limits. Called by embed_text() and by the classifier embedding refresh helpers (classifiers/tool_embedding_batch.py, classifiers/update_skill_embeddings.py) when rebuilding routing vectors.
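The normalization and batching steps above might look roughly like this. The function names mirror _normalize_embed_texts_input() and _create_batches(), but the bodies are a hypothetical greedy sketch, assuming a batch is closed as soon as adding the next text would exceed either MAX_BATCH_SIZE items or MAX_BATCH_CHARS characters; the real implementation may split differently.

```python
from typing import Iterable, List, Union

MAX_BATCH_SIZE = 50
MAX_BATCH_CHARS = 50_000


def normalize_embed_texts_input(texts: Union[str, Iterable[str]]) -> List[str]:
    # A bare string is one document, not an iterable of characters.
    if isinstance(texts, str):
        return [texts]
    return list(texts)


def create_batches(
    texts: List[str],
    max_size: int = MAX_BATCH_SIZE,
    max_chars: int = MAX_BATCH_CHARS,
) -> List[List[str]]:
    batches: List[List[str]] = []
    current: List[str] = []
    current_chars = 0
    for text in texts:
        # Close the current batch if this text would break either bound.
        if current and (len(current) >= max_size or current_chars + len(text) > max_chars):
            batches.append(current)
            current, current_chars = [], 0
        current.append(text)
        current_chars += len(text)
    if current:
        batches.append(current)
    return batches
```

In this sketch a single text longer than max_chars still gets its own batch rather than being truncated or split.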
- async embed_text_for_search(text, task_type='QUESTION_ANSWERING')
Embed a single text using the Gemini API only, with a task type.
Intended for precomputing a query embedding before passing it to FileRAGManager.search(query_embedding=...). Retries on transient errors with exponential backoff.
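Retrying transient failures with exponential backoff can be sketched as below. TransientError, retry_with_backoff, and the delay parameters are hypothetical; which errors the real client treats as transient, and its actual delays, are not documented here. The delay doubles per attempt up to a cap, with a little random jitter.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (e.g. rate limits)."""


async def retry_with_backoff(
    call: Callable[[], Awaitable[T]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (TransientError,),
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return await call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # base, 2*base, 4*base, ... capped at max_delay, plus up to 10% jitter.
            delay = min(max_delay, base_delay * (2 ** attempt))
            await asyncio.sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")


# Usage: a call that fails twice before succeeding.
calls = {"n": 0}


async def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("rate limited")
    return [0.1, 0.2]


result = asyncio.run(retry_with_backoff(flaky, base_delay=0.0))
```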
- async close()
Close the underlying httpx async client and release its connections.
Calls aclose() on the shared httpx.AsyncClient created in __init__(), freeing pooled sockets. Invoked directly by callers that manage the client's lifetime, and automatically by __aexit__() when the instance is used as an async context manager.
- async __aenter__()
Enter the async context manager, returning this client unchanged.
Lets the embeddings client be used with async with, so its httpx connections are guaranteed to be closed on exit via __aexit__(). Invoked by the Python runtime at the start of an async with block.
- Returns:
This same instance.
- Return type:
OpenRouterEmbeddings
- async __aexit__(exc_type, exc_val, exc_tb)
Exit the async context manager, closing the httpx client.
Delegates to close() to release the pooled connections whether the async with block exited normally or via an exception. Invoked by the Python runtime at the end of an async with block. Does not suppress exceptions.
- Parameters:
exc_type – Exception type if the block raised, else None.
exc_val – Exception instance if the block raised, else None.
exc_tb – Traceback if the block raised, else None.
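The close-on-exit behavior of close(), __aenter__(), and __aexit__() can be illustrated with a self-contained sketch. FakeHttpClient is a hypothetical stand-in for httpx.AsyncClient that only records whether aclose() ran, and ManagedEmbedder is a hypothetical stand-in for the embeddings client.

```python
import asyncio


class FakeHttpClient:
    """Hypothetical stand-in for httpx.AsyncClient; only tracks closure."""

    def __init__(self) -> None:
        self.closed = False

    async def aclose(self) -> None:
        self.closed = True


class ManagedEmbedder:
    def __init__(self) -> None:
        self._client = FakeHttpClient()

    async def close(self) -> None:
        await self._client.aclose()

    async def __aenter__(self) -> "ManagedEmbedder":
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
        # Always release connections; returning None lets exceptions propagate.
        await self.close()


async def demo() -> bool:
    embedder = ManagedEmbedder()
    try:
        async with embedder:
            raise RuntimeError("boom")
    except RuntimeError:
        pass  # the exception was not suppressed by __aexit__
    return embedder._client.closed


closed_after_error = asyncio.run(demo())
```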
- class rag_system.openrouter_embeddings.SyncOpenRouterEmbeddings(api_key=None, model='google/gemini-embedding-001', dimensions=None, timeout=30.0, gemini_api_key=None, gemini_only=True, document_task_type=None, query_task_type=None)
Bases: object
Synchronous wrapper used by ChromaDB's embedding function interface.
Uses the Gemini API via the shared key pool. When the input splits into multiple batches, they are dispatched concurrently via a ThreadPoolExecutor.
- MAX_BATCH_SIZE = 50
- MAX_BATCH_CHARS = 50000
- MAX_EMBED_WORKERS = 20
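Concurrent batch dispatch capped at MAX_EMBED_WORKERS might look like the sketch below. embed_batches_concurrently and the embed_batch callable are hypothetical; the real client's internals are not shown in this page. Executor.map() yields results in input order, so the flattened vectors stay aligned with the original texts.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

MAX_EMBED_WORKERS = 20
Vector = List[float]


def embed_batches_concurrently(
    batches: List[List[str]],
    embed_batch: Callable[[List[str]], List[Vector]],
    max_workers: int = MAX_EMBED_WORKERS,
) -> List[Vector]:
    if not batches:
        return []
    if len(batches) == 1:
        # A single batch needs no thread pool.
        return embed_batch(batches[0])
    workers = min(max_workers, len(batches))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order regardless of completion order.
        results = list(pool.map(embed_batch, batches))
    return [vec for batch_vectors in results for vec in batch_vectors]


# Usage with a toy batch embedder (one length-based value per text).
def fake_embed_batch(batch: List[str]) -> List[Vector]:
    return [[float(len(t))] for t in batch]


vectors = embed_batches_concurrently([["a", "bb"], ["ccc"], ["dddd", "e"]], fake_embed_batch)
```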
- __init__(api_key=None, model='google/gemini-embedding-001', dimensions=None, timeout=30.0, gemini_api_key=None, gemini_only=True, document_task_type=None, query_task_type=None)
Initialize the instance.
- Parameters:
api_key (Optional[str]) – Unused; kept for backward compatibility.
model (str) – Embedding model identifier (default 'google/gemini-embedding-001').
timeout (float) – Maximum wait time in seconds.
gemini_api_key (Optional[str]) – Unused; the shared key pool is used instead.
gemini_only (bool) – Unused; the Gemini API is always used.
document_task_type (Optional[str]) – Optional Gemini taskType for corpus documents (e.g. RETRIEVAL_DOCUMENT); used by embed_documents() and __call__().
query_task_type (Optional[str]) – Optional Gemini taskType for queries (e.g. RETRIEVAL_QUERY); used by embed_query().
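To show where a taskType ends up, here is a sketch of a request body builder. It assumes the public Gemini REST batchEmbedContents shape (a "requests" list whose entries carry "model", "content.parts[].text", and optional "taskType" and "outputDimensionality"); build_batch_embed_request is hypothetical, and the exact payload the real client sends is not documented here. Stripping the OpenRouter-style "google/" prefix is likewise an assumption.

```python
from typing import Any, Dict, List, Optional


def build_batch_embed_request(
    texts: List[str],
    model: str = "google/gemini-embedding-001",
    task_type: Optional[str] = None,
    output_dimensionality: Optional[int] = None,
) -> Dict[str, Any]:
    # Native Gemini resource names look like "models/gemini-embedding-001".
    model_id = model.split("/", 1)[1] if model.startswith("google/") else model
    resource = f"models/{model_id}"
    requests: List[Dict[str, Any]] = []
    for text in texts:
        req: Dict[str, Any] = {"model": resource, "content": {"parts": [{"text": text}]}}
        if task_type:
            req["taskType"] = task_type  # e.g. RETRIEVAL_DOCUMENT or RETRIEVAL_QUERY
        if output_dimensionality:
            req["outputDimensionality"] = output_dimensionality
        requests.append(req)
    return {"requests": requests}


body = build_batch_embed_request(["doc one"], task_type="RETRIEVAL_DOCUMENT")
```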
- name()
Return the stable identifier ChromaDB uses for this embedder.
Part of the ChromaDB EmbeddingFunction contract; the value, derived in __init__() from the model name, lets ChromaDB detect when a collection's embedding function changes. Pure getter with no I/O.
- Returns:
The embedder's name, e.g. openrouter_google_gemini-embedding-001.
- Return type:
str
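One derivation consistent with the documented example name is sketched below. chroma_embedder_name is hypothetical, and the rule (prefix "openrouter_", replace "/" with "_") is inferred from the example rather than confirmed by the source. The point it illustrates is that a different model yields a different name, which is what lets ChromaDB flag a changed embedding function.

```python
def chroma_embedder_name(model: str) -> str:
    # Inferred rule: "google/gemini-embedding-001"
    #   -> "openrouter_google_gemini-embedding-001"
    return "openrouter_" + model.replace("/", "_")


default_name = chroma_embedder_name("google/gemini-embedding-001")
```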
- dimension()
Return the fixed embedding dimensionality reported to ChromaDB.
Part of the ChromaDB EmbeddingFunction contract, used to validate that stored vectors match the collection's expected width. Returns the constant 3072 produced by the Gemini embedding model. Pure getter with no I/O.
- Returns:
The vector length (3072).
- Return type:
int
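The width check that dimension() enables can be sketched as a simple validation pass. check_vector_widths is a hypothetical helper, not part of this module or of ChromaDB; it only shows what comparing vectors against the reported dimensionality means.

```python
from typing import Sequence

EMBEDDING_DIM = 3072  # value reported by dimension()


def check_vector_widths(vectors: Sequence[Sequence[float]], expected: int = EMBEDDING_DIM) -> None:
    # Reject any vector whose length differs from the collection's width.
    for i, vec in enumerate(vectors):
        if len(vec) != expected:
            raise ValueError(f"vector {i} has width {len(vec)}, expected {expected}")


check_vector_widths([[0.0] * 3072, [1.0] * 3072])  # passes silently
```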
- __call__(input)
Embed a list of texts via the legacy ChromaDB callable interface.
Implements the original ChromaDB EmbeddingFunction protocol, in which the embedder itself is invoked as a function. Treats inputs as corpus documents, applying document_task_type (matching embed_documents()), and delegates the batching and HTTP work to _embed_inputs(). Invoked by older ChromaDB versions and by any call site that calls the embedder object directly.
- embed_documents(input)
Embed corpus documents for the ChromaDB upsert path.
The modern (ChromaDB >= 0.6) entry point used when adding documents to a collection. Applies document_task_type so vectors are optimized for the corpus side of retrieval, then delegates to _embed_inputs(). Reached via the vector-store compatibility layer (vector_store.ChromaCompatCollection), which prefers this method over __call__() when present.
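The "prefer embed_documents() when present" behavior of the compatibility layer can be sketched with duck typing. embed_for_upsert, Legacy, and Modern are hypothetical; vector_store.ChromaCompatCollection's actual code is not shown on this page.

```python
from typing import Any, List, Sequence


def embed_for_upsert(embedding_function: Any, documents: Sequence[str]) -> List[List[float]]:
    # Prefer the modern embed_documents() when the embedder provides it.
    embed_documents = getattr(embedding_function, "embed_documents", None)
    if callable(embed_documents):
        return embed_documents(list(documents))
    # Otherwise fall back to the legacy callable protocol.
    return embedding_function(list(documents))


class Legacy:
    def __call__(self, input: List[str]) -> List[List[float]]:
        return [[0.0] for _ in input]


class Modern(Legacy):
    def embed_documents(self, input: List[str]) -> List[List[float]]:
        return [[1.0] for _ in input]
```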
- embed_query(input)
Embed query texts for the ChromaDB query path.
The modern (ChromaDB >= 0.6) entry point used when searching a collection. Applies query_task_type so vectors are optimized for the query side of asymmetric retrieval, then delegates to _embed_inputs(). Reached via the vector-store compatibility layer (vector_store.ChromaCompatCollection) when issuing a similarity search.
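How the three entry points route task types into one shared helper can be sketched with a stub. TaskTypeRoutingStub is hypothetical, and passing task_type as an argument to _embed_inputs() is an assumption about its signature; the stub records each call instead of contacting the Gemini API.

```python
from typing import List, Optional, Tuple


class TaskTypeRoutingStub:
    """Hypothetical stand-in recording which taskType each entry point uses."""

    def __init__(self, document_task_type: Optional[str] = None,
                 query_task_type: Optional[str] = None) -> None:
        self.document_task_type = document_task_type
        self.query_task_type = query_task_type
        self.calls: List[Tuple[Optional[str], int]] = []

    def _embed_inputs(self, input: List[str], task_type: Optional[str]) -> List[List[float]]:
        # The real client would batch and call the API; the stub just records.
        self.calls.append((task_type, len(input)))
        return [[0.0] for _ in input]

    def __call__(self, input: List[str]) -> List[List[float]]:
        return self._embed_inputs(input, self.document_task_type)  # legacy = corpus side

    def embed_documents(self, input: List[str]) -> List[List[float]]:
        return self._embed_inputs(input, self.document_task_type)

    def embed_query(self, input: List[str]) -> List[List[float]]:
        return self._embed_inputs(input, self.query_task_type)


stub = TaskTypeRoutingStub("RETRIEVAL_DOCUMENT", "RETRIEVAL_QUERY")
stub.embed_documents(["a", "b"])
stub.embed_query(["q"])
stub(["c"])
```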