rag_system.openrouter_embeddings module

Embeddings client for RAG system.

Uses the native Google Gemini API only, via the shared key pool in gemini_embed_pool. Provides both async (OpenRouterEmbeddings) and synchronous (SyncOpenRouterEmbeddings, ChromaDB-compatible) interfaces.

class rag_system.openrouter_embeddings.OpenRouterEmbeddings(api_key=None, model='google/gemini-embedding-001', dimensions=None, timeout=30.0, gemini_api_key=None, gemini_only=True)[source]

Bases: object

Async embeddings client that calls the Gemini API via the shared key pool.

Despite the historical name, this client targets Google’s Gemini embeddings endpoint, using keys drawn from the shared pool rather than OpenRouter. It batches inputs (bounded by MAX_BATCH_SIZE and MAX_BATCH_CHARS), retries with backoff, and exposes embed_text() and embed_texts(), which return dense numpy.ndarray vectors of width dimensions (default 3072). It is instantiated wherever the codebase needs embeddings: the vector tool classifier (classifiers.vector_classifier), the tool, skill, and dangerous-command embedding refreshers under classifiers/, and tools.search_tools. The file-RAG manager uses the synchronous sibling, SyncOpenRouterEmbeddings.

Parameters:
  • api_key (str | None)

  • model (str)

  • dimensions (int | None)

  • timeout (float)

  • gemini_api_key (str | None)

  • gemini_only (bool)

DEFAULT_MODEL = 'google/gemini-embedding-001'
MAX_BATCH_SIZE = 50
MAX_BATCH_CHARS = 50000
__init__(api_key=None, model='google/gemini-embedding-001', dimensions=None, timeout=30.0, gemini_api_key=None, gemini_only=True)[source]

Initialize the instance.

Parameters:
  • api_key (Optional[str]) – Unused; kept for backward compatibility.

  • model (str) – Embedding model identifier; defaults to DEFAULT_MODEL.

  • dimensions (Optional[int]) – Width of the returned vectors; None selects the default width (3072).

  • timeout (float) – Maximum wait time in seconds.

  • gemini_api_key (Optional[str]) – Unused; pool is used instead.

  • gemini_only (bool) – Always True; embeddings use Gemini API only.

async embed_text(text)[source]

Embed a single string into one dense vector.

Thin convenience wrapper: it wraps text in a one-element list, delegates to embed_texts() (which handles batching and retries against the Gemini API), and returns the lone resulting vector. It performs no network I/O beyond what embed_texts() does. Called by the vector classifier (classifiers/vector_classifier.py) and the search tool (tools/search_tools.py) to embed an incoming query before a similarity lookup.

Parameters:

text (str) – The text to embed.

Returns:

A single float32 embedding vector of length self.dimensions.

Return type:

ndarray

async embed_texts(texts)[source]

Embed one or more texts into dense vectors, batching as needed.

Top-level async entry point for embedding. It coerces texts to a list via _normalize_embed_texts_input() (so a bare string is treated as a single document rather than iterated character by character), splits the input into size- and char-bounded batches with _create_batches(), and embeds each batch via _embed_batch(), which drives the Gemini API through the shared key pool and retries with backoff. Called by embed_text() here, and by the classifier embedding refresh helpers (classifiers/tool_embedding_batch.py, classifiers/update_skill_embeddings.py) when rebuilding routing vectors.
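The normalization and batching steps described above can be sketched as follows. This is a minimal illustration, not the module's code: normalize_texts and create_batches are hypothetical stand-ins for the private helpers, and the greedy packing strategy (including how an oversized single text is handled) is an assumption.

```python
from typing import List, Sequence, Union

MAX_BATCH_SIZE = 50       # documented class constant
MAX_BATCH_CHARS = 50000   # documented class constant


def normalize_texts(texts: Union[str, Sequence[str]]) -> List[str]:
    """Treat a bare string as one document instead of iterating its characters."""
    if isinstance(texts, str):
        return [texts]
    return list(texts)


def create_batches(
    texts: Sequence[str],
    max_size: int = MAX_BATCH_SIZE,
    max_chars: int = MAX_BATCH_CHARS,
) -> List[List[str]]:
    """Greedily pack texts into batches that respect both limits.

    Assumption: a single text longer than max_chars still gets a batch of
    its own; the real helper might truncate or reject it instead.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_chars = 0
    for text in texts:
        too_many = len(current) >= max_size
        too_long = current_chars + len(text) > max_chars
        # Flush the current batch before it would break either limit.
        if current and (too_many or too_long):
            batches.append(current)
            current, current_chars = [], 0
        current.append(text)
        current_chars += len(text)
    if current:
        batches.append(current)
    return batches
```

Because batches are filled in input order, concatenating the per-batch results yields one vector per input text in the original order, which matches the documented return contract.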

Parameters:

texts (Union[str, Sequence[str]]) – A list of strings, or a single string (treated as one document — not iterated by character).

Returns:

One float32 vector per input text, in input order. Returns an empty list when texts is empty.

Return type:

List[ndarray]

Embed a single text using the Gemini API only, with a task type.

Intended for pre-computing a query embedding before passing it to FileRAGManager.search(query_embedding=...). Retries on transient errors with exponential back-off.
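A retry loop with exponential back-off, as mentioned above, might look roughly like this. The sketch is illustrative only: TransientEmbeddingError, embed_with_retry, the attempt count, and the delay values are all hypothetical, since the source does not state which errors count as transient or what the actual schedule is.

```python
import asyncio
import random
from typing import Awaitable, Callable, List


class TransientEmbeddingError(Exception):
    """Hypothetical marker for retryable failures (e.g. HTTP 429 or 503)."""


async def embed_with_retry(
    call: Callable[[], Awaitable[List[float]]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
) -> List[float]:
    """Await call(), retrying transient failures with exponential back-off."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return await call()
        except TransientEmbeddingError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The delay doubles each attempt (base, 2*base, 4*base, ...),
            # with a little random jitter to avoid synchronized retries.
            delay = base_delay * (2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay / 10))
    raise AssertionError("unreachable")
```

Non-transient exceptions propagate immediately, so only failures that are worth waiting out consume retry attempts.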

Return type:

List[float]

async close()[source]

Close the underlying httpx async client and release its connections.

Calls aclose() on the shared httpx.AsyncClient created in __init__(), freeing pooled sockets. Invoked directly by callers that manage the client’s lifetime, and automatically by __aexit__() when the instance is used as an async context manager.

async __aenter__()[source]

Enter the async context manager, returning this client unchanged.

Lets the embeddings client be used with async with so its httpx connections are guaranteed to be closed on exit via __aexit__(). Invoked by the Python runtime at the start of an async with block.

Returns:

This same instance.

Return type:

OpenRouterEmbeddings

async __aexit__(exc_type, exc_val, exc_tb)[source]

Exit the async context manager, closing the httpx client.

Delegates to close() to release the pooled connections regardless of whether the async with block exited normally or via an exception. Invoked by the Python runtime at the end of an async with block. Does not suppress exceptions.

Parameters:
  • exc_type – Exception type if the block raised, else None.

  • exc_val – Exception instance if the block raised, else None.

  • exc_tb – Traceback if the block raised, else None.
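The lifecycle methods documented above (close(), __aenter__(), __aexit__()) and the embed_text() wrapper fit together roughly as in the sketch below. It is a self-contained miniature under stated assumptions: FakeAsyncClient stands in for httpx.AsyncClient so the example runs without a network, EmbeddingsClientSketch is hypothetical, and the placeholder vectors are plain lists rather than numpy arrays.

```python
import asyncio
from typing import List, Sequence


class FakeAsyncClient:
    """Stand-in for httpx.AsyncClient so the sketch runs without a network."""

    def __init__(self) -> None:
        self.closed = False

    async def aclose(self) -> None:
        self.closed = True


class EmbeddingsClientSketch:
    """Hypothetical miniature of the documented lifecycle methods."""

    def __init__(self) -> None:
        self._client = FakeAsyncClient()

    async def embed_texts(self, texts: Sequence[str]) -> List[List[float]]:
        # Placeholder vectors; the real client returns numpy arrays from Gemini.
        return [[float(len(t))] for t in texts]

    async def embed_text(self, text: str) -> List[float]:
        # Wrap in a one-element list, delegate, and unwrap the lone vector.
        return (await self.embed_texts([text]))[0]

    async def close(self) -> None:
        await self._client.aclose()

    async def __aenter__(self) -> "EmbeddingsClientSketch":
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
        # Returning None (falsy) means exceptions are not suppressed.
        await self.close()


async def main() -> List[float]:
    async with EmbeddingsClientSketch() as client:
        vector = await client.embed_text("hello")
    assert client._client.closed  # closed on exit from the block
    return vector
```

With the real class the same shape would presumably apply: enter the client with async with, await embed_text() or embed_texts() inside the block, and rely on __aexit__() to release the connections.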

class rag_system.openrouter_embeddings.SyncOpenRouterEmbeddings(api_key=None, model='google/gemini-embedding-001', dimensions=None, timeout=30.0, gemini_api_key=None, gemini_only=True, document_task_type=None, query_task_type=None)[source]

Bases: object

Synchronous wrapper used by ChromaDB’s embedding function interface.

Uses the Gemini API via the shared key pool. When the input splits into more than one batch, the batches are dispatched concurrently through a ThreadPoolExecutor.
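The concurrent dispatch described above can be sketched with the standard library alone. This is an assumption-laden illustration: embed_batches and its embed_batch callback are hypothetical names, and capping the pool at MAX_EMBED_WORKERS is inferred from the constant's name rather than confirmed by the source.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

MAX_EMBED_WORKERS = 20  # documented class constant

Vector = List[float]


def embed_batches(
    batches: Sequence[List[str]],
    embed_batch: Callable[[List[str]], List[Vector]],
    max_workers: int = MAX_EMBED_WORKERS,
) -> List[Vector]:
    """Embed batches, using threads only when there is more than one batch."""
    if not batches:
        return []
    if len(batches) == 1:
        return embed_batch(batches[0])  # no pool overhead for a single batch
    workers = min(max_workers, len(batches))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in submission order, so the flattened
        # output lines up with the inputs even if batches finish out of order.
        results = list(pool.map(embed_batch, batches))
    return [vector for batch in results for vector in batch]
```

Threads suit this workload because each batch spends most of its time waiting on an HTTP response, not computing.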

Parameters:
  • api_key (str | None)

  • model (str)

  • dimensions (int | None)

  • timeout (float)

  • gemini_api_key (str | None)

  • gemini_only (bool)

  • document_task_type (str | None)

  • query_task_type (str | None)

MAX_BATCH_SIZE = 50
MAX_BATCH_CHARS = 50000
MAX_EMBED_WORKERS = 20
__init__(api_key=None, model='google/gemini-embedding-001', dimensions=None, timeout=30.0, gemini_api_key=None, gemini_only=True, document_task_type=None, query_task_type=None)[source]

Initialize the instance.

Parameters:
  • api_key (Optional[str]) – Unused; kept for backward compatibility.

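  • model (str) – Embedding model identifier; also used to derive the value returned by name().

  • dimensions (Optional[int]) – Requested embedding width; dimension() reports a fixed 3072 to ChromaDB.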

  • timeout (float) – Maximum wait time in seconds.

  • gemini_api_key (Optional[str]) – Unused; pool is used instead.

  • gemini_only (bool) – Unused; always Gemini API.

  • document_task_type (Optional[str]) – Optional Gemini taskType for corpus (e.g. RETRIEVAL_DOCUMENT); used by embed_documents.

  • query_task_type (Optional[str]) – Optional Gemini taskType for queries (e.g. RETRIEVAL_QUERY); used by embed_query.
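To show where a task type and a requested width would travel, here is a hedged sketch of a request body in the shape of the Gemini batchEmbedContents REST call. build_batch_payload is a hypothetical helper, and the mapping from 'google/gemini-embedding-001' to a models/ path is an assumption; the module's actual request construction is not shown in this reference.

```python
from typing import Any, Dict, List, Optional


def build_batch_payload(
    texts: List[str],
    model: str = "google/gemini-embedding-001",
    task_type: Optional[str] = None,
    dimensions: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a batchEmbedContents-style JSON body, one request per text."""
    # Assumption: the provider prefix is dropped to form the Gemini model path.
    gemini_model = "models/" + model.split("/")[-1]
    requests: List[Dict[str, Any]] = []
    for text in texts:
        request: Dict[str, Any] = {
            "model": gemini_model,
            "content": {"parts": [{"text": text}]},
        }
        # Optional fields are omitted entirely when not configured.
        if task_type is not None:
            request["taskType"] = task_type
        if dimensions is not None:
            request["outputDimensionality"] = dimensions
        requests.append(request)
    return {"requests": requests}
```

Under this sketch, a corpus upsert would pass task_type="RETRIEVAL_DOCUMENT" and a search would pass task_type="RETRIEVAL_QUERY", mirroring the two constructor parameters.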

name()[source]

Return the stable identifier ChromaDB uses for this embedder.

Part of the ChromaDB EmbeddingFunction contract; the value (derived in __init__() from the model name) lets ChromaDB detect when a collection’s embedding function changes. Pure getter with no I/O.

Returns:

The embedder’s name, e.g. openrouter_google_gemini-embedding-001.

Return type:

str

dimension()[source]

Return the fixed embedding dimensionality reported to ChromaDB.

Part of the ChromaDB EmbeddingFunction contract, used to validate that stored vectors match the collection’s expected width. Returns the constant 3072 produced by the Gemini embedding model. Pure getter with no I/O.

Returns:

The vector length (3072).

Return type:

int

__call__(input)[source]

Embed a list of texts via the legacy ChromaDB callable interface.

Implements the original ChromaDB EmbeddingFunction protocol where the embedder itself is invoked as a function. Treats inputs as corpus documents, applying document_task_type (matching embed_documents()), and delegates the actual batching and HTTP work to _embed_inputs(). Invoked by older ChromaDB versions and any call site that calls the embedder object directly.

Parameters:

input (List[str]) – Texts to embed.

Returns:

One embedding (list of floats) per input text.

Return type:

List[List[float]]

embed_documents(input)[source]

Embed corpus documents for the ChromaDB upsert path.

The modern (ChromaDB >= 0.6) entry point used when adding documents to a collection. Applies document_task_type so vectors are optimized for the retrieval-corpus side, then delegates to _embed_inputs(). Reached via the vector-store compatibility layer (vector_store.ChromaCompatCollection), which prefers this method over __call__() when present.

Parameters:

input (List[str]) – Document texts to embed.

Returns:

One embedding per document, in input order.

Return type:

List[List[float]]

embed_query(input)[source]

Embed query texts for the ChromaDB query path.

The modern (ChromaDB >= 0.6) entry point used when searching a collection. Applies query_task_type so vectors are optimized for the query side of asymmetric retrieval, then delegates to _embed_inputs(). Reached via the vector-store compatibility layer (vector_store.ChromaCompatCollection) when issuing a similarity search.

Parameters:

input (List[str]) – Query texts to embed.

Returns:

One embedding per query, in input order.

Return type:

List[List[float]]
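The way the three entry points (__call__(), embed_documents(), embed_query()) choose a task type before delegating can be sketched as below. SyncEmbedderSketch is a hypothetical miniature: the signature given to _embed_inputs() here is an assumption, and the placeholder body returns dummy vectors instead of calling Gemini.

```python
from typing import List, Optional


class SyncEmbedderSketch:
    """Hypothetical miniature showing how each entry point picks a task type."""

    def __init__(
        self,
        document_task_type: Optional[str] = None,
        query_task_type: Optional[str] = None,
    ) -> None:
        self.document_task_type = document_task_type
        self.query_task_type = query_task_type
        self.seen: List[Optional[str]] = []  # task types used, for inspection

    def _embed_inputs(
        self, texts: List[str], task_type: Optional[str]
    ) -> List[List[float]]:
        # Placeholder: record the task type and return dummy 1-d vectors.
        self.seen.append(task_type)
        return [[float(len(t))] for t in texts]

    def __call__(self, input: List[str]) -> List[List[float]]:
        # Legacy callable interface: inputs are treated as corpus documents.
        return self._embed_inputs(input, self.document_task_type)

    def embed_documents(self, input: List[str]) -> List[List[float]]:
        return self._embed_inputs(input, self.document_task_type)

    def embed_query(self, input: List[str]) -> List[List[float]]:
        return self._embed_inputs(input, self.query_task_type)
```

Keeping the document and query task types separate is what makes retrieval asymmetric: the same text can be embedded differently depending on whether it is being stored or searched for.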