gemini_embed_pool

Shared Gemini API key pool for rate-limit distribution.

All embedding calls use next_gemini_embed_key() and all flash-lite generation calls use next_gemini_flash_key() for round-robin key selection. Both accessors draw from the same underlying key pool but maintain independent rotation cycles so they don’t interfere with each other.

Supports GEMINI_EMBED_KEY_POOL env var (comma-separated keys) with fallback to the default hardcoded pool.
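The two-accessor design can be pictured with a minimal sketch. It assumes the pool is a plain list of strings and that each accessor owns its own cursor; `load_pool`, `KeyRotation` and the placeholder keys are hypothetical names for illustration, not the module's internals.

```python
import itertools
import os
import threading

# Hypothetical placeholder keys; the real default pool is not shown here.
_FALLBACK_POOL = ["key-a", "key-b", "key-c"]


def load_pool(env=os.environ):
    """Parse GEMINI_EMBED_KEY_POOL (comma-separated), else use the fallback."""
    raw = env.get("GEMINI_EMBED_KEY_POOL", "")
    keys = [k.strip() for k in raw.split(",") if k.strip()]
    return keys or list(_FALLBACK_POOL)


class KeyRotation:
    """One thread-safe round-robin cursor over a shared list of keys."""

    def __init__(self, keys):
        self._cycle = itertools.cycle(keys)
        self._lock = threading.Lock()

    def next_key(self):
        # The lock keeps concurrent callers from advancing the cursor at once.
        with self._lock:
            return next(self._cycle)


pool = load_pool({"GEMINI_EMBED_KEY_POOL": "k1, k2,k3"})
embed_rotation = KeyRotation(pool)  # plays the role of next_gemini_embed_key()
flash_rotation = KeyRotation(pool)  # plays the role of next_gemini_flash_key()

embed_picks = [embed_rotation.next_key() for _ in range(4)]
flash_picks = [flash_rotation.next_key() for _ in range(2)]
```

Because each rotation holds its own cursor, advancing the embed rotation four times leaves the flash rotation starting from the first key.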

Embedding fallback cascade

When the free pool starts returning HTTP 429 responses (consecutive failures exceed PAID_KEY_FALLBACK_THRESHOLD), the embed call sites cascade in this order:

  1. Free-tier pool (round-robin via next_gemini_embed_key()).

  2. OpenRouter (openrouter_embed_batch() / ..._sync) — engaged via set_openrouter_only() so subsequent calls go straight here.

  3. Paid tier-3 Gemini key (gemini_embed_paid_fallback() / ..._sync) — used as the absolute last resort when both the free pool and OpenRouter have failed. Its key is returned by get_paid_fallback_key() (env override GEMINI_EMBED_PAID_KEY, default _DEFAULT_PAID_KEY).

When openrouter_only is pinned and OpenRouter fails on a specific call, callers must still fall through to the paid-key helper before raising.
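The cascade order can be sketched as follows. This is an illustration under stated assumptions, not the module's code: the three backends are injected as coroutines, `free_pool` is assumed to raise only once the failure threshold has been exceeded, and a plain dict stands in for the pinned openrouter_only state. `embed_with_cascade` and `make_stage` are hypothetical names.

```python
import asyncio


async def embed_with_cascade(texts, *, free_pool, openrouter, paid, state):
    """Try the free pool, then OpenRouter, then the paid key, in that order."""
    errors = []
    if not state["openrouter_only"]:
        try:
            return await free_pool(texts)
        except Exception as exc:  # assumed to mean "threshold exceeded"
            errors.append(exc)
            state["openrouter_only"] = True  # stand-in for set_openrouter_only()
    # Pinned or freshly failed: OpenRouter first, paid key as the last resort.
    for stage in (openrouter, paid):
        try:
            return await stage(texts)
        except Exception as exc:
            errors.append(exc)
    raise RuntimeError(f"all embedding backends failed: {errors!r}")


calls = []


def make_stage(name, fail):
    """Build a fake backend that records its name and optionally fails."""
    async def stage(texts):
        calls.append(name)
        if fail:
            raise RuntimeError(f"{name} unavailable")
        return [[1.0] for _ in texts]
    return stage


state = {"openrouter_only": False}
vectors = asyncio.run(embed_with_cascade(
    ["a", "b"],
    free_pool=make_stage("free", fail=True),
    openrouter=make_stage("openrouter", fail=True),
    paid=make_stage("paid", fail=False),
    state=state,
))
```

In this run all three stages are visited in order and the paid stage supplies the vectors; a later call with the same `state` skips the free pool entirely.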

OpenRouter embed fallback (openrouter_embed_batch / openrouter_embed_batch_sync) retries transient network errors and HTTP 429 / 5xx with exponential backoff. Tune with env: OPENROUTER_EMBED_MAX_ATTEMPTS (default 24), OPENROUTER_EMBED_RETRY_BASE_DELAY, OPENROUTER_EMBED_RETRY_MAX_DELAY.
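As a rough illustration of the retry policy, the sketch below computes a capped exponential delay schedule and the retriable-status check. Only the default of 24 attempts comes from this page; the two delay defaults, the doubling factor, and the helper names `backoff_delays` and `is_retriable` are assumptions made for the example.

```python
import os


def backoff_delays(max_attempts, base_delay, max_delay):
    """Yield the sleep before each retry: base * 2**n, capped at max_delay."""
    for attempt in range(max_attempts - 1):  # no sleep after the final attempt
        yield min(max_delay, base_delay * (2 ** attempt))


def is_retriable(status_code):
    """HTTP 429 and any 5xx are retried; other statuses are not."""
    return status_code == 429 or 500 <= status_code <= 599


max_attempts = int(os.environ.get("OPENROUTER_EMBED_MAX_ATTEMPTS", "24"))
# The two delay defaults below are illustrative guesses, not the module's values.
base_delay = float(os.environ.get("OPENROUTER_EMBED_RETRY_BASE_DELAY", "1.0"))
max_delay = float(os.environ.get("OPENROUTER_EMBED_RETRY_MAX_DELAY", "30.0"))

schedule = list(backoff_delays(5, 1.0, 5.0))
```

With five attempts, a base of one second and a five-second cap, the schedule is 1, 2, 4, 5 seconds.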

Daily quota tracking

Quotas are tracked per model class ("embed" vs "generate"). Embedding RPD and generation RPD are separate quotas on the same key, so a key spent for embeddings can still serve flash-lite generation and vice versa.

Each key’s usage is tracked in Redis, and its daily-spent status is mirrored in two in-memory sets (one per model class). Keys that receive a daily-quota 429 (PerDay in the quotaId) are excluded from rotation for the relevant model class until midnight Pacific Time. A background probe runs every 2 hours to detect keys exhausted by external usage.
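The reset boundary can be computed along these lines. The sketch assumes "Pacific Time" means the America/Los_Angeles zone and that a quota day is labelled by its Pacific calendar date; `next_pacific_midnight` and `quota_day` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumption: "Pacific Time" is the America/Los_Angeles zone.
PACIFIC = ZoneInfo("America/Los_Angeles")


def next_pacific_midnight(now=None):
    """Return the next 00:00 Pacific as an aware datetime."""
    local = (now or datetime.now(timezone.utc)).astimezone(PACIFIC)
    tomorrow = local.date() + timedelta(days=1)
    return datetime(tomorrow.year, tomorrow.month, tomorrow.day, tzinfo=PACIFIC)


def quota_day(now=None):
    """Label the current quota day by its Pacific calendar date."""
    local = (now or datetime.now(timezone.utc)).astimezone(PACIFIC)
    return local.date().isoformat()


sample_now = datetime(2024, 1, 15, 7, 30, tzinfo=timezone.utc)  # 23:30 PST, Jan 14
reset_at = next_pacific_midnight(sample_now)
seconds_left = int((reset_at - sample_now).total_seconds())
```

A value like `seconds_left` is what a Redis TTL would need so that a flag or counter expires exactly at the rollover.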

gemini_embed_pool.get_paid_fallback_key()

Return the paid tier-3 Gemini API key used as the absolute last-resort fallback.

Reads the GEMINI_EMBED_PAID_KEY env override and falls back to _DEFAULT_PAID_KEY. In the embed cascade this key is reached only once the free pool and OpenRouter have both failed; _next_active_key() also hands it out when every free key is daily-spent.

Called within this module by _next_active_key(), gemini_embed_paid_fallback(), and gemini_embed_paid_fallback_sync(), and externally by gemini_kg_bulk_client.py and classifiers/build_tool_index.py for their own paid-key fallbacks.

Returns:

The paid key, or None if neither the env var nor the default is set.

Return type:

str | None

async gemini_embed_pool.reload_pool()

Merge donated keys from Redis into the live pool and rebuild cycles.

Returns the new total pool size. Safe to call multiple times.

Return type:

int

gemini_embed_pool.init_quota_tracking(redis_client)

Provide the async Redis client for daily quota persistence.

Must be called once at service startup before any embedding calls are made. Call reload_pool() afterwards (from an async context) to merge donated keys.

Return type:

None

Parameters:

redis_client (Any)

gemini_embed_pool.next_gemini_embed_key()

Thread-safe round-robin selection from the Gemini embedding key pool.

Skips keys that have been marked as daily-spent for embeddings.

Return type:

str
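Skipping daily-spent keys during rotation might look like the sketch below. It is an assumption-laden illustration: `SpentAwareRotation` is a hypothetical class, the paid key is returned when every pooled key is spent (mirroring what this page says about _next_active_key()), and raising when no paid key exists is a choice made for the example.

```python
import threading


class SpentAwareRotation:
    """Round-robin over keys, skipping any marked daily-spent."""

    def __init__(self, keys, paid_key=None):
        self._keys = list(keys)
        self._paid_key = paid_key
        self._spent = set()
        self._index = 0
        self._lock = threading.Lock()

    def mark_spent(self, key):
        with self._lock:
            self._spent.add(key)

    def next_key(self):
        with self._lock:
            # Visit each pooled key at most once per call.
            for _ in range(len(self._keys)):
                key = self._keys[self._index]
                self._index = (self._index + 1) % len(self._keys)
                if key not in self._spent:
                    return key
            if self._paid_key is not None:
                return self._paid_key
            raise RuntimeError("no usable key: every pooled key is daily-spent")


rotation = SpentAwareRotation(["k1", "k2", "k3"], paid_key="paid")
rotation.mark_spent("k2")
picks = [rotation.next_key() for _ in range(4)]
```

Here `k2` never appears in `picks`; once `k1` and `k3` are also marked spent, the rotation hands out the paid key.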

gemini_embed_pool.next_gemini_flash_key()

Thread-safe round-robin selection for Gemini flash-lite generation calls.

Uses the same key pool as embeddings but an independent cycle so embed and generation rotations don’t interfere. Skips keys that have been marked as daily-spent for generation.

Return type:

str

gemini_embed_pool.is_daily_quota_429(resp)

Return True if resp is a 429 caused by a daily (RPD) quota limit.

Parses the structured QuotaFailure violations in the response body and looks for a quotaId containing PerDay.

Return type:

bool

Parameters:

resp (httpx.Response)
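A minimal sketch of the check, written against an already-decoded JSON body rather than an httpx.Response so it runs without third-party packages. The body layout follows the usual Google API error envelope (error.details[] entries typed as QuotaFailure); the quotaId value in the sample is illustrative, and `has_daily_quota_violation` is a hypothetical name.

```python
def has_daily_quota_violation(status_code, body):
    """Return True for a 429 whose QuotaFailure has a quotaId with 'PerDay'."""
    if status_code != 429 or not isinstance(body, dict):
        return False
    error = body.get("error")
    details = error.get("details", []) if isinstance(error, dict) else []
    for detail in details:
        if not isinstance(detail, dict):
            continue
        if not str(detail.get("@type", "")).endswith("QuotaFailure"):
            continue
        for violation in detail.get("violations", []):
            if isinstance(violation, dict) and "PerDay" in str(violation.get("quotaId", "")):
                return True
    return False


def sample_body(quota_id):
    """Build an illustrative 429 body carrying a single violation."""
    return {
        "error": {
            "code": 429,
            "status": "RESOURCE_EXHAUSTED",
            "details": [
                {
                    "@type": "type.googleapis.com/google.rpc.QuotaFailure",
                    "violations": [{"quotaId": quota_id}],
                }
            ],
        }
    }


daily = has_daily_quota_violation(429, sample_body("ExampleRequestsPerDay-FreeTier"))
per_minute = has_daily_quota_violation(429, sample_body("ExampleRequestsPerMinute-FreeTier"))
```

A per-minute 429 leaves the key in rotation; only the daily variant should mark it spent.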

gemini_embed_pool.is_daily_quota_429_for_model(resp)

Identify which model class a daily-quota 429 belongs to.

Returns "embed" or "generate" based on the quotaDimensions.model field, or None if the response is not a daily 429.

Return type:

Optional[Literal['embed', 'generate']]

Parameters:

resp (httpx.Response)
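The classification step could be sketched as below, operating on the list of violations already extracted from the response. How the module maps a model name to a class is not stated on this page; treating any model name containing "embedding" as "embed" is an assumption, as are the helper name `daily_quota_model_class` and the sample quotaId values.

```python
def daily_quota_model_class(violations):
    """Map the first daily violation to 'embed' or 'generate', else None."""
    for violation in violations:
        if "PerDay" not in violation.get("quotaId", ""):
            continue
        model = violation.get("quotaDimensions", {}).get("model", "")
        # Assumption: embedding models carry "embedding" in their name.
        return "embed" if "embedding" in model.lower() else "generate"
    return None


embed_class = daily_quota_model_class([
    {"quotaId": "ExamplePerDay", "quotaDimensions": {"model": "gemini-embedding-001"}},
])
generate_class = daily_quota_model_class([
    {"quotaId": "ExamplePerDay", "quotaDimensions": {"model": "gemini-2.0-flash-lite"}},
])
not_daily = daily_quota_model_class([
    {"quotaId": "ExamplePerMinute", "quotaDimensions": {"model": "gemini-embedding-001"}},
])
```

The returned class is what a caller would pass as model_class when marking the key spent, so that an embedding exhaustion does not remove the key from the generation rotation.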

async gemini_embed_pool.record_key_usage(api_key)

Increment the daily request counter for api_key in Redis (best-effort).

Bumps gemini_key_daily_usage:<suffix>:count and (re)sets its TTL to the next midnight Pacific via a non-transactional pipeline, giving the diagnostics in get_pool_status() a rough per-key call count that self-expires daily. No-ops when the async Redis client is unwired, and swallows any Redis error rather than disrupting the embed call.

Called within this module by probe_all_keys() and gemini_embed_paid_fallback(), and externally by the embed transport in openrouter_client/transport.py after each Gemini POST.

Parameters:

api_key (str) – The Gemini key whose usage to record.

Return type:

None

async gemini_embed_pool.mark_key_daily_spent(api_key, model_class='embed')

Mark api_key as daily-spent for model_class.

Updates both the in-memory set and the Redis flag.

Return type:

None

Parameters:
  • api_key (str)

  • model_class (Literal['embed', 'generate'])

async gemini_embed_pool.sync_spent_keys_from_redis()

Rebuild both in-memory daily-spent sets from the per-key flags in Redis.

Lets a worker pick up exhaustion decisions made by other processes: for every pooled key it reads gemini_key_daily_usage:<suffix>:spent:embed and ...:spent:generate (plus the legacy suffix-less ...:spent key, which counts for both classes) and replaces _spent_keys_embed / _spent_keys_generate accordingly under _spent_keys_lock. Calls _maybe_reset_day() first so a quota-day rollover is honored, and no-ops when the async Redis client is unwired; Redis errors are swallowed.

Called within this module at the start of probe_all_keys(); no external callers were found.

Return type:

None
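The rebuild logic can be illustrated as a pure function over a dict that stands in for the Redis flags. The key names follow the patterns quoted above; the 8-character suffix length and the function name `rebuild_spent_sets` are assumptions for this sketch.

```python
def rebuild_spent_sets(pool, flags, suffix_len=8):
    """Derive the embed and generate spent sets from per-key flags.

    `flags` maps Redis key names to truthy values; `suffix_len` is an
    assumed suffix length, not a value confirmed by this page.
    """
    prefix = "gemini_key_daily_usage:"
    spent_embed, spent_generate = set(), set()
    for key in pool:
        base = f"{prefix}{key[-suffix_len:]}:spent"
        legacy = bool(flags.get(base))  # legacy flag counts for both classes
        if legacy or flags.get(f"{base}:embed"):
            spent_embed.add(key)
        if legacy or flags.get(f"{base}:generate"):
            spent_generate.add(key)
    return spent_embed, spent_generate


pool = ["demo-key-00000001", "demo-key-00000002", "demo-key-00000003"]
flags = {
    "gemini_key_daily_usage:00000001:spent:embed": "1",
    "gemini_key_daily_usage:00000002:spent": "1",
}
spent_embed, spent_generate = rebuild_spent_sets(pool, flags)
```

The first key is spent only for embeddings, the second (legacy flag) for both classes, and the third for neither.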

async gemini_embed_pool.get_pool_status()

Return a snapshot of pool health (active vs spent keys, usage) for diagnostics.

Calls _maybe_reset_day() to honor a quota-day rollover, then reports the total key count, active and daily-spent counts split per model class (embed vs generate), and the current quota day. When the async Redis client is wired it also reads each key’s gemini_key_daily_usage:<suffix>:count and includes a usage_counts map keyed by ...<suffix>; Redis failures are tolerated and simply omit those counts.

Called externally by the donate_embed_key tool (tools/donate_embed_key.py) to surface pool status to users.

Returns:

Pool health fields including total_keys, embed_active / embed_spent, generate_active / generate_spent, quota_day, and (when Redis is available) usage_counts.

Return type:

dict[str, Any]

async gemini_embed_pool.probe_all_keys()

Probe each non-spent key for both embedding and generation daily exhaustion.

Sends a minimal request to each endpoint and marks the key spent for the relevant model class on a daily 429.

Return type:

None

exception gemini_embed_pool.OpenRouterEmbedParseError

Bases: RuntimeError

OpenRouter returned HTTP 200 but the body was not the expected {"data": [{"index": int, "embedding": [..]}...]} shape.

Treated as non-retriable by the outer retry loops in openrouter_client and rag_system.openrouter_embeddings, since the underlying provider is returning a malformed success payload (typically an upstream error surfaced as a 200) and immediate retries will hit the same issue.
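The shape check behind this exception can be sketched as follows. `EmbedParseError` is a local stand-in for OpenRouterEmbedParseError so the example runs on its own, and `parse_embedding_payload` is a hypothetical helper; the count check at the end is an extra safeguard assumed for the example.

```python
class EmbedParseError(RuntimeError):
    """Local stand-in for OpenRouterEmbedParseError."""


def parse_embedding_payload(payload, expected_count):
    """Extract vectors ordered by 'index', or raise EmbedParseError."""
    try:
        rows = payload["data"]
        ordered = sorted(rows, key=lambda row: row["index"])
        vectors = [[float(x) for x in row["embedding"]] for row in ordered]
    except (KeyError, TypeError, ValueError) as exc:
        raise EmbedParseError(f"unexpected embeddings payload: {exc!r}") from exc
    if len(vectors) != expected_count:
        raise EmbedParseError(
            f"expected {expected_count} vectors, got {len(vectors)}"
        )
    return vectors


good = parse_embedding_payload(
    {"data": [
        {"index": 1, "embedding": [0.3, 0.4]},
        {"index": 0, "embedding": [0.1, 0.2]},
    ]},
    expected_count=2,
)

try:
    # An upstream error surfaced with HTTP 200 has no "data" list.
    parse_embedding_payload({"error": {"message": "upstream failure"}}, 1)
    malformed_raised = False
except EmbedParseError:
    malformed_raised = True
```

A retry loop would catch this exception type separately and stop retrying, since the same malformed body is likely to come back.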

gemini_embed_pool.get_openrouter_api_key()

Return the OpenRouter API key for the embed fallback, from env or the default.

Prefers OPENROUTER_API_KEY, then the legacy API_KEY env var, and finally _DEFAULT_OPENROUTER_KEY so the OpenRouter embed path works out-of-the-box. The result becomes the Bearer token for the embeddings endpoint.

Called within this module by openrouter_embed_batch() and openrouter_embed_batch_sync() when no explicit api_key is passed.

Returns:

The resolved OpenRouter key (effectively always non-None given the baked-in default).

Return type:

str | None
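The lookup order described above amounts to a short precedence chain. In this sketch the placeholder default and the name `resolve_openrouter_key` are hypothetical, and treating an empty env value as unset is an assumption.

```python
import os

# Hypothetical placeholder; stands in for _DEFAULT_OPENROUTER_KEY.
_PLACEHOLDER_DEFAULT = "sk-or-placeholder"


def resolve_openrouter_key(env=os.environ, default=_PLACEHOLDER_DEFAULT):
    """Prefer OPENROUTER_API_KEY, then the legacy API_KEY, then the default."""
    # `or` treats empty strings as unset, which is an assumption here.
    return env.get("OPENROUTER_API_KEY") or env.get("API_KEY") or default


preferred = resolve_openrouter_key({"OPENROUTER_API_KEY": "new", "API_KEY": "old"})
legacy = resolve_openrouter_key({"API_KEY": "old"})
fallback = resolve_openrouter_key({})
```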

async gemini_embed_pool.openrouter_embed_batch(texts, *, model='google/gemini-embedding-001', api_key=None, dimensions=3072)

Embed texts via the OpenRouter /embeddings endpoint (async).

Retries on transient network failures and retriable HTTP codes (429, 5xx) until success or OPENROUTER_EMBED_MAX_ATTEMPTS is exhausted.

Empty or whitespace-only strings are not sent to the API; those positions receive zero vectors of length dimensions.

Return type:

list[list[float]]
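The blank-input handling can be sketched independently of any HTTP call. `embed_skipping_blanks` is a hypothetical helper and `embed_fn` is an injected stand-in for the real request; the length check is an extra safeguard assumed for the example.

```python
def embed_skipping_blanks(texts, embed_fn, dimensions):
    """Embed only non-blank texts; blank positions get zero vectors."""
    keep = [i for i, text in enumerate(texts) if text and text.strip()]
    vectors = embed_fn([texts[i] for i in keep]) if keep else []
    if len(vectors) != len(keep):
        raise ValueError("embed_fn returned an unexpected number of vectors")
    # Start from all-zero vectors, then fill in the positions that were sent.
    out = [[0.0] * dimensions for _ in texts]
    for position, vector in zip(keep, vectors):
        out[position] = vector
    return out


sent_batches = []


def fake_embed(batch):
    """Record what was sent and return a vector of the text length."""
    sent_batches.append(list(batch))
    return [[float(len(text))] * 3 for text in batch]


result = embed_skipping_blanks(["hi", "", "   ", "abc"], fake_embed, dimensions=3)
```

Only "hi" and "abc" reach `fake_embed`, yet `result` still has one vector per input in the original order.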

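Parameters:
  • texts

  • model – defaults to 'google/gemini-embedding-001'

  • api_key – defaults to None

  • dimensions – defaults to 3072

gemini_embed_pool.openrouter_embed_batch_sync(texts, *, model='google/gemini-embedding-001', api_key=None, dimensions=3072)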

Embed texts via the OpenRouter /embeddings endpoint (sync).

Synchronous version for callers that cannot use await (e.g. ChromaDB embedding functions). Same retry policy as openrouter_embed_batch().

Empty or whitespace-only strings are not sent to the API; those positions receive zero vectors of length dimensions.

Return type:

list[list[float]]

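Parameters:
  • texts

  • model – defaults to 'google/gemini-embedding-001'

  • api_key – defaults to None

  • dimensions – defaults to 3072

async gemini_embed_pool.gemini_embed_paid_fallback(texts, *, model='google/gemini-embedding-001', dimensions=3072, task_type=None)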

Last-resort embedding via the paid tier-3 Gemini key (async).

Single attempt, no retries — caller decides what to do on failure. Records key usage in Redis (when wired) and marks the paid key daily-spent for embeddings on a daily 429.

Empty/whitespace-only inputs receive zero vectors of length dimensions.

Raises RuntimeError when no paid key is configured, when the call returns a non-2xx status, or when the paid key returns a 429.

Return type:

list[list[float]]

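Parameters:
  • texts

  • model – defaults to 'google/gemini-embedding-001'

  • dimensions – defaults to 3072

  • task_type – defaults to None

gemini_embed_pool.gemini_embed_paid_fallback_sync(texts, *, model='google/gemini-embedding-001', dimensions=3072, task_type=None)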

Sync mirror of gemini_embed_paid_fallback().

Used by the ChromaDB sync embedding paths. Daily-quota tracking in Redis is best-effort (no-op when the sync Redis client is unavailable), since the paid key is only hit on the slow last-resort path.

Return type:

list[list[float]]

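Parameters:
  • texts

  • model – defaults to 'google/gemini-embedding-001'

  • dimensions – defaults to 3072

  • task_type – defaults to None

gemini_embed_pool.is_openrouter_only()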

Return the in-memory OpenRouter-only flag without touching Redis.

A cheap, synchronous read of the process-local _openrouter_only state set by set_openrouter_only() / cleared by clear_openrouter_only(). Unlike check_openrouter_only(), it never consults Redis, so it cannot observe the flag’s TTL expiry or activations from other processes.

No in-repo callers were found outside this module; it serves as a public no-I/O accessor for the circuit-breaker state.

Returns:

True if OpenRouter-only mode is currently flagged in this process.

Return type:

bool

async gemini_embed_pool.check_openrouter_only()

Return whether OpenRouter-only mode is active.

When Redis is wired (init_quota_tracking), reads the TTL key so the mode ends when the Redis key expires or is cleared. When Redis is not wired, relies only on the in-memory state set by set_openrouter_only().

Return type:

bool

gemini_embed_pool.check_openrouter_only_sync()

Sync variant: check Redis before each embedding call.

Used by SyncOpenRouterEmbeddings (ChromaDB path). Lazily creates a sync Redis client from config on first use.

Return type:

bool

async gemini_embed_pool.set_openrouter_only()

Activate OpenRouter-only mode for embedding calls.

Sets both the in-memory flag and a Redis key with a TTL of _OPENROUTER_ONLY_TTL (30 minutes).

Return type:

None
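The expiry behaviour of the pinned mode can be modelled in-process. The sketch below is an analogue only: the module keeps this state in a Redis key with a TTL, whereas `TtlFlag` (a hypothetical class) uses an injected clock so the 30-minute window can be exercised without waiting.

```python
import time


class TtlFlag:
    """In-process analogue of a Redis key with a TTL."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at = None

    def set(self):
        # Setting again restarts the window, like re-writing a key with a TTL.
        self._expires_at = self._clock() + self._ttl

    def clear(self):
        self._expires_at = None

    def is_set(self):
        return self._expires_at is not None and self._clock() < self._expires_at


fake_now = [0.0]
flag = TtlFlag(ttl_seconds=30 * 60, clock=lambda: fake_now[0])

flag.set()
active_at_start = flag.is_set()
fake_now[0] = 29 * 60
active_late = flag.is_set()
fake_now[0] = 30 * 60
active_after_ttl = flag.is_set()
```

A purely in-memory boolean, by contrast, stays set until something clears it, which is the difference between is_openrouter_only() and check_openrouter_only() noted on this page.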

async gemini_embed_pool.clear_openrouter_only()

Deactivate OpenRouter-only mode, restoring the free Gemini pool as primary.

Clears the in-memory _openrouter_only flag and deletes the embed:openrouter_only Redis key (when the async client is wired) so the embed cascade resumes using the free pool first. Acts as the manual / auto counterpart to set_openrouter_only(); Redis errors are swallowed.

Called within this module by _auto_clear_openrouter_only_on_parse_error(), and externally by the embedding-refresh jobs classifiers/update_tool_embeddings.py and classifiers/update_changed_tool_embeddings.py at the end of a run.

Return type:

None

gemini_embed_pool.clear_openrouter_only_sync()

Sync variant: deactivate OpenRouter-only mode.

Mirrors clear_openrouter_only() for code paths that cannot await, using the same sync Redis client that check_openrouter_only_sync() initializes on first use.

Return type:

None

async gemini_embed_pool.embed_batch_via_gemini(texts, model='google/gemini-embedding-001', *, chunk_size=50)

Embed a batch of texts via the native Gemini API using the shared key pool.

Returns one embedding vector per input text, in order. Empty or whitespace-only texts are replaced with zero vectors. Retries on transient errors.

Cascade on failure (after PAID_KEY_FALLBACK_THRESHOLD consecutive non-daily 429s on the free pool):

  1. OpenRouter via openrouter_embed_batch() (and pin openrouter_only for the next 30 minutes).

  2. Paid tier-3 Gemini key via gemini_embed_paid_fallback() (last resort).

When openrouter_only is already pinned, OpenRouter is tried first; if it fails on a specific batch, the paid Gemini key is tried before raising.

Return type:

list[list[float]]
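The chunk_size parameter suggests the input is split into fixed-size chunks whose results are concatenated in order. A minimal sketch of that pattern, assuming sequential processing; `chunked`, `embed_in_chunks` and the injected `embed_chunk` callable are hypothetical names.

```python
def chunked(items, chunk_size=50):
    """Yield consecutive slices of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def embed_in_chunks(texts, embed_chunk, chunk_size=50):
    """Embed chunk by chunk, preserving the input order in the output."""
    vectors = []
    for chunk in chunked(texts, chunk_size):
        vectors.extend(embed_chunk(chunk))
    return vectors


chunk_sizes = []


def fake_embed_chunk(chunk):
    """Record the chunk length and return one tiny vector per text."""
    chunk_sizes.append(len(chunk))
    return [[float(text)] for text in chunk]


texts = [str(n) for n in range(120)]
vectors = embed_in_chunks(texts, fake_embed_chunk, chunk_size=50)
```

With 120 inputs and the default chunk size, three requests are made (50, 50 and 20 texts) and the vectors come back in input order.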

Parameters:
  • texts

  • model – defaults to 'google/gemini-embedding-001'

  • chunk_size – defaults to 50

async gemini_embed_pool.batch_check_keys_usage(redis_client, api_keys)

Fetch the daily usage counts for many API keys in one pipelined Redis MGET.

Reads spent:api_key:<suffix> for every supplied key (keyed by the last 8 characters for privacy) in a single round trip and maps each full key to its integer count, defaulting missing entries to 0. Note this uses a different key namespace from the gemini_key_daily_usage:* counters written by record_key_usage(), so it reflects a separately maintained tally.

No production callers were found in the repo; it is currently exercised by tests/test_context_whitelisting.py.

Parameters:
  • redis_client (Any) – An async Redis client supporting mget.

  • api_keys (list[str]) – The full API keys to look up.

Returns:

A mapping from each input API key to its usage count.

Return type:

dict[str, int]
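The lookup described for batch_check_keys_usage() can be sketched with a tiny in-memory stand-in for the Redis client. `FakeRedis` and `usage_counts` are hypothetical names; only the spent:api_key:<suffix> key pattern, the 8-character suffix, and the default of 0 come from the description above.

```python
import asyncio


class FakeRedis:
    """In-memory stand-in exposing only an async mget()."""

    def __init__(self, data):
        self._data = data

    async def mget(self, keys):
        return [self._data.get(key) for key in keys]


async def usage_counts(redis_client, api_keys):
    """Map each full API key to its count using a single MGET."""
    if not api_keys:
        return {}
    redis_keys = [f"spent:api_key:{key[-8:]}" for key in api_keys]
    values = await redis_client.mget(redis_keys)
    return {
        key: int(value) if value is not None else 0
        for key, value in zip(api_keys, values)
    }


client = FakeRedis({"spent:api_key:11111111": b"7"})
keys = ["demo-key-11111111", "demo-key-22222222"]
counts = asyncio.run(usage_counts(client, keys))
```

Values are converted with int(), which accepts both the bytes a Redis client returns by default and already-decoded strings.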