gemini_embed_pool
Shared Gemini API key pool for rate-limit distribution.
All embedding calls use next_gemini_embed_key() and all flash-lite generation calls use next_gemini_flash_key() for round-robin key selection. Both accessors draw from the same underlying key pool but maintain independent rotation cycles so they don’t interfere with each other.
Supports GEMINI_EMBED_KEY_POOL env var (comma-separated keys) with fallback to the default hardcoded pool.
Embedding fallback cascade
When the free pool starts 429-ing (consecutive failures exceed PAID_KEY_FALLBACK_THRESHOLD), the embed call sites cascade in this order:
1. Free-tier pool (round-robin via next_gemini_embed_key()).
2. OpenRouter (openrouter_embed_batch() / ..._sync), engaged via set_openrouter_only() so subsequent calls go straight here.
3. Paid tier-3 Gemini key (gemini_embed_paid_fallback() / ..._sync), used as the absolute last resort when both the free pool and OpenRouter have failed. Its key is returned by get_paid_fallback_key() (env override GEMINI_EMBED_PAID_KEY, default _DEFAULT_PAID_KEY).
When openrouter_only is pinned, callers must still fall through to the
paid-key helper if OpenRouter fails on a specific call before raising.
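The cascade order can be sketched as follows. This is a minimal illustration and not the module's implementation: the three provider callables are hypothetical stand-ins supplied by the caller, and only the tier ordering and the fall-through to the paid key when openrouter_only is pinned are taken from the description above.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
EmbedFn = Callable[[Sequence[str]], List[Vector]]


def embed_with_cascade(
    texts: Sequence[str],
    free_pool: EmbedFn,      # hypothetical: free-tier round-robin call
    openrouter: EmbedFn,     # hypothetical: OpenRouter embed call
    paid: EmbedFn,           # hypothetical: paid tier-3 Gemini call
    openrouter_only: bool = False,
) -> List[Vector]:
    """Try each tier in order and raise only after the last tier fails."""
    # When pinned, the free pool is skipped but the paid key is still tried.
    tiers = [openrouter, paid] if openrouter_only else [free_pool, openrouter, paid]
    last_error: Optional[Exception] = None
    for tier in tiers:
        try:
            return tier(texts)
        except Exception as exc:  # a real call site would catch narrower errors
            last_error = exc
    raise RuntimeError("all embedding tiers failed") from last_error
```

Passing the tiers in as callables keeps the ordering logic separate from any HTTP details, which also makes the fall-through easy to exercise with fakes.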
OpenRouter embed fallback (openrouter_embed_batch / openrouter_embed_batch_sync)
retries transient network errors and HTTP 429 / 5xx with exponential backoff.
Tune with env: OPENROUTER_EMBED_MAX_ATTEMPTS (default 24),
OPENROUTER_EMBED_RETRY_BASE_DELAY, OPENROUTER_EMBED_RETRY_MAX_DELAY.
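As a rough illustration of capped exponential backoff, the sketch below derives a delay schedule from the three environment variables. The variable names and the default of 24 attempts come from the text above; the doubling factor and the two delay defaults (1.0 and 30.0 seconds) are assumptions, since the source does not state them.

```python
import os
from typing import List


def backoff_delays(max_attempts: int, base_delay: float, max_delay: float) -> List[float]:
    """Delay before each retry: base_delay * 2**n, capped at max_delay."""
    # One fewer delay than attempts: there is no sleep after the final attempt.
    return [min(max_delay, base_delay * (2 ** n)) for n in range(max_attempts - 1)]


# Env names come from the documentation; the two delay defaults are assumed values.
max_attempts = int(os.environ.get("OPENROUTER_EMBED_MAX_ATTEMPTS", "24"))
base_delay = float(os.environ.get("OPENROUTER_EMBED_RETRY_BASE_DELAY", "1.0"))
max_delay = float(os.environ.get("OPENROUTER_EMBED_RETRY_MAX_DELAY", "30.0"))

schedule = backoff_delays(max_attempts, base_delay, max_delay)
```

With a cap in place, most of a long retry budget is spent at the maximum delay, so the cap matters more than the base delay for total wait time.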
Daily quota tracking
Quotas are tracked per model class ("embed" vs "generate").
Embedding RPD and generation RPD are separate quotas on the same key, so a
key spent for embeddings can still serve flash-lite generation and vice versa.
Each key’s usage is tracked in Redis and in two in-memory spent sets (one per model class). Keys that
receive a daily-quota 429 (PerDay in the quotaId) are excluded from
rotation for the relevant model class until midnight Pacific Time. A
background probe every 2 hours detects keys exhausted by external usage.
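The per-class bookkeeping can be pictured with the sketch below. It is an assumption-laden illustration: the class name SpentKeys and the injected today() hook are hypothetical, and the hook stands in for "current date in Pacific Time" so that the rollover logic can be shown without any time-zone dependency.

```python
import threading
from datetime import date
from typing import Callable, Dict, Set


class SpentKeys:
    """Per-model-class daily-spent sets that clear when the quota day changes."""

    def __init__(self, today: Callable[[], date]) -> None:
        self._today = today  # hypothetical hook: returns the current Pacific date
        self._day = today()
        self._lock = threading.Lock()
        self._spent: Dict[str, Set[str]] = {"embed": set(), "generate": set()}

    def _maybe_reset(self) -> None:
        current = self._today()
        if current != self._day:  # a new quota day started: forget old marks
            self._day = current
            for keys in self._spent.values():
                keys.clear()

    def mark(self, api_key: str, model_class: str) -> None:
        with self._lock:
            self._maybe_reset()
            self._spent[model_class].add(api_key)

    def is_spent(self, api_key: str, model_class: str) -> bool:
        with self._lock:
            self._maybe_reset()
            return api_key in self._spent[model_class]
```

Keeping one set per model class is what lets a key that is exhausted for embeddings keep serving generation calls.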
- gemini_embed_pool.get_paid_fallback_key()
Return the paid tier-3 Gemini API key used as the absolute last-resort fallback.
Reads the GEMINI_EMBED_PAID_KEY env override and falls back to _DEFAULT_PAID_KEY. This key is only reached once the free pool and OpenRouter have both failed, and is also handed out by _next_active_key() when every free key is daily-spent.
Called within this module by _next_active_key(), gemini_embed_paid_fallback(), and gemini_embed_paid_fallback_sync(), and externally by gemini_kg_bulk_client.py and classifiers/build_tool_index.py for their own paid-key fallbacks.
- async gemini_embed_pool.reload_pool()
Merge donated keys from Redis into the live pool and rebuild cycles.
Returns the new total pool size. Safe to call multiple times.
- gemini_embed_pool.init_quota_tracking(redis_client)
Provide the async Redis client for daily quota persistence.
Must be called once at service startup before any embedding calls are made. Call reload_pool() afterwards (from an async context) to merge donated keys.
- gemini_embed_pool.next_gemini_embed_key()
Thread-safe round-robin selection from the Gemini embedding key pool.
Skips keys that have been marked as daily-spent for embeddings.
- gemini_embed_pool.next_gemini_flash_key()
Thread-safe round-robin selection for Gemini flash-lite generation calls.
Uses the same key pool as embeddings but an independent cycle so embed and generation rotations don’t interfere. Skips keys that have been marked as daily-spent for generation.
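A minimal sketch of two independent round-robin cycles over one shared pool, each skipping its own spent set. The class name KeyRotation and the placeholder keys are hypothetical. The sketch returns None when every key is spent, whereas the module is described as handing out the paid key in that situation.

```python
import threading
from typing import List, Optional, Set


class KeyRotation:
    """Round-robin over a shared key list, skipping keys in a spent set."""

    def __init__(self, keys: List[str], spent: Set[str]) -> None:
        self._keys = keys
        self._spent = spent
        self._index = 0
        self._lock = threading.Lock()

    def next_key(self) -> Optional[str]:
        with self._lock:
            # Visit each key at most once per call so the loop always terminates.
            for _ in range(len(self._keys)):
                key = self._keys[self._index]
                self._index = (self._index + 1) % len(self._keys)
                if key not in self._spent:
                    return key
            return None  # every key is spent for this model class


pool = ["key-a", "key-b", "key-c"]  # placeholder keys
spent_embed: Set[str] = {"key-b"}
spent_generate: Set[str] = set()

# Same pool, separate cursors: advancing one rotation never moves the other.
embed_rotation = KeyRotation(pool, spent_embed)
flash_rotation = KeyRotation(pool, spent_generate)
```

Because each rotation owns its cursor and lock, heavy embedding traffic does not skew which key the next generation call receives.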
- gemini_embed_pool.is_daily_quota_429(resp)
Return True if resp is a 429 caused by a daily (RPD) quota limit.
Parses the structured QuotaFailure violations in the response body and looks for a quotaId containing PerDay.
- Parameters:
resp (httpx.Response)
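The check can be sketched on an already-decoded response (status code plus JSON body) rather than on an httpx.Response. The sketch assumes the standard Google API error layout, where error.details contains a QuotaFailure entry with a violations list. The same parse can also report the model class from quotaDimensions.model; the rule used here to tell embedding models from generation models (the substring "embedding" in the model name) is an assumption for illustration only.

```python
from typing import Any, Dict, Iterator, Optional


def _quota_violations(body: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every violation from QuotaFailure entries in an error body."""
    for detail in body.get("error", {}).get("details", []):
        if str(detail.get("@type", "")).endswith("QuotaFailure"):
            yield from detail.get("violations", [])


def daily_quota_model_class(status_code: int, body: Dict[str, Any]) -> Optional[str]:
    """Return "embed" or "generate" for a daily-quota 429, else None."""
    if status_code != 429:
        return None
    for violation in _quota_violations(body):
        if "PerDay" in violation.get("quotaId", ""):
            model = violation.get("quotaDimensions", {}).get("model", "")
            # Assumption: embedding models are recognisable by their name.
            return "embed" if "embedding" in model else "generate"
    return None


def is_daily_quota(status_code: int, body: Dict[str, Any]) -> bool:
    """True when the response is a 429 caused by a per-day quota."""
    return daily_quota_model_class(status_code, body) is not None
```

Per-minute 429s fall through to None, which is what allows a caller to retry them on another key instead of retiring the key for the day.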
- gemini_embed_pool.is_daily_quota_429_for_model(resp)
Identify which model class a daily-quota 429 belongs to.
Returns "embed" or "generate" based on the quotaDimensions.model field, or None if the response is not a daily 429.
- async gemini_embed_pool.record_key_usage(api_key)
Increment the daily request counter for api_key in Redis (best-effort).
Bumps gemini_key_daily_usage:<suffix>:count and (re)sets its TTL to the next midnight Pacific via a non-transactional pipeline, giving the diagnostics in get_pool_status() a rough per-key call count that self-expires daily. No-ops when the async Redis client is unwired, and swallows any Redis error rather than disrupting the embed call.
Called within this module by probe_all_keys() and gemini_embed_paid_fallback(), and externally by the embed transport in openrouter_client/transport.py after each Gemini POST.
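The "expire at the next midnight" part can be sketched independently of Redis. The helpers below are hypothetical and work on any timezone-aware datetime; the assumption is that the caller passes the current time in Pacific Time (for example via zoneinfo.ZoneInfo("America/Los_Angeles")). Whether the module sets a relative TTL or an absolute expiry is not stated, so the absolute timestamp here is only one possible approach.

```python
from datetime import datetime, timedelta


def next_midnight(now: datetime) -> datetime:
    """Return the first midnight strictly after `now`, in now's own timezone."""
    tomorrow = (now + timedelta(days=1)).date()
    return datetime(tomorrow.year, tomorrow.month, tomorrow.day, tzinfo=now.tzinfo)


def expiry_timestamp(now: datetime) -> int:
    """Unix timestamp usable as an absolute expiry for the daily counter."""
    return int(next_midnight(now).timestamp())
```

Re-applying the expiry on every increment keeps the counter self-cleaning even if the key was created without a TTL.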
- async gemini_embed_pool.mark_key_daily_spent(api_key, model_class='embed')
Mark api_key as daily-spent for model_class.
Updates both the in-memory set and the Redis flag.
- async gemini_embed_pool.sync_spent_keys_from_redis()
Rebuild both in-memory daily-spent sets from the per-key flags in Redis.
Lets a worker pick up exhaustion decisions made by other processes: for every pooled key it reads gemini_key_daily_usage:<suffix>:spent:embed and ...:spent:generate (plus the legacy suffix-less ...:spent key, which counts for both classes) and replaces _spent_keys_embed / _spent_keys_generate accordingly under _spent_keys_lock. Calls _maybe_reset_day() first so a quota-day rollover is honored, and no-ops when the async Redis client is unwired; Redis errors are swallowed.
Called within this module at the start of probe_all_keys(); no external callers were found.
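The rebuild step can be sketched with a plain dictionary standing in for Redis. The key names follow the description above; treating <suffix> as the last 8 characters of the API key is an assumption, and the function name rebuild_spent_sets is hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

PREFIX = "gemini_key_daily_usage"


def rebuild_spent_sets(
    pool: Iterable[str], flags: Dict[str, str]
) -> Tuple[Set[str], Set[str]]:
    """Return (spent_embed, spent_generate) rebuilt from per-key flag entries."""
    spent_embed: Set[str] = set()
    spent_generate: Set[str] = set()
    for api_key in pool:
        suffix = api_key[-8:]  # assumption: suffix is the key's last 8 characters
        base = f"{PREFIX}:{suffix}:spent"
        legacy = base in flags  # the legacy flag counts for both model classes
        if legacy or f"{base}:embed" in flags:
            spent_embed.add(api_key)
        if legacy or f"{base}:generate" in flags:
            spent_generate.add(api_key)
    return spent_embed, spent_generate
```

Building fresh sets and swapping them in, rather than mutating in place, means a flag that has expired in Redis also disappears from memory on the next sync.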
- async gemini_embed_pool.get_pool_status()
Return a snapshot of pool health (active vs spent keys, usage) for diagnostics.
Calls _maybe_reset_day() to honor a quota-day rollover, then reports the total key count, active and daily-spent counts split per model class (embed vs generate), and the current quota day. When the async Redis client is wired it also reads each key’s gemini_key_daily_usage:<suffix>:count and includes a ...<suffix>-keyed usage_counts map; Redis failures are tolerated and simply omit those counts.
Called externally by the donate_embed_key tool (tools/donate_embed_key.py) to surface pool status to users.
- async gemini_embed_pool.probe_all_keys()
Probe each non-spent key for both embedding and generation daily exhaustion.
Sends a minimal request to each endpoint and marks the key spent for the relevant model class on a daily 429.
- exception gemini_embed_pool.OpenRouterEmbedParseError
Bases: RuntimeError
OpenRouter returned HTTP 200 but the body was not the expected {"data": [{"index": int, "embedding": [..]}...]} shape.
Treated as non-retriable by the outer retry loops in openrouter_client and rag_system.openrouter_embeddings, since the underlying provider is returning a malformed success payload (typically an upstream error surfaced as a 200) and immediate retries will hit the same issue.
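A shape check of this kind might look like the sketch below. EmbedParseError is a hypothetical stand-in for OpenRouterEmbedParseError, and the exact validation rules (item count must match, indices must be unique and in range) are assumptions rather than the module's documented behaviour.

```python
from typing import Any, List


class EmbedParseError(RuntimeError):
    """Hypothetical stand-in for OpenRouterEmbedParseError."""


def parse_embeddings(body: Any, expected: int) -> List[List[float]]:
    """Validate the response shape and return vectors ordered by index."""
    data = body.get("data") if isinstance(body, dict) else None
    if not isinstance(data, list) or len(data) != expected:
        raise EmbedParseError(f"expected {expected} items under 'data'")
    vectors: List[Any] = [None] * expected
    for item in data:
        if not isinstance(item, dict):
            raise EmbedParseError("item is not an object")
        index, embedding = item.get("index"), item.get("embedding")
        if not isinstance(index, int) or not 0 <= index < expected:
            raise EmbedParseError(f"bad index: {index!r}")
        if not isinstance(embedding, list):
            raise EmbedParseError("embedding is not a list")
        vectors[index] = embedding
    if any(v is None for v in vectors):
        raise EmbedParseError("duplicate or missing index")
    return vectors
```

Raising a dedicated exception type lets an outer retry loop distinguish a malformed 200 from a transient failure and stop retrying immediately.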
- gemini_embed_pool.get_openrouter_api_key()
Return the OpenRouter API key for the embed fallback, from env or the default.
Prefers OPENROUTER_API_KEY, then the legacy API_KEY env var, and finally _DEFAULT_OPENROUTER_KEY so the OpenRouter embed path works out-of-the-box. The result becomes the Bearer token for the embeddings endpoint.
Called within this module by openrouter_embed_batch() and openrouter_embed_batch_sync() when no explicit api_key is passed.
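The precedence can be sketched as a simple ordered lookup. The function name resolve_openrouter_key and the placeholder default are hypothetical, and treating an empty or whitespace-only variable as unset is an assumption.

```python
import os
from typing import Mapping, Optional

_DEFAULT_KEY = "placeholder-default-key"  # stand-in, not the module's real default


def resolve_openrouter_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the first non-empty value among the env vars, else the default."""
    env = os.environ if env is None else env
    for name in ("OPENROUTER_API_KEY", "API_KEY"):
        value = env.get(name, "").strip()
        if value:
            return value
    return _DEFAULT_KEY
```

Accepting the environment as a parameter keeps the lookup order testable without mutating os.environ.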
- async gemini_embed_pool.openrouter_embed_batch(texts, *, model='google/gemini-embedding-001', api_key=None, dimensions=3072)
Embed texts via the OpenRouter /embeddings endpoint (async).
Retries on transient network failures and retriable HTTP codes (429, 5xx) until success or OPENROUTER_EMBED_MAX_ATTEMPTS is exhausted.
Empty or whitespace-only strings are not sent to the API; those positions receive zero vectors of length dimensions.
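The blank-input handling can be sketched as follows. The embed callable is a hypothetical stand-in for the provider request; only the behaviour of skipping blank strings and filling their positions with zero vectors is taken from the description.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def embed_skipping_blanks(
    texts: Sequence[str],
    embed: Callable[[List[str]], List[Vector]],  # hypothetical provider call
    dimensions: int,
) -> List[Vector]:
    """Embed only non-blank texts; blank positions get zero vectors."""
    kept = [(i, t) for i, t in enumerate(texts) if t.strip()]
    # Start with zero vectors everywhere, then overwrite the non-blank slots.
    results: List[Vector] = [[0.0] * dimensions for _ in texts]
    if kept:
        vectors = embed([t for _, t in kept])
        for (i, _), vector in zip(kept, vectors):
            results[i] = vector
    return results
```

The output always has one vector per input position, so callers can zip it against their original texts without tracking which entries were skipped.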
- gemini_embed_pool.openrouter_embed_batch_sync(texts, *, model='google/gemini-embedding-001', api_key=None, dimensions=3072)
Embed texts via the OpenRouter /embeddings endpoint (sync).
Synchronous version for callers that cannot use await (e.g. ChromaDB embedding functions). Same retry policy as openrouter_embed_batch().
Empty or whitespace-only strings are not sent to the API; those positions receive zero vectors of length dimensions.
- async gemini_embed_pool.gemini_embed_paid_fallback(texts, *, model='google/gemini-embedding-001', dimensions=3072, task_type=None)
Last-resort embedding via the paid tier-3 Gemini key (async).
Single attempt, no retries; the caller decides what to do on failure. Records key usage in Redis (when wired) and marks the paid key daily-spent for embeddings on a daily 429.
Empty/whitespace-only inputs receive zero vectors of length dimensions.
Raises RuntimeError when no paid key is configured, when the call fails non-2xx, or when the paid key 429s.
- gemini_embed_pool.gemini_embed_paid_fallback_sync(texts, *, model='google/gemini-embedding-001', dimensions=3072, task_type=None)
Sync mirror of gemini_embed_paid_fallback().
Used by the ChromaDB sync embedding paths. Daily-quota tracking in Redis is best-effort (no-op when the sync Redis client is unavailable), since the paid key is only hit on the slow last-resort path.
- gemini_embed_pool.is_openrouter_only()
Return the in-memory OpenRouter-only flag without touching Redis.
A cheap, synchronous read of the process-local _openrouter_only state set by set_openrouter_only() / cleared by clear_openrouter_only(). Unlike check_openrouter_only(), it never consults Redis, so it cannot observe the flag’s TTL expiry or activations from other processes.
No in-repo callers were found outside this module; it serves as a public no-I/O accessor for the circuit-breaker state.
- Returns:
True if OpenRouter-only mode is currently flagged in this process.
- async gemini_embed_pool.check_openrouter_only()
Return whether OpenRouter-only mode is active.
When Redis is wired (init_quota_tracking), reads the TTL key so the mode expires when Redis lapses or is cleared. When Redis is not wired, relies on in-memory state set by set_openrouter_only() only.
- gemini_embed_pool.check_openrouter_only_sync()
Sync variant: check Redis before each embedding call.
Used by SyncOpenRouterEmbeddings (ChromaDB path). Lazy-creates a sync Redis client from config on first use.
- async gemini_embed_pool.set_openrouter_only()
Activate OpenRouter-only mode for embedding calls.
Sets both the in-memory flag and a Redis key with a TTL of _OPENROUTER_ONLY_TTL (30 minutes).
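The self-expiring behaviour that the Redis TTL provides can be illustrated in-process. This is not how the module stores the flag (it relies on a Redis key for expiry); ExpiringFlag is a hypothetical class with an injectable clock that mimics the same set, lapse and clear semantics.

```python
import time
from typing import Callable, Optional


class ExpiringFlag:
    """A boolean flag that turns itself off after ttl_seconds."""

    def __init__(
        self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at: Optional[float] = None

    def set(self) -> None:
        # Setting again while active simply restarts the countdown.
        self._expires_at = self._clock() + self._ttl

    def clear(self) -> None:
        self._expires_at = None

    def is_active(self) -> bool:
        if self._expires_at is None:
            return False
        if self._clock() >= self._expires_at:
            self._expires_at = None  # lapsed: behave as if cleared
            return False
        return True
```

A time-limited pin acts like a circuit breaker: traffic avoids the free pool for a while, then automatically probes it again once the flag lapses.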
- async gemini_embed_pool.clear_openrouter_only()
Deactivate OpenRouter-only mode, restoring the free Gemini pool as primary.
Clears the in-memory _openrouter_only flag and deletes the embed:openrouter_only Redis key (when the async client is wired) so the embed cascade resumes using the free pool first. Acts as the manual / auto counterpart to set_openrouter_only(); Redis errors are swallowed.
Called within this module by _auto_clear_openrouter_only_on_parse_error(), and externally by the embedding-refresh jobs classifiers/update_tool_embeddings.py and classifiers/update_changed_tool_embeddings.py at the end of a run.
- gemini_embed_pool.clear_openrouter_only_sync()
Sync variant: deactivate OpenRouter-only mode.
Mirrors clear_openrouter_only() for code paths that cannot await, using the same sync Redis client that check_openrouter_only_sync() initializes on first use.
- async gemini_embed_pool.embed_batch_via_gemini(texts, model='google/gemini-embedding-001', *, chunk_size=50)
Embed a batch of texts via the native Gemini API using the shared key pool.
Returns one embedding vector per input text, in order. Empty or whitespace-only texts are replaced with zero vectors. Retries on transient errors.
Cascade on failure (after PAID_KEY_FALLBACK_THRESHOLD consecutive non-daily 429s on the free pool):
1. OpenRouter via openrouter_embed_batch() (and pin openrouter_only for the next 30 minutes).
2. Paid tier-3 Gemini key via gemini_embed_paid_fallback() (last resort).
When openrouter_only is already pinned, OpenRouter is tried first; if it fails on a specific batch, the paid Gemini key is tried before raising.
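The chunk_size parameter suggests that inputs are split into fixed-size batches before being sent. The sketch below shows one way to do that while preserving input order; the helper name chunked is hypothetical, and how the module actually chunks requests is an assumption.

```python
from typing import Iterator, List, Sequence


def chunked(texts: Sequence[str], chunk_size: int = 50) -> Iterator[List[str]]:
    """Yield consecutive slices of at most chunk_size texts, preserving order."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(texts), chunk_size):
        yield list(texts[start:start + chunk_size])
```

Concatenating the per-chunk results in the order the chunks were produced yields one vector per input text, in the original order.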
- async gemini_embed_pool.batch_check_keys_usage(redis_client, api_keys)
Fetch the daily usage counts for many API keys in one pipelined Redis MGET.
Reads spent:api_key:<suffix> for every supplied key (keyed by the last 8 characters for privacy) in a single round trip and maps each full key to its integer count, defaulting missing entries to 0. Note this uses a different key namespace from the gemini_key_daily_usage:* counters written by record_key_usage(), so it reflects a separately maintained tally.
No production callers were found in the repo; it is currently exercised by tests/test_context_whitelisting.py.
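The name construction and result mapping can be sketched without a live Redis connection. Both helper names are hypothetical; the raw values are assumed to arrive in request order as bytes, strings or None, which is how an MGET reply is commonly represented by Python clients.

```python
from typing import Dict, List, Optional, Sequence, Union

RawValue = Optional[Union[bytes, str]]


def usage_key_names(api_keys: Sequence[str]) -> List[str]:
    """Build one Redis key name per API key from its last 8 characters."""
    return [f"spent:api_key:{key[-8:]}" for key in api_keys]


def usage_from_mget(api_keys: Sequence[str], raw: Sequence[RawValue]) -> Dict[str, int]:
    """Map each full API key to its count, treating missing entries as 0."""
    return {
        key: int(value) if value is not None else 0
        for key, value in zip(api_keys, raw)
    }
```

Because only the 8-character suffix is stored, two keys that share a suffix would also share a counter, which is a trade-off of keying by suffix for privacy.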