gemini_embed_pool
Shared Gemini API key pool for rate-limit distribution.
All embedding calls use next_gemini_embed_key() and all flash-lite generation calls use next_gemini_flash_key() for round-robin key selection. Both accessors draw from the same underlying key pool but maintain independent rotation cycles so they don’t interfere with each other.
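The independent-rotation idea can be sketched as two `itertools.cycle` iterators over one list. This is a minimal illustration, not the module's actual code: `POOL`, `next_embed_key()` and `next_flash_key()` are hypothetical stand-ins for the real pool and accessors.

```python
import itertools
import threading

# Hypothetical pool; real keys would come from configuration.
POOL = ["key-a", "key-b", "key-c"]

_lock = threading.Lock()
# One iterator per call type: advancing one never moves the other.
_embed_cycle = itertools.cycle(POOL)
_flash_cycle = itertools.cycle(POOL)


def next_embed_key() -> str:
    """Advance only the embedding rotation."""
    with _lock:
        return next(_embed_cycle)


def next_flash_key() -> str:
    """Advance only the generation rotation."""
    with _lock:
        return next(_flash_cycle)
```

Because each accessor owns its iterator, a burst of embedding calls leaves the generation rotation at the position it was in before the burst.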
Supports GEMINI_EMBED_KEY_POOL env var (comma-separated keys) with fallback to the default hardcoded pool.
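A comma-separated pool variable with a fallback could be parsed roughly as follows. `load_key_pool()` and the `DEFAULT_POOL` values are assumptions made for illustration; only the variable name GEMINI_EMBED_KEY_POOL comes from the text above.

```python
import os

# Placeholder values standing in for the hardcoded default pool.
DEFAULT_POOL = ["default-key-1", "default-key-2"]


def load_key_pool(env=None) -> list:
    """Read GEMINI_EMBED_KEY_POOL, falling back to DEFAULT_POOL when unset or empty."""
    env = os.environ if env is None else env
    raw = env.get("GEMINI_EMBED_KEY_POOL", "")
    # Drop surrounding whitespace and empty entries such as "a,,b".
    keys = [k.strip() for k in raw.split(",") if k.strip()]
    return keys or list(DEFAULT_POOL)
```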
When the free pool is exhausted (429s exceed PAID_KEY_FALLBACK_THRESHOLD), callers should switch to the paid key returned by get_paid_fallback_key().
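On the caller side, the switch to the paid key might look like the sketch below. The threshold value, the `>=` comparison, and the `choose_key()` helper are all assumptions; the module only documents that a threshold named PAID_KEY_FALLBACK_THRESHOLD exists and that the paid key may be None.

```python
from typing import Callable, Optional

PAID_KEY_FALLBACK_THRESHOLD = 3  # assumed value, for illustration only


def choose_key(
    consecutive_429s: int,
    next_free_key: Callable[[], str],
    paid_key: Optional[str],
) -> str:
    """Pick the paid key once the 429 streak reaches the threshold."""
    if paid_key is not None and consecutive_429s >= PAID_KEY_FALLBACK_THRESHOLD:
        return paid_key
    # Below the threshold, or no paid key configured: keep rotating free keys.
    return next_free_key()
```

In real use, `next_free_key` would be next_gemini_embed_key and `paid_key` the result of get_paid_fallback_key().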
Daily quota tracking
Quotas are tracked per model class ("embed" vs "generate").
Embedding RPD and generation RPD are separate quotas on the same key, so a
key spent for embeddings can still serve flash-lite generation and vice versa.
Each key’s usage is tracked in Redis and two in-memory spent sets. Keys that
receive a daily-quota 429 (PerDay in the quotaId) are excluded from
rotation for the relevant model class until midnight Pacific Time. A
background probe every 2 hours detects keys exhausted by external usage.
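The "until midnight Pacific Time" window can be computed with the standard library. The helper below is a hypothetical sketch; one plausible use, assumed here rather than documented, is as the expiry of a per-key spent flag.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")


def seconds_until_pacific_midnight(now: datetime) -> int:
    """Seconds from `now` (timezone-aware) to the next 00:00 in Pacific Time."""
    local = now.astimezone(PACIFIC)
    next_midnight = datetime.combine(
        local.date() + timedelta(days=1), datetime.min.time(), tzinfo=PACIFIC
    )
    # Subtract in UTC so daylight-saving transitions are counted correctly.
    delta = next_midnight.astimezone(timezone.utc) - now.astimezone(timezone.utc)
    return int(delta.total_seconds())
```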
- gemini_embed_pool.get_paid_fallback_key()
Return the paid Gemini API key for embedding fallback, or None.
- async gemini_embed_pool.reload_pool()
Merge donated keys from Redis into the live pool and rebuild cycles.
Returns the new total pool size. Safe to call multiple times.
- Return type:
- gemini_embed_pool.init_quota_tracking(redis_client)
Provide the async Redis client for daily quota persistence.
Must be called once at startup (from main.py) before any embedding calls are made. Call reload_pool() afterwards (from an async context) to merge donated keys.
- gemini_embed_pool.next_gemini_embed_key()
Thread-safe round-robin selection from the Gemini embedding key pool.
Skips keys that have been marked as daily-spent for embeddings.
- Return type:
- gemini_embed_pool.next_gemini_flash_key()
Thread-safe round-robin selection for Gemini flash-lite generation calls.
Uses the same key pool as embeddings but an independent cycle so embed and generation rotations don’t interfere. Skips keys that have been marked as daily-spent for generation.
- Return type:
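Skipping daily-spent keys during rotation can be sketched as a bounded walk around the cycle. The `KeyRotation` class is hypothetical, and returning None when every key is spent is an assumption: the behaviour of the real accessors in that case is not documented here.

```python
import itertools
import threading
from typing import Iterable, Optional


class KeyRotation:
    """Round-robin over a key list, skipping keys marked as spent."""

    def __init__(self, keys: Iterable[str]) -> None:
        self._keys = list(keys)
        self._cycle = itertools.cycle(self._keys)
        self._lock = threading.Lock()
        self.spent = set()  # keys excluded until the daily reset

    def next_key(self) -> Optional[str]:
        with self._lock:
            # Try each key at most once per call so the loop always terminates.
            for _ in range(len(self._keys)):
                key = next(self._cycle)
                if key not in self.spent:
                    return key
        return None  # assumed behaviour when every key is spent
```

Keeping one `KeyRotation` instance per model class gives each class its own cycle and its own spent set, which matches the embed/generate separation described above.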
- gemini_embed_pool.is_daily_quota_429(resp)
Return True if resp is a 429 caused by a daily (RPD) quota limit.
Parses the structured QuotaFailure violations in the response body and looks for a quotaId containing PerDay.
- Return type:
- Parameters:
resp (httpx.Response)
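The check can be illustrated on an already-decoded JSON body. The error layout used here (`error.details[]` entries whose `@type` ends in `QuotaFailure`, each with a `violations` list) is an assumption about the response shape, and `is_daily_quota_body()` is a hypothetical helper that takes a status code and a dict rather than an httpx.Response.

```python
def is_daily_quota_body(status_code: int, body: dict) -> bool:
    """Return True for a 429 whose QuotaFailure violations mention a PerDay quota."""
    if status_code != 429:
        return False
    for detail in body.get("error", {}).get("details", []):
        # Only QuotaFailure details carry quota violations.
        if not detail.get("@type", "").endswith("QuotaFailure"):
            continue
        for violation in detail.get("violations", []):
            if "PerDay" in violation.get("quotaId", ""):
                return True
    return False


# Illustrative body; the quotaId string is made up for the example.
SAMPLE_BODY = {
    "error": {
        "code": 429,
        "details": [
            {
                "@type": "type.googleapis.com/google.rpc.QuotaFailure",
                "violations": [{"quotaId": "ExampleRequestsPerDayPerProject"}],
            }
        ],
    }
}
```

A per-minute 429 carries no `PerDay` quotaId, so it falls through to `False` and the key stays in rotation.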
- gemini_embed_pool.is_daily_quota_429_for_model(resp)
Identify which model class a daily-quota 429 belongs to.
Returns "embed" or "generate" based on the quotaDimensions.model field, or None if the response is not a daily 429.
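Mapping a violation to a model class might look like this. The rule "model name contains `embedding`" is an assumed heuristic chosen for the sketch, not the documented logic, and `model_class_for_violation()` is a hypothetical helper operating on a single violation dict.

```python
from typing import Optional


def model_class_for_violation(violation: dict) -> Optional[str]:
    """Classify one quota violation as "embed" or "generate", or None if not daily."""
    if "PerDay" not in violation.get("quotaId", ""):
        return None
    model = violation.get("quotaDimensions", {}).get("model", "")
    # Assumed heuristic: embedding models have "embedding" in their name.
    return "embed" if "embedding" in model else "generate"
```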
- async gemini_embed_pool.record_key_usage(api_key)
Increment the daily request counter for api_key in Redis.
- async gemini_embed_pool.mark_key_daily_spent(api_key, model_class='embed')
Mark api_key as daily-spent for model_class.
Updates both the in-memory set and the Redis flag.
- async gemini_embed_pool.sync_spent_keys_from_redis()
Refresh both in-memory spent sets from Redis.
- Return type:
- async gemini_embed_pool.get_pool_status()
Return a summary of pool health for diagnostics / logging.
- async gemini_embed_pool.probe_all_keys()
Probe each non-spent key for both embedding and generation daily exhaustion.
Sends a minimal request to each endpoint and marks the key spent for the relevant model class on a daily 429.
- Return type:
- gemini_embed_pool.get_openrouter_api_key()
Return the OpenRouter API key from environment or default.
- async gemini_embed_pool.openrouter_embed_batch(texts, *, model='google/gemini-embedding-001', api_key=None, dimensions=3072)
Embed texts via the OpenRouter /embeddings endpoint (async).
Used as a last-resort fallback when all Gemini keys (including the paid key) are rate-limited.
- gemini_embed_pool.openrouter_embed_batch_sync(texts, *, model='google/gemini-embedding-001', api_key=None, dimensions=3072)
Embed texts via the OpenRouter /embeddings endpoint (sync).
Synchronous version for callers that cannot use await (e.g. ChromaDB embedding functions).
- gemini_embed_pool.is_openrouter_only()
Return the in-memory OpenRouter-only flag (no I/O).
- Return type:
- async gemini_embed_pool.check_openrouter_only()
Return whether OpenRouter-only mode is active.
Checks Redis before each embedding call, so the mode ends as soon as the Redis key's TTL lapses or the mode is manually disabled.
- Return type:
- gemini_embed_pool.check_openrouter_only_sync()
Sync variant: check Redis before each embedding call.
Used by SyncOpenRouterEmbeddings (ChromaDB path). Lazy-creates a sync Redis client from config on first use.
- Return type:
- async gemini_embed_pool.set_openrouter_only()
Activate OpenRouter-only mode for embedding calls.
Sets both the in-memory flag and a Redis key with a 4-hour TTL.
- Return type:
- async gemini_embed_pool.clear_openrouter_only()
Deactivate OpenRouter-only mode (manual override).
- Return type:
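The set / check / clear lifecycle of OpenRouter-only mode can be modelled with an expiring flag. `TtlFlag` is a hypothetical in-memory stand-in for the Redis key, written so the expiry logic is visible without a Redis server; only the 4-hour TTL comes from the description above.

```python
import time
from typing import Callable, Optional

OPENROUTER_ONLY_TTL_SECONDS = 4 * 60 * 60  # the 4-hour TTL


class TtlFlag:
    """In-memory stand-in for a Redis key that is set with an expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expires_at: Optional[float] = None

    def set(self, ttl_seconds: float) -> None:
        """Activate the flag for ttl_seconds from now."""
        self._expires_at = self._clock() + ttl_seconds

    def clear(self) -> None:
        """Manual override: deactivate immediately."""
        self._expires_at = None

    def is_active(self) -> bool:
        """True while the flag is set and its TTL has not lapsed."""
        return self._expires_at is not None and self._clock() < self._expires_at
```

Re-reading the flag before each call, rather than caching the result, is what lets the mode lapse without any explicit clean-up step.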
- async gemini_embed_pool.embed_batch_via_gemini(texts, model='google/gemini-embedding-001', *, chunk_size=50)
Embed a batch of texts via the native Gemini API using the shared key pool.
Returns one embedding vector per input text, in order. Empty or whitespace-only texts are replaced with zero vectors. Retries on transient errors and falls back to the paid key after PAID_KEY_FALLBACK_THRESHOLD consecutive 429s.
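The ordering and zero-vector guarantees can be sketched independently of any network call. In the example below, `embed_in_chunks()` and its `embed_chunk` callback are hypothetical, the default `dimensions=3072` is borrowed from the OpenRouter signatures as an assumption, and filtering blank texts before chunking is one possible design rather than the documented one. Retries and key fallback are left out.

```python
from typing import Callable, List, Sequence


def embed_in_chunks(
    texts: Sequence[str],
    embed_chunk: Callable[[List[str]], List[List[float]]],
    *,
    dimensions: int = 3072,
    chunk_size: int = 50,
) -> List[List[float]]:
    """Return one vector per input text, in input order."""
    results: List[List[float]] = [[] for _ in texts]
    pending = []  # (original index, text) pairs that need a real embedding
    for i, text in enumerate(texts):
        if text.strip():
            pending.append((i, text))
        else:
            # Empty or whitespace-only input: substitute a zero vector.
            results[i] = [0.0] * dimensions
    for start in range(0, len(pending), chunk_size):
        chunk = pending[start:start + chunk_size]
        vectors = embed_chunk([text for _, text in chunk])
        # Write each vector back to the slot of the text it came from.
        for (i, _), vector in zip(chunk, vectors):
            results[i] = vector
    return results
```

Tracking the original index of each non-blank text keeps the output aligned with the input even though blank entries never reach the embedding backend.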