gemini_embed_pool

Shared Gemini API key pool for rate-limit distribution.

All embedding calls use next_gemini_embed_key() and all flash-lite generation calls use next_gemini_flash_key() for round-robin key selection. Both accessors draw from the same underlying key pool but maintain independent rotation cycles so they don’t interfere with each other.
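The independent rotation described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the pool contents, the lock, and the helper names next_embed_key and next_flash_key are all hypothetical.

```python
import itertools
import threading

# Hypothetical stand-in for the shared key pool.
_POOL = ["key-a", "key-b", "key-c"]

_lock = threading.Lock()

# Two separate iterators over the same list: advancing one
# leaves the position of the other untouched.
_embed_cycle = itertools.cycle(_POOL)
_flash_cycle = itertools.cycle(_POOL)


def next_embed_key() -> str:
    """Return the next key in the embedding rotation."""
    with _lock:
        return next(_embed_cycle)


def next_flash_key() -> str:
    """Return the next key in the generation rotation."""
    with _lock:
        return next(_flash_cycle)
```

Because each accessor owns its iterator, a burst of embedding calls does not change which key the next generation call receives.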

Supports GEMINI_EMBED_KEY_POOL env var (comma-separated keys) with fallback to the default hardcoded pool.
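One way to read a comma-separated pool from the environment with a fallback is sketched below. The helper name load_pool, the placeholder default keys, and the handling of whitespace and empty entries are assumptions; the real default pool is not shown on this page.

```python
import os
from typing import Mapping, Optional

# Hypothetical placeholder values, not the module's real default pool.
_DEFAULT_POOL = ["default-key-1", "default-key-2"]


def load_pool(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Parse GEMINI_EMBED_KEY_POOL, falling back to the default pool."""
    env = os.environ if env is None else env
    raw = env.get("GEMINI_EMBED_KEY_POOL", "")
    # Assumption: surrounding whitespace and empty entries are ignored.
    keys = [part.strip() for part in raw.split(",") if part.strip()]
    return keys or list(_DEFAULT_POOL)
```

Passing a plain dict instead of os.environ makes the parsing easy to exercise in isolation.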

When the free pool is exhausted (after PAID_KEY_FALLBACK_THRESHOLD consecutive 429s), callers should switch to the paid key returned by get_paid_fallback_key(), which may be None.
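A caller-side view of this fallback rule might look like the sketch below. The threshold value, the FallbackTracker class, and the choice to reset the counter on any non-429 response are assumptions made for illustration only.

```python
from typing import Optional

# Assumed value; the real constant is defined by the module.
PAID_KEY_FALLBACK_THRESHOLD = 3


class FallbackTracker:
    """Count consecutive 429s and decide when to use the paid key."""

    def __init__(self, paid_key: Optional[str]) -> None:
        self.paid_key = paid_key
        self.consecutive_429s = 0

    def record(self, status_code: int) -> None:
        # Assumption: any non-429 response resets the streak.
        if status_code == 429:
            self.consecutive_429s += 1
        else:
            self.consecutive_429s = 0

    def should_use_paid(self) -> bool:
        # The paid key is only an option when one is available.
        return (
            self.paid_key is not None
            and self.consecutive_429s >= PAID_KEY_FALLBACK_THRESHOLD
        )
```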

Daily quota tracking

Quotas are tracked per model class ("embed" vs "generate"). Embedding RPD and generation RPD are separate quotas on the same key, so a key spent for embeddings can still serve flash-lite generation and vice versa.

Each key’s usage is tracked in Redis and mirrored in two in-memory spent sets, one per model class. Keys that receive a daily-quota 429 (PerDay in the quotaId) are excluded from rotation for the relevant model class until midnight Pacific Time. A background probe runs every 2 hours to detect keys exhausted by external usage.
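The "until midnight Pacific Time" window can be computed with the standard library, as in the sketch below. The helper name is hypothetical, and using the result as an expiry for the spent flag is an assumption about how such a reset could be implemented, not a description of the module.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")


def seconds_until_pacific_midnight(now: datetime) -> int:
    """Seconds from `now` (timezone-aware) to the next midnight Pacific."""
    local = now.astimezone(PACIFIC)
    next_day = local.date() + timedelta(days=1)
    midnight = datetime(
        next_day.year, next_day.month, next_day.day, tzinfo=PACIFIC
    )
    # Subtract in UTC so daylight-saving transitions are counted correctly.
    delta = midnight.astimezone(timezone.utc) - now.astimezone(timezone.utc)
    return int(delta.total_seconds())
```

For example, 20:00 UTC on 15 January is 12:00 in Pacific Standard Time, so the function returns 12 hours' worth of seconds.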

gemini_embed_pool.get_paid_fallback_key()

Return the paid Gemini API key for embedding fallback, or None.

Return type: str | None

async gemini_embed_pool.reload_pool()

Merge donated keys from Redis into the live pool and rebuild the rotation cycles.

Returns the new total pool size. Safe to call multiple times.

Return type: int

gemini_embed_pool.init_quota_tracking(redis_client)

Provide the async Redis client for daily quota persistence.

Must be called once at startup (from main.py) before any embedding calls are made. Call reload_pool() afterwards (from an async context) to merge donated keys.

Return type: None
Parameters: redis_client (Any)

gemini_embed_pool.next_gemini_embed_key()

Thread-safe round-robin selection from the Gemini embedding key pool.

Skips keys that have been marked as daily-spent for embeddings.

Return type: str

gemini_embed_pool.next_gemini_flash_key()

Thread-safe round-robin selection for Gemini flash-lite generation calls.

Uses the same key pool as embeddings but an independent cycle so embed and generation rotations don’t interfere. Skips keys that have been marked as daily-spent for generation.

Return type: str

gemini_embed_pool.is_daily_quota_429(resp)

Return True if resp is a 429 caused by a daily (RPD) quota limit.

Parses the structured QuotaFailure violations in the response body and looks for a quotaId containing PerDay.

Return type: bool
Parameters: resp (httpx.Response)
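The check described above can be sketched against an already-decoded JSON body. This is not the module's code: the helper name is hypothetical, it takes a status code and dict rather than an httpx.Response to stay dependency-free, and the body layout (error.details[].violations[].quotaId) is an assumption based on the standard Google API error format.

```python
from typing import Any


def looks_like_daily_quota_429(status_code: int, body: dict[str, Any]) -> bool:
    """Return True if the error body reports a per-day quota violation."""
    if status_code != 429:
        return False
    for detail in body.get("error", {}).get("details", []):
        # Only QuotaFailure details carry quota violations.
        if not str(detail.get("@type", "")).endswith("QuotaFailure"):
            continue
        for violation in detail.get("violations", []):
            if "PerDay" in violation.get("quotaId", ""):
                return True
    return False
```

A per-minute violation (a quotaId without PerDay) returns False under this sketch, which is what lets a caller distinguish a short-lived rate limit from an exhausted daily quota.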

gemini_embed_pool.is_daily_quota_429_for_model(resp)

Identify which model class a daily-quota 429 belongs to.

Returns "embed" or "generate" based on the quotaDimensions.model field, or None if the response is not a daily 429.

Return type: Optional[Literal['embed', 'generate']]
Parameters: resp (httpx.Response)
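Mapping a daily violation to a model class could look like the sketch below. The helper name is hypothetical, and the rule that a model name containing "embedding" means "embed" (anything else being "generate") is an assumption; the real classification logic is not documented here.

```python
from typing import Any, Optional


def daily_quota_model_class(status_code: int, body: dict[str, Any]) -> Optional[str]:
    """Return "embed" or "generate" for a daily-quota 429, else None."""
    if status_code != 429:
        return None
    for detail in body.get("error", {}).get("details", []):
        for violation in detail.get("violations", []):
            if "PerDay" not in violation.get("quotaId", ""):
                continue
            model = violation.get("quotaDimensions", {}).get("model", "")
            # Assumption: embedding model names contain "embedding".
            return "embed" if "embedding" in model.lower() else "generate"
    return None
```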

async gemini_embed_pool.record_key_usage(api_key)

Increment the daily request counter for api_key in Redis.

Return type: None
Parameters: api_key (str)

async gemini_embed_pool.mark_key_daily_spent(api_key, model_class='embed')

Mark api_key as daily-spent for model_class.

Updates both the in-memory set and the Redis flag.

Return type: None

Parameters:

api_key (str)
model_class (Literal['embed', 'generate'])

async gemini_embed_pool.sync_spent_keys_from_redis()

Refresh both in-memory spent sets from Redis.

Return type: None

async gemini_embed_pool.get_pool_status()

Return a summary of pool health for diagnostics / logging.

Return type: dict[str, Any]

async gemini_embed_pool.probe_all_keys()

Probe each non-spent key for both embedding and generation daily exhaustion.

Sends a minimal request to each endpoint and marks the key spent for the relevant model class on a daily 429.

Return type: None

gemini_embed_pool.get_openrouter_api_key()

Return the OpenRouter API key from environment or default.

Return type: str | None

async gemini_embed_pool.openrouter_embed_batch(texts, *, model='google/gemini-embedding-001', api_key=None, dimensions=3072)

Embed texts via the OpenRouter /embeddings endpoint (async).

Used as a last-resort fallback when all Gemini keys (including the paid key) are rate-limited.

Return type: list[list[float]]

Parameters:

texts (list[str])
model (str)
api_key (str | None)
dimensions (int)

gemini_embed_pool.openrouter_embed_batch_sync(texts, *, model='google/gemini-embedding-001', api_key=None, dimensions=3072)

Embed texts via the OpenRouter /embeddings endpoint (sync).

Synchronous version for callers that cannot use await (e.g. ChromaDB embedding functions).

Return type: list[list[float]]

Parameters:

texts (list[str])
model (str)
api_key (str | None)
dimensions (int)

gemini_embed_pool.is_openrouter_only()

Return the in-memory OpenRouter-only flag (no I/O).

Return type: bool

async gemini_embed_pool.check_openrouter_only()

Return whether OpenRouter-only mode is active.

Checks Redis before every embedding call, so the mode ends as soon as the Redis key’s TTL lapses or the mode is disabled manually.

Return type: bool

gemini_embed_pool.check_openrouter_only_sync()

Sync variant of check_openrouter_only(): checks Redis before each embedding call.

Used by SyncOpenRouterEmbeddings (ChromaDB path). Lazily creates a sync Redis client from config on first use.

Return type: bool

async gemini_embed_pool.set_openrouter_only()

Activate OpenRouter-only mode for embedding calls.

Sets both the in-memory flag and a Redis key with a 4-hour TTL.

Return type: None

async gemini_embed_pool.clear_openrouter_only()

Deactivate OpenRouter-only mode (manual override).

Return type: None

async gemini_embed_pool.embed_batch_via_gemini(texts, model='google/gemini-embedding-001', *, chunk_size=50)

Embed a batch of texts via the native Gemini API using the shared key pool.

Returns one embedding vector per input text, in order. Empty or whitespace-only texts are replaced with zero vectors. Retries on transient errors and falls back to the paid key after PAID_KEY_FALLBACK_THRESHOLD consecutive 429s.

Return type: list[list[float]]

Parameters:

texts (list[str])
model (str)
chunk_size (int)
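The ordering and zero-vector behaviour described for embed_batch_via_gemini can be sketched with a pluggable embedding callable. The helper name embed_in_chunks, the embed_fn parameter, and the dimensions argument are hypothetical; the sketch is synchronous and omits retries and key rotation entirely.

```python
from typing import Callable, Optional


def embed_in_chunks(
    texts: list[str],
    embed_fn: Callable[[list[str]], list[list[float]]],
    *,
    chunk_size: int = 50,
    dimensions: int = 3072,  # assumed size of the zero vectors
) -> list[list[float]]:
    """Embed non-blank texts in chunks, keeping results in input order."""
    results: list[Optional[list[float]]] = [None] * len(texts)
    pending: list[tuple[int, str]] = []
    for index, text in enumerate(texts):
        if text.strip():
            pending.append((index, text))
        else:
            # Blank inputs never reach the API; they get a zero vector.
            results[index] = [0.0] * dimensions
    for start in range(0, len(pending), chunk_size):
        chunk = pending[start:start + chunk_size]
        vectors = embed_fn([text for _, text in chunk])
        # Write each vector back to the slot of its original input.
        for (index, _), vector in zip(chunk, vectors):
            results[index] = vector
    return [vector if vector is not None else [] for vector in results]
```

Tracking the original index of each non-blank text is what keeps the output aligned with the input even though blank entries are filtered out before chunking.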