ncm_variant_cache
NCM Cue Variant Cache — LLM-generated rephrasing variants for cascade cues.
On first use of any cue/reason string a background task fires a cheap OpenRouter LLM call to generate alternative phrasings, then immediately batch-embeds them using google/gemini-embedding-001. The result is stored in Redis with a randomized 7-14 day TTL and held in-memory so subsequent turns pay no I/O cost.
Variant selection uses cosine similarity (dot product on unit vectors) against the current limbic emotional context. Falls back to random.choice when the context embedding is not yet available.
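The shortcut of using a dot product in place of cosine similarity only holds when both vectors have unit length. A minimal sketch of that equivalence follows; the helper names `normalize`, `dot`, and `cosine` are hypothetical and not taken from the module.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Textbook cosine similarity, shown only for comparison."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a, b = [3.0, 4.0], [4.0, 3.0]
# Once both vectors are normalized, the dot product equals the cosine similarity.
assert math.isclose(dot(normalize(a), normalize(b)), cosine(a, b))
```

Normalizing once at embedding time would leave only a single multiply-add pass per variant on the hot path.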
Redis layout:
Key: ncm:cue_variant:{sha256_hex_of_original_string}
Value (v2): JSON object (see _CacheEntry below)
Value (v1): JSON array of strings (old format; loaded as text-only, re-embedded lazily on next ensure_cached call)
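The layout above could be handled roughly as sketched below. This is an illustration under assumptions, not the module's code: hashing the string as UTF-8 is assumed, `cache_key` and `decode_value` are hypothetical helpers, and of the v2 field names only `original` and `texts` appear in this reference, so `embeddings` is a guess.

```python
import hashlib
import json

KEY_PREFIX = "ncm:cue_variant:"

def cache_key(original: str) -> str:
    """Build the Redis key for a cue string."""
    # Assumption: the original string is hashed as UTF-8 bytes.
    digest = hashlib.sha256(original.encode("utf-8")).hexdigest()
    return KEY_PREFIX + digest

def decode_value(raw: str) -> dict:
    """Normalize a stored v1 or v2 value into one dict shape."""
    data = json.loads(raw)
    if isinstance(data, list):
        # v1: bare array of strings, so there are no embeddings yet.
        return {"original": None, "texts": data, "embeddings": None}
    # v2: JSON object. "embeddings" is an assumed field name.
    return {
        "original": data.get("original"),
        "texts": data.get("texts", []),
        "embeddings": data.get("embeddings"),
    }
```

A v1 value decoded this way carries texts only, which matches the "loaded as text-only" behaviour described for the old format.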
- class ncm_variant_cache.CueVariantCache(redis_client=None, api_key=None, openrouter_client=None, variant_models=None)[source]
Bases: object
Lazy LLM-backed variant cache for cascade cue and reason strings.
Variant selection is context-aware: before the cascade engine runs, call set_context(emotion_text) to register the current dominant-emotion string. get_variant() then picks the variant whose embedding has the highest cosine similarity to the context embedding. Falls back to random.choice until the context embedding is ready.
- Parameters:
redis_client – A redis.asyncio.Redis instance. May be None.
api_key (Optional[str]) – OpenRouter API key. When None the cache is a no-op.
openrouter_client (OpenRouterClient | None)
- __init__(redis_client=None, api_key=None, openrouter_client=None, variant_models=None)[source]
Initialize the instance.
- Parameters:
redis_client – Redis connection client.
openrouter_client (OpenRouterClient | None) – Shared OpenRouterClient for connection pooling and batch embedding. Falls back to direct HTTP when None.
variant_models (Optional[list[str]]) – Override models for LLM generation. When None, uses VARIANT_MODELS. Use e.g. ["google/gemini-3.1-flash-lite"] for paid-only, high-throughput pregeneration.
- Return type:
None
- set_context(context_text)[source]
Register the current turn’s emotional context for variant selection.
Records the dominant-emotion string that get_variant should resonate with this turn. Returns immediately (no awaiting): on a _query_cache hit it sets _current_query_vec synchronously so the very next get_variant is context-aware; on a miss it clears the query vector (so this turn falls back to random.choice) and schedules a detached _embed_context task to populate the cache for later turns, tracking it in _background_tasks. If no event loop is running, the resulting error is swallowed and the pending marker is discarded. Side effects: mutates _current_query_vec and _embed_pending and may spawn a background embedding task; no Redis or filesystem access here. Called by LimbicCoordinator in limbic_system/coordinator.py (twice, when the dominant emotion and the broader context are known).
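The non-blocking hit/miss behaviour described for set_context can be sketched as follows. The attribute names mirror those mentioned above, but the class `ContextSketch`, the injected `embed` callable, and the exact control flow are assumptions made for illustration.

```python
import asyncio

class ContextSketch:
    def __init__(self, embed):
        self._embed = embed              # hypothetical async callable: str -> list[float]
        self._query_cache = {}           # context text -> embedding
        self._current_query_vec = None
        self._embed_pending = set()      # texts with an embedding task in flight
        self._background_tasks = set()   # strong references to detached tasks

    def set_context(self, text):
        cached = self._query_cache.get(text)
        if cached is not None:
            # Hit: the very next lookup can be context-aware.
            self._current_query_vec = cached
            return
        # Miss: this turn falls back to random selection.
        self._current_query_vec = None
        if text in self._embed_pending:
            return
        self._embed_pending.add(text)
        try:
            loop = asyncio.get_running_loop()
        except RuntimeError:
            # No running loop: swallow the error and drop the pending marker.
            self._embed_pending.discard(text)
            return
        task = loop.create_task(self._embed_context(text))
        self._background_tasks.add(task)
        task.add_done_callback(self._background_tasks.discard)

    async def _embed_context(self, text):
        try:
            self._query_cache[text] = await self._embed(text)
        finally:
            self._embed_pending.discard(text)
```

Keeping each task in a set until it completes prevents the event loop's weak reference from letting a detached task be garbage-collected mid-flight.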
- get_variant(s)[source]
Return the most contextually resonant cached variant, or the original.
The synchronous hot-path read: looks up s in the in-memory _mem map, and when the entry has per-variant embeddings and a current query vector (set by set_context) it scores each variant with _dot and returns the highest-similarity phrasing. With no warmed query vector, no embeddings, or any scoring error it falls back to random.choice over the cached texts; with no cache entry at all it returns s unchanged. Read-only with no I/O or side effects. Called by cascade_engine.py and by LimbicCoordinator in limbic_system/coordinator.py to colour cue and reason strings.
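The selection order described for get_variant (best-scoring variant, then random choice, then the original string) might look like the sketch below. The function `pick_variant`, the dict-shaped cache entry, and the handling of an empty text list are assumptions rather than the module's actual structures.

```python
import random

def dot(a, b):
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))

def pick_variant(s, mem, query_vec):
    """Return the best-matching cached variant of s, or s itself."""
    entry = mem.get(s)
    if entry is None or not entry["texts"]:
        # No cache entry (or, by assumption, an empty one): return s unchanged.
        return s
    texts = entry["texts"]
    embeddings = entry.get("embeddings")
    if query_vec is not None and embeddings and len(embeddings) == len(texts):
        try:
            scores = [dot(query_vec, e) for e in embeddings]
            return texts[scores.index(max(scores))]
        except Exception:
            pass  # any scoring error falls through to the random choice
    return random.choice(texts)

mem = {"cue": {"texts": ["calm", "tense"], "embeddings": [[1.0, 0.0], [0.0, 1.0]]}}
print(pick_variant("cue", mem, [0.0, 1.0]))  # scored path
print(pick_variant("cue", mem, None))        # random fallback
```

Because every branch works on in-memory data only, the lookup stays synchronous and safe to call from the hot path.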
- async ensure_cached(s)[source]
Generate, embed, and cache variants for s if not already present.
The lazy warm-up path for a single cue: short-circuits when s is already fully embedded in _mem, then takes a DistributedLock (sg:lock:variant_gen:{cue_hash}) so only one worker across the cluster generates a given cue. It reads Redis first via _cache_key and _entry_from_redis, either loading a complete v2 hit directly or keeping a v1/embedding-less hit's texts to re-embed, and otherwise runs the full pipeline: _generate (LLM rephrasings) then _embed_texts (Gemini/OpenRouter embeddings), populating _mem and writing back through _write_redis. Fire-and-forget safe and idempotent; the lock is always released in a finally block. Side effects: Redis reads/writes, LLM and embedding network calls, and a distributed lock. Called by cascade_engine.py and LimbicCoordinator (via asyncio.create_task), by the cue pregeneration script, and in the telemetry/lock-migration tests.
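The control flow described for ensure_cached can be approximated with in-process stand-ins. In this sketch a plain dict replaces Redis, a per-key asyncio.Lock replaces DistributedLock, and `generate` and `embed` are hypothetical injected coroutines; the re-check after acquiring the lock and the `embeddings` field name are assumptions, not documented behaviour.

```python
import asyncio
import hashlib
import json

class EnsureCachedSketch:
    def __init__(self, generate, embed):
        self._generate = generate   # hypothetical async: str -> list[str]
        self._embed = embed         # hypothetical async: list[str] -> list[list[float]]
        self._mem = {}
        self._store = {}            # dict standing in for Redis
        self._locks = {}            # asyncio.Lock per key, standing in for DistributedLock

    async def ensure_cached(self, s):
        entry = self._mem.get(s)
        if entry and entry.get("embeddings"):
            return  # already fully embedded in memory
        key = "ncm:cue_variant:" + hashlib.sha256(s.encode("utf-8")).hexdigest()
        lock = self._locks.setdefault(key, asyncio.Lock())
        await lock.acquire()
        try:
            entry = self._mem.get(s)
            if entry and entry.get("embeddings"):
                return  # assumed re-check: another task finished while we waited
            texts = None
            raw = self._store.get(key)
            if raw is not None:
                data = json.loads(raw)
                if isinstance(data, dict) and data.get("embeddings"):
                    self._mem[s] = data  # complete v2 hit, load directly
                    return
                # v1 list or embedding-less v2 object: keep texts, re-embed below
                texts = data if isinstance(data, list) else data.get("texts")
            if not texts:
                texts = await self._generate(s)
            embeddings = await self._embed(texts)
            entry = {"original": s, "texts": texts, "embeddings": embeddings}
            self._mem[s] = entry
            self._store[key] = json.dumps(entry)
        finally:
            lock.release()  # always released, even on failure
```

Two concurrent calls for the same cue then trigger only one generation, which is the property the distributed lock provides across workers.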
- async load_all_from_redis()[source]
Warm the in-memory layer by scanning every cached entry in Redis.
Bulk-restores _mem on startup by SCAN-paging the ncm:cue_variant: keyspace, fetching each page's values concurrently behind an asyncio.Semaphore (limit 2) to avoid saturating the connection pool, and decoding them with _entry_from_redis. Only v2 entries that carry both original and texts are reinstated: because the original string lives inside the value, every previously generated cue can be keyed back into _mem without re-querying the LLM. Read errors and decode failures are logged and skipped. Side effects: Redis scans/reads and mutation of _mem; a no-op when self._redis is unset. Called once by LimbicCoordinator at startup (via asyncio.create_task), by the cue pregeneration script, and in the telemetry/lock-migration tests.
- Return type:
- Returns:
None
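The SCAN-and-fetch pattern described for load_all_from_redis can be sketched against an in-memory stand-in. `FakeRedis` below is hypothetical and only imitates the `scan` and `get` call shapes of an async Redis client; `load_all` is likewise an illustrative name, and where the real method logs failures this sketch only skips them.

```python
import asyncio
import json

class FakeRedis:
    """In-memory stand-in with scan/get shaped like an async Redis client."""
    def __init__(self, data, page_size=2):
        self._data = data
        self._page = page_size

    async def scan(self, cursor=0, match=None, count=None):
        prefix = (match or "*").rstrip("*")
        keys = sorted(k for k in self._data if k.startswith(prefix))
        page = keys[cursor:cursor + self._page]
        nxt = cursor + self._page
        return (nxt if nxt < len(keys) else 0), page  # cursor 0 means done

    async def get(self, key):
        return self._data.get(key)

async def load_all(redis, mem, limit=2):
    if redis is None:
        return  # no-op without a Redis client
    sem = asyncio.Semaphore(limit)  # cap concurrent reads per page

    async def fetch(key):
        async with sem:
            try:
                return await redis.get(key)
            except Exception:
                return None  # read error: skip this key

    cursor = 0
    while True:
        cursor, keys = await redis.scan(cursor=cursor, match="ncm:cue_variant:*", count=100)
        for raw in await asyncio.gather(*(fetch(k) for k in keys)):
            if raw is None:
                continue
            try:
                data = json.loads(raw)
            except (TypeError, ValueError):
                continue  # decode failure: skip
            # Only v2 objects carrying both "original" and "texts" are restored.
            if isinstance(data, dict) and data.get("original") and data.get("texts"):
                mem[data["original"]] = data
        if cursor == 0:
            break
```

Storing the original string inside the value is what lets the loop rebuild a mapping keyed by cue text even though the Redis key is only a hash.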
- async drain()[source]
Await all in-flight background embedding tasks, then clear the set.
Provides a clean shutdown / barrier point: it gathers every task in _background_tasks (the detached _embed_context coroutines spawned by set_context) with exceptions suppressed, then empties the set so no stragglers leak. Side effects: blocks on outstanding tasks and mutates _background_tasks. Called by the cue pregeneration script (scripts/pregenerate_ncm_cue_variants) to make sure all embeddings finish before the process exits.
- Return type:
- Returns:
None
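A barrier of the kind drain describes is commonly built on asyncio.gather with return_exceptions=True. The standalone function below is a minimal sketch of that idea, taking the task set as a parameter instead of reading it from the instance.

```python
import asyncio

async def drain(background_tasks):
    """Wait for every tracked task, ignoring their exceptions, then clear the set."""
    pending = list(background_tasks)  # snapshot, in case callbacks mutate the set
    if pending:
        # return_exceptions=True collects failures as results instead of raising.
        await asyncio.gather(*pending, return_exceptions=True)
    background_tasks.clear()

async def main():
    async def ok():
        await asyncio.sleep(0)

    async def boom():
        raise RuntimeError("embedding failed")

    tasks = {asyncio.ensure_future(ok()), asyncio.ensure_future(boom())}
    await drain(tasks)
    return len(tasks)

print(asyncio.run(main()))
```

Suppressing exceptions here means one failed embedding cannot stop the remaining tasks from being awaited before shutdown.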