ncm_variant_cache

NCM Cue Variant Cache — LLM-generated rephrasing variants for cascade cues.

On first use of any cue/reason string, a background task fires a cheap OpenRouter LLM call to generate alternative phrasings, then immediately batch-embeds them using google/gemini-embedding-001. The result is stored in Redis with a randomized 7-14 day TTL and held in memory, so subsequent turns pay no I/O cost.

Variant selection uses cosine similarity (dot product on unit vectors) against the current limbic emotional context. Falls back to random.choice when the context embedding is not yet available.
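As a rough illustration of the selection rule described above, the following sketch scores each variant by dot product against a unit-length query vector and falls back to random.choice when no query vector exists. The names normalize, dot, and pick_variant are hypothetical and chosen for this example; the module's own helpers (such as _dot) may be structured differently.

```python
import math
import random
from typing import List, Optional, Sequence


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product; equals cosine similarity when both inputs are unit vectors."""
    return sum(x * y for x, y in zip(a, b))


def pick_variant(
    texts: Sequence[str],
    vectors: Optional[Sequence[Sequence[float]]],
    query_vec: Optional[Sequence[float]],
) -> str:
    """Return the text whose vector best matches query_vec, else a random text."""
    # Fallback path: no context embedding yet, or no usable per-variant embeddings.
    if query_vec is None or not vectors or len(vectors) != len(texts):
        return random.choice(list(texts))
    scores = [dot(v, query_vec) for v in vectors]
    best = max(range(len(texts)), key=scores.__getitem__)
    return texts[best]
```

Because every vector is normalized once when it is stored, the hot path needs no square roots or divisions, only one dot product per variant.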

Redis layout:

  • Key: ncm:cue_variant:{sha256_hex_of_original_string}

  • Value (v2): JSON object — see _CacheEntry below

  • Value (v1): JSON array of strings (old format — loaded as text-only, re-embedded lazily on next ensure_cached call)
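The layout above can be sketched as follows. This is a minimal illustration, not the module's code: the helper names cache_key, random_ttl_seconds, and decode_value are hypothetical, UTF-8 encoding of the original string before hashing is an assumption, and the only v2 field used here is texts (the full _CacheEntry schema is not reproduced).

```python
import hashlib
import json
import random
from typing import List, Tuple

KEY_PREFIX = "ncm:cue_variant:"
DAY_SECONDS = 24 * 60 * 60


def cache_key(original: str) -> str:
    """Build the Redis key from the SHA-256 hex digest of the original string.

    Assumption: the string is encoded as UTF-8 before hashing.
    """
    digest = hashlib.sha256(original.encode("utf-8")).hexdigest()
    return KEY_PREFIX + digest


def random_ttl_seconds() -> int:
    """Pick a TTL between 7 and 14 days so entries do not all expire together."""
    return random.randint(7 * DAY_SECONDS, 14 * DAY_SECONDS)


def decode_value(raw: str) -> Tuple[str, List[str]]:
    """Tell the two value formats apart by their top-level JSON type."""
    data = json.loads(raw)
    if isinstance(data, list):
        # v1: bare list of strings, no embeddings stored.
        return "v1", [str(t) for t in data]
    if isinstance(data, dict):
        # v2: JSON object; only the "texts" field is read in this sketch.
        return "v2", list(data.get("texts", []))
    raise ValueError("unrecognized cache value")
```

Distinguishing formats by top-level JSON type means old v1 entries stay readable without a migration step.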

class ncm_variant_cache.CueVariantCache(redis_client=None, api_key=None, openrouter_client=None, variant_models=None)[source]

Bases: object

Lazy LLM-backed variant cache for cascade cue and reason strings.

Variant selection is context-aware: before the cascade engine runs, call set_context(emotion_text) to register the current dominant-emotion string. get_variant() then picks the variant whose embedding has the highest cosine similarity to the context embedding. Falls back to random.choice until the context embedding is ready.

Parameters:
  • redis_client – A redis.asyncio.Redis instance. May be None.

  • api_key (Optional[str]) – OpenRouter API key. When None the cache is a no-op.

  • openrouter_client (OpenRouterClient | None)

  • variant_models (Optional[list[str]])

__init__(redis_client=None, api_key=None, openrouter_client=None, variant_models=None)[source]

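Initialize the cache with its Redis client, OpenRouter credentials, and generation models.

Parameters:
  • redis_client – A redis.asyncio.Redis instance. May be None.

  • api_key (Optional[str]) – OpenRouter API key. When None the cache is a no-op.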

  • openrouter_client (OpenRouterClient | None) – Shared OpenRouterClient for connection pooling and batch embedding. Falls back to direct HTTP when None.

  • variant_models (Optional[list[str]]) – Override models for LLM generation. When None, uses VARIANT_MODELS. For example, pass ["google/gemini-3.1-flash-lite"] for paid-only, high-throughput pregeneration.

Return type:

None

set_context(context_text)[source]

Register the current turn’s emotional context for variant selection.

Records the dominant-emotion string that get_variant should resonate with this turn. Returns immediately without awaiting. On a _query_cache hit it sets _current_query_vec synchronously, so the very next get_variant call is context-aware. On a miss it clears the query vector (so this turn falls back to random.choice) and schedules a detached _embed_context task, tracked in _background_tasks, to populate the cache for later turns. If no event loop is running, the resulting error is swallowed and the pending marker is discarded. Side effects: mutates _current_query_vec and _embed_pending and may spawn a background embedding task; performs no Redis or filesystem access. Called by LimbicCoordinator in limbic_system/coordinator.py (twice, when the dominant emotion and the broader context are known).

Parameters:

context_text (str) – The current dominant-emotion / context string.

Return type:

None

Returns:

None
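The non-blocking hit/miss behaviour of set_context can be sketched roughly as below. ContextTracker and its embed callable are hypothetical stand-ins; the attribute names mirror those mentioned above, but their real types are assumptions (for instance, _embed_pending is modelled here as a set of strings).

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional, Set


class ContextTracker:
    """Illustrative stand-in for the context-handling part of the cache."""

    def __init__(self, embed: Callable[[str], Awaitable[List[float]]]) -> None:
        self._embed = embed  # hypothetical async embedder: text -> vector
        self._query_cache: Dict[str, List[float]] = {}
        self._current_query_vec: Optional[List[float]] = None
        self._embed_pending: Set[str] = set()
        self._background_tasks: Set[asyncio.Task] = set()

    def set_context(self, context_text: str) -> None:
        cached = self._query_cache.get(context_text)
        if cached is not None:
            # Hit: the next selection is context-aware immediately.
            self._current_query_vec = cached
            return
        # Miss: this turn falls back to random selection.
        self._current_query_vec = None
        if context_text in self._embed_pending:
            return
        self._embed_pending.add(context_text)
        try:
            loop = asyncio.get_running_loop()
        except RuntimeError:
            # No running loop: give up quietly and drop the pending marker.
            self._embed_pending.discard(context_text)
            return
        task = loop.create_task(self._embed_context(context_text))
        self._background_tasks.add(task)
        task.add_done_callback(self._background_tasks.discard)

    async def _embed_context(self, context_text: str) -> None:
        try:
            self._query_cache[context_text] = await self._embed(context_text)
        finally:
            self._embed_pending.discard(context_text)
```

The first call for a new context string costs only a dictionary lookup and a task spawn; the embedding becomes visible to a later set_context call with the same string.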

get_variant(s)[source]

Return the most contextually resonant cached variant, or the original.

The synchronous hot-path read: looks up s in the in-memory _mem map, and when the entry has per-variant embeddings and a current query vector (set by set_context) it scores each variant with _dot and returns the highest-similarity phrasing. With no warmed query vector, no embeddings, or any scoring error it falls back to random.choice over the cached texts; with no cache entry at all it returns s unchanged. Read-only with no I/O or side effects. Called by cascade_engine.py and by LimbicCoordinator in limbic_system/coordinator.py to colour cue and reason strings.

Parameters:

s (str) – The original cue/reason string to find a variant for.

Returns:

The selected variant phrasing, or s itself when no cached variants exist.

Return type:

str

async ensure_cached(s)[source]

Generate, embed, and cache variants for s if not already present.

The lazy warm-up path for a single cue: short-circuits when s is already fully embedded in _mem, then takes a DistributedLock (sg:lock:variant_gen:{cue_hash}) so only one worker across the cluster generates a given cue. It reads Redis first via _cache_key and _entry_from_redis — loading a complete v2 hit directly, or keeping a v1/embedding-less hit’s texts to re-embed — and otherwise runs the full pipeline: _generate (LLM rephrasings) then _embed_texts (Gemini/OpenRouter embeddings), populating _mem and writing back through _write_redis. Fire-and-forget safe and idempotent; the lock is always released in a finally block. Side effects: Redis reads/writes, LLM and embedding network calls, and a distributed lock. Called by cascade_engine.py and LimbicCoordinator (via asyncio.create_task), by the cue pregeneration script, and in the telemetry/lock-migration tests.

Parameters:

s (str) – The cue/reason string to ensure variants for.

Return type:

None

Returns:

None
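The control flow described for ensure_cached can be approximated with the sketch below. It is a simplified model under stated assumptions: a plain dict stands in for Redis, a per-key asyncio.Lock stands in for DistributedLock (so it only serializes within one process), generate and embed are hypothetical async callables, and the "vectors" field name in the stored object is invented for illustration.

```python
import asyncio
import hashlib
import json
from typing import Awaitable, Callable, Dict, List


class TinyVariantCache:
    """Single-process model of the generate -> embed -> store pipeline."""

    def __init__(
        self,
        store: Dict[str, str],
        generate: Callable[[str], Awaitable[List[str]]],
        embed: Callable[[List[str]], Awaitable[List[List[float]]]],
    ) -> None:
        self._store = store        # dict standing in for Redis
        self._generate = generate  # hypothetical LLM rephrasing call
        self._embed = embed        # hypothetical batch embedding call
        self._mem: Dict[str, dict] = {}
        self._locks: Dict[str, asyncio.Lock] = {}

    async def ensure_cached(self, s: str) -> None:
        entry = self._mem.get(s)
        if entry and entry.get("vectors"):
            return  # already fully embedded in memory
        key = "ncm:cue_variant:" + hashlib.sha256(s.encode("utf-8")).hexdigest()
        lock = self._locks.setdefault(key, asyncio.Lock())
        await lock.acquire()
        try:
            texts, vectors = None, None
            raw = self._store.get(key)
            if raw is not None:
                data = json.loads(raw)
                if isinstance(data, list):
                    texts = data  # v1: texts only, re-embed below
                else:
                    texts, vectors = data.get("texts"), data.get("vectors")
            if texts and vectors:
                # Complete hit: load directly, no network calls.
                self._mem[s] = {"original": s, "texts": texts, "vectors": vectors}
                return
            if not texts:
                texts = await self._generate(s)
            vectors = await self._embed(texts)
            entry = {"original": s, "texts": texts, "vectors": vectors}
            self._mem[s] = entry
            self._store[key] = json.dumps(entry)
        finally:
            lock.release()  # always released, even on failure
```

Reading the store only after the lock is held is what makes a second concurrent caller find the finished entry instead of repeating the LLM and embedding calls.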

async load_all_from_redis()[source]

Warm the in-memory layer by scanning every cached entry in Redis.

Bulk-restores _mem on startup by SCAN-paging the ncm:cue_variant: keyspace, fetching each page’s values concurrently behind an asyncio.Semaphore (limit 2) to avoid saturating the connection pool, and decoding them with _entry_from_redis. Only v2 entries that carry both original and texts are reinstated — because the original string lives inside the value, every previously generated cue can be keyed back into _mem without re-querying the LLM. Read errors and decode failures are logged and skipped. Side effects: Redis scans/reads and mutation of _mem; a no-op when self._redis is unset. Called once by LimbicCoordinator at startup (via asyncio.create_task), by the cue pregeneration script, and in the telemetry/lock-migration tests.

Return type:

None

Returns:

None
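A bulk restore of this kind might look like the sketch below. It assumes a client exposing async scan(cursor=..., match=..., count=...) and get(key) in the style of redis.asyncio.Redis; FakeRedis is a hypothetical in-memory stand-in so the example runs without a server, and load_all is an illustrative name rather than the module's method.

```python
import asyncio
import json
import logging
from typing import Dict, List, Optional, Tuple

PREFIX = "ncm:cue_variant:"
log = logging.getLogger("variant_cache_sketch")


class FakeRedis:
    """In-memory stand-in that pages keys two at a time (ignores match)."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = dict(data)

    async def scan(self, cursor: int = 0, match: Optional[str] = None,
                   count: Optional[int] = None) -> Tuple[int, List[str]]:
        keys = sorted(self._data)
        nxt = cursor + 2
        return (0 if nxt >= len(keys) else nxt), keys[cursor:nxt]

    async def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


async def load_all(client, mem: Dict[str, dict], limit: int = 2) -> None:
    """SCAN the keyspace page by page and reinstate complete v2 entries."""
    sem = asyncio.Semaphore(limit)  # cap concurrent GETs per page

    async def fetch(key):
        async with sem:
            try:
                return await client.get(key)
            except Exception as exc:  # read errors are logged and skipped
                log.warning("read failed for %s: %s", key, exc)
                return None

    cursor = 0
    while True:
        cursor, keys = await client.scan(cursor=cursor, match=PREFIX + "*", count=100)
        for raw in await asyncio.gather(*(fetch(k) for k in keys)):
            if raw is None:
                continue
            try:
                data = json.loads(raw)
            except (TypeError, ValueError):
                log.warning("undecodable cache value skipped")
                continue
            # Only objects carrying both fields can be keyed back into memory.
            if isinstance(data, dict) and data.get("original") and data.get("texts"):
                mem[data["original"]] = data
        if cursor == 0:
            break
```

Since the Redis key is a one-way hash, storing the original string inside the value is the only way the scan can rebuild the in-memory map keyed by cue text.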

async drain()[source]

Await all in-flight background embedding tasks, then clear the set.

Provides a clean shutdown / barrier point: it gathers every task in _background_tasks (the detached _embed_context coroutines spawned by set_context) with exceptions suppressed, then empties the set so no stragglers leak. Side effects: blocks on outstanding tasks and mutates _background_tasks. Called by the cue pregeneration script (scripts/pregenerate_ncm_cue_variants) to make sure all embeddings finish before the process exits.

Return type:

None

Returns:

None
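The barrier behaviour of drain can be illustrated with a small sketch. TaskTracker and spawn are hypothetical names; only the gather-then-clear pattern on _background_tasks reflects the description above.

```python
import asyncio
from typing import Coroutine, Set


class TaskTracker:
    """Tracks detached tasks so they can be awaited before shutdown."""

    def __init__(self) -> None:
        self._background_tasks: Set[asyncio.Task] = set()

    def spawn(self, coro: Coroutine) -> asyncio.Task:
        """Start a detached task and remember it (must run inside an event loop)."""
        task = asyncio.get_running_loop().create_task(coro)
        self._background_tasks.add(task)
        return task

    async def drain(self) -> None:
        """Wait for every tracked task, suppressing exceptions, then clear the set."""
        if self._background_tasks:
            # return_exceptions=True keeps one failing task from aborting the wait.
            await asyncio.gather(*self._background_tasks, return_exceptions=True)
        self._background_tasks.clear()
```

A short-lived script would typically call drain() just before exiting, so that embedding work still in flight is not cancelled when the event loop closes.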