ncm_variant_cache

NCM Cue Variant Cache — LLM-generated rephrasing variants for cascade cues.

On first use of any cue/reason string, a background task fires a cheap OpenRouter LLM call to generate alternative phrasings, then immediately batch-embeds them using google/gemini-embedding-001. The result is stored in Redis with a 24h TTL and held in memory so subsequent turns pay no I/O cost. Variant selection uses cosine similarity (dot product on unit vectors) against the current limbic emotional context, falling back to random.choice when the context embedding is not yet available.

Redis key format: ncm:cue_variant:{sha256_hex_of_original_string}

Redis value (v2): JSON object — see _CacheEntry below

Redis value (v1): JSON array of strings (old format — loaded as text-only, re-embedded lazily on the next ensure_cached call)
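The key derivation and the v1/v2 value split can be sketched as below. This is an illustrative stand-in, not the module's code; the v2 field names are placeholders, not the actual _CacheEntry schema:

```python
import hashlib
import json

def cue_variant_key(original: str) -> str:
    # Key format from above: ncm:cue_variant:{sha256_hex_of_original_string}
    return "ncm:cue_variant:" + hashlib.sha256(original.encode("utf-8")).hexdigest()

def parse_cache_value(raw: str) -> dict:
    # v1 values are a bare JSON array of variant strings (text only; embeddings
    # are regenerated lazily on the next ensure_cached call). v2 values are a
    # JSON object; the field names below are illustrative placeholders.
    data = json.loads(raw)
    if isinstance(data, list):
        return {"original": None, "variants": data, "embeddings": None}
    return data
```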

class ncm_variant_cache.CueVariantCache(redis_client=None, api_key=None, openrouter_client=None, variant_models=None)[source]

Bases: object

Lazy LLM-backed variant cache for cascade cue and reason strings.

Variant selection is context-aware: before the cascade engine runs, call set_context(emotion_text) to register the current dominant-emotion string. get_variant() then picks the variant whose embedding has the highest cosine similarity to the context embedding. Falls back to random.choice until the context embedding is ready.

Parameters:
  • redis_client – A redis.asyncio.Redis instance. May be None.

  • api_key (Optional[str]) – OpenRouter API key. When None the cache is a no-op.

  • openrouter_client (OpenRouterClient | None)

  • variant_models (Optional[list[str]])

__init__(redis_client=None, api_key=None, openrouter_client=None, variant_models=None)[source]

Initialize the instance.

Parameters:
  • redis_client – Redis connection client.

  • api_key (Optional[str]) – OpenRouter API key; when None the cache is a no-op.

  • openrouter_client (OpenRouterClient | None) – Shared OpenRouterClient for connection pooling and batch embedding. Falls back to direct HTTP when None.

  • variant_models (Optional[list[str]]) – Override models for LLM generation. When None, uses VARIANT_MODELS. Use e.g. ["google/gemini-3.1-flash-lite-preview"] for paid-only, high-throughput pregeneration.

Return type:

None

set_context(context_text)[source]

Register the current turn’s emotional context for variant selection.

Sync — returns immediately. If the context embedding is already in _query_cache, the query vector is updated synchronously. Otherwise the query vector is cleared (falling back to random this turn) and a background embed task is scheduled.

Return type:

None

Parameters:

context_text (str)

get_variant(s)[source]

Return the most contextually resonant variant, or the original.

Uses cosine similarity (dot product on unit vectors) against _current_query_vec when both the query vector and per-variant embeddings are available. Falls back to random.choice otherwise.

Return type:

str

Parameters:

s (str)

async ensure_cached(s)[source]

Generate, embed, and cache variants for s if not already present.

Fire-and-forget safe — call via asyncio.create_task().

Return type:

None

Parameters:

s (str)

async load_all_from_redis()[source]

Warm the in-memory layer by scanning all cached entries in Redis.

Now that the v2 format stores the original string inside the value, this can populate _mem for every previously generated cue.

Return type:

None

async drain()[source]

Wait for all background embedding/generation tasks to complete.

Return type:

None
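The fire-and-forget plus drain() lifecycle can be sketched as a small task registry. This is illustrative only; the real class tracks its embedding/generation tasks internally:

```python
import asyncio

class TaskPool:
    """Sketch: spawn background work, keep handles, await them on shutdown."""

    def __init__(self) -> None:
        self._tasks: set = set()

    def fire(self, coro) -> asyncio.Task:
        # Fire-and-forget: hold a strong reference so the task is not
        # garbage-collected mid-flight, and drop it once it finishes.
        task = asyncio.create_task(coro)
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return task

    async def drain(self) -> None:
        # Wait for every outstanding background task to complete.
        if self._tasks:
            await asyncio.gather(*list(self._tasks))
```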