ncm_variant_cache
NCM Cue Variant Cache — LLM-generated rephrasing variants for cascade cues.
On first use of any cue/reason string a background task fires a cheap OpenRouter LLM call to generate alternative phrasings, then immediately batch-embeds them using google/gemini-embedding-001. The result is stored in Redis with a 24h TTL and held in-memory so subsequent turns pay no I/O cost. Variant selection uses cosine similarity (dot product on unit vectors) against the current limbic emotional context. Falls back to random.choice when the context embedding is not yet available.
Redis key format: ncm:cue_variant:{sha256_hex_of_original_string}
Redis value (v2): JSON object — see _CacheEntry below
Redis value (v1): JSON array of strings (old format — loaded as text-only, re-embedded lazily on next ensure_cached call)
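The key scheme above can be sketched as follows (a minimal illustration; the helper name `cue_variant_key` is hypothetical, not part of the module's API):

```python
import hashlib

def cue_variant_key(original: str) -> str:
    # SHA-256 of the original cue string, hex-encoded, under the
    # documented "ncm:cue_variant:" prefix. Helper name is illustrative.
    digest = hashlib.sha256(original.encode("utf-8")).hexdigest()
    return f"ncm:cue_variant:{digest}"
```

Hashing the original string keeps the key length fixed and avoids escaping issues with arbitrary cue text in Redis keys.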
- class ncm_variant_cache.CueVariantCache(redis_client=None, api_key=None, openrouter_client=None, variant_models=None)[source]
Bases: object
Lazy LLM-backed variant cache for cascade cue and reason strings.
Variant selection is context-aware: before the cascade engine runs, call set_context(emotion_text) to register the current dominant-emotion string. get_variant() then picks the variant whose embedding has the highest cosine similarity to the context embedding. Falls back to random.choice until the context embedding is ready.
- Parameters:
redis_client – A redis.asyncio.Redis instance. May be None.
api_key (Optional[str]) – OpenRouter API key. When None the cache is a no-op.
openrouter_client (OpenRouterClient | None)
- __init__(redis_client=None, api_key=None, openrouter_client=None, variant_models=None)[source]
Initialize the instance.
- Parameters:
redis_client – Redis connection client.
openrouter_client (OpenRouterClient | None) – Shared OpenRouterClient for connection pooling and batch embedding. Falls back to direct HTTP when None.
variant_models (Optional[list[str]]) – Override models for LLM generation. When None, uses VARIANT_MODELS. Use e.g. ["google/gemini-3.1-flash-lite-preview"] for paid-only, high-throughput pregeneration.
- Return type:
None
- set_context(context_text)[source]
Register the current turn’s emotional context for variant selection.
Sync — returns immediately. If the context embedding is already in _query_cache, the query vector is updated synchronously. Otherwise the query vector is cleared (falling back to random this turn) and a background embed task is scheduled.
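The sync-fast-path / async-fallback behaviour described above can be sketched like this (a simplified stand-in class, not the real implementation; the attribute names mirror the ones documented here and the embed function is assumed):

```python
import asyncio

class ContextSketch:
    """Sketch of set_context's pattern: cache hit updates the query
    vector synchronously; a miss clears it and schedules a background
    embed task. Not the real CueVariantCache."""

    def __init__(self, embed_fn):
        self._query_cache = {}            # text -> embedding vector
        self._current_query_vec = None    # None => random fallback
        self._embed_fn = embed_fn         # async: str -> list[float]

    def set_context(self, text):
        vec = self._query_cache.get(text)
        if vec is not None:
            self._current_query_vec = vec  # sync fast path
        else:
            # Fall back to random this turn; embed in the background.
            self._current_query_vec = None
            asyncio.create_task(self._embed_and_store(text))

    async def _embed_and_store(self, text):
        vec = await self._embed_fn(text)
        self._query_cache[text] = vec
        self._current_query_vec = vec
```

The design keeps set_context non-blocking: the first turn with a new emotion string pays nothing up front and simply selects randomly, while later turns with the same context hit the cache synchronously.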
- get_variant(s)[source]
Return the most contextually resonant variant, or the original.
Uses cosine similarity (dot product on unit vectors) against _current_query_vec when both the query vector and per-variant embeddings are available. Falls back to random.choice otherwise.
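The selection rule can be sketched as a plain function (an illustrative standalone version; the real method works on the instance's cached state):

```python
import random

def pick_variant(original, variants, embeddings, query_vec):
    """Pick the variant whose unit-norm embedding has the highest
    dot product (= cosine similarity on unit vectors) with the query
    vector. Falls back to random.choice, then to the original."""
    if query_vec is None or not embeddings:
        return random.choice(variants) if variants else original
    return max(
        (v for v in variants if v in embeddings),
        key=lambda v: sum(a * b for a, b in zip(embeddings[v], query_vec)),
        default=original,
    )
```

Because the stored vectors are unit-normalized, the dot product alone ranks variants identically to full cosine similarity, so no norms are recomputed per call.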
- async ensure_cached(s)[source]
Generate, embed, and cache variants for s if not already present.
Fire-and-forget safe — call via asyncio.create_task().
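A usage sketch of the fire-and-forget pattern (the surrounding `handle_turn` helper is hypothetical; only ensure_cached and get_variant come from this API):

```python
import asyncio

async def handle_turn(cache, cue: str) -> str:
    # Fire-and-forget: warm the variant cache in the background
    # without blocking the current turn.
    asyncio.create_task(cache.ensure_cached(cue))
    # This turn still returns immediately: the original string, or a
    # random/contextual variant once embeddings have landed.
    return cache.get_variant(cue)
```

Since ensure_cached is idempotent for already-present strings, scheduling it on every turn is safe and only the first call for a given cue pays the LLM and embedding cost.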