proactive_triage
Proactive triage AI – lightweight LLM filter for interjection decisions.
Before the bot generates a full response to an unaddressed message, this
module runs a cheap, fast LLM call to decide whether Stargazer should
interject at all. The model outputs a single digit (1 = INTERJECT,
0 = SILENCE) based on the recent conversation context.
Uses the OpenAI-compatible chat-completions endpoint exposed by the LLM
proxy (config.llm_base_url), keeping the call entirely tool-free.
- class proactive_triage.ProactiveTriageAI(http_client, base_url, api_key, model='gemini-2.0-flash-lite')[source]
Bases:
objectLightweight triage deciding whether Stargazer should interject.
Makes a single OpenAI-compatible chat-completions call to a cheap, fast model (e.g.
gemini-2.0-flash-lite) and parses a binary1/0response.- __init__(http_client, base_url, api_key, model='gemini-2.0-flash-lite')[source]
Store the HTTP client and proxy endpoint for triage calls.
Caches the shared
httpx.AsyncClientand precomputes the OpenAI-compatible chat-completions URL (base_urlwith a single trailing/chat/completionsappended), so eachshould_interject()call can POST without rebuilding the route. No network or I/O happens here. Thehttp_clientis typically the inference worker’s shared OpenRouterClient transport, andbase_url/api_keycome from the bot config (llm_base_url/api_key) at the gate-5 call site.Called by
message_processor/proactive_gates.pywhen it constructs aProactiveTriageAIinside the proactive interjection gate.- Parameters:
http_client (
AsyncClient) – Shared async HTTP client used to reach the LLM proxy; reused across all triage requests.base_url (
str) – Base URL of the OpenAI-compatible LLM proxy; the/chat/completionspath is appended automatically.api_key (
str) – Bearer token sent in theAuthorizationheader.model (
str) – Cheap, fast model name used for the binary triage decision (defaults togemini-2.0-flash-lite).
- Return type:
None
- static format_cached_message(msg)[source]
Render one cached message as a single triage-prompt transcript line.
Converts a
message_cache.CachedMessageinto the[HH:MM:SS] name (user_id) : textline format the triage system prompt expects, turning the stored Unixtimestampinto a UTC wall-clock time so the model can reason about pacing and pauses. Pure and side-effect-free; touches no Redis, network, or shared state.Called by
should_interject(), which joins one formatted line per recent message into the user prompt sent to the model.- Parameters:
msg (
CachedMessage) – The cached message to render, supplying itstimestamp,user_name,user_id, andtext.- Returns:
A single transcript line ready to be joined into the prompt.
- Return type:
- async should_interject(recent_messages, max_retries=3, channel_id='', request_id='')[source]
Run the cheap triage LLM call and decide whether to interject.
Takes the most recent cached messages (capped at the last
_MAX_TRIAGE_MESSAGES), formats them viaformat_cached_message(), and POSTs a tool-free chat-completions request to the LLM proxy atself._chat_urlusing the sharedhttpxclient. The model is asked to emit a single1/0digit, which_parse_decision()interprets. This is the last gate before the bot commits to generating a full proactive reply, so failing closed (SILENCE) is the safe default. It retries on transient HTTP errors and applies a separate, longer exponential backoff (up to_MAX_503_RETRIES) for503/529overload responses raised as_OverloadError. A nested_emithelper fires a fire-and-forgettriage_decisiondebug event toobservability.publish_debug_eventfor every terminal outcome, tagged with the suppliedrequest_idandchannel_id.Called by
message_processor/proactive_gates.pyat gate 5 of the proactive interjection pipeline, which constructs a freshProactiveTriageAIand blocks the interjection when this returnsFalse.- Parameters:
recent_messages (
list[CachedMessage]) – Recent channel messages in chronological order; only the final_MAX_TRIAGE_MESSAGESare evaluated.max_retries (
int) – Maximum attempts for non-overload errors before giving up and returning SILENCE.channel_id (
str) – Channel identifier attached to emitted debug events for correlation.request_id (
str) – Observability request id attached to emitted debug events for correlation.
- Returns:
(should_interject, raw_decision_text). The boolean isTrueto interject andFalseto stay silent; the string carries the raw model output or an error description. Any unrecoverable error yields(False, ...)so the bot stays silent.- Return type: