proactive_triage

Proactive triage AI – lightweight LLM filter for interjection decisions.

Before the bot generates a full response to an unaddressed message, this module runs a cheap, fast LLM call to decide whether Stargazer should interject at all. The model outputs a single digit (1 = INTERJECT, 0 = SILENCE) based on the recent conversation context.

Uses the OpenAI-compatible chat-completions endpoint exposed by the LLM proxy (config.llm_base_url), keeping the call entirely tool-free.

class proactive_triage.ProactiveTriageAI(http_client, base_url, api_key, model='gemini-2.0-flash-lite')[source]

Bases: object

Lightweight triage deciding whether Stargazer should interject.

Makes a single OpenAI-compatible chat-completions call to a cheap, fast model (e.g. gemini-2.0-flash-lite) and parses a binary 1 / 0 response.

Parameters:
  • http_client (httpx.AsyncClient)

  • base_url (str)

  • api_key (str)

  • model (str)

__init__(http_client, base_url, api_key, model='gemini-2.0-flash-lite')[source]

Store the HTTP client and proxy endpoint for triage calls.

Caches the shared httpx.AsyncClient and precomputes the OpenAI-compatible chat-completions URL (base_url with a single trailing /chat/completions appended), so each should_interject() call can POST without rebuilding the route. No network or I/O happens here. The http_client is typically the inference worker’s shared OpenRouterClient transport, and base_url / api_key come from the bot config (llm_base_url / api_key) at the gate-5 call site.

Called by message_processor/proactive_gates.py when it constructs a ProactiveTriageAI inside the proactive interjection gate.

Parameters:
  • http_client (AsyncClient) – Shared async HTTP client used to reach the LLM proxy; reused across all triage requests.

  • base_url (str) – Base URL of the OpenAI-compatible LLM proxy; the /chat/completions path is appended automatically.

  • api_key (str) – Bearer token sent in the Authorization header.

  • model (str) – Cheap, fast model name used for the binary triage decision (defaults to gemini-2.0-flash-lite).

Return type:

None

static format_cached_message(msg)[source]

Render one cached message as a single triage-prompt transcript line.

Converts a message_cache.CachedMessage into the [HH:MM:SS] name (user_id) : text line format the triage system prompt expects, turning the stored Unix timestamp into a UTC wall-clock time so the model can reason about pacing and pauses. Pure and side-effect-free; touches no Redis, network, or shared state.

Called by should_interject(), which joins one formatted line per recent message into the user prompt sent to the model.

Parameters:

msg (CachedMessage) – The cached message to render, supplying its timestamp, user_name, user_id, and text.

Returns:

A single transcript line ready to be joined into the prompt.

Return type:

str

async should_interject(recent_messages, max_retries=3, channel_id='', request_id='')[source]

Run the cheap triage LLM call and decide whether to interject.

Takes the most recent cached messages (capped at the last _MAX_TRIAGE_MESSAGES), formats them via format_cached_message(), and POSTs a tool-free chat-completions request to the LLM proxy at self._chat_url using the shared httpx client. The model is asked to emit a single 1 / 0 digit, which _parse_decision() interprets. This is the last gate before the bot commits to generating a full proactive reply, so failing closed (SILENCE) is the safe default. It retries on transient HTTP errors and applies a separate, longer exponential backoff (up to _MAX_503_RETRIES) for 503 / 529 overload responses raised as _OverloadError. A nested _emit helper fires a fire-and-forget triage_decision debug event to observability.publish_debug_event for every terminal outcome, tagged with the supplied request_id and channel_id.

Called by message_processor/proactive_gates.py at gate 5 of the proactive interjection pipeline, which constructs a fresh ProactiveTriageAI and blocks the interjection when this returns False.

Parameters:
  • recent_messages (list[CachedMessage]) – Recent channel messages in chronological order; only the final _MAX_TRIAGE_MESSAGES are evaluated.

  • max_retries (int) – Maximum attempts for non-overload errors before giving up and returning SILENCE.

  • channel_id (str) – Channel identifier attached to emitted debug events for correlation.

  • request_id (str) – Observability request id attached to emitted debug events for correlation.

Returns:

(should_interject, raw_decision_text). The boolean is True to interject and False to stay silent; the string carries the raw model output or an error description. Any unrecoverable error yields (False, ...) so the bot stays silent.

Return type:

tuple[bool, str]