classifiers.dangerous_command_guard module

Runtime guard: RediSearch KNN against dangerous vs benign technical centroids.

Compares the user message embedding to the nearest dangerous-command centroid and, when the benign-tech index is populated, to the nearest benign-technical centroid. Injects a warning suffix only when the message clears the danger threshold and is closer to danger than to benign, which reduces false positives caused by technical vocabulary that dangerous and benign messages share.
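The comparison can be pictured with a minimal in-memory sketch. It swaps the RediSearch KNN query for a brute-force NumPy nearest-neighbour search (k=1) over centroid matrices. The function names nearest_similarity and closer_to_danger are hypothetical, and an empty centroid matrix plays the role of an unpopulated benign-tech index.

```python
from typing import Optional

import numpy as np


def nearest_similarity(query: np.ndarray, centroids: np.ndarray) -> Optional[float]:
    """Cosine similarity between `query` and its nearest centroid (brute-force KNN, k=1).

    Returns None when `centroids` has no rows, mimicking an empty index.
    """
    if centroids.size == 0:
        return None
    q = query / np.linalg.norm(query)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    return float(np.max(c @ q))


def closer_to_danger(
    query: np.ndarray,
    danger: np.ndarray,
    benign: np.ndarray,
    threshold: float,
) -> bool:
    """True when the nearest dangerous centroid clears `threshold` and beats the nearest benign one."""
    d = nearest_similarity(query, danger)
    if d is None or d < threshold:
        return False
    b = nearest_similarity(query, benign)
    # No benign centroids: fall back to the danger-only threshold.
    return b is None or d > b


danger = np.array([[1.0, 0.0, 0.0]])
benign = np.array([[0.0, 1.0, 0.0]])
print(closer_to_danger(np.array([0.9, 0.1, 0.0]), danger, benign, 0.8))  # True
print(closer_to_danger(np.array([0.6, 0.8, 0.0]), danger, benign, 0.5))  # False: benign is nearer
```

The second query clears the 0.5 threshold on its own, so a danger-only check would warn; the benign comparison is what suppresses it.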

async classifiers.dangerous_command_guard.maybe_dangerous_command_warning(redis, query_embedding, config, channel_id='', user_id='', request_id='')

Classify a user message and return the safety warning suffix when it looks dangerous.

The guard’s main entry point: embedding-comparison logic that decides whether to append DANGEROUS_CMD_WARNING_SUFFIX to the current user turn. It runs a RediSearch KNN search of the precomputed query_embedding against the dangerous-command centroids and, when the benign-technical index is populated, against the benign centroids as well. It warns only when the message is closer to danger than to benign: the danger match must be above dangerous_command_similarity_threshold and, if dangerous_command_benign_margin is set, ahead of the benign match by more than that margin. The benign comparison is what suppresses false positives from shared technical vocabulary; when idx:benign_tech is empty, the guard falls back to the backward-compatible danger-only threshold check.
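A minimal sketch of the decision rule, under several assumptions: both indexes use the COSINE distance metric, so a KNN score is 1 - cosine similarity; a missing benign score means idx:benign_tech is empty; and "beyond the margin" means the danger similarity must exceed the benign similarity by more than dangerous_command_benign_margin. should_warn and its parameters are hypothetical, not the module's actual internals.

```python
from typing import Optional


def cosine_distance_to_similarity(distance: float) -> float:
    # Assumes the COSINE metric, whose KNN score is 1 - cosine similarity.
    return 1.0 - distance


def should_warn(
    danger_distance: Optional[float],
    benign_distance: Optional[float],
    threshold: float,
    margin: Optional[float] = None,
) -> bool:
    """Hypothetical rule: warn only if the message is closer to danger than to benign.

    Distances are nearest-neighbour (k=1) scores; None means no hit, e.g. an empty index.
    """
    if danger_distance is None:
        return False
    danger_sim = cosine_distance_to_similarity(danger_distance)
    if danger_sim < threshold:
        return False
    if benign_distance is None:
        # Empty benign index: fall back to the danger-only threshold.
        return True
    benign_sim = cosine_distance_to_similarity(benign_distance)
    # Danger must beat benign, and by more than the margin when one is configured.
    return danger_sim - benign_sim > (margin or 0.0)


print(should_warn(0.1, 0.5, threshold=0.8))              # True
print(should_warn(0.1, 0.5, threshold=0.8, margin=0.5))  # False: lead of 0.4 is within the margin
print(should_warn(0.1, None, threshold=0.8))             # True: danger-only fallback
```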

It queries the DANGEROUS_CMD_INDEX_NAME and BENIGN_TECH_INDEX_NAME indexes through redisearch_index_doc_count(), knn_search_dangerous_cmds(), and knn_search_benign_tech(). Any Redis or KNN failure is routed through _infra_fail_suffix() so that the configured fail-open/fail-closed policy applies. On a real trigger, it emits a fire-and-forget dangerous_cmd_trigger observability event through the nested _emit_trigger closure (which schedules observability.publish_debug_event() as an asyncio task) and returns the warning. If the guard is disabled or the embedding is missing or empty, the function short-circuits and returns None.
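The failure policy and the fire-and-forget event can be sketched with plain asyncio. Everything here is hypothetical: WARNING_SUFFIX stands in for DANGEROUS_CMD_WARNING_SUFFIX, the "open"/"closed" strings stand in for whatever values the config's fail mode actually uses, and is_dangerous/publish stand in for the KNN checks and observability.publish_debug_event().

```python
import asyncio
from typing import Any, Awaitable, Callable, Coroutine, Optional, Set

# Hypothetical stand-in for DANGEROUS_CMD_WARNING_SUFFIX.
WARNING_SUFFIX = "\n[dangerous command warning]"

# Strong references to in-flight background tasks.
_background_tasks: Set[asyncio.Task] = set()


def infra_fail_suffix(fail_mode: str) -> Optional[str]:
    """Hypothetical policy: fail closed warns anyway, fail open stays silent."""
    return WARNING_SUFFIX if fail_mode == "closed" else None


def emit_in_background(coro: Coroutine[Any, Any, None]) -> None:
    """Schedule `coro` without awaiting it (fire-and-forget)."""
    task = asyncio.create_task(coro)
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)


async def guarded_check(
    is_dangerous: Callable[[], Awaitable[bool]],
    publish: Callable[[dict], Coroutine[Any, Any, None]],
    fail_mode: str,
) -> Optional[str]:
    try:
        dangerous = await is_dangerous()
    except Exception:
        # Any Redis/KNN failure is mapped through the configured policy.
        return infra_fail_suffix(fail_mode)
    if not dangerous:
        return None
    # The caller gets the warning immediately; the event publishes on its own.
    emit_in_background(publish({"event": "dangerous_cmd_trigger"}))
    return WARNING_SUFFIX
```

Holding a reference in _background_tasks matters because the asyncio event loop keeps only weak references to tasks, so an unreferenced fire-and-forget task can be garbage-collected before it finishes.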

Called by message_processor.generate_and_send while assembling the inference request, and exercised by tests/test_dangerous_command_guard.py and tests/test_observability_classifier.py; no other production callers were found.

Parameters:
  • redis (Redis) – Async Redis client backing the RediSearch indexes.

  • query_embedding (ndarray | None) – Precomputed embedding of the user message; None or empty short-circuits to None.

  • config (Config) – Config object supplying the enable flag, thresholds, margin, and fail mode.

  • channel_id (str) – Channel id, forwarded only into the observability event.

  • user_id (str) – User id, forwarded only into the observability event.

  • request_id (str) – Request id, forwarded only into the observability event.
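The documented short-circuits are easy to get subtly wrong with NumPy inputs. In this minimal sketch (should_skip and its enabled flag are hypothetical names), the embedding check uses `is None` and `.size` rather than truthiness, because `not arr` raises ValueError for a multi-element array.

```python
from typing import Optional

import numpy as np


def should_skip(enabled: bool, query_embedding: Optional[np.ndarray]) -> bool:
    """Hypothetical preamble mirroring the documented short-circuits to None."""
    if not enabled:
        return True
    # `not query_embedding` would raise ValueError for a multi-element array
    # ("truth value ... is ambiguous"), so check None and size explicitly.
    return query_embedding is None or query_embedding.size == 0


print(should_skip(True, np.ones(3)))    # False: proceed to the KNN searches
print(should_skip(True, np.array([])))  # True
```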

Returns:

DANGEROUS_CMD_WARNING_SUFFIX when the message is judged dangerous (or when an infrastructure error occurs under the fail-closed policy); otherwise None.

Return type:

str | None