classifiers.dangerous_command_guard module
Runtime guard: RediSearch KNN against dangerous vs benign technical centroids.
Compares the user message embedding to the nearest dangerous-command centroid and, when the benign-tech index is populated, to the nearest benign-technical centroid. Injects a suffix only when the message is closer to danger than to benign (above threshold), reducing false positives from shared technical vocabulary.
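The comparison described above can be sketched as a small pure function. This is a minimal illustration, not the module's actual code: `should_warn` and its arguments are hypothetical names, the scores are assumed to be similarities where higher means closer, and the exact comparison operators and the way the optional margin is applied are assumptions.

```python
from typing import Optional


def should_warn(
    danger_sim: float,
    benign_sim: Optional[float],
    threshold: float,
    margin: float = 0.0,
) -> bool:
    """Hypothetical decision rule; higher similarity means closer.

    benign_sim is None when the benign-tech index is empty, in which
    case only the danger threshold is applied.
    """
    # The message must first be similar enough to a dangerous centroid.
    # (Assumption: meeting the threshold exactly counts as a hit.)
    if danger_sim < threshold:
        return False
    # Danger-only fallback when there is no benign centroid to compare to.
    if benign_sim is None:
        return True
    # Assumption: danger must beat benign by more than the margin.
    return (danger_sim - benign_sim) > margin


if __name__ == "__main__":
    # Close to danger, even closer to benign: no warning.
    print(should_warn(0.90, 0.95, threshold=0.80))
    # Close to danger, benign index empty: warning.
    print(should_warn(0.90, None, threshold=0.80))
```

Keeping the rule free of Redis calls makes the effect of the benign comparison easy to test in isolation; the real guard additionally handles disabled configuration, empty embeddings, and infrastructure failures.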
- async classifiers.dangerous_command_guard.maybe_dangerous_command_warning(redis, query_embedding, config, channel_id='', user_id='', request_id='')
Classify a user message and return the safety warning suffix when it looks dangerous.
The guard’s main entry point: embedding-comparison logic that decides whether to append DANGEROUS_CMD_WARNING_SUFFIX to the current user turn. It runs a RediSearch KNN of the precomputed query_embedding against the dangerous-command centroids and, when the benign-technical index is populated, against the benign centroids too, warning only when the message is closer to danger than to benign (above dangerous_command_similarity_threshold and, if set, beyond dangerous_command_benign_margin). The benign comparison is what suppresses false positives from shared technical vocabulary; when idx:benign_tech is empty, the guard falls back to the backward-compatible danger-only threshold.
It reads DANGEROUS_CMD_INDEX_NAME and BENIGN_TECH_INDEX_NAME from Redis via redisearch_index_doc_count(), knn_search_dangerous_cmds(), and knn_search_benign_tech(). Any Redis or KNN failure is routed through _infra_fail_suffix() so the configured fail-open/closed policy applies. On a real trigger it emits a fire-and-forget dangerous_cmd_trigger observability event through the nested _emit_trigger closure (which schedules observability.publish_debug_event() as an asyncio task) and returns the warning. Disabled or empty inputs short-circuit to None.
Called by message_processor.generate_and_send while assembling the inference request, and exercised by tests/test_dangerous_command_guard.py and tests/test_observability_classifier.py; no other production callers were found.
- Parameters:
redis (Redis) – Async Redis client backing the RediSearch indexes.
query_embedding (ndarray | None) – Precomputed embedding of the user message; None or empty short-circuits to None.
config (Config) – Config object supplying the enable flag, thresholds, margin, and fail mode.
channel_id (str) – Channel id, forwarded only into the observability event.
user_id (str) – User id, forwarded only into the observability event.
request_id (str) – Request id, forwarded only into the observability event.
- Returns:
DANGEROUS_CMD_WARNING_SUFFIX when the message is judged dangerous (or under a fail-closed infrastructure error), otherwise None.
- Return type: