classifiers.vector_classifier module
Vector-based classifier for tool selection.
Lightweight semantic vector classifier that replaces sending all tools to the LLM with deterministic vector retrieval. Pre-computed centroid embeddings (stored in Redis hashes) are compared against user-query embeddings via cosine similarity to select the most relevant tools.
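The retrieval step described above can be sketched in pure Python. This is a minimal illustration, not the module's implementation: `cosine_similarity` and `select_tools` are hypothetical helper names, the centroids are toy 3-dimensional vectors held in a plain dict rather than Redis, and treating the threshold as inclusive is an assumption. The default values mirror the `similarity_threshold=0.15` and `top_k=20` defaults shown in the `VectorClassifier` signature.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_tools(query_embedding, centroids, similarity_threshold=0.15, top_k=20):
    """Return up to top_k tool names whose centroid clears the threshold, best first."""
    scored = [
        (cosine_similarity(query_embedding, centroid), name)
        for name, centroid in centroids.items()
    ]
    # Assumption: a score equal to the threshold is kept.
    kept = [(score, name) for score, name in scored if score >= similarity_threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in kept[:top_k]]


# Toy centroids standing in for the values stored in Redis hashes.
centroids = {
    "web_search": [1.0, 0.0, 0.0],
    "send_email": [0.0, 1.0, 0.0],
    "read_file": [0.6, 0.8, 0.0],
}
selected = select_tools([1.0, 0.1, 0.0], centroids, similarity_threshold=0.5)
```

With these toy vectors only `web_search` and `read_file` clear the 0.5 threshold, and they come back in descending order of similarity.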
- classifiers.vector_classifier.detect_tool_request_keywords(response_text)[source]
Return True when the bot seems to request missing tools.
This lightweight regex check gates the heavier embedding-based tool expansion to avoid false positives on legitimate no-tool responses.
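A regex gate of this kind might look like the sketch below. The phrases in the pattern are invented for illustration, since the actual pattern list is not documented here, and `looks_like_tool_request` is a hypothetical stand-in for `detect_tool_request_keywords`.

```python
import re

# Hypothetical phrases; the real function's patterns may differ.
_TOOL_REQUEST_RE = re.compile(
    r"\b(?:"
    r"i\s+(?:don't|do\s+not)\s+have\s+(?:access\s+to\s+)?(?:a|the|any)\s+tools?"
    r"|no\s+tools?\s+(?:is|are)\s+available"
    r"|i\s+(?:would\s+)?need\s+(?:a|the)\s+\w+\s+tool"
    r")\b",
    re.IGNORECASE,
)


def looks_like_tool_request(response_text):
    """Return True when the text contains one of the assumed tool-request phrases."""
    return bool(_TOOL_REQUEST_RE.search(response_text))
```

A plain answer such as "The capital of France is Paris." does not match, so the embedding-based expansion would be skipped for it.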
- classifiers.vector_classifier.find_tools_explicitly_named(message, valid_names)[source]
Return tool names that appear verbatim in message as whole tokens.
Detection:
* Maximal runs of ASCII letters, digits, and underscores (typical snake_case tools), equivalent to word boundaries for those names.
* Text inside ASCII backticks (inline code): inner text is stripped and must match a registered tool name exactly, so names containing hyphens or other punctuation still match when quoted.
Hits are ordered by first occurrence in the message; each tool appears at most once.
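The detection rules above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the module's source: `find_named_tools` is a hypothetical name, matching is assumed to be case-sensitive (the docstring says "verbatim"), and a backtick hit is positioned at the start of its inner text.

```python
import re

_WORD_RUN_RE = re.compile(r"[A-Za-z0-9_]+")  # maximal runs of letters, digits, underscores
_BACKTICK_RE = re.compile(r"`([^`]+)`")      # text inside ASCII backticks


def find_named_tools(message, valid_names):
    """Return registered tool names found in message, ordered by first occurrence."""
    valid = set(valid_names)
    hits = []  # (position in message, tool name)

    for match in _WORD_RUN_RE.finditer(message):
        if match.group(0) in valid:
            hits.append((match.start(), match.group(0)))

    for match in _BACKTICK_RE.finditer(message):
        inner = match.group(1).strip()
        if inner in valid:
            hits.append((match.start(1), inner))

    hits.sort(key=lambda hit: hit[0])

    ordered, seen = [], set()
    for _, name in hits:
        if name not in seen:  # each tool appears at most once
            seen.add(name)
            ordered.append(name)
    return ordered


found = find_named_tools(
    "Use `get-weather` then web_search, not web_searcher; web_search again.",
    ["web_search", "get-weather", "send_email"],
)
```

Here `web_searcher` is not a hit because the whole run must equal a registered name, while the hyphenated `get-weather` is found only because it is quoted in backticks.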
- class classifiers.vector_classifier.VectorClassifier(redis_client, similarity_threshold=0.15, top_k=20, api_key=None)[source]
Bases: object
Semantic vector-based classifier for tool selection.
- Parameters:
- __init__(redis_client, similarity_threshold=0.15, top_k=20, api_key=None)[source]
Initialize the instance.
- async classify(message, query_embedding=None, registry_tool_names=None, *, scan_explicit_tool_mentions=True)[source]
Classify message and return tool names + strategy.
- Parameters:
* message (str) – The message to classify.
* query_embedding (ndarray | None) – Pre-computed embedding for message. When provided, the internal embedding API call is skipped.
* registry_tool_names (Optional[Iterable[str]]) – Registered tool names (e.g. registry keys). When provided, any name that appears as a whole token in message is included in the tool set alongside vector matches.
* scan_explicit_tool_mentions (bool) – When True (default), scan message for explicit registered tool names. Set False for non-user text (e.g. assistant drafts, response postprocessing) so mentions in those strings do not add tools.
- Returns: A dict with keys tools, strategy, complexity, and safety.
- Return type: dict
- async classify_response_for_missing_tools(response_text, current_tools, threshold=0.85)[source]
Find tools the bot might need but lacks.
Used for dynamic tool expansion when the bot signals it needs tools not included in the original set.
Runs vector similarity only on response_text and does not call find_tools_explicitly_named(), so tool names that appear in assistant output or postprocessed reply text never add tools.
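The filtering part of this expansion step can be sketched as below, assuming the similarity of the response text to each tool centroid has already been computed. `expand_missing_tools` and the `scores` mapping are hypothetical, and the inclusive comparison against the threshold is an assumption; only the `threshold=0.85` default comes from the signature above.

```python
def expand_missing_tools(scores, current_tools, threshold=0.85):
    """Return tools not already available whose similarity clears the threshold.

    scores: mapping of tool name -> similarity between the response text
    embedding and that tool's centroid (assumed to be precomputed).
    """
    already_have = set(current_tools)
    candidates = [
        (score, name)
        for name, score in scores.items()
        if name not in already_have and score >= threshold
    ]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in candidates]


missing = expand_missing_tools(
    {"web_search": 0.93, "send_email": 0.88, "read_file": 0.40},
    current_tools=["web_search"],
)
```

The much higher default threshold compared with `classify` (0.85 versus 0.15) suggests the expansion is meant to add tools only on a strong match.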
- async classifiers.vector_classifier.initialize_tool_embeddings_from_file(index_file_path, redis_client, api_key=None, force_recompute=False)[source]
Compute centroid embeddings and store in Redis.
Reads tool_index_data.json, embeds every synthetic query per tool, calculates the centroid, and writes the result into Redis hashes.
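The centroid step can be sketched as an element-wise mean over the embeddings of a tool's synthetic queries. Everything below is an assumption made for illustration: the index is taken to map each tool name to a list of queries (the real layout of tool_index_data.json is not shown here), `fake_embed` stands in for the embedding API, and the final dict of JSON strings only hints at what might be written to a Redis hash.

```python
import json


def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def build_centroids(index, embed):
    """index: tool name -> list of synthetic queries; embed: text -> vector."""
    return {
        name: centroid([embed(query) for query in queries])
        for name, queries in index.items()
    }


def fake_embed(text):
    """Toy 2-dimensional stand-in for a real embedding call."""
    return [float(len(text)), float(text.count(" "))]


# Assumed index shape; the real file may carry more fields per tool.
index = {"web_search": ["find news", "look up a fact"]}
centroids = build_centroids(index, fake_embed)

# Serialized form that could be stored as a hash field value.
payload = {name: json.dumps(vec) for name, vec in centroids.items()}
```

Averaging several synthetic queries per tool gives one representative vector per tool, so each incoming query is compared against a single centroid instead of every stored example.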