classifiers.update_tool_embeddings module
Incrementally update tool embeddings in Redis.
Discovers all registered tools, compares against what already exists in Redis, generates synthetic queries for missing tools via the Gemini API (using the shared embedding key pool), and adds only the new embeddings – without touching existing ones.
Usage:
python -m classifiers.update_tool_embeddings [--force-index] [--force-all] [--openrouter-only] [--tools-dir tools]
--openrouter-only skips Gemini for both synthetic query generation (OpenRouter
chat: google/gemini-3.1-flash-lite) and embeddings (OpenRouter
/embeddings: google/gemini-embedding-001 via gemini_embed_pool.set_openrouter_only()).
Requires OPENROUTER_QUERY_GEN_API_KEY, OPENROUTER_API_KEY, or API_KEY.
Raises default TOOL_EMBED_OR_MAX_CONCURRENT to 32 when unset (heavy parallel
batches; override with env).
--force-all regenerates synthetic queries (when combined with --force-index
or implied) and embeddings for every registered tool, not only keys missing
from Redis.
To regenerate synthetic queries and embeddings for specific tools only (e.g. after fixing query-gen for one tool), use:
python -m classifiers.refresh_tool_embeddings --tools vmware_control
Comma-separate multiple names. This updates tool_index_data.json and Redis
for those tools only; other tools are unchanged.
- Environment variables:
REDIS_URL – defaults to redis://localhost:6379/0
TOOL_EMBED_OR_MAX_CONCURRENT – max concurrent embedding HTTP batches in OpenRouter-only mode (default 32 if unset)
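As a rough illustration of how the two documented variables and their defaults could be resolved, here is a minimal sketch. The helper name resolve_settings is hypothetical and not part of this module, and treating an empty string as unset is an assumption; the module's actual parsing may differ.

```python
import os

# Defaults taken from the documentation above.
DEFAULT_REDIS_URL = "redis://localhost:6379/0"
DEFAULT_OR_MAX_CONCURRENT = 32  # documented default in OpenRouter-only mode


def resolve_settings(env=None):
    """Hypothetical helper: return (redis_url, max_concurrent) from an env mapping."""
    env = os.environ if env is None else env
    # Assumption: an empty value is treated the same as an unset one.
    redis_url = env.get("REDIS_URL") or DEFAULT_REDIS_URL
    raw = env.get("TOOL_EMBED_OR_MAX_CONCURRENT")
    max_concurrent = int(raw) if raw else DEFAULT_OR_MAX_CONCURRENT
    return redis_url, max_concurrent
```

Passing a plain dict instead of os.environ keeps the sketch easy to exercise without touching the real process environment.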
- async classifiers.update_tool_embeddings.get_existing_redis_tools(redis_client)
Return the set of tool names that already have centroid embeddings in Redis.
Issues a single HKEYS against the TOOL_EMBEDDINGS_HASH_KEY hash (the stargazer:tool_embeddings map defined in classifiers.vector_classifier) and decodes any bytes field names to str so callers can diff them against the live tool registry. This is what drives the incremental behavior: names present here are skipped, names absent are treated as missing, and names here but no longer registered are pruned as orphans. Errors are swallowed and logged so a transient Redis failure yields an empty set rather than aborting the run.
Called once by update_tool_embeddings() in this module to compute the missing/orphaned split; no other callers were found.
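The decode-then-diff behavior described above can be sketched without a live Redis connection. This is an illustrative sketch only: the helper names decode_field_names and split_missing_and_orphaned are hypothetical, and the tool names other than vmware_control are made up for the example.

```python
def decode_field_names(raw_fields):
    """Decode bytes field names (as HKEYS may return them) to str."""
    return {f.decode("utf-8") if isinstance(f, bytes) else f for f in raw_fields}


def split_missing_and_orphaned(registered, existing):
    """Return (missing, orphaned) as sorted lists of tool names."""
    registered = set(registered)
    missing = sorted(registered - existing)   # registered but not yet embedded
    orphaned = sorted(existing - registered)  # embedded but no longer registered
    return missing, orphaned


# Simulated HKEYS reply mixing bytes and str field names.
existing = decode_field_names([b"web_search", "vmware_control", b"old_tool"])
missing, orphaned = split_missing_and_orphaned(
    ["web_search", "vmware_control", "new_tool"], existing
)
```

Here new_tool would be embedded and old_tool pruned, while the two names present on both sides are left alone.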
- classifiers.update_tool_embeddings.discover_tools(tools_dir='tools')
Auto-discover every tool on disk and return it keyed by tool name.
Builds a fresh tools.ToolRegistry, populates it by calling tool_loader.load_tools() over the given directory (which imports each tool module and registers its definition), and collapses registry.list_tools() into a name-to-definition mapping. This is the live source of truth for which tools exist, against which the Redis embedding hash is diffed for missing/orphaned tools; it touches the filesystem (importing tool modules) but not Redis.
Called by update_tool_embeddings() here, by update_changed_tool_embeddings.update_changed_tool_embeddings(), and by classifiers/refresh_tool_embeddings.py to resolve the current tool set.
- classifiers.update_tool_embeddings.load_index_file()
Load index file from the configured source.
- async classifiers.update_tool_embeddings.update_tool_embeddings(force_index=False, tools_dir='tools', *, force_all=False, openrouter_only=False)
Incrementally reconcile the tool vector index in Redis with the live registry.
The core routine of this module. It discovers the registered tools via discover_tools(), reads the existing embedding keys via get_existing_redis_tools(), and computes the missing set (tools to add) and the orphaned set (tools to remove). Orphans are pruned from TOOL_EMBEDDINGS_HASH_KEY/TOOL_METADATA_HASH_KEY and the per-tool RediSearch documents, and also from the on-disk tool_index_data.json. For the missing tools it generates synthetic queries (Gemini, or OpenRouter chat google/gemini-3.1-flash-lite when openrouter_only is set) via the nested gen closure, computes centroid vectors through classifiers.tool_embedding_batch.compute_tool_centroids_bulk() and an OpenRouterEmbeddings client, and additively HSETs the new vectors plus metadata back into Redis.
Heavily I/O bound: it loads config.Config, opens its own async Redis connection (URL or Sentinel), calls gemini_embed_pool.init_quota_tracking(), optionally toggles OpenRouter-only mode for the embedding pool, makes HTTP calls for query generation and embeddings, and reads/writes both Redis and tool_index_data.json. The Redis connection and any OpenRouter-only state are always released in the finally block.
Called only by main() in this module (the python -m classifiers.update_tool_embeddings entry point); no other internal callers were found.
- Parameters:
force_index (bool) – Regenerate synthetic queries even when the index file already has enough for a tool. Implied by force_all.
tools_dir (str) – Directory to scan for tools. Defaults to "tools".
force_all (bool) – Treat every registered tool as missing, recomputing queries and embeddings for all of them rather than only Redis-absent keys.
openrouter_only (bool) – Route both query generation and embeddings through OpenRouter instead of the Gemini key pool. Requires an OpenRouter or API_KEY credential; raises the default embedding concurrency to 32.
- Returns:
True on success (including the no-op case where everything is already embedded), False if a required credential is missing.
- Return type:
bool
- Raises:
RuntimeError – If a tool slated for embedding lacks the expected number of synthetic queries in the index file.
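To make the "centroid vector" step concrete, the sketch below averages a tool's synthetic-query embeddings element-wise. This is an assumption about what a centroid means here: the real classifiers.tool_embedding_batch.compute_tool_centroids_bulk() is not shown in this document and may normalize, weight, or batch differently. The function name centroid is hypothetical.

```python
def centroid(vectors):
    """Element-wise mean of equally sized embedding vectors (illustrative only)."""
    if not vectors:
        raise ValueError("need at least one embedding vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all embedding vectors must share one dimension")
    n = len(vectors)
    # Average each coordinate across all query embeddings.
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


# Two toy 2-dimensional "query embeddings" for one tool.
tool_centroid = centroid([[1.0, 2.0], [3.0, 4.0]])
```

Storing one averaged vector per tool keeps the Redis hash small, at the cost of blurring distinctions between individual queries.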
- async classifiers.update_tool_embeddings.main()
Async CLI entry point that parses flags and runs the incremental update.
Builds an argparse.ArgumentParser exposing --force-index/-f (regenerate synthetic queries), --force-all (recompute every tool, not just Redis-absent ones), --openrouter-only (route query generation and embeddings through OpenRouter), and --tools-dir (tool scan directory). It then awaits update_tool_embeddings() with those values and translates the returned boolean into a process exit code via sys.exit() (0 on success, 1 on failure or unhandled exception, which is logged with a traceback).
Invoked only by the module’s if __name__ == "__main__" guard through asyncio.run() (python -m classifiers.update_tool_embeddings); no other internal callers were found.
- Returns:
The process is terminated via sys.exit().
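The flag set and the boolean-to-exit-code mapping described for main() could look roughly like the sketch below. The flag names come from the documentation above; the help strings, the CLI default of "tools" for --tools-dir (mirroring the function signature), and the helper names build_parser and exit_code are assumptions, not the module's actual code.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(prog="classifiers.update_tool_embeddings")
    parser.add_argument("--force-index", "-f", action="store_true",
                        help="regenerate synthetic queries")
    parser.add_argument("--force-all", action="store_true",
                        help="recompute every tool, not just Redis-absent ones")
    parser.add_argument("--openrouter-only", action="store_true",
                        help="route query generation and embeddings through OpenRouter")
    # Assumption: the CLI default matches the tools_dir='tools' function default.
    parser.add_argument("--tools-dir", default="tools",
                        help="tool scan directory")
    return parser


def exit_code(success):
    """Map the boolean result to a process exit code (0 success, 1 failure)."""
    return 0 if success else 1


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-f", "--tools-dir", "my_tools"])
```

In a real entry point the parsed values would be passed to update_tool_embeddings() and the result handed to sys.exit(exit_code(result)).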