classifiers.update_changed_tool_embeddings module
Surgical embedding update for tools whose descriptions changed.
This script is intentionally smaller and safer than
update_tool_embeddings and refresh_tool_embeddings:
- It never calls DEL, HDEL, FLUSHDB, or any other delete primitive on Redis. Only HSET for the explicit per-tool keys it is updating.
- It does not prune orphaned entries (use update_tool_embeddings for that, once you’re sure you actually want the prune).
- It does not regenerate embeddings for tools that did not change.
- It edits tool_index_data.json only for the targeted tools, leaving every other entry byte-for-byte the same.
- If a target’s query generation or centroid embedding fails, the script skips that one tool and leaves Redis untouched for it.
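The "only the targeted entries change" guarantee for tool_index_data.json can be pictured with a minimal sketch. It assumes, purely for illustration, that the index is a JSON object keyed by tool name; the real file layout is not described here, and the helper name patch_index is hypothetical rather than part of the module.

```python
import copy


def patch_index(index: dict, updates: dict, targets: set) -> dict:
    """Return a new index in which only successfully updated targets change.

    Assumption: the index maps tool name -> entry dict. Entries for tools
    outside `targets`, and targets that produced no update (for example
    because query generation failed), are carried over untouched.
    """
    patched = dict(index)  # shallow copy: untouched entries stay the same objects
    for name in targets:
        if name in updates:
            patched[name] = copy.deepcopy(updates[name])
    return patched


index = {
    "redis_admin": {"description": "old text", "queries": ["q1"]},
    "rag_index_file": {"description": "unchanged", "queries": ["q2"]},
}
updates = {"redis_admin": {"description": "new text", "queries": ["q3"]}}

# "failed_tool" is targeted but has no update, so nothing is written for it.
patched = patch_index(index, updates, {"redis_admin", "failed_tool"})
```

Copying the mapping instead of rebuilding it is what keeps non-targeted entries identical; a full regenerate-and-write would not give that property.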
Resolution order for “what changed”:
1. --tools name1,name2: explicit allowlist (intersected with the live tool registry; unknown names are warned and skipped).
2. Otherwise auto-detect by reading stargazer:tool_metadata from Redis and comparing each tool’s stored description against the live TOOL_DESCRIPTION. Tools missing from Redis entirely are NOT added; that’s update_tool_embeddings’ job.
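The resolution order above can be sketched as a small pure function. The name resolve_targets and its argument shapes are assumptions made for illustration; the module's own code may be organised differently.

```python
def resolve_targets(requested, registered, stale):
    """Pick the tools to re-embed.

    requested:  names from --tools, or None/empty to auto-detect
    registered: set of names in the live tool registry
    stale:      names already detected as description-changed

    Returns (targets, unknown); `unknown` holds requested names that are
    not in the registry, so the caller can warn about them.
    """
    if requested:
        targets = [name for name in requested if name in registered]
        unknown = [name for name in requested if name not in registered]
        return targets, unknown
    # No allowlist given: fall back to the auto-detected stale set.
    return list(stale), []


registered = {"redis_admin", "rag_index_file"}
explicit = resolve_targets(["redis_admin", "no_such_tool"], registered, ["rag_index_file"])
auto = resolve_targets(None, registered, ["rag_index_file"])
```

Note that an allowlist made up only of unknown names yields an empty target list in this sketch and does not fall back to auto-detection.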
Pass --dry-run to print the plan without writing anything.
Usage:
python -m classifiers.update_changed_tool_embeddings
python -m classifiers.update_changed_tool_embeddings --tools redis_admin,rag_index_file
python -m classifiers.update_changed_tool_embeddings --dry-run
python -m classifiers.update_changed_tool_embeddings --openrouter-only
Env vars: REDIS_URL (default redis://localhost:6379/0), plus
OPENROUTER_QUERY_GEN_API_KEY / OPENROUTER_API_KEY / API_KEY
when --openrouter-only is active.
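A minimal sketch of how these variables might be read. The precedence among the three key variables (first non-empty one, in the order listed) is an assumption, and resolve_settings is a hypothetical helper, not a function of this module.

```python
DEFAULT_REDIS_URL = "redis://localhost:6379/0"

# Assumed precedence: the order in which the variables are listed above.
KEY_VARS = ("OPENROUTER_QUERY_GEN_API_KEY", "OPENROUTER_API_KEY", "API_KEY")


def resolve_settings(env, openrouter_only=False):
    """Return (redis_url, api_key) from a mapping such as os.environ.

    api_key is looked up only when openrouter_only is set, and is None
    when no candidate variable holds a non-empty value.
    """
    redis_url = env.get("REDIS_URL") or DEFAULT_REDIS_URL
    api_key = None
    if openrouter_only:
        for var in KEY_VARS:
            if env.get(var):
                api_key = env[var]
                break
    return redis_url, api_key


defaults = resolve_settings({})
with_key = resolve_settings(
    {"OPENROUTER_API_KEY": "key-b", "API_KEY": "key-c"}, openrouter_only=True
)
```

Passing the environment in as a mapping (os.environ in real use) keeps the lookup easy to exercise without touching the process environment.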
- async classifiers.update_changed_tool_embeddings.find_stale_tools(registered, redis_client)
Return names of registered tools whose live description differs from the one stored in Redis under TOOL_METADATA_HASH_KEY.
Tools whose name is missing from Redis are intentionally NOT included; adding new tools is a job for update_tool_embeddings.
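The comparison can be illustrated with an in-memory stand-in for the Redis client. Several details are assumptions: that registered maps tool name to live description, that TOOL_METADATA_HASH_KEY names the stargazer:tool_metadata hash, and that each hash field holds JSON with a description key. FakeRedis and find_stale_sketch are hypothetical names.

```python
import asyncio
import json

METADATA_KEY = "stargazer:tool_metadata"  # assumed value of TOOL_METADATA_HASH_KEY


class FakeRedis:
    """In-memory stand-in exposing only the async hgetall used below."""

    def __init__(self, hashes):
        self._hashes = hashes

    async def hgetall(self, key):
        return dict(self._hashes.get(key, {}))


async def find_stale_sketch(registered, redis_client):
    """Return names whose stored description differs from the live one."""
    stored = await redis_client.hgetall(METADATA_KEY)
    stale = []
    for name, live_description in registered.items():
        if name not in stored:
            continue  # tool missing from Redis: adding it is not this script's job
        metadata = json.loads(stored[name])  # assumption: JSON-encoded metadata
        if metadata.get("description") != live_description:
            stale.append(name)
    return stale


fake = FakeRedis({METADATA_KEY: {
    "redis_admin": json.dumps({"description": "Manage Redis"}),
    "rag_index_file": json.dumps({"description": "Index a file"}),
}})
registered = {
    "redis_admin": "Manage Redis keys and hashes",  # description changed
    "rag_index_file": "Index a file",               # unchanged
    "brand_new_tool": "Not in Redis yet",           # skipped on purpose
}
stale = asyncio.run(find_stale_sketch(registered, fake))
```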
- async classifiers.update_changed_tool_embeddings.update_changed_tool_embeddings(*, tool_names=None, tools_dir='tools', dry_run=False, openrouter_only=False, paid_key=None, concurrency=None)
Re-embed only the listed or description-changed tools, never wiping anything.
The core routine of this module: a deliberately conservative, additive counterpart to update_tool_embeddings.update_tool_embeddings(). It resolves a target set either from the explicit tool_names allowlist (intersected with the live registry from discover_tools()) or, when none is given, by auto-detecting stale tools through find_stale_tools() (live description differing from the value stored in TOOL_METADATA_HASH_KEY). For each target it regenerates synthetic queries via the nested gen_one closure (Gemini, or OpenRouter when openrouter_only is set), computes centroids through classifiers.tool_embedding_batch.compute_tool_centroids_bulk(), and writes them with HSET only: into TOOL_EMBEDDINGS_HASH_KEY, TOOL_METADATA_HASH_KEY, and the per-tool RediSearch documents via classifiers.redis_vector_index.store_tool_embedding_hash(). It performs no DEL/HDEL/FLUSHDB and never adds tools missing from Redis, and per-tool failures are isolated so one bad tool does not abort the run.
It opens (from config.Config/REDIS_URL) and always closes its own async Redis connection, calls gemini_embed_pool.init_quota_tracking(), optionally toggles OpenRouter-only mode, may export GEMINI_EMBED_PAID_KEY when paid_key is given, makes HTTP query-gen and embedding calls, and rewrites only the targeted entries of tool_index_data.json via save_index_file() (other entries preserved verbatim). A --dry-run path logs the plan and returns without any writes. Called only by main() here (the python -m classifiers.update_changed_tool_embeddings entry point); no other internal callers were found.
- Parameters:
tool_names (list[str] | None) – Explicit tool names to refresh; unknown names are warned and skipped. When None or empty, stale tools are auto-detected by description compare.
tools_dir (str) – Directory to scan for tools. Defaults to "tools".
dry_run (bool) – Print the plan (targets and intended HSETs) and return True without contacting the embedding backends or writing Redis or the index file.
openrouter_only (bool) – Route both query generation and embeddings through OpenRouter; requires an OpenRouter or API_KEY credential and raises the default embedding concurrency to 32.
paid_key (str | None) – Pin synthetic-query generation to a single paid Gemini key (also exported for the embed pool’s fallback) and force its use on the first call.
concurrency (int | None) – Override for the synthetic-query generation concurrency. When None, defaults to 1 with paid_key, 8 with openrouter_only, otherwise 3.
- Returns:
True on success (including the no-stale-tools and dry-run cases), False if a credential is missing, no valid targets were given, or no target produced usable queries or centroids.
- Return type:
bool
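Two behaviours described for this routine, the concurrency defaults and per-tool failure isolation, can be sketched together. This is an illustration under assumptions, not the module's code: default_concurrency and run_isolated are hypothetical helpers, checking paid_key before openrouter_only is a guess based on the order in the parameter description, and fake_worker stands in for the real query generation.

```python
import asyncio


def default_concurrency(concurrency=None, paid_key=None, openrouter_only=False):
    """Query-generation concurrency: explicit override, else 1 / 8 / 3."""
    if concurrency is not None:
        return concurrency
    if paid_key:  # assumption: paid_key takes precedence over openrouter_only
        return 1
    if openrouter_only:
        return 8
    return 3


async def run_isolated(names, worker, limit):
    """Run worker(name) for each name, at most `limit` at a time.

    A failure in one worker is recorded and does not cancel the others.
    Returns (results, failed).
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(name):
        async with semaphore:
            return await worker(name)

    outcomes = await asyncio.gather(
        *(guarded(name) for name in names), return_exceptions=True
    )
    results, failed = {}, []
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, BaseException):
            failed.append(name)  # skip this tool; nothing is written for it
        else:
            results[name] = outcome
    return results, failed


async def fake_worker(name):
    # Stand-in for synthetic-query generation.
    if name == "bad_tool":
        raise RuntimeError("query generation failed")
    return [f"how do I use {name}?"]


results, failed = asyncio.run(
    run_isolated(["redis_admin", "bad_tool"], fake_worker, default_concurrency())
)
```

Collecting exceptions with return_exceptions=True is one way to keep a single bad tool from aborting the run; a try/except inside each worker would serve equally well.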
- async classifiers.update_changed_tool_embeddings.main()
Async CLI entry point that parses flags and runs the surgical update.
Builds an argparse.ArgumentParser exposing --tools/-t (a comma-separated allowlist; auto-detect stale tools when omitted), --tools-dir, --dry-run, --openrouter-only, --paid-key, and --concurrency. It splits the comma-separated tool list, then awaits update_changed_tool_embeddings() with the parsed options and translates the returned boolean into a process exit code via sys.exit() (0 on success, 1 on failure or an unhandled exception, which is logged with a traceback).
Invoked only by the module’s if __name__ == "__main__" guard through asyncio.run() (python -m classifiers.update_changed_tool_embeddings); no other internal callers were found.
- Returns:
The process is terminated via sys.exit().
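The flag handling described for main() might look roughly like the following. The flag names come from the description above; the defaults, the type of --concurrency, and the helpers build_parser, split_tools, and exit_code are assumptions for illustration.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="update_changed_tool_embeddings")
    parser.add_argument("--tools", "-t", default=None,
                        help="comma-separated allowlist; auto-detect when omitted")
    parser.add_argument("--tools-dir", default="tools")
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--openrouter-only", action="store_true")
    parser.add_argument("--paid-key", default=None)
    parser.add_argument("--concurrency", type=int, default=None)
    return parser


def split_tools(raw):
    """Turn 'a, b,c' into ['a', 'b', 'c']; None when nothing usable is given."""
    if not raw:
        return None
    names = [part.strip() for part in raw.split(",") if part.strip()]
    return names or None


def exit_code(ok):
    """Map the boolean result of the update to a process exit code."""
    return 0 if ok else 1


args = build_parser().parse_args(["-t", "redis_admin, rag_index_file", "--dry-run"])
tool_names = split_tools(args.tools)
```

In a real entry point the parsed values would be passed to update_changed_tool_embeddings() and the result handed to sys.exit(); that call is omitted so the sketch runs on its own.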