classifiers.update_changed_tool_embeddings module

Surgical embedding update for tools whose descriptions changed.

This script is intentionally smaller and safer than update_tool_embeddings and refresh_tool_embeddings:

  • It never calls DEL, HDEL, FLUSHDB, or any other delete primitive on Redis; it issues only HSET, and only for the explicit per-tool keys it is updating.

  • It does not prune orphaned entries (use update_tool_embeddings for that, once you are sure you actually want to prune).

  • It does not regenerate embeddings for tools that did not change.

  • It edits tool_index_data.json only for the targeted tools, leaving every other entry byte-for-byte the same.

  • If a target’s query generation or centroid embedding fails, the script skips that one tool and leaves Redis untouched for it.
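A minimal sketch of the last two guarantees follows. It assumes, hypothetically, that tool_index_data.json is a single JSON object keyed by tool name; merge_index_entries is an illustrative name, not the module's save_index_file(). Tools whose generation failed are simply left out of updates, so their old entries survive. Re-serialising keeps every untouched entry's value identical; byte-for-byte identity additionally assumes the file was originally written with the same json.dumps settings.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def merge_index_entries(path: Path, updates: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: overwrite only the entries named in `updates`.

    Assumes the index file is one JSON object keyed by tool name. Tools that
    failed upstream are absent from `updates`, so their existing entries, like
    every non-target entry, are carried over unchanged.
    """
    data: dict[str, Any] = json.loads(path.read_text(encoding="utf-8"))
    for name, entry in updates.items():
        data[name] = entry  # touch only the targeted key
    # Write a sibling temp file, then swap it in so a crash mid-write
    # does not leave a half-written index behind.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    tmp.replace(path)
    return data
```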

Resolution order for “what changed”:

  1. --tools name1,name2 — explicit allowlist (intersected with the live tool registry; unknown names are warned and skipped).

  2. Otherwise, auto-detect by reading stargazer:tool_metadata from Redis and comparing each tool’s stored description against its live TOOL_DESCRIPTION. Tools missing from Redis entirely are NOT added; that is update_tool_embeddings’ job.
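The resolution order above reduces to a small pure function. This is a sketch under stated assumptions: resolve_targets is a hypothetical name, the registry is any mapping keyed by tool name, and stale is whatever the auto-detect step (find_stale_tools()) returned.

```python
from __future__ import annotations

import logging
from collections.abc import Iterable, Mapping
from typing import Any

log = logging.getLogger(__name__)


def resolve_targets(
    tool_names: Iterable[str] | None,
    registered: Mapping[str, Any],
    stale: Iterable[str],
) -> list[str]:
    """Hypothetical sketch of the two-step resolution order.

    1. A non-empty allowlist wins: keep names present in the live registry,
       warn about and skip the rest.
    2. Otherwise fall back to the auto-detected stale set, which never
       contains tools that are missing from Redis.
    """
    # Strip whitespace, drop blanks, and de-duplicate while keeping order.
    names = list(dict.fromkeys(n.strip() for n in (tool_names or []) if n.strip()))
    if names:
        targets = []
        for name in names:
            if name in registered:
                targets.append(name)
            else:
                log.warning("Unknown tool %r; skipping", name)
        return targets
    return [n for n in stale if n in registered]
```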

Pass --dry-run to print the plan without writing anything.

Usage:

python -m classifiers.update_changed_tool_embeddings
python -m classifiers.update_changed_tool_embeddings --tools redis_admin,rag_index_file
python -m classifiers.update_changed_tool_embeddings --dry-run
python -m classifiers.update_changed_tool_embeddings --openrouter-only

Env vars: REDIS_URL (default redis://localhost:6379/0); when --openrouter-only is active, also one of OPENROUTER_QUERY_GEN_API_KEY, OPENROUTER_API_KEY, or API_KEY.
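The environment lookup can be sketched as a first-match search. The precedence shown (the order in which the variables are listed above, most specific first) is an assumption, as are the helper names redis_url and pick_openrouter_key.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

DEFAULT_REDIS_URL = "redis://localhost:6379/0"
# Assumed precedence: the order in which the variables are documented.
OPENROUTER_KEY_VARS = ("OPENROUTER_QUERY_GEN_API_KEY", "OPENROUTER_API_KEY", "API_KEY")


def redis_url(env: Mapping[str, str] = os.environ) -> str:
    """REDIS_URL if set and non-empty, else the documented default."""
    return env.get("REDIS_URL") or DEFAULT_REDIS_URL


def pick_openrouter_key(env: Mapping[str, str] = os.environ) -> str | None:
    """Return the first non-blank credential, or None if none is set."""
    for var in OPENROUTER_KEY_VARS:
        value = env.get(var, "").strip()
        if value:
            return value
    return None
```

A None result corresponds to the missing-credential case, in which update_changed_tool_embeddings() returns False.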

async classifiers.update_changed_tool_embeddings.find_stale_tools(registered, redis_client)

Return the names of registered tools whose live description differs from the one stored in the Redis hash TOOL_METADATA_HASH_KEY.

Tools whose name is missing from Redis are intentionally NOT included — adding new tools is a job for update_tool_embeddings.

Return type:

list[str]

Parameters:
  • registered (dict[str, Any])

  • redis_client (redis.asyncio.Redis)
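A sketch of the comparison find_stale_tools() is described as doing. The storage format is an assumption: each field of the metadata hash is a tool name whose value is a JSON object with a "description" key, each registry entry exposes a TOOL_DESCRIPTION attribute, and TOOL_METADATA_HASH_KEY names the stargazer:tool_metadata hash from the module overview. Treating unreadable metadata as changed is a choice made in this sketch, not documented behaviour.

```python
from __future__ import annotations

import json
from typing import Any

# Assumption: the constant names the hash mentioned in the module overview.
TOOL_METADATA_HASH_KEY = "stargazer:tool_metadata"


def _text(value: str | bytes) -> str:
    # redis-py returns bytes unless the client was built with decode_responses=True.
    return value.decode("utf-8") if isinstance(value, bytes) else value


async def find_stale_tools_sketch(registered: dict[str, Any], redis_client: Any) -> list[str]:
    """Hypothetical re-creation: names whose live description differs from Redis."""
    raw = await redis_client.hgetall(TOOL_METADATA_HASH_KEY)
    stored = {_text(k): _text(v) for k, v in raw.items()}
    stale = []
    for name, tool in registered.items():
        if name not in stored:
            continue  # new tools are update_tool_embeddings' responsibility
        try:
            old = json.loads(stored[name]).get("description")
        except (json.JSONDecodeError, AttributeError):
            old = None  # unreadable metadata is treated as changed
        if old != getattr(tool, "TOOL_DESCRIPTION", None):
            stale.append(name)
    return sorted(stale)
```

Any object with an async hgetall() works, so a redis.asyncio.Redis client or a test double can be passed as redis_client.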

async classifiers.update_changed_tool_embeddings.update_changed_tool_embeddings(*, tool_names=None, tools_dir='tools', dry_run=False, openrouter_only=False, paid_key=None, concurrency=None)

Re-embed only the listed or description-changed tools, never wiping anything.

The core routine of this module is a deliberately conservative, additive counterpart to update_tool_embeddings.update_tool_embeddings(). It resolves a target set either from the explicit tool_names allowlist (intersected with the live registry from discover_tools()) or, when none is given, by auto-detecting stale tools through find_stale_tools() (live description differing from the value stored in TOOL_METADATA_HASH_KEY). For each target it regenerates synthetic queries via the nested gen_one closure (Gemini, or OpenRouter when openrouter_only is set), computes centroids through classifiers.tool_embedding_batch.compute_tool_centroids_bulk(), and writes them with HSET only: into TOOL_EMBEDDINGS_HASH_KEY, TOOL_METADATA_HASH_KEY, and the per-tool RediSearch documents via classifiers.redis_vector_index.store_tool_embedding_hash(). It performs no DEL, HDEL, or FLUSHDB and never adds tools missing from Redis. Per-tool failures are isolated, so one bad tool does not abort the run.

It opens its own async Redis connection (from config.Config / REDIS_URL) and always closes it, calls gemini_embed_pool.init_quota_tracking(), optionally toggles OpenRouter-only mode, may export GEMINI_EMBED_PAID_KEY when paid_key is given, makes HTTP calls for query generation and embeddings, and rewrites only the targeted entries of tool_index_data.json via save_index_file(), preserving other entries verbatim. With dry_run=True (the --dry-run flag) it logs the plan and returns without any writes. It is called only by main() in this module (the python -m classifiers.update_changed_tool_embeddings entry point); no other internal callers were found.
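The HSET-only write path and per-tool failure isolation can be illustrated as follows. This is a simplification under stated assumptions: the real routine computes centroids in bulk via compute_tool_centroids_bulk() and also writes RediSearch documents, both omitted here; apply_updates and the build callable are hypothetical stand-ins, and the hash keys are passed in rather than guessed.

```python
from __future__ import annotations

import asyncio
import logging
from collections.abc import Awaitable, Callable
from typing import Any

log = logging.getLogger(__name__)

# Hypothetical per-tool builder: returns (centroid_bytes, metadata_json) or raises.
Builder = Callable[[str], Awaitable[tuple[bytes, str]]]


async def apply_updates(
    targets: list[str],
    build: Builder,
    redis_client: Any,
    *,
    embeddings_key: str,
    metadata_key: str,
    dry_run: bool = False,
    concurrency: int = 3,
) -> bool:
    if dry_run:
        # Log the plan only: no builder calls, no Redis writes.
        for name in targets:
            log.info("[dry-run] would HSET %s and %s field %r", embeddings_key, metadata_key, name)
        return True

    sem = asyncio.Semaphore(concurrency)

    async def one(name: str) -> tuple[str, tuple[bytes, str] | None]:
        async with sem:
            try:
                return name, await build(name)
            except Exception:  # isolate failures: skip this tool, keep the others
                log.exception("Skipping %s; Redis left untouched for it", name)
                return name, None

    results = await asyncio.gather(*(one(n) for n in targets))
    written = 0
    for name, built in results:
        if built is None:
            continue
        centroid, metadata = built
        # Only HSET on the targeted fields; no DEL, HDEL, or FLUSHDB anywhere.
        await redis_client.hset(embeddings_key, name, centroid)
        await redis_client.hset(metadata_key, name, metadata)
        written += 1
    return written > 0  # False when no target produced usable output
```

Building everything before writing means a tool whose query generation or embedding raises never reaches Redis.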

Parameters:
  • tool_names (list[str] | None) – Explicit tool names to refresh; unknown names are warned and skipped. When None or empty, stale tools are auto-detected by description compare.

  • tools_dir (str) – Directory to scan for tools. Defaults to "tools".

  • dry_run (bool) – Print the plan (targets and intended HSETs) and return True without contacting the embedding backends or writing to Redis or the index file.

  • openrouter_only (bool) – Route both query generation and embeddings through OpenRouter; requires an OpenRouter or API_KEY credential and raises the default embedding concurrency to 32.

  • paid_key (str | None) – Pin synthetic-query generation to a single paid Gemini key (also exported for the embed pool’s fallback) and force its use on the first call.

  • concurrency (int | None) – Override for the synthetic-query generation concurrency. When None, defaults to 1 with paid_key, 8 with openrouter_only, otherwise 3.

Returns:

True on success (including the no-stale-tools and dry-run cases), False if a credential is missing, no valid targets were given, or no target produced usable queries or centroids.

Return type:

bool
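The documented query-generation concurrency defaults, written out. The precedence when both paid_key and openrouter_only are set (the paid key wins here) is an assumption, and query_gen_concurrency is a hypothetical name.

```python
from __future__ import annotations


def query_gen_concurrency(
    concurrency: int | None, *, paid_key: str | None, openrouter_only: bool
) -> int:
    """Return the synthetic-query generation concurrency per the documented defaults."""
    if concurrency is not None:
        return concurrency  # an explicit --concurrency always wins
    if paid_key:
        return 1  # one pinned paid Gemini key: generate serially
    if openrouter_only:
        return 8
    return 3  # default Gemini path
```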

async classifiers.update_changed_tool_embeddings.main()

Async CLI entry point that parses flags and runs the surgical update.

Builds an argparse.ArgumentParser exposing --tools / -t (a comma-separated allowlist; auto-detect stale tools when omitted), --tools-dir, --dry-run, --openrouter-only, --paid-key, and --concurrency. It splits the comma-separated tool list, then awaits update_changed_tool_embeddings() with the parsed options and translates the returned boolean into a process exit code via sys.exit() (0 on success, 1 on failure or an unhandled exception, which is logged with a traceback).

Invoked only by the module’s if __name__ == "__main__" guard through asyncio.run() (python -m classifiers.update_changed_tool_embeddings); no other internal callers were found.

Returns:

Does not return normally; the process is terminated via sys.exit().

Return type:

None
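The CLI wiring described for main() can be sketched with argparse. The flag names come from this page; main_sketch, split_tools, and the injected run callable (standing in for update_changed_tool_embeddings()) are hypothetical, added so the sketch runs on its own. Help texts and the exact splitting rules are assumptions.

```python
from __future__ import annotations

import argparse
import logging
import sys
from collections.abc import Awaitable, Callable


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(
        prog="python -m classifiers.update_changed_tool_embeddings",
        description="Re-embed only changed tools (HSET only, never deletes).",
    )
    p.add_argument("--tools", "-t", default=None,
                   help="Comma-separated allowlist; omit to auto-detect stale tools.")
    p.add_argument("--tools-dir", default="tools")
    p.add_argument("--dry-run", action="store_true")
    p.add_argument("--openrouter-only", action="store_true")
    p.add_argument("--paid-key", default=None)
    p.add_argument("--concurrency", type=int, default=None)
    return p


def split_tools(raw: str | None) -> list[str] | None:
    # "a, b,,c" -> ["a", "b", "c"]; None or blank means "auto-detect".
    if not raw:
        return None
    return [part.strip() for part in raw.split(",") if part.strip()] or None


async def main_sketch(run: Callable[..., Awaitable[bool]], argv: list[str] | None = None) -> None:
    """Parse flags, await the updater, and exit 0 on success or 1 otherwise."""
    args = build_parser().parse_args(argv)
    try:
        ok = await run(
            tool_names=split_tools(args.tools),
            tools_dir=args.tools_dir,
            dry_run=args.dry_run,
            openrouter_only=args.openrouter_only,
            paid_key=args.paid_key,
            concurrency=args.concurrency,
        )
    except Exception:
        logging.exception("Update failed")  # logs the traceback
        ok = False
    sys.exit(0 if ok else 1)
```

In the module itself this coroutine would be driven by asyncio.run() under the if __name__ == "__main__" guard, as described above.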