classifiers.update_skill_embeddings module

Compute skill embeddings from the SQLite index and store them in Redis.

async classifiers.update_skill_embeddings.prune_skill_embedding_orphans(*, db_path)[source]

Delete Redis skill embeddings whose ids no longer exist in the SQLite catalog.

Loads the authoritative skill rows from the SQLite index via classifiers.skill_catalog.load_all_skills(), reads the ids currently in Redis via _get_existing_ids(), and removes every id present in Redis but absent from SQLite. For each orphan it HDELs the field from both SKILL_EMBEDDINGS_HASH_KEY and SKILL_METADATA_HASH_KEY and drops the per-skill RediSearch document through classifiers.redis_vector_index.delete_skill_embedding_hash(). This keeps the vector index from routing to skills that have been deleted or renamed.
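The orphan rule described above amounts to a set difference between the ids in Redis and the ids in SQLite. The following is a minimal sketch of that rule only, assuming plain dicts as stand-ins for the two Redis hashes; find_orphans and prune_orphans are hypothetical names, not part of this module, and the RediSearch document deletion is left out.

```python
def find_orphans(redis_ids, sqlite_ids):
    """Return ids present in Redis but absent from SQLite, sorted for stable output."""
    return sorted(set(redis_ids) - set(sqlite_ids))


def prune_orphans(embeddings, metadata, sqlite_ids):
    """Remove orphaned fields from both hash stand-ins and return the count removed."""
    orphans = find_orphans(embeddings.keys(), sqlite_ids)
    for skill_id in orphans:
        embeddings.pop(skill_id, None)  # stands in for HDEL on SKILL_EMBEDDINGS_HASH_KEY
        metadata.pop(skill_id, None)    # stands in for HDEL on SKILL_METADATA_HASH_KEY
    return len(orphans)


# Illustrative data: "b" exists in the Redis stand-ins but not in the SQLite id set.
embeddings = {"a": "[0.1]", "b": "[0.2]", "c": "[0.3]"}
metadata = {"a": "{}", "b": "{}", "c": "{}"}
removed = prune_orphans(embeddings, metadata, sqlite_ids={"a", "c"})
print(removed, sorted(embeddings))
```

Ids that exist only in SQLite are untouched by this step; per the description of update_skill_embeddings(), those are the ones picked up by the incremental update.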

Opens its own async Redis connection (configured from config.Config / REDIS_URL) and always closes it. It reads the SQLite file and modifies Redis but makes no embedding calls. Called by the nested _run coroutine in main() when --prune-orphans is passed; no other internal callers were found.

Parameters:

db_path (str) – Path to the skills index SQLite database to treat as the source of valid skill ids.

Returns:

Number of orphaned skill ids removed from Redis (0 if none).

Return type:

int

async classifiers.update_skill_embeddings.update_skill_embeddings(*, db_path, force_all=False)[source]

Embed skills from the SQLite catalog into Redis, incrementally by skill id.

The core routine of this module. It loads every skill row via classifiers.skill_catalog.load_all_skills(), reads the already-embedded ids via _get_existing_ids(), and (unless force_all) limits work to the skills missing from Redis. For each pending skill it builds the tier-1 embedding text with classifiers.skill_catalog.skill_embedding_text(), batches those texts through an OpenRouterEmbeddings client (cfg.embedding_batch_size per request), L2-normalizes each vector, and stores them: the JSON vector under SKILL_EMBEDDINGS_HASH_KEY, a metadata blob (id, name, description, paths) under SKILL_METADATA_HASH_KEY, and a per-skill RediSearch document via classifiers.redis_vector_index.store_skill_embedding_hash().
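The selection, batching, and normalization steps above can be sketched in isolation. This is an illustration under stated assumptions, not the module's code: select_pending, batched, and l2_normalize are hypothetical helper names, the batch size of 2 stands in for cfg.embedding_batch_size, and returning an all-zero vector unchanged is an assumption the source does not address.

```python
import math


def select_pending(all_ids, existing_ids, force_all=False):
    """Return ids to embed: everything when force_all, else only ids missing from the store."""
    if force_all:
        return list(all_ids)
    existing = set(existing_ids)
    return [skill_id for skill_id in all_ids if skill_id not in existing]


def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length; an all-zero vector is returned unchanged (assumption)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


# Illustrative data: "b" is already embedded, so only four ids remain.
pending = select_pending(["a", "b", "c", "d", "e"], existing_ids={"b"})
batches = list(batched(pending, batch_size=2))
unit = l2_normalize([3.0, 4.0])
```

Vectors normalized to unit length make the dot product equal to cosine similarity, which is a common reason to normalize before storing embeddings in a vector index.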

Opens its own async Redis connection (configured from config.Config / REDIS_URL) and always closes it, reads the SQLite file, and makes OpenRouter embedding HTTP calls. Called by the nested _run coroutine in main() and by scripts/skills_corpus_pipeline.py; no other internal callers were found.

Parameters:
  • db_path (str) – Path to the skills index SQLite database to read skills from.

  • force_all (bool) – Recompute embeddings for every skill rather than only those missing from Redis.

Returns:

True on completion, including the no-op cases where the database is empty or all skills already have embeddings.

Return type:

bool

classifiers.update_skill_embeddings.main()[source]

Parse CLI flags and run the skill embedding update synchronously.

Builds an argparse.ArgumentParser exposing --db (SQLite path, defaulting to Config.skills_index_db), --force-all (recompute every skill rather than only missing ids), and --prune-orphans (drop Redis entries whose ids are absent from SQLite before the update). The resolved database path is threaded into a nested _run coroutine that optionally calls prune_skill_embedding_orphans() and then update_skill_embeddings(); _run is driven to completion with asyncio.run. Both helpers open their own Redis connection. update_skill_embeddings() writes the SKILL_EMBEDDINGS_HASH_KEY / SKILL_METADATA_HASH_KEY hashes plus the per-skill skill_emb:* RediSearch documents and calls OpenRouter for embeddings; prune_skill_embedding_orphans() only deletes entries and makes no embedding calls.

This is the module entry point invoked under if __name__ == "__main__" via raise SystemExit(main()) (e.g. python -m classifiers.update_skill_embeddings); no other internal callers were found.
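The control flow described for main() can be approximated as follows. This is a sketch under assumptions: prune_stub and update_stub are hypothetical stand-ins for the two coroutines documented above, the default database path is a placeholder for Config.skills_index_db, and treating --force-all and --prune-orphans as store_true flags is inferred from their descriptions rather than confirmed by the source.

```python
import argparse
import asyncio


async def prune_stub(*, db_path):
    """Hypothetical stand-in for prune_skill_embedding_orphans(); removes nothing."""
    return 0


async def update_stub(*, db_path, force_all=False):
    """Hypothetical stand-in for update_skill_embeddings(); always reports success."""
    return True


def build_parser(default_db="skills_index.db"):
    # Placeholder default; the documented default comes from Config.skills_index_db.
    parser = argparse.ArgumentParser(description="Update skill embeddings (sketch)")
    parser.add_argument("--db", default=default_db)
    parser.add_argument("--force-all", action="store_true")
    parser.add_argument("--prune-orphans", action="store_true")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)

    async def _run():
        # Pruning runs first, and only when requested; the update always runs.
        if args.prune_orphans:
            await prune_stub(db_path=args.db)
        return await update_stub(db_path=args.db, force_all=args.force_all)

    ok = asyncio.run(_run())
    return 0 if ok else 1  # map the boolean result to a process exit code
```

In a script, this would typically be wired up with raise SystemExit(main()) under an if __name__ == "__main__" guard, matching the entry point described above.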

Returns:

0 when the update reports success, 1 otherwise; suitable as a process exit code.

Return type:

int