classifiers.update_skill_embeddings module
Compute skill embeddings from SQLite index and store in Redis.
- async classifiers.update_skill_embeddings.prune_skill_embedding_orphans(*, db_path)
Delete Redis skill embeddings whose ids no longer exist in the SQLite catalog.
Loads the authoritative skill rows from the SQLite index via
classifiers.skill_catalog.load_all_skills(), reads the ids currently in Redis via _get_existing_ids(), and removes every id present in Redis but absent from SQLite. For each orphan it HDELs the field from both SKILL_EMBEDDINGS_HASH_KEY and SKILL_METADATA_HASH_KEY and drops the per-skill RediSearch document through classifiers.redis_vector_index.delete_skill_embedding_hash(). This keeps the vector index from routing to skills that have been deleted or renamed.
Opens its own async Redis connection (configured from config.Config / REDIS_URL) and always closes it; it touches Redis and reads the SQLite file but makes no embedding calls. Called by the nested _run coroutine in main() when --prune-orphans is passed; no other internal callers were found.
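The orphan selection described above is a set difference between the ids in Redis and the ids in SQLite. A minimal sketch, assuming plain dicts stand in for the two Redis hashes and a set stands in for the SQLite ids; find_orphans and prune are hypothetical names rather than the module's functions, and the real routine additionally deletes the per-skill RediSearch document:

```python
def find_orphans(redis_ids, sqlite_ids):
    """Return ids present in Redis but absent from SQLite, sorted for stable output."""
    return sorted(set(redis_ids) - set(sqlite_ids))


def prune(embeddings, metadata, sqlite_ids):
    """Remove orphaned fields from both in-memory stand-ins for the Redis hashes."""
    orphans = find_orphans(embeddings.keys(), sqlite_ids)
    for skill_id in orphans:
        # Mirrors an HDEL on each hash; pop with a default tolerates a missing field.
        embeddings.pop(skill_id, None)
        metadata.pop(skill_id, None)
    return orphans


# Hypothetical data: skill "b" was deleted from the SQLite catalog.
embeddings = {"a": "[0.1]", "b": "[0.2]", "c": "[0.3]"}
metadata = {"a": "{}", "b": "{}", "c": "{}"}
removed = prune(embeddings, metadata, sqlite_ids={"a", "c"})
```

Computing the orphan list before mutating either mapping keeps the deletion loop independent of the containers being modified.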
- async classifiers.update_skill_embeddings.update_skill_embeddings(*, db_path, force_all=False)
Embed skills from the SQLite catalog into Redis, incrementally by skill id.
The core routine of this module. It loads every skill row via
classifiers.skill_catalog.load_all_skills(), reads the already-embedded ids via _get_existing_ids(), and (unless force_all is set) limits work to the skills missing from Redis. For each pending skill it builds the tier-1 embedding text with classifiers.skill_catalog.skill_embedding_text(), batches those texts through an OpenRouterEmbeddings client (cfg.embedding_batch_size texts per request), L2-normalizes each vector, and stores the results: the JSON vector under SKILL_EMBEDDINGS_HASH_KEY, a metadata blob (id, name, description, paths) under SKILL_METADATA_HASH_KEY, and a per-skill RediSearch document via classifiers.redis_vector_index.store_skill_embedding_hash().
Opens its own async Redis connection (configured from config.Config / REDIS_URL) and always closes it, reads the SQLite file, and makes OpenRouter embedding HTTP calls. Called by the nested _run coroutine in main() and by scripts/skills_corpus_pipeline.py; no other internal callers were found.
- Returns:
True on completion, including the no-op cases where the database is empty or all skills already have embeddings.
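The three pure-computation steps in this routine (selecting pending ids, batching, and L2 normalization) can be sketched without Redis or OpenRouter. This is an illustration under assumptions, not the module's code: pending_skills, batched, and l2_normalize are hypothetical helper names, and returning a zero vector unchanged is a choice made here that the source does not describe.

```python
import math


def pending_skills(all_ids, existing_ids, force_all=False):
    """Select ids to embed: everything when force_all, else only ids missing from Redis."""
    if force_all:
        return list(all_ids)
    existing = set(existing_ids)
    return [sid for sid in all_ids if sid not in existing]


def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length; a zero vector is returned unchanged (assumption)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


# Hypothetical ids: only "b" already has an embedding in Redis.
todo = pending_skills(["a", "b", "c", "d", "e"], existing_ids=["b"])
batches = list(batched(todo, batch_size=2))
unit = l2_normalize([3.0, 4.0])
```

With unit-length vectors, cosine similarity reduces to a dot product, which is a common reason to normalize before storing embeddings in a vector index.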
- classifiers.update_skill_embeddings.main()
Parse CLI flags and run the skill embedding update synchronously.
Builds an
argparse.ArgumentParser exposing --db (SQLite path, defaulting to Config.skills_index_db), --force-all (recompute every skill rather than only missing ids), and --prune-orphans (drop Redis keys absent from SQLite before the update). The resolved database path is threaded into a nested _run coroutine that optionally calls prune_skill_embedding_orphans() and then update_skill_embeddings(); _run is driven to completion with asyncio.run. Both helpers open their own Redis connection and modify the SKILL_EMBEDDINGS_HASH_KEY / SKILL_METADATA_HASH_KEY hashes plus the per-skill skill_emb:* RediSearch documents; only update_skill_embeddings() calls OpenRouter for embeddings.
This is the module entry point, invoked under
if __name__ == "__main__" via raise SystemExit(main()) (e.g. python -m classifiers.update_skill_embeddings); no other internal callers were found.
- Returns:
0 when the update reports success, 1 otherwise, suitable as a process exit code.
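The control flow of this entry point can be sketched with stub coroutines in place of the real helpers. Everything below is illustrative: prune_stub, update_stub, the calls list, the argv parameter, and the "skills.db" default are hypothetical (the real default is described as Config.skills_index_db), and the stubs touch neither Redis nor OpenRouter.

```python
import argparse
import asyncio

calls = []  # Records stub invocations so the ordering can be inspected.


async def prune_stub(*, db_path):
    """Hypothetical stand-in for prune_skill_embedding_orphans()."""
    calls.append(("prune", db_path))


async def update_stub(*, db_path, force_all=False):
    """Hypothetical stand-in for update_skill_embeddings(); always reports success."""
    calls.append(("update", db_path, force_all))
    return True


def main(argv=None, default_db="skills.db"):
    parser = argparse.ArgumentParser(description="Update skill embeddings")
    parser.add_argument("--db", default=default_db, help="SQLite index path")
    parser.add_argument("--force-all", action="store_true")
    parser.add_argument("--prune-orphans", action="store_true")
    args = parser.parse_args(argv)

    async def _run():
        # Pruning, when requested, runs before the update.
        if args.prune_orphans:
            await prune_stub(db_path=args.db)
        return await update_stub(db_path=args.db, force_all=args.force_all)

    # Map the boolean result onto a process exit code.
    return 0 if asyncio.run(_run()) else 1


exit_code = main(["--db", "x.db", "--prune-orphans"])
```

In a script, the same function would typically be wired up as raise SystemExit(main()) under the `if __name__ == "__main__"` guard, as the entry above describes.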