classifiers.skill_catalog module
SQLite-backed index for Agent Skills (SKILL.md discovery and metadata).
- classifiers.skill_catalog.stable_skill_id(skill_root, corpus_root)
Derive a deterministic skill ID from its path relative to the corpus.
Produces a stable, content-independent identifier for a skill so the same skill keeps the same primary key across re-ingests (it depends only on location, not on the skill body). The relative POSIX path under corpus_root is hashed with SHA-256 and truncated to 32 hex characters; if the skill root is not actually under corpus_root, the directory name is used as the fallback basis instead.
A pure path/hash helper with no I/O. Called by classifiers.ingest_skills (to key each upserted row) and exercised by tests/test_skill_catalog.py.
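The ID scheme described above can be sketched as follows. This is a hypothetical re-implementation (stable_skill_id_sketch is not the module's function); resolving both paths before computing the relative path and encoding the basis as UTF-8 are assumptions.

```python
import hashlib
from pathlib import Path


def stable_skill_id_sketch(skill_root: Path, corpus_root: Path) -> str:
    """Hypothetical sketch of the documented ID scheme, not the real code."""
    try:
        # Relative POSIX path under the corpus, e.g. "repos/acme/skills/pdf".
        basis = skill_root.resolve().relative_to(corpus_root.resolve()).as_posix()
    except ValueError:
        # skill_root is not under corpus_root: fall back to the directory name.
        basis = skill_root.name
    # SHA-256 of the basis, truncated to 32 hex characters.
    return hashlib.sha256(basis.encode("utf-8")).hexdigest()[:32]
```

Because only the location feeds the hash, editing a SKILL.md body leaves its ID unchanged, while moving the directory produces a new ID.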
- classifiers.skill_catalog.canonical_skill_sort_key(skill_dir, corpus_root)
Rank a skill directory so canonical sources win when deduping.
Returns a sort key that establishes source precedence among otherwise identical skills: git-cloned corpora under repos sort first (bucket 0), npx-installed copies next (bucket 1), everything else after (bucket 2), and anything outside corpus_root last (bucket 99). When several directories share the same body_hash, sorting by this key and keeping the first ensures the git-managed copy is treated as canonical rather than a transient npx install.
A pure path-classification helper with no I/O. Called by classifiers.ingest_skills as the key when sorting discovered skill directories before deduplication. No other in-repo callers were found.
- Parameters:
  - skill_dir: the skill directory to rank.
  - corpus_root: the corpus root that skill_dir is classified against.
- Return type:
  tuple[int, str]
- Returns:
  A (bucket, relative_posix_path) tuple; lower buckets and lexicographically smaller paths sort first.
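The bucket ordering and the "sort, then keep the first per body_hash" deduplication can be sketched as follows. This is hypothetical. The documentation does not say how npx-installed copies are recognized, so the NPX_MARKER_DIR = ".agents" rule is an assumption. Returning the absolute path as the tie-breaker for out-of-corpus directories is also an assumption.

```python
from pathlib import PurePosixPath

# Assumed layout markers; the real module's classification rules may differ.
GIT_CORPUS_DIR = "repos"    # git-cloned corpora live under <corpus>/repos/...
NPX_MARKER_DIR = ".agents"  # hypothetical marker for npx-installed copies


def canonical_skill_sort_key_sketch(skill_dir: str, corpus_root: str) -> tuple[int, str]:
    try:
        rel = PurePosixPath(skill_dir).relative_to(corpus_root)
    except ValueError:
        # Outside the corpus: always sorts last.
        return (99, PurePosixPath(skill_dir).as_posix())
    parts = rel.parts
    if parts and parts[0] == GIT_CORPUS_DIR:
        bucket = 0
    elif NPX_MARKER_DIR in parts:
        bucket = 1
    else:
        bucket = 2
    return (bucket, rel.as_posix())


def dedupe_by_body_hash(dirs_with_hash, corpus_root):
    """Keep the first (most canonical) directory for each body_hash."""
    ordered = sorted(
        dirs_with_hash,
        key=lambda pair: canonical_skill_sort_key_sketch(pair[0], corpus_root),
    )
    kept = {}
    for skill_dir, body_hash in ordered:
        kept.setdefault(body_hash, skill_dir)  # first occurrence wins
    return kept
```

Working on PurePosixPath strings keeps the helper free of filesystem I/O, matching the "pure path-classification" description.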
- classifiers.skill_catalog.discover_skill_dirs(root, *, max_depth=8)
Find every directory under root that directly holds a SKILL.md.
Entry point for skill discovery: it resolves root and runs a depth-bounded recursive walk (see the inner walk closure) that collects each directory containing a SKILL.md and prunes that branch, since skills are not nested. Directories named in _SKIP_DIR_NAMES and hidden dot-directories (except the allow-listed .agents) are ignored so VCS, cache, and dependency trees are not scanned.
Touches the filesystem (directory iteration only); unreadable directories are silently skipped. Called by classifiers.ingest_skills to enumerate the corpus and exercised by tests/test_skill_catalog.py.
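The walk described above can be sketched as a small recursive closure. This is hypothetical. The contents of SKIP_DIR_NAMES stand in for _SKIP_DIR_NAMES, whose real members are not listed here. Treating root as depth 0 and stopping descent at max_depth is also an assumption about the exact depth semantics.

```python
from pathlib import Path

# Assumed contents; the real _SKIP_DIR_NAMES set is not shown in this reference.
SKIP_DIR_NAMES = {".git", "node_modules", "__pycache__", ".venv"}
ALLOWED_HIDDEN = {".agents"}


def discover_skill_dirs_sketch(root, *, max_depth: int = 8) -> list[Path]:
    found: list[Path] = []

    def walk(d: Path, depth: int) -> None:
        if (d / "SKILL.md").is_file():
            found.append(d)  # a skill root; skills are not nested, so prune here
            return
        if depth >= max_depth:
            return
        try:
            children = sorted(p for p in d.iterdir() if p.is_dir())
        except OSError:
            return  # unreadable directory: skip silently
        for child in children:
            name = child.name
            if name in SKIP_DIR_NAMES:
                continue
            if name.startswith(".") and name not in ALLOWED_HIDDEN:
                continue  # hidden dirs are skipped unless allow-listed
            walk(child, depth + 1)

    walk(Path(root).resolve(), 0)
    return found
```

Pruning at the first SKILL.md keeps the walk cheap: a skill's own subdirectories (scripts, assets) are never scanned.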
- classifiers.skill_catalog.init_db(db_path)
Create the SQLite skills table (and its parent dir) if absent.
Idempotent schema bootstrap for the skill catalog database. It ensures the parent directory exists, opens db_path, and runs a CREATE TABLE IF NOT EXISTS skills so subsequent upsert_skill() and load_* calls have a table to work against. Safe to call repeatedly.
Touches the filesystem and the SQLite database (creates directories, connects, commits DDL, then closes the connection); no other I/O. Called by classifiers.ingest_skills before populating the catalog and by tests/test_skill_catalog.py.
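A minimal sketch of the idempotent bootstrap, using only the standard sqlite3 module. The column list is hypothetical. This reference mentions skill_id, skill_md_path, body_hash, and ingested_at, but it does not give the full schema or the column types.

```python
import sqlite3
from pathlib import Path

# Hypothetical column set; the real schema is not listed in this reference.
SCHEMA = """
CREATE TABLE IF NOT EXISTS skills (
    skill_id      TEXT PRIMARY KEY,
    name          TEXT NOT NULL,
    description   TEXT,
    skill_md_path TEXT NOT NULL,
    body_hash     TEXT,
    ingested_at   REAL
)
"""


def init_db_sketch(db_path) -> None:
    db_path = Path(db_path)
    db_path.parent.mkdir(parents=True, exist_ok=True)  # create parent dir if absent
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(SCHEMA)  # IF NOT EXISTS makes repeated calls a no-op
        conn.commit()
    finally:
        conn.close()
```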
- classifiers.skill_catalog.upsert_skill(db_path, row)
Insert or replace a single skill row in the catalog database.
Upserts one skill keyed by skill_id using INSERT OR REPLACE, so a re-ingested skill overwrites its prior row rather than duplicating it. The ingested_at timestamp defaults to time.time() when the caller does not supply one, recording when the row was last written.
Touches the SQLite database (connects, executes the upsert, commits, and closes); assumes the skills table already exists via init_db(). Called by classifiers.ingest_skills for each discovered skill. No other in-repo callers were found.
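The upsert can be sketched with a parameterized INSERT OR REPLACE. This is hypothetical. The COLUMNS tuple assumes a schema like the one sketched for init_db, and treating an explicit None the same as a missing ingested_at is an assumption.

```python
import sqlite3
import time

# Hypothetical columns, assumed to match the skills table created by init_db().
COLUMNS = ("skill_id", "name", "description", "skill_md_path", "body_hash", "ingested_at")


def upsert_skill_sketch(db_path, row: dict) -> None:
    row = dict(row)  # don't mutate the caller's dict
    if row.get("ingested_at") is None:
        row["ingested_at"] = time.time()  # default: time of this write
    values = [row.get(c) for c in COLUMNS]
    placeholders = ", ".join("?" for _ in COLUMNS)
    sql = f"INSERT OR REPLACE INTO skills ({', '.join(COLUMNS)}) VALUES ({placeholders})"
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(sql, values)  # replaces any existing row with the same skill_id
        conn.commit()
    finally:
        conn.close()
```

In SQLite, INSERT OR REPLACE deletes the conflicting row and inserts a new one. Any column the caller omits therefore ends up NULL; it does not keep its previous value.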
- classifiers.skill_catalog.load_skill_by_id(db_path, skill_id)
Load a single skill’s catalog row by its ID.
Point lookup against the skills table used to resolve a skill's metadata (including its skill_md_path on disk) from the stable ID. A missing database file and a missing row both yield None rather than raising, so callers can treat "unknown skill" uniformly.
Touches the SQLite database (opens it, runs a single read-only SELECT, then closes). Called by the activate_skill tool to fetch the row before reading the skill body, and exercised by tests/test_skill_catalog.py.
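A sketch of the "missing file or missing row gives None" behavior. This is hypothetical, and returning a plain dict is an assumption because the real return type is not documented. The file-existence check comes first because sqlite3.connect would otherwise silently create an empty database file.

```python
import os
import sqlite3
from typing import Optional


def load_skill_by_id_sketch(db_path, skill_id: str) -> Optional[dict]:
    if not os.path.exists(db_path):
        return None  # no database yet: unknown skill (and no empty file created)
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # rows addressable by column name
    try:
        row = conn.execute(
            "SELECT * FROM skills WHERE skill_id = ?", (skill_id,)
        ).fetchone()
    finally:
        conn.close()
    return dict(row) if row is not None else None  # missing row: also None
```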
- classifiers.skill_catalog.load_all_skills(db_path)
Return every skill’s metadata row from the catalog database.
Full-table scan of skills used wherever the whole catalog is needed: counting ingested skills, building embeddings over all skills, and end-to-end verification. A missing database file yields an empty list rather than raising.
Touches the SQLite database (opens, runs a SELECT of all rows, closes). Called by classifiers.ingest_skills and classifiers.update_skill_embeddings (to embed skills), by scripts/skills_corpus_pipeline.py and scripts/verify_npx_skills_e2e.py, and by tests/test_skill_catalog.py.
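The full scan follows the same pattern as the point lookup: check for the file, select, close. This is a hypothetical sketch. Returning a list of dicts is an assumption, and no row order is guaranteed because the documented behavior specifies none.

```python
import os
import sqlite3


def load_all_skills_sketch(db_path) -> list[dict]:
    if not os.path.exists(db_path):
        return []  # missing database: treat as an empty catalog
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    try:
        rows = conn.execute("SELECT * FROM skills").fetchall()
    finally:
        conn.close()
    return [dict(r) for r in rows]
```

A caller counting ingested skills would use len(load_all_skills_sketch(db_path)), which is 0 before the first ingest.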
- classifiers.skill_catalog.read_skill_body(skill_md_path)
Read a SKILL.md and split its markdown body from the raw text.
Loads the file and strips the leading --- fenced YAML frontmatter, returning both the body alone (for presenting/activating the skill) and the untouched full text (for callers that still need the frontmatter). When no frontmatter is present, the whole file is treated as the body.
Touches the filesystem (reads skill_md_path); no other I/O. Called by the activate_skill tool when surfacing a skill's instructions. No other in-repo callers were found.
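A sketch of the frontmatter split. This is hypothetical. The (body, raw) return order, UTF-8 decoding, and treating an unterminated opening fence as "no frontmatter" are all assumptions not confirmed by this reference.

```python
from pathlib import Path


def split_frontmatter(text: str) -> str:
    """Return the markdown body, dropping a leading '---' fenced YAML block."""
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return text  # no frontmatter: the whole file is the body
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":  # closing fence
            return "".join(lines[i + 1:])
    return text  # unterminated fence: keep everything (assumption)


def read_skill_body_sketch(skill_md_path) -> tuple[str, str]:
    raw = Path(skill_md_path).read_text(encoding="utf-8")
    return split_frontmatter(raw), raw  # (body, untouched full text)
```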
- classifiers.skill_catalog.skill_embedding_text(name, description)
Build the text representation of a skill for semantic embedding.
Joins a skill’s name and description into the single string that gets embedded for vector search, so a query can be matched against skills by meaning. Keeping this in one helper ensures ingestion and lookup embed skills identically (tier-1 style: name then description, newline-separated).
A pure string helper with no I/O. Called by classifiers.update_skill_embeddings when computing the embedding for each skill row. No other in-repo callers were found.
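The tier-1 format is simple enough to show directly. The following is a hypothetical sketch, and the whitespace stripping is an assumption. Routing every caller through one helper like this means a stored vector and a freshly computed one are built from byte-identical text.

```python
def skill_embedding_text_sketch(name: str, description: str) -> str:
    """Tier-1 embedding text: name, then description, newline-separated."""
    # Stripping surrounding whitespace is an assumption of this sketch.
    return f"{name.strip()}\n{description.strip()}"
```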