classifiers.ingest_skills module
Scan corpus roots for SKILL.md files and populate the SQLite skills index.
CLI and helper for the skills-corpus ingest stage: it walks one or more corpus
roots, discovers every SKILL.md under them, and upserts the parsed metadata
into the on-disk SQLite skills index that the runtime skill catalog reads from.
Optional body-hash dedupe keeps copies of the same skill from being indexed more
than once.
Heavy lifting lives in classifiers.skill_catalog (discovery, parsing,
stable id generation, and the database upserts); this module wraps it with the
ingest_roots() driver and a main() argument-parsing entry point. The
module is runnable as python -m classifiers.ingest_skills and is also invoked
as a subprocess by the skills-corpus reconcile and pipeline scripts.
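As a rough illustration of the subprocess usage mentioned above, the sketch below builds the command line for python -m classifiers.ingest_skills from Python. The helper names build_ingest_command and run_ingest are hypothetical, and it assumes that --roots accepts several paths in a single flag; only the flag names and the module path come from this page.

```python
import subprocess
import sys


def build_ingest_command(roots, db_path, dedupe=True):
    """Return the argv list for running the ingest module as a subprocess."""
    cmd = [sys.executable, "-m", "classifiers.ingest_skills"]
    if roots:
        # Assumption: --roots takes one or more paths after a single flag.
        cmd += ["--roots", *[str(r) for r in roots]]
    cmd += ["--db", str(db_path)]
    if not dedupe:
        cmd.append("--no-dedupe-body-hash")
    return cmd


def run_ingest(roots, db_path, dedupe=True):
    """Run the ingest CLI and return its exit code (hypothetical wrapper)."""
    return subprocess.run(build_ingest_command(roots, db_path, dedupe)).returncode
```

Keeping command construction separate from execution makes the argument list easy to inspect or log before the process is spawned.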
- classifiers.ingest_skills.ingest_roots(corpus_roots, db_path, *, dedupe_by_body_hash=True)
Scan the given corpus roots and upsert every discovered skill into SQLite.
The core ingest driver: for each resolved corpus root it discovers skill directories, parses each
SKILL.md, optionally drops body-hash duplicates, assigns a stable skill id, and writes one row per surviving skill into the SQLite index. This is what keeps the searchable skill catalog in sync with the on-disk corpus.
It ensures the schema exists via classifiers.skill_catalog.init_db(), enumerates directories with classifiers.skill_catalog.discover_skill_dirs() sorted by classifiers.skill_catalog.canonical_skill_sort_key(), parses files with classifiers.skill_catalog._parse_skill_md, derives ids via classifiers.skill_catalog.stable_skill_id(), and persists rows through classifiers.skill_catalog.upsert_skill(). Its side effects are the SQLite writes at db_path and INFO/WARNING logging; missing roots and unparseable or duplicate files are counted as skips, not errors. Called by main(), by scripts/skills_corpus_pipeline.py, and by tests/test_skill_catalog.py; no other callers were found.
- Parameters:
- Returns:
(inserted_or_updated, skipped) counts across all roots.
- Return type:
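The discover, dedupe, and upsert flow described for ingest_roots() can be sketched with the standard library alone. This is a simplified stand-in, not the real implementation: the table layout, the SHA-256 body hash, the path-derived id, and the name sketch_ingest are all assumptions made for illustration, and the real work is done by the classifiers.skill_catalog helpers listed above. It also assumes a missing root adds one to the skip count.

```python
import hashlib
import sqlite3
from pathlib import Path


def sketch_ingest(corpus_roots, db_path, *, dedupe_by_body_hash=True):
    """Illustrative stand-in for ingest_roots(); returns (upserted, skipped)."""
    conn = sqlite3.connect(db_path)
    # Hypothetical schema; the real one is created by skill_catalog.init_db().
    conn.execute(
        "CREATE TABLE IF NOT EXISTS skills "
        "(id TEXT PRIMARY KEY, path TEXT, body_hash TEXT)"
    )
    seen_hashes = set()
    upserted = skipped = 0
    for root in (Path(r) for r in corpus_roots):
        if not root.is_dir():
            skipped += 1  # assumption: a missing root counts as one skip
            continue
        for skill_md in sorted(root.rglob("SKILL.md")):
            body_hash = hashlib.sha256(skill_md.read_bytes()).hexdigest()
            if dedupe_by_body_hash and body_hash in seen_hashes:
                skipped += 1  # identical body already indexed in this run
                continue
            seen_hashes.add(body_hash)
            # Hypothetical stable id: hash of the skill dir relative to its root.
            rel = skill_md.parent.relative_to(root).as_posix()
            skill_id = hashlib.sha256(rel.encode("utf-8")).hexdigest()[:16]
            # INSERT OR REPLACE gives upsert-like behavior keyed on the id.
            conn.execute(
                "INSERT OR REPLACE INTO skills (id, path, body_hash) "
                "VALUES (?, ?, ?)",
                (skill_id, str(skill_md), body_hash),
            )
            upserted += 1
    conn.commit()
    conn.close()
    return upserted, skipped
```

Because the id in this sketch is derived from the relative path rather than from insertion order, re-running it over an unchanged corpus rewrites the same rows instead of adding new ones.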
- classifiers.ingest_skills.main()
Run the SKILL.md ingestion as a standalone CLI entry point.
Parses command-line arguments (--roots, --db, --no-dedupe-body-hash), resolves the corpus roots to scan, ingests every discovered SKILL.md into the SQLite skills index, and logs a summary of how many rows were upserted, skipped, and now present.
When no --roots are supplied it falls back to the configured corpus roots by importing config.Config and reading Config.load().skills_corpus_roots; if that yields nothing it logs an error and aborts. The actual scan and database writes are delegated to ingest_roots(), and the final total is computed via classifiers.skill_catalog.load_all_skills.
Called by the module’s __main__ guard (raise SystemExit(main())), so it is the process entry point when run as python -m classifiers.ingest_skills, for example the subprocess spawned by scripts/reconcile_skills_sqlite.py. (No in-process Python callers invoke main directly; the pipeline in scripts/skills_corpus_pipeline.py calls ingest_roots() instead.)
- Returns:
0 on successful ingestion, or 1 when no corpus roots could be resolved from arguments or config.
- Return type:
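The argument handling and root fallback described for main() might look roughly like the sketch below. The flag names and the 0/1 return codes come from this page; everything else is assumed for illustration, including the nargs setting, the default database path, and the stand-ins load_configured_roots (in place of Config.load().skills_corpus_roots) and the injectable ingest callable (in place of ingest_roots()).

```python
import argparse
import logging

log = logging.getLogger("ingest_skills_sketch")


def load_configured_roots():
    """Hypothetical stand-in for Config.load().skills_corpus_roots."""
    return []


def sketch_main(argv=None, ingest=None):
    parser = argparse.ArgumentParser(description="Ingest SKILL.md files into SQLite.")
    # Assumption: --roots takes zero or more paths; the real nargs may differ.
    parser.add_argument("--roots", nargs="*", default=[])
    parser.add_argument("--db", default="skills.db")  # hypothetical default path
    parser.add_argument("--no-dedupe-body-hash", action="store_true")
    args = parser.parse_args(argv)

    # Fall back to configured roots when none were passed on the command line.
    roots = args.roots or load_configured_roots()
    if not roots:
        log.error("no corpus roots given and none configured")
        return 1

    if ingest is None:
        # Placeholder so the sketch runs without the real ingest_roots().
        def ingest(roots, db_path, *, dedupe_by_body_hash=True):
            return (0, 0)

    upserted, skipped = ingest(
        roots, args.db, dedupe_by_body_hash=not args.no_dedupe_body_hash
    )
    log.info("upserted=%d skipped=%d", upserted, skipped)
    return 0
```

Returning an integer instead of calling sys.exit() inside the function is what lets the raise SystemExit(main()) guard own the process exit, and it keeps the function easy to call from tests.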