tools.feature_atlas.extract_features_swarm module
Step 1b: Gemini Flash swarm for feature mapping.
Takes the repo symbol index from Step 1a and uses a Gemini Flash swarm
to map actual code evidence onto the canonical features defined in
config.yaml.
- Architecture:
MasterOrchestrator: coordinates the run
N FeatureExtractor agents: map evidence onto features (parallel)
1 Curator agent: deduplicates, validates, normalizes
Uses local OpenAI-compatible proxy at localhost:3000 (no API keys needed).
- Usage:
python -m tools.feature_atlas.extract_features_swarm
# skull fire – THE SWARM NAMES THE ORGANS
- tools.feature_atlas.extract_features_swarm.load_config()
Load the Feature Atlas configuration from config.yaml.
Reads and parses the atlas config that drives the whole swarm: the swarm block (Gemini model name, local proxy URL, temperature, concurrency, retry budget) and the canonical_features list that every extractor agent maps code evidence onto. This is a pure filesystem read of _CONFIG_PATH (tools/feature_atlas/config.yaml) parsed with yaml.safe_load; it touches no Redis, knowledge graph, or network.
Called by this module’s run_extraction() (when no config is passed) and async_main(); a same-named helper also appears in the sibling atlas scripts extract_repo_symbols.py and discover_features.py, each loading its own module copy.
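As a rough illustration of the two blocks described above, the parsed config might be consumed along these lines. This is only a sketch: the key names inside the swarm block (model, base_url, temperature, concurrency, max_retries), the default values, and the SwarmSettings and parse_swarm_settings names are assumptions made for illustration and are not taken from config.yaml. The dict literal stands in for what yaml.safe_load would return.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class SwarmSettings:
    """Hypothetical typed view of the swarm block (field names are assumed)."""
    model: str
    base_url: str
    temperature: float
    concurrency: int
    max_retries: int


def parse_swarm_settings(config: dict[str, Any]) -> SwarmSettings:
    """Pull the assumed swarm keys out of an already-parsed config dict."""
    swarm = config.get("swarm") or {}
    return SwarmSettings(
        model=swarm.get("model", "gemini-flash"),  # placeholder model name
        base_url=swarm.get("base_url", "http://localhost:3000"),
        temperature=float(swarm.get("temperature", 0.0)),
        concurrency=int(swarm.get("concurrency", 4)),
        max_retries=int(swarm.get("max_retries", 2)),
    )


# Stand-in for the dict that yaml.safe_load would produce from config.yaml.
example_config = {
    "swarm": {"model": "gemini-flash", "concurrency": 8},
    "canonical_features": [{"name": "example_feature"}],
}
settings = parse_swarm_settings(example_config)
```

Converting the block into a frozen dataclass once keeps the extractor code from scattering dictionary lookups and defaults across the module.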
- tools.feature_atlas.extract_features_swarm.load_symbols()
Load the repo symbol index produced by Step 1a.
Reads the per-file symbol records (classes, functions, constants, imports, template/CSS/HTML metadata) that extract_repo_symbols.py writes to outputs/repo_symbols.json; these records are the raw evidence the swarm later maps onto canonical features. This is a pure filesystem read of _SYMBOLS_PATH parsed with json.load, with no Redis, knowledge graph, or network access.
Called by this module’s run_extraction(); a same-named helper also exists in the sibling discover_features.py script.
- Returns:
A list of per-file symbol record dictionaries.
- Raises:
FileNotFoundError – If outputs/repo_symbols.json is missing, signalling that extract_repo_symbols.py has not been run yet.
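The read-or-fail behaviour described above can be sketched as follows. The function name load_symbols_sketch, the explicit symbols_path parameter, and the error message wording are assumptions for illustration; the module itself reads from _SYMBOLS_PATH and its exact checks may differ.

```python
import json
from pathlib import Path
from typing import Any


def load_symbols_sketch(symbols_path: Path) -> list[dict[str, Any]]:
    """Read a JSON symbol index, failing loudly if Step 1a has not run."""
    if not symbols_path.exists():
        # Point the user at the step that produces the missing file.
        raise FileNotFoundError(
            f"{symbols_path} not found; run extract_repo_symbols.py first"
        )
    with symbols_path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

Raising a FileNotFoundError with a pointer to Step 1a gives a clearer failure than letting a bare open() error surface from deep inside the swarm run.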
- async tools.feature_atlas.extract_features_swarm.run_extraction(config=None)
Run the full Step 1b feature-extraction swarm end to end.
Orchestrates every stage in order: loads config (via load_config() when none is passed) and the symbol index (load_symbols()), buckets files by directory (_group_files_by_directory()), then fans out one _extract_features_for_group() extractor task per group under a shared asyncio.Semaphore and gathers them with return_exceptions=True so a failed group is logged rather than fatal. The collected mappings are folded into a registry by _aggregate_mappings() and enriched with LLM-written blurbs by _generate_descriptions(). Side effects are confined to the filesystem reads and the Gemini Flash HTTP/LLM calls those helpers make; no Redis or knowledge-graph access.
Called by this module’s async_main(). A same-named coroutine exists in other modules (build_kg.py, memories_port/import_memories.py) but those are unrelated functions with different signatures.
- Parameters:
config (dict[str, Any] | None) – An optional pre-loaded atlas config; when None it is loaded from config.yaml via load_config().
- Returns:
The feature registry as a list of feature dictionaries, each with files, symbols, data stores, confidence, evidence, and a generated description.
- Raises:
ValueError – If the config defines no canonical_features.
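The fan-out pattern described above (group by directory, bound concurrency with a semaphore, gather with return_exceptions=True) can be sketched with the standard library alone. Everything named here is a stand-in: group_by_directory, fake_extract, and fan_out are hypothetical helpers, the "path" record key is assumed, and the extractor is a stub that makes no LLM call and fails for one directory on purpose.

```python
import asyncio
import logging
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Any

logger = logging.getLogger(__name__)
Record = dict[str, Any]


def group_by_directory(records: list[Record]) -> dict[str, list[Record]]:
    """Bucket symbol records by parent directory (the "path" key is assumed)."""
    groups: dict[str, list[Record]] = defaultdict(list)
    for record in records:
        groups[str(PurePosixPath(record["path"]).parent)].append(record)
    return dict(groups)


async def fake_extract(directory: str, group: list[Record]) -> Record:
    """Stub extractor: no LLM call, and it fails for the "broken" directory."""
    await asyncio.sleep(0)
    if directory == "broken":
        raise RuntimeError("simulated extractor failure")
    return {"directory": directory, "files": [r["path"] for r in group]}


async def fan_out(records: list[Record], concurrency: int = 2) -> list[Record]:
    """Run one extractor task per directory group, bounded by a semaphore."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(directory: str, group: list[Record]) -> Record:
        async with semaphore:
            return await fake_extract(directory, group)

    groups = group_by_directory(records)
    # gather() returns results in the same order as the awaitables passed in.
    results = await asyncio.gather(
        *(bounded(d, g) for d, g in groups.items()),
        return_exceptions=True,
    )
    mappings: list[Record] = []
    for directory, result in zip(groups, results):
        if isinstance(result, BaseException):
            # A failed group is logged and skipped instead of aborting the run.
            logger.warning("group %s failed: %s", directory, result)
            continue
        mappings.append(result)
    return mappings


records = [{"path": "app/a.py"}, {"path": "app/b.py"}, {"path": "broken/c.py"}]
mappings = asyncio.run(fan_out(records))
```

With return_exceptions=True, the simulated failure arrives as an ordinary list element, so the successful app group still reaches the aggregation stage.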
- async tools.feature_atlas.extract_features_swarm.async_main()
Async entry point: run the swarm and persist the feature registry.
Drives a single end-to-end run for command-line use. It loads the config, invokes run_extraction() to build the feature registry, writes the result as pretty-printed JSON to outputs/feature_registry.json (_OUTPUT_PATH, creating the parent directory if needed), and prints a summary banner with feature, confidence, file, symbol, and timing counts. The filesystem write and stdout output are its only side effects.
Called by this module’s main() through asyncio.run; not invoked by other modules (each atlas script defines its own async_main).
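The persistence step described above might look roughly like this. The helper names write_registry and summarize, the indent width, and the "files" key used for counting are assumptions for illustration; the actual banner contents and registry key names are not shown in this reference.

```python
import json
from pathlib import Path
from typing import Any


def write_registry(registry: list[dict[str, Any]], output_path: Path) -> None:
    """Persist the registry as indented JSON, creating the parent directory."""
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(json.dumps(registry, indent=2), encoding="utf-8")


def summarize(registry: list[dict[str, Any]]) -> dict[str, int]:
    """Count features and distinct files (the "files" key name is assumed)."""
    files = {path for feature in registry for path in feature.get("files", [])}
    return {"features": len(registry), "files": len(files)}
```

Counting distinct files through a set avoids double-counting a file that provides evidence for more than one feature.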
- tools.feature_atlas.extract_features_swarm.main()
Sync entry point: configure logging and run async_main().
The console entry point for python -m tools.feature_atlas.extract_features_swarm. It sets up basic INFO-level logging with a timestamped format, then drives the async pipeline via asyncio.run(async_main()). Side effects are limited to global logging configuration and whatever async_main() performs.
Called by this module’s if __name__ == "__main__" guard.
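A minimal sketch of this sync-wraps-async entry point pattern follows. The exact log format string is an assumption, and async_main_stub and main_sketch are hypothetical stand-ins for the module's own async_main() and main().

```python
import asyncio
import logging


async def async_main_stub() -> str:
    """Placeholder for the real async pipeline."""
    logging.getLogger(__name__).info("swarm run would start here")
    return "done"


def main_sketch() -> str:
    """Configure timestamped INFO logging, then run the coroutine to completion."""
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",  # assumed format
    )
    return asyncio.run(async_main_stub())


if __name__ == "__main__":
    main_sketch()
```

Keeping the logging setup in the sync wrapper means importing the module and awaiting its coroutine from other code leaves the caller's logging configuration untouched.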