tools.feature_atlas.extract_features_swarm module

Step 1b: Gemini Flash swarm for feature mapping.

Takes the repo symbol index from Step 1a and uses a Gemini Flash swarm to map actual code evidence onto the canonical features defined in config.yaml.

Architecture:
  • MasterOrchestrator: coordinates the run
  • N FeatureExtractor agents: map evidence onto features (parallel)
  • 1 Curator agent: deduplicates, validates, normalizes

Uses a local OpenAI-compatible proxy at localhost:3000 (no API keys needed).

Usage:

python -m tools.feature_atlas.extract_features_swarm

# skull fire – THE SWARM NAMES THE ORGANS

tools.feature_atlas.extract_features_swarm.load_config()

Load the Feature Atlas configuration from config.yaml.

Reads and parses the atlas config that drives the whole swarm: the swarm block (Gemini model name, local proxy URL, temperature, concurrency, retry budget) and the canonical_features list that every extractor agent maps code evidence onto. This is a pure filesystem read of _CONFIG_PATH (tools/feature_atlas/config.yaml) parsed with yaml.safe_load; it touches no Redis, knowledge graph, or network.

Called by this module’s run_extraction() (when no config is passed) and async_main(); a same-named helper also appears in the sibling atlas scripts extract_repo_symbols.py and discover_features.py, each loading its own module copy.

Return type:

dict[str, Any]

Returns:

The parsed config as a dictionary, typically carrying swarm and canonical_features keys.

tools.feature_atlas.extract_features_swarm.load_symbols()

Load the repo symbol index produced by Step 1a.

Reads the per-file symbol records (classes, functions, constants, imports, template/CSS/HTML metadata) that extract_repo_symbols.py writes to outputs/repo_symbols.json; these records are the raw evidence the swarm later maps onto canonical features. This is a pure filesystem read of _SYMBOLS_PATH parsed with json.load, with no Redis, knowledge graph, or network access.

Called by this module’s run_extraction(); a same-named helper also exists in the sibling discover_features.py script.

Return type:

list[dict[str, Any]]

Returns:

A list of per-file symbol record dictionaries.

Raises:

FileNotFoundError – If outputs/repo_symbols.json is missing, signalling that extract_repo_symbols.py has not been run yet.
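The read-and-fail-loudly behaviour described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the function name load_symbols_sketch and its symbols_path parameter are hypothetical (the real helper takes no arguments and reads _SYMBOLS_PATH), and the error message wording is an assumption.

```python
import json
from pathlib import Path
from typing import Any


def load_symbols_sketch(symbols_path: Path) -> list[dict[str, Any]]:
    """Read a Step 1a symbol index, failing loudly if it is missing.

    Hypothetical stand-in for load_symbols(); the path is passed in
    explicitly here so the sketch stays self-contained.
    """
    if not symbols_path.exists():
        # Point the user at the step that produces the file.
        raise FileNotFoundError(
            f"{symbols_path} not found; run extract_repo_symbols.py first"
        )
    with symbols_path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

Raising FileNotFoundError with a message that names the producing script makes the Step 1a dependency visible at the point of failure.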

async tools.feature_atlas.extract_features_swarm.run_extraction(config=None)

Run the full Step 1b feature-extraction swarm end to end.

Orchestrates every stage in order. It loads the config (via load_config() when none is passed) and the symbol index (load_symbols()), then buckets files by directory (_group_files_by_directory()). It fans out one _extract_features_for_group() extractor task per group under a shared asyncio.Semaphore and gathers the tasks with return_exceptions=True, so a failed group is logged rather than fatal. The collected mappings are folded into a registry by _aggregate_mappings() and enriched with LLM-written blurbs by _generate_descriptions(). Side effects are confined to the filesystem reads and the Gemini Flash HTTP/LLM calls those helpers make; the function touches no Redis or knowledge graph.

Called by this module’s async_main(). A same-named coroutine exists in other modules (build_kg.py, memories_port/import_memories.py) but those are unrelated functions with different signatures.

Parameters:

config (dict[str, Any] | None) – An optional pre-loaded atlas config; when None it is loaded from config.yaml via load_config().

Return type:

list[dict[str, Any]]

Returns:

The feature registry as a list of feature dictionaries, each with files, symbols, data stores, confidence, evidence, and a generated description.

Raises:

ValueError – If the config defines no canonical_features.
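The bounded fan-out described for run_extraction() follows a common asyncio pattern, sketched below under stated assumptions. The names extract_group and fan_out, the shape of the returned dictionaries, and the warning message are all hypothetical; the real extractor calls Gemini Flash through the proxy, whereas this stand-in does no I/O.

```python
import asyncio
import logging
from typing import Any

logger = logging.getLogger(__name__)


async def extract_group(name: str, files: list[str]) -> dict[str, Any]:
    """Hypothetical stand-in for one extractor call (no LLM involved)."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    if not files:
        raise RuntimeError(f"group {name!r} has no files")
    return {"group": name, "file_count": len(files)}


async def fan_out(
    groups: dict[str, list[str]], concurrency: int
) -> list[dict[str, Any]]:
    """Run one task per group, at most `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(name: str, files: list[str]) -> dict[str, Any]:
        # The shared semaphore caps how many extractors run at once.
        async with semaphore:
            return await extract_group(name, files)

    # return_exceptions=True turns failures into returned values, so one
    # bad group cannot cancel the rest of the batch.
    results = await asyncio.gather(
        *(bounded(name, files) for name, files in groups.items()),
        return_exceptions=True,
    )

    mappings: list[dict[str, Any]] = []
    # gather() preserves input order, so results line up with group names.
    for name, result in zip(groups, results):
        if isinstance(result, BaseException):
            logger.warning("group %s failed: %s", name, result)
            continue
        mappings.append(result)
    return mappings
```

For example, asyncio.run(fan_out({"a": ["a/x.py"], "b": []}, concurrency=2)) returns only the mapping for group "a" and logs a warning for group "b".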

async tools.feature_atlas.extract_features_swarm.async_main()

Async entry point: run the swarm and persist the feature registry.

Drives a single end-to-end run for command-line use. It loads the config, invokes run_extraction() to build the feature registry, writes the result as pretty-printed JSON to outputs/feature_registry.json (_OUTPUT_PATH, creating the parent directory if needed), and prints a summary banner reporting feature, confidence, file, and symbol counts along with the elapsed time. The filesystem write and stdout output are its only side effects.

Called by this module’s main() through asyncio.run; not invoked by other modules (each atlas script defines its own async_main).

Return type:

None
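The persistence step can be sketched as below. This is an illustration only: write_registry and its parameters are hypothetical names, and the indent width is an assumption, since the source says only that the JSON is pretty-printed.

```python
import json
from pathlib import Path
from typing import Any


def write_registry(registry: list[dict[str, Any]], output_path: Path) -> None:
    """Write the feature registry as pretty-printed JSON.

    Hypothetical helper; the real code writes to _OUTPUT_PATH.
    """
    # Create outputs/ (or any missing parents) before writing.
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with output_path.open("w", encoding="utf-8") as fh:
        json.dump(registry, fh, indent=2)  # indent=2 is an assumed width
```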

tools.feature_atlas.extract_features_swarm.main()

Sync entry point: configure logging and run async_main().

The console entry point for python -m tools.feature_atlas.extract_features_swarm. It sets up basic INFO-level logging with a timestamped format, then drives the async pipeline via asyncio.run(async_main()). Side effects are limited to global logging configuration and whatever async_main() performs.

Called by this module’s if __name__ == "__main__" guard.

Return type:

None
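A sync wrapper of this kind usually looks like the sketch below. The names main_sketch and async_main_sketch are hypothetical, and the exact log format string is an assumption: the source states only that the format is timestamped and the level is INFO.

```python
import asyncio
import logging


async def async_main_sketch() -> None:
    """Hypothetical placeholder for the real async pipeline."""
    logging.getLogger(__name__).info("pipeline would run here")


def main_sketch() -> None:
    """Configure logging once, then drive the coroutine to completion."""
    logging.basicConfig(
        level=logging.INFO,
        # Assumed format; %(asctime)s supplies the timestamp.
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    asyncio.run(async_main_sketch())


if __name__ == "__main__":
    main_sketch()
```

Keeping the logging setup in the sync entry point means callers that import the module and await the coroutine directly retain control of their own logging configuration.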