tools.feature_atlas.discover_features module
Discover features from an unknown codebase.
When no canonical_features are defined (or --discover-features is passed), this module uses the Gemini Flash swarm to autonomously discover and propose canonical features from the repo symbol index.
This is the “inhale protocol” – point at ANY repo and let the swarm identify organs from scratch.
- Usage:
python -m tools.feature_atlas.discover_features
# skull fire spider – THE SWARM DISCOVERS UNKNOWN ANATOMY
- tools.feature_atlas.discover_features.load_config()[source]
Load the Feature Atlas configuration from config.yaml.
Reads the module-level _CONFIG_PATH (tools/feature_atlas/config.yaml) and parses it with yaml.safe_load. The returned mapping supplies swarm concurrency and model settings consumed downstream by discover_features() and the batch discovery calls. This is a pure filesystem read with no Redis, knowledge-graph, LLM, or HTTP side effects.
Invoked by async_main() in this module; no other internal callers were found.
- Return type:
- Returns:
The parsed configuration as a dictionary.
- Raises:
FileNotFoundError – If config.yaml does not exist at the expected path.
yaml.YAMLError – If the file is present but cannot be parsed as YAML.
- tools.feature_atlas.discover_features.load_symbols()[source]
Load the repository symbol index, failing loudly if it is missing.
Reads outputs/repo_symbols.json (the module-level _SYMBOLS_PATH), the per-file symbol records emitted by the repo symbol extractor. If the file is absent, raises a descriptive FileNotFoundError pointing the operator at extract_repo_symbols.py. The records feed discover_features(), which groups them by directory before handing them to the discovery swarm. This is a pure filesystem read with no external side effects.
Invoked by discover_features() in this module; no other internal callers were found.
- Return type:
- Returns:
The list of per-file symbol records loaded from the symbol index JSON.
- Raises:
FileNotFoundError – If the symbol index file is missing, with guidance to run extract_repo_symbols.py first.
json.JSONDecodeError – If the file contents are not valid JSON.
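The fail-loudly behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the module's source: `load_symbols_sketch` and `SYMBOLS_PATH` are hypothetical names, and the wording of the error message is invented for the example.

```python
import json
from pathlib import Path

# Hypothetical stand-in for the module-level _SYMBOLS_PATH named in the docs.
SYMBOLS_PATH = Path("outputs/repo_symbols.json")


def load_symbols_sketch(path: Path = SYMBOLS_PATH) -> list:
    """Read the symbol index, raising a descriptive error if it is missing."""
    if not path.exists():
        # The message text is an assumption; the docs only say it points
        # the operator at extract_repo_symbols.py.
        raise FileNotFoundError(
            f"Symbol index not found at {path}. "
            "Run extract_repo_symbols.py first to generate it."
        )
    # json.JSONDecodeError propagates unchanged if the contents are invalid.
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

Checking for the file explicitly, rather than letting `open()` fail, is what allows the error to carry remediation guidance.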
- async tools.feature_atlas.discover_features.discover_features(config)[source]
Drive the inhale-protocol pipeline that discovers features from a repo.
The top-level discovery routine: it loads the repo symbol index via load_symbols(), groups files by top-level directory, renders each group with _build_directory_summary(), and batches roughly four directories per LLM call. It then fans those batches out concurrently through _discover_features_for_batch() (bounded by an asyncio.Semaphore sized from the swarm.max_concurrent config), gathers the raw candidates while logging any failed batch, and reconciles them through _merge_and_deduplicate(). Progress is logged at each stage. Side effects are the filesystem read of the symbol index and the outbound LLM/HTTP calls made by the swarm; this function itself does not touch Redis or the knowledge graph.
Invoked by async_main() in this module; no other internal callers were found.
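The group, batch, and fan-out steps described above might look something like the sketch below. It is a minimal illustration, not the module's implementation: the helper names (`group_by_top_level_dir`, `make_batches`, `run_batches`) are hypothetical, the `"file"` key on each symbol record is an assumption about the index format, and `discover_batch` stands in for `_discover_features_for_batch()`, whose real signature is not documented here.

```python
import asyncio
import logging
from collections import defaultdict
from pathlib import PurePosixPath

logger = logging.getLogger(__name__)


def group_by_top_level_dir(records):
    """Group symbol records by the first component of their file path."""
    groups = defaultdict(list)
    for record in records:
        # Assumption: each record stores its repo-relative path under "file".
        parts = PurePosixPath(record["file"]).parts
        top = parts[0] if len(parts) > 1 else "."  # root-level files go to "."
        groups[top].append(record)
    return dict(groups)


def make_batches(items, size=4):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


async def run_batches(batches, discover_batch, max_concurrent):
    """Run discover_batch over all batches, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(batch):
        async with semaphore:
            return await discover_batch(batch)

    # return_exceptions=True keeps one failed batch from cancelling the rest.
    results = await asyncio.gather(
        *(bounded(batch) for batch in batches), return_exceptions=True
    )
    candidates = []
    for index, result in enumerate(results):
        if isinstance(result, BaseException):
            logger.warning("Batch %d failed: %s", index, result)
            continue
        candidates.extend(result)
    return candidates
```

In this sketch the semaphore caps the number of in-flight calls, while `asyncio.gather` preserves batch order, so the collected candidates come back in a deterministic order before any merge or deduplication step.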
- async tools.feature_atlas.discover_features.async_main()[source]
Run the feature-discovery step and persist its results to disk.
Orchestrates the discovery half of the atlas pipeline: it configures INFO logging, loads the atlas config via load_config(), runs the full discover_features() pipeline, and writes the proposed features to outputs/discovered_features.json (re-encoding through UTF-8 with replacement to stay JSON-safe). It then prints a summary that includes the feature count, a per-category breakdown, timing, the output path, and a preview of the top discovered features. Side effects are the filesystem write of the output JSON plus the transitive LLM/HTTP calls and symbol-index read performed inside discover_features().
Invoked by main() in this module’s __main__ guard and imported as async_main by tools.feature_atlas.run_atlas.step_discover_features() (the discover-features step of the atlas runner); no other internal callers were found.
- Return type:
- Returns:
None.
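One plausible reading of "re-encoding through UTF-8 with replacement" is sketched below, together with a per-category count of the kind the summary reports. The function names are hypothetical, and the `"category"` key on each feature is an assumption about the output schema rather than something this page confirms.

```python
import json
from collections import Counter
from pathlib import Path


def write_features(features, output_path: Path) -> None:
    """Serialize features to JSON, replacing characters UTF-8 cannot encode."""
    text = json.dumps(features, indent=2, ensure_ascii=False)
    # Lone surrogates (e.g. from badly decoded LLM output) cannot be encoded
    # as UTF-8; errors="replace" substitutes "?" so the write cannot fail.
    safe_text = text.encode("utf-8", errors="replace").decode("utf-8")
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(safe_text, encoding="utf-8")


def category_breakdown(features) -> Counter:
    """Count features per category (the "category" key is an assumption)."""
    return Counter(f.get("category", "uncategorized") for f in features)
```

The round trip through `encode`/`decode` trades fidelity for robustness: an unencodable character becomes `?` instead of aborting the run after the LLM calls have already been paid for.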
- tools.feature_atlas.discover_features.main()[source]
Synchronous entry point for the feature-discovery step.
Configures root logging at INFO level and drives the async pipeline by calling asyncio.run(async_main()), which scans the symbol index, runs the Gemini Flash discovery swarm, and writes outputs/discovered_features.json. All Redis, LLM/HTTP proxy, and filesystem side effects happen transitively inside async_main() and discover_features(); this wrapper only sets up logging and starts the event loop.
Invoked from the module’s if __name__ == "__main__" guard via python -m tools.feature_atlas.discover_features; no other internal callers were found.
- Return type:
- Returns:
None.
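The synchronous-wrapper pattern described above can be sketched as follows. Here `async_main_stub` is a hypothetical stand-in for the real `async_main()`, and the `pipeline` parameter exists only so the sketch can be exercised without running the swarm; neither is part of the documented API.

```python
import asyncio
import logging


async def async_main_stub() -> None:
    """Hypothetical stand-in for async_main(); it only logs a message."""
    logging.getLogger(__name__).info("discovery pipeline would run here")


def main_sketch(pipeline=async_main_stub) -> None:
    """Set up INFO logging, then run the async pipeline to completion."""
    logging.basicConfig(level=logging.INFO)
    # asyncio.run creates a fresh event loop, runs the coroutine, and closes it.
    asyncio.run(pipeline())
```

Keeping the wrapper this thin means other callers, such as an atlas runner step, can await the async entry point directly inside an event loop they already own.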