tools.feature_atlas.discover_features module

Discover features from an unknown codebase.

When no canonical_features are defined (or --discover-features is passed), this module uses the Gemini Flash swarm to autonomously discover and propose canonical features from the repo symbol index.

This is the “inhale protocol” – point at ANY repo and let the swarm identify organs from scratch.

Usage:

python -m tools.feature_atlas.discover_features

# THE SWARM DISCOVERS UNKNOWN ANATOMY

tools.feature_atlas.discover_features.load_config()[source]

Load the Feature Atlas configuration from config.yaml.

Reads the module-level _CONFIG_PATH (tools/feature_atlas/config.yaml) and parses it with yaml.safe_load. The returned mapping supplies swarm concurrency and model settings consumed downstream by discover_features() and the batch discovery calls. This is a pure filesystem read with no Redis, knowledge-graph, LLM, or HTTP side effects.

Invoked by async_main() in this module; no other internal callers were found.

Return type:

dict[str, Any]

Returns:

The parsed configuration as a dictionary.

Raises:
  • FileNotFoundError – If config.yaml does not exist at the expected path.

  • yaml.YAMLError – If the file is present but cannot be parsed as YAML.

tools.feature_atlas.discover_features.load_symbols()[source]

Load the repository symbol index, failing loudly if it is missing.

Reads outputs/repo_symbols.json (the module-level _SYMBOLS_PATH), the per-file symbol records emitted by the repo symbol extractor. If the file is absent, raises a descriptive FileNotFoundError pointing the operator at extract_repo_symbols.py. The records feed discover_features(), which groups them by directory before handing them to the discovery swarm. This is a pure filesystem read with no external side effects.

Invoked by discover_features() in this module; no other internal callers were found.
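The fail-loudly read described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the function name load_symbols_sketch, the wording of the error message, and the choice to resolve the path relative to the current working directory are all assumptions.

```python
import json
from pathlib import Path
from typing import Any

# Path taken from the description above; resolving it relative to the
# current working directory is an assumption of this sketch.
SYMBOLS_PATH = Path("outputs/repo_symbols.json")


def load_symbols_sketch(path: Path = SYMBOLS_PATH) -> list[dict[str, Any]]:
    """Read the symbol index, raising a descriptive error if it is missing."""
    if not path.exists():
        # Hypothetical message text; the real wording may differ.
        raise FileNotFoundError(
            f"Symbol index not found at {path}. "
            "Run extract_repo_symbols.py first to generate it."
        )
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

Checking for the file explicitly, rather than letting open() fail, is what lets the error message name the script that produces the missing input.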

Return type:

list[dict[str, Any]]

Returns:

The list of per-file symbol records loaded from the symbol index JSON.

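Raises:

FileNotFoundError – If outputs/repo_symbols.json does not exist; the message points the operator at extract_repo_symbols.py.
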
async tools.feature_atlas.discover_features.discover_features(config)[source]

Drive the inhale-protocol pipeline that discovers features from a repo.

The top-level discovery routine: it loads the repo symbol index via load_symbols(), groups files by top-level directory, renders each group with _build_directory_summary(), and batches roughly four directories per LLM call. It then fans those batches out concurrently through _discover_features_for_batch() (bounded by an asyncio.Semaphore sized from the swarm.max_concurrent config), gathers the raw candidates while logging any failed batch, and reconciles them through _merge_and_deduplicate(). Progress is logged at each stage. Side effects are the filesystem read of the symbol index and the outbound LLM/HTTP calls made by the swarm; this function itself does not touch Redis or the knowledge graph.

Invoked by async_main() in this module; no other internal callers were found.
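The group, batch, and fan-out stages described above can be sketched as follows. This is an illustration under stated assumptions rather than the module's implementation: the "file" key on each symbol record, the helper names, the use of return_exceptions=True to collect failed batches, and fake_discover_batch (a stand-in for _discover_features_for_batch() that makes no LLM call) are all hypothetical.

```python
import asyncio
import logging
from collections import defaultdict
from typing import Any

logger = logging.getLogger(__name__)
BATCH_SIZE = 4  # "roughly four directories per LLM call"


def group_by_top_level_dir(records: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Group symbol records by the first path component of their file."""
    groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for record in records:
        path = record["file"]  # hypothetical field name
        top = path.split("/", 1)[0] if "/" in path else "."
        groups[top].append(record)
    return dict(groups)


def make_batches(names: list[str], size: int = BATCH_SIZE) -> list[list[str]]:
    """Split directory names into consecutive batches of at most `size`."""
    return [names[i:i + size] for i in range(0, len(names), size)]


async def fake_discover_batch(batch: list[str]) -> list[dict[str, Any]]:
    """Stand-in for the real per-batch LLM call; fails on a 'broken' directory."""
    await asyncio.sleep(0)
    if "broken" in batch:
        raise RuntimeError("simulated batch failure")
    return [{"name": f"feature-{d}", "category": "demo"} for d in batch]


async def run_batches(batches: list[list[str]], max_concurrent: int) -> list[dict[str, Any]]:
    """Run all batches concurrently, with at most `max_concurrent` in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(batch: list[str]) -> list[dict[str, Any]]:
        async with semaphore:
            return await fake_discover_batch(batch)

    results = await asyncio.gather(
        *(bounded(b) for b in batches), return_exceptions=True
    )
    candidates: list[dict[str, Any]] = []
    for batch, result in zip(batches, results):
        if isinstance(result, Exception):
            # Log the failed batch and keep going with the rest.
            logger.warning("Batch %s failed: %s", batch, result)
            continue
        candidates.extend(result)
    return candidates
```

The semaphore caps how many batches are in flight at once without serializing the others, so one slow or failed call does not stall or abort the remaining batches. The merge and deduplication step is omitted because the source does not describe its rules.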

Parameters:

config (dict[str, Any]) – The atlas configuration controlling swarm concurrency and the model settings passed down to the Gemini client.

Return type:

list[dict[str, Any]]

Returns:

The final, merged list of discovered feature dictionaries.

async tools.feature_atlas.discover_features.async_main()[source]

Run the feature-discovery step and persist its results to disk.

Orchestrates the discovery half of the atlas pipeline: it configures INFO logging, loads the atlas config via load_config(), runs the full discover_features() pipeline, and writes the proposed features to outputs/discovered_features.json (re-encoding through UTF-8 with replacement to stay JSON-safe). It then prints a summary that includes the feature count, a per-category breakdown, timing, the output path, and a preview of the top discovered features. Side effects are the filesystem write of the output JSON plus the transitive LLM/HTTP calls and symbol-index read performed inside discover_features().

Invoked by main() in this module’s __main__ guard and imported as async_main by tools.feature_atlas.run_atlas.step_discover_features() (the discover-features step of the atlas runner); no other internal callers were found.
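The persistence and summary steps described above can be sketched as follows. This is a hedged illustration: the helper names, the "category" key on each feature, and the exact JSON formatting options are assumptions, and the real function may perform the UTF-8 re-encoding differently.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any


def write_features(features: list[dict[str, Any]], output_path: Path) -> None:
    """Serialize features to JSON, replacing characters UTF-8 cannot encode."""
    text = json.dumps(features, indent=2, ensure_ascii=False)
    # Round-trip through UTF-8 with errors="replace" so unencodable
    # characters (for example lone surrogates) become "?" instead of raising.
    safe_text = text.encode("utf-8", errors="replace").decode("utf-8")
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(safe_text, encoding="utf-8")


def category_breakdown(features: list[dict[str, Any]]) -> Counter:
    """Count features per category; 'category' is a hypothetical key."""
    return Counter(f.get("category", "uncategorized") for f in features)
```

The replacement step matters because LLM output can contain characters that a plain UTF-8 write would reject, and losing one character is cheaper than losing the whole results file.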

Return type:

None

Returns:

None.

tools.feature_atlas.discover_features.main()[source]

Synchronous entry point for the feature-discovery step.

Configures root logging at INFO level and drives the async pipeline by calling asyncio.run(async_main()), which scans the symbol index, runs the Gemini Flash discovery swarm, and writes outputs/discovered_features.json. All LLM/HTTP and filesystem side effects happen transitively inside async_main() and discover_features(); this wrapper only sets up logging and starts the event loop.

Invoked from the module’s if __name__ == "__main__" guard via python -m tools.feature_atlas.discover_features; no other internal callers were found.
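The wrapper pattern described above can be sketched as follows. This is a minimal sketch with a stub coroutine standing in for async_main(); the names main_sketch, async_main_stub, and calls are hypothetical.

```python
import asyncio
import logging

calls: list[str] = []  # records invocations so the sketch is observable


async def async_main_stub() -> None:
    """Stand-in for async_main(); the real one runs the discovery pipeline."""
    calls.append("async_main")


def main_sketch() -> None:
    """Set up logging, then run the coroutine to completion on a new event loop."""
    logging.basicConfig(level=logging.INFO)
    asyncio.run(async_main_stub())


if __name__ == "__main__":
    main_sketch()
```

Keeping the synchronous wrapper this thin lets other callers import and await the coroutine directly, as the atlas runner is described as doing.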

Return type:

None

Returns:

None.