tools.feature_atlas.extract_repo_symbols module

Step 1a: Multi-language repository symbol extraction.

Walks the stargazer-v3 repo and extracts symbols from:

  • Python (.py): classes, functions, constants, imports, env vars via ast

  • TypeScript/JavaScript (.ts, .tsx, .js, .jsx): exports, classes, functions, interfaces, types, constants, imports, React components via regex

  • CSS (.css): class selectors, custom properties, keyframes

  • HTML (.html): component references, element IDs

  • YAML (.yaml, .yml): top-level keys

  • Jinja2 (.j2): template variables, block names

  • JSON (.json): top-level keys

Outputs outputs/repo_symbols.json which the feature extraction swarm uses as its evidence base.

Usage:

python -m tools.feature_atlas.extract_repo_symbols

tools.feature_atlas.extract_repo_symbols.load_config()[source]

Load the Feature Atlas configuration from config.yaml.

Reads the module-level _CONFIG_PATH (tools/feature_atlas/config.yaml) and parses it with yaml.safe_load, returning the scan settings (include extensions, excluded dirs and files) that drive scan_repository(). This is a pure filesystem read with no Redis, knowledge-graph, LLM, or HTTP side effects.

Invoked by main() in this module. The same load_config name is also defined and used in the sibling atlas steps (discover_features.py and extract_features_swarm.py), but those are each their own module-level function, not this one.

Return type:

dict[str, Any]

Returns:

The parsed configuration as a dictionary.

Raises:
  • FileNotFoundError – If config.yaml does not exist at the expected path.

  • yaml.YAMLError – If the file is present but cannot be parsed as YAML.

tools.feature_atlas.extract_repo_symbols.scan_repository(config)[source]

Scan the repository and extract symbols from every file that matches the configured include extensions, skipping the excluded directories and files.

Returns a list of file symbol records.

Return type:

list[dict[str, Any]]

Parameters:

config (dict[str, Any]) – The parsed configuration, as returned by load_config().
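The config-driven file selection described for this function could look roughly like the sketch below. The config keys (include_extensions, exclude_dirs, exclude_files) and the helper name iter_candidate_files are hypothetical, chosen only to mirror the settings that load_config() is documented to return; the real key names in config.yaml are not shown in this page.

```python
import os
import tempfile
from pathlib import Path
from typing import Any, Iterator


def iter_candidate_files(root: Path, config: dict[str, Any]) -> Iterator[Path]:
    """Yield files under root that pass the (hypothetical) config filters."""
    include = set(config.get("include_extensions", []))
    excluded_dirs = set(config.get("exclude_dirs", []))
    excluded_files = set(config.get("exclude_files", []))

    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if d not in excluded_dirs)
        for name in sorted(filenames):
            if name in excluded_files:
                continue
            if Path(name).suffix in include:
                yield Path(dirpath) / name


# Assumed config shape, for illustration only.
config = {
    "include_extensions": [".py", ".ts"],
    "exclude_dirs": ["node_modules", ".git"],
    "exclude_files": ["setup.py"],
}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "node_modules").mkdir()
    (root / "src" / "app.py").write_text("x = 1\n")
    (root / "src" / "notes.txt").write_text("skip me\n")
    (root / "setup.py").write_text("# excluded by name\n")
    (root / "node_modules" / "lib.ts").write_text("// vendored\n")
    found = [p.relative_to(root).as_posix() for p in iter_candidate_files(root, config)]

print(found)
```

Pruning dirnames in place keeps the walk from ever entering large vendored trees, which is cheaper than walking everything and filtering the results afterwards.
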

tools.feature_atlas.extract_repo_symbols.main()[source]

Run the full symbol-extraction pass and write the evidence base.

The synchronous entry point for atlas step 1a: it configures logging, loads the config via load_config(), scans the repo with scan_repository(), then serializes the collected records to outputs/repo_symbols.json (the module-level _OUTPUT_PATH), re-encoding through UTF-8 with replacement to strip any lone surrogate characters that errors="replace" reads can leave behind. Finally it prints aggregate counts (files by type, classes, functions, constants, imports, env vars, and TS/JS totals) and elapsed time to stdout. Side effects are limited to logging, creating the output directory, writing the JSON file, and printing; no Redis, knowledge-graph, LLM, or HTTP calls.

Called by the module’s __main__ guard at the bottom of this file and dispatched as a subprocess step by run_atlas.py (which imports it as run). The downstream feature-extraction swarm consumes the repo_symbols.json it produces.

Return type:

None
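The surrogate-stripping step that main() performs before writing the JSON file can be sketched as follows. This is an illustration of the re-encoding technique under stated assumptions, not the module's actual code: the helper name dumps_without_lone_surrogates and the record fields are hypothetical.

```python
import json
from typing import Any


def dumps_without_lone_surrogates(records: list[dict[str, Any]]) -> str:
    """Serialize records, replacing characters that UTF-8 cannot encode."""
    # ensure_ascii=False keeps non-ASCII text readable, but it also lets
    # lone surrogates through untouched.
    text = json.dumps(records, indent=2, ensure_ascii=False)
    # Round-trip through UTF-8: errors="replace" turns each unencodable
    # character (such as a lone surrogate) into "?".
    return text.encode("utf-8", errors="replace").decode("utf-8")


# Hypothetical record containing a lone surrogate.
records = [{"path": "src/app.py", "doc": "bad \udc80 byte"}]
clean = dumps_without_lone_surrogates(records)

# The cleaned text can now be encoded strictly without raising UnicodeEncodeError.
encoded = clean.encode("utf-8")
print(clean)
```

Without the round trip, writing the serialized text to a file opened with encoding="utf-8" would raise UnicodeEncodeError on the first lone surrogate.
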