tools.feature_atlas.extract_repo_symbols module
Step 1a: Multi-language repository symbol extraction.
Walks the stargazer-v3 repo and extracts symbols from:
- Python (.py): classes, functions, constants, imports, env vars via ast
- TypeScript/JavaScript (.ts, .tsx, .js, .jsx): exports, classes, functions,
interfaces, types, constants, imports, React components via regex
- CSS (.css): class selectors, custom properties, keyframes
- HTML (.html): component references, element IDs
- YAML (.yaml, .yml): top-level keys
- Jinja2 (.j2): template variables, block names
- JSON (.json): top-level keys
Outputs outputs/repo_symbols.json which the feature extraction swarm
uses as its evidence base.
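As a rough illustration of the Python branch described above, the following is a minimal sketch of how classes, functions, constants, imports, and env vars could be pulled from source with the standard `ast` module. The function names, the returned dictionary layout, and the rule that a constant is an upper-case top-level name are assumptions made for this example, not the module's actual implementation.

```python
import ast


def _is_env_lookup(call: ast.Call) -> bool:
    """Return True for calls shaped like os.getenv(...) or os.environ.get(...)."""
    func = call.func
    if not isinstance(func, ast.Attribute):
        return False
    if func.attr == "getenv":
        return isinstance(func.value, ast.Name) and func.value.id == "os"
    if func.attr == "get":
        return isinstance(func.value, ast.Attribute) and func.value.attr == "environ"
    return False


def extract_python_symbols(source: str) -> dict:
    """Hypothetical helper: collect symbols from one Python source string."""
    tree = ast.parse(source)
    symbols = {"classes": [], "functions": [], "constants": [], "imports": [], "env_vars": []}

    # Top-level definitions only; nested functions and methods are ignored here.
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            symbols["classes"].append(node.name)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            symbols["functions"].append(node.name)
        elif isinstance(node, ast.Assign):
            # Assumption: an UPPER_CASE top-level name counts as a constant.
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id.isupper():
                    symbols["constants"].append(target.id)
        elif isinstance(node, ast.Import):
            symbols["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            symbols["imports"].append(node.module or "")

    # Env vars can be read anywhere in the file, so walk the whole tree.
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and _is_env_lookup(node) and node.args:
            first = node.args[0]
            if isinstance(first, ast.Constant) and isinstance(first.value, str):
                symbols["env_vars"].append(first.value)

    return symbols


SAMPLE = '''
import os
from pathlib import Path

MAX_RETRIES = 3

class Scanner:
    pass

def run():
    return os.getenv("ATLAS_TOKEN")
'''

result = extract_python_symbols(SAMPLE)
print(result)
```

Parsing with `ast` rather than regex avoids false matches inside strings and comments, which is presumably why the Python branch uses it while the other languages rely on pattern matching.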
- Usage:
python -m tools.feature_atlas.extract_repo_symbols
# fire skull spider – CATALOGING ALL THE BONES IN EVERY LANGUAGE
- tools.feature_atlas.extract_repo_symbols.load_config()
Load the Feature Atlas configuration from config.yaml.
Reads the module-level _CONFIG_PATH (tools/feature_atlas/config.yaml) and parses it with yaml.safe_load, returning the scan settings (include extensions, excluded dirs and files) that drive scan_repository(). This is a pure filesystem read with no Redis, knowledge-graph, LLM, or HTTP side effects.
Invoked by main() in this module. The same load_config name is also defined and used in the sibling atlas steps (discover_features.py and extract_features_swarm.py), but each of those is its own module-level function, not this one.
- Return type:
- Returns:
The parsed configuration as a dictionary.
- Raises:
FileNotFoundError – If config.yaml does not exist at the expected path.
yaml.YAMLError – If the file is present but cannot be parsed as YAML.
- tools.feature_atlas.extract_repo_symbols.scan_repository(config)
Scan the repository and extract symbols from every file selected by the scan settings in config (include extensions, excluded dirs and files).
Returns a list of per-file symbol records.
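The walk that such a scan performs can be sketched as follows. This is a minimal illustration under stated assumptions: the config keys (`include_extensions`, `excluded_dirs`, `excluded_files`), the helper names, and the record shape are hypothetical, since the page does not show the real schema of config.yaml or of the file symbol records.

```python
import os
import tempfile
from pathlib import Path


def iter_candidate_files(root: Path, config: dict):
    """Yield files under root whose extension is included and that are not excluded."""
    include = set(config["include_extensions"])   # hypothetical key
    excluded_dirs = set(config["excluded_dirs"])  # hypothetical key
    excluded_files = set(config["excluded_files"])  # hypothetical key

    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if d not in excluded_dirs)
        for name in sorted(filenames):
            if name in excluded_files:
                continue
            if Path(name).suffix in include:
                yield Path(dirpath) / name


def scan_sketch(root: Path, config: dict) -> list:
    """Build one placeholder record per matching file (record shape is assumed)."""
    records = []
    for path in iter_candidate_files(root, config):
        records.append({
            "path": path.relative_to(root).as_posix(),
            "type": path.suffix.lstrip("."),
        })
    return records


# Small self-contained demo on a temporary directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "app.py").write_text("x = 1\n")
    (root / "notes.txt").write_text("not scanned\n")
    (root / "node_modules").mkdir()
    (root / "node_modules" / "lib.js").write_text("")

    demo_config = {
        "include_extensions": [".py", ".js"],
        "excluded_dirs": ["node_modules"],
        "excluded_files": ["package-lock.json"],
    }
    demo_records = scan_sketch(root, demo_config)

print(demo_records)
```

Pruning `dirnames` in place is what keeps large excluded trees from being traversed at all, rather than filtering their files after the fact.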
- tools.feature_atlas.extract_repo_symbols.main()
Run the full symbol-extraction pass and write the evidence base.
The synchronous entry point for atlas step 1a: it configures logging, loads the config via load_config(), scans the repo with scan_repository(), then serializes the collected records to outputs/repo_symbols.json (the module-level _OUTPUT_PATH), re-encoding through UTF-8 with replacement to strip any lone surrogate characters that errors="replace" reads can leave behind. Finally it prints aggregate counts (files by type, classes, functions, constants, imports, env vars, and TS/JS totals) and elapsed time to stdout. Side effects are limited to logging, creating the output directory, writing the JSON file, and printing; no Redis, knowledge-graph, LLM, or HTTP calls.
Called by the module's __main__ guard at the bottom of this file and dispatched as a subprocess step by run_atlas.py (which imports it as run). The downstream feature-extraction swarm consumes the repo_symbols.json it produces.
- Return type:
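The serialization step described for main() can be illustrated with a short sketch. It assumes the records are plain JSON-serializable dictionaries and uses a hypothetical helper name, `write_records`. The point it shows is that round-tripping the JSON text through UTF-8 with `errors="replace"` swaps any lone surrogate for `?`, so the final write cannot fail with `UnicodeEncodeError`.

```python
import json
import tempfile
from pathlib import Path


def write_records(records: list, output_path: Path) -> None:
    """Hypothetical helper: write records as JSON, stripping lone surrogates."""
    # Create the output directory if it does not exist yet.
    output_path.parent.mkdir(parents=True, exist_ok=True)

    # ensure_ascii=False keeps non-ASCII text readable, but it also lets a
    # lone surrogate pass through into the serialized string unchanged.
    text = json.dumps(records, indent=2, ensure_ascii=False)

    # Encoding with errors="replace" turns each unencodable surrogate into "?",
    # and decoding gives back a string that is safe to write as UTF-8.
    safe_text = text.encode("utf-8", errors="replace").decode("utf-8")
    output_path.write_text(safe_text, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "outputs" / "repo_symbols.json"
    # "\ud800" is a lone surrogate that plain UTF-8 encoding would reject.
    write_records([{"path": "a.py", "functions": ["bad\ud800name"]}], out)
    loaded = json.loads(out.read_text(encoding="utf-8"))

print(loaded)
```

The trade-off in this approach is that the offending character is lost rather than preserved, which is usually acceptable for an evidence file that only needs symbol names to be readable.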