scrape_leafly

Leafly Strain Scraper – Harvest ALL strains into terpene_profiles.yaml.

Extracts strain data directly from Leafly’s __NEXT_DATA__ JSON embedded in listing pages. Each listing page contains ~18 strains with full terpene profiles, effects, cannabinoids, and metadata.

Total: ~9000 strains across ~500 pages.

💀🔥 scraping the entire weed bible 🌿

Usage:

    python scrape_leafly.py                            # scrape ALL strains
    python scrape_leafly.py --pages 5                  # first 5 pages only
    python scrape_leafly.py --merge                    # merge into terpene_profiles.yaml
    python scrape_leafly.py --output my_strains.yaml

scrape_leafly.parse_listing_strain(raw)[source]

Parse a single strain from listing page __NEXT_DATA__.

Each strain object in the listing contains:

- slug, name, category
- terps: {terpene_name: {score: float}}
- effects: {effect_name: {score: float}}
- cannabinoids: {thc: {percentile50: float}, …}

Return type:

Optional[Dict[str, Any]]

Parameters:

raw (dict)
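The input shape described above can be flattened roughly as follows. This is a minimal sketch, not the module's actual implementation: the function name parse_strain_sketch, the output keys (terpenes, effects, cannabinoids), and the rule of returning None when slug or name is missing are all assumptions made for illustration.

```python
from typing import Any, Dict, Optional


def parse_strain_sketch(raw: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Flatten one listing entry (hypothetical helper, not the real parser)."""
    slug, name = raw.get("slug"), raw.get("name")
    if not slug or not name:
        # Assumed rule: an entry without slug or name is unusable.
        return None

    def pick(block: Any, field: str) -> Dict[str, float]:
        # {key: {field: x}} -> {key: x}, skipping entries without a numeric value.
        if not isinstance(block, dict):
            return {}
        return {
            key: float(val[field])
            for key, val in block.items()
            if isinstance(val, dict) and isinstance(val.get(field), (int, float))
        }

    return {
        "slug": slug,
        "name": name,
        "category": raw.get("category"),
        "terpenes": pick(raw.get("terps"), "score"),
        "effects": pick(raw.get("effects"), "score"),
        "cannabinoids": pick(raw.get("cannabinoids"), "percentile50"),
    }
```

Entries whose score is missing or null are dropped rather than stored as zero, so that an absent measurement is not mistaken for a measured value of 0.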

scrape_leafly.scrape_all_strains(max_pages=None, output_path='leafly_strains.yaml', page_delay=1.5)[source]

Scrape all Leafly strains from listing pages.

Each listing page’s __NEXT_DATA__ contains ~18 strains with terpene profiles, effects, and cannabinoid data. No need to visit individual strain pages.

Return type:

int

Parameters:

max_pages (int | None)
output_path (str)
page_delay (float)
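The listing-page approach can be sketched as below. Next.js pages embed their data in a script tag with id __NEXT_DATA__; everything else here is an assumption for illustration: the helper names extract_next_data and iter_pages, the injected fetch callable (which stands in for whatever HTTP client the module uses), and the choice to stop at the first page without a payload. Where the strain list sits inside the payload is not specified in this documentation, so the sketch stops at the decoded JSON.

```python
import json
import re
import time
from typing import Any, Callable, Dict, Iterator, Optional

# Matches the JSON body of <script id="__NEXT_DATA__" ...>...</script>.
NEXT_DATA_RE = re.compile(
    r'<script[^>]*\bid="__NEXT_DATA__"[^>]*>(.*?)</script>', re.DOTALL
)


def extract_next_data(html: str) -> Optional[Dict[str, Any]]:
    """Return the decoded __NEXT_DATA__ payload, or None if absent or invalid."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


def iter_pages(
    fetch: Callable[[int], str],
    max_pages: Optional[int] = None,
    page_delay: float = 1.5,
) -> Iterator[Dict[str, Any]]:
    """Yield one payload per listing page, pausing between requests."""
    page = 1
    while max_pages is None or page <= max_pages:
        data = extract_next_data(fetch(page))
        if data is None:
            # Assumed stop condition: a page without a payload ends the crawl.
            break
        yield data
        page += 1
        time.sleep(page_delay)  # be polite to the server between pages
```

Passing fetch in as a parameter keeps the paging logic testable without network access.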

scrape_leafly.merge_into_terpene_profiles(leafly_yaml_path, terpene_profiles_path)[source]

Merge scraped Leafly strains into terpene_profiles.yaml.

Only adds strains not already in the curated database. Returns count of new strains added.

Return type:

int

Parameters:

leafly_yaml_path (str)
terpene_profiles_path (str)
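The add-only merge rule can be illustrated on in-memory dictionaries. This is a sketch under stated assumptions: it supposes both files load to mappings keyed by a strain identifier, which this documentation does not confirm, and merge_new_strains is a hypothetical name. YAML loading and saving are left out.

```python
from typing import Any, Dict, Tuple


def merge_new_strains(
    curated: Dict[str, Any], scraped: Dict[str, Any]
) -> Tuple[Dict[str, Any], int]:
    """Return (merged mapping, number of strains added); curated entries win."""
    merged = dict(curated)  # copy so the caller's curated data is not mutated
    added = 0
    for key, profile in scraped.items():
        if key not in merged:
            # Only strains absent from the curated database are added.
            merged[key] = profile
            added += 1
    return merged, added
```

Because existing keys are never overwritten, hand-curated profiles take precedence over scraped ones.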

scrape_leafly.main()[source]

Parse CLI arguments and drive a full scrape (and optional merge).

Defines the --pages, --output, --merge, and --delay command-line options, runs the scrape, and – when --merge is set and at least one strain was written – folds the new strains into the repo-local terpene_profiles.yaml (resolved relative to this file’s directory), warning if that file is absent.

Interactions: builds an argparse.ArgumentParser, calls scrape_all_strains with the parsed options, conditionally calls merge_into_terpene_profiles after resolving the path via os.path.dirname/os.path.abspath/os.path.exists, and logs through the module logger. Called only by the if __name__ == "__main__" guard at the bottom of the module; it is the script’s entry point and has no internal callers.
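The four options named above could be declared along these lines. This is a hedged sketch: the option names come from the description, but the defaults are assumed to mirror the scrape_all_strains signature, and the help strings and the build_parser name are illustrative only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the CLI options (hypothetical helper; defaults are assumptions)."""
    parser = argparse.ArgumentParser(description="Scrape Leafly strain listings.")
    parser.add_argument("--pages", type=int, default=None,
                        help="maximum number of listing pages (default: all)")
    parser.add_argument("--output", default="leafly_strains.yaml",
                        help="path of the YAML file to write")
    parser.add_argument("--merge", action="store_true",
                        help="merge new strains into terpene_profiles.yaml")
    parser.add_argument("--delay", type=float, default=1.5,
                        help="seconds to wait between listing pages")
    return parser
```

For example, build_parser().parse_args(["--pages", "5", "--merge"]) yields a namespace with pages set to 5 and merge set to True, which main() would then pass on to scrape_all_strains.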