scrape_leafly

Leafly Strain Scraper – Harvest ALL strains into terpene_profiles.yaml.

Extracts strain data directly from Leafly’s __NEXT_DATA__ JSON embedded in listing pages. Each listing page contains ~18 strains with full terpene profiles, effects, cannabinoids, and metadata.

Total: ~9000 strains across ~500 pages.
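The extraction mechanism described above can be sketched as follows. The `<script id="__NEXT_DATA__" type="application/json">` tag is standard Next.js markup, but the JSON shape and the demo page fragment here are illustrative assumptions, not Leafly's actual payload:

```python
import json
import re

# Pull the __NEXT_DATA__ JSON blob out of a listing page's HTML.
# Next.js embeds it in a <script id="__NEXT_DATA__"> tag; the path to the
# strain list inside the parsed JSON is an assumption in this sketch.
NEXT_DATA_RE = re.compile(
    r'<script id="__NEXT_DATA__" type="application/json">(.*?)</script>',
    re.DOTALL,
)

def extract_next_data(html: str) -> dict:
    """Return the parsed __NEXT_DATA__ payload, or {} if not found."""
    match = NEXT_DATA_RE.search(html)
    return json.loads(match.group(1)) if match else {}

# Tiny self-contained demo with a fabricated page fragment:
page = (
    '<html><body>'
    '<script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"strains": [{"slug": "blue-dream"}]}}'
    '</script></body></html>'
)
data = extract_next_data(page)
```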

# 💀🔥 scraping the entire weed bible 🌿
#
# Usage:
#   python scrape_leafly.py                          # scrape ALL strains
#   python scrape_leafly.py --pages 5                # first 5 pages only
#   python scrape_leafly.py --merge                  # merge into terpene_profiles.yaml
#   python scrape_leafly.py --output my_strains.yaml

scrape_leafly.parse_listing_strain(raw)[source]

Parse a single strain from listing page __NEXT_DATA__.

Each strain object in the listing contains:
  • slug, name, category

  • terps: {terpene_name: {score: float}}

  • effects: {effect_name: {score: float}}

  • cannabinoids: {thc: {percentile50: float}, …}

Return type:

Optional[Dict[str, Any]]

Parameters:

  • raw (dict)
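A hypothetical normalization along the lines the docstring describes might look like this. The input field names (slug, terps, effects, cannabinoids, percentile50) come from the docstring; the flattened output layout is an assumption of this sketch, not the module's actual return shape:

```python
from typing import Any, Dict, Optional

def parse_listing_strain(raw: dict) -> Optional[Dict[str, Any]]:
    """Illustrative sketch: flatten the per-field {score: ...} wrappers."""
    if not raw.get("slug"):
        return None  # skip malformed entries, as the Optional return suggests
    return {
        "slug": raw["slug"],
        "name": raw.get("name"),
        "category": raw.get("category"),
        "terpenes": {k: v.get("score") for k, v in raw.get("terps", {}).items()},
        "effects": {k: v.get("score") for k, v in raw.get("effects", {}).items()},
        "thc": raw.get("cannabinoids", {}).get("thc", {}).get("percentile50"),
    }

sample = {
    "slug": "blue-dream",
    "name": "Blue Dream",
    "category": "Hybrid",
    "terps": {"myrcene": {"score": 0.41}},
    "effects": {"happy": {"score": 0.62}},
    "cannabinoids": {"thc": {"percentile50": 18.0}},
}
strain = parse_listing_strain(sample)
```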

scrape_leafly.scrape_all_strains(max_pages=None, output_path='leafly_strains.yaml', page_delay=1.5)[source]

Scrape all Leafly strains from listing pages.

Each listing page’s __NEXT_DATA__ contains ~18 strains with terpene profiles, effects, and cannabinoid data. No need to visit individual strain pages.

Return type:

int

Parameters:
  • max_pages (int | None)

  • output_path (str)

  • page_delay (float)
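The pagination described above can be sketched as follows. The listing URL pattern is an assumption (Leafly's real query parameters may differ), and the `fetch` callable is injected here so the sketch runs without network access; only the `max_pages`/`page_delay` behavior mirrors the documented signature:

```python
import time
from typing import Iterator, Optional

BASE_URL = "https://www.leafly.com/strains"  # assumed listing endpoint

def listing_urls(max_pages: Optional[int] = None) -> Iterator[str]:
    """Yield listing-page URLs, endlessly if max_pages is None."""
    page = 1
    while max_pages is None or page <= max_pages:
        yield f"{BASE_URL}?page={page}"
        page += 1

def scrape_all_strains(max_pages=None, page_delay=1.5, fetch=None) -> int:
    """Count strains across pages; `fetch(url)` returns raw strain dicts."""
    total = 0
    for url in listing_urls(max_pages):
        strains = fetch(url)
        if not strains:          # stop at the first empty page
            break
        total += len(strains)
        time.sleep(page_delay)   # be polite between requests
    return total

# Demo with a stub fetcher: two pages of 18 strains each, then empty.
pages = {1: [{}] * 18, 2: [{}] * 18}
count = scrape_all_strains(
    page_delay=0.0,
    fetch=lambda url: pages.get(int(url.split("=")[1]), []),
)
```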

scrape_leafly.merge_into_terpene_profiles(leafly_yaml_path, terpene_profiles_path)[source]

Merge scraped Leafly strains into terpene_profiles.yaml.

Only adds strains not already in the curated database. Returns count of new strains added.

Return type:

int

Parameters:
  • leafly_yaml_path (str)

  • terpene_profiles_path (str)
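The merge rule described above (curated entries win; only new strains are added) can be sketched on already-loaded mappings. The real function reads and writes YAML files at the given paths; passing dicts directly here is an assumption made so the example stays stdlib-only:

```python
def merge_strains(leafly: dict, curated: dict) -> int:
    """Add leafly strains absent from curated, in place; return count added."""
    added = 0
    for slug, profile in leafly.items():
        if slug not in curated:   # never overwrite curated entries
            curated[slug] = profile
            added += 1
    return added

curated = {"blue-dream": {"myrcene": 0.4}}
scraped = {"blue-dream": {"myrcene": 0.5}, "og-kush": {"limonene": 0.3}}
n = merge_strains(scraped, curated)
```

Only `og-kush` is added; the curated `blue-dream` profile is left untouched.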

scrape_leafly.main()[source]