scrape_leafly
Leafly Strain Scraper – Harvest ALL strains into terpene_profiles.yaml.
Extracts strain data directly from Leafly’s __NEXT_DATA__ JSON embedded in listing pages. Each listing page contains ~18 strains with full terpene profiles, effects, cannabinoids, and metadata.
Total: ~9000 strains across ~500 pages.
💀🔥 scraping the entire weed bible 🌿

Usage:
    python scrape_leafly.py                           # scrape ALL strains
    python scrape_leafly.py --pages 5                 # first 5 pages only
    python scrape_leafly.py --merge                   # merge into terpene_profiles.yaml
    python scrape_leafly.py --output my_strains.yaml
- scrape_leafly.parse_listing_strain(raw)
Parse a single strain from listing page __NEXT_DATA__.
Each strain object in the listing contains:
  - slug, name, category
  - terps: {terpene_name: {score: float}}
  - effects: {effect_name: {score: float}}
  - cannabinoids: {thc: {percentile50: float}, …}
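To make the documented shape concrete, here is a minimal sketch of how one raw listing entry could be flattened into a plain record. It is not the module's actual implementation: the function name `parse_listing_strain_sketch`, the output keys, and the sample values are all hypothetical, and only the input keys (`slug`, `name`, `category`, `terps`, `effects`, `cannabinoids`) come from the description above.

```python
from typing import Any, Dict


def parse_listing_strain_sketch(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one raw listing entry into a simple record (illustrative only)."""

    def pick(section: Any, field: str) -> Dict[str, float]:
        # Keep only entries that are dicts carrying a numeric value under `field`.
        if not isinstance(section, dict):
            return {}
        return {
            name: float(info[field])
            for name, info in section.items()
            if isinstance(info, dict) and isinstance(info.get(field), (int, float))
        }

    return {
        "slug": raw.get("slug"),
        "name": raw.get("name"),
        "category": raw.get("category"),
        "terpenes": pick(raw.get("terps"), "score"),
        "effects": pick(raw.get("effects"), "score"),
        "cannabinoids": pick(raw.get("cannabinoids"), "percentile50"),
    }


# Made-up sample entry that follows the documented shape; the values are not real data.
example = {
    "slug": "example-strain",
    "name": "Example Strain",
    "category": "Hybrid",
    "terps": {"myrcene": {"score": 0.62}, "pinene": {"score": None}},
    "effects": {"relaxed": {"score": 1.1}},
    "cannabinoids": {"thc": {"percentile50": 21.0}},
}
record = parse_listing_strain_sketch(example)
```

Entries with a missing or non-numeric value (such as `pinene` in the sample) are skipped rather than raising, which is one reasonable choice for scraped data whose fields may be sparse.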
- scrape_leafly.scrape_all_strains(max_pages=None, output_path='leafly_strains.yaml', page_delay=1.5)
Scrape all Leafly strains from listing pages.
Each listing page’s __NEXT_DATA__ contains ~18 strains with terpene profiles, effects, and cannabinoid data. No need to visit individual strain pages.
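The extraction step described above can be sketched as follows. Next.js pages embed their data in a `<script id="__NEXT_DATA__">` tag, so the JSON can be pulled out of the HTML without rendering the page. The helper names and, in particular, the key path `props -> pageProps -> data -> strains` are assumptions made for illustration; the real path inside Leafly's payload is not stated in this documentation and may differ.

```python
import json
import re
from typing import Any, Dict, List

# Matches the JSON payload inside the Next.js data script tag.
NEXT_DATA_RE = re.compile(
    r'<script[^>]*id="__NEXT_DATA__"[^>]*>(.*?)</script>', re.DOTALL
)


def extract_next_data(html: str) -> Dict[str, Any]:
    """Return the parsed __NEXT_DATA__ JSON from a page's HTML."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        raise ValueError("no __NEXT_DATA__ script tag found")
    return json.loads(match.group(1))


def strains_from_next_data(data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Walk a key path to the strain list; returns [] if any step is missing."""
    # ASSUMPTION: this key path is hypothetical, not taken from the real payload.
    node: Any = data
    for key in ("props", "pageProps", "data", "strains"):
        node = node.get(key, {}) if isinstance(node, dict) else {}
    return node if isinstance(node, list) else []


# Tiny hand-written page standing in for a real listing page.
sample_html = (
    '<html><body><script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"data": {"strains": [{"slug": "example-strain"}]}}}}'
    "</script></body></html>"
)
strains = strains_from_next_data(extract_next_data(sample_html))
```

In a real run, each page's HTML would be fetched over HTTP and the loop would sleep for `page_delay` seconds between requests; that part is omitted here so the sketch stays offline and self-contained.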
- scrape_leafly.merge_into_terpene_profiles(leafly_yaml_path, terpene_profiles_path)
Merge scraped Leafly strains into terpene_profiles.yaml.
Only adds strains not already in the curated database. Returns count of new strains added.
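The add-only merge rule can be illustrated on in-memory dictionaries. This sketch assumes both databases are mappings keyed by a strain identifier such as the slug; how the real function matches strains and how the YAML files are laid out is not specified here, and `merge_new_strains` is a hypothetical name. YAML loading and saving are left out to keep the example dependency-free.

```python
from typing import Any, Dict


def merge_new_strains(curated: Dict[str, Any], scraped: Dict[str, Any]) -> int:
    """Add scraped entries whose key is absent from `curated`; return how many were added."""
    added = 0
    for key, record in scraped.items():
        if key not in curated:
            # Existing curated entries are never overwritten.
            curated[key] = record
            added += 1
    return added


# Made-up entries for illustration only.
curated_db = {"strain-a": {"source": "curated"}}
scraped_db = {"strain-a": {"source": "scraped"}, "strain-b": {"source": "scraped"}}
new_count = merge_new_strains(curated_db, scraped_db)
```

Because the curated entry wins on conflict, re-running the merge after a fresh scrape adds nothing new and leaves hand-edited profiles untouched.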
- scrape_leafly.main()
Parse CLI arguments and drive a full scrape (and optional merge).
Defines the --pages, --output, --merge, and --delay command-line options, runs the scrape, and, when --merge is set and at least one strain was written, folds the new strains into the repo-local terpene_profiles.yaml (resolved relative to this file’s directory), warning if that file is absent.

Interactions: builds an argparse.ArgumentParser, calls scrape_all_strains with the parsed options, conditionally calls merge_into_terpene_profiles after resolving the path via os.path.dirname/os.path.abspath/os.path.exists, and logs through the module logger. Called only by the if __name__ == "__main__" guard at the bottom of the module; it is the script’s entry point and has no internal callers.
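The command-line surface described above might look roughly like the sketch below. The four option names come from this page; the option types, help strings, and defaults are assumptions (the defaults simply mirror the `scrape_all_strains` signature), and `build_parser` and `resolve_profiles_path` are hypothetical helpers rather than names from the module.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the four documented options (types and defaults assumed)."""
    parser = argparse.ArgumentParser(description="Scrape Leafly strain listings.")
    parser.add_argument("--pages", type=int, default=None,
                        help="maximum number of listing pages (default: all)")
    parser.add_argument("--output", default="leafly_strains.yaml",
                        help="path of the YAML file to write")
    parser.add_argument("--merge", action="store_true",
                        help="merge new strains into terpene_profiles.yaml afterwards")
    parser.add_argument("--delay", type=float, default=1.5,
                        help="seconds to wait between page requests")
    return parser


def resolve_profiles_path(module_file: str) -> str:
    """Resolve terpene_profiles.yaml next to the given module file."""
    return os.path.join(os.path.dirname(os.path.abspath(module_file)),
                        "terpene_profiles.yaml")


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--pages", "5", "--merge"])
profiles_path = resolve_profiles_path("scrape_leafly.py")
```

In the actual entry point, the module's `__file__` would be passed for path resolution, and the merge would be skipped with a logged warning when `os.path.exists` reports that the resolved file is missing.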