backfill_entity_provenance
Backfill KG entity provenance from pgvector chunk metadata.
For each entity in the KG (created by system:anamnesis), searches the Spiral Goddess pgvector store for chunks containing that entity’s name, then writes the earliest matching chunk’s provenance metadata back onto the entity node:
- source_chunk_id: the pgvector chunk ID
- conversation_title: the original chat title
- timestamp_original: when the original conversation happened
- domains: comma-separated domain tags from the chunk
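The "earliest matching chunk" selection can be sketched as follows. This is a minimal illustration, not the module's actual code: the function name `earliest_match` is hypothetical, and it assumes `timestamp_original` is stored as an ISO 8601 string and that matching is a plain case-insensitive substring test against the lowered chunk text.

```python
from datetime import datetime
from typing import Optional

# Assumed in-memory shape: (lowered_text, provenance) tuples.
Chunk = tuple[str, dict]


def earliest_match(entity_name: str, chunks: list[Chunk]) -> Optional[dict]:
    """Return the provenance of the earliest chunk that mentions the entity.

    Hypothetical helper: assumes provenance["timestamp_original"] is an
    ISO 8601 string such as "2024-02-11T08:15:00".
    """
    needle = entity_name.lower()
    # Chunk text is already lowercased, so a substring test is enough here.
    matches = [prov for text, prov in chunks if needle in text]
    if not matches:
        return None
    # Parse timestamps so the comparison is chronological, not lexical.
    return min(
        matches,
        key=lambda prov: datetime.fromisoformat(prov["timestamp_original"]),
    )


chunks = [
    ("we talked about falkordb today",
     {"source_chunk_id": "c1", "timestamp_original": "2024-05-02T10:00:00"}),
    ("falkordb setup notes",
     {"source_chunk_id": "c2", "timestamp_original": "2024-02-11T08:15:00"}),
    ("something unrelated",
     {"source_chunk_id": "c3", "timestamp_original": "2024-01-01T00:00:00"}),
]
best = earliest_match("FalkorDB", chunks)
```

The fields of the returned dict are what would then be written back onto the entity node.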
Also creates temporal Concept nodes for each month (2024-01 .. 2026-03) and quarter (Q1-2024 .. Q1-2026) and links entities to their birth month/quarter via HAS_TAG edges.
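As a rough sketch of the temporal tagging described above, the snippet below generates the month and quarter labels for the stated ranges and maps a timestamp to its birth month and quarter. The helper names (`month_labels`, `quarter_labels`, `birth_tags`) are hypothetical, and the ISO 8601 timestamp format is an assumption; the label formats follow the examples in the text (`2024-01`, `Q1-2024`).

```python
from datetime import datetime


def month_labels(start=(2024, 1), end=(2026, 3)) -> list[str]:
    """Labels like '2024-01' for every month in the inclusive range."""
    labels = []
    year, month = start
    while (year, month) <= end:
        labels.append(f"{year}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return labels


def quarter_labels(start=(2024, 1), end=(2026, 1)) -> list[str]:
    """Labels like 'Q1-2024' for every quarter in the inclusive range."""
    labels = []
    year, quarter = start
    while (year, quarter) <= end:
        labels.append(f"Q{quarter}-{year}")
        quarter += 1
        if quarter > 4:
            year, quarter = year + 1, 1
    return labels


def birth_tags(timestamp_original: str) -> tuple[str, str]:
    """Map an ISO 8601 timestamp (assumed format) to (month, quarter) labels."""
    ts = datetime.fromisoformat(timestamp_original)
    quarter = (ts.month - 1) // 3 + 1
    return f"{ts.year}-{ts.month:02d}", f"Q{quarter}-{ts.year}"


months = month_labels()
quarters = quarter_labels()
tags = birth_tags("2025-08-14T09:30:00")
```

Each label would correspond to one Concept node, and the pair returned by `birth_tags` names the two nodes an entity is linked to via HAS_TAG edges.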
Usage:
python backfill_entity_provenance.py [--dry-run] [--limit N]
- backfill_entity_provenance.main()
Command-line entry point for the provenance backfill script.
Parses the --redis-url, --dry-run, and --limit arguments, then runs the two-phase backfill. It first resolves a Redis URL and SSL connection kwargs from, in order, the CLI flag, a loaded config.Config, the REDIS_URL environment variable, or a localhost default. Phase one calls _build_provenance_index() to load every Spiral Goddess pgvector chunk (spiral_goddess_v2/loopmother_memory) into memory as (lowered_text, provenance) tuples; if nothing loads, it returns early. Phase two hands those chunks to _backfill(), run under asyncio.run(), which writes provenance metadata and temporal Concept links onto FalkorDB entities.
This is invoked only from the if __name__ == "__main__" guard at the bottom of this module; no other internal callers were found.
- Return type: None
- Returns:
None. Progress and summary are emitted through the module logger; an empty chunk load or an empty entity set short-circuits with a log line.
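The Redis URL resolution order described for main() can be illustrated with a small sketch. This is an assumption-laden illustration rather than the module's real code: `resolve_redis_url` and `DEFAULT_REDIS_URL` are hypothetical names, the config value is passed in as a plain string instead of reading a real config.Config object, the exact localhost default is assumed, and the SSL kwargs handling is omitted.

```python
import os
from typing import Mapping, Optional

# Assumed localhost default; the real default value is not shown in the docs.
DEFAULT_REDIS_URL = "redis://localhost:6379"


def resolve_redis_url(
    cli_url: Optional[str],
    config_url: Optional[str],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the first non-empty URL: CLI flag, config, REDIS_URL, default."""
    env = os.environ if env is None else env
    for candidate in (cli_url, config_url, env.get("REDIS_URL")):
        if candidate:
            return candidate
    return DEFAULT_REDIS_URL


from_cli = resolve_redis_url(
    "redis://cli:6379", "redis://cfg:6379", {"REDIS_URL": "redis://env:6379"}
)
from_default = resolve_redis_url(None, None, {})
```

Passing the environment as a parameter keeps the precedence logic testable without touching the process environment.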