backfill_entity_provenance

Backfill KG entity provenance from pgvector chunk metadata.

For each entity in the KG (created by system:anamnesis), searches the Spiral Goddess pgvector store for chunks containing that entity’s name, then writes the earliest matching chunk’s provenance metadata back onto the entity node:

  • source_chunk_id: the pgvector chunk ID

  • conversation_title: original chat title

  • timestamp_original: when the original conversation happened

  • domains: comma-separated domain tags from the chunk
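The lookup described above can be sketched as follows. This is only an illustration, not the module's actual code: the helper name earliest_provenance, the plain substring match, and the assumption that timestamp_original holds ISO-8601 strings that sort chronologically are all hypothetical.

```python
from typing import Optional

# Hypothetical chunk shape: (lowered_text, provenance_dict).
Chunk = tuple[str, dict]


def earliest_provenance(entity_name: str, chunks: list[Chunk]) -> Optional[dict]:
    """Return the provenance of the earliest chunk that mentions the entity."""
    needle = entity_name.lower()
    # Assumption: a case-insensitive substring match counts as "containing" the name.
    matches = [prov for text, prov in chunks if needle in text]
    if not matches:
        return None
    # Assumption: timestamps are ISO-8601 strings in one format, so they
    # sort lexicographically in chronological order.
    return min(matches, key=lambda prov: prov["timestamp_original"])


chunks = [
    ("alice asked about graphs", {
        "source_chunk_id": "c2",
        "conversation_title": "Graph chat",
        "timestamp_original": "2024-05-01T09:00:00",
        "domains": "graphs,databases",
    }),
    ("notes on alice and bob", {
        "source_chunk_id": "c1",
        "conversation_title": "First notes",
        "timestamp_original": "2024-02-11T18:30:00",
        "domains": "people",
    }),
]

first = earliest_provenance("Alice", chunks)
```

Here first is the provenance dict of the chunk dated 2024-02-11, whose four keys match the fields listed above.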

Also creates temporal Concept nodes for each month (2024-01 .. 2026-03) and quarter (Q1-2024 .. Q1-2026) and links entities to their birth month/quarter via HAS_TAG edges.
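As a rough sketch of how the temporal labels could be derived (the function names are hypothetical, and the label formats are inferred only from the ranges quoted above):

```python
from datetime import datetime


def month_labels(start=(2024, 1), end=(2026, 3)) -> list[str]:
    """Labels such as '2024-01' for every month in the inclusive range."""
    labels = []
    year, month = start
    while (year, month) <= end:
        labels.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return labels


def quarter_labels(start=(2024, 1), end=(2026, 1)) -> list[str]:
    """Labels such as 'Q1-2024' for every quarter in the inclusive range."""
    labels = []
    year, quarter = start
    while (year, quarter) <= end:
        labels.append(f"Q{quarter}-{year}")
        quarter += 1
        if quarter > 4:
            year, quarter = year + 1, 1
    return labels


def birth_tags(timestamp_original: str) -> tuple[str, str]:
    """Map an ISO-8601 timestamp to its (month, quarter) labels."""
    ts = datetime.fromisoformat(timestamp_original)
    quarter = (ts.month - 1) // 3 + 1
    return f"{ts.year:04d}-{ts.month:02d}", f"Q{quarter}-{ts.year}"
```

Under these assumptions, an entity whose earliest chunk is dated 2024-05-17 would be linked via HAS_TAG to the 2024-05 and Q2-2024 Concept nodes.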

Usage:

python backfill_entity_provenance.py [--redis-url URL] [--dry-run] [--limit N]
backfill_entity_provenance.main()

Command-line entry point for the provenance backfill script.

Parses the --redis-url, --dry-run, and --limit arguments, then runs the two-phase backfill. Before the phases start, it resolves a Redis URL and SSL connection kwargs from the first available source, in this order: the CLI flag, a loaded config.Config, the REDIS_URL environment variable, or a localhost default.

Phase one calls _build_provenance_index() to load every Spiral Goddess pgvector chunk (spiral_goddess_v2 / loopmother_memory) into memory as (lowered_text, provenance) tuples; if no chunks load, main() returns early. Phase two passes those chunks to _backfill(), run under asyncio.run(), which writes provenance metadata and temporal Concept links onto the FalkorDB entities.
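The argument parsing and URL precedence can be illustrated with a minimal sketch. The helper names, the config_url parameter (standing in for whatever config.Config exposes), and the default URL value are assumptions, and SSL kwargs handling is left out.

```python
import argparse
import os
from typing import Mapping, Optional

# Assumed value for the "localhost default"; the real default may differ.
DEFAULT_REDIS_URL = "redis://localhost:6379"


def parse_args(argv: Optional[list[str]] = None) -> argparse.Namespace:
    """Parse the three flags named in the description."""
    parser = argparse.ArgumentParser(description="Backfill KG entity provenance")
    parser.add_argument("--redis-url", default=None)
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--limit", type=int, default=None)
    return parser.parse_args(argv)


def resolve_redis_url(
    cli_url: Optional[str],
    config_url: Optional[str],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the first available URL: CLI flag, config, REDIS_URL, default."""
    env = os.environ if env is None else env
    return cli_url or config_url or env.get("REDIS_URL") or DEFAULT_REDIS_URL


args = parse_args(["--dry-run", "--limit", "50"])
url = resolve_redis_url(args.redis_url, None, env={"REDIS_URL": "redis://cache:6379"})
```

With no CLI flag and no config value, url falls through to the REDIS_URL entry of the supplied environment mapping.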

This is invoked only from the if __name__ == "__main__" guard at the bottom of this module; no other internal callers were found.

Return type:

None

Returns:

None. Progress and summary are emitted through the module logger; an empty chunk load or an empty entity set short-circuits with a log line.
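The control flow described here, including the early return, might look roughly like the sketch below. load_chunks and backfill are hypothetical stand-ins for _build_provenance_index() and _backfill(), whose real signatures are not shown in this documentation.

```python
import asyncio
import logging

logger = logging.getLogger("backfill_sketch")


def load_chunks() -> list:
    """Stand-in for phase one; would return (lowered_text, provenance) tuples."""
    return []


async def backfill(chunks: list, dry_run: bool = False) -> int:
    """Stand-in for phase two; here it only reports how many chunks it saw."""
    return len(chunks)


def run(dry_run: bool = False, loader=load_chunks) -> None:
    """Two-phase flow: load chunks, then run the async backfill."""
    chunks = loader()
    if not chunks:
        # Short-circuit with a log line instead of raising.
        logger.warning("No chunks loaded; nothing to backfill.")
        return None
    processed = asyncio.run(backfill(chunks, dry_run=dry_run))
    logger.info("Backfill saw %d chunk(s) (dry_run=%s)", processed, dry_run)
    return None
```

Returning None in both branches matches the documented return type; the only observable difference between the two paths is what gets logged.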