build_kg

Standalone script to build knowledge graph entries from channel messages.

Fetches the last N messages (default 1000) from a channel via Redis cache first, falling back to the platform API. Sends ALL messages plus the entire existing knowledge graph to gemini-3-flash-preview in a single call, then presents the proposed entities/relationships for human approval before committing to FalkorDB.

Usage:

python build_kg.py –platform discord –channel 123456789 python build_kg.py –platform discord –channel 123456789 –guild 987

async build_kg.fetch_messages_redis(cache, platform, channel_id, count)[source]

Pull up to count messages from the Redis sorted-set message cache.

Wraps MessageCache.get_recent (which returns newest-first) and reverses the result into chronological order so downstream extraction sees the conversation as it happened. Any failure reading the cache is logged and swallowed, returning an empty list so the caller can fall back to the platform API. Reads Redis via the cache; does no other I/O.

Called by gather_messages in this module as the Redis-first leg of message collection.

Parameters:
  • cache (MessageCache) – The MessageCache wrapping the Redis client.

  • platform (str) – Platform name (e.g. discord) used to namespace the cache.

  • channel_id (str) – Channel whose messages to fetch.

  • count (int) – Maximum number of messages to retrieve.

Returns:

Up to count cached messages in chronological order, or an empty list on error.

Return type:

list[CachedMessage]

async build_kg.fetch_messages_discord(token, channel_id, limit)[source]

Fetch messages directly from the Discord API using discord.py.

Returns dicts with keys: user_id, user_name, text, timestamp (float).

Return type:

list[dict[str, Any]]

Parameters:
async build_kg.gather_messages(cache, platform, channel_id, count, cfg)[source]

Collect up to count messages, Redis-first with API fallback.

Returns a chronologically-ordered list of message dicts with keys: user_id, user_name, text, timestamp.

Return type:

list[dict[str, Any]]

Parameters:
async build_kg.dump_full_graph(kg)[source]

Serialize the entire knowledge graph into a human-readable text block for the LLM.

Loads up to 10,000 entities and 10,000 relationships from FalkorDB via the KnowledgeGraphManager and renders them as a labeled, indented listing (entities with type/category/scope/description, relationships as source -[REL]-> target). This block is prepended to the extraction prompt so the model can reference existing nodes instead of duplicating them. Returns a short placeholder when the graph is empty. Reads from the knowledge graph (FalkorDB) but writes nothing.

Called by run in this module to assemble the graph context before extraction.

Parameters:

kg (KnowledgeGraphManager) – The knowledge graph manager to read entities and relationships from.

Returns:

A formatted, multi-line text block describing the current graph.

Return type:

str

build_kg.build_extraction_prompt(conversation_text, graph_context)[source]

Assemble the chat-format messages for the knowledge-graph extraction LLM call.

Pairs a system instruction (output only valid JSON, do not duplicate existing graph entities) with a user message that stitches together the existing-graph context, the shared EXTRACTION_PROMPT from kg_extraction, and the formatted conversation text. The layout primes the model to reference existing nodes by name rather than re-creating them. Pure string assembly with no I/O.

Called by run_extraction in this module immediately before the OpenRouter chat request.

Parameters:
  • conversation_text (str) – The formatted conversation transcript to extract from.

  • graph_context (str) – The serialized existing graph from dump_full_graph.

Returns:

A two-element [system, user] messages list.

Return type:

list[dict[str, str]]

async build_kg.run_extraction(openrouter, conversation_text, graph_context)[source]

Call the LLM to extract entities and relationships.

Returns {“entities”: […], “relationships”: […]}.

Return type:

dict[str, list[dict]]

Parameters:
build_kg.format_entity(idx, ent)[source]

Render one proposed entity as a numbered, human-readable approval line.

Produces a single e<N>. line (1-based label from the 0-based idx) showing the entity’s type, name, category, optional user-id scope, and description, so the operator can review it in the terminal before approving. Pure string formatting with no I/O.

Called by prompt_approval in this module when listing proposed entities.

Parameters:
  • idx (int) – Zero-based position of the entity; displayed as idx + 1.

  • ent (dict) – The proposed entity dict from the LLM extraction.

Returns:

A formatted single-line description of the entity.

Return type:

str

build_kg.format_relationship(idx, rel)[source]

Render one proposed relationship as a numbered, human-readable approval line.

Produces a single r<N>. line (1-based label from the 0-based idx) showing the source -[relation]-> target shape, the model’s confidence, and the description, so the operator can review it before approving. Pure string formatting with no I/O.

Called by prompt_approval in this module when listing proposed relationships.

Parameters:
  • idx (int) – Zero-based position of the relationship; displayed as idx + 1.

  • rel (dict) – The proposed relationship dict from the LLM extraction.

Returns:

A formatted single-line description of the relationship.

Return type:

str

build_kg.prompt_approval(entities, relationships, num_messages)[source]

Display proposed entries and return approved indices.

Returns (entity_indices, relationship_indices) or None to quit. Entity/relationship indices are 0-based.

Return type:

tuple[list[int], list[int]] | None

Parameters:
async build_kg.commit_entities(kg, entities, approved_indices, channel_id, entity_uuid_lookup)[source]

Resolve-or-create approved entities. Returns count committed.

Populates entity_uuid_lookup with name->uuid mappings.

Return type:

int

Parameters:
async build_kg.commit_relationships(kg, relationships, approved_indices, entity_uuid_lookup)[source]

Persist the operator-approved relationships into the knowledge graph.

Iterates the approved indices and, for each relationship, resolves its source and target entity UUIDs: first from entity_uuid_lookup (populated by commit_entities), then falling back to _guess_uuid for endpoints that already existed in the graph. Relationships with a missing name or unresolvable endpoint are skipped with a printed notice. Resolved edges are written via KnowledgeGraphManager.add_relationship using the model’s confidence as the edge weight, which creates or reinforces the edge. Reads and writes the knowledge graph (FalkorDB) and prints progress/errors to stdout; individual failures are caught so one bad edge does not abort the rest.

Called by run in this module after commit_entities, and mirrored by the equivalent step in memories_port/import_memories.py.

Parameters:
  • kg (KnowledgeGraphManager) – The knowledge graph manager used to resolve and add edges.

  • relationships (list[dict]) – All proposed relationship dicts from the extraction.

  • approved_indices (list[int]) – Zero-based indices of the relationships the operator approved.

  • entity_uuid_lookup (dict[str, str]) – Name (lowercased) to UUID map from commit_entities.

Returns:

The number of relationships successfully committed.

Return type:

int

build_kg.format_conversation(messages)[source]

Flatten the gathered message dicts into a single transcript string for the LLM.

Renders each message as one [ISO-timestamp] user_name (user_id): text line, with the epoch timestamp converted to a UTC ISO string, and joins them with newlines. The result becomes the conversation_text fed into build_extraction_prompt. Pure string formatting with no I/O.

Called by run in this module once messages have been gathered.

Parameters:

messages (list[dict[str, Any]]) – Chronologically ordered message dicts with timestamp, user_name, user_id, and text keys.

Returns:

The newline-joined conversation transcript.

Return type:

str

async build_kg.run(args)[source]

Drive the full build-KG pipeline end to end for one channel.

Loads Config, validates that redis_url and api_key are present (exiting if not), and wires up an OpenRouterClient (gemini-3-flash-preview), a MessageCache, and a KnowledgeGraphManager (ensuring its FalkorDB indexes). It then gathers up to args.count messages (Redis-first, Discord API fallback) via gather_messages, dumps the existing graph with dump_full_graph, formats the transcript, makes a single LLM extraction call through run_extraction, presents the proposals for interactive approval via prompt_approval, and commits the approved entities and relationships with commit_entities and commit_relationships. Progress and a final summary are printed to stdout; the message cache is closed on every exit path.

This is the script’s async entry point: it touches Redis, the platform API, the LLM over HTTP, FalkorDB, and stdin/stdout, and may call sys.exit on misconfiguration. Invoked once by main via asyncio.run(run(args)).

Parameters:

args (Namespace) – Parsed CLI arguments (platform, channel, count, etc.) from main.

Return type:

None

Returns:

None.

build_kg.main()[source]

Parse command-line arguments, configure logging, and launch the async pipeline.

Builds the argparse parser for the script’s flags (--platform and --channel required, plus --guild, --count, and --verbose), sets up root logging at DEBUG or WARNING depending on the verbosity flag, and hands control to the async run coroutine via asyncio.run. This is the synchronous CLI entry point.

Called from the __main__ guard at the bottom of this module when the script is run directly (python build_kg.py ...).

Return type:

None

Returns:

None.