core.control_ops module

Fleet-wide control-ops: restart / git-pull across every Stargazer service.

The monolith is gone, so an admin !restart_* / !bot_pull command can no longer just act on one process. Instead the gateway authorizes the command and broadcasts a small JSON op over the Redis pub/sub channel CONTROL_OPS_CHANNEL. Every service runs a ControlOpsDaemon (wired into core.service_base.StargazerService) that decides whether the op applies to it and self-restarts / self-pulls accordingly.
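As a rough illustration of the "small JSON op" and the per-service applicability check, here is a minimal sketch. The field names are borrowed from the keyword arguments of dispatch_control_op(); the actual wire format is not documented here, and the "all" wildcard target, encode_op, and op_applies are assumptions made for this example.

```python
import json


def encode_op(op, target, requested_by, trace_id=""):
    """Serialize a control op; field names are assumed, not the real schema."""
    return json.dumps(
        {
            "op": op,
            "target": target,
            "requested_by": requested_by,
            "trace_id": trace_id,
        }
    )


def op_applies(raw, service_name):
    """Decide whether a received op is addressed to this service.

    Assumes a hypothetical "all" wildcard for fleet-wide ops.
    """
    msg = json.loads(raw)
    return msg.get("target") in ("all", service_name)


fleet_wide = encode_op("restart", "all", "admin")
gateway_only = encode_op("restart", "gateway", "admin")
```

Under these assumptions, every service accepts fleet_wide, while only the gateway accepts gateway_only.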

Why pub/sub (not a stream): a restart op must fan out to all live instances at once and must NOT be replayed by a service that was down (that would cause a restart storm on recovery). Pub/sub’s “live subscribers only, no persistence” is exactly the right delivery guarantee — and it matches the existing sg:channel:cancel control channel.
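The delivery difference can be shown with a toy in-memory model. This is not Redis and uses no Redis API; ToyPubSub and ToyStream are hypothetical classes that only mimic "live subscribers only" versus "persisted log" semantics.

```python
class ToyPubSub:
    """Delivers to whoever is subscribed right now; stores nothing."""

    def __init__(self):
        self.inboxes = {}

    def subscribe(self, name):
        self.inboxes[name] = []

    def unsubscribe(self, name):
        self.inboxes.pop(name, None)

    def publish(self, message):
        for inbox in self.inboxes.values():
            inbox.append(message)
        return len(self.inboxes)  # number of live receivers


class ToyStream:
    """Appends every message to a log that late readers can replay."""

    def __init__(self):
        self.entries = []

    def add(self, message):
        self.entries.append(message)

    def read_since(self, offset):
        return self.entries[offset:]


bus = ToyPubSub()
bus.subscribe("gateway")
bus.subscribe("web")
bus.unsubscribe("web")             # "web" goes down
delivered = bus.publish("restart")
bus.subscribe("web")               # "web" recovers with an empty inbox

stream = ToyStream()
stream.add("restart")
replayed = stream.read_since(0)    # a recovering reader would see the old op
```

In the pub/sub model the recovered service never sees the restart it missed. In the stream model it would replay it on startup, which is the restart storm the module avoids.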

Ordering guarantee: the gateway holds the live Discord connection, so it always restarts last, with a longer grace period. Its own daemon self-restarts only after the publisher has flushed the ACK and aggregated the replies.

core.control_ops.control_op_for(text)[source]

Return (op, target) for a cluster control command, or None.

Return type:

Optional[tuple[str, str]]

Parameters:

text (str)

core.control_ops.is_control_ops_command(text)[source]

True if text is a fleet-wide control-ops command.

Return type:

bool

Parameters:

text (str)
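A minimal sketch of how the two helpers above could relate. The command set, the op names ("restart", "pull"), and the "all" target are assumptions based only on the !restart_* / !bot_pull commands mentioned in the module description; toy_control_op_for and toy_is_control_ops_command are hypothetical stand-ins, not the real implementations.

```python
from typing import Optional

RESTART_PREFIX = "!restart_"   # from the "!restart_*" commands named above
PULL_COMMAND = "!bot_pull"


def toy_control_op_for(text: str) -> Optional[tuple[str, str]]:
    """Map a chat command to (op, target), or None if it is not a control op."""
    parts = text.strip().split()
    if not parts:
        return None
    command = parts[0].lower()
    if command == PULL_COMMAND:
        return ("pull", "all")          # assumed: pull is always fleet-wide
    if command.startswith(RESTART_PREFIX):
        target = command[len(RESTART_PREFIX):]
        return ("restart", target) if target else None
    return None


def toy_is_control_ops_command(text: str) -> bool:
    """A command is a control-ops command exactly when it parses to an op."""
    return toy_control_op_for(text) is not None
```

Deriving the boolean check from the parser keeps the two from drifting apart; whether the real module does this is not stated.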

core.control_ops.unit_name_for(config, service_name)[source]

Resolve the systemd unit name for service_name: an explicit entry in the unit-name map wins; otherwise the unit prefix is prepended to service_name.

Return type:

str

Parameters:

service_name (str)
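The resolution order might look like the following sketch. It assumes the config exposes control_unit_names (a mapping) and control_unit_prefix (a string), the two attributes named in the fleet_units() description, and that the default prefix is "stargazer-", inferred from the default unit names; toy_unit_name_for is a hypothetical stand-in.

```python
from types import SimpleNamespace

DEFAULT_PREFIX = "stargazer-"  # assumed from names like "stargazer-gateway"


def toy_unit_name_for(config, service_name):
    """Explicit map entry first, then prefix + service name."""
    names = getattr(config, "control_unit_names", None) or {}
    if service_name in names:
        return names[service_name]
    prefix = getattr(config, "control_unit_prefix", None) or DEFAULT_PREFIX
    return f"{prefix}{service_name}"


# A hypothetical deployment that renames one unit and changes the prefix.
custom = SimpleNamespace(
    control_unit_prefix="sg-",
    control_unit_names={"web": "custom-web.service"},
)
```

With custom, "web" resolves through the explicit map and every other service falls back to the "sg-" prefix.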

core.control_ops.fleet_units(config=None)[source]

Resolve every SERVICE_TIERS tier to its systemd unit, in order.

The single source of truth for “all the Stargazer service units” — the live microservices that replaced the retired stargazer / stargazer-swarm monolith. Each tier is resolved through unit_name_for(), so the result honours a deployment’s control_unit_prefix / control_unit_names overrides; with the defaults it yields stargazer-gateway, stargazer-inference, stargazer-agents, stargazer-consolidation, stargazer-web. config may be None (callers without a Config in hand), in which case the default prefix applies. Pure; no I/O.

Used by the admin journal tail (Config.resolved_journal_units), the read_service_logs tool, the log→RAG ingest task, and the bot_control restart tools so they all target the same fleet.

Parameters:

config – Optional Config-like object carrying control_unit_prefix / control_unit_names; None falls back to the default prefix.

Returns:

Resolved systemd unit names, one per tier, in SERVICE_TIERS order.

Return type:

list[str]
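A minimal sketch of the tier-to-unit resolution and of one possible consumer. The tier order is inferred from the default unit names listed above, the resolver is a simplified stand-in for unit_name_for(), and journalctl_args is a hypothetical helper showing how a journal tail could target the whole fleet.

```python
# Tier order inferred from the documented default output; an assumption.
SERVICE_TIERS = ("gateway", "inference", "agents", "consolidation", "web")
DEFAULT_PREFIX = "stargazer-"


def _resolve(config, tier):
    """Simplified stand-in for unit_name_for(): explicit map, then prefix."""
    names = getattr(config, "control_unit_names", None) or {}
    prefix = getattr(config, "control_unit_prefix", None) or DEFAULT_PREFIX
    return names.get(tier, f"{prefix}{tier}")


def toy_fleet_units(config=None):
    """Resolve every tier to its unit name, preserving tier order."""
    return [_resolve(config, tier) for tier in SERVICE_TIERS]


def journalctl_args(units):
    """Build a follow-mode journalctl command line covering all units."""
    args = ["journalctl"]
    for unit in units:
        args += ["-u", unit]
    return args + ["-f"]
```

Because every consumer would call the same function, a prefix or name override changes the journal tail, the log tools, and the restart tools together.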

class core.control_ops.ControlOpsDaemon(svc)[source]

Bases: object

Subscribes to CONTROL_OPS_CHANNEL and acts on matching ops.

One runs in every service (started by StargazerService.boot). It mirrors the worker cancellation daemon’s pub/sub loop, but lives at the service-base level so all five tiers participate in fleet-wide restart / pull.

async run()[source]

Return type:

None
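The daemon's loop can be sketched without Redis by letting an asyncio.Queue stand in for the pub/sub subscription. Everything here is illustrative: ToyControlOpsDaemon, the message fields, the "all" target, and the grace values are assumptions, and the restart itself is replaced by a callback.

```python
import asyncio
import json

# Hypothetical grace periods: the gateway waits longer so it restarts last.
GRACE_SECONDS = {"gateway": 0.05}
DEFAULT_GRACE = 0.0


class ToyControlOpsDaemon:
    def __init__(self, service_name, inbox, on_restart):
        self.service_name = service_name
        self.inbox = inbox            # asyncio.Queue standing in for pub/sub
        self.on_restart = on_restart  # callback standing in for a real restart

    def applies(self, msg):
        return msg.get("target") in ("all", self.service_name)

    async def run(self):
        while True:
            raw = await self.inbox.get()
            if raw is None:           # sentinel that ends the sketch
                return
            try:
                msg = json.loads(raw)
            except ValueError:
                continue              # ignore malformed payloads
            if not isinstance(msg, dict) or not self.applies(msg):
                continue
            if msg.get("op") == "restart":
                grace = GRACE_SECONDS.get(self.service_name, DEFAULT_GRACE)
                await asyncio.sleep(grace)
                self.on_restart(self.service_name)


async def demo():
    restarted = []
    inboxes = {name: asyncio.Queue() for name in ("gateway", "web")}
    daemons = [
        ToyControlOpsDaemon(name, queue, restarted.append)
        for name, queue in inboxes.items()
    ]
    tasks = [asyncio.create_task(d.run()) for d in daemons]
    op = json.dumps({"op": "restart", "target": "all"})
    for queue in inboxes.values():
        queue.put_nowait(op)
        queue.put_nowait(None)
    await asyncio.gather(*tasks)
    return restarted
```

Running asyncio.run(demo()) records the web service before the gateway, mirroring the "gateway restarts last" ordering through a longer grace period.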

async core.control_ops.dispatch_control_op(redis, config, *, op, target, requested_by, send, trace_id='')[source]

Publish a control op, aggregate per-service acks, and report via send.

Sends an immediate ACK, broadcasts the op on CONTROL_OPS_CHANNEL, collects ack lines on a per-request reply list for control_reply_timeout seconds, then sends and returns a roster string.
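The ACK, broadcast, collect, report sequence described above could be sketched as follows. FakeRedis is an in-memory stand-in that answers on behalf of the services; the reply key format, channel name, message fields, and ack line format are all hypothetical, and reply_timeout plays the role of control_reply_timeout.

```python
import asyncio
import json


class FakeRedis:
    """In-memory stand-in; services 'ack' as soon as an op is published."""

    def __init__(self, services):
        self.services = services
        self.lists = {}

    async def publish(self, channel, raw):
        msg = json.loads(raw)
        for name in self.services:
            if msg["target"] in ("all", name):
                acks = self.lists.setdefault(msg["reply_key"], [])
                acks.append(f"{name}: {msg['op']} ok")
        return len(self.services)

    async def lrange(self, key, start, stop):
        return list(self.lists.get(key, []))  # the sketch ignores start/stop


async def toy_dispatch(redis, *, op, target, requested_by, send,
                       trace_id="", reply_timeout=0.01):
    reply_key = f"control:reply:{trace_id or 'untraced'}"  # hypothetical key
    await send(f"ACK: {op} {target} (requested by {requested_by})")
    payload = {"op": op, "target": target, "requested_by": requested_by,
               "trace_id": trace_id, "reply_key": reply_key}
    await redis.publish("control-ops-demo", json.dumps(payload))
    await asyncio.sleep(reply_timeout)        # window for services to reply
    acks = await redis.lrange(reply_key, 0, -1)
    roster = "\n".join(acks) if acks else "no services replied"
    await send(roster)
    return roster


async def demo():
    sent = []

    async def send(text):
        sent.append(text)

    roster = await toy_dispatch(
        FakeRedis(["gateway", "web"]), op="restart", target="all",
        requested_by="admin", send=send, trace_id="t1",
    )
    return sent, roster
```

Sending the ACK before publishing means the requester gets feedback even if no service answers within the reply window.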

Return type:

str

Parameters:

async core.control_ops.format_service_roster(redis)[source]

Render the live service registry for the !services command.

Return type:

str