core.control_ops module
Fleet-wide control-ops: restart / git-pull across every Stargazer service.
The monolith is gone, so an admin !restart_* / !bot_pull command can no
longer just act on one process. Instead the gateway authorizes the command
and broadcasts a small JSON op over the Redis pub/sub channel
CONTROL_OPS_CHANNEL. Every service runs a ControlOpsDaemon
(wired into core.service_base.StargazerService) that decides whether the
op applies to it and self-restarts / self-pulls accordingly.
Why pub/sub (not a stream): a restart op must fan out to all live instances
at once and must NOT be replayed by a service that was down (that would cause a
restart storm on recovery). Pub/sub’s “live subscribers only, no persistence” is
exactly the right delivery guarantee — and it matches the existing
sg:channel:cancel control channel.
Ordering guarantee: the gateway holds the live Discord connection, so it always restarts last (with a longer grace period); its own daemon self-restarts only after the publisher has flushed the ACK and aggregated the replies.
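The "live subscribers only, no replay" property described above can be illustrated with a small in-memory stand-in. This is a sketch under stated assumptions, not the module's code: the JSON field names (`op`, `target`, `requested_by`), the `"all"` wildcard target, and the `FakePubSub` class are all hypothetical, and no Redis connection is involved.

```python
import json


def make_op(op, target, requested_by):
    # Hypothetical op shape; the real field names are not documented here.
    return json.dumps({"op": op, "target": target, "requested_by": requested_by})


def op_applies(raw, service_name):
    # Assumed rule: an op applies if it targets this service or every service.
    msg = json.loads(raw)
    return msg.get("target") in ("all", service_name)


class FakePubSub:
    """In-memory stand-in for a pub/sub channel: no persistence, no replay."""

    def __init__(self):
        self.subscribers = {}

    def subscribe(self, name):
        # A (re)subscribing service starts with an empty inbox.
        self.subscribers[name] = []

    def unsubscribe(self, name):
        self.subscribers.pop(name, None)

    def publish(self, raw):
        # Deliver only to services subscribed at this moment.
        for inbox in self.subscribers.values():
            inbox.append(raw)
        return len(self.subscribers)
```

A service that was unsubscribed when an op was published sees nothing when it subscribes again, which is the behaviour that avoids a restart storm on recovery. A stream-based channel would instead hand the missed op to the recovering service.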
- core.control_ops.control_op_for(text)
Return (op, target) for a cluster control command, or None.
- core.control_ops.is_control_ops_command(text)
True if text is a fleet-wide control-ops command.
- core.control_ops.unit_name_for(config, service_name)
Resolve the systemd unit for service_name (explicit map → prefix+name).
- core.control_ops.fleet_units(config=None)
Resolve every SERVICE_TIERS tier to its systemd unit, in order.
The single source of truth for “all the Stargazer service units”: the live microservices that replaced the retired stargazer / stargazer-swarm monolith. Each tier is resolved through unit_name_for(), so the result honours a deployment’s control_unit_prefix / control_unit_names overrides; with the defaults it yields stargazer-gateway, stargazer-inference, stargazer-agents, stargazer-consolidation, stargazer-web. config may be None (callers without a Config in hand), in which case the default prefix applies. Pure; no I/O.
Used by the admin journal tail (Config.resolved_journal_units), the read_service_logs tool, the log→RAG ingest task, and the bot_control restart tools so they all target the same fleet.
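The resolution order described for unit_name_for() and fleet_units() (explicit map first, then prefix plus name) can be sketched as below. This is a minimal illustration, not the module's source: the `_sketch` function names are hypothetical, the tier order is read off the default unit list above, and the `"stargazer-"` default prefix is inferred from those unit names.

```python
from types import SimpleNamespace

# Assumed tier order, taken from the default unit list in the docs.
SERVICE_TIERS = ("gateway", "inference", "agents", "consolidation", "web")
DEFAULT_PREFIX = "stargazer-"  # assumption inferred from the default unit names


def unit_name_for_sketch(config, service_name):
    # 1. An explicit per-service mapping wins if present.
    names = getattr(config, "control_unit_names", None) or {}
    if service_name in names:
        return names[service_name]
    # 2. Otherwise fall back to prefix + service name.
    prefix = getattr(config, "control_unit_prefix", None) or DEFAULT_PREFIX
    return f"{prefix}{service_name}"


def fleet_units_sketch(config=None):
    # Pure: resolves each tier in order, no I/O. config may be None.
    return [unit_name_for_sketch(config, tier) for tier in SERVICE_TIERS]


# Example override: a custom prefix plus one explicitly named unit.
custom = SimpleNamespace(
    control_unit_prefix="sg-",
    control_unit_names={"web": "frontend"},
)
```

With no config, `fleet_units_sketch()` produces the five default unit names. With `custom`, every tier gets the `sg-` prefix except `web`, which resolves through the explicit map.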
- class core.control_ops.ControlOpsDaemon(svc)
Bases: object
Subscribes to CONTROL_OPS_CHANNEL and acts on matching ops.
One runs in every service (started by StargazerService.boot). It mirrors the worker cancellation daemon’s pub/sub loop, but lives at the service-base level so all five tiers participate in fleet-wide restart / pull.
- async core.control_ops.dispatch_control_op(redis, config, *, op, target, requested_by, send, trace_id='')
Publish a control op, aggregate per-service acks, and report via send.
Sends an immediate ACK, broadcasts the op on CONTROL_OPS_CHANNEL, collects ack lines on a per-request reply list for control_reply_timeout seconds, then sends and returns a roster string.
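The ACK, broadcast, collect, report sequence described for dispatch_control_op() can be sketched with asyncio alone. This is an illustration under assumptions, not the real implementation: `dispatch_sketch` and `collect_acks` are hypothetical names, an `asyncio.Queue` stands in for the per-request Redis reply list, `publish` is a caller-supplied coroutine, and the message wording is invented.

```python
import asyncio


async def collect_acks(reply_queue, timeout):
    """Gather ack lines until `timeout` seconds have elapsed in total."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    acks = []
    while True:
        remaining = deadline - loop.time()
        if remaining <= 0:
            break
        try:
            acks.append(await asyncio.wait_for(reply_queue.get(), remaining))
        except asyncio.TimeoutError:
            break  # collection window closed with no further replies
    return acks


async def dispatch_sketch(publish, reply_queue, *, op, target, send, timeout=0.2):
    # 1. Immediate ACK so the requester sees the command was accepted.
    await send(f"ACK: {op} -> {target}")
    # 2. Broadcast the op (stand-in for publishing on CONTROL_OPS_CHANNEL).
    await publish({"op": op, "target": target})
    # 3. Collect per-service ack lines for the whole timeout window.
    acks = await collect_acks(reply_queue, timeout)
    # 4. Report and return a roster string.
    roster = "\n".join(acks) if acks else "no services replied"
    await send(roster)
    return roster
```

The collector waits out the full window instead of stopping at the first reply, because the publisher cannot know in advance how many live services will answer.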