core.health_server module

Liveness / readiness HTTP probes for orchestrators.

HealthServer exposes /healthz (liveness) and /readyz (readiness) on a local aiohttp server. Both endpoints read a cached flag maintained by a background Redis-ping loop, so probe traffic never reaches Redis directly, and the flag flips to unhealthy only after several consecutive failed pings. Started automatically by StargazerService when health probing is enabled.

class core.health_server.HealthServer(redis, port=9090, ping_interval=5.0, failure_threshold=3)

Bases: object

Local HTTP liveness/readiness probe server backed by cached Redis health.

Runs a tiny aiohttp server bound to 127.0.0.1 that answers /healthz (liveness) and /readyz (readiness) for an orchestrator such as Kubernetes or a process supervisor. Rather than pinging Redis on every probe request, a background loop pings periodically and caches a single _is_healthy flag that only flips to unhealthy after several consecutive failures, so health checks stay cheap and resistant to transient blips while still reflecting a genuine Redis outage.
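The consecutive-failure rule described above can be sketched as follows. This is a minimal illustration, not the module's actual code: FlakyRedis, CachedHealth, and check_once are hypothetical names, and the recovery behaviour (one successful ping resets the failure streak and restores the healthy state) is an assumption that the description above does not confirm.

```python
import asyncio


class FlakyRedis:
    """Hypothetical stand-in for a redis.asyncio client with scripted ping results."""

    def __init__(self, outcomes):
        self._outcomes = iter(outcomes)

    async def ping(self):
        # A False entry in the script simulates an unreachable Redis.
        if not next(self._outcomes):
            raise ConnectionError("simulated Redis outage")
        return True


class CachedHealth:
    """Caches one health flag that flips only after N consecutive failed pings."""

    def __init__(self, redis, failure_threshold=3):
        self._redis = redis
        self._failure_threshold = failure_threshold
        self._consecutive_failures = 0
        self._is_healthy = True  # start out healthy

    async def check_once(self):
        try:
            await self._redis.ping()
        except Exception:
            self._consecutive_failures += 1
            if self._consecutive_failures >= self._failure_threshold:
                self._is_healthy = False
        else:
            # Assumption: one successful ping resets the streak and restores health.
            self._consecutive_failures = 0
            self._is_healthy = True
        return self._is_healthy


async def demo():
    # Two failures, one success, then three failures in a row.
    redis = FlakyRedis([False, False, True, False, False, False])
    health = CachedHealth(redis, failure_threshold=3)
    return [await health.check_once() for _ in range(6)]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the first two failures do not change the flag because the success in between resets the streak; only the third consecutive failure at the end marks the state unhealthy.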

One instance is created per service by core.service_base.StargazerService when health probing is enabled, which calls start() during service startup and stop() on shutdown; the probe handlers and ping logic are also exercised by tests/core/migration/test_health_server.py.

Parameters:
  • port (int)

  • ping_interval (float)

  • failure_threshold (int)

__init__(redis, port=9090, ping_interval=5.0, failure_threshold=3)

Initialize the health server’s state and aiohttp routing.

Stores the Redis client and probe-tuning parameters, seeds the cached health state to healthy, and builds the aiohttp Application with GET /healthz -> healthz() and GET /readyz -> readyz() routes plus an AppRunner. No socket is bound and no background task is spawned here; that happens in start().

The constructor performs no I/O: it only registers the routes against the bound methods and creates the runner. The Redis client is retained for the later ping loop (_check_health_once()) but is not contacted yet. Called by core.service_base.StargazerService.__init__, which instantiates one HealthServer per service when use_health_server is set and a Redis client is present (port read from SG_HEALTH_PORT); also constructed directly in the migration tests.

Parameters:
  • redis – An async Redis client (redis.asyncio) whose ping() is polled by the background loop to derive the cached health state.

  • port (int) – TCP port to bind the probe server on 127.0.0.1. Defaults to 9090.

  • ping_interval (float) – Seconds to sleep between Redis pings in the background loop. Defaults to 5.0.

  • failure_threshold (int) – Number of consecutive failed pings required before the cached state flips to unhealthy. Defaults to 3.

async healthz(request)

Answer the GET /healthz liveness probe from cached health state.

Returns 200 OK while the cached _is_healthy flag is set and 503 Service Unavailable once it has flipped, signalling the orchestrator that the container is wedged and should be restarted. The flag is maintained out-of-band by the background ping loop, so this handler does no I/O and never touches Redis itself, keeping probes fast and cheap.

Registered as the GET /healthz route in __init__() and invoked by aiohttp for each liveness request; also called directly by tests/core/migration/test_health_server.py.

Parameters:

request – The incoming aiohttp request (unused; present to satisfy the handler signature).

Returns:

A 200 aiohttp.web.Response when healthy, otherwise a 503.

async readyz(request)

Answer the GET /readyz readiness probe from cached health state.

Returns 200 Ready while the cached _is_healthy flag is set and 503 Service Unavailable otherwise, telling the orchestrator whether this instance should receive traffic. Readiness shares the same cached Redis-health signal as liveness here, so a service that cannot reach Redis is pulled out of rotation; like healthz() this handler reads only the cached flag and performs no I/O.

Registered as the GET /readyz route in __init__() and invoked by aiohttp for each readiness request; also called directly by tests/core/migration/test_health_server.py.

Parameters:

request – The incoming aiohttp request (unused; present to satisfy the handler signature).

Returns:

A 200 aiohttp.web.Response when ready, otherwise a 503.

async start()

Bind the probe server to localhost and launch the Redis ping loop.

Sets up the aiohttp AppRunner, binds a TCPSite to 127.0.0.1 on the configured port (deliberately non-routable from outside the host so the probes are not publicly exposed), and spawns _ping_redis_loop() as a background asyncio task stored on self._ping_task. After this returns the /healthz and /readyz endpoints are live and the cached health state begins refreshing on its own.

Opens a listening socket and logs health_server_started. Called by core.service_base.StargazerService during service startup, and by tests/core/migration/test_health_server.py.
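The aiohttp startup sequence described for start() might look roughly like the sketch below. ProbeServer is a hypothetical, stripped-down class and not the real HealthServer: it omits the Redis ping loop and the /readyz route, and the response bodies are assumptions. Port 0 is used in the demo only so the operating system picks a free port.

```python
import asyncio

from aiohttp import ClientSession, web


class ProbeServer:
    """Hypothetical minimal probe server; the real class also runs a Redis ping loop."""

    def __init__(self, port=9090):
        self._port = port
        self._is_healthy = True
        # Routes and runner are created up front; nothing is bound yet.
        self._app = web.Application()
        self._app.router.add_get("/healthz", self.healthz)
        self._runner = web.AppRunner(self._app)

    async def healthz(self, request):
        # Reads only the cached flag; performs no I/O.
        if self._is_healthy:
            return web.Response(status=200, text="OK")
        return web.Response(status=503, text="Service Unavailable")

    async def start(self):
        await self._runner.setup()
        # Bind to loopback only, so the probe is not reachable from other hosts.
        site = web.TCPSite(self._runner, "127.0.0.1", self._port)
        await site.start()

    async def stop(self):
        await self._runner.cleanup()


async def demo():
    server = ProbeServer(port=0)  # port 0: let the OS choose a free port
    await server.start()
    try:
        port = server._runner.addresses[0][1]
        url = f"http://127.0.0.1:{port}/healthz"
        async with ClientSession() as session:
            async with session.get(url) as resp:
                healthy_status = resp.status
            server._is_healthy = False  # simulate the ping loop flipping the flag
            async with session.get(url) as resp:
                unhealthy_status = resp.status
        return healthy_status, unhealthy_status
    finally:
        await server.stop()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Keeping the flag on the instance rather than in the aiohttp application mapping avoids mutating application state after the runner has been set up.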

async stop()

Cancel the ping loop and tear down the aiohttp probe server.

Performs an orderly shutdown: cancels the background _ping_task and awaits it (absorbing the expected asyncio.CancelledError), then cleans up the aiohttp runner so the listening socket is released. Safe to call even if start() left some piece uninitialised, since the task and runner are guarded before use.

Releases the network resources acquired by start() and logs health_server_stopped. Called by core.service_base.StargazerService during service shutdown, and by tests/core/migration/test_health_server.py.
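The cancel-and-await pattern that stop() is described as using can be illustrated with plain asyncio. The names ping_loop and stop_task are hypothetical, and the loop body is a placeholder for the real Redis ping; this is a sketch of the general pattern rather than the module's code.

```python
import asyncio


async def ping_loop(interval, ticks):
    # Placeholder for the background ping loop: records one tick per iteration.
    while True:
        ticks.append(1)
        await asyncio.sleep(interval)


async def stop_task(task):
    """Cancel a background task and wait for it, absorbing CancelledError."""
    if task is None:  # guard: the task may never have been created
        return
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # expected result of cancelling the loop


async def demo():
    ticks = []
    task = asyncio.create_task(ping_loop(0.01, ticks))
    await asyncio.sleep(0.05)  # let the loop run a few iterations
    await stop_task(task)
    await stop_task(None)  # safe no-op when nothing was started
    return task.cancelled(), len(ticks) > 0


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Awaiting the task after cancel() matters: it ensures the loop has actually finished unwinding before the runner is cleaned up and the socket is released.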