KuberCoin Docs

Operations

Health-check contract, cold-start behaviour, and deployment notes.

This page describes the operational contract that every KuberCoin PHP surface honours. It exists so that orchestrator authors, on-call operators and CI integrators can rely on a single shape across all services.

Health checks

Every surface exposes two HTTP endpoints, both returning JSON:

  • GET /healthz — health probe. Served before bootstrap so the response time is < 5 ms even on a cold process. Returns {"status":"ok","service":"<name>"} with HTTP 200.
  • GET /readyz — readiness probe. Runs registered dependency checks (node RPC, database, etc.). Returns {"status":"ok|degraded|fail","service":"<name>","checks":{...}} with HTTP 200 when all checks pass and HTTP 503 otherwise.

The full schema lives in ops/contracts/health.openapi.yaml and is enforced on every CI run by tests-e2e-cross/health-readyz-contract.spec.ts.
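
For consumers that want a quick client-side check, the two response shapes described above can be sketched as TypeScript type guards. This is a minimal sketch and not the contents of health.openapi.yaml: the type and function names are hypothetical, the per-check shape inside checks is left open because this page does not specify it, and the mapping of "degraded" to HTTP 503 is an assumption derived from "HTTP 200 when all checks pass and HTTP 503 otherwise".

```typescript
// Shapes taken from the prose on this page; the OpenAPI file stays the source of truth.
type ReadyStatus = "ok" | "degraded" | "fail";

interface HealthzBody {
  status: "ok";
  service: string;
}

interface ReadyzBody {
  status: ReadyStatus;
  service: string;
  checks: Record<string, unknown>; // per-check shape is not specified on this page
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// True when the value looks like a /healthz response body.
function isHealthzBody(value: unknown): value is HealthzBody {
  return (
    isRecord(value) &&
    value["status"] === "ok" &&
    typeof value["service"] === "string"
  );
}

// True when the value looks like a /readyz response body.
function isReadyzBody(value: unknown): value is ReadyzBody {
  return (
    isRecord(value) &&
    (value["status"] === "ok" ||
      value["status"] === "degraded" ||
      value["status"] === "fail") &&
    typeof value["service"] === "string" &&
    isRecord(value["checks"])
  );
}

// Assumption: only status "ok" means every check passed, so anything else maps to 503.
function expectedReadyzHttpStatus(body: ReadyzBody): 200 | 503 {
  return body.status === "ok" ? 200 : 503;
}
```

A caller would parse the response with JSON.parse, pass the result through the matching guard, and then compare expectedReadyzHttpStatus(body) with the HTTP status actually received.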

Cold-start behaviour

PHP-FPM serves each surface with a small pool of long-lived workers. The first request to a freshly forked worker pays a one-time cost of roughly 50–150 ms, covering Composer autoload, configuration parsing and the first PDO connection. Subsequent requests on the same worker reuse the autoloader and the connection.

Two design choices keep cold-start invisible to health checks:

  1. /healthz is dispatched by the front controller before Composer autoload runs, so health checks never trigger autoload.
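  2. When APCu is available, its shared-memory userland cache persists across worker generations, so a freshly forked worker reuses cached data instead of rebuilding it, roughly halving the cold-start cost. (APCu caches userland data only; opcode caching is provided by the separate OPcache extension.) The process_apcu_available gauge in /metrics exposes this state per surface.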

Metrics

Every surface exposes Prometheus text on GET /metrics. The baseline metric set is:

  • http_requests_total{service,route,method,status} — counter.
  • http_request_duration_seconds{service,route,method,status} — histogram.
  • process_apcu_available{service} — gauge, 1 when APCu is loaded, 0 otherwise.
  • process_start_seconds{service} — gauge, Unix timestamp of when the worker began serving requests.
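
As an illustration of how a script might read one of these gauges from the /metrics output, the following sketch parses Prometheus text exposition and returns the sample for a given metric and service label. The function names are hypothetical. The parser is deliberately simplified: it ignores timestamps and assumes label values contain no "}" character.

```typescript
// Parse a label block such as service="wallet",route="/" into a key/value map.
function parseLabels(raw: string): Map<string, string> {
  const labels = new Map<string, string>();
  const pattern = /([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"/g;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(raw)) !== null) {
    labels.set(m[1] ?? "", m[2] ?? "");
  }
  return labels;
}

// Return the value of the first sample of `metric` whose service label equals
// `service`, or undefined when no such sample exists in the scrape text.
function readGauge(text: string, metric: string, service: string): number | undefined {
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and HELP/TYPE lines
    const match = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)/.exec(line);
    if (match === null || match[1] !== metric) continue;
    const labels = parseLabels(match[2] ?? "");
    if (labels.get("service") !== service) continue;
    return Number(match[3]);
  }
  return undefined;
}
```

For example, readGauge(scrapeText, "process_apcu_available", "wallet") would yield 1 or 0 for a surface named "wallet" (a made-up service name), or undefined if the surface does not appear in the scrape.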

Deployment notes

  • Configure orchestrator health probes to hit /healthz with a 1s timeout and a 3s period.
  • Configure readiness probes to hit /readyz with a 5s timeout and a 10s period; treat 503 as not-ready.
  • Scrape /metrics every 15s. Use service as the partitioning label in Grafana.
  • Roll workers when process_apcu_available drops to 0 unexpectedly — this indicates the extension was unloaded or rebuilt.
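
The notes above can be captured in a small amount of glue code. The sketch below records the probe timings from this list and expresses the two decisions an operator script has to make: whether a /readyz response counts as ready, and whether a change in process_apcu_available should trigger a worker roll. All names are hypothetical. Treating every status other than 200 as not-ready, and treating only a 1-to-0 transition between consecutive scrapes as an "unexpected" drop, are assumptions rather than part of the contract.

```typescript
// Probe timings copied from the deployment notes; the object layout is hypothetical.
const probeSettings = {
  liveness: { path: "/healthz", timeoutMs: 1000, periodMs: 3000 },
  readiness: { path: "/readyz", timeoutMs: 5000, periodMs: 10000 },
  metrics: { path: "/metrics", periodMs: 15000 },
} as const;

// 200 means ready and 503 means not-ready; this sketch also treats any
// other status as not-ready (an assumption).
function isReady(httpStatus: number): boolean {
  return httpStatus === 200;
}

// Flag a roll only when the gauge was 1 on the previous scrape and is 0 now.
// A missing sample (undefined) on either side never triggers a roll.
function shouldRollWorkers(
  previous: number | undefined,
  current: number | undefined,
): boolean {
  return previous === 1 && current === 0;
}
```

Comparing consecutive scrapes, rather than alerting on any 0, avoids rolling surfaces that were deliberately deployed without APCu.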