
Service & Infrastructure Metrics (Prometheus)

Status: 📋 Planned (Architecture designed, implementation pending)

Monitoring philosophy

If a system cannot be measured, it cannot be operated or improved.

Prometheus is planned as the primary metrics backend for all runtime components.

Planned Monitored Components

Metrics will be collected from:

- FastAPI inference service,
- Celery workers,
- RabbitMQ broker,
- Redis cache (if enabled),
- Kubernetes runtime.

Core Service Metrics (Planned)

API metrics

- request rate (RPS),
- latency (p50 / p95 / p99),
- error rate (4xx / 5xx),
- request payload size.
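
To make the latency percentiles concrete: Prometheus histograms record observations into buckets, and p50 / p95 / p99 are estimated from the bucket counts at query time, with `histogram_quantile` interpolating linearly inside the matching bucket. The standard-library sketch below mimics that estimate. The `LatencyHistogram` class and the bucket bounds are illustrative assumptions, not part of the planned implementation, which would more likely rely on a Prometheus client library.

```python
import bisect
import math

# Hypothetical bucket upper bounds in seconds; real bounds would be tuned
# to the service's actual latency profile.
BUCKETS = (0.05, 0.1, 0.25, 0.5, 1.0, math.inf)


class LatencyHistogram:
    """Bucketed latency counts with a histogram_quantile-style estimate."""

    def __init__(self, bounds=BUCKETS):
        self.bounds = tuple(bounds)
        self.counts = [0] * len(self.bounds)  # per-bucket, non-cumulative

    def observe(self, seconds):
        # Pick the first bucket whose upper bound is >= the observation.
        idx = bisect.bisect_left(self.bounds, seconds)
        self.counts[idx] += 1

    def quantile(self, q):
        total = sum(self.counts)
        if total == 0:
            return math.nan
        rank = q * total
        cumulative = 0
        for i, (upper, count) in enumerate(zip(self.bounds, self.counts)):
            previous = cumulative
            cumulative += count
            if count > 0 and cumulative >= rank:
                if math.isinf(upper):
                    # Overflow bucket: fall back to the highest finite bound.
                    return self.bounds[i - 1]
                lower = self.bounds[i - 1] if i > 0 else 0.0
                # Linear interpolation inside the matching bucket.
                return lower + (upper - lower) * (rank - previous) / count
        return math.nan
```

With 50 requests at 30 ms, 40 at 80 ms, and 10 at 400 ms, `quantile(0.95)` lands in the 0.25 to 0.5 s bucket and returns roughly 0.375 s. The estimate is only as precise as the bucket layout, which is why bucket bounds deserve attention when the API metrics are implemented.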

Worker metrics

- task execution time,
- task success/failure rate,
- retry count,
- concurrency.
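
As a rough illustration of how these worker signals could be tallied and exposed, the sketch below keeps in-memory counters and renders them in the Prometheus text exposition format. Everything named here (`TaskStats`, the `worker_*` metric names, the `task` and `state` labels) is a hypothetical placeholder. In a real deployment the `started`, `finished`, and `retried` calls would presumably be driven by Celery signals such as `task_prerun`, `task_postrun`, and `task_retry`, and a Prometheus client library would handle the rendering.

```python
from collections import defaultdict


class TaskStats:
    """In-memory tallies for the planned worker metrics (illustrative only)."""

    def __init__(self):
        self.outcomes = defaultdict(int)       # (task, state) -> count
        self.seconds_sum = defaultdict(float)  # task -> total execution time
        self.retries = defaultdict(int)        # task -> retry count
        self.in_flight = 0                     # tasks currently executing

    def started(self):
        self.in_flight += 1

    def finished(self, task, succeeded, seconds):
        self.in_flight -= 1
        state = "success" if succeeded else "failure"
        self.outcomes[(task, state)] += 1
        self.seconds_sum[task] += seconds

    def retried(self, task):
        self.retries[task] += 1

    def failure_ratio(self, task):
        ok = self.outcomes.get((task, "success"), 0)
        bad = self.outcomes.get((task, "failure"), 0)
        total = ok + bad
        return bad / total if total else 0.0

    def render(self):
        """Render the tallies in the Prometheus text exposition format."""
        lines = ["# TYPE worker_tasks_total counter"]
        for (task, state), n in sorted(self.outcomes.items()):
            lines.append(f'worker_tasks_total{{task="{task}",state="{state}"}} {n}')
        lines.append("# TYPE worker_task_seconds_total counter")
        for task, secs in sorted(self.seconds_sum.items()):
            lines.append(f'worker_task_seconds_total{{task="{task}"}} {secs}')
        lines.append("# TYPE worker_task_retries_total counter")
        for task, n in sorted(self.retries.items()):
            lines.append(f'worker_task_retries_total{{task="{task}"}} {n}')
        lines.append("# TYPE worker_tasks_in_flight gauge")
        lines.append(f"worker_tasks_in_flight {self.in_flight}")
        return "\n".join(lines) + "\n"
```

Counters only ever increase, so rates such as failures per second are derived at query time. Concurrency is a gauge because it moves in both directions.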

Queue metrics

- queue depth,
- message age,
- consumer lag.
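
The three queue signals answer different questions, which a small sketch can show. It assumes a snapshot containing the enqueue timestamps of messages still waiting plus running published and acknowledged totals. The `QueueSnapshot` type, its field names, and the definition of consumer lag as "published minus acknowledged" are assumptions made for illustration; how RabbitMQ will actually be queried is not specified here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class QueueSnapshot:
    """Hypothetical point-in-time view of one queue."""
    pending_enqueued_at: Tuple[float, ...]  # enqueue times of waiting messages
    published_total: int                    # messages ever published
    acked_total: int                        # messages ever acknowledged


def queue_metrics(snapshot: QueueSnapshot, now: float) -> dict:
    depth = len(snapshot.pending_enqueued_at)
    # Age of the oldest waiting message; 0 when the queue is empty.
    oldest_age = now - min(snapshot.pending_enqueued_at) if depth else 0.0
    # Assumed definition: work published but not yet acknowledged,
    # which includes messages currently being processed.
    lag = snapshot.published_total - snapshot.acked_total
    return {
        "queue_depth": depth,
        "oldest_message_age_seconds": oldest_age,
        "consumer_lag": lag,
    }
```

A queue can be shallow yet unhealthy: a depth of 3 with an oldest message age of 60 seconds suggests stalled consumers, whereas a depth of 300 that drains within a second does not.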

Infrastructure metrics

- CPU and memory utilization,
- pod restarts,
- container OOM events,
- node-level saturation.

Why these metrics matter

These signals will allow operators to:

- detect overload conditions,
- correlate failures with deployments,
- reason about scaling and capacity.
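
As one example of reasoning about capacity from these signals, task arrival rate and mean task execution time can be combined through Little's law to estimate how many workers are busy on average. The sketch below is a back-of-the-envelope aid, not a planned autoscaling policy. The function name `required_workers` and the 0.7 target utilization are assumptions chosen for illustration.

```python
import math


def required_workers(arrival_rate, mean_task_seconds, target_utilization=0.7):
    """Estimate the worker count needed to keep utilization at the target.

    arrival_rate is in tasks per second; mean_task_seconds is the average
    execution time of one task.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Little's law: average number of tasks in service = rate * service time.
    busy_workers = arrival_rate * mean_task_seconds
    # Leave headroom so bursts do not immediately build up a queue.
    return math.ceil(busy_workers / target_utilization)
```

For instance, 40 tasks per second at a mean of 0.5 seconds keeps 20 workers busy on average, so `required_workers(40, 0.5)` returns 29 at the assumed 70% utilization target. The estimate ignores variance in task duration, so it serves as a lower bound for planning, not as a guarantee.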