
Service & Infrastructure Metrics (Prometheus)

Status: 📋 Planned (Architecture designed, implementation pending)

Monitoring philosophy

If a system cannot be measured, it cannot be operated or improved.

Prometheus is planned as the primary metrics backend for all runtime components.


Planned Monitored Components

Metrics will be collected from:

  • FastAPI inference service,
  • Celery workers,
  • RabbitMQ broker,
  • Redis cache (if enabled),
  • Kubernetes runtime.


Core Service Metrics (Planned)

API metrics

  • request rate (RPS),
  • latency (p50 / p95 / p99),
  • error rate (4xx / 5xx),
  • request payload size.
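
Since implementation is still pending, the following is only a minimal standard-library sketch of how the request counter and latency histogram above could be modelled. It mirrors the way Prometheus histograms use cumulative `le` buckets and prints lines in the Prometheus text exposition format. The metric names (`api_requests_total`, `api_request_duration_seconds`), the bucket bounds, and the `ApiMetrics` class are assumptions made for illustration. A real service would most likely rely on a Prometheus client library rather than hand-rolled bookkeeping.

```python
from bisect import bisect_left
from dataclasses import dataclass, field

# Hypothetical bucket upper bounds in seconds; real bounds depend on the service's latency profile.
BUCKETS = (0.05, 0.1, 0.25, 0.5, 1.0, 2.5)


@dataclass
class ApiMetrics:
    """In-memory stand-in for a request counter plus a latency histogram."""
    requests: dict = field(default_factory=dict)  # (method, status class) -> count
    bucket_counts: list = field(default_factory=lambda: [0] * (len(BUCKETS) + 1))
    latency_sum: float = 0.0

    def observe(self, method: str, status: int, seconds: float) -> None:
        key = (method, f"{status // 100}xx")
        self.requests[key] = self.requests.get(key, 0) + 1
        # First bucket whose upper bound is >= the observation; the last slot is +Inf.
        self.bucket_counts[bisect_left(BUCKETS, seconds)] += 1
        self.latency_sum += seconds

    def error_ratio(self) -> float:
        total = sum(self.requests.values())
        errors = sum(n for (_, cls), n in self.requests.items() if cls in ("4xx", "5xx"))
        return errors / total if total else 0.0

    def render(self) -> str:
        """Render the current state as Prometheus text exposition lines."""
        lines = [
            f'api_requests_total{{method="{method}",status="{cls}"}} {n}'
            for (method, cls), n in sorted(self.requests.items())
        ]
        running = 0
        for bound, n in zip(BUCKETS, self.bucket_counts):
            running += n  # histogram buckets are cumulative
            lines.append(f'api_request_duration_seconds_bucket{{le="{bound}"}} {running}')
        total = running + self.bucket_counts[-1]
        lines.append(f'api_request_duration_seconds_bucket{{le="+Inf"}} {total}')
        lines.append(f"api_request_duration_seconds_sum {self.latency_sum}")
        lines.append(f"api_request_duration_seconds_count {total}")
        return "\n".join(lines)
```

Percentiles such as p95 are not stored directly in this model. They would be estimated at query time from the cumulative buckets, which is why the choice of bucket bounds matters. Request rate follows from the rate of change of the counter, and payload size could be tracked with a second histogram of the same shape.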

Worker metrics

  • task execution time,
  • task success/failure rate,
  • retry count,
  • concurrency.
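
As a rough illustration of the worker signals above, the sketch below wraps a plain Python function and records execution time, outcome, retries, and in-flight count. The `instrumented` decorator, the `stats`, `durations`, and `gauges` containers, and the simple retry loop are all hypothetical stand-ins. They do not reproduce Celery's own retry mechanism, and in a Celery deployment the same bookkeeping would more likely be attached to signals such as `task_prerun`, `task_postrun`, `task_retry`, and `task_failure`.

```python
import time
from collections import Counter
from functools import wraps

stats = Counter()            # counts of "success", "failure", and "retry"
durations = []               # execution time of every attempt, in seconds
gauges = {"in_flight": 0}    # attempts currently executing in this process


def instrumented(max_retries: int = 2):
    """Wrap a task function so that each attempt updates the containers above."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                gauges["in_flight"] += 1
                start = time.perf_counter()
                try:
                    result = func(*args, **kwargs)
                except Exception:
                    if attempt == max_retries:
                        stats["failure"] += 1  # retries exhausted
                        raise
                    stats["retry"] += 1
                else:
                    stats["success"] += 1
                    return result
                finally:
                    # Runs for every attempt, whether it succeeded, failed, or will be retried.
                    durations.append(time.perf_counter() - start)
                    gauges["in_flight"] -= 1
        return wrapper
    return decorator
```

Recording a duration per attempt rather than per task keeps slow failing attempts visible, which would otherwise be hidden behind an eventual success.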

Queue metrics

  • queue depth,
  • message age,
  • consumer lag.
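
The three queue signals can be derived from a point-in-time view of a queue, as in the sketch below. The `QueueSnapshot` type and its fields are hypothetical, and "consumer lag" is assumed here to mean messages published but not yet acknowledged. The planned design may define it differently, and real values would come from the broker's own statistics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueSnapshot:
    """Hypothetical point-in-time view of one queue."""
    enqueue_times: tuple   # UNIX timestamps of messages still waiting for a consumer
    published_total: int   # messages published so far
    acked_total: int       # messages acknowledged by consumers so far


def queue_metrics(snapshot: QueueSnapshot, now: float) -> dict:
    """Derive queue depth, oldest message age, and consumer lag from a snapshot."""
    depth = len(snapshot.enqueue_times)
    oldest_age = now - min(snapshot.enqueue_times) if depth else 0.0
    # Assumed definition: everything published that consumers have not acknowledged yet.
    lag = snapshot.published_total - snapshot.acked_total
    return {
        "queue_depth": depth,
        "oldest_message_age_seconds": oldest_age,
        "consumer_lag": lag,
    }
```

Under this definition the lag can exceed the depth, because it also counts messages that have been delivered to a worker but not yet acknowledged.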

Infrastructure metrics

  • CPU and memory utilization,
  • pod restarts,
  • container OOM events,
  • node-level saturation.

Why these metrics matter

These signals will allow operators to:

  • detect overload conditions,
  • correlate failures with deployments,
  • reason about scaling and capacity.
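
As one example of reasoning about capacity from these signals, the sketch below applies Little's law: the average number of busy workers equals the task arrival rate multiplied by the mean task execution time. The function name `workers_needed` and the default 70% target utilization are assumptions chosen for illustration, not values from the planned design.

```python
import math


def workers_needed(arrival_rate_per_s: float, mean_task_seconds: float,
                   target_utilization: float = 0.7) -> int:
    """Estimate worker slots from Little's law, leaving headroom below full utilization."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    busy = arrival_rate_per_s * mean_task_seconds  # average number of busy workers
    return math.ceil(busy / target_utilization)
```

With 20 tasks per second and a mean execution time of 0.5 s, about 10 workers are busy on average, so `workers_needed(20, 0.5)` returns 15 at the assumed 70% target. Both inputs correspond to signals listed above: request or task rate, and task execution time.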