Service & Infrastructure Metrics (Prometheus)
Status: 📋 Planned (Architecture designed, implementation pending)
Monitoring philosophy
If a system cannot be measured, it cannot be operated or improved.
Prometheus is planned as the primary metrics backend for all runtime components.
Planned Monitored Components
Metrics will be collected from:
- FastAPI inference service,
- Celery workers,
- RabbitMQ broker,
- Redis cache (if enabled),
- Kubernetes runtime.
Core Service Metrics (Planned)
API metrics
- request rate (RPS),
- latency (p50 / p95 / p99),
- error rate (4xx / 5xx),
- request payload size.
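Latency percentiles are usually not stored directly. A common approach is to record a histogram with cumulative `le` buckets and estimate quantiles at query time, as PromQL's `histogram_quantile` does by interpolating linearly inside the bucket that contains the target rank. The sketch below reimplements that idea in plain Python for illustration; it is simplified (it ignores negative bucket bounds and similar edge cases), and the function name `bucket_quantile` is hypothetical.

```python
import math


def bucket_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper bound,
    ending with the +Inf bucket, like Prometheus `le` buckets.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least one finite bucket plus a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations yet
    rank = q * total
    # First finite bucket whose cumulative count reaches the target rank.
    b = next((i for i, (_, c) in enumerate(buckets[:-1]) if c >= rank), len(buckets) - 1)
    if b == len(buckets) - 1:
        # Rank falls in the +Inf bucket: return the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[b]
    lower = 0.0
    if b > 0:
        lower = buckets[b - 1][0]
        count -= buckets[b - 1][1]
        rank -= buckets[b - 1][1]
    # Assume observations are spread uniformly inside the bucket.
    return lower + (upper - lower) * (rank / count)
```

For example, with made-up latency buckets `[(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 99), (math.inf, 100)]`, `bucket_quantile(0.9, ...)` lands in the `(0.25, 0.5]` bucket and returns roughly 0.417 s. In PromQL, the equivalent query would look something like `histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))`, where the metric name is an assumption. The accuracy of the estimate depends entirely on how well the bucket boundaries match the latency range of interest.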
Worker metrics
- task execution time,
- task success/failure rate,
- retry count,
- concurrency.
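The worker metrics map naturally onto a wrapper around each task: time every attempt, count the final outcome, count retries, and track how many tasks are in flight. The sketch below is a hypothetical, in-memory `TaskMetrics` recorder written with the standard library only. In a real deployment these values would be Prometheus histograms, counters and gauges, and retries would come from Celery's own retry mechanism rather than the simple loop used here.

```python
import threading
import time
from collections import defaultdict
from functools import wraps


class TaskMetrics:
    """Hypothetical recorder for task duration, outcomes, retries and concurrency."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.durations = defaultdict(list)  # task name -> seconds per attempt
        self.outcomes = defaultdict(int)    # (task name, "success" | "failure") -> count
        self.retries = defaultdict(int)     # task name -> number of retries
        self.in_flight = 0                  # tasks currently executing (concurrency)

    def instrument(self, max_retries: int = 0):
        """Decorator; the retry loop stands in for the task queue's own retry policy."""
        if max_retries < 0:
            raise ValueError("max_retries must be >= 0")

        def decorator(fn):
            name = fn.__name__

            @wraps(fn)
            def wrapper(*args, **kwargs):
                with self._lock:
                    self.in_flight += 1
                try:
                    for attempt in range(max_retries + 1):
                        start = time.perf_counter()
                        try:
                            result = fn(*args, **kwargs)
                        except Exception:
                            elapsed = time.perf_counter() - start
                            final = attempt == max_retries
                            with self._lock:
                                self.durations[name].append(elapsed)
                                if final:
                                    self.outcomes[(name, "failure")] += 1
                                else:
                                    self.retries[name] += 1
                            if final:
                                raise  # out of retries: propagate the original error
                        else:
                            elapsed = time.perf_counter() - start
                            with self._lock:
                                self.durations[name].append(elapsed)
                                self.outcomes[(name, "success")] += 1
                            return result
                finally:
                    with self._lock:
                        self.in_flight -= 1

            return wrapper

        return decorator
```

Success and failure rates would then be derived at query time from the outcome counters over a time window, rather than stored as precomputed ratios.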
Queue metrics
- queue depth,
- message age,
- consumer lag.
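The three queue signals can be derived from simple bookkeeping. The sketch below uses a hypothetical `QueueTracker` and makes two assumptions worth stating: delivery is FIFO, and consumer lag is measured as a message count (published but not yet acknowledged), which is only one of several common definitions. RabbitMQ itself reports ready and unacknowledged message counts separately, so the sketch keeps those two states apart.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class QueueTracker:
    """Hypothetical bookkeeping for queue depth, message age and consumer lag."""
    ready: deque = field(default_factory=deque)  # publish timestamps of waiting messages, oldest first
    unacked: int = 0                             # delivered to a consumer, not yet acknowledged
    published_total: int = 0
    acked_total: int = 0

    def publish(self, now: float) -> None:
        self.ready.append(now)
        self.published_total += 1

    def deliver(self) -> float:
        """Hand the oldest waiting message to a consumer (FIFO assumed); return its publish time."""
        published_at = self.ready.popleft()
        self.unacked += 1
        return published_at

    def ack(self) -> None:
        if self.unacked == 0:
            raise RuntimeError("no delivered message to acknowledge")
        self.unacked -= 1
        self.acked_total += 1

    def depth(self) -> int:
        """Queue depth: messages waiting to be delivered."""
        return len(self.ready)

    def oldest_message_age(self, now: float) -> float:
        """Message age: how long the oldest waiting message has been queued."""
        return now - self.ready[0] if self.ready else 0.0

    def consumer_lag(self) -> int:
        """Consumer lag (as defined here): published but not yet acknowledged."""
        return self.published_total - self.acked_total
```

Timestamps are passed in explicitly so the arithmetic is easy to follow. Returning an age of zero for an empty queue is a design choice of this sketch; an exporter could instead omit the sample.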
Infrastructure metrics
- CPU and memory utilization,
- pod restarts,
- container OOM events,
- node-level saturation.
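Container CPU usage is normally exported as a cumulative counter of CPU-seconds (for example cAdvisor's `container_cpu_usage_seconds_total`), so utilization has to be computed from two samples: CPU-seconds consumed, divided by wall-clock seconds elapsed times the CPU limit in cores. The sketch below shows that arithmetic, plus memory utilization as working set over limit, a rough indicator of OOM risk. The function names, and the choice to treat a negative delta as a counter reset from zero, are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuSample:
    timestamp: float          # wall-clock time of the sample, in seconds
    cpu_seconds_total: float  # cumulative CPU time consumed by the container


def cpu_utilization(prev: CpuSample, curr: CpuSample, cpu_limit_cores: float) -> float:
    """Fraction of the CPU limit used between two samples (1.0 means fully saturated)."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0 or cpu_limit_cores <= 0:
        raise ValueError("need increasing timestamps and a positive CPU limit")
    used = curr.cpu_seconds_total - prev.cpu_seconds_total
    if used < 0:
        # Counter reset (e.g. container restart): assume it restarted from zero.
        used = curr.cpu_seconds_total
    return used / (elapsed * cpu_limit_cores)


def memory_utilization(working_set_bytes: int, limit_bytes: int) -> float:
    """Fraction of the memory limit in use; values near 1.0 suggest OOM risk."""
    if limit_bytes <= 0:
        raise ValueError("memory limit must be positive")
    return working_set_bytes / limit_bytes


# Usage: 30 CPU-seconds over 60 s against a 1-core limit.
print(cpu_utilization(CpuSample(0.0, 100.0), CpuSample(60.0, 130.0), cpu_limit_cores=1.0))
```

In PromQL the CPU figure corresponds roughly to `rate(container_cpu_usage_seconds_total[5m])` divided by the container's CPU limit; `rate()` also compensates for counter resets, so the manual reset handling above is only needed when computing the value by hand.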
Why these metrics matter
These signals will allow operators to:
- detect overload conditions,
- correlate failures with deployments,
- reason about scaling and capacity.