Service & Infrastructure Metrics (Prometheus)¶
Status: ✅ Operational — GET /metrics endpoint live, 8 metrics exported
Metrics are collected from the FastAPI inference service via src/app/metrics.py
and a _PrometheusMiddleware applied to all requests.
Available at: GET /metrics (Prometheus exposition format)
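A scrape of GET /metrics returns plain-text Prometheus exposition format. The snippet below is illustrative only (the values and label sets shown are made up, not real output):

```
# HELP soccer_requests_total Total HTTP requests by endpoint and status
# TYPE soccer_requests_total counter
soccer_requests_total{endpoint="/predict",status="200"} 1042.0
# HELP soccer_model_loaded 1 if model is loaded, 0 otherwise
# TYPE soccer_model_loaded gauge
soccer_model_loaded 1.0
```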
Exported metrics¶
API metrics (✅ live)¶
| Metric | Type | Description |
|---|---|---|
| `soccer_requests_total` | Counter | Total HTTP requests by endpoint and status |
| `soccer_request_duration_seconds` | Histogram | Request latency by endpoint (p50/p95/p99) |
| `soccer_errors_total` | Counter | Total 4xx/5xx errors |
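As a rough sketch of how the middleware could record these three metrics with `prometheus_client` (the real definitions live in src/app/metrics.py and may use different label sets; `record_request` is a hypothetical helper standing in for the middleware's post-response hook):

```python
# Sketch only: metric names/types from the table above; label sets
# and the record_request helper are assumptions, not the real code.
from prometheus_client import CollectorRegistry, Counter, Histogram

registry = CollectorRegistry()

REQUESTS = Counter(
    "soccer_requests_total",
    "Total HTTP requests by endpoint and status",
    ["endpoint", "status"],
    registry=registry,
)
LATENCY = Histogram(
    "soccer_request_duration_seconds",
    "Request latency by endpoint",
    ["endpoint"],
    registry=registry,
)
ERRORS = Counter(
    "soccer_errors_total",
    "Total 4xx/5xx errors",
    registry=registry,
)

def record_request(endpoint: str, status: int, duration: float) -> None:
    """Update all three metrics for one completed request, as a
    middleware would after awaiting the wrapped handler."""
    REQUESTS.labels(endpoint=endpoint, status=str(status)).inc()
    LATENCY.labels(endpoint=endpoint).observe(duration)
    if status >= 400:
        ERRORS.inc()
```

Note that the histogram itself only exports buckets; the p50/p95/p99 quantiles are computed at query time in Prometheus (e.g. with `histogram_quantile` over `soccer_request_duration_seconds_bucket`).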
Prediction metrics (✅ live)¶
| Metric | Type | Description |
|---|---|---|
| `soccer_predictions_total` | Counter | Total predictions served |
| `soccer_model_loaded` | Gauge | 1 if model is loaded, 0 otherwise |
| `soccer_model_version` | Gauge (label) | Currently loaded model version |
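"Gauge (label)" here means the version string travels in a label rather than the sample value, a common info-style pattern. A minimal sketch, assuming that pattern (the `mark_model_loaded` helper is hypothetical; the actual update path in src/app/metrics.py may differ):

```python
# Sketch only: assumes the version-as-label convention for
# soccer_model_version; set to 1 for the currently loaded version.
from prometheus_client import CollectorRegistry, Gauge

registry = CollectorRegistry()

MODEL_LOADED = Gauge(
    "soccer_model_loaded",
    "1 if model is loaded, 0 otherwise",
    registry=registry,
)
MODEL_VERSION = Gauge(
    "soccer_model_version",
    "Currently loaded model version (carried in the 'version' label)",
    ["version"],
    registry=registry,
)

def mark_model_loaded(version: str) -> None:
    """Flag the model as loaded and expose its version as a label."""
    MODEL_LOADED.set(1)
    MODEL_VERSION.labels(version=version).set(1)
```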
Celery metrics (✅ live)¶
| Metric | Type | Description |
|---|---|---|
| `soccer_celery_queue_length` | Gauge | Per-queue message count |
| `soccer_celery_workers_active` | Gauge | Active Celery worker count |
Celery runtime status is also available via REST:
- GET /monitoring/celery/queues
- GET /monitoring/celery/workers
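A minimal sketch of how these two gauges could be derived, assuming a Redis broker (where each Celery queue is a Redis list) and worker discovery via Celery's ping; both helper functions are illustrative, not the actual collector code:

```python
# Sketch only: assumes a Redis broker and default queue naming.
# The real collector behind /metrics may query the broker differently.

def count_active_workers(ping_reply):
    """Active worker count from app.control.inspect().ping(), which
    returns {worker_hostname: {"ok": "pong"}} or None if no replies."""
    return len(ping_reply or {})

def queue_lengths(redis_client, queues=("celery",)):
    """Per-queue message counts: with a Redis broker, each queue is a
    Redis list, so LLEN gives the number of pending messages."""
    return {q: redis_client.llen(q) for q in queues}
```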
Not yet implemented¶
- RabbitMQ queue metrics via dedicated exporter
- Kubernetes CPU / memory / pod restarts
- PostgreSQL query latency via pg_exporter
- Log aggregation (stdout only today)
Dashboards¶
Grafana dashboards for these metrics are planned — see Dashboards. Full coverage matrix: Monitoring Status