Serving Audit Report — SoccerPredictAI¶
Date: 2026-04-28
Auditor: GitHub Copilot (Claude Opus 4.7) — /skill-ml-system-audit full (audit 07/12)
Scope: FastAPI endpoints, model loading, Celery async, batch lookup, error handling
Baseline: docs/validation/20260424/07_serving_audit.md
Delta vs baseline¶
src/app/routers/, src/app/services/predict.py, src/app/tasks/predict.py, src/app/schemas/predict.py, src/app/worker_ml.py unchanged since 2026-04-26. Baseline findings remain in force.
Confirmed endpoint surface¶
13 endpoints across /predict, /monitoring, /livescores, /sources, /healthcheck, /metrics. Auth via X-Token (header / query) only on /sources/*. All /predict/* endpoints are unauthenticated.
POST /predict/ and GET /predict/{match_id} route to Celery ml queue with 30 s sync timeout; POST /predict/async/ returns task_id polled via GET /monitoring/task_status/{task_id} (Redis result backend).
Model loading: worker_process_init → PredictionService.load() → mlflow.pyfunc.load_model("models:/soccer_clf@champion") with thread-safe double-checked locking and lazy fallback. pyfunc → predict_proba fallback handles label vs probability output.
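The double-checked locking pattern described above can be sketched as follows. This is a minimal illustration, not the project's actual `PredictionService`; the loader callable stands in for the `mlflow.pyfunc.load_model("models:/soccer_clf@champion")` call.

```python
import threading

class PredictionService:
    """Sketch of lazy, thread-safe model loading via double-checked locking."""

    def __init__(self, loader):
        self._loader = loader        # e.g. lambda: mlflow.pyfunc.load_model(...)
        self._model = None
        self._lock = threading.Lock()

    def get_model(self):
        # First check without the lock: cheap fast path once loaded.
        if self._model is None:
            with self._lock:
                # Second check under the lock: exactly one thread loads,
                # late arrivals see the already-populated attribute.
                if self._model is None:
                    self._model = self._loader()
        return self._model
```

The `worker_process_init` hook calls the load eagerly at fork time; the lazy path only exists as a fallback if that hook did not run.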
Batch lookup: FeatureLookupService — local file cache by mtime, MinIO LastModified re-check every FEATURE_CACHE_CHECK_INTERVAL (default 60 s), graceful degraded mode on MinIO unavailability.
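The staleness check can be modeled as below. Names, wiring, and the injected clock are assumptions for illustration; the `fetch_last_modified` callable stands in for a MinIO `stat_object` LastModified probe, and the `OSError` branch models the graceful degraded mode.

```python
import time

class FeatureLookupService:
    """Sketch of a staleness-aware lookup cache with interval-throttled re-checks."""

    def __init__(self, fetch_last_modified, load,
                 check_interval=60.0, clock=time.monotonic):
        self._fetch_last_modified = fetch_last_modified  # remote LastModified probe
        self._load = load                                # reads features into memory
        self._interval = check_interval                  # FEATURE_CACHE_CHECK_INTERVAL
        self._clock = clock
        self._cached = None
        self._cached_version = None
        self._last_check = float("-inf")

    def get(self):
        now = self._clock()
        if now - self._last_check >= self._interval:
            self._last_check = now
            try:
                remote = self._fetch_last_modified()
            except OSError:
                # Degraded mode: MinIO unavailable, keep serving the stale copy.
                return self._cached
            if remote != self._cached_version:
                self._cached = self._load()
                self._cached_version = remote
        return self._cached
```

Note that `get()` here has the same gap flagged as SRV-04: two threads entering the interval branch concurrently can both trigger `_load()`.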
Redis prediction cache: key predict:{match_id}:{run_id} (auto-invalidates on model change), TTL PREDICTION_CACHE_TTL (default 3600 s).
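The key scheme is simple enough to show directly; a sketch matching the format above (the helper name is hypothetical):

```python
PREDICTION_CACHE_TTL = 3600  # default, seconds

def prediction_cache_key(match_id: str, run_id: str) -> str:
    # Scoping the key by MLflow run_id means a model swap naturally
    # misses every old entry; no explicit cache flush is needed.
    return f"predict:{match_id}:{run_id}"
```

A write would then look like `redis_client.setex(key, PREDICTION_CACHE_TTL, payload)`, so entries for retired models simply expire.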
Celery predict_match: queue ml, max_retries=2, default_retry_delay=10, task_acks_late=True, task_reject_on_worker_lost=True, task_time_limit=3600.
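A back-of-envelope model of the retry window (relevant to SRV-05 below). This is an illustrative simplification assuming each attempt runs for `exec_time` seconds before failing and retries are scheduled exactly `default_retry_delay` apart:

```python
SYNC_TIMEOUT = 30   # seconds the API waits before returning 504
MAX_RETRIES = 2
RETRY_DELAY = 10    # default_retry_delay

def worst_case_completion(exec_time: float) -> float:
    """Latest finish time of the final retry: 3 attempts plus 2 delays."""
    return MAX_RETRIES * RETRY_DELAY + (MAX_RETRIES + 1) * exec_time
```

With a 6 s prediction, the last retry can complete at 20 + 18 = 38 s, i.e. after the client has already received its 504, which is exactly the race SRV-05 describes.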
Risk register (re-confirmed)¶
| ID | Severity | Description | Status |
|---|---|---|---|
| SRV-01 | P1 | No auth on /predict/* — open access | Open |
| SRV-02 | P1 | No automatic model reload on champion alias change | Open (= R3) |
| SRV-03 | P2 | GET /predict/matches/ returns list[dict] with no Pydantic response model | Open |
| SRV-04 | P2 | FeatureLookupService._load() lacks threading.Lock on MinIO reload — concurrent double-load possible | Open |
| SRV-05 | P2 | Retry window (2×10 s) + 30 s sync timeout race — late retry may execute after 504 returned | Open |
| SRV-06 | P2 | No explicit 503 when Celery broker unavailable | Open |
| SRV-07 | P3 | asyncio.get_event_loop() deprecated in 3.12 — should be asyncio.get_running_loop() | Open |
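The SRV-07 remediation is mechanical. A sketch of the replacement pattern (the coroutine name and off-loaded callable are hypothetical, standing in for wherever the codebase currently calls `asyncio.get_event_loop()`):

```python
import asyncio

async def run_sync_predict(func, *args):
    # get_event_loop() emits a DeprecationWarning when called without a
    # running loop (Python 3.12); inside a coroutine, get_running_loop()
    # is the supported way to reach the current loop.
    loop = asyncio.get_running_loop()
    # Off-load the blocking call (e.g. a Celery result .get()) to the
    # default thread-pool executor so the event loop stays responsive.
    return await loop.run_in_executor(None, func, *args)
```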
Summary¶
| Aspect | Status |
|---|---|
| Endpoint inventory + Pydantic schemas | ✅ (except SRV-03) |
| Model load via registry alias, no hardcoded path | ✅ |
| Cold-start prevention (worker_process_init) | ✅ |
| Redis prediction cache with run_id-scoped key | ✅ |
| Batch lookup with staleness-aware MinIO polling | ✅ |
| Async flow (POST /predict/async/ + polling) | ✅ |
| Auth on prediction endpoints | ❌ (SRV-01) |
| Hot-reload of registered model | ❌ (SRV-02) |
| Concurrency-safe feature reload | ❌ (SRV-04) |
Recommendation: SRV-01 and SRV-02 are highest-impact. SRV-04 is a latent correctness issue under concurrent load; Locust scenarios should be able to reproduce it. The Python 3.12 asyncio deprecation (SRV-07) should be addressed proactively ahead of the next interpreter upgrade.
See baseline §1–§5 for code-level detail.