
# Serving Audit Report — SoccerPredictAI

**Date:** 2026-04-28
**Auditor:** GitHub Copilot (Claude Opus 4.7) — /skill-ml-system-audit full (audit 07/12)
**Scope:** FastAPI endpoints, model loading, Celery async, batch lookup, error handling
**Baseline:** docs/validation/20260424/07_serving_audit.md


## Delta vs baseline

`src/app/routers/`, `src/app/services/predict.py`, `src/app/tasks/predict.py`, `src/app/schemas/predict.py`, and `src/app/worker_ml.py` are unchanged since 2026-04-26. Baseline findings remain in force.


## Confirmed endpoint surface

13 endpoints across `/predict`, `/monitoring`, `/livescores`, `/sources`, `/healthcheck`, and `/metrics`. Auth via `X-Token` (header or query) applies only to `/sources/*`; all `/predict/*` endpoints are unauthenticated.
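For reference, the header-or-query token scheme on `/sources/*` reduces to a check like the sketch below. The constant name and the query-parameter spelling are assumptions for illustration, not the service's real identifiers.

```python
# Illustrative X-Token check (header or query fallback). EXPECTED_TOKEN
# and the "x_token" query name are hypothetical stand-ins.
import hmac

EXPECTED_TOKEN = "change-me"  # really read from settings/env in the app

def check_x_token(headers: dict, query: dict) -> bool:
    supplied = headers.get("X-Token") or query.get("x_token")
    if supplied is None:
        return False
    # constant-time compare avoids leaking the token via timing
    return hmac.compare_digest(supplied, EXPECTED_TOKEN)

print(check_x_token({"X-Token": "change-me"}, {}))  # True
print(check_x_token({}, {}))                        # False
```

In FastAPI this would typically live in a dependency applied to the `/sources` router; nothing equivalent guards `/predict/*` (SRV-01).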

POST /predict/ and GET /predict/{match_id} route to Celery ml queue with 30 s sync timeout; POST /predict/async/ returns task_id polled via GET /monitoring/task_status/{task_id} (Redis result backend).
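The sync path's failure mode is worth making concrete: the endpoint blocks on the Celery result for up to 30 s, returns 504 on expiry, and the task keeps running in the worker. A stdlib-only sketch of that shape (with a shrunken timeout; all names here are illustrative, not the service's code):

```python
# Simulates the sync /predict flow: wait up to SYNC_TIMEOUT for the
# result; on expiry return 504 while the task still completes in the
# background -- the race behind SRV-05.
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

SYNC_TIMEOUT = 0.05  # stands in for the real 30 s window

def slow_predict(match_id: str) -> dict:
    time.sleep(0.2)  # slower than the sync timeout
    return {"match_id": match_id, "proba": 0.61}

def predict_sync(executor: ThreadPoolExecutor, match_id: str):
    """Mimics AsyncResult.get(timeout=...): 504 on expiry, 200 otherwise."""
    future = executor.submit(slow_predict, match_id)
    try:
        return 200, future.result(timeout=SYNC_TIMEOUT)
    except FutureTimeout:
        return 504, None  # the task is NOT cancelled; it finishes later

executor = ThreadPoolExecutor(max_workers=1)
status, body = predict_sync(executor, "m-123")
print(status)  # 504 -- the prediction outlives the HTTP window
executor.shutdown(wait=True)
```

The async variant sidesteps this: the client holds the `task_id` and polls, so no result is ever produced after the HTTP response it belongs to.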

Model loading: `worker_process_init` → `PredictionService.load()` → `mlflow.pyfunc.load_model("models:/soccer_clf@champion")`, with thread-safe double-checked locking and a lazy-load fallback on first request. A pyfunc → `predict_proba` fallback handles label vs. probability output.
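The double-checked-locking shape referenced above looks roughly like this sketch; `PredictionService` and the loader callable are stand-ins, only the locking pattern is what the audit asserts.

```python
# Lazy, thread-safe model loading via double-checked locking: the fast
# path skips the lock once the model is set; the second check under the
# lock guarantees the loader runs exactly once.
import threading

class PredictionService:
    def __init__(self, loader):
        self._loader = loader  # e.g. lambda: mlflow.pyfunc.load_model(...)
        self._model = None
        self._lock = threading.Lock()

    def model(self):
        if self._model is None:          # first check, no lock (fast path)
            with self._lock:
                if self._model is None:  # second check, under the lock
                    self._model = self._loader()
        return self._model

calls = []
svc = PredictionService(lambda: calls.append(1) or "champion-model")

threads = [threading.Thread(target=svc.model) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls))  # 1 -- the loader ran once despite 8 concurrent callers
```

This is exactly the guard that `FeatureLookupService._load()` lacks (SRV-04).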

Batch lookup: FeatureLookupService — local file cache by mtime, MinIO LastModified re-check every FEATURE_CACHE_CHECK_INTERVAL (default 60 s), graceful degraded mode on MinIO unavailability.
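The staleness-aware pattern can be sketched as follows; the class, the fetch callable, and the error handling are illustrative assumptions, with only the interval-gated re-check and degraded mode taken from the audit.

```python
# Serve features from a local copy; re-check the remote LastModified at
# most once per interval; on remote failure keep serving the stale copy
# (degraded mode).
import time

class FeatureCache:
    def __init__(self, fetch_remote, check_interval=60.0):
        self._fetch = fetch_remote       # returns (last_modified, data)
        self._interval = check_interval  # FEATURE_CACHE_CHECK_INTERVAL
        self._last_check = 0.0
        self._remote_mtime = None
        self._data = None

    def get(self):
        now = time.monotonic()
        if self._data is None or now - self._last_check >= self._interval:
            self._last_check = now
            try:
                mtime, data = self._fetch()
                if mtime != self._remote_mtime:  # only reload on change
                    self._remote_mtime, self._data = mtime, data
            except OSError:
                pass  # degraded mode: keep the stale local copy
        return self._data
```

Note the sketch, like the real `_load()`, has no lock around the reload: two threads hitting a stale interval simultaneously would both fetch (SRV-04).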

Redis prediction cache: key predict:{match_id}:{run_id} (auto-invalidates on model change), TTL PREDICTION_CACHE_TTL (default 3600 s).
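Because the `run_id` is part of the key, promoting a new champion invalidates by key mismatch rather than by explicit flush. A minimal sketch of the key scheme:

```python
# run_id-scoped cache key: a new champion model (new run_id) simply
# misses all old entries, which then age out via the TTL.
def prediction_cache_key(match_id: str, run_id: str) -> str:
    return f"predict:{match_id}:{run_id}"

old = prediction_cache_key("m-42", "run-aaa")
new = prediction_cache_key("m-42", "run-bbb")
print(old != new)  # True -- a model change invalidates by key mismatch
```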

Celery predict_match: queue ml, max_retries=2, default_retry_delay=10, task_acks_late=True, task_reject_on_worker_lost=True, task_time_limit=3600.
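Reconstructing only the options the audit lists (the dict shape and comments are illustrative; in the codebase these appear as task decorator arguments and Celery app settings):

```python
# Assumed shape of the predict_match task configuration.
celery_task_options = {
    "queue": "ml",
    "max_retries": 2,
    "default_retry_delay": 10,       # seconds between retries
    "acks_late": True,               # re-deliver if the worker dies mid-task
    "reject_on_worker_lost": True,   # don't silently drop on worker loss
    "time_limit": 3600,              # hard kill after 1 h
}
```

`acks_late` plus `reject_on_worker_lost` gives at-least-once delivery, which is what makes the late-retry race (SRV-05) possible in the first place.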


## Risk register (re-confirmed)

| ID | Severity | Description | Status |
|--------|----|-------------|------|
| SRV-01 | P1 | No auth on `/predict/*` — open access | Open |
| SRV-02 | P1 | No automatic model reload on champion alias change | Open (= R3) |
| SRV-03 | P2 | `GET /predict/matches/` returns `list[dict]` with no Pydantic response model | Open |
| SRV-04 | P2 | `FeatureLookupService._load()` lacks `threading.Lock` on MinIO reload — concurrent double-load possible | Open |
| SRV-05 | P2 | Retry window (2 × 10 s) + 30 s sync timeout race — a late retry may execute after the 504 is returned | Open |
| SRV-06 | P2 | No explicit 503 when the Celery broker is unavailable | Open |
| SRV-07 | P3 | `asyncio.get_event_loop()` deprecated in 3.12 — should be `asyncio.get_running_loop()` | Open |
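The SRV-07 remediation is mechanical; a sketch of the replacement inside a coroutine (the `offload` helper is a hypothetical example, not code from the service):

```python
# Inside a coroutine, get_running_loop() is the non-deprecated way to
# reach the loop; get_event_loop() warns on 3.12+ when no loop is running.
import asyncio

async def offload(fn, *args):
    loop = asyncio.get_running_loop()  # replaces asyncio.get_event_loop()
    return await loop.run_in_executor(None, fn, *args)

result = asyncio.run(offload(lambda x: x * 2, 21))
print(result)  # 42
```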

## Summary

| Aspect | Status |
|--------|--------|
| Endpoint inventory + Pydantic schemas | ✅ (except SRV-03) |
| Model load via registry alias, no hardcoded path | ✅ |
| Cold-start prevention (`worker_process_init`) | ✅ |
| Redis prediction cache with run_id-scoped key | ✅ |
| Batch lookup with staleness-aware MinIO polling | ✅ |
| Async flow (`POST /predict/async/` + polling) | ✅ |
| Auth on prediction endpoints | ❌ (SRV-01) |
| Hot-reload of registered model | ❌ (SRV-02) |
| Concurrency-safe feature reload | ❌ (SRV-04) |

Recommendation: SRV-01 and SRV-02 are the highest-impact fixes. SRV-04 is a latent correctness issue under concurrent load; the Locust scenarios should be able to reproduce it. The asyncio deprecation (SRV-07, Python 3.12) should be addressed proactively before the next interpreter upgrade.

See baseline §1–§5 for code-level detail.