Serving Audit Report — SoccerPredictAI¶
Date: 2026-04-28
Auditor: GitHub Copilot (Claude Opus 4.7) — /skill-ml-system-audit full (audit 07/12)
Scope: FastAPI endpoints, model loading, Celery async, batch lookup, error handling
Baseline: docs/validation/20260424/07_serving_audit.md
Delta vs baseline¶
src/app/routers/, src/app/services/predict.py, src/app/tasks/predict.py, src/app/schemas/predict.py, src/app/worker_ml.py unchanged since 2026-04-26. Baseline findings remain in force.
Confirmed endpoint surface¶
13 endpoints across /predict, /monitoring, /livescores, /sources, /healthcheck, /metrics. Auth via X-Token (header / query) only on /sources/*. All /predict/* endpoints are unauthenticated.
POST /predict/ and GET /predict/{match_id} route to Celery ml queue with 30 s sync timeout; POST /predict/async/ returns task_id polled via GET /monitoring/task_status/{task_id} (Redis result backend).
Model loading: worker_process_init → PredictionService.load() → mlflow.pyfunc.load_model("models:/soccer_clf@champion") with thread-safe double-checked locking and lazy fallback. pyfunc → predict_proba fallback handles label vs probability output.
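The double-checked locking pattern described above can be sketched as follows. This is a minimal illustration, not the project's actual `PredictionService`; the loader callable stands in for the `mlflow.pyfunc.load_model("models:/soccer_clf@champion")` call.

```python
import threading

class PredictionService:
    """Sketch of lazy, thread-safe model loading via double-checked locking."""

    def __init__(self, loader):
        self._loader = loader        # e.g. lambda: mlflow.pyfunc.load_model(...)
        self._model = None
        self._lock = threading.Lock()

    def get_model(self):
        # First check without the lock: cheap fast path once loaded.
        if self._model is None:
            with self._lock:
                # Second check under the lock: exactly one thread loads,
                # late arrivals see the already-populated attribute.
                if self._model is None:
                    self._model = self._loader()
        return self._model
```

The `worker_process_init` hook calls the load eagerly at fork time; the lazy path only exists as a fallback if that hook did not run.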
Batch lookup: FeatureLookupService — local file cache by mtime, MinIO LastModified re-check every FEATURE_CACHE_CHECK_INTERVAL (default 60 s), graceful degraded mode on MinIO unavailability.
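The staleness check can be modeled as below. Names, wiring, and the injected clock are assumptions for illustration; the `fetch_last_modified` callable stands in for a MinIO `stat_object` LastModified probe, and the `OSError` branch models the graceful degraded mode.

```python
import time

class FeatureLookupService:
    """Sketch of a staleness-aware lookup cache with interval-throttled re-checks."""

    def __init__(self, fetch_last_modified, load,
                 check_interval=60.0, clock=time.monotonic):
        self._fetch_last_modified = fetch_last_modified  # remote LastModified probe
        self._load = load                                # reads features into memory
        self._interval = check_interval                  # FEATURE_CACHE_CHECK_INTERVAL
        self._clock = clock
        self._cached = None
        self._cached_version = None
        self._last_check = float("-inf")

    def get(self):
        now = self._clock()
        if now - self._last_check >= self._interval:
            self._last_check = now
            try:
                remote = self._fetch_last_modified()
            except OSError:
                # Degraded mode: MinIO unavailable, keep serving the stale copy.
                return self._cached
            if remote != self._cached_version:
                self._cached = self._load()
                self._cached_version = remote
        return self._cached
```

Note that `get()` here has the same gap flagged as SRV-04: two threads entering the interval branch concurrently can both trigger `_load()`.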
Redis prediction cache: key predict:{match_id}:{run_id} (auto-invalidates on model change), TTL PREDICTION_CACHE_TTL (default 3600 s).
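The key scheme is simple enough to show directly; a sketch matching the format above (the helper name is hypothetical):

```python
PREDICTION_CACHE_TTL = 3600  # default, seconds

def prediction_cache_key(match_id: str, run_id: str) -> str:
    # Scoping the key by MLflow run_id means a model swap naturally
    # misses every old entry; no explicit cache flush is needed.
    return f"predict:{match_id}:{run_id}"
```

A write would then look like `redis_client.setex(key, PREDICTION_CACHE_TTL, payload)`, so entries for retired models simply expire.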
Celery predict_match: queue ml, max_retries=2, default_retry_delay=10, task_acks_late=True, task_reject_on_worker_lost=True, task_time_limit=3600.
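A back-of-envelope model of the retry window (relevant to SRV-05 below). This is an illustrative simplification assuming each attempt runs for `exec_time` seconds before failing and retries are scheduled exactly `default_retry_delay` apart:

```python
SYNC_TIMEOUT = 30   # seconds the API waits before returning 504
MAX_RETRIES = 2
RETRY_DELAY = 10    # default_retry_delay

def worst_case_completion(exec_time: float) -> float:
    """Latest finish time of the final retry: 3 attempts plus 2 delays."""
    return MAX_RETRIES * RETRY_DELAY + (MAX_RETRIES + 1) * exec_time
```

With a 6 s prediction, the last retry can complete at 20 + 18 = 38 s, i.e. after the client has already received its 504, which is exactly the race SRV-05 describes.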
Risk register (re-confirmed)¶
| ID | Severity | Description | Status |
|---|---|---|---|
| SRV-01 | P1 | No auth on /predict/* — open access | Open |
| SRV-02 | P1 | No automatic model reload on champion alias change | Open (= R3) |
| SRV-03 | P2 | GET /predict/matches/ returns list[dict] with no Pydantic response model | Open |
| SRV-04 | P2 | FeatureLookupService._load() lacks threading.Lock on MinIO reload — concurrent double-load possible | Open |
| SRV-05 | P2 | Retry window (2×10 s) + 30 s sync timeout race — late retry may execute after 504 returned | Open |
| SRV-06 | P2 | No explicit 503 when Celery broker unavailable | Open |
| SRV-07 | P3 | asyncio.get_event_loop() deprecated in 3.12 — should be asyncio.get_running_loop() | Open |
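The SRV-07 remediation is mechanical. A sketch of the replacement pattern (the coroutine name and off-loaded callable are hypothetical, standing in for wherever the codebase currently calls `asyncio.get_event_loop()`):

```python
import asyncio

async def run_sync_predict(func, *args):
    # get_event_loop() emits a DeprecationWarning when called without a
    # running loop (Python 3.12); inside a coroutine, get_running_loop()
    # is the supported way to reach the current loop.
    loop = asyncio.get_running_loop()
    # Off-load the blocking call (e.g. a Celery result .get()) to the
    # default thread-pool executor so the event loop stays responsive.
    return await loop.run_in_executor(None, func, *args)
```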
Summary¶
| Aspect | Status |
|---|---|
| Endpoint inventory + Pydantic schemas | ✅ (except SRV-03) |
| Model load via registry alias, no hardcoded path | ✅ |
| Cold-start prevention (worker_process_init) | ✅ |
| Redis prediction cache with run_id-scoped key | ✅ |
| Batch lookup with staleness-aware MinIO polling | ✅ |
| Async flow (POST /predict/async/ + polling) | ✅ |
| Auth on prediction endpoints | ❌ (SRV-01) |
| Hot-reload of registered model | ❌ (SRV-02) |
| Concurrency-safe feature reload | ❌ (SRV-04) |
Recommendation: SRV-01 and SRV-02 are highest-impact. SRV-04 is a latent correctness issue under concurrent load; Locust scenarios should be able to reproduce it. The Python 3.12 asyncio deprecation (SRV-07) should be addressed proactively ahead of the next interpreter upgrade.
See baseline §1–§5 for code-level detail.