
Train ↔ Serve Consistency Audit Report — SoccerPredictAI

Date: 2026-04-28
Auditor: GitHub Copilot (Claude Opus 4.7), /skill-ml-system-audit full (audit 06/12)
Scope: Skew between training, batch inference, and online serving feature paths
Baseline: docs/validation/20260424/06_train_serve_consistency_audit.md


Delta vs baseline

src/features/, src/pipelines/{features,inference}.py, src/app/services/predict.py, src/app/tasks/predict.py, src/app/routers/predict.py unchanged since 2026-04-26. Baseline findings remain in force.


Confirmed paths

| Path | Feature source | Computation |
|---|---|---|
| Training (final_train) | features.parquet + features_meta.parquet | offline DVC feature_engineering |
| Batch inference (batch_inference) | re-computed via compute_all_match_features() | same code as training (build_team_match_table, add_rolling_features, to_match_level, compute_elo_ratings, select_model_features) |
| Online (POST /predict/) | request body features: dict | client-supplied; no server-side recomputation |

All preprocessing (StandardScaler, SimpleImputer, OneHotEncoder) is encapsulated in the serialized sklearn Pipeline, which is loaded via mlflow.pyfunc.load_model("models:/soccer_clf@champion"). No manual preprocessing happens at serving time.


Risk register (re-confirmed)

| ID | Severity | Description | Status |
|---|---|---|---|
| TS-01 | P1 | POST /predict/ has no server-side feature computation or schema validation against model input | Open |
| TS-02 | P1 | No model hot-reload in Celery worker on champion change; manual worker restart required | Open (= R3) |
| TS-03 | P2 | Batch-inference features are fresher than training features (they use more recent history); small systematic coverage skew | Open |
| TS-04 | P2 | FeatureLookupService.get_features() strips NaNs; SimpleImputer fills the gap silently | Open |
| TS-05 | P2 | No staleness check on match_features.parquet (no max-age guard); stale predictions are served without alerting | Open (= R5) |
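
As an illustration of the schema validation missing in TS-01, the sketch below checks a client-supplied features dict against an expected column list before it reaches the model. It is a minimal sketch under assumptions: `validate_features` and `FeatureSchemaError` are hypothetical names, the column names in the usage note are invented, and the expected column list is assumed to be derivable from features_meta.parquet or the model signature. Rejecting None/NaN explicitly also surfaces the silent imputation noted in TS-04.

```python
import math
from typing import List, Mapping, Sequence


class FeatureSchemaError(ValueError):
    """Raised when request features do not match the expected model columns."""


def validate_features(
    features: Mapping[str, object],
    expected_columns: Sequence[str],
    allow_missing_values: bool = False,
) -> List[object]:
    """Return feature values in expected_columns order, or raise FeatureSchemaError."""
    expected = list(expected_columns)
    missing = [c for c in expected if c not in features]
    unexpected = sorted(set(features) - set(expected))
    if missing or unexpected:
        raise FeatureSchemaError(f"missing={missing} unexpected={unexpected}")

    if not allow_missing_values:
        # Reject None/NaN up front so an imputer cannot fill the gap unnoticed.
        empty = [
            c
            for c in expected
            if features[c] is None
            or (isinstance(features[c], float) and math.isnan(features[c]))
        ]
        if empty:
            raise FeatureSchemaError(f"null_or_nan={empty}")

    # Order values by the expected columns, not by the client's key order.
    return [features[c] for c in expected]
```

For example, with the hypothetical columns `["elo_diff", "home_form_5"]`, a request body that omits `home_form_5` raises `FeatureSchemaError`, which the router could translate into an HTTP 422 response.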

Summary

| Aspect | Status |
|---|---|
| Training ↔ batch inference parity (code, params, selector) | ✅ |
| Preprocessing baked into serialized model | ✅ |
| Feature order via features_meta.parquet | ✅ |
| Server-side feature recomputation for online predict | ❌ (TS-01) |
| Hot-reload of registered model | ❌ (TS-02) |
| Staleness guard on batch features | ❌ (TS-05) |
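
A max-age guard for TS-05 could be as small as the sketch below, which compares a file's modification time against a threshold. This is an assumption-laden sketch, not the project's code: `assert_fresh`, `StaleFeaturesError`, and the threshold value are hypothetical, and file mtime is assumed to be an acceptable proxy for feature freshness.

```python
import os
import time
from typing import Optional


class StaleFeaturesError(RuntimeError):
    """Raised when the feature file is older than the allowed maximum age."""


def feature_file_age_seconds(path: str, now: Optional[float] = None) -> float:
    """Seconds since the file was last modified (never negative)."""
    current = time.time() if now is None else now
    # os.path.getmtime raises OSError (e.g. FileNotFoundError) if the file is missing.
    return max(0.0, current - os.path.getmtime(path))


def assert_fresh(path: str, max_age_seconds: float, now: Optional[float] = None) -> float:
    """Return the file age in seconds, or raise StaleFeaturesError if it is too old."""
    age = feature_file_age_seconds(path, now)
    if age > max_age_seconds:
        raise StaleFeaturesError(
            f"{path} is {age:.0f}s old; limit is {max_age_seconds:.0f}s"
        )
    return age
```

A request handler could call `assert_fresh("match_features.parquet", max_age_seconds=86400)` and map `StaleFeaturesError` to an HTTP 503 response; the one-day limit here is purely illustrative.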

Recommendation: TS-02 and TS-05 are the top operational risks. Minimal viable fixes: (a) a periodic check of the champion alias with lazy reload, set up in the worker_process_init lifecycle (e.g. SIGUSR1 → reload); (b) a last_modified SLO on match_features.parquet, returning HTTP 503 when the file is stale.
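
Fix (a) can be sketched as a small cache that re-resolves the alias at most once per interval and reloads only when the version behind it changes. This is a hedged sketch, not the worker's actual code: `AliasTrackingModelCache` and its parameters are hypothetical, and the two callables are injected so the example stays free of MLflow and Celery imports. In a real worker they might wrap `MlflowClient().get_model_version_by_alias("soccer_clf", "champion").version` and `mlflow.pyfunc.load_model(f"models:/soccer_clf/{version}")`.

```python
import time
from typing import Any, Callable, Optional


class AliasTrackingModelCache:
    """Caches a model and reloads it when the version behind an alias changes."""

    def __init__(
        self,
        resolve_version: Callable[[], str],
        load_model: Callable[[str], Any],
        check_interval_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolve_version = resolve_version
        self._load_model = load_model
        self._interval = check_interval_seconds
        self._clock = clock
        self._model: Any = None
        self._version: Optional[str] = None
        self._last_check: Optional[float] = None

    @property
    def version(self) -> Optional[str]:
        return self._version

    def get(self) -> Any:
        """Return the cached model, re-checking the alias at most once per interval."""
        now = self._clock()
        due = self._last_check is None or now - self._last_check >= self._interval
        if due:
            version = self._resolve_version()
            if version != self._version:
                # Load before swapping: an exception here leaves the previously
                # cached model and version unchanged.
                model = self._load_model(version)
                self._model, self._version = model, version
            # Only record the check once it succeeded, so failures retry next call.
            self._last_check = now
        return self._model
```

One cache instance per worker process could be created in a worker_process_init handler, with tasks calling `cache.get()` instead of holding a module-level model. The class is not thread-safe as written, which is acceptable for single-threaded prefork workers but would need a lock under a threaded pool.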

See baseline §1–§5 for code-level detail.