Train ↔ Serve Consistency Audit Report: SoccerPredictAI

Date: 2026-04-28
Auditor: GitHub Copilot (Claude Opus 4.7), /skill-ml-system-audit full (audit 06/12)
Scope: Skew between training, batch inference, and online serving feature paths
Baseline: docs/validation/20260424/06_train_serve_consistency_audit.md

Delta vs baseline

`src/features/`, `src/pipelines/{features,inference}.py`, `src/app/services/predict.py`, `src/app/tasks/predict.py`, and `src/app/routers/predict.py` are unchanged since 2026-04-26, so the baseline findings remain in force.

Confirmed paths

| Path | Feature source | Computation |
| --- | --- | --- |
| Training (`final_train`) | `features.parquet` + `features_meta.parquet` | offline DVC `feature_engineering` |
| Batch inference (`batch_inference`) | re-computed via `compute_all_match_features()` | same code as training (`build_team_match_table`, `add_rolling_features`, `to_match_level`, `compute_elo_ratings`, `select_model_features`) |
| Online (`POST /predict/`) | request body `features: dict` | client-supplied; no server-side recomputation |

All preprocessing (`StandardScaler`, `SimpleImputer`, `OneHotEncoder`) is encapsulated in the serialized sklearn `Pipeline`, which is loaded via `mlflow.pyfunc.load_model("models:/soccer_clf@champion")`. No manual preprocessing is performed at serving time.

Risk register (re-confirmed)

| ID | Severity | Description | Status |
| --- | --- | --- | --- |
| TS-01 | P1 | `POST /predict/` has no server-side feature computation and no schema validation against the model input | Open |
| TS-02 | P1 | No model hot-reload in the Celery worker when the `champion` alias changes; a manual worker restart is required | Open (= R3) |
| TS-03 | P2 | Batch-inference features are fresher than training features (they use more recent history), which causes a small systematic coverage skew | Open |
| TS-04 | P2 | `FeatureLookupService.get_features()` strips NaNs; `SimpleImputer` fills the gap silently | Open |
| TS-05 | P2 | No staleness check on `match_features.parquet` (no max-age guard), so stale predictions are served without alerting | Open (= R5) |
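
To make TS-01 and TS-04 concrete, the following is a minimal sketch of a server-side payload check, not the project's implementation. It assumes the expected column list is available from whatever records the training feature order (the report mentions `features_meta.parquet`). The names `align_features` and `FeatureSchemaError` are hypothetical. Treating absent keys as errors while only reporting explicit nulls is one possible policy among several.

```python
import math
from typing import Any, Mapping, Sequence


class FeatureSchemaError(ValueError):
    """Raised when a request payload does not match the expected columns."""


def _is_null(value: Any) -> bool:
    # None and float NaN are both treated as "missing value" here.
    return value is None or (isinstance(value, float) and math.isnan(value))


def align_features(
    features: Mapping[str, Any],
    expected_columns: Sequence[str],
) -> tuple[dict[str, Any], list[str]]:
    """Reorder a payload to the expected column order.

    Returns the ordered features plus the names of columns holding nulls,
    so the caller can log or reject them instead of imputing silently.
    """
    expected = set(expected_columns)
    missing = [c for c in expected_columns if c not in features]
    unexpected = sorted(k for k in features if k not in expected)
    if missing or unexpected:
        raise FeatureSchemaError(f"missing={missing} unexpected={unexpected}")

    # Build the row in training order; dicts preserve insertion order.
    ordered = {c: features[c] for c in expected_columns}
    null_columns = [c for c, v in ordered.items() if _is_null(v)]
    return ordered, null_columns
```

A router could call `align_features(request.features, expected_columns)` before scoring, return a 4xx response on `FeatureSchemaError`, and emit a metric for each entry in `null_columns` so that imputation is at least visible.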

Summary

| Aspect | Status |
| --- | --- |
| Training ↔ batch inference parity (code, params, selector) | ✅ |
| Preprocessing baked into serialized model | ✅ |
| Feature order via `features_meta.parquet` | ✅ |
| Server-side feature recomputation for online predict | ❌ (TS-01) |
| Hot-reload of registered model | ❌ (TS-02) |
| Staleness guard on batch features | ❌ (TS-05) |
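
The missing staleness guard (TS-05) could look roughly like the sketch below. It uses the file's modification time as the freshness signal, which is an assumption: the pipeline may expose a better one, such as a timestamp column inside the data. `assert_fresh`, `StaleFeaturesError`, and the max-age value are hypothetical and would need to be chosen from the batch schedule.

```python
import time
from pathlib import Path
from typing import Optional


class StaleFeaturesError(RuntimeError):
    """Raised when the feature file is older than the allowed maximum age."""


def assert_fresh(
    path: Path,
    max_age_seconds: float,
    now: Optional[float] = None,
) -> float:
    """Return the file age in seconds, or raise if it exceeds the limit.

    `now` is injectable (epoch seconds) so the check can be unit-tested
    without waiting for real time to pass.
    """
    current = time.time() if now is None else now
    age = current - path.stat().st_mtime
    if age > max_age_seconds:
        raise StaleFeaturesError(
            f"{path.name} is {age:.0f}s old; limit is {max_age_seconds:.0f}s"
        )
    return age
```

The serving layer would catch `StaleFeaturesError` and translate it into an HTTP 503, and the returned age can be exported as a gauge for alerting.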

Recommendation: TS-02 and TS-05 are the top operational risks. Minimal viable fixes: (a) for TS-02, a periodic alias check plus lazy reload in the worker's `worker_process_init` lifecycle (e.g. reload on SIGUSR1); (b) for TS-05, a `last_modified` SLO on `match_features.parquet`, with HTTP 503 returned when the file is stale.
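
Fix (a) can be sketched as a small holder that re-resolves the alias at most once per interval and reloads only when the resolved version changes. This is an illustration under assumptions, not the project's code: `LazyReloadingModel` is a hypothetical name, and the two callables are injected so the sketch stays independent of MLflow. In practice `resolve_version` might wrap `MlflowClient.get_model_version_by_alias` and `load_model` might wrap `mlflow.pyfunc.load_model`.

```python
import threading
import time
from typing import Any, Callable, Optional


class LazyReloadingModel:
    """Holds a model and reloads it lazily when the resolved version changes."""

    def __init__(
        self,
        resolve_version: Callable[[], str],
        load_model: Callable[[str], Any],
        check_interval_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolve_version = resolve_version
        self._load_model = load_model
        self._interval = check_interval_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._model: Any = None
        self._version: Optional[str] = None
        self._next_check: Optional[float] = None  # None forces a check on first use

    def get(self) -> Any:
        """Return the current model, checking the alias if the interval elapsed."""
        with self._lock:
            now = self._clock()
            if self._next_check is None or now >= self._next_check:
                version = self._resolve_version()
                if version != self._version:
                    # Load first, then swap, so a failed load keeps the old model.
                    self._model = self._load_model(version)
                    self._version = version
                self._next_check = now + self._interval
            return self._model
```

One holder per worker process could be created in a `worker_process_init` handler, with each task calling `get()` before predicting. A signal-triggered reload would instead reset the next-check time from the signal handler.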

See baseline §1–§5 for code-level detail.