# Audit Cycle Summary — SoccerPredictAI
Date: 2026-04-28
Auditor: GitHub Copilot (Claude Opus 4.7) — /skill-ml-system-audit full
Cycle scope: All 13 audits (00 → 12) — system, data, features, training, DVC/Hydra, MLflow, train/serve consistency, serving, orchestration, UI, ops/security/observability, docs/testing, docs validation.
Baseline reference: docs/validation/20260424/ (full prior cycle) + docs/validation/20260426/ (weekly check).
One-line verdict: Blockers present, but no production change since baseline — all 8 system risks (R1–R8) re-confirmed open; 2 doc↔code contradictions (one already at P0) require immediate action.
## 1. Per-audit reports
| # | Audit | Report | One-line outcome |
|---|---|---|---|
| 00 | System | 00_system_audit.md | Architecture unchanged; 8 risks (R1–R8) re-confirmed open |
| 01 | Data | 01_data_audit.md | 7 findings open; P0 D-01 (validate_interim contract drift) persists |
| 02 | Features | 02_feature_audit.md | Train/inference parity OK; 5 open issues; no server-side feature recompute online |
| 03 | Training & Eval | 03_training_evaluation_audit.md | 6 findings; 2× P0 (smoke fracs_for_train, n_trials=2) |
| 04 | Pipeline DVC + Hydra | 04_pipeline_dvc_hydra_audit.md | 15-stage DAG OK; Hydra unused; 7 open issues |
| 05 | MLflow Registry | 05_mlflow_registry_audit.md | Active experiment is matches_clf_smoke; no champion gate, no rollback |
| 06 | Train ↔ Serve | 06_train_serve_consistency_audit.md | Training↔batch parity ✅; online predict client-supplied; 5 open issues |
| 07 | Serving | 07_serving_audit.md | 13 endpoints; no auth on /predict/*; no model hot-reload; 7 open issues |
| 08 | Orchestration | 08_orchestration_audit.md | 3× P0 (no DAGs for dvc repro, batch_inference, export); no alerting |
| 09 | UI | 09_ui_audit.md | P0: src/ui/app/pages/ empty — predictions invisible to end users |
| 10 | Ops / Security / Obs | 10_ops_security_observability_audit.md | 8 findings; replicas=1, CORS *, Grafana/Evidently not deployed |
| 11 | Docs & Tests | 11_docs_testing_audit.md | ~294 tests; P0 contract test broken; status.md stale; UI claim wrong |
| 12 | Docs Validation | 12_docs_validation.md | 38 claims: 27 verified / 5 partial / 3 planned-correct / 2 contradictions / 1 unverified |
## 1a. Best-practices compliance scorecard
Methodology per SKILL.md §7: summary-table statuses map to ✅ = 1.0, ⚠/partial = 0.5, ❌ = 0.0; for audit 12, VERIFIED = 1.0, PLANNED-correct = 1.0, PARTIAL = 0.5, UNVERIFIED = 0.5, CONTRADICTION = 0.0.
Prior baseline: docs/validation/20260424/SUMMARY.md (full cycle) — recomputed retroactively using the same methodology.
| # | Audit | Current % | Prior (20260424) % | Δ |
|---|---|---|---|---|
| 00 | System | 91.7 | 91.7 | 0.0 |
| 01 | Data | 57.1 | 57.1 | 0.0 |
| 02 | Features | 50.0 | 50.0 | 0.0 |
| 03 | Training & Eval | 57.1 | 57.1 | 0.0 |
| 04 | Pipeline DVC + Hydra | 66.7 | 66.7 | 0.0 |
| 05 | MLflow Registry | 62.5 | 62.5 | 0.0 |
| 06 | Train ↔ Serve | 50.0 | 50.0 | 0.0 |
| 07 | Serving | 61.1 | 61.1 | 0.0 |
| 08 | Orchestration | 60.0 | 60.0 | 0.0 |
| 09 | UI | 60.0 | 60.0 | 0.0 |
| 10 | Ops / Security / Obs | 44.4 | 44.4 | 0.0 |
| 11 | Docs & Tests | 41.7 | 41.7 | 0.0 |
| 12 | Docs Validation | 86.8 | n/a (new audit) | — |
| — | Overall | 60.7 | 58.5 (12 audits) | +2.2 (⚠ apparent only) |
Interpretation:
- No regressions and no real improvements — production code is unchanged since baseline. All 00–11 deltas are exactly 0.0.
- The +2.2 overall delta is purely an artifact of including audit 12 (newly introduced this cycle) in the mean. When restricted to the 12 baseline audits, the overall score is unchanged at 58.5%.
- No subsystem improved or regressed; the cycle re-confirms the prior state.
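The scorecard arithmetic above is mechanical enough to pin down in code. A minimal sketch, assuming the status labels below (the repo's actual enum names may differ), maps each status to its SKILL.md §7 weight and averages:

```python
# Hedged sketch of the SKILL.md §7 scoring rule; the status strings are
# illustrative assumptions, not the repo's real labels.
STATUS_WEIGHTS = {
    "pass": 1.0,             # ✅
    "partial": 0.5,          # ⚠ / partial (same weight for audit-12 PARTIAL)
    "fail": 0.0,             # ❌
    # audit-12 claim statuses
    "verified": 1.0,
    "planned-correct": 1.0,
    "unverified": 0.5,
    "contradiction": 0.0,
}

def audit_score(statuses: list[str]) -> float:
    """Compliance % for one audit: mean of the per-item weights."""
    weights = [STATUS_WEIGHTS[s.lower()] for s in statuses]
    return round(100 * sum(weights) / len(weights), 1)

def overall_score(per_audit: dict[str, float]) -> float:
    """Unweighted mean across audits, as used for the Overall row."""
    return round(sum(per_audit.values()) / len(per_audit), 1)
```

With audit 12's 38 claims (27 verified, 5 partial, 3 planned-correct, 2 contradictions, 1 unverified) this reproduces the 86.8% figure in the table above.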
## 2. Consolidated risk register (P0 / P1, deduplicated)
Each row deduplicated by underlying cause. "Aliases" lists every per-audit ID that maps to the same root cause.
| Rank | Severity | ID | Title | Aliases | Owning audits | Status vs baseline |
|---|---|---|---|---|---|---|
| 1 | P0 | R1 | `params.yaml` in smoke mode (`fracs_for_train=[0.001, 0.002]`, `tuning.n_trials=2`) | TR-01, TR-02 | 00, 03 | re-confirmed |
| 2 | P0 | R2 | No automated retrain trigger; `dvc repro` is manual | OR-01 | 00, 08 | re-confirmed |
| 3 | P0 | R5 | No DAG for `batch_inference` → serving features go stale silently | OR-02, TS-05 | 00, 06, 08 | re-confirmed |
| 4 | P0 | D-01 | `validate_interim` referenced in contract test but absent from `dvc.yaml` (CI-red) | DOC-01, C-02 | 01, 11, 12 | re-confirmed |
| 5 | P0 | D-03 | `etl_export_matches_to_source` is manual — no automatic raw-data refresh | OR-03 | 01, 08 | re-confirmed |
| 6 | P0 | UI-01 | `src/ui/app/pages/` empty — no prediction UI; `status.md` claims otherwise (C-01) | DOC-02, UI-02 | 09, 11, 12 | re-confirmed (now also a documented contradiction) |
| 7 | P1 | R3 | No model hot-reload on `champion` alias change; manual worker restart required | TS-02, SRV-02, ML-03 | 00, 05, 06, 07 | re-confirmed |
| 8 | P1 | R6 | No metric/champion-vs-challenger gate before promotion to `champion` | ML-01, ML-02 | 00, 05 | re-confirmed |
| 9 | P1 | R8 | All components `replicas=1`; HPA template disabled | OPS-01, OPS-06 | 00, 10 | re-confirmed |
| 10 | P1 | SRV-01 | `/predict/*` and `/monitoring/*` unauthenticated | OPS-02 | 07, 10 | re-confirmed |
| 11 | P1 | OPS-03 | CORS `allow_origins=["*"]` | — | 10 | re-confirmed |
| 12 | P1 | R7 | No drift detection (`src/monitoring/` empty; Evidently not wired) | OPS-05 | 00, 10 | re-confirmed |
| 13 | P1 | OPS-04 | Grafana not deployed in K8s | — | 10 | re-confirmed |
| 14 | P1 | OR-04 | No DAG-level alerting (email/Slack) on failures | — | 08 | re-confirmed |
| 15 | P1 | OR-05 | No automatic retrain trigger (drift / metric / volume) | — | 08 | re-confirmed |
| 16 | P1 | TR-03 | Holdout used for model selection in `classification_models` (not blind) | — | 03 | re-confirmed |
| 17 | P1 | TR-04 | `ablation_study` isolated from the `tune_xgb`/`final_train` DAG path | — | 03 | re-confirmed |
| 18 | P1 | F-01 | `POST /predict/` has no server-side feature contract enforcement | TS-01 | 02, 06 | re-confirmed |
| 19 | P1 | F-02 | No runtime check that `classification.window_sizes` ⊆ `features.window_sizes` | — | 02 | re-confirmed |
| 20 | P1 | D-02 | `match.parquet` declared as a DVC out but has no downstream consumer | P-05 | 01, 04 | re-confirmed |
| 21 | P1 | D-04 | MinIO ETag for multipart objects is not a content hash → silent skip risk | — | 01 | re-confirmed |
| 22 | P1 | R4 | `stats.py` router not registered in `main.py` (dead endpoint) | — | 00 | re-confirmed |
| 23 | P1 | P-01 | `split_data` dep references `src/pipelines/validation.py` (likely misname → splitting changes don't invalidate DVC) | — | 04 | re-confirmed |
| 24 | P1 | P-02 | `ablation_study` lacks a dep on `test_ids.parquet` | — | 04 | re-confirmed |
| 25 | P1 | P-03 | `validate_*` stages lack deps on `src/data_quality/*.py` (GE edits don't trigger re-runs) | — | 04 | re-confirmed |
| 26 | P1 | P-04 | Hydra `conf/` shipped but unused — parallel config sources | — | 04 | re-confirmed |
| 27 | P1 | ML-01 | All runs land in `matches_clf_smoke`; no production experiment switch | — | 05 | re-confirmed |
| 28 | P1 | UI-02 | `APIClient` lacks `/predict/*` methods | — | 09 | re-confirmed (component of #6) |
P2/P3 findings (~20 items: SRV-03..07, OPS-07..08, TR-05..06, TS-03..04, ML-04..05, F-03..05, D-05..07, P-06..07, OR-06..07, UI-03..05, DOC-03..06, claim #9, claim #37, c4-containers wording) are listed in their respective audits and not duplicated here.
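Row 19 (F-02) is the kind of guard that is cheap to add at process startup. A minimal sketch, assuming `params.yaml` is already parsed into a dict exposing `classification.window_sizes` and `features.window_sizes` (the exact schema is an assumption):

```python
# Sketch of the F-02 guard: fail fast if the classifier requests rolling
# windows the feature-build stage never produces. Config keys are assumed,
# not the repo's verified schema.
def check_window_sizes(params: dict) -> None:
    clf = set(params["classification"]["window_sizes"])
    feat = set(params["features"]["window_sizes"])
    missing = clf - feat
    if missing:
        raise ValueError(
            f"classification.window_sizes {sorted(missing)} not produced by "
            f"features.window_sizes {sorted(feat)}"
        )
```

Called once at training and serving startup, this turns a silent feature mismatch into an immediate, attributable failure.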
## 3. Delta vs prior baseline (2026-04-24 / 2026-04-26)

- Production code: unchanged (no commits since `c64561d`, dated 2026-04-22; `find -newer` over `src/`, `airflow/`, `dvc.yaml`, `params.yaml`, `docker/` returns empty).
- Resolved risks: none.
- New risks: none in code.
- New documentation findings (introduced this cycle by the new audit 12):
  - C-01 — the `docs/status.md` Streamlit UI claim contradicts the code (now a formal contradiction; previously surfaced under DOC-02 / UI-01).
  - C-02 — `tests/contract/test_pipeline_contracts.py::EXPECTED_STAGES` ↔ `dvc.yaml` drift (formalises D-01 / DOC-01 as a doc-truth violation).
  - Overstatement #9 — the `docs/status.md` "Model Training" line lacks a "smoke parameters active by default" caveat.
  - Wording #37 — `docs/status.md` says GE runs at "raw / interim / features"; the actual gates are at raw / finished / future / features.
- Stale docs: the `docs/status.md` last-reviewed date is 2026-04-19.
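C-02 exists because both sides of the contract are hard-coded. One way to make the test drift-proof, sketched here with an illustrative helper (`stage_drift` is not in the repo), is to diff the expected stage set against the stages actually parsed from `dvc.yaml`:

```python
# Hedged sketch: report drift between a contract test's expected stage list
# and the parsed dvc.yaml, instead of hard-coding both sides independently.
def stage_drift(expected: set[str], dvc_config: dict) -> tuple[set[str], set[str]]:
    """Return (stages expected but absent from dvc.yaml,
               stages in dvc.yaml but not expected)."""
    actual = set(dvc_config.get("stages", {}))
    return expected - actual, actual - expected
```

A pytest around this helper would load `dvc.yaml` with a YAML parser and assert both returned sets are empty, so the D-01 class of failure becomes a single, self-explaining diff.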
## 4. Top must-fix items (≤ 10, ordered)

1. Fix the CI-red contract test (D-01 / DOC-01 / C-02) — remove `validate_interim` from `EXPECTED_STAGES` in `tests/contract/test_pipeline_contracts.py`, OR add the missing DVC stage.
2. Restore production training params (R1 / TR-01 / TR-02) — set `classification.fracs_for_train` to a real range and `tuning.n_trials` to a meaningful value before any new model deployment.
3. Reconcile the UI claim with the code (UI-01 / DOC-02 / C-01) — either implement the prediction page (and the `APIClient` methods) or downgrade the `docs/status.md` Streamlit UI line to "livescores only; predictions Planned".
4. Add a freshness guard on `match_features.parquet` (R5 / TS-05) — a `last_modified` SLO with HTTP 503 on staleness.
5. Add model hot-reload on `champion` alias change (R3 / TS-02 / SRV-02) — a periodic alias check plus lazy reload in the worker lifecycle, or a SIGUSR1 trigger.
6. Add a champion-vs-challenger gate before promotion (R6 / ML-01 / ML-02) — block `register_model` from setting `champion` on metric regression.
7. Automate the retrain loop (R2 / OR-01..03) — a single DAG: export → repro → batch_inference → registry validation → worker rolling-restart.
8. Add auth and tighten CORS on inference endpoints (SRV-01 / OPS-02 / OPS-03) — token middleware on `/predict/*` and `/monitoring/*`; restrict CORS origins.
9. Wire up DVC validation deps (P-01, P-03) — fix the `split_data` dep to point at `src/data/splitting.py`; add `src/data_quality/*.py` to all `validate_*` stages.
10. Refresh `docs/status.md` — update the last-reviewed date, add a "smoke params active" caveat to the Model Training line, and correct the GE-gate wording to "raw / finished / future / features".
## 5. Open questions / unverified areas

- Current MLflow champion (`soccer_clf@champion`) — actual run, metrics, lineage tags. Requires a live tracking-server query; not validated by this read-only cycle.
- Real cadence of `batch_inference` — last run timestamp on `match_features.parquet`. Requires a live MinIO listing.
- `docs/evidence/` — placeholder check (e.g. that CLI/JSON outputs are real, not stubs). Deferred from audit 12 STEP 6; a dedicated targeted audit is recommended.
- Celery worker liveness probe — whether `celery inspect ping` is wired correctly (audit 10 OPS-07).
- Dev-only `docker/grafana/` and `docker/Dockerfile.evidently` — untracked; promotion to K8s would require re-running audit 10.
- Unstaged edits to `docs/ml/mlflow.md`, `docs/ml/training-pipeline.md`, and `docs/serving/deployment.md` — content not re-validated this cycle (no edits since the previous audit run); recommended for the next cycle if committed.
## 6. Recommendation
Since no production change occurred since 2026-04-22, the next material full cycle should be triggered by an actual commit to src/, dvc.yaml, airflow/, or docker/. Until then, the team should prioritise the must-fix items above (especially the CI-red item #1 and the documentation contradictions in items #3 and #10), all of which can be fixed with surgical, minimal-scope patches that do not require re-running this audit cycle.