
Audit Cycle Summary — SoccerPredictAI

Date: 2026-04-28
Auditor: GitHub Copilot (Claude Opus 4.7) — /skill-ml-system-audit full
Cycle scope: all 13 audits (00 → 12) — system, data, features, training, DVC/Hydra, MLflow, train/serve consistency, serving, orchestration, UI, ops/security/observability, docs/testing, docs validation.
Baseline reference: docs/validation/20260424/ (full prior cycle) + docs/validation/20260426/ (weekly check).
One-line verdict: blockers present, but no production change since baseline — all 8 system risks (R1–R8) re-confirmed open; 2 doc↔code contradictions (one already at P0) require immediate action.


1. Per-audit reports

| #  | Audit                | Report                                  | One-line outcome |
|----|----------------------|-----------------------------------------|------------------|
| 00 | System               | 00_system_audit.md                      | Architecture unchanged; 8 risks (R1–R8) re-confirmed open |
| 01 | Data                 | 01_data_audit.md                        | 7 findings open; P0 D-01 (validate_interim contract drift) persists |
| 02 | Features             | 02_feature_audit.md                     | Train/inference parity OK; 5 open issues; no server-side feature recompute online |
| 03 | Training & Eval      | 03_training_evaluation_audit.md         | 6 findings; 2× P0 (smoke fracs_for_train, n_trials=2) |
| 04 | Pipeline DVC + Hydra | 04_pipeline_dvc_hydra_audit.md          | 15-stage DAG OK; Hydra unused; 7 open issues |
| 05 | MLflow Registry      | 05_mlflow_registry_audit.md             | Active experiment is matches_clf_smoke; no champion gate, no rollback |
| 06 | Train ↔ Serve        | 06_train_serve_consistency_audit.md     | Training↔batch parity ✅; online predict client-supplied; 5 open issues |
| 07 | Serving              | 07_serving_audit.md                     | 13 endpoints; no auth on /predict/*; no model hot-reload; 7 open issues |
| 08 | Orchestration        | 08_orchestration_audit.md               | 3× P0 (no DAGs for dvc repro, batch_inference, export); no alerting |
| 09 | UI                   | 09_ui_audit.md                          | P0: src/ui/app/pages/ empty — predictions invisible to end users |
| 10 | Ops / Security / Obs | 10_ops_security_observability_audit.md  | 8 findings; replicas=1, CORS *, Grafana/Evidently not deployed |
| 11 | Docs & Tests         | 11_docs_testing_audit.md                | ~294 tests; P0 contract test broken; status.md stale; UI claim wrong |
| 12 | Docs Validation      | 12_docs_validation.md                   | 38 claims: 27 verified / 5 partial / 3 planned-correct / 2 contradictions / 1 unverified |

1a. Best-practices compliance scorecard

Methodology per SKILL.md §7 (Summary-table mapping ✅=1.0, ⚠ / partial=0.5, ❌=0.0; for audit 12 VERIFIED=1.0, PLANNED-correct=1.0, PARTIAL=0.5, UNVERIFIED=0.5, CONTRADICTION=0.0).

Prior baseline: docs/validation/20260424/SUMMARY.md (full cycle) — recomputed retroactively using the same methodology.
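
For readers reproducing the table, a minimal sketch of the §7 scoring in Python; the WEIGHTS mapping follows the methodology above, while the 3/2/4 mark split is a hypothetical example chosen to reproduce audit 10's 44.4%, not that audit's real check counts.

```python
# Scoring sketch per SKILL.md §7: mean of per-check weights as a percentage.
WEIGHTS = {"pass": 1.0, "partial": 0.5, "fail": 0.0}  # i.e. ✅ / ⚠ / ❌

def compliance_pct(marks: list[str]) -> float:
    """Mean of the per-check weights, rounded to one decimal place."""
    return round(100 * sum(WEIGHTS[m] for m in marks) / len(marks), 1)

# Hypothetical 3/2/4 split that happens to reproduce audit 10's 44.4%.
assert compliance_pct(["pass"] * 3 + ["partial"] * 2 + ["fail"] * 4) == 44.4
```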

| #  | Audit                | Current % | Prior (20260424) % | Δ |
|----|----------------------|-----------|--------------------|---|
| 00 | System               | 91.7      | 91.7               | 0.0 |
| 01 | Data                 | 57.1      | 57.1               | 0.0 |
| 02 | Features             | 50.0      | 50.0               | 0.0 |
| 03 | Training & Eval      | 57.1      | 57.1               | 0.0 |
| 04 | Pipeline DVC + Hydra | 66.7      | 66.7               | 0.0 |
| 05 | MLflow Registry      | 62.5      | 62.5               | 0.0 |
| 06 | Train ↔ Serve        | 50.0      | 50.0               | 0.0 |
| 07 | Serving              | 61.1      | 61.1               | 0.0 |
| 08 | Orchestration        | 60.0      | 60.0               | 0.0 |
| 09 | UI                   | 60.0      | 60.0               | 0.0 |
| 10 | Ops / Security / Obs | 44.4      | 44.4               | 0.0 |
| 11 | Docs & Tests         | 41.7      | 41.7               | 0.0 |
| 12 | Docs Validation      | 86.8      | n/a (new audit)    | n/a |
| —  | Overall              | 60.7      | 58.5 (12 audits)   | +2.2 ⚠ (apparent only; see interpretation below) |

Interpretation:

  • No regressions and no real improvements — production code is unchanged since baseline, and all 00–11 deltas are exactly 0.0.
  • The +2.2 overall delta is purely an artifact of including audit 12 (newly introduced this cycle) in the mean: the twelve baseline audits sum to 702.3 points (mean 58.5%), and adding audit 12's 86.8% lifts the 13-audit mean to 60.7%. Restricted to the 12 baseline audits, the overall score is unchanged at 58.5%.
  • No subsystem improved or regressed; the cycle re-confirms the prior state.


2. Consolidated risk register (P0 / P1, deduplicated)

Each row deduplicated by underlying cause. "Aliases" lists every per-audit ID that maps to the same root cause.

| Rank | Severity | ID | Title | Aliases | Owning audits | Status vs baseline |
|------|----------|----|-------|---------|---------------|--------------------|
| 1 | P0 | R1 | params.yaml in smoke mode (fracs_for_train=[0.001, 0.002], tuning.n_trials=2) | TR-01, TR-02 | 00, 03 | re-confirmed |
| 2 | P0 | R2 | No automated retrain trigger; dvc repro is manual | OR-01 | 00, 08 | re-confirmed |
| 3 | P0 | R5 | No DAG for batch_inference → serving features go stale silently | OR-02, TS-05 | 00, 06, 08 | re-confirmed |
| 4 | P0 | D-01 | validate_interim referenced in contract test but absent from dvc.yaml (CI-red) | DOC-01, C-02 | 01, 11, 12 | re-confirmed |
| 5 | P0 | D-03 | etl_export_matches_to_source is manual — no automatic raw-data refresh | OR-03 | 01, 08 | re-confirmed |
| 6 | P0 | UI-01 | src/ui/app/pages/ empty — no prediction UI; status.md claims otherwise (C-01) | DOC-02, UI-02 | 09, 11, 12 | re-confirmed (now also a documented contradiction) |
| 7 | P1 | R3 | No model hot-reload on champion alias change; manual worker restart required | TS-02, SRV-02, ML-03 | 00, 05, 06, 07 | re-confirmed |
| 8 | P1 | R6 | No metric/champion-vs-challenger gate before promotion to champion | ML-01, ML-02 | 00, 05 | re-confirmed |
| 9 | P1 | R8 | All components replicas=1; HPA template disabled | OPS-01, OPS-06 | 00, 10 | re-confirmed |
| 10 | P1 | SRV-01 | /predict/* and /monitoring/* unauthenticated | OPS-02 | 07, 10 | re-confirmed |
| 11 | P1 | OPS-03 | CORS allow_origins=["*"] | — | 10 | re-confirmed |
| 12 | P1 | R7 | No drift detection (src/monitoring/ empty; Evidently not wired) | OPS-05 | 00, 10 | re-confirmed |
| 13 | P1 | OPS-04 | Grafana not deployed in K8s | — | 10 | re-confirmed |
| 14 | P1 | OR-04 | No DAG-level alerting (email/Slack) on failures | — | 08 | re-confirmed |
| 15 | P1 | OR-05 | No automatic retrain trigger (drift / metric / volume) | — | 08 | re-confirmed |
| 16 | P1 | TR-03 | Holdout used for model selection in classification_models (not blind) | — | 03 | re-confirmed |
| 17 | P1 | TR-04 | ablation_study isolated from the tune_xgb/final_train DAG path | — | 03 | re-confirmed |
| 18 | P1 | F-01 | POST /predict/ has no server-side feature contract enforcement | TS-01 | 02, 06 | re-confirmed |
| 19 | P1 | F-02 | No runtime check that classification.window_sizes ⊆ features.window_sizes | — | 02 | re-confirmed |
| 20 | P1 | D-02 | match.parquet declared as a DVC out but has no downstream consumer | P-05 | 01, 04 | re-confirmed |
| 21 | P1 | D-04 | MinIO ETag for multipart objects is not a content hash → silent skip risk | — | 01 | re-confirmed |
| 22 | P1 | R4 | stats.py router not registered in main.py (dead endpoint) | — | 00 | re-confirmed |
| 23 | P1 | P-01 | split_data dep references src/pipelines/validation.py (likely misnamed → splitting changes don't invalidate DVC) | — | 04 | re-confirmed |
| 24 | P1 | P-02 | ablation_study lacks a dep on test_ids.parquet | — | 04 | re-confirmed |
| 25 | P1 | P-03 | validate_* stages lack deps on src/data_quality/*.py (GE edits don't trigger re-runs) | — | 04 | re-confirmed |
| 26 | P1 | P-04 | Hydra conf/ shipped but unused — parallel config sources | — | 04 | re-confirmed |
| 27 | P1 | ML-01 | All runs land in matches_clf_smoke; no production experiment switch | — | 05 | re-confirmed |
| 28 | P1 | UI-02 | APIClient lacks /predict/* methods | — | 09 | re-confirmed (component of #6) |

P2/P3 findings (~20 items: SRV-03..07, OPS-07..08, TR-05..06, TS-03..04, ML-04..05, F-03..05, D-05..07, P-06..07, OR-06..07, UI-03..05, DOC-03..06, claim #9, claim #37, c4-containers wording) are listed in their respective audits and not duplicated here.


3. Delta vs prior baseline (2026-04-24 / 2026-04-26)

  • Production code: unchanged (no commits since c64561d dated 2026-04-22; find -newer over src/, airflow/, dvc.yaml, params.yaml, docker/ returns empty).
  • Resolved risks: none.
  • New risks: none in code.
  • New documentation findings (introduced this cycle by the new audit 12):
      • C-01 — the docs/status.md Streamlit UI claim contradicts the code (now a formal contradiction; previously surfaced under DOC-02 / UI-01).
      • C-02 — tests/contract/test_pipeline_contracts.py::EXPECTED_STAGES has drifted from dvc.yaml (formalises D-01 / DOC-01 as a doc-truth violation).
      • Overstatement #9 — the docs/status.md "Model Training" line lacks a "smoke parameters active by default" caveat.
      • Wording #37 — docs/status.md says GE runs at "raw / interim / features"; the actual gates are at raw / finished / future / features.
  • Stale docs: docs/status.md last-reviewed date is 2026-04-19.

4. Top must-fix items (≤ 10, ordered)

  1. Fix the CI-red contract test (D-01 / DOC-01 / C-02) — remove validate_interim from EXPECTED_STAGES in tests/contract/test_pipeline_contracts.py, or add the missing DVC stage (sketched below).
  2. Restore production training params (R1 / TR-01 / TR-02) — set classification.fracs_for_train to a real range and tuning.n_trials to a meaningful value before any new model deployment (guard sketched below).
  3. Reconcile the UI claim with the code (UI-01 / DOC-02 / C-01) — either implement the prediction page (and the missing APIClient methods) or downgrade the docs/status.md Streamlit UI line to "livescores only; predictions Planned".
  4. Add a freshness guard on match_features.parquet (R5 / TS-05) — a last_modified SLO answered with HTTP 503 on staleness (sketched below).
  5. Add model hot-reload on champion alias change (R3 / TS-02 / SRV-02) — periodic alias check plus lazy reload in the worker lifecycle, or a SIGUSR1 trigger (sketched below).
  6. Add a champion-vs-challenger gate before promotion (R6 / ML-01 / ML-02) — block register_model from setting champion on a metric regression (sketched below).
  7. Automate the retrain loop (R2 / OR-01..03) — a single DAG: export → repro → batch_inference → registry validation → worker rolling restart (sketched below).
  8. Add auth and tighten CORS on inference endpoints (SRV-01 / OPS-02 / OPS-03) — token middleware on /predict/* and /monitoring/*; restricted CORS origins (sketched below).
  9. Wire DVC validation deps (P-01, P-03) — fix the split_data dep to point at src/data/splitting.py; add src/data_quality/*.py as deps of all validate_* stages.
  10. Refresh docs/status.md — update the last-reviewed date, add a "smoke params active" caveat to the Model Training line, and correct the GE-gate wording to "raw / finished / future / features".
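
Item 1, a minimal sketch of option A, assuming the test reads dvc.yaml directly; the stage names other than validate_interim are illustrative (audit 04 lists 15 real stages):

```python
# tests/contract/test_pipeline_contracts.py (sketch) for D-01 / DOC-01 / C-02.
# validate_interim is dropped; the contract list is then asserted against what
# dvc.yaml actually declares, so any future drift is a loud CI failure.
from pathlib import Path

import yaml

# Illustrative subset; the real list has 15 stages per audit 04.
EXPECTED_STAGES = {"split_data", "tune_xgb", "final_train", "register_model"}

def test_expected_stages_exist_in_dvc_yaml() -> None:
    declared = set(yaml.safe_load(Path("dvc.yaml").read_text())["stages"])
    missing = EXPECTED_STAGES - declared
    assert not missing, f"dvc.yaml is missing contract stages: {sorted(missing)}"
```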
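
Item 2, beyond editing params.yaml itself, a cheap guard can fail the training entrypoint while smoke values remain active. The key paths mirror R1; the 0.1 / 20 thresholds are assumptions to tune:

```python
# Guard sketch for R1: refuse to run final_train while smoke params are live.
# classification.fracs_for_train and tuning.n_trials are the keys named in the
# audit; the threshold values below are illustrative assumptions.
from pathlib import Path

import yaml

def assert_not_smoke(params_path: str = "params.yaml") -> None:
    params = yaml.safe_load(Path(params_path).read_text())
    fracs = params["classification"]["fracs_for_train"]
    n_trials = params["tuning"]["n_trials"]
    if max(fracs) < 0.1 or n_trials < 20:
        raise RuntimeError(
            f"Smoke params active: fracs_for_train={fracs}, n_trials={n_trials}"
        )
```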
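
Item 4, a freshness-guard sketch shown for a local file; for MinIO-hosted features, compare the object's last_modified metadata instead. The path and the 6-hour SLO are assumptions:

```python
# Freshness guard sketch for R5 / TS-05: reject online predictions when the
# serving feature file is older than the SLO.
import os
import time

from fastapi import Depends, FastAPI, HTTPException

FEATURES_PATH = "data/features/match_features.parquet"  # assumed location
MAX_AGE_SECONDS = 6 * 3600  # assumed SLO

def ensure_features_fresh() -> None:
    """FastAPI dependency: 503 while match_features.parquet is stale."""
    age = time.time() - os.path.getmtime(FEATURES_PATH)
    if age > MAX_AGE_SECONDS:
        raise HTTPException(503, detail=f"serving features stale for {age:.0f}s")

app = FastAPI()

@app.post("/predict/", dependencies=[Depends(ensure_features_fresh)])
async def predict(payload: dict) -> dict:
    ...  # existing prediction handler
```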
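
Item 5, a polling sketch using standard MLflow alias APIs; the model name soccer_clf@champion comes from this report, while the 60-second interval and loader wiring are assumptions (a SIGUSR1 handler is the signal-based alternative):

```python
# Hot-reload sketch for R3: poll the champion alias and lazily reload the
# model when the backing version changes.
import threading

import mlflow
from mlflow import MlflowClient

MODEL_NAME, ALIAS, POLL_SECONDS = "soccer_clf", "champion", 60

class ChampionWatcher:
    def __init__(self) -> None:
        self._client = MlflowClient()
        self._version: str | None = None
        self.model = None

    def _tick(self) -> None:
        mv = self._client.get_model_version_by_alias(MODEL_NAME, ALIAS)
        if mv.version != self._version:  # alias moved -> reload in place
            self.model = mlflow.pyfunc.load_model(f"models:/{MODEL_NAME}@{ALIAS}")
            self._version = mv.version
        timer = threading.Timer(POLL_SECONDS, self._tick)
        timer.daemon = True  # don't block worker shutdown
        timer.start()

    def start(self) -> None:
        self._tick()  # call once from the worker startup hook
```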
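
Item 6, a promotion-gate sketch that register_model would call instead of setting the champion alias unconditionally. The metric key "f1_macro" and higher-is-better direction are assumptions; the alias APIs are standard MLflow:

```python
# Promotion gate sketch for R6: move the champion alias only when the
# challenger beats the current champion on the gating metric.
from mlflow import MlflowClient

def promote_if_better(name: str, challenger_version: str,
                      metric: str = "f1_macro") -> bool:
    client = MlflowClient()
    champ_score = float("-inf")
    try:
        champion = client.get_model_version_by_alias(name, "champion")
        champ_score = client.get_run(champion.run_id).data.metrics[metric]
    except Exception:
        pass  # no champion yet -> first candidate promotes unconditionally
    challenger = client.get_model_version(name, challenger_version)
    score = client.get_run(challenger.run_id).data.metrics[metric]
    if score <= champ_score:
        return False  # block promotion on metric regression
    client.set_registered_model_alias(name, "champion", challenger_version)
    return True
```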
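
Item 7, a single-DAG sketch mirroring the stage order above. Only the stage names come from this report; the entrypoint commands, module paths, and 03:00 schedule are assumptions (Airflow 2.4+ syntax):

```python
# Retrain-loop DAG sketch for R2 / OR-01..03: export -> dvc repro ->
# batch_inference -> registry validation -> worker rolling restart.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG("retrain_loop", start_date=datetime(2026, 1, 1),
         schedule="0 3 * * *", catchup=False) as dag:
    export = BashOperator(task_id="export_matches",
                          bash_command="python -m src.etl.export_matches_to_source")
    repro = BashOperator(task_id="dvc_repro", bash_command="dvc repro")
    infer = BashOperator(task_id="batch_inference",
                         bash_command="python -m src.pipelines.batch_inference")
    gate = BashOperator(task_id="registry_validation",
                        bash_command="python -m src.ml.registry_gate")
    rollout = BashOperator(task_id="worker_rolling_restart",
                           bash_command="kubectl rollout restart deployment/worker")
    export >> repro >> infer >> gate >> rollout
```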
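
Item 8, a token-middleware and CORS sketch; the env var name, bearer scheme, and origin URL are assumptions to replace with the real deployment values:

```python
# Auth + CORS sketch for SRV-01 / OPS-02 / OPS-03: static bearer token on
# /predict/* and /monitoring/*, CORS restricted to an explicit allow-list.
import os
import secrets

from fastapi import FastAPI, Request
from fastapi.middleware.cors import CORSMiddleware
from starlette.responses import JSONResponse

app = FastAPI()
app.add_middleware(CORSMiddleware,
                   allow_origins=["https://ui.example.internal"],  # not "*"
                   allow_methods=["GET", "POST"],
                   allow_headers=["Authorization"])

@app.middleware("http")
async def require_token(request: Request, call_next):
    if request.url.path.startswith(("/predict", "/monitoring")):
        supplied = request.headers.get("Authorization", "").removeprefix("Bearer ")
        expected = os.environ.get("API_TOKEN", "")
        if not (expected and secrets.compare_digest(supplied, expected)):
            return JSONResponse({"detail": "missing or invalid token"},
                                status_code=401)
    return await call_next(request)
```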

5. Open questions / unverified areas

  • Current MLflow champion (soccer_clf@champion) — actual run, metrics, lineage tags. Requires live tracking-server query; not validated by this read-only cycle.
  • Real cadence of batch_inference — last run timestamp on match_features.parquet. Requires live MinIO listing.
  • docs/evidence/ — placeholder check (confirm the CLI/JSON outputs are real captures, not stubs). Deferred from audit 12 STEP 6; a dedicated targeted audit is recommended.
  • Celery worker liveness probe — whether celery inspect ping is wired correctly (audit 10 OPS-07).
  • Dev-only docker/grafana/ and docker/Dockerfile.evidently — untracked; promotion to K8s would require re-running audit 10.
  • Unstaged edits to docs/ml/mlflow.md, docs/ml/training-pipeline.md, docs/serving/deployment.md — content not re-validated this cycle (no edits since previous audit run); recommended for next cycle if committed.

6. Recommendation

Since no production change occurred since 2026-04-22, the next material full cycle should be triggered by an actual commit to src/, dvc.yaml, airflow/, or docker/. Until then, the team should prioritise the must-fix items above (especially the CI-red item #1 and the documentation contradictions in items #3 and #10), all of which can be fixed with surgical, minimal-scope patches that do not require re-running this audit cycle.