
Audit Cycle Summary — SoccerPredictAI

Date: 2026-04-28
Auditor: GitHub Copilot (Claude Opus 4.7) — /skill-ml-system-audit full
Cycle scope: all 13 audits (00 → 12) — system, data, features, training, DVC/Hydra, MLflow, train/serve consistency, serving, orchestration, UI, ops/security/observability, docs/testing, docs validation.
Baseline reference: docs/validation/20260424/ (full prior cycle) + docs/validation/20260426/ (weekly check).
One-line verdict: blockers present, but no production change since baseline — all 8 system risks (R1–R8) re-confirmed open; 2 doc↔code contradictions (one already at P0) require immediate action.


1. Per-audit reports

| #  | Audit                | Report                                  | One-line outcome |
|----|----------------------|-----------------------------------------|------------------|
| 00 | System               | 00_system_audit.md                      | Architecture unchanged; 8 risks (R1–R8) re-confirmed open |
| 01 | Data                 | 01_data_audit.md                        | 7 findings open; P0 D-01 (validate_interim contract drift) persists |
| 02 | Features             | 02_feature_audit.md                     | Train/inference parity OK; 5 open issues; no server-side feature recompute online |
| 03 | Training & Eval      | 03_training_evaluation_audit.md         | 6 findings; 2× P0 (smoke fracs_for_train, n_trials=2) |
| 04 | Pipeline DVC + Hydra | 04_pipeline_dvc_hydra_audit.md          | 15-stage DAG OK; Hydra unused; 7 open issues |
| 05 | MLflow Registry      | 05_mlflow_registry_audit.md             | Active experiment is matches_clf_smoke; no champion gate, no rollback |
| 06 | Train ↔ Serve        | 06_train_serve_consistency_audit.md     | Training↔batch parity ✅; online predict client-supplied; 5 open issues |
| 07 | Serving              | 07_serving_audit.md                     | 13 endpoints; no auth on /predict/*; no model hot-reload; 7 open issues |
| 08 | Orchestration        | 08_orchestration_audit.md               | 3× P0 (no DAGs for dvc repro, batch_inference, export); no alerting |
| 09 | UI                   | 09_ui_audit.md                          | P0: src/ui/app/pages/ empty — predictions invisible to end users |
| 10 | Ops / Security / Obs | 10_ops_security_observability_audit.md  | 8 findings; replicas=1, CORS *, Grafana/Evidently not deployed |
| 11 | Docs & Tests         | 11_docs_testing_audit.md                | ~294 tests; P0 contract test broken; status.md stale; UI claim wrong |
| 12 | Docs Validation      | 12_docs_validation.md                   | 38 claims: 27 verified / 5 partial / 3 planned-correct / 2 contradictions / 1 unverified |

1a. Best-practices compliance scorecard

Methodology per SKILL.md §7 (Summary-table mapping ✅=1.0, ⚠ / partial=0.5, ❌=0.0; for audit 12 VERIFIED=1.0, PLANNED-correct=1.0, PARTIAL=0.5, UNVERIFIED=0.5, CONTRADICTION=0.0).

Prior baseline: docs/validation/20260424/SUMMARY.md (full cycle) — recomputed retroactively using the same methodology.
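
For readers reproducing the table, a minimal sketch of the §7 scoring in Python; the WEIGHTS mapping follows the methodology above, while the 3/2/4 mark split is a hypothetical example chosen to reproduce audit 10's 44.4%, not that audit's real check counts.

```python
# Scoring sketch per SKILL.md §7: mean of per-check weights as a percentage.
WEIGHTS = {"pass": 1.0, "partial": 0.5, "fail": 0.0}  # i.e. ✅ / ⚠ / ❌

def compliance_pct(marks: list[str]) -> float:
    """Mean of the per-check weights, rounded to one decimal place."""
    return round(100 * sum(WEIGHTS[m] for m in marks) / len(marks), 1)

# Hypothetical 3/2/4 split that happens to reproduce audit 10's 44.4%.
assert compliance_pct(["pass"] * 3 + ["partial"] * 2 + ["fail"] * 4) == 44.4
```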

| #  | Audit                | Current % | Prior (20260424) % | Δ |
|----|----------------------|-----------|--------------------|---|
| 00 | System               | 91.7      | 91.7               | 0.0 |
| 01 | Data                 | 57.1      | 57.1               | 0.0 |
| 02 | Features             | 50.0      | 50.0               | 0.0 |
| 03 | Training & Eval      | 57.1      | 57.1               | 0.0 |
| 04 | Pipeline DVC + Hydra | 66.7      | 66.7               | 0.0 |
| 05 | MLflow Registry      | 62.5      | 62.5               | 0.0 |
| 06 | Train ↔ Serve        | 50.0      | 50.0               | 0.0 |
| 07 | Serving              | 61.1      | 61.1               | 0.0 |
| 08 | Orchestration        | 60.0      | 60.0               | 0.0 |
| 09 | UI                   | 60.0      | 60.0               | 0.0 |
| 10 | Ops / Security / Obs | 44.4      | 44.4               | 0.0 |
| 11 | Docs & Tests         | 41.7      | 41.7               | 0.0 |
| 12 | Docs Validation      | 86.8      | n/a (new audit)    | n/a |
| —  | Overall              | 60.7      | 58.5 (12 audits)   | +2.2 ⚠ (apparent only; see interpretation below) |

Interpretation:

  • No regressions and no real improvements — production code is unchanged since baseline, and all 00–11 deltas are exactly 0.0.
  • The +2.2 overall delta is purely an artifact of including audit 12 (newly introduced this cycle) in the mean: the twelve baseline audits sum to 702.3 points (mean 58.5%), and adding audit 12's 86.8% lifts the 13-audit mean to 60.7%. Restricted to the 12 baseline audits, the overall score is unchanged at 58.5%.
  • No subsystem improved or regressed; the cycle re-confirms the prior state.


2. Consolidated risk register (P0 / P1, deduplicated)

Each row deduplicated by underlying cause. "Aliases" lists every per-audit ID that maps to the same root cause.

| Rank | Severity | ID | Title | Aliases | Owning audits | Status vs baseline |
|------|----------|----|-------|---------|---------------|--------------------|
| 1 | P0 | R1 | params.yaml in smoke mode (fracs_for_train=[0.001, 0.002], tuning.n_trials=2) | TR-01, TR-02 | 00, 03 | re-confirmed |
| 2 | P0 | R2 | No automated retrain trigger; dvc repro is manual | OR-01 | 00, 08 | re-confirmed |
| 3 | P0 | R5 | No DAG for batch_inference → serving features go stale silently | OR-02, TS-05 | 00, 06, 08 | re-confirmed |
| 4 | P0 | D-01 | validate_interim referenced in contract test but absent from dvc.yaml (CI-red) | DOC-01, C-02 | 01, 11, 12 | re-confirmed |
| 5 | P0 | D-03 | etl_export_matches_to_source is manual — no automatic raw-data refresh | OR-03 | 01, 08 | re-confirmed |
| 6 | P0 | UI-01 | src/ui/app/pages/ empty — no prediction UI; status.md claims otherwise (C-01) | DOC-02, UI-02 | 09, 11, 12 | re-confirmed (now also a documented contradiction) |
| 7 | P1 | R3 | No model hot-reload on champion alias change; manual worker restart required | TS-02, SRV-02, ML-03 | 00, 05, 06, 07 | re-confirmed |
| 8 | P1 | R6 | No metric/champion-vs-challenger gate before promotion to champion | ML-01, ML-02 | 00, 05 | re-confirmed |
| 9 | P1 | R8 | All components replicas=1; HPA template disabled | OPS-01, OPS-06 | 00, 10 | re-confirmed |
| 10 | P1 | SRV-01 | /predict/* and /monitoring/* unauthenticated | OPS-02 | 07, 10 | re-confirmed |
| 11 | P1 | OPS-03 | CORS allow_origins=["*"] | — | 10 | re-confirmed |
| 12 | P1 | R7 | No drift detection (src/monitoring/ empty; Evidently not wired) | OPS-05 | 00, 10 | re-confirmed |
| 13 | P1 | OPS-04 | Grafana not deployed in K8s | — | 10 | re-confirmed |
| 14 | P1 | OR-04 | No DAG-level alerting (email/Slack) on failures | — | 08 | re-confirmed |
| 15 | P1 | OR-05 | No automatic retrain trigger (drift / metric / volume) | — | 08 | re-confirmed |
| 16 | P1 | TR-03 | Holdout used for model selection in classification_models (not blind) | — | 03 | re-confirmed |
| 17 | P1 | TR-04 | ablation_study isolated from the tune_xgb/final_train DAG path | — | 03 | re-confirmed |
| 18 | P1 | F-01 | POST /predict/ has no server-side feature contract enforcement | TS-01 | 02, 06 | re-confirmed |
| 19 | P1 | F-02 | No runtime check that classification.window_sizes ⊆ features.window_sizes | — | 02 | re-confirmed |
| 20 | P1 | D-02 | match.parquet declared as a DVC out but has no downstream consumer | P-05 | 01, 04 | re-confirmed |
| 21 | P1 | D-04 | MinIO ETag for multipart objects is not a content hash → silent skip risk | — | 01 | re-confirmed |
| 22 | P1 | R4 | stats.py router not registered in main.py (dead endpoint) | — | 00 | re-confirmed |
| 23 | P1 | P-01 | split_data dep references src/pipelines/validation.py (likely misnamed → splitting changes don't invalidate DVC) | — | 04 | re-confirmed |
| 24 | P1 | P-02 | ablation_study lacks a dep on test_ids.parquet | — | 04 | re-confirmed |
| 25 | P1 | P-03 | validate_* stages lack deps on src/data_quality/*.py (GE edits don't trigger re-runs) | — | 04 | re-confirmed |
| 26 | P1 | P-04 | Hydra conf/ shipped but unused — parallel config sources | — | 04 | re-confirmed |
| 27 | P1 | ML-01 | All runs land in matches_clf_smoke; no production experiment switch | — | 05 | re-confirmed |
| 28 | P1 | UI-02 | APIClient lacks /predict/* methods | — | 09 | re-confirmed (component of #6) |

P2/P3 findings (~20 items: SRV-03..07, OPS-07..08, TR-05..06, TS-03..04, ML-04..05, F-03..05, D-05..07, P-06..07, OR-06..07, UI-03..05, DOC-03..06, claim #9, claim #37, c4-containers wording) are listed in their respective audits and not duplicated here.


3. Delta vs prior baseline (2026-04-24 / 2026-04-26)

  • Production code: unchanged (no commits since c64561d dated 2026-04-22; find -newer over src/, airflow/, dvc.yaml, params.yaml, docker/ returns empty).
  • Resolved risks: none.
  • New risks: none in code.
  • New documentation findings (introduced this cycle by the new audit 12):
      • C-01 — the docs/status.md Streamlit UI claim contradicts the code (now a formal contradiction; previously surfaced under DOC-02 / UI-01).
      • C-02 — tests/contract/test_pipeline_contracts.py::EXPECTED_STAGES has drifted from dvc.yaml (formalises D-01 / DOC-01 as a doc-truth violation).
      • Overstatement #9 — the docs/status.md "Model Training" line lacks a "smoke parameters active by default" caveat.
      • Wording #37 — docs/status.md says GE runs at "raw / interim / features"; the actual gates are at raw / finished / future / features.
  • Stale docs: docs/status.md last-reviewed date is 2026-04-19.

4. Top must-fix items (≤ 10, ordered)

  1. Fix the CI-red contract test (D-01 / DOC-01 / C-02) — remove validate_interim from EXPECTED_STAGES in tests/contract/test_pipeline_contracts.py, or add the missing DVC stage (sketched below).
  2. Restore production training params (R1 / TR-01 / TR-02) — set classification.fracs_for_train to a real range and tuning.n_trials to a meaningful value before any new model deployment (guard sketched below).
  3. Reconcile the UI claim with the code (UI-01 / DOC-02 / C-01) — either implement the prediction page (and the missing APIClient methods) or downgrade the docs/status.md Streamlit UI line to "livescores only; predictions Planned".
  4. Add a freshness guard on match_features.parquet (R5 / TS-05) — a last_modified SLO answered with HTTP 503 on staleness (sketched below).
  5. Add model hot-reload on champion alias change (R3 / TS-02 / SRV-02) — periodic alias check plus lazy reload in the worker lifecycle, or a SIGUSR1 trigger (sketched below).
  6. Add a champion-vs-challenger gate before promotion (R6 / ML-01 / ML-02) — block register_model from setting champion on a metric regression (sketched below).
  7. Automate the retrain loop (R2 / OR-01..03) — a single DAG: export → repro → batch_inference → registry validation → worker rolling restart (sketched below).
  8. Add auth and tighten CORS on inference endpoints (SRV-01 / OPS-02 / OPS-03) — token middleware on /predict/* and /monitoring/*; restricted CORS origins (sketched below).
  9. Wire DVC validation deps (P-01, P-03) — fix the split_data dep to point at src/data/splitting.py; add src/data_quality/*.py as deps of all validate_* stages.
  10. Refresh docs/status.md — update the last-reviewed date, add a "smoke params active" caveat to the Model Training line, and correct the GE-gate wording to "raw / finished / future / features".
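
Item 1, a minimal sketch of option A, assuming the test reads dvc.yaml directly; the stage names other than validate_interim are illustrative (audit 04 lists 15 real stages):

```python
# tests/contract/test_pipeline_contracts.py (sketch) for D-01 / DOC-01 / C-02.
# validate_interim is dropped; the contract list is then asserted against what
# dvc.yaml actually declares, so any future drift is a loud CI failure.
from pathlib import Path

import yaml

# Illustrative subset; the real list has 15 stages per audit 04.
EXPECTED_STAGES = {"split_data", "tune_xgb", "final_train", "register_model"}

def test_expected_stages_exist_in_dvc_yaml() -> None:
    declared = set(yaml.safe_load(Path("dvc.yaml").read_text())["stages"])
    missing = EXPECTED_STAGES - declared
    assert not missing, f"dvc.yaml is missing contract stages: {sorted(missing)}"
```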
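
Item 2, beyond editing params.yaml itself, a cheap guard can fail the training entrypoint while smoke values remain active. The key paths mirror R1; the 0.1 / 20 thresholds are assumptions to tune:

```python
# Guard sketch for R1: refuse to run final_train while smoke params are live.
# classification.fracs_for_train and tuning.n_trials are the keys named in the
# audit; the threshold values below are illustrative assumptions.
from pathlib import Path

import yaml

def assert_not_smoke(params_path: str = "params.yaml") -> None:
    params = yaml.safe_load(Path(params_path).read_text())
    fracs = params["classification"]["fracs_for_train"]
    n_trials = params["tuning"]["n_trials"]
    if max(fracs) < 0.1 or n_trials < 20:
        raise RuntimeError(
            f"Smoke params active: fracs_for_train={fracs}, n_trials={n_trials}"
        )
```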
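
Item 4, a freshness-guard sketch shown for a local file; for MinIO-hosted features, compare the object's last_modified metadata instead. The path and the 6-hour SLO are assumptions:

```python
# Freshness guard sketch for R5 / TS-05: reject online predictions when the
# serving feature file is older than the SLO.
import os
import time

from fastapi import Depends, FastAPI, HTTPException

FEATURES_PATH = "data/features/match_features.parquet"  # assumed location
MAX_AGE_SECONDS = 6 * 3600  # assumed SLO

def ensure_features_fresh() -> None:
    """FastAPI dependency: 503 while match_features.parquet is stale."""
    age = time.time() - os.path.getmtime(FEATURES_PATH)
    if age > MAX_AGE_SECONDS:
        raise HTTPException(503, detail=f"serving features stale for {age:.0f}s")

app = FastAPI()

@app.post("/predict/", dependencies=[Depends(ensure_features_fresh)])
async def predict(payload: dict) -> dict:
    ...  # existing prediction handler
```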
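
Item 5, a polling sketch using standard MLflow alias APIs; the model name soccer_clf@champion comes from this report, while the 60-second interval and loader wiring are assumptions (a SIGUSR1 handler is the signal-based alternative):

```python
# Hot-reload sketch for R3: poll the champion alias and lazily reload the
# model when the backing version changes.
import threading

import mlflow
from mlflow import MlflowClient

MODEL_NAME, ALIAS, POLL_SECONDS = "soccer_clf", "champion", 60

class ChampionWatcher:
    def __init__(self) -> None:
        self._client = MlflowClient()
        self._version: str | None = None
        self.model = None

    def _tick(self) -> None:
        mv = self._client.get_model_version_by_alias(MODEL_NAME, ALIAS)
        if mv.version != self._version:  # alias moved -> reload in place
            self.model = mlflow.pyfunc.load_model(f"models:/{MODEL_NAME}@{ALIAS}")
            self._version = mv.version
        timer = threading.Timer(POLL_SECONDS, self._tick)
        timer.daemon = True  # don't block worker shutdown
        timer.start()

    def start(self) -> None:
        self._tick()  # call once from the worker startup hook
```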
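
Item 6, a promotion-gate sketch that register_model would call instead of setting the champion alias unconditionally. The metric key "f1_macro" and higher-is-better direction are assumptions; the alias APIs are standard MLflow:

```python
# Promotion gate sketch for R6: move the champion alias only when the
# challenger beats the current champion on the gating metric.
from mlflow import MlflowClient

def promote_if_better(name: str, challenger_version: str,
                      metric: str = "f1_macro") -> bool:
    client = MlflowClient()
    champ_score = float("-inf")
    try:
        champion = client.get_model_version_by_alias(name, "champion")
        champ_score = client.get_run(champion.run_id).data.metrics[metric]
    except Exception:
        pass  # no champion yet -> first candidate promotes unconditionally
    challenger = client.get_model_version(name, challenger_version)
    score = client.get_run(challenger.run_id).data.metrics[metric]
    if score <= champ_score:
        return False  # block promotion on metric regression
    client.set_registered_model_alias(name, "champion", challenger_version)
    return True
```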
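
Item 7, a single-DAG sketch mirroring the stage order above. Only the stage names come from this report; the entrypoint commands, module paths, and 03:00 schedule are assumptions (Airflow 2.4+ syntax):

```python
# Retrain-loop DAG sketch for R2 / OR-01..03: export -> dvc repro ->
# batch_inference -> registry validation -> worker rolling restart.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG("retrain_loop", start_date=datetime(2026, 1, 1),
         schedule="0 3 * * *", catchup=False) as dag:
    export = BashOperator(task_id="export_matches",
                          bash_command="python -m src.etl.export_matches_to_source")
    repro = BashOperator(task_id="dvc_repro", bash_command="dvc repro")
    infer = BashOperator(task_id="batch_inference",
                         bash_command="python -m src.pipelines.batch_inference")
    gate = BashOperator(task_id="registry_validation",
                        bash_command="python -m src.ml.registry_gate")
    rollout = BashOperator(task_id="worker_rolling_restart",
                           bash_command="kubectl rollout restart deployment/worker")
    export >> repro >> infer >> gate >> rollout
```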
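
Item 8, a token-middleware and CORS sketch; the env var name, bearer scheme, and origin URL are assumptions to replace with the real deployment values:

```python
# Auth + CORS sketch for SRV-01 / OPS-02 / OPS-03: static bearer token on
# /predict/* and /monitoring/*, CORS restricted to an explicit allow-list.
import os
import secrets

from fastapi import FastAPI, Request
from fastapi.middleware.cors import CORSMiddleware
from starlette.responses import JSONResponse

app = FastAPI()
app.add_middleware(CORSMiddleware,
                   allow_origins=["https://ui.example.internal"],  # not "*"
                   allow_methods=["GET", "POST"],
                   allow_headers=["Authorization"])

@app.middleware("http")
async def require_token(request: Request, call_next):
    if request.url.path.startswith(("/predict", "/monitoring")):
        supplied = request.headers.get("Authorization", "").removeprefix("Bearer ")
        expected = os.environ.get("API_TOKEN", "")
        if not (expected and secrets.compare_digest(supplied, expected)):
            return JSONResponse({"detail": "missing or invalid token"},
                                status_code=401)
    return await call_next(request)
```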

5. Open questions / unverified areas

  • Current MLflow champion (soccer_clf@champion) — actual run, metrics, lineage tags. Requires live tracking-server query; not validated by this read-only cycle.
  • Real cadence of batch_inference — last run timestamp on match_features.parquet. Requires live MinIO listing.
  • docs/evidence/ — placeholder check (confirm the CLI/JSON outputs are real captures, not stubs). Deferred from audit 12 STEP 6; a dedicated targeted audit is recommended.
  • Celery worker liveness probe — whether celery inspect ping is wired correctly (audit 10 OPS-07).
  • Dev-only docker/grafana/ and docker/Dockerfile.evidently — untracked; promotion to K8s would require re-running audit 10.
  • Unstaged edits to docs/ml/mlflow.md, docs/ml/training-pipeline.md, docs/serving/deployment.md — content not re-validated this cycle (no edits since previous audit run); recommended for next cycle if committed.

6. Recommendation

Since no production change occurred since 2026-04-22, the next material full cycle should be triggered by an actual commit to src/, dvc.yaml, airflow/, or docker/. Until then, the team should prioritise the must-fix items above (especially the CI-red item #1 and the documentation contradictions in items #3 and #10), all of which can be fixed with surgical, minimal-scope patches that do not require re-running this audit cycle.