# System Audit Report — SoccerPredictAI
Date: 2026-04-28
Auditor: GitHub Copilot (Claude Opus 4.7) — /skill-ml-system-audit full
Scope: High-level audit — architecture, flows, contracts, risks
Method: Code analysis (dvc.yaml, params.yaml, src/, airflow/, docker/, k8s/) + diff vs prior cycles
Baseline cycle: 2026-04-24 (00_system_audit_v2.md + audits 01–11)
Last weekly check: 2026-04-26 (00_system_audit.md)
## ⚡ Delta since 2026-04-26

| Check | Result |
|---|---|
| New commits since 2026-04-26 | None (HEAD = c64561d, dated 2026-04-22) |
| Source files modified since 2026-04-26 (via `find -newer`) | None in src/, airflow/, dvc.yaml, params.yaml, docker/ |
| Working-tree status | Same uncommitted modifications as observed in 20260426/00 |
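For illustration, the `find -newer` delta check above can be mimicked in Python. This is a self-contained sketch against a temporary directory, not the audit's actual command; the file names and cutoff date are hypothetical stand-ins.

```python
import os
import tempfile
import time
from pathlib import Path

# Cutoff timestamp standing in for the date of the last weekly check.
cutoff = time.mktime((2026, 4, 26, 0, 0, 0, 0, 0, -1))

root = Path(tempfile.mkdtemp())

# One file modified before the cutoff, one after (mtimes set explicitly).
(root / "old.py").touch()
os.utime(root / "old.py", (cutoff - 86400,) * 2)
(root / "new.py").touch()
os.utime(root / "new.py", (cutoff + 86400,) * 2)

# Equivalent of `find <root> -type f -newer <marker>`: strictly newer mtime.
modified = [p.name for p in root.rglob("*") if p.stat().st_mtime > cutoff]
print(modified)  # -> ['new.py']
```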
Conclusion: Production code, DVC pipeline, FastAPI/Celery, Airflow DAGs — unchanged. All findings, risks, and contracts from 20260424 baseline remain in force. This full cycle (audits 00–12) re-confirms baseline; only audit 12_docs_validation is new.
## 1. Architecture (current)

```text
[WhoScored.com]
      │ (Selenoid browser automation)
      ▼
[Celery worker-api] ──► [PostgreSQL]
      ▲                      │
[Airflow DAGs ×5]       [export task]
 @hourly / manual            │
 X-Token via Variable   [MinIO: data-raw/]
                             │
              [DVC Pipeline: 15 stages]
              params.yaml / Hydra conf/
                             │
           [MLflow Tracking + Registry]
               soccer_clf@champion
                             │
            ┌────────────────┴───────────────┐
  [Celery worker-ml]            [batch_inference DVC]
   PredictionService             match_features.parquet
   (load on init)                → MinIO predictions/
            │                               │
            └────────────────┬──────────────┘
                             ▼
                        [FastAPI]
   /predict  /livescores  /monitoring  /healthcheck
                             │
                  [Nginx / K8s Ingress]
                             │
              [Streamlit UI on external VPS]
```
No structural change vs 20260424.
## 2. Layers — status

| Layer | Implementation | Status |
|---|---|---|
| Product / Problem | 1×2 classification, target=outcome_1x2 | ✅ |
| Data | scraping, PostgreSQL, MinIO, DVC ingestion | ✅ |
| Feature | rolling stats (5 windows), ELO per tournament, side=diff | ✅ |
| Model | baseline / LogReg / HGBT / XGBoost + Optuna + isotonic | ✅ ⚠️ smoke params |
| Experimentation | MLflow matches_clf_smoke, nested runs | ✅ |
| Pipeline (DVC/Hydra) | 15 stages, GE gates ×3 | ✅ |
| Serving | FastAPI sync/async, FeatureLookupService, Prometheus | ✅ |
| UI | Streamlit + nginx | ✅ |
| Orchestration | Airflow 5 DAGs (livescores ×4, export ×1) | ✅ |
| Ops / Infra | Docker ×10, K8s Helm (single-node), pydantic-settings | ✅ |
| Testing | unit, property, service, contract, load (Locust), GE | ✅ 🚧 live integration |
| Documentation | MkDocs, ADRs, runbooks, status, validation audits | ✅ |
## 3. Key flows (unchanged from baseline)

- Data: WhoScored → Selenoid → worker-api → PostgreSQL → export → MinIO `data-raw/` → DVC `load_data_from_sources` → preprocessing → features → splits.
- Features: `finished.parquet` → `stats_matches.py` (rolling) + `elo.py` (per tournamentId, k=32, init=1500, home_adv=50) → `features.parquet` + `features_meta.parquet` (single contract).
- Model: `dataset.parquet` + folds → `classification_models` (screening) → `tune_xgb` → `final_train` (isotonic, calib_frac=0.15) → `register_model` → `soccer_clf@champion`.
- Execution (training): manual `dvc repro` + manual worker-ml restart. No auto-trigger.
- Execution (serving): UI → FastAPI → Celery ml (sync, 30 s timeout) / async via Redis polling; batch lookup via FeatureLookupService.
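The Elo parameters cited for `elo.py` (k=32, init=1500, home_adv=50, independent tables per tournamentId) imply an update of roughly the following shape. This is a minimal sketch, not the project's actual code; the class and function names are hypothetical.

```python
from collections import defaultdict

# Parameters as stated in the audit for elo.py.
K, INIT, HOME_ADV = 32, 1500, 50

def expected(ra: float, rb: float) -> float:
    """Expected score of A against B under the logistic Elo model."""
    return 1 / (1 + 10 ** ((rb - ra) / 400))

class TournamentElo:
    """Independent Elo rating table per tournamentId (hypothetical name)."""

    def __init__(self):
        # ratings[tournament_id][team_id] -> rating, initialised lazily.
        self.ratings = defaultdict(lambda: defaultdict(lambda: float(INIT)))

    def update(self, tournament_id, home, away, outcome):
        """outcome: 1.0 home win, 0.5 draw, 0.0 away win."""
        r = self.ratings[tournament_id]
        # Home advantage shifts the expectation only, not the stored rating.
        exp_home = expected(r[home] + HOME_ADV, r[away])
        delta = K * (outcome - exp_home)
        r[home] += delta
        r[away] -= delta
        return r[home], r[away]
```

Because each tournament keeps its own table, a team's rating in one league never leaks into another, which matches the "ELO per tournament" feature described above.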
Details and code references in 20260424 baseline 00_system_audit_v2.md §3.
## 4. Contracts (unchanged)

| Type | Contract | Source of truth |
|---|---|---|
| Feature | column names + dtypes | features_meta.parquet |
| Model | sklearn Pipeline + XGBoost via mlflow.sklearn, output proba[0,1,2] | MLflow Registry soccer_clf@champion |
| API | Pydantic schemas + auto-OpenAPI | src/app/schemas/predict.py |
| Data | .parquet at all boundaries, .minio.json versioning | DVC outs + MinIO metadata |
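A feature contract of the "column names + dtypes" kind can be enforced with a small guard of the following shape. This sketch assumes a hypothetical meta layout with `name` and `dtype` columns; the real schema of `features_meta.parquet` may differ, and `check_feature_contract` is an illustrative name, not a function in the codebase.

```python
import pandas as pd

def check_feature_contract(features: pd.DataFrame, meta: pd.DataFrame) -> None:
    """Fail fast if a feature frame drifts from the contract.

    Assumed meta layout (hypothetical): one row per feature with
    columns 'name' and 'dtype' (dtype stored as its string form).
    """
    expected = dict(zip(meta["name"], meta["dtype"]))

    missing = set(expected) - set(features.columns)
    if missing:
        raise ValueError(f"missing feature columns: {sorted(missing)}")

    for col, dtype in expected.items():
        actual = str(features[col].dtype)
        if actual != dtype:
            raise TypeError(f"{col}: contract says {dtype}, frame has {actual}")
```

Running such a guard both after feature generation and inside FeatureLookupService would make the single-contract property checkable at both ends of the pipeline.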
## 5. Risk register (delta: no changes)

| ID | Risk | Severity | Status |
|---|---|---|---|
| R1 | params.yaml in smoke mode (n_trials=2, fracs=[0.001, 0.002]) | 🔴 HIGH | Open |
| R2 | DVC pipeline manual; no auto-trigger after ingestion | 🔴 HIGH | Open |
| R3 | No model hot-reload in serving | 🔴 HIGH | Open |
| R4 | stats.py router not registered in main.py (dead endpoint) | 🟡 MED | Open |
| R5 | batch_inference staleness — serving may use stale features silently | 🔴 HIGH | Open |
| R6 | No metric gate before champion promotion | 🔴 HIGH | Open |
| R7 | No drift detection (Evidently not integrated) | 🟡 MED | Open |
| R8 | Single-node K8s; no HA for PostgreSQL / MinIO | 🟡 MED | Open (known) |
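One way to close R3 without restarting worker-ml is a version-polling wrapper around the model: periodically compare the registry's champion version and swap the in-memory model only when it changes. The sketch below uses injected `load_fn`/`version_fn` callables (hypothetical names); in this system they would wrap `mlflow.sklearn.load_model("models:/soccer_clf@champion")` and an alias-version lookup against the MLflow registry. This is an illustration, not the existing PredictionService.

```python
import threading

class HotReloadingModel:
    """Swap the served model when the champion version changes (sketch)."""

    def __init__(self, load_fn, version_fn):
        self._load_fn = load_fn          # () -> model with .predict()
        self._version_fn = version_fn    # () -> current champion version
        self._lock = threading.Lock()
        self._version = version_fn()
        self._model = load_fn()

    def predict(self, X):
        # Hold the lock so a concurrent swap can't hand us a half-replaced model.
        with self._lock:
            return self._model.predict(X)

    def maybe_reload(self):
        """Call from a periodic task (e.g. Celery beat); no-op if unchanged."""
        current = self._version_fn()
        if current != self._version:
            model = self._load_fn()      # load outside the lock: it can be slow
            with self._lock:
                self._model, self._version = model, current
```

The same polling hook could also address R5 by checking the mtime of the batch-inference artifact and refusing to serve features older than a threshold.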
## 6. Component map (unchanged)
See 20260424 00_system_audit_v2.md §6 — same set of components in same layers.
## 7. Summary
System maturity: MEDIUM (unchanged)
Strengths:
- Full end-to-end MLOps stack (DVC, MLflow, Celery, Airflow, Prometheus, K8s)
- features_meta.parquet as single feature contract
- GE validation gates at 3 boundaries (raw, finished, features)
- Independent batch_inference DVC branch
- Prometheus metrics on inference latency, confidence, model info
- Detailed documentation and audit trail (3 cycles in docs/validation/)
Top open risks:
- R1 smoke params in params.yaml
- R2 no automated retrain trigger
- R3 no hot-reload of model
- R6 no metric gate before champion promotion
Unknowns (require deeper audits):
- Current champion in MLflow registry and its metrics → see 05
- Real cadence of batch_inference re-runs → see 08
## 8. References

- Baseline detailed audits 01–11: docs/validation/20260424/*.md
- Last weekly check: docs/validation/20260426/00_system_audit.md
- Detailed audits this cycle: docs/validation/20260428/01..12_*.md
Recommendation: since there has been no production change since 2026-04-22 (HEAD c64561d), audits 01–11 in this cycle are confirmation reports against the 20260424 baseline. The next material full cycle should be triggered by an actual commit to src/, dvc.yaml, airflow/, or docker/.