# Docs Validation Audit — SoccerPredictAI

Date: 2026-04-28
Auditor: GitHub Copilot (Claude Opus 4.7) — /skill-ml-system-audit full (audit 12/12)
Scope: full consistency / truth validation between documentation and codebase (8-step algorithm)
Inputs: docs/status.md (canonical), docs/architecture/, docs/ml/, docs/serving/, docs/monitoring/, docs/runbooks/, docs/quickstart.md, plus all 11 baseline audits (docs/validation/20260424/) and audits 00–11 of the current cycle.
## 1. Summary
| Metric | Count | % |
|---|---|---|
| Atomic claims extracted | 38 | 100% |
| VERIFIED | 27 | 71% |
| PARTIAL | 5 | 13% |
| PLANNED (correctly labelled in docs) | 3 | 8% |
| CONTRADICTION | 2 | 5% |
| UNVERIFIED (no live evidence accessible from this audit) | 1 | 3% |
Overall: documentation is honest and well-calibrated about most limitations (manual gates, drift, Grafana, alerting, Evidently are all explicitly marked as Planned/Manual). Two genuine doc-vs-code contradictions exist; both are tracked below.
## 2. Critical Issues (blockers)
| # | Class | Location | Issue |
|---|---|---|---|
| C-01 | CONTRADICTION | `docs/status.md` — "Streamlit UI ✅ Operational — `src/ui/` — match list, predictions, async result polling" | Code: `src/ui/app/pages/` is empty; `APIClient` exposes only `get_livescores()` and `get_health()`. No prediction page, no async polling UI. Confirmed in audit 09. |
| C-02 | CONTRADICTION / OUTDATED CONTRACT | `tests/contract/test_pipeline_contracts.py::EXPECTED_STAGES` includes `validate_interim` | `dvc.yaml` has no `validate_interim` stage, so the contract test fails. Confirmed in audits 01 and 11. |
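Drift like C-02 can be prevented structurally by deriving the expected stage list from `dvc.yaml` instead of hardcoding it in the contract test. A minimal sketch (the stage names and file fragment below are illustrative, not the project's real 15-stage pipeline):

```python
# Sketch: keep the contract test in sync with dvc.yaml by parsing stage
# names directly, rather than hardcoding EXPECTED_STAGES.
import re


def dvc_stage_names(dvc_yaml_text: str) -> list[str]:
    """Extract top-level stage names from a dvc.yaml document.

    Uses a simple line scan (stage names are 2-space-indented keys under
    'stages:'), which keeps this sketch free of a PyYAML dependency.
    """
    names = []
    in_stages = False
    for line in dvc_yaml_text.splitlines():
        if re.match(r"^stages:\s*$", line):
            in_stages = True
            continue
        if in_stages:
            m = re.match(r"^  (\w+):\s*$", line)
            if m:
                names.append(m.group(1))
            elif line and not line.startswith(" "):
                in_stages = False  # left the stages mapping
    return names


# Illustrative dvc.yaml fragment:
SAMPLE = """\
stages:
  ingest_raw:
    cmd: python -m src.data.ingest
  build_features:
    cmd: python -m src.features.build
"""

assert dvc_stage_names(SAMPLE) == ["ingest_raw", "build_features"]
```

With this shape, removing a stage such as `validate_interim` from `dvc.yaml` updates the contract automatically; only deliberate *expectations* (e.g. ordering or dependency edges) remain hardcoded.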
## 3. Section-by-section findings
### Architecture (docs/architecture/)
- Structure & boundaries documented in `index.md`, `c4-*`, `data-flow.md`, `deployment-view.md`, `system-boundary.md`, `tradeoffs.md`, `failure-modes.md` — all align with code observed in audits 00 and 04.
- Limitations (manual model promotion, no automated retrain trigger, single VPS SPOF, no drift detection) are explicitly listed as such — no overstatements detected.
- `c4-containers.md:173` "Horizontally scalable" — qualified by neighbouring text; acceptable but borderline (current `replicas=1` per audit 10).
### Data (docs/data/)
- Ingestion / storage / GE gates flow matches code (audit 01). No contradictions.
### ML (docs/ml/)
- `model-registry.md` correctly marks the Staging→Production gate as 🚧 Partial / manual.
- `mlflow.md:202` correctly marks Evidently as 📋 Planned.
- `training-pipeline.md` (modified, unstaged): the document describes production-grade training but does not warn that the current `params.yaml` is in smoke mode (`fracs=[0.001, 0.002]`, `n_trials=2`). → OVERSTATEMENT (informal): model-training claims should be downgraded to "smoke configuration active by default", or this should be called out as a known limitation. Confirmed via audit 03 (TR-01, TR-02).
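The smoke-mode caveat could be surfaced mechanically rather than relying on prose staying current. A hedged sketch, assuming `params.yaml` exposes `fracs` and `n_trials` keys as the audit describes (the exact key paths and thresholds below are assumptions, not the project's real schema):

```python
# Sketch: flag a smoke-mode training configuration so docs/CI can emit the
# caveat automatically instead of trusting training-pipeline.md to warn.

def is_smoke_config(params: dict) -> bool:
    """Heuristic: tiny data fractions or very few HPO trials imply a
    smoke run, not a production-grade training configuration."""
    fracs = params.get("fracs", [])
    n_trials = params.get("n_trials", 0)
    return bool(fracs and max(fracs) < 0.01) or n_trials < 10


# Values reported by audit 03 (TR-01 / TR-02):
current = {"fracs": [0.001, 0.002], "n_trials": 2}
assert is_smoke_config(current)        # caveat needed in status.md
assert not is_smoke_config({"fracs": [0.8, 1.0], "n_trials": 50})
```

Wired into CI (or the docs build), such a check would turn the claim #9 overstatement into a failing test instead of an audit finding.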
### Serving (docs/serving/)
- `inference-modes.md` and `health-and-failures.md` honestly state there is no automated recovery beyond the K8s restart policy.
- `deployment.md` (modified, unstaged) — not re-reviewed this cycle (no edits since 2026-04-26).
- API surface listed in `status.md` matches `src/app/routers/` (audit 07).
### Monitoring (docs/monitoring/)
- `index.md`, `evidently.md`, `alerts.md`, `incidents.md`, `status.md` consistently label Grafana, Evidently, and AlertManager rules as Planned. No overstatement.
### CI/CD (docs/cicd/)
- Not deeply re-validated this cycle; baseline audits 04 and 11 found no contradictions.
### Evidence (docs/evidence/)
- Directory exists. Contents not re-validated for placeholders this cycle (out of scope for delta audit). → UNVERIFIED for placeholder-freeness.
### Runbooks (docs/runbooks/)
- `oncall.md` and `incidents.md` reference real endpoints (`/healthcheck/`, `/metrics`, `/monitoring/celery/queues`) — all exist in code (audit 07).
- Both acknowledge "No automated notifications today" → consistent with audit 08 (OR-04).
## 4. Claim Table (selected — full inventory in baseline audits)
| # | Claim (paraphrased from docs/status.md) | Status | Evidence | Issue |
|---|---|---|---|---|
| 1 | Airflow ETL operational, scraping + Postgres ingestion | ✅ VERIFIED | `airflow/dags/etl_livescores_*.py`, audit 08 | — |
| 2 | MinIO object storage operational | ✅ VERIFIED | `src/data/storage.py`, audit 01 | — |
| 3 | DVC versioning operational | ✅ VERIFIED | `dvc.yaml`, `dvc.lock`, audit 04 | — |
| 4 | PostgreSQL canonical store | ✅ VERIFIED | `src/app/database.py` | — |
| 5 | Feature engineering operational | ✅ VERIFIED | `src/features/`, audits 02 + 06 | — |
| 6 | DVC pipeline orchestration working | ✅ VERIFIED | `dvc.yaml` (15 stages), audit 04 | — |
| 7 | MLflow tracking operational | ✅ VERIFIED | `src/utils/mlflow_meta.py`, audit 05 | — |
| 8 | Train/test splitting time-based + CV folds | ✅ VERIFIED | `src/data/splitting.py`, audit 03 | — |
| 9 | Model training: baseline + XGBoost | ⚠ PARTIAL / OVERSTATEMENT | `src/pipelines/classification.py` runs, but `params.yaml` is smoke (audit 03 TR-01/02) | Add "smoke params active" caveat in status.md |
| 10 | Model registry: registration automated, promotion gate manual | ✅ VERIFIED | `src/pipelines/register_model.py`, audit 05 | — |
| 11 | FastAPI: routers, middleware, lifespan, CORS | ✅ VERIFIED | `src/app/main.py`, audit 07 | — |
| 12 | `POST /predict/` sync via Celery `ml`, 30 s timeout | ✅ VERIFIED | `src/app/routers/predict.py`, audit 07 | — |
| 13 | `GET /predict/{match_id}` lookup | ✅ VERIFIED | audit 07 | — |
| 14 | `POST /predict/async/` returns `task_id` for polling | ✅ VERIFIED | audit 07 | — |
| 15 | `GET /predict/model/info` from registry | ✅ VERIFIED | audit 07 | — |
| 16 | Pydantic request validation | ✅ VERIFIED | `src/app/schemas/predict.py` | — |
| 17 | Model loaded once per worker via `PredictionService` | ✅ VERIFIED | `src/app/tasks/predict.py`, audit 07 | — |
| 18 | Batch HTTP endpoint planned | ✅ VERIFIED (label) | matches reality | — |
| 19 | Streamlit UI: match list, predictions, async polling | ❌ CONTRADICTION | `src/ui/app/pages/` empty; only livescores page exists | C-01 |
| 20 | Prometheus metrics: 8 counters/histograms/gauges | ✅ VERIFIED | `src/app/metrics.py`, audit 10 | — |
| 21 | `/healthcheck/`, liveness probes | ✅ VERIFIED | `src/app/main.py`, K8s manifests | — |
| 22 | `/monitoring/celery/queues`, `/celery/workers` | ✅ VERIFIED | audit 07 | — |
| 23 | `/monitoring/task_status/{task_id}` | ✅ VERIFIED | audit 07 | — |
| 24 | Grafana dashboards planned | 📋 PLANNED (correct label) | dev-only `docker/grafana/` untracked, not in K8s | — |
| 25 | Evidently drift detection planned | 📋 PLANNED (correct label) | `src/monitoring/` empty, audit 10 | — |
| 26 | Alerting rules documented but not deployed | 📋 PLANNED (correct label) | matches reality | — |
| 27 | Docker images, K8s, Helm operational | ✅ VERIFIED | `docker/`, `k8s/helm/ns_soccer-api/`, audit 10 | — |
| 28 | GitLab CI operational | ✅ VERIFIED | `.gitlab-ci.yml` | — |
| 29 | SOPS+age secrets operational | ✅ VERIFIED | audit 10 | — |
| 30 | pytest ~200 tests | ✅ VERIFIED (~294 tests) | audit 11 | — |
| 31 | Unit tests | ✅ VERIFIED | `tests/unit/` | — |
| 32 | Property tests (Hypothesis) | ✅ VERIFIED | `tests/property/`, audit 11 | — |
| 33 | Service tests | ✅ VERIFIED | `tests/service/` | — |
| 34 | Contract tests | ❌ CONTRADICTION | `EXPECTED_STAGES` references nonexistent `validate_interim` (audit 11 DOC-01) | C-02 |
| 35 | Load tests (Locust) | ✅ VERIFIED | `tests/load/locustfile.py` | — |
| 36 | Integration tests "no live MLflow/Celery in CI" | 🚧 PARTIAL (correct label) | matches reality | — |
| 37 | GE validation at raw / interim / features stages | ⚠ PARTIAL | GE gates exist at raw / finished / future / features (not "interim" by name); naming drift between docs and `dvc.yaml` | Update wording in status.md |
| 38 | API endpoints unauthenticated, TLS-only | ✅ VERIFIED | audits 07, 10 | — |
## 5. Overstatement scan (STEP 4)
| Phrase | Location | Verdict |
|---|---|---|
| "automated" model registration | `docs/ml/model-registry.md:31` | ✅ supported (DVC stage `register_model`) |
| "production-ready" | only in `docs/architecture/principles.md` (negative form) and `docs/ml/baseline.md` (about ECE threshold) | ✅ not used as a claim |
| "scalable" / "horizontally scalable" | `c4-containers.md:173` | ⚠ borderline — current `replicas=1` (audit 10 OPS-01); acceptable as architectural intent if the context is made explicit |
| "low latency" | `docs/serving/inference-modes.md:8` | ✅ qualitative motivation, not a numeric SLO claim |
| "real-time" | mentioned only as out-of-scope (`tradeoffs.md`, `roadmap.md`, `problem.md`) | ✅ not claimed |
| "high availability" | not asserted; SPOF explicitly acknowledged (`requirements.md`, `index.md`) | ✅ not claimed |
| "automated retraining" | explicitly marked future-only | ✅ not claimed |
No critical overstatements detected.
## 6. Runtime reality check (STEP 5)
| Path | Traceable? |
|---|---|
| Data → ML → Serving | ✅ (audits 01, 02, 04, 06, 07) |
| Model loading | ✅ via `models:/soccer_clf@champion` (audits 05, 07) |
| API request → prediction → response | ✅ sync + async + lookup paths in `src/app/routers/predict.py` |
| Metrics exposure | ✅ `:8000/metrics` + `:9091/metrics` |
| Deployment + rollback path | ⚠ BROKEN TRACEABILITY — no automated rollback; manual worker restart required after a champion change (audits 06, 07; risks R3, SRV-02). Docs acknowledge this as manual — not a contradiction, but the rollback path itself is informal. |
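Because the rollback path is informal, it may help to encode it as an explicit, reviewable sequence. A sketch that only *builds* the command list (the helper itself is hypothetical; the alias move assumes MLflow's registered-model aliases, `MlflowClient.set_registered_model_alias`, available since mlflow 2.3; model and deployment names are taken from this audit):

```python
# Sketch: make the manual rollback path explicit as data, so the runbook
# and any future automation share one source of truth. Nothing is executed.

def rollback_steps(model: str, version: int, deployment: str) -> list[str]:
    """Ordered shell commands for a manual champion rollback."""
    # 1. Repoint the @champion alias at the known-good version.
    set_alias = (
        'python -c "from mlflow import MlflowClient; '
        f"MlflowClient().set_registered_model_alias("
        f"'{model}', 'champion', {version})\""
    )
    # 2. Restart workers: each worker loads models:/<model>@champion once
    #    at startup (claim #17), so a restart is required to pick it up.
    restart = f"kubectl rollout restart deployment/{deployment}"
    return [set_alias, restart]


for step in rollback_steps("soccer_clf", 7, "soccer-api"):
    print(step)
```

Version `7` and deployment `soccer-api` are placeholders; an operator would substitute the last known-good registry version and the actual worker Deployment name.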
## 7. Evidence validation (STEP 6) — deferred

`docs/evidence/` was not re-walked in this cycle. Recommend a dedicated targeted audit.
## 8. Runbook validation (STEP 7)

Spot-check: `docs/runbooks/oncall.md` and `docs/monitoring/incidents.md` reference endpoints (`/healthcheck/`, `/metrics`, `/monitoring/celery/queues`) and CLI commands (`redis-cli FLUSHDB`, `kubectl rollout restart`). All endpoints verified to exist in `src/app/main.py` router registration. The assumption that there are no automated notifications is correctly stated.

No NON-EXECUTABLE STEP or INVALID ASSUMPTION findings in the spot-check; deeper runbook validation deferred.
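The spot-check itself can be scripted. A sketch comparing runbook-documented paths against the app's registered routes (the route set below is transcribed from this audit, not read live from `src/app/main.py`; in a real test it would come from `{route.path for route in app.routes}`):

```python
# Sketch: verify every endpoint a runbook mentions is actually registered
# in the API, so NON-EXECUTABLE STEP findings surface in CI, not in audits.

RUNBOOK_ENDPOINTS = {
    "/healthcheck/",
    "/metrics",
    "/monitoring/celery/queues",
}

# Transcribed from audit 07 for illustration; a real test would build this
# set from the FastAPI app object instead of hardcoding it.
REGISTERED_ROUTES = {
    "/healthcheck/",
    "/metrics",
    "/monitoring/celery/queues",
    "/monitoring/celery/workers",
    "/monitoring/task_status/{task_id}",
    "/predict/",
}

missing = RUNBOOK_ENDPOINTS - REGISTERED_ROUTES
assert not missing, f"Runbook references nonexistent endpoints: {missing}"
```

The same pattern extends to CLI commands (e.g. asserting that `kubectl` manifests name the Deployment a runbook tells the operator to restart).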
## 9. Required Fixes

### Must fix (critical)
- C-01 — `docs/status.md`: change "Streamlit UI ✅ Operational — match list, predictions, async result polling" to "✅ Operational — match list (livescores) only; predictions UI 📋 Planned", OR implement the prediction UI to match the claim.
- C-02 — `tests/contract/test_pipeline_contracts.py`: remove `validate_interim` from `EXPECTED_STAGES` (and `EXPECTED_UPSTREAM_DEPS`), OR add the missing DVC stage.
### Should fix (consistency)
- Add a "smoke parameters active by default in `params.yaml`" note next to the "Model Training" line in `docs/status.md` (claim #9).
- Reword "Great Expectations suites at raw / interim / features" in `docs/status.md` to "raw / finished / future / features" to match `dvc.yaml` (claim #37).
- Update the `docs/status.md` "Last updated" date from 2026-04-19 to today and re-affirm.
### Nice to fix (clarity)
- In `c4-containers.md`, qualify "Horizontally scalable" with "(designed for; currently single-replica — see `architecture/requirements.md` SPOF)".
- Walk `docs/evidence/` and replace any placeholders with real CLI/JSON outputs (out of scope this cycle — recommend a separate audit).
## 10. Success criteria
The system does not yet pass full validation because:
- 2 critical contradictions remain (C-01, C-02).
- 1 overstatement (claim #9) lacks the "smoke" caveat.
Once the "Must fix" and "Should fix" items above are applied, validation success criteria are met for the documentation surface re-checked here.