
Docs Validation Audit — SoccerPredictAI

Date: 2026-04-28
Auditor: GitHub Copilot (Claude Opus 4.7) — /skill-ml-system-audit full (audit 12/12)
Scope: Full consistency / truth validation between documentation and codebase (8-step algorithm)
Inputs: docs/status.md (canonical), docs/architecture/, docs/ml/, docs/serving/, docs/monitoring/, docs/runbooks/, docs/quickstart.md, plus all 11 baseline audits (docs/validation/20260424/) and audits 00–11 of the current cycle.


1. Summary

| Metric | Count | % |
|---|---|---|
| Atomic claims extracted | 38 | 100% |
| VERIFIED | 27 | 71% |
| PARTIAL | 5 | 13% |
| PLANNED (correctly labelled in docs) | 3 | 8% |
| CONTRADICTION | 2 | 5% |
| UNVERIFIED (no live evidence accessible from this audit) | 1 | 3% |

Overall: documentation is honest and well-calibrated about most limitations (manual gates, drift, Grafana, alerting, Evidently are all explicitly marked as Planned/Manual). Two genuine doc-vs-code contradictions exist; both are tracked below.


2. Critical Issues (blockers)

| # | Class | Location | Issue |
|---|---|---|---|
| C-01 | CONTRADICTION | docs/status.md | Doc claims "Streamlit UI ✅ Operational — src/ui/ — match list, predictions, async result polling"; in code, src/ui/app/pages/ is empty and APIClient exposes only get_livescores() and get_health(). No prediction page, no async polling UI. Confirmed in audit 09. |
| C-02 | CONTRADICTION / OUTDATED CONTRACT | tests/contract/test_pipeline_contracts.py | EXPECTED_STAGES includes validate_interim, but dvc.yaml has no validate_interim stage, so the contract test will fail. Confirmed in audits 01 and 11. |

3. Section-by-section findings

Architecture (docs/architecture/)

  • Structure & boundaries documented in index.md, c4-*, data-flow.md, deployment-view.md, system-boundary.md, tradeoffs.md, failure-modes.md — all align with code observed in audits 00 and 04.
  • Limitations (manual model promotion, no automated retrain trigger, single VPS SPOF, no drift) are explicitly listed as such — no overstatements detected.
  • c4-containers.md:173 "Horizontally scalable" — qualified by neighbouring text; acceptable but borderline (current replicas=1 per audit 10).

Data (docs/data/)

  • Ingestion / storage / GE gates flow matches code (audit 01). No contradictions.

ML (docs/ml/)

  • model-registry.md correctly marks Staging→Production gate as 🚧 Partial / manual.
  • mlflow.md:202 correctly marks Evidently as 📋 Planned.
  • training-pipeline.md (modified, unstaged): the document describes production-grade training but does not warn that the current params.yaml is in smoke mode (fracs=[0.001, 0.002], n_trials=2). → OVERSTATEMENT (informal): downgrade the model-training claims to "smoke configuration active by default" or call this out as a known limitation. Confirmed via audit 03 (TR-01, TR-02). A detection sketch follows this list.
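
Where this gap could be caught mechanically rather than editorially, a pre-train guard is one option. Below is a minimal sketch, assuming params.yaml exposes the fracs and n_trials keys the audit observed; the exact nesting and the "smoke" thresholds (0.01, 10) are illustrative assumptions, not values from the repo.

```python
# Minimal sketch: warn when smoke-mode training parameters are active.
# Assumes params.yaml contains `fracs` and `n_trials` somewhere in its
# top two levels; the thresholds below are illustrative assumptions.
import warnings
import yaml

def smoke_mode_active(path: str = "params.yaml") -> bool:
    with open(path) as fh:
        params = yaml.safe_load(fh) or {}

    # Flatten one level of nesting so the lookup survives minor layout changes.
    flat = dict(params)
    for value in params.values():
        if isinstance(value, dict):
            flat.update(value)

    fracs = flat.get("fracs") or []
    n_trials = flat.get("n_trials")

    smoke_fracs = bool(fracs) and max(fracs) < 0.01         # e.g. [0.001, 0.002]
    smoke_trials = n_trials is not None and n_trials < 10   # e.g. n_trials=2
    if smoke_fracs or smoke_trials:
        warnings.warn(
            f"Smoke training config active (fracs={fracs}, n_trials={n_trials}); "
            "models trained with these params are not production candidates."
        )
    return smoke_fracs or smoke_trials
```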

Serving (docs/serving/)

  • inference-modes.md and health-and-failures.md honestly state no automated recovery beyond K8s restart policy.
  • deployment.md (modified, unstaged) — not re-reviewed in this cycle (no edits since 2026-04-26).
  • API surface listed in status.md matches src/app/routers/ (audit 07).

Monitoring (docs/monitoring/)

  • index.md, evidently.md, alerts.md, incidents.md, status.md consistently label Grafana, Evidently, AlertManager rules as Planned. No overstatement.

CI/CD (docs/cicd/)

  • Not deeply re-validated this cycle; baseline audits 04 and 11 found no contradictions.

Evidence (docs/evidence/)

  • Directory exists. Contents not re-validated for placeholders this cycle (out of scope for delta audit). → UNVERIFIED for placeholder-freeness.

Runbooks (docs/runbooks/)

  • oncall.md, incidents.md reference real endpoints (/healthcheck/, /metrics, /monitoring/celery/queues) — all exist in code (audit 07).
  • Both acknowledge "No automated notifications today", consistent with audit 08 (OR-04).

4. Claim Table (selected — full inventory in baseline audits)

| # | Claim (paraphrased from docs/status.md) | Status | Evidence | Issue |
|---|---|---|---|---|
| 1 | Airflow ETL operational, scraping + Postgres ingestion | ✅ VERIFIED | airflow/dags/etl_livescores_*.py, audit 08 | |
| 2 | MinIO object storage operational | ✅ VERIFIED | src/data/storage.py, audit 01 | |
| 3 | DVC versioning operational | ✅ VERIFIED | dvc.yaml, dvc.lock, audit 04 | |
| 4 | PostgreSQL canonical store | ✅ VERIFIED | src/app/database.py | |
| 5 | Feature engineering operational | ✅ VERIFIED | src/features/, audits 02 + 06 | |
| 6 | DVC pipeline orchestration working | ✅ VERIFIED | dvc.yaml 15 stages, audit 04 | |
| 7 | MLflow tracking operational | ✅ VERIFIED | src/utils/mlflow_meta.py, audit 05 | |
| 8 | Train/test splitting time-based + CV folds | ✅ VERIFIED | src/data/splitting.py, audit 03 | |
| 9 | Model training: baseline + XGBoost | ⚠ PARTIAL / OVERSTATEMENT | src/pipelines/classification.py runs, but params.yaml is smoke (audit 03 TR-01/02) | Add "smoke params active" caveat in status.md |
| 10 | Model registry: registration automated, promotion gate manual | ✅ VERIFIED | src/pipelines/register_model.py, audit 05 | |
| 11 | FastAPI: routers, middleware, lifespan, CORS | ✅ VERIFIED | src/app/main.py, audit 07 | |
| 12 | POST /predict/ sync via Celery ml, 30 s timeout | ✅ VERIFIED | src/app/routers/predict.py, audit 07 | |
| 13 | GET /predict/{match_id} lookup | ✅ VERIFIED | audit 07 | |
| 14 | POST /predict/async/ returns task_id for polling | ✅ VERIFIED | audit 07 | |
| 15 | GET /predict/model/info from registry | ✅ VERIFIED | audit 07 | |
| 16 | Pydantic request validation | ✅ VERIFIED | src/app/schemas/predict.py | |
| 17 | Model loaded once per worker via PredictionService | ✅ VERIFIED | src/app/tasks/predict.py, audit 07 | |
| 18 | Batch HTTP endpoint planned | ✅ VERIFIED (label) | matches reality | |
| 19 | Streamlit UI: match list, predictions, async polling | ❌ CONTRADICTION | src/ui/app/pages/ empty; only livescores page exists | C-01 |
| 20 | Prometheus metrics: 8 counters/histograms/gauges | ✅ VERIFIED | src/app/metrics.py, audit 10 | |
| 21 | /healthcheck/, liveness probes | ✅ VERIFIED | src/app/main.py, K8s manifests | |
| 22 | /monitoring/celery/queues, /celery/workers | ✅ VERIFIED | audit 07 | |
| 23 | /monitoring/task_status/{task_id} | ✅ VERIFIED | audit 07 | |
| 24 | Grafana dashboards planned | 📋 PLANNED (correct label) | dev-only docker/grafana/ untracked, not in K8s | |
| 25 | Evidently drift detection planned | 📋 PLANNED (correct label) | src/monitoring/ empty, audit 10 | |
| 26 | Alerting rules documented but not deployed | 📋 PLANNED (correct label) | matches reality | |
| 27 | Docker images, K8s, Helm operational | ✅ VERIFIED | docker/, k8s/helm/ns_soccer-api/, audit 10 | |
| 28 | GitLab CI operational | ✅ VERIFIED | .gitlab-ci.yml | |
| 29 | SOPS+age secrets operational | ✅ VERIFIED | audit 10 | |
| 30 | pytest ~200 tests | ✅ VERIFIED (~294 tests) | audit 11 | |
| 31 | Unit tests | ✅ VERIFIED | tests/unit/ | |
| 32 | Property tests (Hypothesis) | ✅ VERIFIED | tests/property/, audit 11 | |
| 33 | Service tests | ✅ VERIFIED | tests/service/ | |
| 34 | Contract tests | ❌ CONTRADICTION | EXPECTED_STAGES references nonexistent validate_interim (audit 11 DOC-01) | C-02 |
| 35 | Load tests (Locust) | ✅ VERIFIED | tests/load/locustfile.py | |
| 36 | Integration tests "no live MLflow/Celery in CI" | 🚧 PARTIAL (correct label) | matches reality | |
| 37 | GE validation at raw / interim / features stages | ⚠ PARTIAL | GE gates exist at raw / finished / future / features (not "interim" by name); naming drift between docs and dvc.yaml | Update wording in status.md |
| 38 | API endpoints unauthenticated, TLS-only | ✅ VERIFIED | audits 07, 10 | |
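
For the serving rows (12–15 and 23), the verified surface can be exercised end to end. The sketch below is illustrative only: the base URL, the request body, and the response field names (task_id, status) are assumptions inferred from the documented contract, not verified payload schemas.

```python
# Illustrative walk through the sync, async, and lookup prediction paths.
# Field names like `task_id` and `status` are assumptions, not verified schemas.
import time
import requests

BASE = "http://localhost:8000"   # assumed local deployment
payload = {"match_id": 12345}    # hypothetical request body

# Sync path: POST /predict/ blocks on the Celery `ml` queue (30 s server timeout).
sync = requests.post(f"{BASE}/predict/", json=payload, timeout=35)
print("sync:", sync.status_code, sync.json())

# Async path: POST /predict/async/ returns a task_id for polling.
task_id = requests.post(f"{BASE}/predict/async/", json=payload).json()["task_id"]
for _ in range(10):
    state = requests.get(f"{BASE}/monitoring/task_status/{task_id}").json()
    if state.get("status") in {"SUCCESS", "FAILURE"}:
        break
    time.sleep(1)

# Lookup path: GET /predict/{match_id} returns the stored prediction.
print("lookup:", requests.get(f"{BASE}/predict/{payload['match_id']}").json())
```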

5. Overstatement scan (STEP 4)

| Phrase | Location | Verdict |
|---|---|---|
| "automated" model registration | docs/ml/model-registry.md:31 | ✅ supported (DVC stage register_model) |
| "production-ready" | only in docs/architecture/principles.md (negative form) and docs/ml/baseline.md (about ECE threshold) | ✅ not used as a claim |
| "scalable" / "horizontally scalable" | c4-containers.md:173 | ⚠ borderline — current replicas=1 (audit 10 OPS-01); acceptable as architectural intent if the context is made explicit |
| "low latency" | docs/serving/inference-modes.md:8 | ✅ qualitative motivation, not a numeric SLO claim |
| "real-time" | mentioned only as out-of-scope (tradeoffs.md, roadmap.md, problem.md) | ✅ not claimed |
| "high availability" | not asserted; SPOF explicitly acknowledged (requirements.md, index.md) | ✅ not claimed |
| "automated retraining" | explicitly marked future-only | ✅ not claimed |

No critical overstatements detected.


6. Runtime reality check (STEP 5)

| Path | Traceable? |
|---|---|
| Data → ML → Serving | ✅ (audits 01, 02, 04, 06, 07) |
| Model loading | ✅ via models:/soccer_clf@champion (audits 05, 07) |
| API request → prediction → response | ✅ sync + async + lookup paths in src/app/routers/predict.py |
| Metrics exposure | ✅ :8000/metrics + :9091/metrics |
| Deployment + rollback path | ⚠ BROKEN TRACEABILITY — no automated rollback; manual worker restart required after a champion change (audits 06, 07; risks R3, SRV-02). Docs acknowledge this as manual, so not a contradiction, but the rollback path itself is informal. |
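
For reference, the manual rollback path the audits describe reduces to an alias flip plus a worker restart. A minimal sketch follows, assuming MLflow registered-model aliases as implied by the models:/soccer_clf@champion URI; the target version and deployment name are hypothetical.

```python
# Manual rollback sketch: repoint the `champion` alias, then restart workers.
# The target version ("7") and the deployment name below are hypothetical.
from mlflow import MlflowClient

client = MlflowClient()

# Inspect which version currently serves models:/soccer_clf@champion.
current = client.get_model_version_by_alias("soccer_clf", "champion")
print(f"champion -> version {current.version}")

# Repoint the alias at a previously validated version.
client.set_registered_model_alias("soccer_clf", "champion", version="7")

# The alias change alone is not picked up by running workers: each worker
# loads the model once at startup, so a manual restart is still required,
# e.g. `kubectl rollout restart deployment/<celery-worker>`.
```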

7. Evidence validation (STEP 6) — deferred

docs/evidence/ not re-walked in this cycle. Recommend a dedicated targeted audit.


8. Runbook validation (STEP 7)

Spot-check: docs/runbooks/oncall.md and docs/monitoring/incidents.md reference endpoints (/healthcheck/, /metrics, /monitoring/celery/queues) and CLI commands (redis-cli FLUSHDB, kubectl rollout restart). All endpoints verified to exist in src/app/main.py router registration; a scripted version of this spot-check is sketched below. The runbooks' assumption that there are no automated notifications is correctly stated.
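
A scripted version of the spot-check could look like the following; the base URL and the pass criterion (any non-5xx response) are assumptions for illustration.

```python
# Sketch: assert every runbook-referenced endpoint is actually routable.
# Base URL and the non-5xx criterion are illustrative assumptions.
import requests

BASE = "http://localhost:8000"
RUNBOOK_ENDPOINTS = [
    "/healthcheck/",
    "/metrics",
    "/monitoring/celery/queues",
    "/monitoring/celery/workers",
]

for path in RUNBOOK_ENDPOINTS:
    resp = requests.get(f"{BASE}{path}", timeout=5)
    # Any non-5xx answer proves the route is registered; the runbooks
    # assume existence, not a specific payload.
    assert resp.status_code < 500, f"{path} -> {resp.status_code}"
    print(f"{path}: {resp.status_code}")
```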

No NON-EXECUTABLE STEP or INVALID ASSUMPTION detected in spot-check; deeper runbook validation deferred.


9. Required Fixes

Must fix (critical)

  1. C-01 (docs/status.md): change "Streamlit UI ✅ Operational — match list, predictions, async result polling" to "✅ Operational — match list (livescores) only; predictions UI 📋 Planned", OR implement the prediction UI to match the claim.
  2. C-02 (tests/contract/test_pipeline_contracts.py): remove validate_interim from EXPECTED_STAGES (and EXPECTED_UPSTREAM_DEPS) OR add the missing DVC stage; a drift-proof variant of the test is sketched below.
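
One drift-proof shape for the C-02 fix is to derive the expected stage set from dvc.yaml at test time instead of hardcoding it, so the contract cannot silently diverge again. A minimal sketch, assuming pytest and PyYAML; the import path mirrors the existing test module, but the surrounding fixture layout is an assumption.

```python
# Sketch: compare the hardcoded contract against dvc.yaml at test time.
# Import path mirrors the existing module; fixture layout is an assumption.
import yaml

def load_dvc_stages(path: str = "dvc.yaml") -> set[str]:
    with open(path) as fh:
        return set(yaml.safe_load(fh)["stages"])

def test_expected_stages_match_dvc_yaml():
    from tests.contract.test_pipeline_contracts import EXPECTED_STAGES

    actual = load_dvc_stages()
    missing = set(EXPECTED_STAGES) - actual   # would catch the stale validate_interim
    extra = actual - set(EXPECTED_STAGES)
    assert not missing, f"contract references stages absent from dvc.yaml: {missing}"
    assert not extra, f"dvc.yaml stages missing from the contract: {extra}"
```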

Should fix (consistency)

  1. Add "smoke parameters active by default in params.yaml" note next to "Model Training" line in docs/status.md (claim #9).
  2. Reword "Great Expectations suites at raw / interim / features" in docs/status.md to "raw / finished / future / features" to match dvc.yaml (claim #37).
  3. Update docs/status.md "Last updated" from 2026-04-19 to today and re-affirm.

Nice to fix (clarity)

  1. In c4-containers.md, qualify "Horizontally scalable" with something like "designed for horizontal scaling; currently single-replica, see architecture/requirements.md SPOF".
  2. Walk docs/evidence/ and replace any placeholders with real CLI/JSON outputs (out-of-scope this cycle — recommend separate audit).

10. Success criteria

The system does not yet pass full validation because:

  • 2 critical contradictions remain (C-01, C-02).
  • 1 overstatement (claim #9) lacks the "smoke" caveat.

Once the two Must-fix items (C-01, C-02) and the smoke-caveat Should-fix item are applied, the validation success criteria are met for the documentation surface re-checked here.