
Docs Validation Audit — SoccerPredictAI

Date: 2026-04-28
Auditor: GitHub Copilot (Claude Opus 4.7) — /skill-ml-system-audit full (audit 12/12)
Scope: Full consistency / truth validation between documentation and codebase (8-step algorithm)
Inputs: docs/status.md (canonical), docs/architecture/, docs/ml/, docs/serving/, docs/monitoring/, docs/runbooks/, docs/quickstart.md, plus all 11 baseline audits (docs/validation/20260424/) and audits 00–11 of the current cycle.


1. Summary

| Metric | Count | % |
|---|---|---|
| Atomic claims extracted | 38 | 100% |
| VERIFIED | 27 | 71% |
| PARTIAL | 5 | 13% |
| PLANNED (correctly labelled in docs) | 3 | 8% |
| CONTRADICTION | 2 | 5% |
| UNVERIFIED (no live evidence accessible from this audit) | 1 | 3% |

Overall: documentation is honest and well-calibrated about most limitations (manual gates, drift, Grafana, alerting, Evidently are all explicitly marked as Planned/Manual). Two genuine doc-vs-code contradictions exist; both are tracked below.


2. Critical Issues (blockers)

| # | Class | Location | Issue |
|---|---|---|---|
| C-01 | CONTRADICTION | docs/status.md | Doc claims "Streamlit UI ✅ Operational — src/ui/ — match list, predictions, async result polling"; in code, src/ui/app/pages/ is empty and APIClient exposes only get_livescores() and get_health(). No prediction page, no async polling UI. Confirmed in audit 09. |
| C-02 | CONTRADICTION / OUTDATED CONTRACT | tests/contract/test_pipeline_contracts.py | EXPECTED_STAGES includes validate_interim, but dvc.yaml has no validate_interim stage, so the contract test will fail. Confirmed in audits 01 and 11. |

3. Section-by-section findings

Architecture (docs/architecture/)

  • Structure & boundaries documented in index.md, c4-*, data-flow.md, deployment-view.md, system-boundary.md, tradeoffs.md, failure-modes.md — all align with code observed in audits 00 and 04.
  • Limitations (manual model promotion, no automated retrain trigger, single VPS SPOF, no drift) are explicitly listed as such — no overstatements detected.
  • c4-containers.md:173 "Horizontally scalable" — qualified by neighbouring text; acceptable but borderline (current replicas=1 per audit 10).

Data (docs/data/)

  • Ingestion / storage / GE gates flow matches code (audit 01). No contradictions.

ML (docs/ml/)

  • model-registry.md correctly marks Staging→Production gate as 🚧 Partial / manual.
  • mlflow.md:202 correctly marks Evidently as 📋 Planned.
  • training-pipeline.md (modified, unstaged): the document describes production-grade training but does not warn that the current params.yaml is in smoke mode (fracs=[0.001, 0.002], n_trials=2). → OVERSTATEMENT (informal): downgrade the model-training claims to "smoke configuration active by default" or call this out as a known limitation. Confirmed via audit 03 (TR-01, TR-02). A detection sketch follows this list.
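
Where this gap could be caught mechanically rather than editorially, a pre-train guard is one option. Below is a minimal sketch, assuming params.yaml exposes the fracs and n_trials keys the audit observed; the exact nesting and the "smoke" thresholds (0.01, 10) are illustrative assumptions, not values from the repo.

```python
# Minimal sketch: warn when smoke-mode training parameters are active.
# Assumes params.yaml contains `fracs` and `n_trials` somewhere in its
# top two levels; the thresholds below are illustrative assumptions.
import warnings
import yaml

def smoke_mode_active(path: str = "params.yaml") -> bool:
    with open(path) as fh:
        params = yaml.safe_load(fh) or {}

    # Flatten one level of nesting so the lookup survives minor layout changes.
    flat = dict(params)
    for value in params.values():
        if isinstance(value, dict):
            flat.update(value)

    fracs = flat.get("fracs") or []
    n_trials = flat.get("n_trials")

    smoke_fracs = bool(fracs) and max(fracs) < 0.01         # e.g. [0.001, 0.002]
    smoke_trials = n_trials is not None and n_trials < 10   # e.g. n_trials=2
    if smoke_fracs or smoke_trials:
        warnings.warn(
            f"Smoke training config active (fracs={fracs}, n_trials={n_trials}); "
            "models trained with these params are not production candidates."
        )
    return smoke_fracs or smoke_trials
```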

Serving (docs/serving/)

  • inference-modes.md and health-and-failures.md honestly state no automated recovery beyond K8s restart policy.
  • deployment.md (modified, unstaged) — not re-reviewed in this cycle (no edits since 2026-04-26).
  • API surface listed in status.md matches src/app/routers/ (audit 07).

Monitoring (docs/monitoring/)

  • index.md, evidently.md, alerts.md, incidents.md, status.md consistently label Grafana, Evidently, AlertManager rules as Planned. No overstatement.

CI/CD (docs/cicd/)

  • Not deeply re-validated this cycle; baseline audits 04 and 11 found no contradictions.

Evidence (docs/evidence/)

  • Directory exists. Contents not re-validated for placeholders this cycle (out of scope for delta audit). → UNVERIFIED for placeholder-freeness.

Runbooks (docs/runbooks/)

  • oncall.md, incidents.md reference real endpoints (/healthcheck/, /metrics, /monitoring/celery/queues) — all exist in code (audit 07).
  • Both acknowledge "No automated notifications today", consistent with audit 08 (OR-04).

4. Claim Table (selected — full inventory in baseline audits)

| # | Claim (paraphrased from docs/status.md) | Status | Evidence | Issue |
|---|---|---|---|---|
| 1 | Airflow ETL operational, scraping + Postgres ingestion | ✅ VERIFIED | airflow/dags/etl_livescores_*.py, audit 08 | |
| 2 | MinIO object storage operational | ✅ VERIFIED | src/data/storage.py, audit 01 | |
| 3 | DVC versioning operational | ✅ VERIFIED | dvc.yaml, dvc.lock, audit 04 | |
| 4 | PostgreSQL canonical store | ✅ VERIFIED | src/app/database.py | |
| 5 | Feature engineering operational | ✅ VERIFIED | src/features/, audits 02 + 06 | |
| 6 | DVC pipeline orchestration working | ✅ VERIFIED | dvc.yaml 15 stages, audit 04 | |
| 7 | MLflow tracking operational | ✅ VERIFIED | src/utils/mlflow_meta.py, audit 05 | |
| 8 | Train/test splitting time-based + CV folds | ✅ VERIFIED | src/data/splitting.py, audit 03 | |
| 9 | Model training: baseline + XGBoost | ⚠ PARTIAL / OVERSTATEMENT | src/pipelines/classification.py runs, but params.yaml is smoke (audit 03 TR-01/02) | Add "smoke params active" caveat in status.md |
| 10 | Model registry: registration automated, promotion gate manual | ✅ VERIFIED | src/pipelines/register_model.py, audit 05 | |
| 11 | FastAPI: routers, middleware, lifespan, CORS | ✅ VERIFIED | src/app/main.py, audit 07 | |
| 12 | POST /predict/ sync via Celery ml, 30 s timeout | ✅ VERIFIED | src/app/routers/predict.py, audit 07 | |
| 13 | GET /predict/{match_id} lookup | ✅ VERIFIED | audit 07 | |
| 14 | POST /predict/async/ returns task_id for polling | ✅ VERIFIED | audit 07 | |
| 15 | GET /predict/model/info from registry | ✅ VERIFIED | audit 07 | |
| 16 | Pydantic request validation | ✅ VERIFIED | src/app/schemas/predict.py | |
| 17 | Model loaded once per worker via PredictionService | ✅ VERIFIED | src/app/tasks/predict.py, audit 07 | |
| 18 | Batch HTTP endpoint planned | ✅ VERIFIED (label) | matches reality | |
| 19 | Streamlit UI: match list, predictions, async polling | ❌ CONTRADICTION | src/ui/app/pages/ empty; only livescores page exists | C-01 |
| 20 | Prometheus metrics: 8 counters/histograms/gauges | ✅ VERIFIED | src/app/metrics.py, audit 10 | |
| 21 | /healthcheck/, liveness probes | ✅ VERIFIED | src/app/main.py, K8s manifests | |
| 22 | /monitoring/celery/queues, /celery/workers | ✅ VERIFIED | audit 07 | |
| 23 | /monitoring/task_status/{task_id} | ✅ VERIFIED | audit 07 | |
| 24 | Grafana dashboards planned | 📋 PLANNED (correct label) | dev-only docker/grafana/ untracked, not in K8s | |
| 25 | Evidently drift detection planned | 📋 PLANNED (correct label) | src/monitoring/ empty, audit 10 | |
| 26 | Alerting rules documented but not deployed | 📋 PLANNED (correct label) | matches reality | |
| 27 | Docker images, K8s, Helm operational | ✅ VERIFIED | docker/, k8s/helm/ns_soccer-api/, audit 10 | |
| 28 | GitLab CI operational | ✅ VERIFIED | .gitlab-ci.yml | |
| 29 | SOPS+age secrets operational | ✅ VERIFIED | audit 10 | |
| 30 | pytest ~200 tests | ✅ VERIFIED (~294 tests) | audit 11 | |
| 31 | Unit tests | ✅ VERIFIED | tests/unit/ | |
| 32 | Property tests (Hypothesis) | ✅ VERIFIED | tests/property/, audit 11 | |
| 33 | Service tests | ✅ VERIFIED | tests/service/ | |
| 34 | Contract tests | ❌ CONTRADICTION | EXPECTED_STAGES references nonexistent validate_interim (audit 11 DOC-01) | C-02 |
| 35 | Load tests (Locust) | ✅ VERIFIED | tests/load/locustfile.py | |
| 36 | Integration tests "no live MLflow/Celery in CI" | 🚧 PARTIAL (correct label) | matches reality | |
| 37 | GE validation at raw / interim / features stages | ⚠ PARTIAL | GE gates exist at raw / finished / future / features (not "interim" by name); naming drift between docs and dvc.yaml | Update wording in status.md |
| 38 | API endpoints unauthenticated, TLS-only | ✅ VERIFIED | audits 07, 10 | |
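
For the serving rows (12–15 and 23), the verified surface can be exercised end to end. The sketch below is illustrative only: the base URL, the request body, and the response field names (task_id, status) are assumptions inferred from the documented contract, not verified payload schemas.

```python
# Illustrative walk through the sync, async, and lookup prediction paths.
# Field names like `task_id` and `status` are assumptions, not verified schemas.
import time
import requests

BASE = "http://localhost:8000"   # assumed local deployment
payload = {"match_id": 12345}    # hypothetical request body

# Sync path: POST /predict/ blocks on the Celery `ml` queue (30 s server timeout).
sync = requests.post(f"{BASE}/predict/", json=payload, timeout=35)
print("sync:", sync.status_code, sync.json())

# Async path: POST /predict/async/ returns a task_id for polling.
task_id = requests.post(f"{BASE}/predict/async/", json=payload).json()["task_id"]
for _ in range(10):
    state = requests.get(f"{BASE}/monitoring/task_status/{task_id}").json()
    if state.get("status") in {"SUCCESS", "FAILURE"}:
        break
    time.sleep(1)

# Lookup path: GET /predict/{match_id} returns the stored prediction.
print("lookup:", requests.get(f"{BASE}/predict/{payload['match_id']}").json())
```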

5. Overstatement scan (STEP 4)

| Phrase | Location | Verdict |
|---|---|---|
| "automated" model registration | docs/ml/model-registry.md:31 | ✅ supported (DVC stage register_model) |
| "production-ready" | only in docs/architecture/principles.md (negative form) and docs/ml/baseline.md (about ECE threshold) | ✅ not used as a claim |
| "scalable" / "horizontally scalable" | c4-containers.md:173 | ⚠ borderline — current replicas=1 (audit 10 OPS-01); acceptable as architectural intent if the context is made explicit |
| "low latency" | docs/serving/inference-modes.md:8 | ✅ qualitative motivation, not a numeric SLO claim |
| "real-time" | mentioned only as out-of-scope (tradeoffs.md, roadmap.md, problem.md) | ✅ not claimed |
| "high availability" | not asserted; SPOF explicitly acknowledged (requirements.md, index.md) | ✅ not claimed |
| "automated retraining" | explicitly marked future-only | ✅ not claimed |

No critical overstatements detected.


6. Runtime reality check (STEP 5)

| Path | Traceable? |
|---|---|
| Data → ML → Serving | ✅ (audits 01, 02, 04, 06, 07) |
| Model loading | ✅ via models:/soccer_clf@champion (audits 05, 07) |
| API request → prediction → response | ✅ sync + async + lookup paths in src/app/routers/predict.py |
| Metrics exposure | ✅ :8000/metrics + :9091/metrics |
| Deployment + rollback path | ⚠ BROKEN TRACEABILITY — no automated rollback; manual worker restart required after a champion change (audits 06, 07; risks R3, SRV-02). Docs acknowledge this as manual, so not a contradiction, but the rollback path itself is informal. |
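
For reference, the manual rollback path the audits describe reduces to an alias flip plus a worker restart. A minimal sketch follows, assuming MLflow registered-model aliases as implied by the models:/soccer_clf@champion URI; the target version and deployment name are hypothetical.

```python
# Manual rollback sketch: repoint the `champion` alias, then restart workers.
# The target version ("7") and the deployment name below are hypothetical.
from mlflow import MlflowClient

client = MlflowClient()

# Inspect which version currently serves models:/soccer_clf@champion.
current = client.get_model_version_by_alias("soccer_clf", "champion")
print(f"champion -> version {current.version}")

# Repoint the alias at a previously validated version.
client.set_registered_model_alias("soccer_clf", "champion", version="7")

# The alias change alone is not picked up by running workers: each worker
# loads the model once at startup, so a manual restart is still required,
# e.g. `kubectl rollout restart deployment/<celery-worker>`.
```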

7. Evidence validation (STEP 6) — deferred

docs/evidence/ not re-walked in this cycle. Recommend a dedicated targeted audit.


8. Runbook validation (STEP 7)

Spot-check: docs/runbooks/oncall.md and docs/monitoring/incidents.md reference endpoints (/healthcheck/, /metrics, /monitoring/celery/queues) and CLI commands (redis-cli FLUSHDB, kubectl rollout restart). All endpoints verified to exist in src/app/main.py router registration; a scripted version of this spot-check is sketched below. The runbooks' assumption that there are no automated notifications is correctly stated.
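
A scripted version of the spot-check could look like the following; the base URL and the pass criterion (any non-5xx response) are assumptions for illustration.

```python
# Sketch: assert every runbook-referenced endpoint is actually routable.
# Base URL and the non-5xx criterion are illustrative assumptions.
import requests

BASE = "http://localhost:8000"
RUNBOOK_ENDPOINTS = [
    "/healthcheck/",
    "/metrics",
    "/monitoring/celery/queues",
    "/monitoring/celery/workers",
]

for path in RUNBOOK_ENDPOINTS:
    resp = requests.get(f"{BASE}{path}", timeout=5)
    # Any non-5xx answer proves the route is registered; the runbooks
    # assume existence, not a specific payload.
    assert resp.status_code < 500, f"{path} -> {resp.status_code}"
    print(f"{path}: {resp.status_code}")
```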

No NON-EXECUTABLE STEP or INVALID ASSUMPTION detected in spot-check; deeper runbook validation deferred.


9. Required Fixes

Must fix (critical)

  1. C-01 (docs/status.md): change "Streamlit UI ✅ Operational — match list, predictions, async result polling" to "✅ Operational — match list (livescores) only; predictions UI 📋 Planned", OR implement the prediction UI to match the claim.
  2. C-02 (tests/contract/test_pipeline_contracts.py): remove validate_interim from EXPECTED_STAGES (and EXPECTED_UPSTREAM_DEPS) OR add the missing DVC stage; a drift-proof variant of the test is sketched below.
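
One drift-proof shape for the C-02 fix is to derive the expected stage set from dvc.yaml at test time instead of hardcoding it, so the contract cannot silently diverge again. A minimal sketch, assuming pytest and PyYAML; the import path mirrors the existing test module, but the surrounding fixture layout is an assumption.

```python
# Sketch: compare the hardcoded contract against dvc.yaml at test time.
# Import path mirrors the existing module; fixture layout is an assumption.
import yaml

def load_dvc_stages(path: str = "dvc.yaml") -> set[str]:
    with open(path) as fh:
        return set(yaml.safe_load(fh)["stages"])

def test_expected_stages_match_dvc_yaml():
    from tests.contract.test_pipeline_contracts import EXPECTED_STAGES

    actual = load_dvc_stages()
    missing = set(EXPECTED_STAGES) - actual   # would catch the stale validate_interim
    extra = actual - set(EXPECTED_STAGES)
    assert not missing, f"contract references stages absent from dvc.yaml: {missing}"
    assert not extra, f"dvc.yaml stages missing from the contract: {extra}"
```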

Should fix (consistency)

  1. Add "smoke parameters active by default in params.yaml" note next to "Model Training" line in docs/status.md (claim #9).
  2. Reword "Great Expectations suites at raw / interim / features" in docs/status.md to "raw / finished / future / features" to match dvc.yaml (claim #37).
  3. Update docs/status.md "Last updated" from 2026-04-19 to today and re-affirm.

Nice to fix (clarity)

  1. In c4-containers.md, qualify "Horizontally scalable" with something like "designed for horizontal scaling; currently single-replica, see architecture/requirements.md SPOF".
  2. Walk docs/evidence/ and replace any placeholders with real CLI/JSON outputs (out-of-scope this cycle — recommend separate audit).

10. Success criteria

The system does not yet pass full validation because:

  • 2 critical contradictions remain (C-01, C-02).
  • 1 overstatement (claim #9) lacks the "smoke" caveat.

Once the two Must-fix items (C-01, C-02) and the smoke-caveat Should-fix item are applied, the validation success criteria are met for the documentation surface re-checked here.