
Test Coverage Implementation Plan — 2026-04-28

Cycle: Test strategy execution
Target window: 2 weeks (Phase A: 1 day; Phase B–D: 1.5 weeks; Phase E–F: ongoing)
Source of truth for scope: Test inventory and gap analysis (chat session 2026-04-28)
Related docs:
  • docs/cicd/testing-strategy.md — high-level strategy (to be updated)
  • docs/status.md — test count claims (to be updated)
  • tests/contract/test_pipeline_contracts.py — contract baseline

Current state: pytest --collect-only reports 274 collected, 5 collection errors; pytest tests/ is red on collect, hiding ~40+ tests. docs/status.md claims "~200 tests" (stale).

Grouping by phase: A (P0 unblock signal) → B/C/D (P1 server/ML/UI gaps) → E/F (P2/P3 + CI integration).


Phase A — Restore test signal (Day 1, ~4–6 h, P0)

Goal: green pytest tests/, honest test counts, runnable make test. Without this, all other work has no feedback loop.

T1 — Diagnose 5 collection errors (~30 min)

  • Files failing to collect (verified via pytest --collect-only):
  • tests/unit/test_h2h.py — imports add_h2h_features from src/features/stats_matches.py (symbol absent).
  • tests/unit/test_rest_days.py — imports add_rest_days, rest_days_feature_meta (absent).
  • tests/unit/test_classification_selection.py — imports _select_best_run from src/models/classification.py (absent).
  • tests/service/test_prediction_service.py — collection error (root cause TBD; likely service API drift).
  • tests/test_api.py — collection error (likely _get_feature_lookup or related symbol drift in src/app/routers/predict.py).
  • Action: git log -p -- src/features/stats_matches.py src/models/classification.py src/app/services/predict.py src/app/routers/predict.py to find when symbols disappeared and why (refactor vs. accidental delete).
  • Output: a per-file decision (restore symbol vs. delete/rewrite test).
  • Do not make decisions blindly — git log is mandatory before T2.

T2 — Resolve each collection error (~1.5 h)

Per file from T1, choose one:
  • (a) Restore the symbol in src/ if it was incorrectly removed and contracts still demand it.
  • (b) Rewrite the test to the current API if the public contract legitimately changed.
  • (c) Delete the test only if the underlying feature is officially deprecated (must be reflected in docs/status.md).
  • Verification: pytest --collect-only -q shows 0 errors.

T3 — Add [tool.pytest.ini_options] to pyproject.toml (~15 min)

  • File: pyproject.toml
  • Block:
    [tool.pytest.ini_options]
    pythonpath = ["src"]
    testpaths = ["tests"]
    markers = [
      "slow: tests that take >1s",
      "integration: requires live services (db, mlflow, minio)",
      "load: locust load tests (excluded by default)",
    ]
    addopts = "-q --strict-markers --tb=short"
    
  • Verification: tests import app modules (e.g. from app.config.security import SecuritySettings) without a PYTHONPATH=src prefix; note that pythonpath applies only to pytest runs, so ad-hoc python -c imports still need the prefix and are documented separately.
  • Verification: pytest -m "not load and not integration" collects without warnings about unknown markers.

T4 — Add make test* targets (~15 min)

  • File: Makefile
  • Targets:
    test:           ## run full test suite (excludes load + integration)
      pytest -m "not load and not integration"
    test-fast:      ## unit + property only
      pytest tests/unit tests/property -q
    test-contract:  ## DVC + Pydantic + Helm contracts
      pytest tests/contract -q
    test-coverage:  ## with coverage report
      pytest --cov=src --cov-report=term-missing --cov-report=html -m "not load and not integration"
    
  • Verification: each target runs and exits 0 (after T2).

T5 — Update doc claims (~30 min)

  • docs/status.md: replace the stale "~200 tests" claim with the actual collected count from T2.
  • docs/cicd/testing-strategy.md: reference the new make test* targets from T4.

Phase A DoD

  • [ ] pytest --collect-only -q reports 0 errors.
  • [ ] pytest tests/ exits green (excluding load/integration).
  • [ ] make test, make test-fast, make test-contract, make test-coverage all work.
  • [ ] docs/status.md test count matches reality.

Phase B — Server-side P1 gaps (Days 2–4)

Goal: API surface, CORS, UI client are tested. Closes audit findings G-03 through G-10.

T6 — Expand tests/test_api.py to all routers (~1 day)

  • Target (per Phase B DoD): every src/app/routers/*.py gets ≥ 1 happy-path and ≥ 1 negative test.

T7 — CORS env-driven test (~30 min)

  • New: tests/test_api_cors.py (or extend test_api.py)
  • Use monkeypatch.setenv("CORS_ALLOWED_ORIGINS", "https://allowed.example"), then reload app.main and rebuild the TestClient (the CORS middleware is configured at import time, so the env must be set before the reload).
  • Assert Access-Control-Allow-Origin header for allowed origin = mirrored; for disallowed = absent.
  • Test default ("") → no permissive CORS; test "*" → wildcard.
  • Verification: 4 cases green.
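
The full test drives the FastAPI app through a TestClient after the reload; the env-driven part that the four cases exercise can be sketched in isolation against a stand-in origin parser (parse_allowed_origins is a hypothetical helper for illustration, not the project's actual settings API):

```python
def parse_allowed_origins(env_value: str) -> list[str]:
    # Hypothetical stand-in for the app's CORS settings parsing:
    # "" -> no permissive CORS, "*" -> wildcard, else comma-separated allowlist.
    value = env_value.strip()
    if not value:
        return []
    if value == "*":
        return ["*"]
    return [o.strip() for o in value.split(",") if o.strip()]

# the four cases from T7, as pure assertions
assert parse_allowed_origins("") == []
assert parse_allowed_origins("*") == ["*"]
assert parse_allowed_origins("https://allowed.example") == ["https://allowed.example"]
assert parse_allowed_origins("https://a.example, https://b.example") == [
    "https://a.example", "https://b.example",
]
```

In the real test, each case sets the env var via monkeypatch, reloads the app module, and asserts on the Access-Control-Allow-Origin response header instead.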

T8 — tests/unit/test_api_client.py (~3 h, was scheduled in v1 T4)

  • File: new tests/unit/test_api_client.py
  • Use respx (or httpx.MockTransport) to mock backend.
  • Cover src/ui/app/api_client.py: list_upcoming_matches(), get_prediction(), get_model_info(stage=...).
  • Cases: happy 200, 4xx → APIError(status_code=...), 5xx → APIError, timeout → APIError.
  • Verification: ≥ 80% coverage of api_client.py.
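
The respx/MockTransport test mocks the HTTP layer; the status-to-exception mapping those cases assert can be sketched with stand-ins (APIError and raise_for_api_error are illustrative names, not necessarily the project's actual client API):

```python
class APIError(Exception):
    """Stand-in for the UI client's error type (hypothetical)."""
    def __init__(self, status_code: int, detail: str = ""):
        super().__init__(f"API error {status_code}: {detail}")
        self.status_code = status_code

def raise_for_api_error(status_code: int, detail: str = "") -> None:
    # Mirrors the contract the tests assert: 4xx and 5xx both surface as APIError,
    # preserving the status code so tests can match on it.
    if status_code >= 400:
        raise APIError(status_code, detail)

# happy path: 200 passes through silently
raise_for_api_error(200)

# 4xx is wrapped with the status code intact
try:
    raise_for_api_error(404, "match not found")
except APIError as err:
    assert err.status_code == 404
```

Timeouts follow the same pattern: the client catches the transport exception and re-raises as APIError, which the respx test triggers with a mocked timeout.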

T9 — Helm template/lint smoke (~2 h)

  • Add make helm-test:
    helm-test:
      helm lint k8s/helm/ns_soccer-api
      helm template k8s/helm/ns_soccer-api > /tmp/rendered.yaml
      @grep -q "limit-rps" /tmp/rendered.yaml || (echo "rate-limit annotation missing" && exit 1)
      @grep -q "CORS_ALLOWED_ORIGINS" /tmp/rendered.yaml || (echo "CORS env missing" && exit 1)
    
  • Optional: wrap in tests/contract/test_helm_chart.py with subprocess.run.
  • Verification: make helm-test green; intentional misconfig (e.g. rateLimit.enabled: false and grep accordingly) fails the test.
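
If the optional pytest wrapper is written, keeping the grep logic as a pure helper makes it unit-testable without helm installed; only missing_markers below is exercised directly, while render_chart (which shells out to helm) is an assumption about how the wrapper would call it:

```python
import subprocess

REQUIRED_MARKERS = ("limit-rps", "CORS_ALLOWED_ORIGINS")

def missing_markers(rendered: str, markers=REQUIRED_MARKERS) -> list[str]:
    # Pure check: which required strings are absent from the rendered manifests.
    return [m for m in markers if m not in rendered]

def render_chart(chart_path: str) -> str:
    # The real test would call helm; kept separate so missing_markers stays pure.
    return subprocess.run(
        ["helm", "template", chart_path],
        check=True, capture_output=True, text=True,
    ).stdout

# unit-level checks against stand-in rendered snippets
sample = "nginx.ingress.kubernetes.io/limit-rps: '10'\n- name: CORS_ALLOWED_ORIGINS"
assert missing_markers(sample) == []
assert missing_markers("kind: Deployment") == ["limit-rps", "CORS_ALLOWED_ORIGINS"]
```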

T10 — MinIO storage integration test (~3 h, optional for v1)

  • File: new tests/integration/test_minio_storage.py
  • Use moto[s3] (in-process) to stub S3 endpoints; or localstack via docker-compose for fuller integration.
  • Cover src/app/data/storage.py: upload, retry on 5xx, sidecar .minio.json written.
  • Mark @pytest.mark.integration; excluded from default make test.
  • Verification: pytest -m integration tests/integration/test_minio_storage.py green.
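
moto stubs the real S3 endpoints; the two behaviours the test covers (retry on transient 5xx, sidecar .minio.json written on success) can be sketched with a fake client. All names here are illustrative, not the actual src/app/data/storage.py API:

```python
import json
import pathlib
import tempfile

class Transient5xx(Exception):
    pass

class FlakyClient:
    """Fake S3-like client that raises 5xx a fixed number of times, then succeeds."""
    def __init__(self, failures: int):
        self.failures = failures
        self.uploads = []

    def put_object(self, key: str, body: bytes) -> None:
        if self.failures > 0:
            self.failures -= 1
            raise Transient5xx("503 Slow Down")
        self.uploads.append(key)

def upload_with_sidecar(client, key: str, body: bytes, dest_dir: pathlib.Path,
                        max_retries: int = 3) -> pathlib.Path:
    # Retry transient 5xx errors, then write a .minio.json sidecar on success.
    for attempt in range(max_retries + 1):
        try:
            client.put_object(key, body)
            break
        except Transient5xx:
            if attempt == max_retries:
                raise
    sidecar = dest_dir / f"{key}.minio.json"
    sidecar.write_text(json.dumps({"key": key, "size": len(body)}))
    return sidecar

with tempfile.TemporaryDirectory() as tmp:
    client = FlakyClient(failures=2)
    sidecar = upload_with_sidecar(client, "matches.parquet", b"data", pathlib.Path(tmp))
    assert client.uploads == ["matches.parquet"]
    assert json.loads(sidecar.read_text())["key"] == "matches.parquet"
```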

Phase B DoD

  • [ ] Each src/app/routers/*.py has ≥ 1 happy + 1 negative test.
  • [ ] CORS env behaviour tested (4 cases).
  • [ ] src/ui/app/api_client.py ≥ 80% covered.
  • [ ] make helm-test exists and passes.
  • [ ] (Optional) MinIO integration test exists, gated by marker.

Phase C — ML/pipeline P1 gaps (Days 5–6)

Goal: every DVC stage represented in contract tests; tuning/registration smoke-tested.

T11 — Extend contract tests to all DVC stages (~3 h)

  • File: tests/contract/test_pipeline_contracts.py
  • Add to EXPECTED_STAGES: validate_finished, validate_future, ablation_study, tune_xgb, final_train, batch_inference, export_metadata (verify exact names against current dvc.yaml).
  • For each new stage: extend EXPECTED_UPSTREAM_DEPS and STAGE_PARAMS.
  • Verification: pytest tests/contract -q green; len(EXPECTED_STAGES) matches stage count in dvc.yaml.
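
Assuming dvc.yaml is loaded (e.g. with PyYAML) into a dict keyed by stage name, the stage-count check reduces to a set comparison that names drift in both directions; the stage names below are the candidates from this task and must still be verified against the actual dvc.yaml:

```python
# Stand-in for yaml.safe_load(open("dvc.yaml"))["stages"] — keys are stage names.
parsed_stages = {
    "split_data": {}, "feature_engineering": {}, "validate_finished": {},
    "validate_future": {}, "ablation_study": {}, "tune_xgb": {},
    "final_train": {}, "batch_inference": {}, "export_metadata": {},
}

EXPECTED_STAGES = {
    "split_data", "feature_engineering", "validate_finished", "validate_future",
    "ablation_study", "tune_xgb", "final_train", "batch_inference", "export_metadata",
}

def stage_drift(actual: set, expected: set) -> dict:
    # Symmetric difference, split into actionable buckets for the failure message.
    return {
        "missing_from_tests": actual - expected,
        "missing_from_dvc": expected - actual,
    }

drift = stage_drift(set(parsed_stages), EXPECTED_STAGES)
assert drift == {"missing_from_tests": set(), "missing_from_dvc": set()}, drift
```

Splitting the diff into two buckets makes the failure mode obvious: a new stage in dvc.yaml shows up under missing_from_tests, a renamed or deleted one under missing_from_dvc.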

T12 — src/models/tuning.py smoke test (~2 h)

  • File: new tests/unit/test_tuning_smoke.py
  • Synthetic 100-row dataset, fixed seed, n_trials=1, mock MLflow (mlflow.start_run no-op).
  • Assert: study returns a best_params dict with expected keys; no exception; runtime < 5s.
  • Verification: deterministic across runs (pytest --count=3 if pytest-repeat available).
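
The determinism assertion itself can be sketched without Optuna against a stand-in tuner: same seed, same best_params. The real test would call the project's tuning entry point with n_trials=1 and MLflow mocked; the parameter names below are illustrative:

```python
import random

def run_study_stub(seed: int, n_trials: int = 1) -> dict:
    # Stand-in for the Optuna study: samples hyperparameters from a seeded RNG
    # and keeps the best-scoring trial.
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {
            "max_depth": rng.randint(3, 10),       # illustrative search space
            "learning_rate": rng.uniform(0.01, 0.3),
        }
        score = rng.random()  # placeholder objective
        if best is None or score > best[0]:
            best = (score, params)
    return best[1]

first = run_study_stub(seed=42)
second = run_study_stub(seed=42)
assert first == second                                # deterministic across runs
assert set(first) == {"max_depth", "learning_rate"}   # expected keys present
```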

T13 — register_model.py unit (~2 h)

  • File: new tests/unit/test_register_model.py
  • Mock MLflow client (MlflowClient.create_registered_model, set_registered_model_alias, get_run).
  • Cover happy path: alias champion set on new run; metrics logged.
  • Cover failure: when challenger metrics worse than champion → no alias change.
  • Verification: 4–6 tests, no real MLflow connection.
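
The champion/challenger branch can be tested entirely against a mock client. register_if_better, the model name, and the metric are assumptions for illustration; the alias call mirrors the shape of MlflowClient.set_registered_model_alias(name, alias, version):

```python
from unittest import mock

def register_if_better(client, version: str,
                       challenger_auc: float, champion_auc: float) -> bool:
    # Promote only when the challenger strictly beats the current champion.
    if challenger_auc <= champion_auc:
        return False
    client.set_registered_model_alias("ns_soccer", "champion", version)
    return True

client = mock.Mock()

# challenger worse -> no alias change
assert register_if_better(client, "2", challenger_auc=0.71, champion_auc=0.74) is False
client.set_registered_model_alias.assert_not_called()

# challenger better -> alias moved to the new version
assert register_if_better(client, "3", challenger_auc=0.78, champion_auc=0.74) is True
client.set_registered_model_alias.assert_called_once_with("ns_soccer", "champion", "3")
```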

T14 — DVC reduced-pipeline smoke (~3 h, can defer)

  • New CI job (.gitlab-ci.yml): test:dvc-smoke.
  • Use a pre-committed reduced fixture (e.g. tests/fixtures/dvc-smoke/) and run a subset: dvc repro split_data feature_engineering.
  • Failure on schema/IO regression → CI red.
  • Verification: job runs < 2 min and is green on main.

Phase C DoD

  • [ ] All DVC stages in dvc.yaml are covered by contract tests.
  • [ ] Optuna tuning has smoke test (1 trial, deterministic).
  • [ ] Model registration logic has unit tests (no real MLflow).
  • [ ] (Optional) DVC smoke job exists in CI.

Phase D — UI and Airflow (Day 7)

Goal: pages render without exceptions; DAGs validate.

T15 — Streamlit smoke tests (~2 h)

  • File: new tests/unit/test_ui_pages.py
  • Use streamlit.testing.v1.AppTest:
    from streamlit.testing.v1 import AppTest
    def test_predictions_page_renders(monkeypatch):
        # mock APIClient
        at = AppTest.from_file("src/ui/app/pages/1_Predictions.py")
        at.run(timeout=10)
        assert not at.exception
    
  • Cover: 1_Predictions.py, 2_Model_Metrics.py, disclaimer.py (smoke render).
  • Mock APIClient to avoid real HTTP.
  • Verification: 3 smoke tests green; total runtime < 10 s.

T16 — Airflow DAG validation (~2 h)

  • File: new tests/unit/test_airflow_dags.py
  • Use DagBag:
    from airflow.models import DagBag
    def test_no_import_errors():
        db = DagBag(dag_folder="airflow/dags", include_examples=False)
        assert db.import_errors == {}
    def test_critical_dags_present():
        db = DagBag(dag_folder="airflow/dags", include_examples=False)
        assert "scraper_daily" in db.dag_ids  # adjust to actual DAG IDs
    
  • Verification: pytest tests/unit/test_airflow_dags.py -q green; new DAG with import error → test fails.

Phase D DoD

  • [ ] Each src/ui/app/pages/*.py has smoke AppTest test.
  • [ ] airflow/dags/** validated via DagBag (no import errors, critical DAGs present).

Phase E — P2/P3 expansion (ongoing, opportunistic)

Not blocking v1; address as relevant code changes.

T17 — Scraper snapshot tests

  • src/app/scraper/driver.py — saved-HTML fixture parsing test.
  • src/app/validation/livescores.py — Pydantic schema tests.

T18 — src/data/source.py, src/data/storage.py units (mock S3, file IO).

T19 — src/features/select.py unit (purity + schema).

T20 — Property tests for src/data/preprocess.py (Hypothesis metadata round-trip).
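
In the real T20 test, Hypothesis generates the cases; the invariant itself (metadata survives a serialize/deserialize round trip) can be sketched with a seeded generator, using json as a stand-in for the project's actual metadata format:

```python
import json
import random

def random_metadata(rng: random.Random) -> dict:
    # Generates small metadata dicts, standing in for Hypothesis strategies.
    return {
        "columns": [f"f{i}" for i in range(rng.randint(1, 5))],
        "target": rng.choice(["home_win", "draw", "away_win"]),
        "params": {"window": rng.randint(1, 30)},
    }

def round_trip(meta: dict) -> dict:
    return json.loads(json.dumps(meta))

rng = random.Random(0)
for _ in range(50):  # mirrors the max_examples >= 50 convention below
    meta = random_metadata(rng)
    assert round_trip(meta) == meta
```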

T21 — Coverage gating

  • Add --cov-fail-under=70 to make test-coverage and CI.
  • Per-package thresholds (e.g. src/features ≥ 90%, src/app/routers ≥ 80%).

T22 — Mutation testing pilot on src/models/metrics.py (mutmut).

T23 — docs/testing.md (canonical inventory + conventions)

  • Test pyramid (actual numbers).
  • Layer ↔ test file ↔ type matrix.
  • Conventions (purity, determinism, marker discipline).
  • How to run / how to add tests.
  • Replaces ad-hoc inventories in docs/validation/*.

Phase F — CI integration (after A–C)

T24 — .gitlab-ci.yml job matrix

test:fast:      pytest -m "not slow and not integration and not load" --cov=src
test:contract:  pytest tests/contract
test:helm:      make helm-test
test:dvc-smoke: <reduced repro>      # T14
test:integration: pytest -m integration  # nightly only

T25 — Coverage badge + branch protection

  • Coverage report uploaded as CI artifact + badge in README.md.
  • Required checks: test:fast, test:contract, test:helm on every MR.

Cross-cutting conventions (codify in docs/testing.md — T23)

  • Unit: pure, no IO, < 100 ms.
  • Property: Hypothesis for invariants (no-leakage, schema, range). max_examples ≥ 50 in CI.
  • Service: mocks for external systems (Celery, MLflow, MinIO, DB).
  • Contract: doc-truth tests for dvc.yaml, params.yaml, Pydantic schemas, Helm templates.
  • Integration: @pytest.mark.integration, separate CI job, may need docker-compose.
  • Load: @pytest.mark.load, never in default suite.
  • Forbidden: live network without mocks, reading data/raw/* in unit tests, randomness without seed.
  • Naming: tests/<layer>/test_<module>_<aspect>.py; test function name = invariant (e.g. test_no_leakage_h2h).

Definition of Done — overall

  • [ ] DoD-T1 pytest tests/ green, no collection errors.
  • [ ] DoD-T2 Every src/app/routers/*.py has ≥ 1 happy + 1 negative test.
  • [ ] DoD-T3 Every stage in dvc.yaml is in EXPECTED_STAGES.
  • [ ] DoD-T4 Every src/ui/app/pages/*.py collects via AppTest without exception.
  • [ ] DoD-T5 airflow/dags/** validated via DagBag.
  • [ ] DoD-T6 helm lint + helm template green in CI.
  • [ ] DoD-T7 make test exists and is referenced in docs/quickstart.md.
  • [ ] DoD-T8 docs/testing.md exists with current inventory and conventions.
  • [ ] DoD-T9 Coverage ≥ 70% on src/ (gated in CI after Phase E).

Suggested 2-day sprint slice

If only 2 days are budgeted, do T1–T8 + T11. This delivers:
  • green CI (T1–T2),
  • correct pyproject.toml + make ergonomics (T3–T4),
  • honest doc claims (T5),
  • new API endpoints covered (T6–T7),
  • the v1 APIClient.get_model_info covered (T8),
  • complete DVC contract coverage (T11).

This closes the v1-related risks (UI ↔ API contract is actually verified) and removes the "fake green" 200-tests-passing claim from status.md.