Test Coverage Implementation Plan — 2026-04-28¶
Cycle: Test strategy execution
Target window: 2 weeks (Phase A: 1 day; Phase B–D: 1.5 weeks; Phase E–F: ongoing)
Source of truth for scope: Test inventory and gap analysis (chat session 2026-04-28)
Related docs:
- docs/cicd/testing-strategy.md — high-level strategy (to be updated)
- docs/status.md — test count claims (to be updated)
- tests/contract/test_pipeline_contracts.py — contract baseline
Current state: pytest --collect-only → 274 collected, 5 collection errors → pytest tests/ is red on collect, hiding ~40+ tests. docs/status.md claims "~200 tests" (stale).
Grouping by phase: A (P0 unblock signal) → B/C/D (P1 server/ML/UI gaps) → E/F (P2/P3 + CI integration).
Phase A — Restore test signal (Day 1, ~4–6 h, P0)¶
Goal: green pytest tests/, honest test counts, runnable make test. Without this, all other work has no feedback loop.
T1 — Diagnose 5 collection errors (~30 min)¶
- Files failing to collect (verified via pytest --collect-only):
  - tests/unit/test_h2h.py — imports add_h2h_features from src/features/stats_matches.py (symbol absent).
  - tests/unit/test_rest_days.py — imports add_rest_days, rest_days_feature_meta (absent).
  - tests/unit/test_classification_selection.py — imports _select_best_run from src/models/classification.py (absent).
  - tests/service/test_prediction_service.py — collection error (root cause TBD; likely service API drift).
  - tests/test_api.py — collection error (likely _get_feature_lookup or related symbol drift in src/app/routers/predict.py).
- Action: git log -p -- src/features/stats_matches.py src/models/classification.py src/app/services/predict.py src/app/routers/predict.py to find when each symbol disappeared and why (refactor vs. accidental delete).
- Output: a per-file decision (restore symbol vs. delete/rewrite test).
- Do not make decisions blindly — git log is mandatory before T2.
T2 — Resolve each collection error (~1.5 h)¶
Per file from T1, choose one:
- (a) Restore symbol in src/ if it was incorrectly removed and contracts still demand it.
- (b) Rewrite test to current API if the public contract legitimately changed.
- (c) Delete test only if the underlying feature is officially deprecated (must be reflected in docs/status.md).
- Verification: pytest --collect-only -q shows 0 errors.
T3 — Add [tool.pytest.ini_options] to pyproject.toml (~15 min)¶
- File: pyproject.toml
- Block: [tool.pytest.ini_options]
- Verification: python -c "from app.config.security import SecuritySettings" works without a PYTHONPATH=src prefix when run via pytest (ad-hoc imports are documented separately).
- Verification: pytest -m "not load and not integration" collects without warnings about unknown markers.
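A possible shape for the block, inferred from the verification criteria above (pythonpath, testpaths, and the two marker registrations are assumptions to check against the actual repo layout):

```toml
[tool.pytest.ini_options]
pythonpath = ["src"]          # lets tests import app.* without PYTHONPATH=src
testpaths = ["tests"]
markers = [
    "load: load/performance tests, never in the default suite",
    "integration: tests needing external services (docker-compose, MinIO, DVC)",
]
```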
T4 — Add make test* targets (~15 min)¶
- File: Makefile
- Targets:

```make
test: ## run full test suite (excludes load + integration)
	pytest -m "not load and not integration"

test-fast: ## unit + property only
	pytest tests/unit tests/property -q

test-contract: ## DVC + Pydantic + Helm contracts
	pytest tests/contract -q

test-coverage: ## with coverage report
	pytest --cov=src --cov-report=term-missing --cov-report=html -m "not load and not integration"
```

- Verification: each target runs and exits 0 (after T2).
T5 — Update doc claims (~30 min)¶
- docs/status.md — replace "~200 tests" with the actual collected count from pytest --collect-only -q | tail -1.
- docs/quickstart.md — same update + reference make test.
- docs/cicd/testing-strategy.md — note current real coverage vs. aspirational; mark gates as Implemented/Planned.
Phase A DoD¶
- [ ] pytest --collect-only -q reports 0 errors.
- [ ] pytest tests/ exits green (excluding load/integration).
- [ ] make test, make test-fast, make test-contract, make test-coverage all work.
- [ ] docs/status.md test count matches reality.
Phase B — Server-side P1 gaps (Days 2–4)¶
Goal: API surface, CORS, UI client are tested. Closes audit findings G-03 through G-10.
T6 — Expand tests/test_api.py to all routers (~1 day)¶
- File: tests/test_api.py (after T2)
- Add tests for:
  - src/app/routers/livescores.py — happy + invalid-params.
  - src/app/routers/monitoring.py — /celery/queues (mock celery_app.control.inspect), /task_status/{id} (mock AsyncResult).
  - src/app/routers/stats.py — /teams/search/, /team/ with an in-memory SQLite fixture.
  - src/app/routers/sources.py, src/app/routers/healthcheck.py — smoke 200.
  - Each router: at least 1 happy-path + 1 validation/error case.
- Verification: pytest tests/test_api.py -q green; pytest --cov=src/app/routers --cov-report=term ≥ 70%.
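The mocking pattern for the Celery-backed endpoints can be sketched without a broker. This is an illustrative stand-in, not the real handler: `queues_payload` and its return shape are assumptions; in the actual test the mock would be monkeypatched into src/app/routers/monitoring.py and exercised through the FastAPI TestClient.

```python
from unittest.mock import MagicMock

def queues_payload(celery_app):
    """Stand-in for the /celery/queues handler logic (hypothetical shape)."""
    inspect = celery_app.control.inspect()
    active = inspect.active() or {}  # inspect.active() is None when no workers respond
    return {"workers": sorted(active), "active_tasks": sum(len(v) for v in active.values())}

# Mocked Celery app: one worker reporting one active task.
mock_celery = MagicMock()
mock_celery.control.inspect.return_value.active.return_value = {"worker@host": [{"id": "abc123"}]}
payload = queues_payload(mock_celery)
```

The same MagicMock chain covers the negative case (no workers → `active()` returns None) without any running infrastructure.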
T7 — CORS env-driven test (~30 min)¶
- New: tests/test_api_cors.py (or extend test_api.py).
- Use monkeypatch.setenv("CORS_ALLOWED_ORIGINS", "https://allowed.example"), then re-import src/app/main:app.
- Assert the Access-Control-Allow-Origin header: mirrored for an allowed origin; absent for a disallowed one.
- Test default ("") → no permissive CORS; test "*" → wildcard.
- Verification: 4 cases green.
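The four cases reduce to how the env var is parsed into an origins list. A minimal stand-in for that parsing (the real logic lives in src/app/main or its settings module and may differ) makes the expected behaviour concrete:

```python
def parse_cors_origins(raw):
    """Hypothetical CORS_ALLOWED_ORIGINS parsing:
    "" -> no CORS middleware, "*" -> wildcard, else comma-separated origins."""
    if not raw:
        return []
    if raw == "*":
        return ["*"]
    return [origin.strip() for origin in raw.split(",") if origin.strip()]
```

Each of the four test cases then asserts on the resulting response headers after re-importing the app with the patched environment.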
T8 — tests/unit/test_api_client.py (~3 h, was scheduled in v1 T4)¶
- File: new tests/unit/test_api_client.py
- Use respx (or httpx.MockTransport) to mock the backend.
- Cover src/ui/app/api_client.py: list_upcoming_matches(), get_prediction(), get_model_info(stage=...).
- Cases: happy 200, 4xx → APIError(status_code=...), 5xx → APIError, timeout → APIError.
- Verification: ≥ 80% coverage of api_client.py.
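The status-to-exception mapping these cases assert can be sketched standalone. APIError's constructor signature and the handler function are assumptions about src/ui/app/api_client.py; the real tests would drive them through respx-mocked HTTP calls rather than directly.

```python
class APIError(Exception):
    """Assumed shape of the client-side error (status_code attribute per the cases above)."""
    def __init__(self, status_code=None, detail=""):
        self.status_code = status_code
        super().__init__(detail or f"API error (status={status_code})")

def handle_response(status_code, payload):
    """Stand-in for the client's response handling: any 4xx/5xx maps to APIError."""
    if 400 <= status_code < 600:
        raise APIError(status_code=status_code)
    return payload
```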
T9 — Helm template/lint smoke (~2 h)¶
- Add a make helm-test target:
- Optional: wrap in tests/contract/test_helm_chart.py with subprocess.run.
- Verification: make helm-test green; an intentional misconfig (e.g. rateLimit.enabled: false, with a grep matching accordingly) fails the test.
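One possible shape for the target — the chart path helm/app and the grepped key are assumptions to adapt to the actual chart:

```make
helm-test: ## lint + render the chart; fails on template errors
	helm lint helm/app
	helm template smoke helm/app > /tmp/helm-smoke.yaml
	grep -q "rateLimit" /tmp/helm-smoke.yaml  # guard against silently dropped config
```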
T10 — MinIO storage integration test (~3 h, optional for v1)¶
- File: new tests/integration/test_minio_storage.py
- Use moto[s3] (in-process) to stub S3 endpoints, or localstack via docker-compose for fuller integration.
- Cover src/app/data/storage.py: upload, retry on 5xx, sidecar .minio.json written.
- Mark @pytest.mark.integration; excluded from the default make test.
- Verification: pytest -m integration tests/integration/test_minio_storage.py green.
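The retry-on-5xx behaviour the test asserts can be illustrated without moto. upload_with_retry and S3ServerError are illustrative names, not the real API of src/app/data/storage.py; the actual test would trigger the 5xx via a moto-stubbed endpoint.

```python
class S3ServerError(Exception):
    """Stand-in for a 5xx response from the object store."""

def upload_with_retry(do_upload, max_attempts=3):
    """Retry the upload callable on server errors; re-raise after exhausting attempts."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return do_upload()
        except S3ServerError as exc:
            last_error = exc
    raise last_error

# Flaky upload: fails twice with a 5xx, then succeeds.
calls = {"n": 0}
def flaky_upload():
    calls["n"] += 1
    if calls["n"] < 3:
        raise S3ServerError("503")
    return "etag-123"
```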
Phase B DoD¶
- [ ] Each src/app/routers/*.py has ≥ 1 happy + 1 negative test.
- [ ] CORS env behaviour tested (4 cases).
- [ ] src/ui/app/api_client.py ≥ 80% covered.
- [ ] make helm-test exists and passes.
- [ ] (Optional) MinIO integration test exists, gated by marker.
Phase C — ML/pipeline P1 gaps (Days 5–6)¶
Goal: every DVC stage represented in contract tests; tuning/registration smoke-tested.
T11 — Extend contract tests to all DVC stages (~3 h)¶
- File: tests/contract/test_pipeline_contracts.py
- Add to EXPECTED_STAGES: validate_finished, validate_future, ablation_study, tune_xgb, final_train, batch_inference, export_metadata (verify exact names against the current dvc.yaml).
- For each new stage: extend EXPECTED_UPSTREAM_DEPS and STAGE_PARAMS.
- Verification: pytest tests/contract -q green; len(EXPECTED_STAGES) matches the stage count in dvc.yaml.
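The coverage assertion itself can stay library-free; the real test would populate dvc_stages from yaml.safe_load on dvc.yaml, and the set below merges the stage names mentioned in this plan (verify against the repo before relying on them):

```python
# Stage names taken from this plan; confirm against dvc.yaml.
EXPECTED_STAGES = {
    "split_data", "feature_engineering", "validate_finished", "validate_future",
    "ablation_study", "tune_xgb", "final_train", "batch_inference", "export_metadata",
}

def stage_coverage(dvc_stages, expected=EXPECTED_STAGES):
    """Return (expected-but-missing-from-dvc, in-dvc-but-untested) so failures name stages."""
    dvc_stages = set(dvc_stages)
    return expected - dvc_stages, dvc_stages - expected
```

Reporting both difference sets keeps the failure message actionable in either drift direction (stage renamed in dvc.yaml vs. stage added without a contract).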
T12 — src/models/tuning.py smoke test (~2 h)¶
- File: new tests/unit/test_tuning_smoke.py
- Synthetic 100-row dataset, fixed seed, n_trials=1, mock MLflow (mlflow.start_run no-op).
- Assert: the study returns a best_params dict with the expected keys; no exception; runtime < 5 s.
- Verification: deterministic across runs (pytest --count=3 if pytest-repeat is available).
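The two invariants (expected keys, seed-determinism) can be shown with a seeded stand-in. This is not the real Optuna study from src/models/tuning.py — the parameter names and ranges below are assumptions — but the assertions are the same ones the smoke test would make against the real best_params.

```python
import random

EXPECTED_KEYS = {"max_depth", "learning_rate", "n_estimators"}  # assumed search space

def run_tuning(seed, n_trials=1):
    """Stand-in for a 1-trial seeded study; returns a best_params-like dict."""
    rng = random.Random(seed)
    return {
        "max_depth": rng.randint(3, 10),
        "learning_rate": rng.uniform(0.01, 0.3),
        "n_estimators": rng.randint(100, 500),
    }
```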
T13 — register_model.py unit (~2 h)¶
- File: new tests/unit/test_register_model.py
- Mock the MLflow client (MlflowClient.create_registered_model, set_registered_model_alias, get_run).
- Cover the happy path: champion alias set on the new run; metrics logged.
- Cover the failure path: challenger metrics worse than champion → no alias change.
- Verification: 4–6 tests, no real MLflow connection.
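The promotion decision and its two assertions can be sketched with a stdlib mock. The function name, the "match-predictor" model name, and the AUC comparison are illustrative assumptions about register_model.py; set_registered_model_alias is the real MlflowClient method named above, mocked here rather than called.

```python
from unittest.mock import MagicMock

def promote_if_better(client, name, run_id, challenger_auc, champion_auc):
    """Hypothetical promotion rule: alias moves only when the challenger wins."""
    if champion_auc is None or challenger_auc > champion_auc:
        client.set_registered_model_alias(name, "champion", run_id)
        return True
    return False

client = MagicMock()
promoted = promote_if_better(client, "match-predictor", "run-1",
                             challenger_auc=0.71, champion_auc=0.68)
```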
T14 — DVC reduced-pipeline smoke (~3 h, can defer)¶
- New CI job (.gitlab-ci.yml): test:dvc-smoke.
- Use a pre-committed reduced fixture (e.g. tests/fixtures/dvc-smoke/) and run a subset: dvc repro split_data feature_engineering.
- Failure on schema/IO regression → CI red.
- Verification: job runs < 2 min and is green on main.
Phase C DoD¶
- [ ] All DVC stages in dvc.yaml are covered by contract tests.
- [ ] Optuna tuning has a smoke test (1 trial, deterministic).
- [ ] Model registration logic has unit tests (no real MLflow).
- [ ] (Optional) DVC smoke job exists in CI.
Phase D — UI and Airflow (Day 7)¶
Goal: pages render without exceptions; DAGs validate.
T15 — Streamlit smoke tests (~2 h)¶
- File: new tests/unit/test_ui_pages.py
- Use streamlit.testing.v1.AppTest.
- Cover: 1_Predictions.py, 2_Model_Metrics.py, disclaimer.py (smoke render).
- Mock APIClient to avoid real HTTP.
- Verification: 3 smoke tests green; total runtime < 10 s.
T16 — Airflow DAG validation (~2 h)¶
- File: new tests/unit/test_airflow_dags.py
- Use DagBag:

```python
from airflow.models import DagBag

def test_no_import_errors():
    db = DagBag(dag_folder="airflow/dags", include_examples=False)
    assert db.import_errors == {}

def test_critical_dags_present():
    db = DagBag(dag_folder="airflow/dags", include_examples=False)
    assert "scraper_daily" in db.dag_ids  # adjust to actual DAG IDs
```

- Verification: pytest tests/unit/test_airflow_dags.py -q green; a new DAG with an import error → test fails.
Phase D DoD¶
- [ ] Each src/ui/app/pages/*.py has a smoke AppTest test.
- [ ] airflow/dags/** validated via DagBag (no import errors, critical DAGs present).
Phase E — P2/P3 expansion (ongoing, opportunistic)¶
Not blocking v1; address as relevant code changes.
T17 — Scraper snapshot tests¶
- src/app/scraper/driver.py — saved-HTML fixture parsing test.
- src/app/validation/livescores.py — Pydantic schema tests.
T18 — src/data/source.py, src/data/storage.py units (mock S3, file IO).¶
T19 — src/features/select.py unit (purity + schema).¶
T20 — Property tests for src/data/preprocess.py (Hypothesis metadata round-trip).¶
T21 — Coverage gating¶
- Add --cov-fail-under=70 to make test-coverage and CI.
- Per-package thresholds (e.g. src/features ≥ 90%, src/app/routers ≥ 80%).
T22 — Mutation testing pilot on src/models/metrics.py (mutmut).¶
T23 — docs/testing.md (canonical inventory + conventions)¶
- Test pyramid (actual numbers).
- Layer ↔ test file ↔ type matrix.
- Conventions (purity, determinism, marker discipline).
- How to run / how to add tests.
- Replaces ad-hoc inventories in docs/validation/*.
Phase F — CI integration (after A–C)¶
T24 — .gitlab-ci.yml job matrix¶
test:fast: pytest -m "not slow and not integration and not load" --cov=src
test:contract: pytest tests/contract
test:helm: make helm-test
test:dvc-smoke: <reduced repro> # T14
test:integration: pytest -m integration # nightly only
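A hedged sketch of that matrix — stage names and the schedule rule are assumptions to adapt, though gating a job to scheduled (nightly) pipelines via CI_PIPELINE_SOURCE is standard GitLab CI:

```yaml
test:fast:
  stage: test
  script:
    - pytest -m "not slow and not integration and not load" --cov=src

test:contract:
  stage: test
  script:
    - pytest tests/contract

test:integration:
  stage: test
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'   # nightly only
  script:
    - pytest -m integration
```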
T25 — Coverage badge + branch protection¶
- Coverage report uploaded as a CI artifact + badge in README.md.
- Required checks: test:fast, test:contract, test:helm on every MR.
Cross-cutting conventions (codify in docs/testing.md — T23)¶
- Unit: pure, no IO, < 100 ms.
- Property: Hypothesis for invariants (no-leakage, schema, range); max_examples ≥ 50 in CI.
- Service: mocks for external systems (Celery, MLflow, MinIO, DB).
- Contract: doc-truth tests for dvc.yaml, params.yaml, Pydantic schemas, Helm templates.
- Integration: @pytest.mark.integration, separate CI job, may need docker-compose.
- Load: @pytest.mark.load, never in the default suite.
- Forbidden: live network without mocks, reading data/raw/* in unit tests, randomness without a seed.
- Naming: tests/<layer>/test_<module>_<aspect>.py; test function name = invariant (e.g. test_no_leakage_h2h).
Definition of Done — overall¶
- [ ] DoD-T1 pytest tests/ green, no collection errors.
- [ ] DoD-T2 Every src/app/routers/*.py has ≥ 1 happy + 1 negative test.
- [ ] DoD-T3 Every stage in dvc.yaml is in EXPECTED_STAGES.
- [ ] DoD-T4 Every src/ui/app/pages/*.py collects via AppTest without exception.
- [ ] DoD-T5 airflow/dags/** validated via DagBag.
- [ ] DoD-T6 helm lint + helm template green in CI.
- [ ] DoD-T7 make test exists and is referenced in docs/quickstart.md.
- [ ] DoD-T8 docs/testing.md exists with current inventory and conventions.
- [ ] DoD-T9 Coverage ≥ 70% on src/ (gated in CI after Phase E).
Suggested 2-day sprint slice¶
If only 2 days are budgeted, do T1–T8 + T11. This delivers:
- green CI (T1–T2),
- correct pyproject.toml + make ergonomics (T3–T4),
- honest doc claims (T5),
- new API endpoints covered (T6–T7),
- the v1 APIClient.get_model_info covered (T8),
- complete DVC contract coverage (T11).
This closes the v1-related risks (UI ↔ API contract is actually verified) and removes the "fake green" 200-tests-passing claim from status.md.