
Test Coverage Implementation Plan — 2026-04-28

Cycle: Test strategy execution
Target window: 2 weeks (Phase A: 1 day; Phase B–D: 1.5 weeks; Phase E–F: ongoing)
Source of truth for scope: Test inventory and gap analysis (chat session 2026-04-28)
Related docs:
  • docs/cicd/testing-strategy.md — high-level strategy (to be updated)
  • docs/status.md — test count claims (to be updated)
  • tests/contract/test_pipeline_contracts.py — contract baseline

Current state: pytest --collect-only reports 274 collected, 5 collection errors; pytest tests/ is red on collect, hiding ~40+ tests. docs/status.md claims "~200 tests" (stale).

Grouping by phase: A (P0 unblock signal) → B/C/D (P1 server/ML/UI gaps) → E/F (P2/P3 + CI integration).


Phase A — Restore test signal (Day 1, ~4–6 h, P0)

Goal: green pytest tests/, honest test counts, runnable make test. Without this, all other work has no feedback loop.

T1 — Diagnose 5 collection errors (~30 min)

  • Files failing to collect (verified via pytest --collect-only):
  • tests/unit/test_h2h.py — imports add_h2h_features from src/features/stats_matches.py (symbol absent).
  • tests/unit/test_rest_days.py — imports add_rest_days, rest_days_feature_meta (absent).
  • tests/unit/test_classification_selection.py — imports _select_best_run from src/models/classification.py (absent).
  • tests/service/test_prediction_service.py — collection error (root cause TBD; likely service API drift).
  • tests/test_api.py — collection error (likely _get_feature_lookup or related symbol drift in src/app/routers/predict.py).
  • Action: git log -p -- src/features/stats_matches.py src/models/classification.py src/app/services/predict.py src/app/routers/predict.py to find when symbols disappeared and why (refactor vs. accidental delete).
  • Output: a per-file decision (restore symbol vs. delete/rewrite test).
  • Do not make decisions blindly — git log is mandatory before T2.

T2 — Resolve each collection error (~1.5 h)

Per file from T1, choose one:
  • (a) Restore the symbol in src/ if it was incorrectly removed and contracts still demand it.
  • (b) Rewrite the test to the current API if the public contract legitimately changed.
  • (c) Delete the test only if the underlying feature is officially deprecated (must be reflected in docs/status.md).
  • Verification: pytest --collect-only -q shows 0 errors.

T3 — Add [tool.pytest.ini_options] to pyproject.toml (~15 min)

  • File: pyproject.toml
  • Block:
    [tool.pytest.ini_options]
    pythonpath = ["src"]
    testpaths = ["tests"]
    markers = [
      "slow: tests that take >1s",
      "integration: requires live services (db, mlflow, minio)",
      "load: locust load tests (excluded by default)",
    ]
    addopts = "-q --strict-markers --tb=short"
    
  • Verification: tests import app modules (e.g. from app.config.security import SecuritySettings) without a PYTHONPATH=src prefix; note that pythonpath applies only to pytest runs, so ad-hoc python -c imports still need the prefix and are documented separately.
  • Verification: pytest -m "not load and not integration" collects without warnings about unknown markers.

T4 — Add make test* targets (~15 min)

  • File: Makefile
  • Targets:
    test:           ## run full test suite (excludes load + integration)
      pytest -m "not load and not integration"
    test-fast:      ## unit + property only
      pytest tests/unit tests/property -q
    test-contract:  ## DVC + Pydantic + Helm contracts
      pytest tests/contract -q
    test-coverage:  ## with coverage report
      pytest --cov=src --cov-report=term-missing --cov-report=html -m "not load and not integration"
    
  • Verification: each target runs and exits 0 (after T2).

T5 — Update doc claims (~30 min)

  • docs/status.md: replace the stale "~200 tests" claim with the actual collected count from T2.
  • docs/cicd/testing-strategy.md: reference the new make test* targets from T4.

Phase A DoD

  • [ ] pytest --collect-only -q reports 0 errors.
  • [ ] pytest tests/ exits green (excluding load/integration).
  • [ ] make test, make test-fast, make test-contract, make test-coverage all work.
  • [ ] docs/status.md test count matches reality.

Phase B — Server-side P1 gaps (Days 2–4)

Goal: API surface, CORS, UI client are tested. Closes audit findings G-03 through G-10.

T6 — Expand tests/test_api.py to all routers (~1 day)

  • Target (per Phase B DoD): every src/app/routers/*.py gets ≥ 1 happy-path and ≥ 1 negative test.

T7 — CORS env-driven test (~30 min)

  • New: tests/test_api_cors.py (or extend test_api.py)
  • Use monkeypatch.setenv("CORS_ALLOWED_ORIGINS", "https://allowed.example"), then reload app.main and rebuild the TestClient (the CORS middleware is configured at import time, so the env must be set before the reload).
  • Assert Access-Control-Allow-Origin header for allowed origin = mirrored; for disallowed = absent.
  • Test default ("") → no permissive CORS; test "*" → wildcard.
  • Verification: 4 cases green.
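
The full test drives the FastAPI app through a TestClient after the reload; the env-driven part that the four cases exercise can be sketched in isolation against a stand-in origin parser (parse_allowed_origins is a hypothetical helper for illustration, not the project's actual settings API):

```python
def parse_allowed_origins(env_value: str) -> list[str]:
    # Hypothetical stand-in for the app's CORS settings parsing:
    # "" -> no permissive CORS, "*" -> wildcard, else comma-separated allowlist.
    value = env_value.strip()
    if not value:
        return []
    if value == "*":
        return ["*"]
    return [o.strip() for o in value.split(",") if o.strip()]

# the four cases from T7, as pure assertions
assert parse_allowed_origins("") == []
assert parse_allowed_origins("*") == ["*"]
assert parse_allowed_origins("https://allowed.example") == ["https://allowed.example"]
assert parse_allowed_origins("https://a.example, https://b.example") == [
    "https://a.example", "https://b.example",
]
```

In the real test, each case sets the env var via monkeypatch, reloads the app module, and asserts on the Access-Control-Allow-Origin response header instead.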

T8 — tests/unit/test_api_client.py (~3 h, was scheduled in v1 T4)

  • File: new tests/unit/test_api_client.py
  • Use respx (or httpx.MockTransport) to mock backend.
  • Cover src/ui/app/api_client.py: list_upcoming_matches(), get_prediction(), get_model_info(stage=...).
  • Cases: happy 200, 4xx → APIError(status_code=...), 5xx → APIError, timeout → APIError.
  • Verification: ≥ 80% coverage of api_client.py.
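
The respx/MockTransport test mocks the HTTP layer; the status-to-exception mapping those cases assert can be sketched with stand-ins (APIError and raise_for_api_error are illustrative names, not necessarily the project's actual client API):

```python
class APIError(Exception):
    """Stand-in for the UI client's error type (hypothetical)."""
    def __init__(self, status_code: int, detail: str = ""):
        super().__init__(f"API error {status_code}: {detail}")
        self.status_code = status_code

def raise_for_api_error(status_code: int, detail: str = "") -> None:
    # Mirrors the contract the tests assert: 4xx and 5xx both surface as APIError,
    # preserving the status code so tests can match on it.
    if status_code >= 400:
        raise APIError(status_code, detail)

# happy path: 200 passes through silently
raise_for_api_error(200)

# 4xx is wrapped with the status code intact
try:
    raise_for_api_error(404, "match not found")
except APIError as err:
    assert err.status_code == 404
```

Timeouts follow the same pattern: the client catches the transport exception and re-raises as APIError, which the respx test triggers with a mocked timeout.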

T9 — Helm template/lint smoke (~2 h)

  • Add make helm-test:
    helm-test:
      helm lint k8s/helm/ns_soccer-api
      helm template k8s/helm/ns_soccer-api > /tmp/rendered.yaml
      @grep -q "limit-rps" /tmp/rendered.yaml || (echo "rate-limit annotation missing" && exit 1)
      @grep -q "CORS_ALLOWED_ORIGINS" /tmp/rendered.yaml || (echo "CORS env missing" && exit 1)
    
  • Optional: wrap in tests/contract/test_helm_chart.py with subprocess.run.
  • Verification: make helm-test green; intentional misconfig (e.g. rateLimit.enabled: false and grep accordingly) fails the test.
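
If the optional pytest wrapper is written, keeping the grep logic as a pure helper makes it unit-testable without helm installed; only missing_markers below is exercised directly, while render_chart (which shells out to helm) is an assumption about how the wrapper would call it:

```python
import subprocess

REQUIRED_MARKERS = ("limit-rps", "CORS_ALLOWED_ORIGINS")

def missing_markers(rendered: str, markers=REQUIRED_MARKERS) -> list[str]:
    # Pure check: which required strings are absent from the rendered manifests.
    return [m for m in markers if m not in rendered]

def render_chart(chart_path: str) -> str:
    # The real test would call helm; kept separate so missing_markers stays pure.
    return subprocess.run(
        ["helm", "template", chart_path],
        check=True, capture_output=True, text=True,
    ).stdout

# unit-level checks against stand-in rendered snippets
sample = "nginx.ingress.kubernetes.io/limit-rps: '10'\n- name: CORS_ALLOWED_ORIGINS"
assert missing_markers(sample) == []
assert missing_markers("kind: Deployment") == ["limit-rps", "CORS_ALLOWED_ORIGINS"]
```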

T10 — MinIO storage integration test (~3 h, optional for v1)

  • File: new tests/integration/test_minio_storage.py
  • Use moto[s3] (in-process) to stub S3 endpoints; or localstack via docker-compose for fuller integration.
  • Cover src/app/data/storage.py: upload, retry on 5xx, sidecar .minio.json written.
  • Mark @pytest.mark.integration; excluded from default make test.
  • Verification: pytest -m integration tests/integration/test_minio_storage.py green.
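
moto stubs the real S3 endpoints; the two behaviours the test covers (retry on transient 5xx, sidecar .minio.json written on success) can be sketched with a fake client. All names here are illustrative, not the actual src/app/data/storage.py API:

```python
import json
import pathlib
import tempfile

class Transient5xx(Exception):
    pass

class FlakyClient:
    """Fake S3-like client that raises 5xx a fixed number of times, then succeeds."""
    def __init__(self, failures: int):
        self.failures = failures
        self.uploads = []

    def put_object(self, key: str, body: bytes) -> None:
        if self.failures > 0:
            self.failures -= 1
            raise Transient5xx("503 Slow Down")
        self.uploads.append(key)

def upload_with_sidecar(client, key: str, body: bytes, dest_dir: pathlib.Path,
                        max_retries: int = 3) -> pathlib.Path:
    # Retry transient 5xx errors, then write a .minio.json sidecar on success.
    for attempt in range(max_retries + 1):
        try:
            client.put_object(key, body)
            break
        except Transient5xx:
            if attempt == max_retries:
                raise
    sidecar = dest_dir / f"{key}.minio.json"
    sidecar.write_text(json.dumps({"key": key, "size": len(body)}))
    return sidecar

with tempfile.TemporaryDirectory() as tmp:
    client = FlakyClient(failures=2)
    sidecar = upload_with_sidecar(client, "matches.parquet", b"data", pathlib.Path(tmp))
    assert client.uploads == ["matches.parquet"]
    assert json.loads(sidecar.read_text())["key"] == "matches.parquet"
```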

Phase B DoD

  • [ ] Each src/app/routers/*.py has ≥ 1 happy + 1 negative test.
  • [ ] CORS env behaviour tested (4 cases).
  • [ ] src/ui/app/api_client.py ≥ 80% covered.
  • [ ] make helm-test exists and passes.
  • [ ] (Optional) MinIO integration test exists, gated by marker.

Phase C — ML/pipeline P1 gaps (Days 5–6)

Goal: every DVC stage represented in contract tests; tuning/registration smoke-tested.

T11 — Extend contract tests to all DVC stages (~3 h)

  • File: tests/contract/test_pipeline_contracts.py
  • Add to EXPECTED_STAGES: validate_finished, validate_future, ablation_study, tune_xgb, final_train, batch_inference, export_metadata (verify exact names against current dvc.yaml).
  • For each new stage: extend EXPECTED_UPSTREAM_DEPS and STAGE_PARAMS.
  • Verification: pytest tests/contract -q green; len(EXPECTED_STAGES) matches stage count in dvc.yaml.
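
Assuming dvc.yaml is loaded (e.g. with PyYAML) into a dict keyed by stage name, the stage-count check reduces to a set comparison that names drift in both directions; the stage names below are the candidates from this task and must still be verified against the actual dvc.yaml:

```python
# Stand-in for yaml.safe_load(open("dvc.yaml"))["stages"] — keys are stage names.
parsed_stages = {
    "split_data": {}, "feature_engineering": {}, "validate_finished": {},
    "validate_future": {}, "ablation_study": {}, "tune_xgb": {},
    "final_train": {}, "batch_inference": {}, "export_metadata": {},
}

EXPECTED_STAGES = {
    "split_data", "feature_engineering", "validate_finished", "validate_future",
    "ablation_study", "tune_xgb", "final_train", "batch_inference", "export_metadata",
}

def stage_drift(actual: set, expected: set) -> dict:
    # Symmetric difference, split into actionable buckets for the failure message.
    return {
        "missing_from_tests": actual - expected,
        "missing_from_dvc": expected - actual,
    }

drift = stage_drift(set(parsed_stages), EXPECTED_STAGES)
assert drift == {"missing_from_tests": set(), "missing_from_dvc": set()}, drift
```

Splitting the diff into two buckets makes the failure mode obvious: a new stage in dvc.yaml shows up under missing_from_tests, a renamed or deleted one under missing_from_dvc.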

T12 — src/models/tuning.py smoke test (~2 h)

  • File: new tests/unit/test_tuning_smoke.py
  • Synthetic 100-row dataset, fixed seed, n_trials=1, mock MLflow (mlflow.start_run no-op).
  • Assert: study returns a best_params dict with expected keys; no exception; runtime < 5s.
  • Verification: deterministic across runs (pytest --count=3 if pytest-repeat available).
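
The determinism assertion itself can be sketched without Optuna against a stand-in tuner: same seed, same best_params. The real test would call the project's tuning entry point with n_trials=1 and MLflow mocked; the parameter names below are illustrative:

```python
import random

def run_study_stub(seed: int, n_trials: int = 1) -> dict:
    # Stand-in for the Optuna study: samples hyperparameters from a seeded RNG
    # and keeps the best-scoring trial.
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {
            "max_depth": rng.randint(3, 10),       # illustrative search space
            "learning_rate": rng.uniform(0.01, 0.3),
        }
        score = rng.random()  # placeholder objective
        if best is None or score > best[0]:
            best = (score, params)
    return best[1]

first = run_study_stub(seed=42)
second = run_study_stub(seed=42)
assert first == second                                # deterministic across runs
assert set(first) == {"max_depth", "learning_rate"}   # expected keys present
```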

T13 — register_model.py unit (~2 h)

  • File: new tests/unit/test_register_model.py
  • Mock MLflow client (MlflowClient.create_registered_model, set_registered_model_alias, get_run).
  • Cover happy path: alias champion set on new run; metrics logged.
  • Cover failure: when challenger metrics worse than champion → no alias change.
  • Verification: 4–6 tests, no real MLflow connection.
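
The champion/challenger branch can be tested entirely against a mock client. register_if_better, the model name, and the metric are assumptions for illustration; the alias call mirrors the shape of MlflowClient.set_registered_model_alias(name, alias, version):

```python
from unittest import mock

def register_if_better(client, version: str,
                       challenger_auc: float, champion_auc: float) -> bool:
    # Promote only when the challenger strictly beats the current champion.
    if challenger_auc <= champion_auc:
        return False
    client.set_registered_model_alias("ns_soccer", "champion", version)
    return True

client = mock.Mock()

# challenger worse -> no alias change
assert register_if_better(client, "2", challenger_auc=0.71, champion_auc=0.74) is False
client.set_registered_model_alias.assert_not_called()

# challenger better -> alias moved to the new version
assert register_if_better(client, "3", challenger_auc=0.78, champion_auc=0.74) is True
client.set_registered_model_alias.assert_called_once_with("ns_soccer", "champion", "3")
```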

T14 — DVC reduced-pipeline smoke (~3 h, can defer)

  • New CI job (.gitlab-ci.yml): test:dvc-smoke.
  • Use a pre-committed reduced fixture (e.g. tests/fixtures/dvc-smoke/) and run a subset: dvc repro split_data feature_engineering.
  • Failure on schema/IO regression → CI red.
  • Verification: job runs < 2 min and is green on main.

Phase C DoD

  • [ ] All DVC stages in dvc.yaml are covered by contract tests.
  • [ ] Optuna tuning has smoke test (1 trial, deterministic).
  • [ ] Model registration logic has unit tests (no real MLflow).
  • [ ] (Optional) DVC smoke job exists in CI.

Phase D — UI and Airflow (Day 7)

Goal: pages render without exceptions; DAGs validate.

T15 — Streamlit smoke tests (~2 h)

  • File: new tests/unit/test_ui_pages.py
  • Use streamlit.testing.v1.AppTest:
    from streamlit.testing.v1 import AppTest
    def test_predictions_page_renders(monkeypatch):
        # mock APIClient
        at = AppTest.from_file("src/ui/app/pages/1_Predictions.py")
        at.run(timeout=10)
        assert not at.exception
    
  • Cover: 1_Predictions.py, 2_Model_Metrics.py, disclaimer.py (smoke render).
  • Mock APIClient to avoid real HTTP.
  • Verification: 3 smoke tests green; total runtime < 10 s.

T16 — Airflow DAG validation (~2 h)

  • File: new tests/unit/test_airflow_dags.py
  • Use DagBag:
    from airflow.models import DagBag
    def test_no_import_errors():
        db = DagBag(dag_folder="airflow/dags", include_examples=False)
        assert db.import_errors == {}
    def test_critical_dags_present():
        db = DagBag(dag_folder="airflow/dags", include_examples=False)
        assert "scraper_daily" in db.dag_ids  # adjust to actual DAG IDs
    
  • Verification: pytest tests/unit/test_airflow_dags.py -q green; new DAG with import error → test fails.

Phase D DoD

  • [ ] Each src/ui/app/pages/*.py has smoke AppTest test.
  • [ ] airflow/dags/** validated via DagBag (no import errors, critical DAGs present).

Phase E — P2/P3 expansion (ongoing, opportunistic)

Not blocking v1; address as relevant code changes.

T17 — Scraper snapshot tests

  • src/app/scraper/driver.py — saved-HTML fixture parsing test.
  • src/app/validation/livescores.py — Pydantic schema tests.

T18 — src/data/source.py, src/data/storage.py units (mock S3, file IO).

T19 — src/features/select.py unit (purity + schema).

T20 — Property tests for src/data/preprocess.py (Hypothesis metadata round-trip).
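
In the real T20 test, Hypothesis generates the cases; the invariant itself (metadata survives a serialize/deserialize round trip) can be sketched with a seeded generator, using json as a stand-in for the project's actual metadata format:

```python
import json
import random

def random_metadata(rng: random.Random) -> dict:
    # Generates small metadata dicts, standing in for Hypothesis strategies.
    return {
        "columns": [f"f{i}" for i in range(rng.randint(1, 5))],
        "target": rng.choice(["home_win", "draw", "away_win"]),
        "params": {"window": rng.randint(1, 30)},
    }

def round_trip(meta: dict) -> dict:
    return json.loads(json.dumps(meta))

rng = random.Random(0)
for _ in range(50):  # mirrors the max_examples >= 50 convention below
    meta = random_metadata(rng)
    assert round_trip(meta) == meta
```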

T21 — Coverage gating

  • Add --cov-fail-under=70 to make test-coverage and CI.
  • Per-package thresholds (e.g. src/features ≥ 90%, src/app/routers ≥ 80%).

T22 — Mutation testing pilot on src/models/metrics.py (mutmut).

T23 — docs/testing.md (canonical inventory + conventions)

  • Test pyramid (actual numbers).
  • Layer ↔ test file ↔ type matrix.
  • Conventions (purity, determinism, marker discipline).
  • How to run / how to add tests.
  • Replaces ad-hoc inventories in docs/validation/*.

Phase F — CI integration (after A–C)

T24 — .gitlab-ci.yml job matrix

test:fast:      pytest -m "not slow and not integration and not load" --cov=src
test:contract:  pytest tests/contract
test:helm:      make helm-test
test:dvc-smoke: <reduced repro>      # T14
test:integration: pytest -m integration  # nightly only

T25 — Coverage badge + branch protection

  • Coverage report uploaded as CI artifact + badge in README.md.
  • Required checks: test:fast, test:contract, test:helm on every MR.

Cross-cutting conventions (codify in docs/testing.md — T23)

  • Unit: pure, no IO, < 100 ms.
  • Property: Hypothesis for invariants (no-leakage, schema, range). max_examples ≥ 50 in CI.
  • Service: mocks for external systems (Celery, MLflow, MinIO, DB).
  • Contract: doc-truth tests for dvc.yaml, params.yaml, Pydantic schemas, Helm templates.
  • Integration: @pytest.mark.integration, separate CI job, may need docker-compose.
  • Load: @pytest.mark.load, never in default suite.
  • Forbidden: live network without mocks, reading data/raw/* in unit tests, randomness without seed.
  • Naming: tests/<layer>/test_<module>_<aspect>.py; test function name = invariant (e.g. test_no_leakage_h2h).

Definition of Done — overall

  • [ ] DoD-T1 pytest tests/ green, no collection errors.
  • [ ] DoD-T2 Every src/app/routers/*.py has ≥ 1 happy + 1 negative test.
  • [ ] DoD-T3 Every stage in dvc.yaml is in EXPECTED_STAGES.
  • [ ] DoD-T4 Every src/ui/app/pages/*.py collects via AppTest without exception.
  • [ ] DoD-T5 airflow/dags/** validated via DagBag.
  • [ ] DoD-T6 helm lint + helm template green in CI.
  • [ ] DoD-T7 make test exists and is referenced in docs/quickstart.md.
  • [ ] DoD-T8 docs/testing.md exists with current inventory and conventions.
  • [ ] DoD-T9 Coverage ≥ 70% on src/ (gated in CI after Phase E).

Suggested 2-day sprint slice

If only 2 days are budgeted, do T1–T8 + T11. This delivers:
  • green CI (T1–T2),
  • correct pyproject.toml + make ergonomics (T3–T4),
  • honest doc claims (T5),
  • new API endpoints covered (T6–T7),
  • the v1 APIClient.get_model_info covered (T8),
  • complete DVC contract coverage (T11).

This closes the v1-related risks (UI ↔ API contract is actually verified) and removes the "fake green" 200-tests-passing claim from status.md.