Testing Strategy¶
This page documents the testing approach used in Time2Bet and how it is enforced via CI/CD. The goal is to ensure that changes cannot silently break:
- data pipelines
- model training
- serving contracts
- production reliability
Testing pyramid (practical)¶
We follow a pragmatic testing pyramid:
1) Unit tests (fast)
- pure feature functions (determinism, schema invariants)
- preprocessing utilities
- config parsing / validation
- small IO wrappers via mocks
2) Property-based tests (Hypothesis)
- invariants over feature builders and transforms
- robustness to missing values / edge cases
- shape/type stability
3) Contract tests
- Great Expectations suites on raw/processed/features (blocking for critical checks)
- API schema validation (Pydantic/OpenAPI)
4) Integration / smoke tests
- dvc repro on a reduced dataset (CI)
- minimal end-to-end: data → model → /predict
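As an illustration of the unit-test layer, the sketch below validates a small training config and tests both the accepted and rejected paths. `TrainConfig`, `parse_config`, and the field names are hypothetical and not taken from the Time2Bet codebase; the tests are written as plain functions that pytest would collect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical config object; field names are illustrative only."""
    learning_rate: float
    n_estimators: int


def parse_config(raw: dict) -> TrainConfig:
    """Validate a raw mapping and return a typed config, or raise ValueError."""
    missing = {"learning_rate", "n_estimators"} - raw.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    lr = float(raw["learning_rate"])
    n = int(raw["n_estimators"])
    if not 0.0 < lr <= 1.0:
        raise ValueError("learning_rate must be in (0, 1]")
    if n < 1:
        raise ValueError("n_estimators must be >= 1")
    return TrainConfig(learning_rate=lr, n_estimators=n)


def test_parse_config_accepts_valid_input():
    # String values are coerced, as they would be when read from YAML/env.
    cfg = parse_config({"learning_rate": "0.1", "n_estimators": 50})
    assert cfg == TrainConfig(learning_rate=0.1, n_estimators=50)


def test_parse_config_rejects_missing_key():
    try:
        parse_config({"learning_rate": 0.1})
    except ValueError as exc:
        assert "n_estimators" in str(exc)
    else:
        raise AssertionError("expected ValueError")
```

Tests like these touch no files or network, which is what keeps the bottom of the pyramid fast.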
What is considered a "quality gate"¶
A change is allowed to proceed to build/deploy only if all blocking gates pass.
Blocking gates¶
- lint/format checks
- unit + property-based tests
- critical Great Expectations checks (schema + critical constraints)
- pipeline smoke run (reduced data)
- API contract test(s): at least one happy path + one invalid schema case
Non-blocking (signal-only)¶
- drift warnings (Evidently)
- non-critical GE checks (distribution or advisory expectations)
- performance regression checks (initially informational)
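The split between blocking and signal-only gates can be sketched as a small decision function. This is a minimal illustration, assuming each CI step reports a name, a pass/fail flag, and whether it is blocking; `GateResult` and `evaluate_gates` are hypothetical names, not part of the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    name: str
    passed: bool
    blocking: bool


def evaluate_gates(results: list[GateResult]) -> tuple[bool, list[str], list[str]]:
    """Return (can_proceed, blocking_failures, warnings).

    Only failed blocking gates stop the change; failed non-blocking
    gates are surfaced as warnings.
    """
    failures = [r.name for r in results if r.blocking and not r.passed]
    warnings = [r.name for r in results if not r.blocking and not r.passed]
    return (not failures, failures, warnings)


results = [
    GateResult("lint/format", passed=True, blocking=True),
    GateResult("unit + property tests", passed=True, blocking=True),
    GateResult("drift warnings (Evidently)", passed=False, blocking=False),
]
can_proceed, failures, warnings = evaluate_gates(results)
print(can_proceed, failures, warnings)
```

Here the drift warning is reported but does not stop the change; flipping any blocking gate to `passed=False` would.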
CI execution model¶
Recommended CI jobs:
1) Lint & static checks¶
- ruff (lint + format)
- optional: type checking (mypy/pyright)
2) Unit + property tests¶
- pytest (unit)
- hypothesis integrated into pytest
3) Data contracts¶
- run Great Expectations checkpoints on tracked datasets
4) Pipeline smoke¶
- dvc pull (reduced dataset or a small DVC target)
- dvc repro (smoke pipeline)
5) Serving contract tests¶
- start API in test mode
- call /predict with known-good payload
- call /predict with invalid payload → expect 4xx
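The serving contract steps above can be sketched with the standard library alone. The handler below is a stand-in for the real API, and the payload fields (`home_team`, `away_team`) and the dummy response are assumptions made for illustration; in the project, the same two calls would target the actual service started in test mode.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

REQUIRED_FIELDS = {"home_team": str, "away_team": str}  # hypothetical schema
# Ignore proxy settings so requests to localhost are always sent directly.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class PredictHandler(BaseHTTPRequestHandler):
    """Stand-in for the real API: validates the payload, returns a dummy score."""

    def do_POST(self):
        if self.path != "/predict":
            self._send(404, {"detail": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
        except json.JSONDecodeError:
            self._send(422, {"detail": "body is not valid JSON"})
            return
        valid = isinstance(payload, dict) and all(
            isinstance(payload.get(k), t) for k, t in REQUIRED_FIELDS.items()
        )
        if not valid:
            self._send(422, {"detail": "schema error"})
            return
        self._send(200, {"prediction": 0.5})

    def _send(self, status, body):
        data = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):  # keep test output quiet
        pass


def post_predict(base_url, payload):
    """POST JSON to /predict and return (status_code, parsed_body)."""
    req = urllib.request.Request(
        base_url + "/predict",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with _opener.open(req, timeout=5) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        return err.code, json.loads(err.read())


def run_contract_checks():
    server = HTTPServer(("127.0.0.1", 0), PredictHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base_url = f"http://127.0.0.1:{server.server_port}"
    try:
        happy = post_predict(base_url, {"home_team": "A", "away_team": "B"})
        invalid = post_predict(base_url, {"home_team": 123})
    finally:
        server.shutdown()
        server.server_close()
    return happy, invalid
```

A contract test would then assert that `happy` has status 200 and that `invalid` has a 4xx status (422 in this sketch), which covers the minimum of one happy path and one invalid-schema case.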
Suggested test coverage map¶
| Layer | What we test | Examples |
|---|---|---|
| Data | contracts & schema | required cols exist, key uniqueness, null bounds |
| Features | invariants | no NaN introduced, stable dtype, fixed output shape |
| Training | determinism & sanity | metrics within expected bounds, no crash on smoke data |
| Registry | integration | model can be loaded by model_uri |
| Serving | API contract | 200 on happy path, 422 on schema errors |
| Ops | readiness | /healthcheck returns healthy when deps available |
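To make the Data row concrete, the sketch below expresses the three example checks (required columns, key uniqueness, null bounds) in plain Python. It is an illustration of what the contracts assert, not the Great Expectations API; `check_contract` and the column names are hypothetical.

```python
import math


def _is_null(value):
    return value is None or (isinstance(value, float) and math.isnan(value))


def check_contract(rows, required_cols, key_col, max_null_frac):
    """Return a list of human-readable contract violations (empty = pass)."""
    if not rows:
        return ["dataset is empty"]
    violations = []

    # Required columns must be present in every row.
    for col in required_cols:
        if any(col not in row for row in rows):
            violations.append(f"missing required column: {col}")

    # The key column must uniquely identify a row.
    keys = [row.get(key_col) for row in rows]
    if len(set(keys)) != len(keys):
        violations.append(f"duplicate values in key column: {key_col}")

    # The null fraction per column must stay within the allowed bound.
    for col in required_cols:
        nulls = sum(1 for row in rows if _is_null(row.get(col)))
        if nulls / len(rows) > max_null_frac:
            violations.append(f"too many nulls in column: {col}")
    return violations


rows = [
    {"match_id": 1, "odds": 1.8},
    {"match_id": 2, "odds": None},
    {"match_id": 2, "odds": 2.1},  # duplicate key
]
print(check_contract(rows, ["match_id", "odds"], "match_id", max_null_frac=0.5))
```

In this example only the key-uniqueness check fails: one null out of three `odds` values stays under the 0.5 bound.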
Examples of strong invariants (Hypothesis)¶
Feature pipeline invariants should include:
- no NaN/inf introduced unexpectedly
- output columns set is stable
- types are stable (or explicitly cast)
- aggregates do not use future leakage windows
Invariants are more valuable than “many example fixtures”. They encode correctness properties that survive refactors.
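The sketch below shows what such invariants look like as executable properties. To stay dependency-free it drives them with a seeded generator from the standard library's `random` module as a stand-in for Hypothesis strategies; `trailing_mean` is a hypothetical feature function, and its handling of missing values (replace with 0.0) is an assumption made for the example.

```python
import math
import random


def trailing_mean(values, window):
    """Hypothetical feature: mean of the current and previous window-1 values.

    Missing or non-finite inputs (None, NaN, inf) are replaced with 0.0,
    and only past/current values are used, so no future data leaks in.
    """
    cleaned = [
        float(v) if v is not None and math.isfinite(v) else 0.0 for v in values
    ]
    out = []
    for i in range(len(cleaned)):
        chunk = cleaned[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def check_invariants(values, window):
    """Properties that should hold for any input, not just chosen fixtures."""
    out = trailing_mean(values, window)
    assert len(out) == len(values)                  # fixed output shape
    assert all(isinstance(x, float) for x in out)   # stable type
    assert all(math.isfinite(x) for x in out)       # no NaN/inf introduced
    # No future leakage: output at i depends only on values[: i + 1].
    for i in range(len(values)):
        assert trailing_mean(values[: i + 1], window)[i] == out[i]


def random_values(rng, max_len=30):
    """Generate a list mixing ordinary floats with missing/non-finite values."""
    pool = [None, float("nan"), float("inf"), -float("inf")]
    return [
        rng.choice(pool) if rng.random() < 0.2 else rng.uniform(-1e6, 1e6)
        for _ in range(rng.randint(0, max_len))
    ]


def run_property_checks(n_cases=200, seed=0):
    rng = random.Random(seed)
    for _ in range(n_cases):
        check_invariants(random_values(rng), window=rng.randint(1, 5))
    return n_cases
```

With Hypothesis, `check_invariants` would be wrapped in `@given` with list and float strategies in place of `random_values`, which additionally shrinks any failing input to a minimal counterexample.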
What to read next¶
- CI/CD → Quality Gates & Release Policy
- Data → Data Contracts
- Reference → API / Pipelines / Config