
Testing Strategy

This page documents the testing approach used in Time2Bet and how it is enforced via CI/CD. The goal is to ensure that changes cannot silently break:

  • data pipelines,
  • model training,
  • serving contracts,
  • or production reliability.


Testing pyramid (practical)

We follow a pragmatic testing pyramid:

1) Unit tests (fast)

  • pure feature functions (determinism, schema invariants)
  • preprocessing utilities
  • config parsing / validation
  • small IO wrappers via mocks

2) Property-based tests (Hypothesis)

  • invariants over feature builders and transforms
  • robustness to missing values / edge cases
  • shape/type stability

3) Contract tests

  • Great Expectations suites on raw/processed/features (blocking for critical checks)
  • API schema validation (Pydantic/OpenAPI)

4) Integration / smoke tests

  • dvc repro on a reduced dataset (CI)
  • minimal end-to-end: data → model → /predict
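As an illustration of level 2, a Hypothesis property test might look like the sketch below. The transform `fill_missing` and the invariants it checks are hypothetical examples, not actual Time2Bet code:

```python
import math

from hypothesis import given, strategies as st


def fill_missing(values, default=0.0):
    """Hypothetical feature helper: replace None/NaN with a default value."""
    return [default if v is None or math.isnan(v) else v for v in values]


# Property: the transform never introduces NaN and preserves length and type,
# regardless of which combination of None/float inputs Hypothesis generates.
@given(st.lists(st.one_of(st.none(), st.floats(allow_infinity=False))))
def test_fill_missing_invariants(values):
    out = fill_missing(values)
    assert len(out) == len(values)                  # shape stability
    assert all(not math.isnan(v) for v in out)      # no NaN introduced
    assert all(isinstance(v, float) for v in out)   # type stability
```

Run under pytest, Hypothesis drives `test_fill_missing_invariants` with hundreds of generated inputs, which is exactly what makes it stronger than a handful of hand-written fixtures.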


What is considered a "quality gate"

A change is allowed to proceed to build/deploy only if all blocking gates pass.

Blocking gates

  • lint/format checks
  • unit + property-based tests
  • critical Great Expectations checks (schema + critical constraints)
  • pipeline smoke run (reduced data)
  • API contract test(s): at least one happy path + one invalid schema case

Non-blocking (signal-only)

  • drift warnings (Evidently)
  • non-critical GE checks (distribution or advisory expectations)
  • performance regression checks (initially informational)
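The blocking vs. signal-only split can be sketched as a small gate evaluator. The names (`GateResult`, `may_proceed`) are illustrative, not part of the actual CI code:

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    name: str
    passed: bool
    blocking: bool  # blocking gates stop the build; others only emit signal


def may_proceed(results: list[GateResult]) -> bool:
    """A change proceeds to build/deploy only if every blocking gate passed."""
    return all(r.passed for r in results if r.blocking)


def warnings(results: list[GateResult]) -> list[str]:
    """Non-blocking failures (e.g. drift) are surfaced but never stop the build."""
    return [r.name for r in results if not r.blocking and not r.passed]
```

The point of the split is that a drift warning produces a `warnings` entry while leaving `may_proceed` unchanged, so informational checks can be promoted to blocking later without restructuring the pipeline.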

CI execution model

Recommended CI jobs:

1) Lint & static checks

  • ruff (lint + format)
  • optional: type checking (mypy/pyright)

2) Unit + property tests

  • pytest (unit)
  • hypothesis integrated into pytest

3) Data contracts

  • run Great Expectations checkpoints on tracked datasets

4) Pipeline smoke

  • dvc pull (reduced dataset or a small DVC target)
  • dvc repro (smoke pipeline)

5) Serving contract tests

  • start API in test mode
  • call /predict with known-good payload
  • call /predict with invalid payload → expect 4xx
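The contract exercised by job 5 can be illustrated with a stdlib-only sketch of the validation logic. The payload shape (`features`) and function name are hypothetical; in practice these cases run against the real API (e.g. through a test client) rather than a plain function:

```python
def validate_predict_payload(payload):
    """Mimic the /predict contract: return (status_code, body),
    200 for a schema-valid payload, 422 for a schema violation."""
    if not isinstance(payload, dict):
        return 422, {"detail": "payload must be a JSON object"}
    features = payload.get("features")
    if not isinstance(features, list) or not all(
        isinstance(x, (int, float)) and not isinstance(x, bool) for x in features
    ):
        return 422, {"detail": "features must be a list of numbers"}
    return 200, {"prediction": 0.0}  # placeholder model output


# The blocking gate requires at least these two cases:
assert validate_predict_payload({"features": [1.0, 2.5]})[0] == 200  # happy path
assert validate_predict_payload({"features": "oops"})[0] == 422      # invalid schema
```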

Suggested test coverage map

| Layer    | What we test          | Examples                                                |
| -------- | --------------------- | ------------------------------------------------------- |
| Data     | contracts & schema    | required cols exist, key uniqueness, null bounds        |
| Features | invariants            | no NaN introduced, stable dtype, fixed output shape     |
| Training | determinism & sanity  | metrics within expected bounds, no crash on smoke data  |
| Registry | integration           | model can be loaded by model_uri                        |
| Serving  | API contract          | 200 on happy path, 422 on schema errors                 |
| Ops      | readiness             | /healthcheck returns healthy when deps available        |
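The "Data" row can be illustrated with a dependency-free sketch of critical contract checks. Column names and the null threshold are hypothetical; in the project these live in Great Expectations suites:

```python
def check_contract(rows, required_cols, key_col, max_null_frac=0.05):
    """Blocking-style checks: required columns exist, the key is unique,
    and the null fraction per required column stays within bounds."""
    errors = []
    for col in required_cols:
        if any(col not in row for row in rows):
            errors.append(f"missing column: {col}")
            continue
        null_frac = sum(row[col] is None for row in rows) / max(len(rows), 1)
        if null_frac > max_null_frac:
            errors.append(f"null bound exceeded for {col}: {null_frac:.2f}")
    keys = [row.get(key_col) for row in rows]
    if len(keys) != len(set(keys)):
        errors.append(f"duplicate keys in {key_col}")
    return errors
```

In CI, a non-empty error list from the critical suite fails the gate; distribution-level expectations would instead be reported without blocking.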

Examples of strong invariants (Hypothesis)

Feature pipeline invariants should include:

  • no NaN/inf introduced unexpectedly
  • the output column set is stable
  • types are stable (or explicitly cast)
  • aggregates do not use future leakage windows

Invariants are more valuable than “many example fixtures”. They encode correctness properties that survive refactors.
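The leakage invariant (aggregates must not look into the future) can be checked directly: perturb a future value and assert that no earlier output changes. `rolling_mean` is a hypothetical past-only aggregate used for illustration:

```python
def rolling_mean(values, window):
    """Past-only rolling mean: position i uses values[i-window+1 .. i] only."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def test_no_future_leakage():
    # Changing only the LAST observation must leave all earlier outputs intact.
    base = [1.0, 2.0, 3.0, 4.0]
    mutated = base[:-1] + [999.0]
    assert rolling_mean(base, 2)[:-1] == rolling_mean(mutated, 2)[:-1]


test_no_future_leakage()
```

This style of check survives refactors of the aggregate implementation, which is exactly the property that example fixtures lack.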


  • CI/CD → Quality Gates & Release Policy
  • Data → Data Contracts
  • Reference → API / Pipelines / Config