
Testing Strategy

This page documents the testing approach used in Time2Bet and how it is enforced via CI/CD. The goal is to ensure that changes cannot silently break:

  • data pipelines,
  • model training,
  • serving contracts,
  • or production reliability.


Testing pyramid (practical)

We follow a pragmatic testing pyramid:

1) Unit tests (fast)

  • pure feature functions (determinism, schema invariants)
  • preprocessing utilities
  • config parsing / validation
  • small IO wrappers via mocks

2) Property-based tests (Hypothesis)

  • invariants over feature builders and transforms
  • robustness to missing values / edge cases
  • shape/type stability

3) Contract tests

  • Great Expectations suites on raw/processed/features (blocking for critical checks)
  • API schema validation (Pydantic/OpenAPI)

4) Integration / smoke tests

  • dvc repro on a reduced dataset (CI)
  • minimal end-to-end: data → model → /predict
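As an illustration of level 2, a Hypothesis property test might look like the sketch below. The transform `fill_missing` and the invariants it checks are hypothetical examples, not actual Time2Bet code:

```python
import math

from hypothesis import given, strategies as st


def fill_missing(values, default=0.0):
    """Hypothetical feature helper: replace None/NaN with a default value."""
    return [default if v is None or math.isnan(v) else v for v in values]


# Property: the transform never introduces NaN and preserves length and type,
# regardless of which combination of None/float inputs Hypothesis generates.
@given(st.lists(st.one_of(st.none(), st.floats(allow_infinity=False))))
def test_fill_missing_invariants(values):
    out = fill_missing(values)
    assert len(out) == len(values)                  # shape stability
    assert all(not math.isnan(v) for v in out)      # no NaN introduced
    assert all(isinstance(v, float) for v in out)   # type stability
```

Run under pytest, Hypothesis drives `test_fill_missing_invariants` with hundreds of generated inputs, which is exactly what makes it stronger than a handful of hand-written fixtures.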


What is considered a "quality gate"

A change is allowed to proceed to build/deploy only if all blocking gates pass.

Blocking gates

  • lint/format checks
  • unit + property-based tests
  • critical Great Expectations checks (schema + critical constraints)
  • pipeline smoke run (reduced data)
  • API contract test(s): at least one happy path + one invalid schema case

Non-blocking (signal-only)

  • drift warnings (Evidently)
  • non-critical GE checks (distribution or advisory expectations)
  • performance regression checks (initially informational)
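The blocking vs. signal-only split can be sketched as a small gate evaluator. The names (`GateResult`, `may_proceed`) are illustrative, not part of the actual CI code:

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    name: str
    passed: bool
    blocking: bool  # blocking gates stop the build; others only emit signal


def may_proceed(results: list[GateResult]) -> bool:
    """A change proceeds to build/deploy only if every blocking gate passed."""
    return all(r.passed for r in results if r.blocking)


def warnings(results: list[GateResult]) -> list[str]:
    """Non-blocking failures (e.g. drift) are surfaced but never stop the build."""
    return [r.name for r in results if not r.blocking and not r.passed]
```

The point of the split is that a drift warning produces a `warnings` entry while leaving `may_proceed` unchanged, so informational checks can be promoted to blocking later without restructuring the pipeline.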

CI execution model

Recommended CI jobs:

1) Lint & static checks

  • ruff (lint + format)
  • optional: type checking (mypy/pyright)

2) Unit + property tests

  • pytest (unit)
  • hypothesis integrated into pytest

3) Data contracts

  • run Great Expectations checkpoints on tracked datasets

4) Pipeline smoke

  • dvc pull (reduced dataset or a small DVC target)
  • dvc repro (smoke pipeline)

5) Serving contract tests

  • start API in test mode
  • call /predict with known-good payload
  • call /predict with invalid payload → expect 4xx
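The contract exercised by job 5 can be illustrated with a stdlib-only sketch of the validation logic. The payload shape (`features`) and function name are hypothetical; in practice these cases run against the real API (e.g. through a test client) rather than a plain function:

```python
def validate_predict_payload(payload):
    """Mimic the /predict contract: return (status_code, body),
    200 for a schema-valid payload, 422 for a schema violation."""
    if not isinstance(payload, dict):
        return 422, {"detail": "payload must be a JSON object"}
    features = payload.get("features")
    if not isinstance(features, list) or not all(
        isinstance(x, (int, float)) and not isinstance(x, bool) for x in features
    ):
        return 422, {"detail": "features must be a list of numbers"}
    return 200, {"prediction": 0.0}  # placeholder model output


# The blocking gate requires at least these two cases:
assert validate_predict_payload({"features": [1.0, 2.5]})[0] == 200  # happy path
assert validate_predict_payload({"features": "oops"})[0] == 422      # invalid schema
```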

Suggested test coverage map

| Layer    | What we test          | Examples                                                |
| -------- | --------------------- | ------------------------------------------------------- |
| Data     | contracts & schema    | required cols exist, key uniqueness, null bounds        |
| Features | invariants            | no NaN introduced, stable dtype, fixed output shape     |
| Training | determinism & sanity  | metrics within expected bounds, no crash on smoke data  |
| Registry | integration           | model can be loaded by model_uri                        |
| Serving  | API contract          | 200 on happy path, 422 on schema errors                 |
| Ops      | readiness             | /healthcheck returns healthy when deps available        |
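The "Data" row can be illustrated with a dependency-free sketch of critical contract checks. Column names and the null threshold are hypothetical; in the project these live in Great Expectations suites:

```python
def check_contract(rows, required_cols, key_col, max_null_frac=0.05):
    """Blocking-style checks: required columns exist, the key is unique,
    and the null fraction per required column stays within bounds."""
    errors = []
    for col in required_cols:
        if any(col not in row for row in rows):
            errors.append(f"missing column: {col}")
            continue
        null_frac = sum(row[col] is None for row in rows) / max(len(rows), 1)
        if null_frac > max_null_frac:
            errors.append(f"null bound exceeded for {col}: {null_frac:.2f}")
    keys = [row.get(key_col) for row in rows]
    if len(keys) != len(set(keys)):
        errors.append(f"duplicate keys in {key_col}")
    return errors
```

In CI, a non-empty error list from the critical suite fails the gate; distribution-level expectations would instead be reported without blocking.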

Examples of strong invariants (Hypothesis)

Feature pipeline invariants should include:

  • no NaN/inf introduced unexpectedly
  • the output column set is stable
  • types are stable (or explicitly cast)
  • aggregates do not use future leakage windows

Invariants are more valuable than “many example fixtures”. They encode correctness properties that survive refactors.
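The leakage invariant (aggregates must not look into the future) can be checked directly: perturb a future value and assert that no earlier output changes. `rolling_mean` is a hypothetical past-only aggregate used for illustration:

```python
def rolling_mean(values, window):
    """Past-only rolling mean: position i uses values[i-window+1 .. i] only."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def test_no_future_leakage():
    # Changing only the LAST observation must leave all earlier outputs intact.
    base = [1.0, 2.0, 3.0, 4.0]
    mutated = base[:-1] + [999.0]
    assert rolling_mean(base, 2)[:-1] == rolling_mean(mutated, 2)[:-1]


test_no_future_leakage()
```

This style of check survives refactors of the aggregate implementation, which is exactly the property that example fixtures lack.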


  • CI/CD → Quality Gates & Release Policy
  • Data → Data Contracts
  • Reference → API / Pipelines / Config