# Pipelines Reference

This page documents how to run the core workflows of the system locally and in CI.

Time2Bet uses:

- Airflow for external ingestion/ETL
- DVC pipelines for reproducible offline ML workflows


## DVC pipeline entrypoints

### Pull versioned data

```bash
dvc pull
```

### Reproduce the full ML pipeline

```bash
dvc repro
```

### Re-run a specific stage (example)

```bash
dvc repro <stage_name>
```

### Show pipeline graph

```bash
dvc dag
```

### Show pipeline status

```bash
dvc status
```
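The targets that `dvc repro` rebuilds are declared as stages in `dvc.yaml`. As a rough sketch only (the stage names, scripts, and paths below are hypothetical, not taken from this repo), a minimal two-stage pipeline could look like:

```yaml
# Hypothetical dvc.yaml sketch -- stage names, commands, and paths
# are illustrative only.
stages:
  prepare:
    cmd: python src/prepare.py
    deps:
      - data/raw
      - src/prepare.py
    outs:
      - data/processed
  train:
    cmd: python src/train.py
    deps:
      - data/processed
      - src/train.py
    outs:
      - models/model.pkl
    metrics:
      - metrics.json:
          cache: false
```

Given such a file, `dvc repro train` re-runs the `train` stage plus any upstream stages whose dependencies changed.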

## Common workflows

### Full offline training cycle

1. `dvc pull`
2. `dvc repro`
3. Inspect MLflow runs
4. Start the API and call `/predict`
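Steps 1-2 above can be chained in a small script. This is a sketch, not project tooling: the `run` helper and `DRY_RUN` flag are inventions for illustration, and the MLflow/API steps remain manual.

```bash
#!/usr/bin/env bash
# Sketch of the full offline training cycle. DRY_RUN=1 (the default
# here) only prints the commands, so the script is safe to read and
# run outside a real DVC workspace.
set -euo pipefail
: "${DRY_RUN:=1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

run dvc pull    # 1. fetch versioned data
run dvc repro   # 2. rebuild stale pipeline stages
# 3-4 are manual: inspect the MLflow runs, then start the API and
#     exercise /predict with a sample payload.
```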

### Smoke run (CI-friendly)

- Use a reduced dataset or subset target.
- Run `dvc repro` and ensure that:
    - the pipeline completes,
    - basic metric sanity checks pass,
    - artifacts are logged to MLflow.
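The metric sanity checks can be as simple as asserting that each metric lands in a loose band. A hypothetical helper (the metric name and bounds below are made up for illustration):

```bash
#!/usr/bin/env bash
# Hypothetical smoke-run sanity check: fail if a metric falls
# outside a loose [min, max] band. Names and bounds are examples.
set -euo pipefail

check_metric() {
  local name=$1 value=$2 min=$3 max=$4
  if awk -v v="$value" -v lo="$min" -v hi="$max" \
      'BEGIN { exit !(v + 0 >= lo + 0 && v + 0 <= hi + 0) }'; then
    echo "ok: $name=$value"
  else
    echo "sanity check failed: $name=$value"
    return 1
  fi
}

# e.g. accuracy should land between chance level and 1.0
check_metric accuracy 0.55 0.34 1.0
```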

## Airflow workflows (operational)

Airflow is responsible for:

- scraping WhoScored.com,
- loading normalized data into PostgreSQL,
- exporting raw Parquet snapshots to MinIO.

Airflow jobs are not expected to be reproducible in the strict ML sense, but their outputs are materialized and versioned downstream.
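Operationally, these DAGs are usually triggered from the Airflow UI, but the CLI works too. A sketch, assuming a DAG id of `whoscored_ingest` (the real id may differ); the guard lets the helper degrade gracefully where the Airflow CLI is absent:

```bash
#!/usr/bin/env bash
# Hypothetical trigger helper; "whoscored_ingest" is an assumed
# DAG id, not taken from the repo.
set -euo pipefail

trigger_dag() {
  local dag_id=$1
  if ! command -v airflow >/dev/null 2>&1; then
    echo "airflow CLI not available; would trigger: $dag_id"
    return 0
  fi
  airflow dags trigger "$dag_id"
}

trigger_dag whoscored_ingest
```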


## Make targets (developer ergonomics)

If Make targets are provided, they should map to:

- environment setup
- requirements export
- docs build
- encryption/decryption operations

Example:

```bash
make docs-build
make export
make encrypt
```
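A sketch of what such a Makefile could look like. The recipe bodies (venv, mkdocs, uv, sops) are assumptions about tooling, not taken from the repo; only the target names echo the examples above:

```make
# Hypothetical Makefile sketch -- recipes are illustrative only.
.PHONY: setup docs-build export encrypt

setup:          ## create the dev environment
	python -m venv .venv && .venv/bin/pip install -r requirements.txt

docs-build:     ## build the documentation site
	mkdocs build

export:         ## export pinned requirements
	uv export --format requirements-txt > requirements.txt

encrypt:        ## encrypt secrets before committing
	sops --encrypt secrets.yaml > secrets.enc.yaml
```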

See also:

- Data → ETL / Raw Export / Versioning
- ML → Training Pipeline
- CI/CD → Testing Strategy