Pipelines Reference
This page documents how to run the core workflows of the system locally and in CI.
Time2Bet uses:

- Airflow for external ingestion/ETL
- DVC pipelines for reproducible offline ML workflows
DVC pipeline entrypoints
Pull versioned data
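A minimal invocation, assuming the DVC remote is already configured for the repository:

```shell
# Download DVC-tracked data and model artifacts referenced by the workspace
dvc pull
```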
Re-run a specific stage (example)
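For example, to re-run only one stage (the stage name `train` here is a placeholder; the real stage names live in `dvc.yaml`):

```shell
# Reproduce one stage plus any changed upstream dependencies
dvc repro train

# Or restrict execution to that stage alone, skipping dependencies
dvc repro --single-item train
```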
Show pipeline graph
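DVC renders the stage dependency graph defined in `dvc.yaml`:

```shell
# ASCII rendering of the pipeline DAG
dvc dag
```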
Show pipeline status
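To see which stages are out of date relative to their dependencies:

```shell
# Compare workspace state against dvc.lock; lists stages that would re-run
dvc status
```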
Common workflows
Full offline training cycle
- `dvc pull`
- `dvc repro`
- inspect MLflow runs
- start the API and call `/predict`
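End to end, the cycle above can be sketched as follows. The API launch command, port, and request payload are assumptions, not the project's actual configuration:

```shell
dvc pull                    # fetch versioned inputs and artifacts
dvc repro                   # re-run the full offline pipeline
mlflow ui &                 # inspect logged runs in the MLflow UI

# Start the API (module path is an assumption), then smoke-test /predict
uvicorn app.main:app &
curl -X POST http://localhost:8000/predict \
     -H "Content-Type: application/json" \
     -d '{"features": {}}'  # placeholder payload
```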
Smoke run (CI-friendly)
- use a reduced dataset or a subset target
- run `dvc repro` and ensure:
    - the pipeline completes
    - basic metrics sanity checks pass
    - artifacts are logged to MLflow
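One way to wire this into a CI job, assuming a reduced-data stage named `train_smoke` exists in `dvc.yaml`:

```shell
set -e                 # fail the job on the first error
dvc pull
dvc repro train_smoke  # stage name is an assumption
dvc metrics show       # sanity-check the metrics produced by the run
```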
Airflow workflows (operational)
Airflow is responsible for:
- scraping WhoScored.com,
- loading normalized data into PostgreSQL,
- exporting raw parquet snapshots to MinIO.
Airflow jobs are not expected to be reproducible in the strict ML sense, but their outputs are materialized and versioned downstream.
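Operationally these DAGs run on a schedule, but they can also be triggered by hand; the DAG id below is an assumption, not the project's actual id:

```shell
# Trigger the ingestion DAG once, outside its schedule
airflow dags trigger whoscored_ingestion

# Check its recent runs
airflow dags list-runs -d whoscored_ingestion
```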
Make targets (developer ergonomics)
If Make targets are provided, they should map to:
- environment setup
- requirements export
- docs build
- encryption/decryption operations
Example:
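A sketch of how such targets might be invoked; the target names are assumptions, so check the project's Makefile for the real ones:

```shell
make setup         # create the environment and install dependencies
make requirements  # export pinned requirements
make docs          # build the documentation site
make encrypt       # encrypt secrets before committing
make decrypt       # decrypt secrets for local use
```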
Related docs
- Data → ETL / Raw Export / Versioning
- ML → Training Pipeline
- CI/CD → Testing Strategy