
Time2Bet — End-to-End MLOps System

Time2Bet is a production-style end-to-end MLOps system for football match prediction. The project demonstrates how raw web data can be transformed into a monitored, versioned, and reproducible machine learning service running in production.

The goal of this project is not only to train a model, but to design, build, deploy, and operate a full ML system according to modern MLOps best practices.


What problem does this system solve?

Football match prediction poses several engineering challenges:

  • heterogeneous and unreliable data sources,
  • strict leakage-safe validation,
  • continuous retraining as new matches appear,
  • low-latency online inference,
  • and constant monitoring of data and model quality.
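
To illustrate the leakage-safe validation constraint, here is a minimal sketch (not the project's actual implementation) of an expanding-window split, where every training match strictly precedes every evaluation match:

```python
from datetime import date

def expanding_window_splits(matches, n_folds=3):
    """Yield (train, test) folds where every training match precedes
    every test match, so no future information leaks into training.
    `matches` must be sorted by date ascending."""
    fold_size = len(matches) // (n_folds + 1)
    for k in range(1, n_folds + 1):
        cut = fold_size * k
        yield matches[:cut], matches[cut:cut + fold_size]

# Toy fixture list: (date, home, away)
fixtures = sorted((date(2024, 8, d), "H", "A") for d in range(1, 13))

for train, test in expanding_window_splits(fixtures):
    # Invariant: the latest training match is strictly earlier
    # than the earliest test match.
    assert max(m[0] for m in train) < min(m[0] for m in test)
```

A simple random train/test split would violate this invariant, because matches from the future could inform predictions about the past.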

Time2Bet addresses these challenges by combining data engineering, ML engineering, and platform engineering into a single coherent system.


System at a glance

flowchart LR
    A[WhoScored.com] -->|Scraping| B[Airflow ETL]
    B --> C[(PostgreSQL)]
    C -->|Export| D[MinIO / S3]
    D -->|DVC| E[Versioned Datasets]
    E --> F[DVC ML Pipeline]
    F --> G[MLflow Tracking & Registry]
    G -->|Model URI| H[FastAPI Inference Service]
    H -->|Sync / Async| I[Users / UI]
    H --> J[Prometheus]
    J --> K[Grafana]
    E --> L[Evidently]
    H --> L

Note: This diagram shows the target architecture. Components with solid lines are implemented. Components connected to monitoring (Prometheus, Grafana, Evidently) are designed, but integration is in progress. See Implementation Status for the current state of each component.


Key MLOps principles demonstrated

This project explicitly focuses on the following best practices:

  • Reproducibility ✅ Every experiment can be reproduced using versioned data (DVC), versioned code (Git), and versioned models (MLflow). Status: Fully implemented and tested.

  • Clear separation of concerns ✅ Data ingestion, feature engineering, training, serving, and monitoring are isolated but connected via explicit contracts. Status: Architectural pattern implemented throughout the codebase.

  • Train / Serve parity 🚧 The same feature logic and model artifacts are designed to be used offline and online. Status: Feature code reusable; API integration in progress.

  • Production-ready serving infrastructure 🚧 FastAPI service with healthcheck endpoints, Docker images, and K8s manifests. Inference endpoints under development. Status: Infrastructure complete; model integration pending.

  • Observability architecture 📋 System designed for monitoring at service, data, and model levels using Prometheus, Grafana, and Evidently. Status: Architecture documented; integration planned.

  • Automation and safety ✅ CI/CD pipelines, quality gates, and encrypted secrets (SOPS + age) are part of the default workflow. Status: GitLab CI operational; secrets encrypted and versioned.
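
The train/serve parity principle above can be sketched as a single feature function shared by both paths. The function name and feature definition below are illustrative, not taken from the repository:

```python
def rolling_goal_diff(last_results, window=5):
    """Mean goal difference over a team's last `window` matches.
    Defined once and imported by both the offline training pipeline
    and the online inference service, so feature logic cannot drift
    between the two paths."""
    recent = last_results[-window:]
    if not recent:
        return 0.0
    return sum(scored - conceded for scored, conceded in recent) / len(recent)

# Offline: applied row-wise to a historical dataset.
# Online: applied to the payload of a prediction request.
history = [(2, 0), (1, 1), (0, 3), (2, 2), (1, 0), (3, 1)]
print(rolling_goal_diff(history))  # prints 0.0 (last 5 matches net out)
```

Keeping such functions in one importable module is what makes "the same feature logic offline and online" an enforceable contract rather than a convention.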


Demonstration

This project prioritizes local reproducibility over live deployment:

Reproducible Training Pipeline ✅

Run the full ML pipeline locally:

dvc pull    # Get versioned datasets
dvc repro   # Run full pipeline
mlflow ui   # Inspect experiments

Result: Deterministic model training with tracked experiments.
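
`dvc repro` replays the stages declared in the repository's `dvc.yaml`, rerunning only stages whose dependencies changed. A hypothetical fragment (stage names and paths are illustrative, not the actual file):

```yaml
stages:
  featurize:
    cmd: python src/featurize.py
    deps:
      - data/raw/matches.csv
      - src/featurize.py
    outs:
      - data/features/
  train:
    cmd: python src/train.py
    deps:
      - data/features/
      - src/train.py
    outs:
      - models/model.pkl
```

Because each stage lists its inputs and outputs explicitly, DVC can verify that the same data and code reproduce the same artifacts.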

Live API Demo 🚧

Inference API is under development. Infrastructure is deployment-ready:

  • Docker images built
  • Kubernetes manifests configured
  • Healthcheck endpoints operational
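
The actual service is built on FastAPI; as a dependency-free sketch of the healthcheck contract (the `/healthz` path is an assumed convention for Kubernetes probes, not confirmed from the repository), a stdlib-only equivalent looks like this:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # /healthz is an assumed probe path; the real service exposes
        # its healthcheck via FastAPI routes instead.
        if self.path == "/healthz":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), HealthHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/healthz"
with urllib.request.urlopen(url) as resp:
    print(resp.status, resp.read().decode())  # prints: 200 {"status": "ok"}

server.shutdown()
```

A liveness probe in the Kubernetes manifest would poll this endpoint and restart the pod on repeated failures.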

See Quickstart for local pipeline demo and DEMO.md for interview walkthrough.


Project scope

What is in scope:

  • End-to-end ML lifecycle (data → model → service → monitoring)
  • Reproducible pipelines and artifacts
  • Production-style deployment and observability

What is out of scope:

  • Betting advice or guaranteed prediction accuracy
  • Commercial optimization or profit claims

Where to go next