
Project Roadmap

This page describes the evolution of the Time2Bet system from a data experiment to a production-style MLOps platform.

The roadmap reflects engineering priorities, not feature promises.

Last updated: February 27, 2026


✅ Completed — Inference MVP

Goal: Operational end-to-end prediction pipeline

Completed:

- [x] POST /predict — sync inference via Celery ml queue
- [x] GET /predict/{match_id} — lookup from precomputed batch features
- [x] POST /predict/async/ — async Celery job with task_id returned
- [x] GET /predict/model/info — MLflow model metadata
- [x] MLflow model loading in PredictionService (one-time per worker process)
- [x] Pydantic request/response schemas
- [x] Prometheus metrics: 8 counters/histograms/gauges, GET /metrics endpoint
- [x] ~200 tests: unit, property (Hypothesis), service, contract, load (Locust)

Evidence:

curl http://localhost:8000/predict/matches/   # Returns upcoming matches
curl http://localhost:8000/metrics            # Prometheus exposition format
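The sync and async inference flows listed above can be sketched as a small client. The endpoint paths (POST /predict, POST /predict/async/) and the `task_id` field come from this roadmap; the payload shape, response fields, and the injectable transport are illustrative assumptions, not the project's actual schema.

```python
# Hedged sketch of the inference MVP's client-side flow. The transport
# callable stands in for a real HTTP client (requests/httpx); payload and
# response fields other than task_id are invented for illustration.
from typing import Callable, Dict, Optional

Transport = Callable[[str, str, Optional[dict]], dict]  # (method, path, body) -> JSON

def predict_sync(http: Transport, features: Dict[str, float]) -> dict:
    """Synchronous inference: one round trip to POST /predict."""
    return http("POST", "/predict", {"features": features})

def predict_async(http: Transport, features: Dict[str, float]) -> str:
    """Asynchronous inference: submit a Celery job, return its task_id."""
    resp = http("POST", "/predict/async/", {"features": features})
    return resp["task_id"]

# Fake transport so the sketch runs without a live server.
def fake_http(method: str, path: str, body: Optional[dict]) -> dict:
    if path == "/predict":
        return {"prediction": "home_win", "probability": 0.61}
    if path == "/predict/async/":
        return {"task_id": "abc-123"}
    raise ValueError(f"unexpected call: {method} {path}")

print(predict_async(fake_http, {"home_form": 0.7}))  # -> abc-123
```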


Phase 1 — Data foundation ✅ Completed

Implemented:

- Web scraping of football statistics (WhoScored)
- Airflow-based ingestion pipelines
- PostgreSQL as the canonical data store
- Raw data export to MinIO
- Dataset versioning with DVC

Outcome: A reliable, reproducible data ingestion pipeline is operational.

Evidence:

dvc status         # Shows tracked datasets
ls airflow/dags/   # ETL workflows


Phase 2 — ML experimentation ✅ Completed

Implemented:

- Feature engineering pipelines (src/features/stats_matches.py)
- Leakage-safe validation strategy (time-based splits + CV)
- Baseline and candidate models (Logistic Regression, XGBoost)
- Experiment tracking with MLflow
- Deterministic training via DVC pipelines

Outcome: Reproducible model development with full experiment tracking.

Evidence:

dvc repro      # Runs full pipeline
mlflow ui      # Shows tracked experiments
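The leakage-safe, time-based validation strategy listed above can be sketched as expanding-window folds: each fold trains only on matches strictly before its validation window, so no future information leaks into training. The fold sizing below is illustrative, not the project's actual split.

```python
# Minimal sketch of leakage-safe, time-based cross-validation splits.
# Assumes samples are already sorted chronologically; fold sizes are
# illustrative only.
from typing import List, Tuple

def time_based_folds(n: int, n_folds: int) -> List[Tuple[range, range]]:
    """Expanding-window splits over n chronologically sorted samples."""
    fold_size = n // (n_folds + 1)
    folds = []
    for k in range(1, n_folds + 1):
        train = range(0, k * fold_size)                    # all history so far
        valid = range(k * fold_size, (k + 1) * fold_size)  # the next window
        folds.append((train, valid))
    return folds

for train, valid in time_based_folds(100, 3):
    # Every validation index comes strictly after every training index.
    assert max(train) < min(valid)
    print(len(train), len(valid))
```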


Phase 3 — Production serving 🚧 In Progress (~90%)

Implemented:

- FastAPI application with routers, middleware, lifespan ✅
- Health check endpoints ✅
- Docker images for API and ML workers ✅
- Kubernetes manifests ✅
- Helm charts with configurable values ✅
- Gunicorn + Uvicorn production setup ✅
- Celery worker infrastructure ✅
- POST /predict — sync inference ✅
- GET /predict/{match_id} — batch feature lookup ✅
- POST /predict/async/ — async Celery job ✅
- GET /predict/model/info — registry metadata ✅
- Prometheus metrics exported at GET /metrics ✅
- Streamlit UI for match prediction ✅

Remaining:

- HTTP batch prediction endpoint 📋
- A/B testing infrastructure 📋


Outcome Target: Production-ready inference layer with model-backed predictions — core target achieved ✅; batch endpoint and A/B testing outstanding.

Current State:

- Infrastructure: Complete (Docker, K8s, CI/CD all working)
- Runtime Integration: Complete (sync + async inference, Prometheus metrics, Streamlit UI)


Phase 4 — Monitoring & operations 🚧 In Progress (~40%)

Implemented:

- Prometheus metrics: GET /metrics, _PrometheusMiddleware, 8 metrics ✅
- Healthcheck + liveness probes ✅
- Celery queue stats: GET /monitoring/celery/queues, /celery/workers ✅
- Task status polling: GET /monitoring/task_status/{task_id} ✅
- Incident runbooks documented ✅
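The task-status polling endpoint listed above implies a simple client loop: poll until the Celery task reaches a terminal state or a deadline passes. A hedged sketch, assuming Celery's default state names ("SUCCESS", "FAILURE") and an injectable fetch callable in place of a real HTTP client:

```python
# Hedged sketch of polling GET /monitoring/task_status/{task_id}.
# Terminal state names follow Celery's defaults; the fetch callable
# stands in for a real HTTP client.
import time
from typing import Callable

def wait_for_task(fetch: Callable[[str], dict], task_id: str,
                  timeout_s: float = 30.0, interval_s: float = 0.5) -> dict:
    """Poll task status until a terminal Celery state or timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch(f"/monitoring/task_status/{task_id}")
        if status.get("state") in ("SUCCESS", "FAILURE"):
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"task {task_id} not finished after {timeout_s}s")

# Fake fetcher: the task finishes on the third poll.
states = iter(["PENDING", "STARTED", "SUCCESS"])
result = wait_for_task(lambda path: {"state": next(states)}, "abc-123",
                       interval_s=0.0)
print(result)  # -> {'state': 'SUCCESS'}
```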

Remaining Work:

- Grafana dashboard deployment 📋
- Evidently drift detection integration 📋
- Alerting rules (Prometheus Alertmanager) 📋
- Backfill and rollback procedures 📋

Outcome Target: Full observability — metrics ✅, dashboards and drift detection pending.


Phase 5 — MLOps hardening 📋 Planned

Scope:

- Automated quality gates in CI/CD (metric thresholds block merge)
- Model promotion rules (automated Staging→Production based on eval metrics)
- Improved backfill automation
- Stress-test validation from Locust benchmarks
- Great Expectations Data Docs in CI
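The "metric thresholds block merge" gate in the scope above can be sketched as a small check that CI runs against the evaluation report. The metric names and thresholds below are invented for illustration; in practice the inputs would come from the DVC pipeline's metrics file, and a non-empty failure list would exit non-zero to block the merge.

```python
# Illustrative CI quality gate: compare evaluation metrics against minimum
# thresholds. Metric names and values are invented, not project figures.
def quality_gate(metrics: dict, thresholds: dict) -> list:
    """Return the list of failed checks; an empty list means the gate passes."""
    failures = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None or value < minimum:
            failures.append(f"{name}: {value} < required {minimum}")
    return failures

report = {"accuracy": 0.54, "log_loss_improvement": 0.03}
gate = {"accuracy": 0.50, "log_loss_improvement": 0.05}
for failure in quality_gate(report, gate):
    print("GATE FAILED:", failure)
# In CI: sys.exit(1) when quality_gate(...) is non-empty.
```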

Outcome Target: Increased reliability and automated promotion pipeline.

Timeline: After Phase 4 monitoring foundation.


Phase 6 — MLOps Maturity 📋 Planned (post-launch)

These components are standard in mature ML platforms and represent the next natural evolution of this system.

Hydra — Multi-environment configuration management

Current state: params.yaml + DVC params works well for single-environment use.

Motivation: Multiple deployment environments (dev/staging/prod) with different parameters, hyperparameter sweeps, and config composition require a proper config system.

Plan: Migrate pipeline configuration to Hydra with structured configs. DVC params: declarations remain as the reproducibility contract; Hydra generates them.

Trade-off: Adds abstraction overhead; justified only if config complexity grows.
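The composition idea behind the planned Hydra migration — a base config plus a per-environment override yielding the final parameter set that would be written out as the DVC params file — can be sketched without Hydra itself. All keys and values below are invented; Hydra/OmegaConf would perform the real merge and interpolation.

```python
# Illustrative config composition: base + environment override -> final
# params. This mimics (in miniature) what Hydra's config groups would do;
# the keys/values are invented examples, not the project's params.yaml.
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base, returning a new dict."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

base = {"train": {"model": "xgboost", "n_estimators": 300}, "seed": 42}
prod = {"train": {"n_estimators": 800}}

print(deep_merge(base, prod))
# -> {'train': {'model': 'xgboost', 'n_estimators': 800}, 'seed': 42}
```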


Evidently — Data and model drift detection

Current state: Prometheus metrics for latency/throughput are live. No feature or prediction distribution monitoring.

Motivation: Silent model degradation is a primary production risk. Without drift detection, model performance can erode undetected.

Plan:

1. Add Evidently DataDrift and TargetDrift reports on prediction logs
2. Export drift metrics to Prometheus (custom exporter)
3. Alert on drift threshold breach via Alertmanager

Reference: Evidently docs
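Before wiring in Evidently, the core idea behind the planned drift check can be sketched with the Population Stability Index (PSI): compare a feature's binned distribution in recent predictions against the training reference. The 0.2 alert threshold is a common rule of thumb, not a project value, and Evidently would replace this with richer statistical tests.

```python
# PSI sketch of distribution drift detection. Bin frequencies are invented;
# a real pipeline would bin logged predictions against training data.
import math

def psi(reference: list, current: list, eps: float = 1e-6) -> float:
    """PSI over pre-binned relative frequencies (same bin edges for both)."""
    total = 0.0
    for ref_p, cur_p in zip(reference, current):
        ref_p, cur_p = max(ref_p, eps), max(cur_p, eps)  # avoid log(0)
        total += (cur_p - ref_p) * math.log(cur_p / ref_p)
    return total

stable = psi([0.25, 0.25, 0.25, 0.25], [0.24, 0.26, 0.25, 0.25])
shifted = psi([0.25, 0.25, 0.25, 0.25], [0.05, 0.10, 0.25, 0.60])
print(f"stable={stable:.4f} shifted={shifted:.4f}")
assert stable < 0.2 < shifted  # alert only on the shifted distribution
```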


Feast — Feature Store

Current state: Features stored as versioned Parquet files via DVC. Train/serve parity enforced by shared src/features/stats_matches.py.

Motivation: As feature count grows, file-based storage creates:

- point-in-time correctness complexity,
- no feature reuse across models,
- no online low-latency feature serving.

Plan: Migrate offline features to Feast with:

- offline store: existing Parquet / PostgreSQL,
- online store: Redis for sub-millisecond inference lookups,
- feature views replacing current DVC batch_inference output.

Trade-off: Significant operational overhead; justified at >3 models or >50 features.
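The point-in-time correctness problem mentioned above is exactly what a feature store makes a primitive: when building features for a match at time t, only feature values computed at or before t may be used. A minimal sketch of that lookup over a timestamped history (all data invented):

```python
# Point-in-time feature lookup sketch: latest snapshot with timestamp <= t,
# so no future information leaks into a feature vector. Data is invented.
import bisect

# (timestamp, rolling_goals_avg) snapshots for one team, sorted by time.
history = [(1, 0.8), (5, 1.1), (9, 1.4), (14, 1.2)]
timestamps = [ts for ts, _ in history]

def feature_as_of(t: int) -> float:
    """Latest feature value with timestamp <= t (no future leakage)."""
    i = bisect.bisect_right(timestamps, t) - 1
    if i < 0:
        raise LookupError(f"no feature available before t={t}")
    return history[i][1]

print(feature_as_of(10))  # uses the t=9 snapshot -> 1.4
print(feature_as_of(5))   # exact match at t=5   -> 1.1
```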

Outcome Target: Reusable, versioned features with online/offline parity by design.


Implementation Priorities

Execution order based on value and dependencies:

  1. Phase 4 completion (Current): Grafana + Evidently + alerting
  2. Quality hardening: GE Data Docs in CI, automated promotion gates
  3. Phase 5: MLOps hardening, load test validation
  4. Phase 6: Hydra config management, Evidently drift detection, Feast feature store
  5. Advanced features: A/B testing, batch HTTP endpoint, multi-model registry

Roadmap Adjustments

What Changed vs. Original Plan

Originally marked as Pending:

- Inference layer (sync + async)
- Prometheus metrics export
- Test suite structure

Actual state (Feb 27, 2026):

- Phase 3 is ~90% complete: sync + async predict, Prometheus metrics, Streamlit UI all operational
- ~200 tests across unit/property/service/contract/load
- Great Expectations running as mandatory DVC pipeline stages
- Model registration automated via register_model DVC stage

Lesson: Incremental documentation updates lagged behind implementation. All status claims are now verified against code evidence.


Non-goals

The following are intentionally out of scope:

  • Commercial betting optimization
  • Financial profitability claims
  • User growth or monetization features
  • Real-time (sub-second) predictions
  • Multi-sport expansion

Success Criteria

Each phase is considered complete when:

  1. Code works: Feature is implemented and tested
  2. Docs updated: Reflects actual capability
  3. Reproducible: Another engineer can run it
  4. Observable: Logs, metrics, or checks exist

Guiding principle

Each roadmap step is evaluated by a single question:

Can this system be reliably reproduced, operated, and debugged by someone else?

If the answer is "no", the work is not considered complete.

