Project Roadmap¶
This page describes the evolution of the Time2Bet system from a data experiment to a production-style MLOps platform.
The roadmap reflects engineering priorities, not feature promises.
Last updated: February 27, 2026
✅ Completed — Inference MVP¶
Goal: Operational end-to-end prediction pipeline
Completed:
- [x] POST /predict — sync inference via Celery ml queue
- [x] GET /predict/{match_id} — lookup from precomputed batch features
- [x] POST /predict/async/ — async Celery job with task_id returned
- [x] GET /predict/model/info — MLflow model metadata
- [x] MLflow model loading in PredictionService (one-time per worker process)
- [x] Pydantic request/response schemas
- [x] Prometheus metrics: 8 counters/histograms/gauges, GET /metrics endpoint
- [x] ~200 tests: unit, property (Hypothesis), service, contract, load (Locust)
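The "one-time per worker process" model loading noted above can be sketched with a cached factory. This is an illustrative pattern only, assuming a hypothetical loader and model URI — the real `PredictionService` loads the registered model from the MLflow registry:

```python
from functools import lru_cache

# Hypothetical stand-in for mlflow.pyfunc.load_model; the real service
# resolves the registered model from the MLflow registry instead.
def _load_model_from_registry(model_uri: str) -> dict:
    return {"uri": model_uri, "loaded": True}

@lru_cache(maxsize=1)
def get_model(model_uri: str = "models:/example/Production") -> dict:
    # lru_cache guarantees the expensive load runs once per worker process;
    # every later call returns the cached instance.
    return _load_model_from_registry(model_uri)
```

With a preforking server such as Gunicorn, each worker resolves its own cached copy on the first request it serves.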
Evidence: curl http://localhost:8000/predict/matches/ returns upcoming matches;
curl http://localhost:8000/metrics returns Prometheus exposition format.
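The same endpoints can be exercised from Python using only the standard library. The sketch below builds a request for the async endpoint listed above; the payload field is an illustrative assumption, not the actual Pydantic schema:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"

def build_async_predict_request(match_id: int) -> urllib.request.Request:
    # Illustrative payload; consult the Pydantic request schema for real field names.
    body = json.dumps({"match_id": match_id}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/predict/async/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    # Requires the API to be running locally; the response should carry a task_id.
    with urllib.request.urlopen(build_async_predict_request(42)) as resp:
        print(json.load(resp))
```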
Phase 1 — Data foundation ✅ Completed¶
Implemented:
- Web scraping of football statistics (WhoScored)
- Airflow-based ingestion pipelines
- PostgreSQL as the canonical data store
- Raw data export to MinIO
- Dataset versioning with DVC
Outcome: A reliable, reproducible data ingestion pipeline is operational.
Phase 2 — ML experimentation ✅ Completed¶
Implemented:
- Feature engineering pipelines (src/features/stats_matches.py)
- Leakage-safe validation strategy (time-based splits + CV)
- Baseline and candidate models (Logistic Regression, XGBoost)
- Experiment tracking with MLflow
- Deterministic training via DVC pipelines
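The leakage-safe strategy above rests on splitting strictly by time. A minimal sketch, assuming rows are (match_date, features) pairs (an illustrative shape, not the project's actual schema):

```python
from datetime import date

def time_based_split(rows, cutoff):
    """Split match rows so validation matches strictly follow training ones.

    rows: iterable of (match_date, features) pairs (illustrative shape);
    cutoff: matches on or before this date train, later ones validate.
    """
    train = [r for r in rows if r[0] <= cutoff]
    valid = [r for r in rows if r[0] > cutoff]
    return train, valid

# Rolling-origin cross-validation repeats this split with an advancing cutoff,
# so every fold validates only on matches played after everything it trained on.
```

Because no validation match predates any training match, future information cannot leak into the features the model learns from.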
Outcome: Reproducible model development with full experiment tracking.
Phase 3 — Production serving 🚧 In Progress (~90%)¶
Implemented:
- FastAPI application with routers, middleware, lifespan ✅
- Health check endpoints ✅
- Docker images for API and ML workers ✅
- Kubernetes manifests ✅
- Helm charts with configurable values ✅
- Gunicorn + Uvicorn production setup ✅
- Celery worker infrastructure ✅
- POST /predict — sync inference ✅
- GET /predict/{match_id} — batch feature lookup ✅
- POST /predict/async/ — async Celery job ✅
- GET /predict/model/info — registry metadata ✅
- Prometheus metrics exported at GET /metrics ✅
- Streamlit UI for match prediction ✅
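The async flow above (submit, receive a task_id, poll) can be illustrated with an in-memory stand-in. This is not the Celery implementation — the real system enqueues onto the ml queue and a worker consumes it — just the shape of the contract:

```python
import uuid

# In-memory stand-in for the Celery result backend (illustrative only).
_RESULTS: dict[str, dict] = {}

def submit_prediction(match_id: int) -> str:
    """Mimics POST /predict/async/: register work, return a task_id immediately."""
    task_id = str(uuid.uuid4())
    _RESULTS[task_id] = {"status": "PENDING", "match_id": match_id}
    return task_id

def run_worker_step(task_id: str) -> None:
    """Mimics a Celery worker completing the task (prediction value is fake)."""
    _RESULTS[task_id].update(status="SUCCESS", prediction={"home_win": 0.48})

def get_task_status(task_id: str) -> dict:
    """Mimics GET /monitoring/task_status/{task_id}."""
    return _RESULTS.get(task_id, {"status": "UNKNOWN"})
```

The client never blocks on inference: it holds only the task_id and polls until the status flips from PENDING to SUCCESS.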
Remaining:
- HTTP batch prediction endpoint 📋
- A/B testing infrastructure 📋
Outcome Target: Production-ready inference layer with model-backed predictions (✅ achieved).
Current State:
- Infrastructure: Complete (Docker, K8s, CI/CD all working)
- Runtime Integration: Complete (sync + async inference, Prometheus metrics, Streamlit UI)
Phase 4 — Monitoring & operations 🚧 In Progress (~40%)¶
Implemented:
- Prometheus metrics: GET /metrics, _PrometheusMiddleware, 8 metrics ✅
- Healthcheck + liveness probes ✅
- Celery queue stats: GET /monitoring/celery/queues, /celery/workers ✅
- Task status polling: GET /monitoring/task_status/{task_id} ✅
- Incident runbooks documented ✅
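For orientation, the text format served at GET /metrics looks like the sketch below. This hand-rolled formatter is illustrative only — the real endpoint is generated by the prometheus_client library, and the metric name shown is a generic example, not one of the project's 8 metrics:

```python
def render_counter(name: str, help_text: str, value: float, labels: dict[str, str]) -> str:
    """Render one counter in Prometheus text exposition format.

    Illustrative only: in production this output comes from prometheus_client,
    never from manual string formatting.
    """
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} counter\n"
        f"{name}{{{label_str}}} {value}\n"
    )
```

Each metric is a HELP line, a TYPE line, and one sample line per label combination — which is exactly what Prometheus scrapes from the endpoint.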
Remaining Work:
- Grafana dashboard deployment 📋
- Evidently drift detection integration 📋
- Alerting rules (Prometheus Alertmanager) 📋
- Backfill and rollback procedures 📋
Outcome Target: Full observability — metrics ✅, dashboards and drift detection pending.
Phase 5 — MLOps hardening 📋 Planned¶
Scope:
- Automated quality gates in CI/CD (metric thresholds block merge)
- Model promotion rules (automated Staging→Production based on eval metrics)
- Improved backfill automation
- Stress-test validation from Locust benchmarks
- Great Expectations Data Docs in CI
Outcome Target: Increased reliability and automated promotion pipeline.
Timeline: After Phase 4 monitoring foundation.
Phase 6 — MLOps maturity 📋 Planned (post-launch)¶
These components are standard in mature ML platforms and represent the next natural evolution of this system.
Hydra — Multi-environment configuration management¶
Current state: params.yaml + DVC params works well for single-environment use.
Motivation: Multiple deployment environments (dev/staging/prod) with different parameters, hyperparameter sweeps, and config composition require a proper config system.
Plan: Migrate pipeline configuration to Hydra with structured configs.
DVC params declarations remain the reproducibility contract; Hydra generates them.
Trade-off: Adds abstraction overhead; justified only if config complexity grows.
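The composition Hydra would provide can be illustrated with a pure-Python stand-in. This is not Hydra itself — real Hydra composes YAML files (e.g. a base config plus per-environment overrides) via its defaults list — and the config keys shown are hypothetical:

```python
import json

# Hypothetical base config and prod override (Hydra would read these from YAML).
BASE = {"train": {"n_estimators": 300, "learning_rate": 0.1}, "db": {"host": "localhost"}}
PROD_OVERRIDE = {"db": {"host": "postgres.prod.svc"}}

def compose(base: dict, override: dict) -> dict:
    """Recursively merge an environment override onto the base config."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = compose(merged[key], value)
        else:
            merged[key] = value
    return merged

if __name__ == "__main__":
    # The composed result would be rendered into params.yaml for DVC.
    print(json.dumps(compose(BASE, PROD_OVERRIDE), indent=2))
```

The value of a real config system is exactly this merge: one base, thin per-environment overrides, and a single composed result handed to the pipeline.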
Evidently — Data and model drift detection¶
Current state: Prometheus metrics for latency/throughput are live. No feature or prediction distribution monitoring.
Motivation: Silent model degradation is a primary production risk. Without drift detection, model performance can erode undetected.
Plan:
1. Add Evidently DataDrift and TargetDrift reports on prediction logs
2. Export drift metrics to Prometheus (custom exporter)
3. Alert on drift threshold breach via Alertmanager
Reference: Evidently docs
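Step 2 above (exporting drift metrics to Prometheus) needs a scalar to publish as a gauge. A common choice is the Population Stability Index; the sketch below is an illustrative statistic, not Evidently's implementation — Evidently computes far richer reports:

```python
import math

def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index between two numeric samples.

    Illustrative drift statistic only; shows the kind of scalar a custom
    Prometheus exporter could publish for Alertmanager thresholds.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def histogram(sample):
        counts = [0] * bins
        for x in sample:
            idx = min(int((x - lo) / width), bins - 1)
            counts[max(idx, 0)] += 1
        total = len(sample)
        # Small epsilon avoids log(0) for empty bins.
        return [max(c / total, 1e-6) for c in counts]

    return sum(
        (a - e) * math.log(a / e)
        for e, a in zip(histogram(expected), histogram(actual))
    )
```

PSI is zero when the reference and current distributions match and grows as they diverge, which makes it a natural input for an alert threshold.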
Feast — Feature Store¶
Current state: Features stored as versioned Parquet files via DVC.
Train/serve parity enforced by shared src/features/stats_matches.py.
Motivation: As feature count grows, file-based storage creates:
- point-in-time correctness complexity,
- no feature reuse across models,
- no online low-latency feature serving.
Plan: Migrate offline features to Feast with:
- offline store: existing Parquet / PostgreSQL,
- online store: Redis for sub-millisecond inference lookups,
- feature views replacing current DVC batch_inference output.
Trade-off: Significant operational overhead; justified at >3 models or >50 features.
Outcome Target: Reusable, versioned features with online/offline parity by design.
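The point-in-time correctness problem named above is worth making concrete: every training row must see only feature values known before the match it describes. A minimal as-of lookup, assuming rows are (feature_timestamp, value) pairs (an illustrative shape):

```python
from datetime import datetime

def get_feature_as_of(rows, event_time):
    """Return the latest feature value known strictly before event_time.

    rows: list of (feature_timestamp, value) pairs (illustrative shape).
    This is the point-in-time join a feature store like Feast performs
    automatically; doing it by hand over Parquet files is what becomes
    error-prone as the feature count grows.
    """
    eligible = [(ts, v) for ts, v in rows if ts < event_time]
    if not eligible:
        return None
    return max(eligible)[1]
```

Getting this join wrong in even one feature silently leaks future information into training, which is exactly the class of bug a feature store is meant to rule out by design.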
Implementation Priorities¶
Execution order based on value and dependencies:
- Phase 4 completion (Current): Grafana + Evidently + alerting
- Quality hardening: GE Data Docs in CI, automated promotion gates
- Phase 5: MLOps hardening, load test validation
- Phase 6: Hydra config management, Feast feature store (Evidently drift detection lands with Phase 4)
- Advanced features: A/B testing, batch HTTP endpoint, multi-model registry
Roadmap Adjustments¶
What Changed vs. Original Plan¶
Originally marked as Pending:
- Inference layer (sync + async)
- Prometheus metrics export
- Test suite structure
Actual Reality (Feb 27, 2026):
- Phase 3 is ~90% complete: sync + async predict, Prometheus metrics, Streamlit UI all operational
- ~200 tests across unit/property/service/contract/load
- Great Expectations running as mandatory DVC pipeline stages
- Model registration automated via register_model DVC stage
Lesson: Incremental documentation updates lagged behind implementation. All status claims are now verified against code evidence.
Non-goals¶
The following are intentionally out of scope:
- Commercial betting optimization
- Financial profitability claims
- User growth or monetization features
- Real-time (sub-second) predictions
- Multi-sport expansion
Success Criteria¶
Each phase is considered complete when:
- Code works: Feature is implemented and tested
- Docs updated: Reflects actual capability
- Reproducible: Another engineer can run it
- Observable: Logs, metrics, or checks exist
Guiding principle¶
Each roadmap step is evaluated by a single question:
Can this system be reliably reproduced, operated, and debugged by someone else?
If the answer is "no", the work is not considered complete.
References¶
- Implementation Status - Detailed component matrix
- ADRs - Architectural decisions and rationale
- DEMO Guide - 5-minute walkthrough