# Model Serving Overview
Status: 🚧 In Development (Infrastructure ready, endpoint integration in progress)
The serving layer is responsible for exposing trained models as reliable, observable, and scalable services.
## Current State
### ✅ Implemented
- FastAPI application scaffold
- Health check endpoints (`/healthcheck/`)
- Docker images for API and Celery workers
- Kubernetes manifests (Deployments, Services, ConfigMaps)
- Helm charts with configurable values
- Gunicorn + Uvicorn production setup
- Celery worker infrastructure (RabbitMQ + Redis)
### 🚧 In Development
- `POST /predict` endpoint implementation
- Model loading from MLflow
- Request/response Pydantic schemas
- Input validation logic
### 📋 Planned
- Async inference via Celery
- Batch prediction API
- Model A/B testing
- Prediction caching
## Design Goals
In Time2Bet, serving is designed to:
- consume models from the MLflow Model Registry,
- provide low-latency synchronous predictions,
- support asynchronous inference for heavier workloads,
- enforce explicit API and model contracts,
- integrate tightly with monitoring and alerting.
Serving is treated as a first-class production system, not as an afterthought to model training.