Model Serving Overview

Status: 🚧 In Development (Infrastructure ready, endpoint integration in progress)

The serving layer is responsible for exposing trained models as reliable, observable, and scalable services.


Current State

✅ Implemented

  • FastAPI application scaffold
  • Health check endpoints (/healthcheck/)
  • Docker images for API and Celery workers
  • Kubernetes manifests (Deployments, Services, ConfigMaps)
  • Helm charts with configurable values
  • Gunicorn + Uvicorn production setup
  • Celery worker infrastructure (RabbitMQ + Redis)

🚧 In Development

  • POST /predict endpoint implementation
  • Model loading from MLflow
  • Request/response Pydantic schemas
  • Input validation logic
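
A hedged sketch of where these in-progress pieces are headed: Pydantic request/response schemas plus model loading from the MLflow Model Registry. The field names, model name, and stage below are placeholders, not the real contract.

```python
# Sketch of the planned POST /predict contract. All field and model names
# here are illustrative assumptions.
from pydantic import BaseModel


class PredictRequest(BaseModel):
    # Input features; the real schema and validation rules are still
    # being defined.
    features: list[float]


class PredictResponse(BaseModel):
    prediction: float
    model_version: str


def load_registry_model(name: str = "time2bet-model", stage: str = "Production"):
    # mlflow.pyfunc.load_model resolves "models:/<name>/<stage>" URIs
    # against the configured registry. Imported lazily so the schema
    # definitions above stay importable without an MLflow server.
    import mlflow.pyfunc

    return mlflow.pyfunc.load_model(f"models:/{name}/{stage}")
```

The endpoint itself would validate the request body against `PredictRequest`, call the loaded model, and return a `PredictResponse`.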

📋 Planned

  • Async inference via Celery
  • Batch prediction API
  • Model A/B testing
  • Prediction caching

Design Goals

In Time2Bet, serving is designed to:

  • consume models from the MLflow Model Registry,
  • provide low-latency synchronous predictions,
  • support asynchronous inference for heavier workloads,
  • enforce explicit API and model contracts,
  • integrate tightly with monitoring and alerting.

Serving is treated as a first-class production system, not as an afterthought to model training.