Model Serving Overview

Status: 🚧 In Development (Infrastructure ready, endpoint integration in progress)

The serving layer is responsible for exposing trained models as reliable, observable, and scalable services.


Current State

✅ Implemented

  • FastAPI application scaffold
  • Health check endpoints (/healthcheck/)
  • Docker images for API and Celery workers
  • Kubernetes manifests (Deployments, Services, ConfigMaps)
  • Helm charts with configurable values
  • Gunicorn + Uvicorn production setup
  • Celery worker infrastructure (RabbitMQ + Redis)

🚧 In Development

  • POST /predict endpoint implementation
  • Model loading from MLflow
  • Request/response Pydantic schemas
  • Input validation logic
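
A hedged sketch of where these in-progress pieces are headed: Pydantic request/response schemas plus model loading from the MLflow Model Registry. The field names, model name, and stage below are placeholders, not the real contract.

```python
# Sketch of the planned POST /predict contract. All field and model names
# here are illustrative assumptions.
from pydantic import BaseModel


class PredictRequest(BaseModel):
    # Input features; the real schema and validation rules are still
    # being defined.
    features: list[float]


class PredictResponse(BaseModel):
    prediction: float
    model_version: str


def load_registry_model(name: str = "time2bet-model", stage: str = "Production"):
    # mlflow.pyfunc.load_model resolves "models:/<name>/<stage>" URIs
    # against the configured registry. Imported lazily so the schema
    # definitions above stay importable without an MLflow server.
    import mlflow.pyfunc

    return mlflow.pyfunc.load_model(f"models:/{name}/{stage}")
```

The endpoint itself would validate the request body against `PredictRequest`, call the loaded model, and return a `PredictResponse`.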

📋 Planned

  • Async inference via Celery
  • Batch prediction API
  • Model A/B testing
  • Prediction caching

Design Goals

In Time2Bet, serving is designed to:

  • consume models from the MLflow Model Registry,
  • provide low-latency synchronous predictions,
  • support asynchronous inference for heavier workloads,
  • enforce explicit API and model contracts,
  • integrate tightly with monitoring and alerting.

Serving is treated as a first-class production system, not as an afterthought to model training.