Live Demo — Production Inference

This page demonstrates real production inference using the deployed Time2Bet system.

The demo is intended to show:

  • how the trained model is exposed as a service,
  • how predictions are generated for real matches,
  • how the serving layer behaves in production.


Demo URL

👉 http://time2bet.ru

The demo uses the same inference service that is described in the Serving and Monitoring sections of the documentation.
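As a rough illustration of how a client talks to that service, the sketch below builds a JSON POST request in Python. The `/predict` path and the payload fields are illustrative assumptions, not the documented API contract; see the Serving section for the actual schema.

```python
import json
import urllib.request

def build_prediction_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to the inference API.

    The `/predict` path and payload keys are hypothetical placeholders.
    """
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_prediction_request(
    "http://time2bet.ru",
    {"home_team": "Zenit", "away_team": "Spartak"},
)
# urllib.request.urlopen(req) would send it; omitted here to keep the sketch offline.
```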


What the demo represents

The live demo showcases only the serving and monitoring layers of the system.

Specifically, it demonstrates:

  • model loading via MLflow Model Registry,
  • online feature preparation,
  • synchronous and/or asynchronous inference,
  • request validation and schema enforcement,
  • latency-aware API behavior,
  • production monitoring and logging.
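The first bullet, loading the model from the MLflow Model Registry, can be sketched as follows. The model name, the `production` alias, and the lazy caching are assumptions for illustration; the key point is that the service resolves a stable `models:/<name>@<alias>` URI rather than a pinned version number.

```python
import functools

def model_uri(name: str, alias: str = "production") -> str:
    # Stable registry URI: resolves to whichever model version
    # currently carries the alias, so deploys need no code change.
    return f"models:/{name}@{alias}"

@functools.lru_cache(maxsize=1)
def load_model(name: str):
    # Lazy import so the URI helper stays usable without MLflow installed.
    import mlflow.pyfunc
    return mlflow.pyfunc.load_model(model_uri(name))
```

Promoting a new model version then amounts to moving the alias in the registry; the serving layer picks it up on the next load.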

⚠️ The demo does not demonstrate reproducibility. Reproducibility is covered by the Quickstart (Golden Path).


How predictions are generated

  1. A user requests a prediction for an upcoming football match.
  2. The request is validated by the FastAPI layer.
  3. The inference service:
     • loads the active model via a stable model_uri,
     • applies the same feature logic used during training,
     • produces a prediction.
  4. The result is returned to the user and logged for monitoring.
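The steps above can be condensed into a minimal, framework-free sketch. The schema fields, the feature logic, and the model interface are placeholders, not the real service code; in production the validation is enforced by FastAPI/pydantic and the model is an MLflow artifact.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class MatchRequest:
    """Hypothetical request schema; the real contract lives in the Serving docs."""
    home_team: str
    away_team: str
    kickoff_utc: str  # ISO-8601 timestamp

def validate(req: MatchRequest) -> MatchRequest:
    # Step 2: reject malformed requests before they reach the model.
    if not req.home_team or not req.away_team:
        raise ValueError("both team names are required")
    datetime.fromisoformat(req.kickoff_utc)  # raises on bad timestamps
    return req

def build_features(req: MatchRequest) -> dict:
    # Step 3b: placeholder for the feature logic shared with training.
    return {"home": req.home_team, "away": req.away_team}

def predict(req: MatchRequest, model) -> dict:
    # Steps 3a-3c: model is any callable, e.g. a loaded MLflow pyfunc.
    features = build_features(validate(req))
    return {"features": features, "home_win_proba": model(features)}
```

Keeping `build_features` identical between training and serving is what prevents training/serving skew, which is why step 3 calls it out explicitly.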

Operational characteristics

  • Inference mode: synchronous (low-latency) and asynchronous (background jobs)
  • Deployment: Kubernetes + Helm
  • Observability:
     • HTTP and service metrics via Prometheus
     • dashboards in Grafana
     • data and model drift monitoring via Evidently
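The kind of per-request data the Prometheus metrics capture can be illustrated with a small stand-in. This is not the service's actual instrumentation (which presumably uses `prometheus_client` counters and histograms); it only shows the shape of what gets recorded per endpoint and status.

```python
import time
from collections import defaultdict

class RequestMetrics:
    """Toy stand-in for Prometheus request counters and latency histograms."""

    def __init__(self):
        self.request_count = defaultdict(int)    # like http_requests_total{path,status}
        self.latency_sum = defaultdict(float)    # like ..._duration_seconds_sum{path}

    def observe(self, path: str, status: int, seconds: float) -> None:
        self.request_count[(path, status)] += 1
        self.latency_sum[path] += seconds

    def timed(self, path: str, handler):
        # Wrap a handler so every call is timed and counted, success or failure.
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = handler(*args, **kwargs)
                self.observe(path, 200, time.perf_counter() - start)
                return result
            except Exception:
                self.observe(path, 500, time.perf_counter() - start)
                raise
        return wrapper
```

Grafana dashboards then aggregate exactly these series (rates, error ratios, latency quantiles), while Evidently consumes the logged inputs and predictions for drift analysis.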

Limitations

The live demo is intentionally limited:

  • it does not guarantee prediction accuracy,
  • it does not provide betting advice,
  • it may use rate limiting or reduced capacity.

The goal is to demonstrate system design, not business performance.


  • 👉 Quickstart – Reproducible Golden Path
  • 👉 Serving / Inference API Contract
  • 👉 Monitoring / Dashboards