Live Demo — Production Inference

This page demonstrates real production inference using the deployed Time2Bet system.

The demo is intended to show:

  • how the trained model is exposed as a service,
  • how predictions are generated for real matches,
  • how the serving layer behaves in production.


Demo URL

👉 http://time2bet.ru

The demo uses the same inference service that is described in the Serving and Monitoring sections of the documentation.
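As a rough illustration of how a client talks to that service, the sketch below builds a JSON POST request in Python. The `/predict` path and the payload fields are illustrative assumptions, not the documented API contract; see the Serving section for the actual schema.

```python
import json
import urllib.request

def build_prediction_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to the inference API.

    The `/predict` path and payload keys are hypothetical placeholders.
    """
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_prediction_request(
    "http://time2bet.ru",
    {"home_team": "Zenit", "away_team": "Spartak"},
)
# urllib.request.urlopen(req) would send it; omitted here to keep the sketch offline.
```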


What the demo represents

The live demo showcases only the serving and monitoring layers of the system.

Specifically, it demonstrates:

  • model loading via MLflow Model Registry,
  • online feature preparation,
  • synchronous and/or asynchronous inference,
  • request validation and schema enforcement,
  • latency-aware API behavior,
  • production monitoring and logging.
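The first bullet, loading the model from the MLflow Model Registry, can be sketched as follows. The model name, the `production` alias, and the lazy caching are assumptions for illustration; the key point is that the service resolves a stable `models:/<name>@<alias>` URI rather than a pinned version number.

```python
import functools

def model_uri(name: str, alias: str = "production") -> str:
    # Stable registry URI: resolves to whichever model version
    # currently carries the alias, so deploys need no code change.
    return f"models:/{name}@{alias}"

@functools.lru_cache(maxsize=1)
def load_model(name: str):
    # Lazy import so the URI helper stays usable without MLflow installed.
    import mlflow.pyfunc
    return mlflow.pyfunc.load_model(model_uri(name))
```

Promoting a new model version then amounts to moving the alias in the registry; the serving layer picks it up on the next load.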

⚠️ The demo does not demonstrate reproducibility. Reproducibility is covered by the Quickstart (Golden Path).


How predictions are generated

  1. A user requests a prediction for an upcoming football match.
  2. The request is validated by the FastAPI layer.
  3. The inference service:
     • loads the active model via a stable model_uri,
     • applies the same feature logic used during training,
     • produces a prediction.
  4. The result is returned to the user and logged for monitoring.
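The steps above can be condensed into a minimal, framework-free sketch. The schema fields, the feature logic, and the model interface are placeholders, not the real service code; in production the validation is enforced by FastAPI/pydantic and the model is an MLflow artifact.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class MatchRequest:
    """Hypothetical request schema; the real contract lives in the Serving docs."""
    home_team: str
    away_team: str
    kickoff_utc: str  # ISO-8601 timestamp

def validate(req: MatchRequest) -> MatchRequest:
    # Step 2: reject malformed requests before they reach the model.
    if not req.home_team or not req.away_team:
        raise ValueError("both team names are required")
    datetime.fromisoformat(req.kickoff_utc)  # raises on bad timestamps
    return req

def build_features(req: MatchRequest) -> dict:
    # Step 3b: placeholder for the feature logic shared with training.
    return {"home": req.home_team, "away": req.away_team}

def predict(req: MatchRequest, model) -> dict:
    # Steps 3a-3c: model is any callable, e.g. a loaded MLflow pyfunc.
    features = build_features(validate(req))
    return {"features": features, "home_win_proba": model(features)}
```

Keeping `build_features` identical between training and serving is what prevents training/serving skew, which is why step 3 calls it out explicitly.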

Operational characteristics

  • Inference mode: synchronous (low-latency) and asynchronous (background jobs)
  • Deployment: Kubernetes + Helm
  • Observability:
     • HTTP and service metrics via Prometheus
     • dashboards in Grafana
     • data and model drift monitoring via Evidently
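The kind of per-request data the Prometheus metrics capture can be illustrated with a small stand-in. This is not the service's actual instrumentation (which presumably uses `prometheus_client` counters and histograms); it only shows the shape of what gets recorded per endpoint and status.

```python
import time
from collections import defaultdict

class RequestMetrics:
    """Toy stand-in for Prometheus request counters and latency histograms."""

    def __init__(self):
        self.request_count = defaultdict(int)    # like http_requests_total{path,status}
        self.latency_sum = defaultdict(float)    # like ..._duration_seconds_sum{path}

    def observe(self, path: str, status: int, seconds: float) -> None:
        self.request_count[(path, status)] += 1
        self.latency_sum[path] += seconds

    def timed(self, path: str, handler):
        # Wrap a handler so every call is timed and counted, success or failure.
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = handler(*args, **kwargs)
                self.observe(path, 200, time.perf_counter() - start)
                return result
            except Exception:
                self.observe(path, 500, time.perf_counter() - start)
                raise
        return wrapper
```

Grafana dashboards then aggregate exactly these series (rates, error ratios, latency quantiles), while Evidently consumes the logged inputs and predictions for drift analysis.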

Limitations

The live demo is intentionally limited:

  • it does not guarantee prediction accuracy,
  • it does not provide betting advice,
  • it may use rate limiting or reduced capacity.

The goal is to demonstrate system design, not business performance.


  • 👉 Quickstart – Reproducible Golden Path
  • 👉 Serving / Inference API Contract
  • 👉 Monitoring / Dashboards