Serving Deployment¶
This page covers the serving-specific deployment structure: runtime components, configuration, model loading, and current operational state.
For the full physical topology and traffic routing, see Architecture: Deployment View.
Serving components¶
All serving components run in the soccer-api Kubernetes namespace
on the single-node healserver cluster.
| Component | Role | Current state |
|---|---|---|
| FastAPI (`worker-api` pod) | HTTP inference service | 2 pods via Deployment |
| `celery-worker-ml` | Executes `predict_match` tasks from the `ml` queue | 2 pods via Deployment |
| RabbitMQ | Message broker for Celery task queues | Single broker pod |
| Redis | Prediction cache + Celery result backend | Single pod |
| Helm chart | All of the above managed via `k8s/helm/soccer-api/` | Parameterized |
| HPA | Horizontal pod autoscaler | Deployed (queue-depth signal) |
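The `ml` queue binding lives in the Celery configuration. Below is a minimal sketch, assuming a module-level Celery app and illustrative in-cluster broker/backend URLs; only the task name `predict_match` and the queue name `ml` come from the table above.

```python
from celery import Celery

# Broker and backend URLs are illustrative in-cluster addresses, not the
# project's actual values (those come from Helm values and Secrets).
celery_app = Celery(
    "soccer_api",
    broker="amqp://rabbitmq.soccer-api.svc.cluster.local:5672//",
    backend="redis://redis.soccer-api.svc.cluster.local:6379/0",
)

# Route prediction tasks onto the dedicated "ml" queue consumed by celery-worker-ml.
celery_app.conf.task_routes = {
    "predict_match": {"queue": "ml"},
}
```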
Traffic path¶
Internet
→ host-level Nginx (TLS termination, port 443)
→ K8s NodePort 31390
→ Nginx Ingress Controller (ingress-nginx namespace)
→ FastAPI service (soccer-api namespace)
→ RabbitMQ → celery-worker-ml
The Streamlit UI (external VPS time2bet.ru) also routes prediction requests
through this path over public HTTPS.
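Internally, the API hands requests off to the workers through the broker. The sketch below illustrates that handoff, assuming a `/predict` endpoint, an illustrative payload, and a synchronous wait on the Redis result backend; only the task name `predict_match` and the `ml` queue are taken from this page.

```python
from celery import Celery
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
# Illustrative broker/backend URLs; the real values come from Helm values and Secrets.
celery_app = Celery("soccer_api", broker="amqp://rabbitmq:5672//", backend="redis://redis:6379/0")


class MatchRequest(BaseModel):
    home_team: str
    away_team: str


@app.post("/predict")
def predict(request: MatchRequest) -> dict:
    # Publish the task to the "ml" queue; celery-worker-ml consumes it.
    async_result = celery_app.send_task(
        "predict_match", kwargs=request.model_dump(), queue="ml"
    )
    # Block until the worker stores the result in the Redis backend.
    return {"task_id": async_result.id, "prediction": async_result.get(timeout=30)}
```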
Configuration and secrets¶
Runtime configuration is provided via:
- Helm values (`k8s/helm/soccer-api/values.yaml`)
- Kubernetes ConfigMaps (non-sensitive settings)
- Kubernetes Secrets (credentials, decrypted from SOPS-encrypted `values-*.enc.yaml` at deploy time)
No secrets are baked into Docker images. The age private key used for SOPS decryption
is stored as a protected CI variable.
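Inside the containers these values typically surface as environment variables. The sketch below shows one way the `settings.mlflow.*` values referenced in the next section could be modeled; the use of pydantic-settings, the variable names, and the defaults are assumptions, not the project's actual configuration code.

```python
from pydantic import BaseModel, ConfigDict
from pydantic_settings import BaseSettings, SettingsConfigDict


class MlflowSettings(BaseModel):
    # protected_namespaces=() silences pydantic's warning about the "model_" prefix.
    model_config = ConfigDict(protected_namespaces=())

    tracking_uri: str = "http://mlflow:5000"      # hypothetical in-cluster URI
    model_name: str = "soccer-match-predictor"    # hypothetical registry name
    stage: str = "Production"


class Settings(BaseSettings):
    # Environment variables such as MLFLOW__MODEL_NAME (injected from a
    # ConfigMap or Secret) map onto the nested model via the "__" delimiter.
    model_config = SettingsConfigDict(env_nested_delimiter="__")

    mlflow: MlflowSettings = MlflowSettings()


settings = Settings()  # e.g. settings.mlflow.model_name, settings.mlflow.stage
```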
Model loading¶
`PredictionService` in `src/app/services/predict.py` loads the model lazily
on first inference request in each worker process:
- `model_name` and `stage` come from application settings (`settings.mlflow.*`) for the serving layer. The pipeline registers the model using values from `params.yaml → register_model.*`.
- The model is loaded once per `celery-worker-ml` process via the `worker_process_init` signal (see the sketch below).
- Subsequent tasks within the same worker process reuse the loaded model; there is no per-request reload.
- If the MLflow Registry is unreachable at load time, the worker fails to start and the pod restarts.
The model serving any request is always a registered MLflow artifact. No local file paths are used. See ML: Model Registry.
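As a rough illustration of the per-process load via `worker_process_init`, here is a minimal sketch. The actual `PredictionService` implementation may differ; the tracking URI, model name, and the `models:/<name>/<stage>` URI form are assumptions.

```python
import mlflow
from celery.signals import worker_process_init

# In the real service these come from settings.mlflow.*; hardcoded here for brevity.
MLFLOW_TRACKING_URI = "http://mlflow:5000"     # hypothetical in-cluster URI
MODEL_NAME = "soccer-match-predictor"          # hypothetical registry name
MODEL_STAGE = "Production"

_model = None  # one model instance per celery-worker-ml process


@worker_process_init.connect
def load_model(**kwargs):
    # Runs once when the worker process starts. If the MLflow Registry is
    # unreachable here, the exception propagates, the process exits, and
    # Kubernetes restarts the pod.
    global _model
    mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)
    _model = mlflow.pyfunc.load_model(f"models:/{MODEL_NAME}/{MODEL_STAGE}")


def predict_match(features: dict):
    # Subsequent tasks in the same process reuse the already-loaded model.
    return _model.predict([features])
```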
Current vs target scaling¶
| Aspect | Current state | Notes |
|---|---|---|
| API pods | 2 (Deployment) | HPA deployed; scales on request load |
| ML worker pods | 2 (Deployment) | HPA scales on queue depth |
| RabbitMQ | Single broker, no clustering | Single point of failure for inference path |
| Redis | Single pod | Cache miss on pod restart; inference continues |
| Node | Single-node K8s | Node failure = full service outage |
The single-node, single-broker constraints are documented tradeoffs for portfolio scope. See Architecture: Deployment View.
Rollback¶
Model-level rollback: update the Production alias in the MLflow Registry to the previous version.
Running workers detect the alias change on next process startup.
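With the MLflow client, that alias move could look like the following sketch; the model name, version number, and tracking URI are illustrative.

```python
from mlflow import MlflowClient

client = MlflowClient(tracking_uri="http://mlflow:5000")  # hypothetical URI
# Point the Production alias back at the previous known-good version.
client.set_registered_model_alias(
    name="soccer-match-predictor",  # hypothetical registry name
    alias="Production",
    version="12",                   # illustrative previous version
)
```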
Helm-level rollback: `helm rollback soccer-api <revision>` restores the previous
Kubernetes resource state, including image tags and config.