
Serving Deployment

This page covers the serving-specific deployment structure: runtime components, configuration, model loading, and current operational state.

For the full physical topology and traffic routing, see Architecture: Deployment View.


Serving components

All serving components run in the soccer-api Kubernetes namespace on the single-node healserver cluster.

  • FastAPI (worker-api pod): HTTP inference service. Current state: 2 pods via a Deployment.
  • celery-worker-ml: executes predict_match tasks from the ml queue. Current state: 2 pods via a Deployment.
  • RabbitMQ: message broker for the Celery task queues. Current state: single broker pod.
  • Redis: prediction cache and Celery result backend. Current state: single pod.
  • Helm chart: all of the above, managed via k8s/helm/soccer-api/. Current state: parameterized.
  • HPA: horizontal pod autoscaler. Current state: deployed, driven by a queue-depth signal.

Traffic path

Internet
  → host-level Nginx (TLS termination, port 443)
    → K8s NodePort 31390
      → Nginx Ingress Controller (ingress-nginx namespace)
        → FastAPI service (soccer-api namespace)
          → RabbitMQ → celery-worker-ml

The Streamlit UI, hosted on a separate VPS (time2bet.ru), also routes prediction requests through this path over public HTTPS.
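As an illustration of the client side of this path, the sketch below builds the HTTPS request a UI client would send to the API. The base URL and the /predict endpoint with home_team/away_team fields are assumptions for illustration, not the documented contract:

```python
import json
from urllib import request

# Placeholder host; the real API hostname is not part of this page.
API_BASE = "https://api.example.com"

def build_predict_request(home: str, away: str) -> request.Request:
    """Build the POST request a client would send along the traffic path above."""
    body = json.dumps({"home_team": home, "away_team": away}).encode()
    return request.Request(
        f"{API_BASE}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with urllib.request.urlopen (or requests/httpx) then traverses the host-level Nginx, the ingress controller, and the FastAPI service exactly as diagrammed.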


Configuration and secrets

Runtime configuration is provided via:

  • Helm values (k8s/helm/soccer-api/values.yaml)
  • Kubernetes ConfigMaps (non-sensitive settings)
  • Kubernetes Secrets (credentials, decrypted from SOPS-encrypted values-*.enc.yaml at deploy time)

No secrets are baked into Docker images. The age private key used for SOPS decryption is stored as a protected CI variable.
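A minimal sketch of how this layering looks at runtime: ConfigMap and Secret values arrive as environment variables, and the application materializes them into a settings object. The variable names and defaults here are hypothetical, chosen only to show the shape:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class MlflowSettings:
    """Serving-side MLflow settings, built from env vars injected by K8s."""
    tracking_uri: str
    model_name: str
    stage: str

def load_settings(env=os.environ) -> MlflowSettings:
    # Credentials and URIs come from ConfigMap/Secret-backed env vars,
    # never from values baked into the Docker image.
    return MlflowSettings(
        tracking_uri=env["MLFLOW_TRACKING_URI"],        # required: fail fast if absent
        model_name=env.get("MODEL_NAME", "soccer-match-predictor"),
        stage=env.get("MODEL_STAGE", "Production"),
    )
```

Passing `env` explicitly keeps the function testable without touching the real process environment.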


Model loading

PredictionService in src/app/services/predict.py loads the model once per worker process, at process initialization:

mlflow.pyfunc.load_model(f"models:/{model_name}/{stage}")
  • model_name and stage come from application settings (settings.mlflow.*) on the serving side; the training pipeline registers the model using values from params.yaml → register_model.*.
  • The model is loaded once per celery-worker-ml process, via Celery's worker_process_init signal.
  • Subsequent tasks in the same worker process reuse the loaded model; there is no per-request reload.
  • If the MLflow Registry is unreachable at load time, the worker fails to start and the pod restarts.
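The load-once-per-process pattern above can be sketched as follows. This is a hypothetical module, not the actual src/app/services/predict.py; in the real service, init_model runs inside Celery's worker_process_init handler and load_fn is the mlflow.pyfunc.load_model call shown above:

```python
_MODEL = None  # per-process singleton; each worker process gets its own copy

def init_model(load_fn):
    """Load the model exactly once per worker process, failing fast on error."""
    global _MODEL
    if _MODEL is None:
        _MODEL = load_fn()  # an exception here aborts worker startup -> pod restart
    return _MODEL

def predict(features):
    """Serve a task using the already-loaded model; no per-request reload."""
    if _MODEL is None:
        raise RuntimeError("model not initialized; worker_process_init did not run")
    return _MODEL.predict(features)
```

Because _MODEL is module-level state, it is shared across tasks within one worker process but never across processes, which matches Celery's prefork execution model.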

Every request is served by a registered MLflow model artifact; no local file paths are used. See ML: Model Registry.


Current vs target scaling

  • API pods: 2 via a Deployment. HPA deployed; scales on request load.
  • ML worker pods: 2 via a Deployment. HPA scales on queue depth.
  • RabbitMQ: single broker, no clustering. Single point of failure for the inference path.
  • Redis: single pod. Cache is lost on pod restart, but inference continues.
  • Node: single-node Kubernetes. Node failure means a full service outage.

The single-node, single-broker constraints are documented tradeoffs for portfolio scope. See Architecture: Deployment View.


Rollback

Model-level rollback: point the Production alias in the MLflow Registry at the previous version. Because the model is loaded at process initialization, running workers pick up the alias change only after their processes restart.
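The alias update can be scripted with MlflowClient.set_registered_model_alias (the standard registry API for aliases in MLflow 2.3+). The client is injected here so the sketch stays decoupled from connection config; the model name and version are illustrative:

```python
def rollback_production(client, model_name: str, target_version: int) -> None:
    """Point the Production alias at a previous registered model version.

    `client` is expected to be an mlflow.tracking.MlflowClient (or any object
    exposing the same set_registered_model_alias signature).
    """
    # Aliases are mutable pointers: reassigning one is an instant, atomic
    # registry-side rollback with no artifact copying.
    client.set_registered_model_alias(model_name, "Production", target_version)
```

After running this, restart the celery-worker-ml pods so each worker process reloads the now-aliased version at init.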

Helm-level rollback: helm rollback soccer-api <revision> — restores the previous Kubernetes resource state including image tags and config.