Deployment & Runtime Architecture

Platform

Serving components are deployed on Kubernetes using Helm for configuration and templating.


Deployed components

  • FastAPI inference service,
  • Celery worker deployment,
  • RabbitMQ message broker,
  • Redis cache (optional),
  • Prometheus scraping targets.

Configuration management

Runtime configuration is provided via:

  • environment variables,
  • Helm values,
  • Kubernetes secrets (decrypted at deploy time).

No secrets are baked into images.
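The configuration sources above can be sketched as a small resolution step at process start. The variable names (`MODEL_URI`, `REDIS_URL`) are illustrative assumptions, not the project's actual settings; Helm renders them into the pod spec, and secret-backed values arrive the same way after decryption at deploy time.

```python
import os

def load_config(env=os.environ):
    """Hypothetical sketch: resolve runtime configuration from the environment."""
    return {
        # Required value: fail fast if the deployment forgot to set it.
        "model_uri": env["MODEL_URI"],
        # The Redis cache is an optional component, so default to None.
        "redis_url": env.get("REDIS_URL"),
    }
```

Requiring mandatory keys with `env[...]` (rather than `.get`) makes a misconfigured pod crash at startup instead of serving with a missing dependency.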


Model loading strategy

  • models are loaded from MLflow via model_uri,
  • startup fails fast if the model is unavailable,
  • model version is logged on startup.

This ensures:

  • explicit dependency on registry availability,
  • clear observability of the active model version.
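The fail-fast startup behaviour can be sketched as below. In the real service the loader would be MLflow's model-loading call; here it is injected so the logic stands alone, and the `version` attribute on the returned model is an assumption for illustration.

```python
import logging
import sys

logger = logging.getLogger("inference")

def load_model_or_exit(model_uri, loader):
    """Sketch: load the model at startup, exiting the process on failure."""
    try:
        model = loader(model_uri)
    except Exception as exc:
        # Fail fast: an unavailable registry is a hard startup error.
        logger.error("model load failed for %s: %s", model_uri, exc)
        sys.exit(1)
    # Log the active version so the running model is observable.
    version = getattr(model, "version", "unknown")
    logger.info("loaded model %s (version=%s)", model_uri, version)
    return model
```

Exiting instead of serving without a model keeps the dependency on the registry explicit: Kubernetes restarts the pod, and a persistent outage surfaces as a crash loop rather than silent degraded behaviour.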


Scaling strategy

  • API scaled horizontally based on request load,
  • workers scaled based on queue depth,
  • scaling policies are independent per component.
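A queue-depth policy for the worker tier can be sketched as a simple target calculation; the thresholds and bounds here are assumptions, not the deployed policy, and the API tier would scale on request load through its own independent rule.

```python
import math

def desired_workers(queue_depth, tasks_per_worker=50,
                    min_replicas=1, max_replicas=20):
    """Sketch: worker replica target from current queue depth."""
    if queue_depth <= 0:
        return min_replicas
    wanted = math.ceil(queue_depth / tasks_per_worker)
    # Clamp to configured bounds so a backlog spike cannot scale unboundedly.
    return max(min_replicas, min(max_replicas, wanted))
```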

Failure handling

  • readiness probes block traffic to unhealthy pods,
  • crash loops surface immediately via alerts,
  • rollback is performed by switching the model version or Helm release.
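The readiness behaviour above can be sketched framework-agnostically; in the FastAPI service this would back a readiness endpoint wired to the pod's `readinessProbe`, and the `model_loaded` / `broker_reachable` flags are assumed pieces of application state.

```python
def readiness(model_loaded, broker_reachable):
    """Sketch: readiness check result as (status code, body)."""
    # Kubernetes only routes traffic to pods returning 200 here; anything
    # else removes the pod from the Service endpoints, which is what lets
    # a rollback (model version or Helm release) cut over safely.
    if model_loaded and broker_reachable:
        return 200, "ready"
    return 503, "not ready"
```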