# Deployment View

This page describes where the system runs, how components are physically distributed, and how traffic flows from the internet to individual services.


## Physical Topology

```mermaid
flowchart TB
  subgraph Internet[Public Internet]
    User[End User]
  end
  subgraph ExtVPS[External VPS — time2bet.ru]
    StreamlitUI[Streamlit Web UI]
  end
  subgraph ExtSelenoid[External Host — Selenoid]
    SelenoidGrid[Selenoid Browser Grid]
  end
  subgraph HealServer[VPS — healserver — single node]
    HostNginx[Host-level Nginx\nTLS termination — port 443]
    subgraph K8s[Kubernetes — single-node cluster]
      subgraph NS_Ingress[namespace: ingress-nginx]
        Ingress[Nginx Ingress Controller\nNodePort 31390]
      end
      subgraph NS_DS[namespace: ds]
        Airflow[Airflow\nScheduler + Workers]
        PG[PostgreSQL]
        MinIO[MinIO S3]
        MLflow[MLflow\nTracking + Registry]
        Prom[Prometheus]
        Graf[Grafana\n📋 dashboards planned]
      end
      subgraph NS_Soccer[namespace: soccer-api]
        API[FastAPI\nInference Service]
        MQ[RabbitMQ]
        WorkerAPI[Celery worker-api]
        WorkerML[Celery worker-ml]
        Redis[Redis Cache]
      end
      subgraph NS_Mon[namespace: monitoring]
        KSM[kube-state-metrics]
        NE[node-exporter]
      end
    end
  end
  User -->|HTTPS| ExtVPS
  ExtVPS -->|HTTPS /predict| HostNginx
  User -->|HTTPS /predict direct| HostNginx
  HostNginx -->|NodePort 31390| Ingress
  Ingress -->|/predict, /healthcheck, /metrics| API
  API -->|enqueue task| MQ
  MQ --> WorkerAPI
  MQ --> WorkerML
  WorkerAPI -->|browser session| SelenoidGrid
  WorkerAPI --> PG
  WorkerAPI --> Redis
  WorkerML --> Redis
  PG --> MinIO
  MinIO -.->|dvc pull| MLpipeline[Offline ML Pipeline\nCI / local]
  MLpipeline --> MLflow
  API -->|model_uri| MLflow
  KSM --> Prom
  NE --> Prom
  API --> Prom
  WorkerAPI --> Prom
  WorkerML --> Prom
  Prom --> Graf
```

## Namespace Layout

| Namespace | Services | Purpose |
|---|---|---|
| `ingress-nginx` | Nginx Ingress Controller | Routes inbound traffic to cluster services by hostname/path |
| `ds` | Airflow, PostgreSQL, MinIO, MLflow, Prometheus, Grafana | Data platform and ML infrastructure |
| `soccer-api` | FastAPI, RabbitMQ, Celery worker-api, Celery worker-ml, Redis | Inference service and async task infrastructure |
| `monitoring` | kube-state-metrics, node-exporter | K8s cluster and host-level metrics |
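
Expressed as manifests, this layout is just four plain Namespace objects. A minimal sketch for two of them (the `part-of` label is an illustrative addition, not taken from the charts):

```yaml
# Two of the four namespaces from the table above;
# the label is an illustrative addition.
apiVersion: v1
kind: Namespace
metadata:
  name: soccer-api
  labels:
    app.kubernetes.io/part-of: soccer-platform
---
apiVersion: v1
kind: Namespace
metadata:
  name: ds
```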

## Ingress Path

Traffic from the public internet follows this path:

```text
Internet
  → host-level Nginx (port 443, TLS termination, VPS)
    → K8s NodePort 31390
      → Nginx Ingress Controller (namespace: ingress-nginx)
        → FastAPI service (namespace: soccer-api)
```

Key notes:

- TLS is terminated at the host-level Nginx, which acts as a reverse proxy to the K8s NodePort.
- The Ingress Controller routes requests to services by hostname and path prefix.
- No service in the `ds` or `monitoring` namespaces is publicly exposed; they are reachable only from inside the cluster.
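
For illustration, the path-based routing at the Ingress layer could be expressed roughly as follows. This is a minimal sketch: the hostname, service name, and port are assumptions, not values from the actual charts.

```yaml
# Hypothetical Ingress for the inference service; the hostname,
# service name, and port number are illustrative assumptions.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: soccer-api
  namespace: soccer-api
spec:
  ingressClassName: nginx
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /predict
            pathType: Prefix
            backend:
              service:
                name: soccer-api
                port:
                  number: 8000
          - path: /healthcheck
            pathType: Prefix
            backend:
              service:
                name: soccer-api
                port:
                  number: 8000
```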


## External Services

| Service | Host | Role | K8s integration |
|---|---|---|---|
| Selenoid Browser Grid | Dedicated external host | Headless Chrome sessions for WhoScored scraping | Called by `celery-worker-api` over HTTP; not inside the K8s cluster |
| Streamlit Web UI | External VPS (time2bet.ru) | User-facing prediction interface | Calls FastAPI over public HTTPS; no direct cluster access |
| GitLab CI/CD | GitLab.com SaaS | Build, test, and deploy pipeline | Pushes Helm charts and secrets to healserver via SSH |
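
The deploy stage of the GitLab pipeline might look roughly like the job below, following the SSH-based flow in the table. This is a sketch only: the SSH user, remote paths, and variable names are assumptions.

```yaml
# Hypothetical GitLab CI deploy job; user, host paths, and
# variable names are illustrative, not taken from the repo.
deploy:
  stage: deploy
  script:
    # Ship charts and encrypted values files to the VPS over SSH
    - rsync -az k8s/helm/ deploy@healserver:/opt/helm/
    # Decrypt values on the host and upgrade the release;
    # the age private key comes from a protected CI variable
    - ssh deploy@healserver "SOPS_AGE_KEY='$AGE_PRIVATE_KEY' sops -d /opt/helm/soccer-api/values-prod.enc.yaml > /tmp/values.yaml && helm upgrade --install soccer-api /opt/helm/soccer-api -n soccer-api -f /tmp/values.yaml"
  only:
    - main
```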

## Helm Chart Structure

All Kubernetes resources are managed via Helm charts in `k8s/helm/`.

```text
k8s/helm/
  soccer-api/        — FastAPI + Celery + RabbitMQ + Redis
  airflow/           — Airflow deployment (custom values)
  monitoring/        — Prometheus + Grafana + exporters
  ...
```

Secrets are provided as SOPS-encrypted Helm values files (`values-*.enc.yaml`). CI decrypts them at deploy time using the age private key from a protected CI variable.
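
A matching `.sops.yaml` creation rule could look like this sketch; the age recipient shown is a placeholder, not the real public key:

```yaml
# Hypothetical .sops.yaml; the age recipient below is a
# placeholder, not a real key.
creation_rules:
  - path_regex: k8s/helm/.*/values-.*\.enc\.yaml$
    age: age1qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
```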


## Deployment Constraints

These constraints are architectural facts, not temporary limitations:

| Constraint | Architectural consequence |
|---|---|
| Single-node Kubernetes | No pod rescheduling across nodes; node failure is a full-service outage. Designed for portfolio/demo scale. |
| No High Availability | No replicated control plane; no multi-node worker pool. Accepted tradeoff against infrastructure cost. |
| Self-hosted VPS | Full operational responsibility: K8s upgrades, disk management, TLS renewal, backup. |
| External Selenoid host | Browser automation is outside the cluster network boundary; an independent failure domain not covered by K8s health probes. |
| Single RabbitMQ broker | Message queue is a single point of failure for the inference path. Acceptable at current throughput. |

These constraints are documented explicitly because they affect reasoning about failure modes, scaling, and future migration.


## Known Limitations

| Limitation | Impact | Mitigation |
|---|---|---|
| Single-node K8s cluster | No HA; node failure = full outage | Manual recovery via runbook; acceptable for portfolio scope |
| No cluster autoscaling | Cannot scale under load | Workload is light; manual scaling if needed |
| Selenoid runs outside K8s | Separate ops boundary; no K8s health probes | Monitored externally; scraping failures surface via Airflow |
| Single RabbitMQ broker | No message-queue HA | Acceptable at current throughput; documented as a known limit |
| No automated certificate renewal (if Let's Encrypt is not configured) | TLS certificate expiry | Operator runbook; or Let's Encrypt with certbot/cert-manager |
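
If the cert-manager route from the last row were taken, the issuer is a single small manifest along these lines (a sketch; the issuer name and email are placeholders). Note that since TLS terminates at the host-level Nginx rather than inside the cluster, certbot on the host is the more natural fit for this topology.

```yaml
# Hypothetical cert-manager ClusterIssuer for the Let's Encrypt
# mitigation; the name and email address are placeholders.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx
```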

## Portability Note

The Helm charts are parameterized with no hardcoded values specific to healserver. Migration to a managed Kubernetes cluster (GKE, EKS, AKS) requires:

  1. Update DNS and TLS entries in chart values.
  2. Replace MinIO with cloud object storage (update DVC remote config).
  3. Replace self-managed PostgreSQL with a managed instance if desired.
  4. Re-encrypt SOPS secrets with an updated age key.

No code changes are required.
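
In practice, steps 1–3 would reduce to a per-environment values override, roughly like the sketch below. The key names are illustrative, not the actual chart schema:

```yaml
# Hypothetical values-gke.yaml override; key names are
# illustrative, not the actual chart schema.
ingress:
  host: api.new-domain.example
  tls:
    secretName: api-tls
s3:
  endpoint: https://storage.googleapis.com   # replaces in-cluster MinIO
  bucket: soccer-artifacts
postgres:
  host: 10.0.0.5                             # managed instance, e.g. Cloud SQL
  existingSecret: pg-credentials
```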