
Container View (C4 — Level 2)

This view shows all deployable containers and services, their responsibilities, and how they communicate. For physical placement, see the Deployment View; for runtime behavior, see the Runtime View.


Implementation Status Summary

| Container | Status |
| --- | --- |
| Nginx Ingress Controller | ✅ Implemented |
| Airflow | ✅ Implemented |
| PostgreSQL | ✅ Implemented |
| MinIO S3 | ✅ Implemented |
| MLflow Tracking + Registry | ✅ Implemented |
| Prometheus | ✅ Implemented |
| Grafana | 🚧 Deployed; dashboards not yet defined |
| FastAPI Inference Service | ✅ Implemented |
| RabbitMQ | ✅ Implemented |
| Celery worker-api | ✅ Implemented |
| Celery worker-ml | ✅ Implemented |
| Redis (prediction + feature cache) | ✅ Implemented |
| kube-state-metrics | ✅ Implemented |
| node-exporter | ✅ Implemented |

Container Diagram

```mermaid
flowchart LR
    UI["Time2Bet Web UI\n(Streamlit — External VPS)"]
    Selenoid["Selenoid Server\n(External host — scraping)"]
    Source[WhoScored.com]

    subgraph K8s["Kubernetes — single-node — healserver"]
        subgraph ns_ingress["namespace: ingress-nginx"]
            Ingress[Nginx Ingress Controller]
        end
        subgraph ns_ds["namespace: ds"]
            Airflow["Airflow Scheduler / Workers"]
            DB[PostgreSQL]
            S3["MinIO / S3"]
            MLflow[MLflow Tracking and Registry]
            Prom[Prometheus]
            Graf["Grafana 🚧"]
        end
        subgraph ns_soccer["namespace: soccer-api"]
            API[FastAPI Inference Service]
            MQ[RabbitMQ]
            WorkerAPI["Celery worker-api\n(short tasks, scraping, cache ops)"]
            WorkerML["Celery worker-ml\n(heavy inference, batch features)"]
            Cache["Redis\n(prediction + feature cache)"]
        end
        subgraph ns_monitoring["namespace: monitoring"]
            KSM[kube-state-metrics]
            NE[node-exporter]
        end
    end

    subgraph ML["ML Engineering — offline / CI"]
        DVC[DVC Pipelines]
    end

    UI -->|HTTPS| Ingress
    Ingress --> API
    Airflow -->|HTTP trigger| API
    API -->|Enqueue task| MQ
    MQ --> WorkerAPI
    WorkerAPI -->|Browser automation| Selenoid
    Source -->|scraped via browser| Selenoid
    WorkerAPI --> DB
    DB --> S3
    S3 --> DVC
    DVC --> MLflow
    API -->|Load model_uri| MLflow
    API --> Cache
    API -->|Enqueue ML task| MQ
    MQ --> WorkerML
    WorkerML --> Cache
    API --> Prom
    WorkerAPI --> Prom
    WorkerML --> Prom
    KSM --> Prom
    NE --> Prom
    Prom --> Graf
```

Container Responsibilities

Containers are grouped by architectural role:

- Ingress routing — inbound traffic management
- Data platform — storage, ETL, ML infrastructure (namespace: ds)
- Inference service — serving, async tasks, caching (namespace: soccer-api)
- Observability — cluster and host metrics (namespace: monitoring)
- External — Selenoid, Streamlit UI (outside the K8s cluster)
- Offline — DVC pipeline (local / CI execution)


Nginx Ingress Controller — namespace: ingress-nginx — ✅ Implemented

  • Handles all inbound traffic forwarded from the host-level Nginx reverse proxy (NodePort 31390).
  • Routes requests to services within the cluster by hostname and path prefix.

Airflow — namespace: ds — ✅ Implemented

  • Schedules and triggers scraping workflows by calling the FastAPI endpoint on a configurable schedule (see the DAG sketch after this list).
  • Orchestrates ETL steps: PostgreSQL export, MinIO upload.
  • DAG-level failure visibility via Airflow UI.
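
A minimal sketch of such a trigger DAG, assuming a hypothetical /scrape/trigger endpoint and in-cluster service name; the real DAG id, schedule, and endpoint path may differ:

```python
# Hypothetical sketch: DAG id, schedule, endpoint path, and service DNS name
# are assumptions, not the production values.
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator


def trigger_scrape() -> None:
    # Call the FastAPI service by its (assumed) in-cluster DNS name.
    resp = requests.post(
        "http://soccer-api.soccer-api.svc:8000/scrape/trigger", timeout=30
    )
    resp.raise_for_status()


with DAG(
    dag_id="scrape_trigger",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # the configurable schedule (Airflow >= 2.4 syntax)
    catchup=False,
):
    PythonOperator(task_id="trigger_scrape", python_callable=trigger_scrape)
```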

PostgreSQL — namespace: ds — ✅ Implemented

  • Authoritative store for scraped and normalized match data.
  • Source of all raw parquet exports to MinIO; the export step is sketched below.
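
A sketch of what the export step looks like, assuming hypothetical table, bucket, and credential values; pandas writes parquet straight to MinIO via s3fs storage options:

```python
# Sketch only: table, bucket, credentials, and endpoints are assumptions.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:pass@postgres.ds.svc:5432/matches")

# Read normalized match data and write a raw parquet export to MinIO
# (s3fs picks up the custom endpoint via storage_options).
df = pd.read_sql("SELECT * FROM match_stats", engine)
df.to_parquet(
    "s3://raw-exports/match_stats.parquet",
    storage_options={
        "key": "minio-access-key",
        "secret": "minio-secret-key",
        "client_kwargs": {"endpoint_url": "http://minio.ds.svc:9000"},
    },
)
```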

MinIO S3 — namespace: ds — ✅ Implemented

  • Stores raw parquet exports and ML artifacts (model files, plots, reports).
  • S3-compatible API used as the DVC remote storage backend.

MLflow Tracking and Registry — namespace: ds — ✅ Implemented

  • Tracks experiments: parameters, metrics, and artifacts per run.
  • Manages model versions and the promotion workflow via champion / challenger aliases.
  • FastAPI resolves model_uri from the registry on the first inference request per worker (see the sketch below).
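
A sketch of the alias-based lazy load, assuming a hypothetical registered model name:

```python
# Sketch: the registered model name is an assumption.
import mlflow.pyfunc

# "models:/<name>@<alias>" resolves whichever version currently carries the
# "champion" alias in the MLflow Model Registry; promotion just moves the alias.
MODEL_URI = "models:/match_outcome@champion"

_model = None  # lazy singleton per worker process


def get_model():
    global _model
    if _model is None:  # resolved on the first inference request
        _model = mlflow.pyfunc.load_model(MODEL_URI)
    return _model
```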

Prometheus — namespace: ds — ✅ Implemented

  • Scrapes metrics from FastAPI, Celery workers, RabbitMQ, kube-state-metrics, and node-exporter.
  • Metrics cover request rate, latency histograms, error rate, active tasks, and queue depth (illustrative definitions after this list).
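
A sketch of how such metrics are typically defined with prometheus_client; the metric names and labels here are illustrative, not the service's actual ones:

```python
# Illustrative metric definitions: names and labels are assumptions.
from prometheus_client import Counter, Gauge, Histogram

REQUESTS = Counter(
    "http_requests_total",
    "Request count; rate() over this gives request rate and error rate",
    ["method", "path", "status"],
)
LATENCY = Histogram(
    "http_request_duration_seconds",
    "Per-request latency histogram",
    ["method", "path"],
)
ACTIVE_TASKS = Gauge("celery_active_tasks", "Tasks currently executing", ["queue"])
QUEUE_DEPTH = Gauge("celery_queue_depth", "Messages waiting in the broker", ["queue"])
```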

Grafana — namespace: ds — 🚧 Deployed; dashboards not yet defined

  • Pod is running; no dashboard definitions are provisioned yet.
  • See Roadmap for near-term plan.

kube-state-metrics + node-exporter — namespace: monitoring — ✅ Implemented

  • kube-state-metrics: pod status, resource requests/limits, deployment health.
  • node-exporter: host-level CPU, memory, disk, network metrics.

FastAPI Inference Service — namespace: soccer-api — ✅ Implemented

  • Exposes prediction, health, monitoring, livescores, sources, and stats endpoints.
  • Loads model via MLflow model_uri (lazy-loaded once per worker process, resolved via champion alias).
  • Dispatches inference tasks to Celery ml queue; reads from Redis cache before dispatching.
  • Prometheus middleware instruments every request for observability.
  • Sync inference: POST /predict — dispatches to Celery ml queue; blocks until result or timeout.
  • Async inference: POST /predict/async/ — returns task_id; client polls GET /monitoring/task_status/{task_id}.
  • Pre-computed lookup: GET /predict/{match_id} — reads from the batch-inference parquet; no model call. All three request paths are sketched below.
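
A client-side sketch of the three request paths; the base URL, payload shape, and response field names (task_id, state) are assumptions:

```python
# Client-side sketch: base URL, payload shape, and response fields are assumptions.
import time

import requests

BASE = "https://api.example.com"
payload = {"home_team": "A", "away_team": "B"}  # illustrative input

# 1) Sync: blocks until the Celery ml worker returns (or the service times out).
pred = requests.post(f"{BASE}/predict", json=payload, timeout=60).json()

# 2) Async: enqueue, then poll task status by task_id.
task_id = requests.post(f"{BASE}/predict/async/", json=payload).json()["task_id"]
while True:
    status = requests.get(f"{BASE}/monitoring/task_status/{task_id}").json()
    if status.get("state") in ("SUCCESS", "FAILURE"):
        break
    time.sleep(1)

# 3) Pre-computed: direct lookup from the batch-inference parquet, no model call.
pred = requests.get(f"{BASE}/predict/42").json()  # 42 is an illustrative match_id
```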

RabbitMQ — namespace: soccer-api — ✅ Implemented

  • Message broker for all Celery tasks.
  • Two logical queues: api (short tasks) and ml (inference tasks); the routing is sketched below.
  • Single broker; no clustering today — acceptable at current throughput.
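
A sketch of the two-queue split in Celery configuration; the broker URL and task module paths are assumptions:

```python
# Sketch: broker URL and module paths are assumptions;
# the api/ml queue split is the real layout.
from celery import Celery

app = Celery("soccer", broker="amqp://guest@rabbitmq.soccer-api.svc//")

# Route heavy inference tasks to the ml queue and short tasks to api,
# so worker-api and worker-ml each consume only their own queue.
app.conf.task_routes = {
    "tasks.ml.*": {"queue": "ml"},
    "tasks.api.*": {"queue": "api"},
}

# worker-api:  celery -A soccer worker -Q api
# worker-ml:   celery -A soccer worker -Q ml
```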

Celery worker-api — namespace: soccer-api — ✅ Implemented

  • Handles short-latency tasks: scraping orchestration (Selenoid calls), cache operations, request pre-processing.
  • Horizontally scalable.

Celery worker-ml — namespace: soccer-api — ✅ Implemented

  • Handles compute-intensive tasks: online inference, batch feature assembly, scoring pipelines.
  • Runs with higher resource limits than worker-api.
  • Feature assembly reuses src/features/ — the same code path as the offline DVC pipeline (shared entry point sketched below).
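
A sketch of that shared code path, where build_features stands in for a hypothetical entry point in src/features/:

```python
# Sketch: build_features is a hypothetical name for the shared entry point.
from celery import shared_task

from src.features import build_features  # same module the DVC pipeline imports


@shared_task
def assemble_features(match_id: int) -> dict:
    # Online path runs the identical transformation code as the offline DVC
    # stage, so training and serving features cannot drift apart.
    return build_features(match_id)
```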

Redis — namespace: soccer-api — ✅ Implemented

Architectural role: Caching optimization layer. Redis reduces redundant inference for repeated queries but is not a required component for correctness — the inference path functions correctly (at higher latency) when Redis is unavailable.

  • Prediction results are cached keyed on a hash of the input, with TTL-based expiry (see the cache-aside sketch below).
  • Cache unavailability degrades to cache-miss behavior on every request; inference remains functional.
  • Single Redis instance; no HA today — acceptable given the degraded-not-broken failure mode.
  • See Failure Modes and Trade-offs — Prediction cache.
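
A sketch of the cache-aside pattern with graceful degradation; the key prefix, TTL, and timeout values are illustrative:

```python
# Sketch: key prefix, TTL, and timeout values are illustrative.
import hashlib
import json

import redis

r = redis.Redis(host="redis.soccer-api.svc", socket_timeout=0.2)
TTL_SECONDS = 3600


def cached_predict(payload: dict, predict) -> dict:
    key = "pred:" + hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode()
    ).hexdigest()
    try:
        hit = r.get(key)
        if hit is not None:
            return json.loads(hit)
    except redis.RedisError:
        pass  # degrade to cache-miss behavior; inference still works

    result = predict(payload)
    try:
        r.set(key, json.dumps(result), ex=TTL_SECONDS)
    except redis.RedisError:
        pass  # best-effort write-back
    return result
```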

Observability — Planned

Evidently drift reports are not yet implemented. Planned: offline batch reports generated from logged prediction data, stored in MinIO, and linked from the docs.

Grafana dashboards are not yet defined. Planned near-term: inference service, Celery queue, and infrastructure dashboards.