
System Context (C4 — Level 1)

This diagram defines the system boundary, the external actors, and the integrations between them. For the detailed boundary analysis, see System Boundary. For the physical deployment topology, see Deployment View.


Context Diagram

flowchart TB
    subgraph External[External — outside system boundary]
        User[End User / Viewer]
        Source[WhoScored.com\nexternal data provider]
        Selenoid["Selenoid Server\nexternal host — browser automation"]
        StreamlitUI["Time2Bet Web UI\nStreamlit — external VPS"]
        CI["GitLab CI/CD\nbuild · test · deploy"]
    end

    subgraph Offline[Offline execution context]
        DVC[DVC Pipeline\ntraining · validation · registration]
    end

    subgraph System[SoccerPredictAI — K8s cluster — healserver]
        API[FastAPI Inference API]
        Airflow[Airflow ETL Scheduler]
        Workers[Celery Workers]
        DB[PostgreSQL]
        S3[MinIO S3]
        MLflow[MLflow Registry]
        Prom[Prometheus]
    end

    User -->|views predictions| StreamlitUI
    StreamlitUI -->|HTTPS /predict| API
    Airflow -->|HTTP trigger scraping| API
    API -->|enqueue Celery task| Workers
    Workers -->|browser session| Selenoid
    Source -->|scraped via browser| Selenoid
    Workers -->|normalized data| DB
    DB -->|raw parquet export| S3
    S3 -->|versioned data| DVC
    DVC -->|model artifacts + metrics| MLflow
    API -->|load model_uri| MLflow
    API --> Prom
    Workers --> Prom
    CI -->|Helm deploy| System
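One way to reason about the diagram is to encode it as data and list the edges that cross the system boundary, since those are the integrations that need explicit contracts. The sketch below is illustrative only and is not part of the project: the node names mirror the diagram, while the zone labels and the `crossing_edges` helper are assumptions made for this example.

```python
# Zone membership copied from the three subgraphs in the diagram.
ZONES = {
    "User": "external", "Source": "external", "Selenoid": "external",
    "StreamlitUI": "external", "CI": "external",
    "DVC": "offline",
    "API": "system", "Airflow": "system", "Workers": "system",
    "DB": "system", "S3": "system", "MLflow": "system", "Prom": "system",
    # The subgraph itself, which is the target of the Helm deploy edge.
    "System": "system",
}

# Directed edges in the order they appear in the diagram.
EDGES = [
    ("User", "StreamlitUI"),
    ("StreamlitUI", "API"),
    ("Airflow", "API"),
    ("API", "Workers"),
    ("Workers", "Selenoid"),
    ("Source", "Selenoid"),
    ("Workers", "DB"),
    ("DB", "S3"),
    ("S3", "DVC"),
    ("DVC", "MLflow"),
    ("API", "MLflow"),
    ("API", "Prom"),
    ("Workers", "Prom"),
    ("CI", "System"),
]


def crossing_edges(edges, zones):
    """Return edges with exactly one endpoint inside the system boundary."""
    return [
        (src, dst)
        for src, dst in edges
        if (zones[src] == "system") != (zones[dst] == "system")
    ]


if __name__ == "__main__":
    for src, dst in crossing_edges(EDGES, ZONES):
        print(f"{src} -> {dst}")
```

Under this encoding, five edges cross the boundary: the UI call into the API, the worker session to Selenoid, the two DVC edges, and the CI deploy.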

Actors and Roles

End User / Viewer

Consumes match outcome predictions via the Streamlit web interface hosted on an independent external VPS (time2bet.ru). Has no direct access to internal cluster services.

System Operator

Deploys, monitors, and maintains the system. Has SSH access to healserver, access to GitLab CI protected variables, and the age private key. The only human actor with direct cluster access.

WhoScored.com

Third-party source of football match statistics. Treated as an untrusted external input — all data is validated via Great Expectations before use. Subject to layout changes, rate limiting, and availability issues outside operator control.
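To illustrate what treating scraped data as untrusted means in practice, the sketch below applies presence, type, and range checks to a single record. It is a plain-Python stand-in for the kind of expectations a Great Expectations suite would declare, not the Great Expectations API and not the project's actual rules; the field names and bounds in `RULES` are hypothetical.

```python
from typing import Any

# Hypothetical rule set: field name -> (expected type, min value, max value).
RULES = {
    "home_goals": (int, 0, 30),
    "away_goals": (int, 0, 30),
    "possession_home": (float, 0.0, 100.0),
}


def validate_row(row: dict[str, Any]) -> list[str]:
    """Return a list of human-readable violations; empty means the row passed."""
    errors = []
    for field, (expected_type, low, high) in RULES.items():
        if field not in row or row[field] is None:
            errors.append(f"{field}: missing")
            continue
        value = row[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}")
            continue
        if not (low <= value <= high):
            errors.append(f"{field}: {value} outside [{low}, {high}]")
    return errors


if __name__ == "__main__":
    print(validate_row({"home_goals": -1, "away_goals": "3"}))
```

Rows that return a non-empty list would be rejected or quarantined before they reach storage; which of the two applies is a design decision this sketch leaves open.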

Selenoid Server

Dedicated external host running a Selenoid browser grid. Invoked by celery-worker-api to perform headless browser scraping against WhoScored. Operator-managed, but runs outside the Kubernetes cluster — a separate operational boundary.

Time2Bet Web UI (Streamlit)

User-facing prediction frontend hosted on an external VPS. Calls the inference API over public HTTPS. Outside the system boundary; dependent on API availability.
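The call from the web UI to the inference API could look roughly like the following. This is a minimal sketch using only the standard library: the `/predict` path comes from the context diagram, but the POST method, the JSON payload fields, and the base URL are assumptions, since this page does not specify the request schema.

```python
import json
from urllib.request import Request


def build_predict_request(base_url: str, payload: dict) -> Request:
    """Build (but do not send) a JSON request to the /predict endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumed; the source only names the path
    )


if __name__ == "__main__":
    # Placeholder host and hypothetical payload fields.
    req = build_predict_request(
        "https://api.example.invalid",
        {"home_team": "Team A", "away_team": "Team B"},
    )
    print(req.get_method(), req.full_url)
```

Sending the request would be a call to `urllib.request.urlopen(req, timeout=10)`. Because the UI depends on API availability, a real client would also handle timeouts and non-2xx responses.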

DVC Pipeline (Offline Execution Context)

The ML training pipeline. Runs outside the K8s cluster — locally or in CI — against MinIO (data) and MLflow (model artifacts). Not a runtime component; produces the versioned model artifacts that the serving layer consumes.
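The hand-off between the offline pipeline and the serving layer is the `model_uri` shown in the context diagram. Assuming that URI points into the MLflow Model Registry, it would follow one of MLflow's `models:/` forms. The helper below sketches how such a URI could be built; the function name and the model name `soccer-outcome` are hypothetical.

```python
from typing import Optional


def registry_model_uri(
    name: str,
    version: Optional[int] = None,
    alias: Optional[str] = None,
) -> str:
    """Build an MLflow Model Registry URI from a version number or an alias."""
    if (version is None) == (alias is None):
        raise ValueError("pass exactly one of version or alias")
    if version is not None:
        return f"models:/{name}/{version}"
    return f"models:/{name}@{alias}"


if __name__ == "__main__":
    # Hypothetical registered model name.
    print(registry_model_uri("soccer-outcome", version=3))
    print(registry_model_uri("soccer-outcome", alias="champion"))
```

A serving process would pass the resulting string to a loader such as `mlflow.pyfunc.load_model(model_uri)`. Pinning a version favors reproducibility, while an alias lets the pipeline promote a new model without redeploying the API.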

GitLab CI/CD

Manages build, test, and deployment pipelines. Part of the delivery boundary: pushes Helm deployments to the cluster and handles secret decryption during the deploy phase. Does not participate in the runtime execution path.


System Responsibilities

SoccerPredictAI is responsible for:

  • scraping and ingesting match data from WhoScored via Selenoid,
  • normalizing and storing structured data in PostgreSQL,
  • exporting versioned datasets to MinIO via DVC,
  • training match outcome prediction models reproducibly,
  • tracking experiments and managing model lifecycle in MLflow,
  • serving predictions synchronously and asynchronously via FastAPI + Celery,
  • exposing service health and Prometheus metrics for observability.
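The last responsibility, exposing Prometheus metrics, comes down to serving plain text in the Prometheus exposition format. Services usually rely on a client library such as `prometheus_client` for this; the standard-library sketch below only shows the shape of the output. The metric name, help text, and label are hypothetical.

```python
def render_counter(
    name: str,
    help_text: str,
    samples: dict[tuple[tuple[str, str], ...], float],
) -> str:
    """Render one counter family in the Prometheus text exposition format.

    Label-value escaping (backslash, double quote, newline) is omitted for brevity.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        label_str = ",".join(f'{key}="{val}"' for key, val in labels)
        suffix = f"{{{label_str}}}" if label_str else ""
        lines.append(f"{name}{suffix} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_counter(
        "predict_requests_total",       # hypothetical metric name
        "Prediction requests served.",
        {(("mode", "sync"),): 3, (("mode", "async"),): 1},
    )
    print(text, end="")
```

Prometheus would scrape this text from an HTTP endpoint, conventionally `/metrics`, on each instrumented service.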

Non-Goals

  • The system does not guarantee betting profitability.
  • It is not a general sports analytics platform.
  • It does not support data providers other than WhoScored or sports other than football.
  • It does not provide user authentication or multi-tenant access control.