System Boundary¶
This page defines what is inside the SoccerPredictAI system, what is outside it, and how the two interact. Understanding the boundary is essential for reasoning about ownership, trust, and failure modes.
What Is Inside the System¶
The runtime system boundary includes all services responsible for prediction serving, data ingestion, model lifecycle, and observability. The offline ML pipeline is part of the system when it produces artifacts consumed at runtime (models in MLflow, data in DVC/MinIO).
Runtime services (Kubernetes — healserver)¶
| Component | Namespace | Responsibility |
|---|---|---|
| Nginx Ingress Controller | ingress-nginx | Routes inbound HTTPS traffic to internal services |
| Airflow Scheduler + Workers | ds | Schedules ETL and scraping triggers |
| PostgreSQL | ds | Authoritative store for normalized scraped data |
| MinIO (S3-compatible) | ds | DVC remote: raw parquet exports, ML artifacts |
| MLflow Tracking + Registry | ds | Experiment records, model versions, promotion lifecycle |
| Prometheus | ds | Metrics collection |
| Grafana | ds | Dashboards (📋 Planned: dashboards defined) |
| kube-state-metrics | monitoring | K8s cluster metrics |
| node-exporter | monitoring | Host-level metrics |
| FastAPI Inference Service | soccer-api | REST API, sync + async predictions, health + metrics endpoints |
| RabbitMQ | soccer-api | Message broker for Celery task queues |
| Celery worker-api | soccer-api | Short tasks: scraping trigger, cache operations, request pre-processing |
| Celery worker-ml | soccer-api | Heavy tasks: feature assembly at inference, batch scoring |
| Redis | soccer-api | Prediction and feature vector cache (caching optimization layer) |
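The worker-api / worker-ml split is typically expressed as Celery task routing. A minimal dependency-free sketch of the idea; the task names and queue names below are illustrative assumptions, not the project's actual configuration (in a real Celery app this dict would be assigned to `app.conf.task_routes`):

```python
# Hypothetical routing table: short tasks go to the worker-api queue,
# heavy feature-assembly / scoring tasks go to the worker-ml queue.
task_routes = {
    "tasks.trigger_scrape":   {"queue": "api"},
    "tasks.cache_invalidate": {"queue": "api"},
    "tasks.assemble_features": {"queue": "ml"},
    "tasks.batch_score":       {"queue": "ml"},
}

def queue_for(task_name: str) -> str:
    """Resolve which queue a task would be routed to (default: api)."""
    return task_routes.get(task_name, {"queue": "api"})["queue"]
```

Routing by queue lets the two worker deployments scale independently: worker-ml pods can get more CPU and memory without over-provisioning the short-task workers.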
Offline execution context¶
| Component | Boundary | Responsibility |
|---|---|---|
| DVC pipeline | Local / CI execution | Reproducible ML pipeline: preprocessing through model registration |
Offline Pipeline Boundary¶
The DVC pipeline occupies a deliberate position in the system boundary: it executes outside the runtime cluster (locally or in CI), but it is part of the system as the authoritative producer of all ML artifacts consumed at runtime.
This is not an omission — it is an explicit architectural decision.
Why DVC is outside the runtime boundary:
- The pipeline is artifact-driven and reproducible, not service-based. It does not run continuously.
- Executing training inside Kubernetes would add operational complexity (GPU scheduling, ephemeral storage, long-running job management) without benefit at the current scale.
- CI execution provides a clean, reproducible environment without cluster-side state entanglement.
Why DVC is still part of the system:
- Every model in the runtime registry was produced by a tracked, versioned DVC run.
- Every dataset consumed by training is content-addressed and reproducible via `dvc checkout`.
- The DVC pipeline is the explicit handoff point from data to models: it reads from MinIO and writes registered artifacts into MLflow, which the runtime cluster reads.
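A pipeline with this shape might be declared in `dvc.yaml` roughly as follows. Stage names, scripts, and paths are illustrative assumptions, not the project's actual files:

```yaml
stages:
  preprocess:
    cmd: python src/preprocess.py   # hypothetical entry point
    deps: [data/raw]                # raw parquet pulled from the MinIO DVC remote
    outs: [data/processed]
  train:
    cmd: python src/train.py
    deps: [data/processed]
    outs: [models/model.pkl]
  register:
    cmd: python src/register.py     # logs and registers the model in MLflow
    deps: [models/model.pkl]
```

Because every stage declares its dependencies and outputs, `dvc repro` re-executes only what changed, and the resulting lock file pins the exact data and code versions behind each registered model.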
Architectural consequence:
The boundary crossing happens at the MLflow Registry: DVC pushes a model artifact and assigns
a champion alias; the serving layer loads it. This is the only coupling point between the offline
pipeline and the runtime system. They share no runtime infrastructure, only contracts (model signature,
feature schema, MLflow alias convention).
[DVC pipeline — local/CI]
│
│ writes model artifact + champion alias
▼
[MLflow Registry — runtime cluster]
│
│ model_uri resolved by champion alias
▼
[FastAPI + Celery workers — runtime serving]
Limitation: there is no automated handoff — model promotion is a manual operation today. See Known Architectural Limitations and Roadmap.
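The alias contract at this coupling point can be made explicit in code. A minimal dependency-free sketch of the URI convention only; the registered model name `soccer_predictor` is an assumption, not necessarily the real registry entry:

```python
CHAMPION_ALIAS = "champion"

def champion_uri(model_name: str, alias: str = CHAMPION_ALIAS) -> str:
    """Build the models:/<name>@<alias> URI that the serving layer resolves.

    The serving side would then load it with something like
    mlflow.pyfunc.load_model(champion_uri("soccer_predictor")); only the
    URI convention is shown here so the sketch stays dependency-free.
    """
    return f"models:/{model_name}@{alias}"
```

Keeping the alias name in one place means promotion (reassigning `champion` to a new model version) requires no change on the serving side.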
What Is Outside the System¶
External runtime dependencies¶
| External component | Owner | Role | Trust level |
|---|---|---|---|
| WhoScored.com | Third party | Source of football match statistics | Untrusted; validated after ingestion |
| Selenoid Server | Operator (external host) | Headless browser grid for scraping; called by celery-worker-api | Trusted operator; separate ops boundary |
| Streamlit Web UI (time2bet.ru) | Operator (external VPS) | User-facing prediction frontend | Trusted; calls the inference API over HTTPS |
| Host-level Nginx (VPS) | Operator | Reverse proxy in front of K8s NodePort; handles TLS termination | Trusted operator |
Delivery and tooling boundary¶
| External component | Owner | Role | Trust level |
|---|---|---|---|
| GitLab CI/CD | SaaS (GitLab.com) | Build, test, and Helm deployment pipeline | Trusted for delivery; accesses encrypted secrets via protected variables |
GitLab CI/CD is outside the runtime system boundary: it does not participate in normal system operation. It crosses the boundary only during deployment events — at which point it decrypts SOPS-encrypted secrets and pushes Helm releases to the cluster.
External Dependency Trust Model¶
Trust Boundaries¶
Public Internet¶
All requests from the public internet are untrusted by default.

- WhoScored.com data is treated as untrusted input; Great Expectations validates it before use.
- User requests to the API pass through Nginx TLS termination and Pydantic schema validation.
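The principle behind schema validation at this boundary is to reject malformed input before any business logic runs. The real service uses Pydantic; the dataclass stand-in below is a dependency-free sketch of the same idea, and its field names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class PredictionRequest:
    """Hypothetical request schema; the real API defines its fields via Pydantic."""
    home_team: str
    away_team: str

    def __post_init__(self):
        # Reject empty or non-string team names at the boundary.
        for name in (self.home_team, self.away_team):
            if not isinstance(name, str) or not name.strip():
                raise ValueError(f"invalid team name: {name!r}")
```

With Pydantic the equivalent model would additionally coerce types and produce structured 422 responses in FastAPI, but the trust decision is the same: nothing crosses into the application until it matches the declared schema.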
K8s Cluster Internal¶
Services within the same namespace can communicate freely via cluster DNS. Cross-namespace communication is restricted via Kubernetes NetworkPolicy (where defined). No service inside the cluster exposes a plaintext secret to application code — all secrets are injected via Kubernetes Secrets from SOPS-decrypted manifests.
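A cross-namespace restriction of this kind is usually expressed as a NetworkPolicy. An illustrative fragment only; the policy name and selector choices are assumptions, not the cluster's actual manifests:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace   # hypothetical policy name
  namespace: ds
spec:
  podSelector: {}              # applies to all pods in the ds namespace
  policyTypes: [Ingress]
  ingress:
    - from:
        - podSelector: {}      # admits traffic only from pods in the same namespace
```

An empty `podSelector` under `ingress.from` matches all pods in the policy's own namespace, so any cross-namespace caller (for example, soccer-api reaching MLflow) must be granted by an additional, explicit rule.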
External Scraping Host (Selenoid)¶
The Selenoid host is operator-controlled but runs outside the K8s network boundary.
Traffic from celery-worker-api to Selenoid crosses the network boundary.
This is an accepted operational dependency; Selenoid unavailability is a known failure mode.
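One common way to contain this failure mode is bounded retry with exponential backoff around the Selenoid call. A generic stdlib sketch; the real scraping task and its error types are not shown in this document:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0, retry_on=(ConnectionError,)):
    """Call fn(), retrying with exponential backoff on transient errors.

    A scrape task would wrap its Selenoid/WebDriver call in this; after the
    final attempt the error propagates so Celery can record the task failure.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Bounding the attempts matters here: an unbounded retry loop against a down Selenoid host would pin worker-api capacity and mask the outage instead of surfacing it as a failed task.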
CI/CD Boundary¶
GitLab CI has access to:

- the source code repository,
- encrypted SOPS secret files (committed to git),
- the age private key (stored as a protected CI variable).
CI decrypts secrets only in scoped deployment steps. No secret appears in CI logs (masked variables enforced).
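A deployment job following this pattern might look roughly like the fragment below. The job name, file paths, and release names are illustrative assumptions; `sops -d` and `helm upgrade --install` are the standard CLI invocations, and SOPS reads the age private key from the `SOPS_AGE_KEY` environment variable that GitLab injects as a masked, protected variable:

```yaml
deploy:
  stage: deploy
  environment: production
  only: [main]                 # protected branch, so protected variables resolve
  script:
    - sops -d secrets/prod.enc.yaml > /tmp/secrets.yaml
    - helm upgrade --install soccer-api ./charts/soccer-api -f /tmp/secrets.yaml
    - rm /tmp/secrets.yaml     # never persist decrypted secrets beyond the job
```

Scoping decryption to the deploy job (rather than a global `before_script`) keeps build and test stages from ever holding plaintext secrets.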
Related¶
- Deployment View — physical topology and namespace layout
- C4 Context Diagram — external actors and system responsibilities
- Security — threat model and secret lifecycle
- Failure Modes — what happens when external dependencies fail