Architecture Principles
This page documents the guiding principles behind the SoccerPredictAI architecture. Each principle is a deliberate design choice with direct consequences on how the system is built and operated.
1. Reproducibility First
Statement: Any historical model must be reproducible from a clean checkout.
Anchors: pdm.lock (dependencies) + DVC content-addressed storage (data + model artifacts) + MLflow run metadata (training parameters + metrics).
Consequences:
- All randomness is explicitly seeded.
- No path, parameter, or version is hardcoded in code — all come from params.yaml or Hydra configs.
- Running dvc repro at any git commit, with the matching dataset version, must yield the same artifacts.
- Docker images pin dependency hashes, not floating ranges.
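The seeding consequence above can be sketched as follows. This is a minimal illustration, not the project's actual code: `seed_everything` and `sample_split` are hypothetical names, and in the real pipeline the seed would presumably be read from params.yaml rather than passed in by hand.

```python
import os
import random


def seed_everything(seed: int) -> None:
    """Seed every source of randomness this sketch knows about."""
    # PYTHONHASHSEED only affects interpreters started after this call;
    # the current process fixed its hash seed at startup.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np  # optional dependency in this sketch
    except ImportError:
        return
    np.random.seed(seed)


def sample_split(n_rows: int, seed: int) -> list[int]:
    """Return a shuffled row order that depends only on the seed."""
    seed_everything(seed)
    order = list(range(n_rows))
    random.shuffle(order)
    return order
```

Calling `sample_split(10, 42)` twice returns the same ordering, which is the property a reproducible stage needs from every random step.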
Where this appears: Environments, Data & ML Flow, Trade-offs — DVC + MinIO
2. Explicit Contracts at Boundaries
Statement: Every boundary between subsystems has a formal, validated contract. No contract = no boundary.
Contracts in this system:
- data/raw → Great Expectations suite (validate_raw)
- data/interim → Great Expectations suite (validate_finished, validate_future)
- data/features → Great Expectations suite (validate_features)
- Training output → MLflow pyfunc model signature (input + output schema)
- API → Pydantic PredictRequest / PredictResponse schemas
Consequences:
- Broken data fails fast at the validation gate, not silently downstream.
- Model serving rejects malformed inputs at schema validation, not at inference time.
- Contract files are versioned alongside code.
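To make the fail-fast behaviour concrete, here is a small standard-library stand-in for a boundary contract. The system itself uses Great Expectations suites and Pydantic schemas; the `FieldRule` type, the `validate` helper, and the field names below are assumptions made purely for illustration.

```python
from dataclasses import dataclass


class ContractViolation(ValueError):
    """Raised when a record does not satisfy its boundary contract."""


@dataclass(frozen=True)
class FieldRule:
    name: str
    type_: type
    required: bool = True


# Hypothetical contract for an incoming prediction request.
PREDICT_REQUEST_CONTRACT = (
    FieldRule("home_team", str),
    FieldRule("away_team", str),
    FieldRule("match_date", str),
)


def validate(record: dict, contract: tuple[FieldRule, ...]) -> dict:
    """Return the record unchanged if it satisfies the contract, else raise."""
    errors = []
    for rule in contract:
        if rule.name not in record:
            if rule.required:
                errors.append(f"missing field: {rule.name}")
            continue
        if not isinstance(record[rule.name], rule.type_):
            errors.append(f"{rule.name}: expected {rule.type_.__name__}")
    # Fields outside the contract are rejected too, so drift is visible.
    unknown = set(record) - {rule.name for rule in contract}
    errors.extend(f"unexpected field: {name}" for name in sorted(unknown))
    if errors:
        raise ContractViolation("; ".join(errors))
    return record
```

The point of the design is that a malformed record raises at the boundary, with every violation listed, before any feature or inference code sees it.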
Where this appears: Component View, Data & ML Flow
3. Architecture over Implementation Details
Statement: Architecture documentation describes structure, decisions, and constraints rather than algorithm configurations, model hyperparameters, or performance thresholds.
Implementation specifics belong in implementation-level docs (docs/ml/, docs/reference/), not architectural views.
Design distinction:
| Concept | Meaning | Examples in this system |
|---|---|---|
| Architectural invariant | A property that must hold regardless of how implementation evolves | All models loaded via MLflow Registry; feature logic shared between offline and online paths |
| Implementation optimization | A decision that improves performance but is replaceable | Redis caching strategy; specific calibration method; exact window sizes |
This distinction keeps architecture stable across implementation changes and prevents docs from degrading into snapshots of the current configuration.
Consequences:
- Architecture docs remain useful across code refactors and parameter tuning.
- Reviewers can distinguish what is structurally binding from what is operationally tunable.
- Status labels (Implemented, Partial, Planned) apply to architectural elements, not to config values.
Where this appears: This distinction is applied throughout all architecture pages.
4. Separate Offline and Online Concerns
Statement: The offline training pipeline (DVC) and the online serving path (FastAPI + Celery) are independent execution environments that share contracts and logic, but not runtime infrastructure.
Consequences:
- DVC stages never import or call FastAPI/Celery code.
- Serving code never triggers DVC stages.
- Shared logic (feature engineering functions) lives in src/features/ and is imported by both, but the execution paths are separate.
- Model promotion is the explicit handoff point between offline and online.
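One way to keep the first consequence from eroding over time is an automated import check. The sketch below is an assumption about how such a guard could look, not something this page says exists: `forbidden_imports` is a hypothetical helper that parses a module's source with the standard `ast` module and reports any serving-only packages it imports.

```python
import ast

FORBIDDEN_IN_OFFLINE = {"fastapi", "celery"}  # serving-only packages


def forbidden_imports(source: str, forbidden: set[str]) -> set[str]:
    """Return the forbidden top-level packages imported by `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from x import y"; relative imports are skipped.
            names = [node.module]
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root in forbidden:
                found.add(root)
    return found
```

A CI test could read each DVC stage module, call `forbidden_imports(text, FORBIDDEN_IN_OFFLINE)`, and fail if the result is non-empty. The mirror-image check for serving code would use a different forbidden set.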
Where this appears: Data & ML Flow, Container View, Runtime View
5. Prefer Operational Clarity Over Platform Sprawl
Statement: Choose the right tool per job. Do not add infrastructure that adds complexity without proportionate benefit at current scale.
Consequences:
- Airflow for scheduling (calendar-driven); DVC for ML pipelines (artifact-driven). Not both for the same job.
- Celery + RabbitMQ for async tasks; Kafka is not justified at current throughput.
- MinIO provides S3-compatible storage on-prem; no dependency on AWS.
- Monitoring via Prometheus + Grafana (standard stack), not a proprietary SaaS.
Where this appears: Trade-offs, Deployment View
6. Single Source of Truth per Responsibility
Statement: Each category of data has exactly one authoritative store. No duplication of authority.
| Responsibility | Authoritative store |
|---|---|
| Structured scraped data | PostgreSQL (namespace: ds) |
| Raw and interim datasets | MinIO (via DVC) |
| Model artifacts and runs | MLflow Registry |
| Live prediction/feature cache | Redis (namespace: soccer-api) |
| Configuration and parameters | params.yaml / Hydra configs |
| Secret values | SOPS-encrypted files in git |
Consequences:
- No reconciliation problems across stores for the same data type.
- Debugging always starts from the same place.
Where this appears: System Boundary, Container View
7. Documentation-First Architecture
Statement: Architectural decisions are documented before or alongside implementation, not retrospectively.
Mechanism:
- ADRs in docs/adr/ capture the decision context, alternatives, and consequences.
- This architecture section documents the intended design, with explicit status labels.
- status.md is the authoritative implementation status tracker.
Consequences:
- Intent and implementation can diverge; the docs distinguish them honestly.
- Reviewers can trace why a decision was made without reading the commit history.
Where this appears: Implementation Status, Trade-offs, ADR Index
8. Honest Current-State Labeling
Statement: Every architecturally significant element in documentation carries an explicit status label.
Labels used throughout this documentation:
| Label | Meaning |
|---|---|
| ✅ Implemented | Exists in code, deployed or tested |
| 🚧 Partially implemented | Core exists; some parts missing or manual |
| 📋 Planned | Architecturally designed; not yet built |
Consequences:
- Documentation describes the actual state of the system, so it is useful for code review and interviews rather than merely aspirational.
- No component is described as production-ready unless it actually is.
Where this appears: Implementation Status, all architecture pages
9. Security by Design
Statement: Secrets are never plaintext in code, config files, Docker images, or CI logs. Access is bounded by namespace isolation and least-privilege service accounts.
Mechanisms:
- SOPS + age encryption for all credentials committed to git.
- K8s namespace isolation: ds, soccer-api, monitoring, ingress-nginx are separate trust zones.
- No plaintext .env files in the repository.
- CI decrypts secrets only in scoped, ephemeral steps.
- Kubernetes secrets are namespace-scoped.
Consequences:
- Secret rotation requires re-encrypting SOPS files and redeploying.
- Operational complexity is slightly higher than with plaintext .env files; in exchange, no secret is ever stored unencrypted.
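The "no plaintext .env files" rule lends itself to an automated check. The following is a hedged sketch rather than the project's real tooling: `is_plaintext_env` and `find_plaintext_env_files` are hypothetical helpers, and the check assumes that SOPS-encrypted dotenv files wrap values as `ENC[...]` and prefix their metadata keys with `sops_`.

```python
from pathlib import Path

SOPS_VALUE_PREFIX = "ENC["  # assumed marker of a SOPS-encrypted value


def is_plaintext_env(text: str) -> bool:
    """True if any KEY=VALUE line has a non-empty, unencrypted value."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.startswith("sops_"):  # assumed SOPS metadata lines
            continue
        if value and not value.startswith(SOPS_VALUE_PREFIX):
            return True
    return False


def find_plaintext_env_files(root: Path) -> list[Path]:
    """List .env-style files under `root` that contain unencrypted values."""
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file()
        and path.name.endswith(".env")
        and is_plaintext_env(path.read_text(encoding="utf-8"))
    )
```

Run as a pre-commit hook or CI step, a non-empty result from `find_plaintext_env_files(Path("."))` would block the change before a credential reaches git history.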
Where this appears: Security, Environments, Trade-offs — SOPS + age
Related
- Requirements — functional and non-functional requirements these principles serve
- Trade-offs — specific decisions where these principles shaped the choice
- ADR Index — formal decision records
- Implementation Status — current vs target state