# Environments & Dependency Strategy
SoccerPredictAI uses a layered dependency approach to maximize reproducibility and eliminate "works on my machine" issues across dev, CI, training, and production.
## Dependency Layering Strategy
- **Layer 1 — System + Python runtime**: conda / mamba (`environment.yml`) → exported to `requirements-mamba-base.txt`
- **Layer 2 — Python application dependencies**: PDM groups `api` / `ml` / `dev` / `prod` → exported per group to `requirements-pdm-*.txt`
- **Layer 3 — Final pinned artifacts**: merged into `requirements-*.txt` → used for deterministic Docker builds
**Why this design?**

- conda handles system-level and compiled library dependencies reliably.
- PDM provides modern dependency resolution and group-based separation (`api`/`ml`/`dev`/`prod`).
- Exporting to pinned `requirements-*.txt` files keeps Docker images reproducible and auditable without requiring conda in the container build chain.
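The Layer 3 merge step can be sketched as follows. This is an illustrative helper, not the project's actual merge script: it combines the conda-derived base pins with a PDM group export, letting the PDM pins win on name conflicts.

```python
# Illustrative sketch of the Layer 3 merge (assumed behavior, not the
# project's real script): PDM group pins override conda base pins.

def parse_pins(lines):
    """Map package name -> pinned requirement line, skipping comments."""
    pins = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name = line.split("==")[0].strip().lower()
        pins[name] = line
    return pins

def merge_pins(base_lines, group_lines):
    """Merge base pins with group pins; group pins take precedence."""
    merged = parse_pins(base_lines)
    merged.update(parse_pins(group_lines))
    return sorted(merged.values())

base = ["numpy==2.1.0", "redis==5.0.4"]
ml_group = ["numpy==2.2.1", "scikit-learn==1.5.2"]
print(merge_pins(base, ml_group))
# → ['numpy==2.2.1', 'redis==5.0.4', 'scikit-learn==1.5.2']
```

The key design point is that the group export, resolved from `pdm.lock`, is authoritative for anything PDM manages; the conda freeze only contributes packages PDM does not know about.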
## Environment Matrix
| Environment | Purpose | Dependency anchor | Python version | Activation |
|---|---|---|---|---|
| Local development | Code authoring, debugging, test runs | conda env + `pdm install --dev` | 3.13 (from `environment.yml`) | `conda activate soccer` |
| CI (GitLab) | Lint, test, build, deploy | `pdm install` from `pdm.lock` | 3.13 (pinned in CI image) | CI runner environment |
| Offline ML training | `dvc repro`, experiment runs | `requirements-ml.txt` (pinned) | 3.13 | conda or Docker container |
| Deployed runtime (API) | FastAPI + Celery workers serving predictions | `requirements-prod.txt` (pinned) | 3.13 | K8s pod from Docker image |
| Docs / reporting | MkDocs build, Quarto reports | `requirements-dev.txt` subset | 3.13 | Local dev env |
## Reproducibility Anchors
Every deployed model and dataset is traceable to four anchors:
| Anchor | What it pins |
|---|---|
| git commit | Code version |
| `pdm.lock` | All Python dependency versions |
| DVC content hash | Exact dataset version used for training |
| MLflow run ID | All training parameters, metrics, and artifacts |
A deployment is fully reproducible when all four anchors are recorded.
Deployment manifests in `k8s/` reference the Docker image tag, which maps to a specific git commit and `pdm.lock`.
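A deployment record built around the four anchors might look like the sketch below. The field names are assumptions for illustration, not the project's actual schema; the point is that reproducibility is a property of having all four values recorded.

```python
# Hypothetical record of the four reproducibility anchors; field names
# are illustrative, not the project's real deployment metadata schema.
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class DeploymentRecord:
    git_commit: str      # code version
    pdm_lock_hash: str   # all Python dependency versions
    dvc_data_hash: str   # exact dataset version used for training
    mlflow_run_id: str   # training parameters, metrics, artifacts

    def is_reproducible(self) -> bool:
        """Fully reproducible only when all four anchors are recorded."""
        return all(asdict(self).values())

record = DeploymentRecord("a1b2c3d", "sha256:...", "md5:...", "run-42")
assert record.is_reproducible()
```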
## Dependency Groups (PDM)
| Group | Contents | Used by |
|---|---|---|
| `api` | FastAPI, Pydantic, Celery, Redis client | API Docker image |
| `ml` | scikit-learn, XGBoost, Optuna, MLflow, DVC | Training pipeline Docker image / local |
| `dev` | pytest, hypothesis, ruff, pre-commit, mypy | CI + local development |
| `prod` | Combined `api` + `ml` for production deployment | Production Docker image |
## How to Rebuild Pinned Requirements
Rebuilding the pinned artifacts regenerates:

- PDM exports per group (`requirements-pdm-*.txt`)
- Base pip freeze from the conda env (`requirements-mamba-base.txt`)
- Merged final `requirements-*.txt` files for Docker builds
Rebuild these whenever `pdm.lock` or `environment.yml` changes, and before building new Docker images.
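The regeneration flow might be scripted along these lines. This is a sketch, not the project's real rebuild script: the `pdm export` flags should be verified against `pdm export --help`, and the final merge step is only indicated.

```python
# Sketch of the rebuild flow (assumed commands and file names; verify
# pdm export flags against your PDM version before relying on this).
import subprocess

GROUPS = ["api", "ml", "dev", "prod"]

def export_commands(groups=GROUPS):
    """Build one `pdm export` command per dependency group."""
    return [
        ["pdm", "export", "--group", g, "--output", f"requirements-pdm-{g}.txt"]
        for g in groups
    ]

def regenerate(run=subprocess.run):
    # 1. Per-group PDM exports from pdm.lock.
    for cmd in export_commands():
        run(cmd, check=True)
    # 2. Base freeze from the active conda/mamba env.
    run(["python", "-m", "pip", "freeze"], check=True)
    # 3. The merge into the final requirements-*.txt files would follow here.

print(export_commands(["ml"])[0])
```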
## Operational Note
The system treats `pdm.lock` and DVC content hashes as the primary reproducibility anchors.
All production deployments should be traceable to:
- git commit
- dataset version (DVC hash)
- model version (MLflow run ID + registered version)
- dependency lock (`pdm.lock`)
No deployment should be performed from an environment where any of these anchors is unresolved.
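A pre-deployment gate enforcing this rule could look like the following. The function and key names are hypothetical; the behavior to preserve is that deployment fails fast when any anchor is missing or empty.

```python
# Illustrative pre-deployment gate (names are assumptions): refuse to
# proceed when any reproducibility anchor is missing or unresolved.
def assert_anchors_resolved(anchors: dict) -> None:
    """Raise if any required anchor is absent or empty."""
    required = ("git_commit", "dvc_data_hash", "mlflow_run_id", "pdm_lock")
    missing = [k for k in required if not anchors.get(k)]
    if missing:
        raise RuntimeError(f"Deployment blocked; unresolved anchors: {missing}")

# Passes silently when every anchor is recorded.
assert_anchors_resolved({
    "git_commit": "a1b2c3d",
    "dvc_data_hash": "md5:...",
    "mlflow_run_id": "run-42",
    "pdm_lock": "sha256:...",
})
```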
## Related
- Architecture Principles — Reproducibility First
- Deployment View — how Docker images are deployed
- Trade-offs — DVC + MinIO
- Security — how secrets are injected at runtime (not baked into images)