
Environments & Dependency Strategy

SoccerPredictAI uses a layered dependency approach to maximize reproducibility and eliminate "works on my machine" issues across dev, CI, training, and production.


Dependency Layering Strategy

Layer 1 — System + Python runtime
    conda / mamba (environment.yml)
    → exported to requirements-mamba-base.txt

Layer 2 — Python application dependencies
    PDM groups: api / ml / dev / prod
    → exported per group to requirements-pdm-*.txt

Layer 3 — Final pinned artifacts
    Merged into requirements-*.txt
    → used for deterministic Docker builds

Why this design?

  • conda handles system-level and compiled library dependencies reliably.
  • PDM provides modern dependency resolution and group-based separation (api/ml/dev).
  • Exporting to pinned requirements-*.txt ensures Docker images are reproducible and auditable without requiring conda in the container build chain.


Environment Matrix

| Environment | Purpose | Dependency anchor | Python version | Activation |
|---|---|---|---|---|
| Local development | Code authoring, debugging, test runs | conda env + pdm install --dev | 3.13 (from environment.yml) | conda activate soccer |
| CI (GitLab) | Lint, test, build, deploy | pdm install from pdm.lock | 3.13 (pinned in CI image) | CI runner environment |
| Offline ML training | dvc repro, experiment runs | requirements-ml.txt (pinned) | 3.13 | conda or Docker container |
| Deployed runtime (API) | FastAPI + Celery workers serving predictions | requirements-prod.txt (pinned) | 3.13 | K8s pod from Docker image |
| Docs / reporting | MkDocs build, Quarto reports | requirements-dev.txt subset | 3.13 | Local dev env |
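
For orientation, a minimal environment.yml consistent with the matrix might look like the sketch below. Only the environment name (soccer) and the Python version (3.13) come from this page; the channel and the pip entry are assumptions:

```yaml
# Hypothetical environment.yml sketch -- not the repository's actual file.
name: soccer
channels:
  - conda-forge
dependencies:
  - python=3.13   # pinned runtime, matching the matrix above
  - pip           # so PDM-managed packages can layer on top
```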

Reproducibility Anchors

Every deployed model and dataset is traceable to four anchors:

| Anchor | What it pins |
|---|---|
| git commit | Code version |
| pdm.lock | All Python dependency versions |
| DVC content hash | Exact dataset version used for training |
| MLflow run ID | All training parameters, metrics, and artifacts |

A deployment is fully reproducible when all four anchors are recorded. Deployment manifests in k8s/ reference the Docker image tag, which maps to a specific git commit and pdm.lock.
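
To illustrate the image-tag mapping, a manifest fragment in k8s/ could pin the container image to a commit-derived tag. The registry path, tag scheme, and SHA below are hypothetical; the page only states that the tag maps to a git commit:

```yaml
# Illustrative Deployment fragment -- names and tag scheme are assumptions.
spec:
  template:
    spec:
      containers:
        - name: api
          # Tag derived from the git commit SHA, so the image resolves back
          # to a specific commit and its pdm.lock.
          image: registry.example.com/soccerpredictai/api:git-4f2a9c1
```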


Dependency Groups (PDM)

| Group | Contents | Used by |
|---|---|---|
| api | FastAPI, Pydantic, Celery, Redis client | API Docker image |
| ml | scikit-learn, XGBoost, Optuna, MLflow, DVC | Training pipeline Docker image / local |
| dev | pytest, hypothesis, ruff, pre-commit, mypy | CI + local development |
| prod | Combined api + ml for production deployment | Production Docker image |
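
A sketch of how these groups might be declared in pyproject.toml. The package lists mirror the table above; the exact section layout (package extras for api/ml vs. a PDM dev group for tooling) is an assumption about this repository, and versions are deliberately omitted:

```toml
# Hypothetical pyproject.toml excerpt -- section layout is an assumption.
[project.optional-dependencies]
api = ["fastapi", "pydantic", "celery", "redis"]
ml = ["scikit-learn", "xgboost", "optuna", "mlflow", "dvc"]

# Dev-only tooling typically lives in a PDM dev-dependency group rather
# than in package extras, so it never ships in production images.
[tool.pdm.dev-dependencies]
dev = ["pytest", "hypothesis", "ruff", "pre-commit", "mypy"]
```

Under this layout, the prod artifact would be produced by exporting the api and ml groups together rather than by declaring a separate group.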

How to Rebuild Pinned Requirements

make requirements

This regenerates:

  • PDM exports per group (requirements-pdm-*.txt)
  • Base pip freeze from conda env (requirements-mamba-base.txt)
  • Merged final requirements-*.txt for Docker builds
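
The final merge step can be sketched as below. The precedence rule (PDM group pins override base-freeze pins for the same package) is an assumption about the Makefile's behavior, and the version strings are placeholders:

```python
# Sketch of the Layer 3 merge: combine the conda-level base freeze with a
# PDM group export into one pinned requirements list. Assumption: group
# pins win over base pins when both mention the same package.

def parse_pins(lines):
    """Map package name -> full pin line, skipping comments and blanks."""
    pins = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name = line.split("==")[0].strip().lower()
        pins[name] = line
    return pins

def merge_pins(base_lines, group_lines):
    """Merge base freeze with a group export; group pins win on conflict."""
    merged = parse_pins(base_lines)
    merged.update(parse_pins(group_lines))
    return sorted(merged.values())

base = ["certifi==2024.8.30", "pip==24.2"]          # placeholder versions
group = ["fastapi==0.115.0", "certifi==2024.12.14"]  # placeholder versions
print(merge_pins(base, group))
# → ['certifi==2024.12.14', 'fastapi==0.115.0', 'pip==24.2']
```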

Run this whenever pdm.lock or environment.yml changes and before building new Docker images.


Operational Note

The system treats pdm.lock and DVC content hashes as the primary reproducibility anchors. All production deployments should be traceable to:

  • git commit
  • dataset version (DVC hash)
  • model version (MLflow run ID + registered version)
  • dependency lock (pdm.lock)

No deployment should be performed from an environment where any of these anchors is unresolved.
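
That rule can be enforced with a small pre-deploy gate along these lines. The function name and dictionary keys are illustrative, not the project's actual tooling; it assumes the anchors are collected into a dict upstream (e.g. by CI):

```python
# Minimal pre-deploy gate sketch: refuse to proceed if any reproducibility
# anchor is missing or empty. Names and keys here are hypothetical.

REQUIRED_ANCHORS = ("git_commit", "dvc_hash", "mlflow_run_id", "pdm_lock_hash")

def check_anchors(anchors: dict) -> None:
    """Raise RuntimeError if any required anchor is unresolved."""
    missing = [key for key in REQUIRED_ANCHORS if not anchors.get(key)]
    if missing:
        raise RuntimeError(f"Deployment blocked; unresolved anchors: {missing}")

# Passes silently when every anchor is recorded (values are placeholders):
check_anchors({
    "git_commit": "4f2a9c1",
    "dvc_hash": "md5:placeholder",
    "mlflow_run_id": "run-placeholder",
    "pdm_lock_hash": "sha256:placeholder",
})
```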