
Code Structure

Directory layout, naming conventions, and where to find what.

Top-Level Layout

soccer/
├── src/                   # All production Python code
│   ├── data/              # Data access, schemas, splits, storage abstractions
│   ├── features/          # Feature engineering (pure, deterministic)
│   ├── models/            # Models, losses, metrics — NO IO
│   ├── pipelines/         # DVC / CLI orchestration entrypoints
│   └── app/               # FastAPI service layer
│       └── tasks/         # Celery async jobs
├── airflow/
│   └── dags/              # Scheduled production pipelines
├── configs/               # Hydra configuration files
├── data/                  # DVC-versioned datasets & artifacts
│   ├── raw/
│   ├── processed/
│   ├── features/
│   ├── splits/
│   ├── models/
│   └── predictions/
├── docs/                  # MkDocs documentation
├── reports/               # Quarto reports (EDA, evaluation) — not production
├── tests/                 # pytest test suite
├── docker/                # Dockerfiles per service
├── k8s/                   # Kubernetes manifests / Helm charts
├── dvc.yaml               # DVC pipeline definition
├── params.yaml            # DVC / Hydra parameters
└── pyproject.toml         # Project metadata and tool configuration

src/ Layer Rules

Layer           | Allowed                               | Forbidden
----------------|---------------------------------------|-------------------------------
src/data/       | DB access, MinIO, schema validation   | ML logic, feature code
src/features/   | Pure feature transforms               | IO, model calls
src/models/     | Model classes, metrics, losses        | IO of any kind
src/pipelines/  | Orchestration, CLI entrypoints        | Business logic
src/app/        | FastAPI routers, dependency injection | Training, feature engineering
src/app/tasks/  | Celery task definitions               | Inline ML logic
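
To illustrate how these rules divide responsibilities, here is a minimal sketch. Every name in it (MatchRow, load_matches, goal_difference, MeanBaseline, run) is hypothetical and not taken from the repository, and the in-memory CSV stands in for real DB or MinIO access. In the actual layout each section would be its own module under src/.

```python
"""Hypothetical sketch of the layer boundaries, collapsed into one file."""
import csv
import io
from dataclasses import dataclass
from typing import Iterable


# --- src/data/ (assumed): IO and schema handling, no ML logic ---
@dataclass(frozen=True)
class MatchRow:
    home_goals: int
    away_goals: int


def load_matches(handle: Iterable[str]) -> list[MatchRow]:
    """Parse CSV lines into typed rows; the only function that touches IO."""
    reader = csv.DictReader(handle)
    return [MatchRow(int(r["home_goals"]), int(r["away_goals"])) for r in reader]


# --- src/features/ (assumed): pure, deterministic transforms ---
def goal_difference(rows: list[MatchRow]) -> list[int]:
    return [r.home_goals - r.away_goals for r in rows]


# --- src/models/ (assumed): model logic only, no IO ---
class MeanBaseline:
    def fit(self, values: list[int]) -> "MeanBaseline":
        self.mean_ = sum(values) / len(values)
        return self

    def predict(self, n: int) -> list[float]:
        return [self.mean_] * n


# --- src/pipelines/ (assumed): wires the layers together, no business logic ---
def run(handle: Iterable[str]) -> list[float]:
    rows = load_matches(handle)
    features = goal_difference(rows)
    return MeanBaseline().fit(features).predict(len(rows))


if __name__ == "__main__":
    sample = io.StringIO("home_goals,away_goals\n2,1\n0,3\n1,1\n")
    print(run(sample))
```

Because goal_difference and MeanBaseline never open files or connections, they can be covered by fast unit tests, while load_matches is the only piece that would need an integration test.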

Naming Conventions

  • Files: snake_case.py
  • Classes: PascalCase
  • Functions / variables: snake_case
  • Constants: UPPER_SNAKE_CASE
  • Hydra configs: snake_case.yaml
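
A short sketch of the Python conventions applied together. The module path, class, and constant below are invented for illustration and do not refer to real files in the repository.

```python
# Hypothetical module: src/features/rolling_form.py  (file name: snake_case.py)

DEFAULT_WINDOW_SIZE = 5  # constant: UPPER_SNAKE_CASE


class RollingFormCalculator:  # class: PascalCase
    def __init__(self, window_size: int = DEFAULT_WINDOW_SIZE) -> None:
        self.window_size = window_size  # attribute: snake_case

    def mean_points(self, points_per_match: list[int]) -> float:  # method: snake_case
        if not points_per_match:
            raise ValueError("points_per_match must not be empty")
        # Keep only the most recent `window_size` matches.
        recent_points = points_per_match[-self.window_size:]
        return sum(recent_points) / len(recent_points)
```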

Configuration

  • All parameters live in params.yaml (DVC) and configs/ (Hydra)
  • No hardcoded paths, seeds, or credentials anywhere in src/
  • Secrets are managed with SOPS + age (see Security)
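
One way to keep paths and seeds out of src/ is to pass them in as a typed config object. The sketch below assumes the parameters have already been parsed into a mapping (for example from params.yaml or a Hydra config). TrainConfig, its field names, and the "seed" / "paths" keys are hypothetical and do not reflect the project's real schema.

```python
import random
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Mapping


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical config object; field names are illustrative only."""

    seed: int
    features_dir: Path
    model_dir: Path

    @classmethod
    def from_mapping(cls, params: Mapping[str, Any]) -> "TrainConfig":
        # A missing key raises KeyError instead of silently falling back
        # to a hardcoded default.
        return cls(
            seed=int(params["seed"]),
            features_dir=Path(params["paths"]["features"]),
            model_dir=Path(params["paths"]["models"]),
        )


def make_rng(config: TrainConfig) -> random.Random:
    # The seed comes from configuration, never from a literal in the code.
    return random.Random(config.seed)
```

A caller would build the object once at the pipeline entrypoint, e.g. `TrainConfig.from_mapping(loaded_params)`, and hand it down to the code that needs it.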

Tests

tests/
├── unit/          # Pure function tests (fast, no IO)
├── integration/   # Tests requiring DB or external services
└── conftest.py    # Shared fixtures

Run all tests: pytest tests/
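
As a rough picture of what belongs in tests/unit/, here is a sketch of a pure-function test. The file name and the points_for_result function are hypothetical. The function is defined inline only to keep the example self-contained; in the repository it would live under src/ and be imported by the test.

```python
# Hypothetical file: tests/unit/test_points.py


def points_for_result(goals_for: int, goals_against: int) -> int:
    """Standard league scoring: 3 for a win, 1 for a draw, 0 for a loss."""
    if goals_for > goals_against:
        return 3
    if goals_for == goals_against:
        return 1
    return 0


def test_points_for_result_covers_all_outcomes() -> None:
    # No DB, network, or filesystem access: safe for the fast unit suite.
    assert points_for_result(2, 1) == 3
    assert points_for_result(1, 1) == 1
    assert points_for_result(0, 2) == 0
```

Passing a directory restricts the run to that suite, for example: pytest tests/unit/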

See Testing Strategy for full policy.