Configuration Reference (Hydra)

Time2Bet uses Hydra to manage configuration for:

- data paths and dataset selection,
- feature and training parameters,
- evaluation behavior,
- model registration settings,
- environment-dependent overrides.

Hydra configuration is treated as part of the reproducibility story: every run logs its full resolved config.

Config structure (recommended)

The Hydra configuration directory is typically laid out as:

configs/
  config.yaml   (root/defaults)
  data/         (dataset sources, versions, filters)
  features/     (feature flags, windows)
  model/        (model type + hyperparams)
  train/        (splits, seeds, CV strategy)
  eval/         (metrics, reports)
  registry/     (MLflow model name, aliases)
  env/          (dev/staging/prod overrides)
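
In Hydra, each subdirectory of the config root acts as a config group, and each YAML file inside it is a selectable option (for example, `model=xgboost` selects `model/xgboost.yaml`). As a minimal sketch of that mapping, the stdlib-only helper below lists the groups and options found under a config directory. The name `list_config_groups` is hypothetical and not part of Time2Bet, and the sketch ignores nested groups.

```python
from __future__ import annotations

from pathlib import Path


def list_config_groups(config_root: str) -> dict[str, list[str]]:
    """Map each config group (subdirectory) to its options (YAML file stems)."""
    root = Path(config_root)
    groups: dict[str, list[str]] = {}
    # Only direct subdirectories are treated as groups; root-level files
    # such as config.yaml are skipped.
    for group_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        groups[group_dir.name] = sorted(f.stem for f in group_dir.glob("*.yaml"))
    return groups
```

Calling `list_config_groups("configs")` from the project root would return one entry per group directory, which can be handy for checking that an override such as `env=prod` has a matching file.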

Key principles

- Defaults define a safe baseline.
- Overrides are explicit and traceable.
- Environment-specific config is isolated (no hidden branching logic).

Common usage patterns

Run with defaults

```bash
python -m src.pipelines.train
```

Override model type

```bash
python -m src.pipelines.train model=xgboost
```

Override train seed and split date

```bash
python -m src.pipelines.train train.seed=42 train.cutoff_date=2026-01-01
```

Environment override

```bash
python -m src.pipelines.train env=prod
```
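
To make the effect of dotted value overrides such as `train.seed=42` concrete, here is a rough stdlib-only sketch of the merge semantics. It is an illustration, not Hydra's implementation: the helper names and baseline values are hypothetical, value parsing is deliberately naive, and group selections such as `model=xgboost` or `env=prod` (which swap in a whole config file) are not modelled.

```python
import copy


def parse_value(raw: str):
    """Naively parse an override value: int, then float, else keep the string."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def apply_overrides(defaults: dict, overrides: list) -> dict:
    """Return a copy of `defaults` with dotted `key.path=value` overrides applied."""
    resolved = copy.deepcopy(defaults)  # the baseline itself is never mutated
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = resolved
        for name in parents:
            node = node[name]  # KeyError if the path is absent from the defaults
        if leaf not in node:
            raise KeyError(f"unknown config key: {dotted_key}")
        node[leaf] = parse_value(raw_value)
    return resolved


# Hypothetical baseline values, for illustration only.
defaults = {"train": {"seed": 0, "cutoff_date": "2025-01-01"}}
resolved = apply_overrides(defaults, ["train.seed=42", "train.cutoff_date=2026-01-01"])
print(resolved)
```

Rejecting unknown keys mirrors the principle that overrides should be explicit: a typo in a key name fails loudly instead of silently adding a new setting.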

What gets logged to MLflow

Each run logs:

- resolved Hydra config (as YAML artifact)
- git commit hash
- DVC dataset revision
- parameters and metrics

Together, these make it possible to trace any run back to the exact code, data, and configuration that produced it.
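
As a hedged sketch of how such metadata might be gathered before it is handed to MLflow, the stdlib-only helpers below flatten a nested config into dotted parameter names and read the current git commit. Both function names are hypothetical; the source does not show how Time2Bet actually collects or logs these values, and the DVC revision lookup is left out because its mechanism is not described.

```python
import subprocess
from typing import Optional


def flatten_config(cfg: dict, prefix: str = "") -> dict:
    """Flatten a nested config into dotted keys, e.g. train.seed."""
    flat = {}
    for key, value in cfg.items():
        dotted = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_config(value, dotted))  # recurse into sub-sections
        else:
            flat[dotted] = value
    return flat


def current_git_commit() -> Optional[str]:
    """Return the current commit hash, or None if git or a repository is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()
```

A flat dictionary like this is the shape that `mlflow.log_params` accepts, and the commit hash could be attached to the run with `mlflow.set_tag`; whether the pipeline uses these exact calls is an assumption.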

ML → Training Pipeline
ML → Model Registry
Data → Dataset Versioning