
Hyperparameter Tuning

Purpose

Document why tuning is part of this system, what is tuned vs. what is fixed, how the search uses time-aware CV to prevent leakage, and how the best parameters flow into final training.


Why tuning is necessary

XGBoost's performance is sensitive to regularisation strength, tree depth, and learning rate. The correct values depend on dataset size, feature count, and class imbalance — all of which vary between experiments as window sizes, feature families, and data cutoffs change. Manual defaults produce a suboptimal model; systematic search with a correct evaluation protocol produces a model that generalises better to future matches.

Tuning is constrained by the same temporal validation rules as training. A search that uses random CV or a non-temporal holdout will overfit to temporal artifacts.


What is tuned

The following XGBoost hyperparameters are searched by Optuna:

| Parameter        | Search space                 |
|------------------|------------------------------|
| n_estimators     | int, 100–600, step 50        |
| max_depth        | int, 3–8                     |
| learning_rate    | float, 1e-3–0.2 (log scale)  |
| subsample        | float, 0.5–1.0               |
| colsample_bytree | float, 0.5–1.0               |
| min_child_weight | int, 1–20                    |
| reg_alpha        | float, 1e-4–10.0 (log scale) |
| reg_lambda       | float, 1e-4–10.0 (log scale) |

The number of trials is controlled by params.yaml → tuning.n_trials (default: 20).
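As a rough illustration, the search space above maps onto Optuna's `trial.suggest_*` API roughly as follows. This is a hedged sketch, not the project's actual code: the function name `suggest_params` is illustrative, and the `trial` argument is duck-typed so the sketch only assumes the standard `optuna.Trial` interface.

```python
def suggest_params(trial) -> dict:
    """Map the documented search space onto Optuna's suggest API.

    `trial` is expected to behave like an optuna.Trial; bounds mirror
    the table in this section.
    """
    return {
        "n_estimators": trial.suggest_int("n_estimators", 100, 600, step=50),
        "max_depth": trial.suggest_int("max_depth", 3, 8),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.2, log=True),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
        "min_child_weight": trial.suggest_int("min_child_weight", 1, 20),
        "reg_alpha": trial.suggest_float("reg_alpha", 1e-4, 10.0, log=True),
        "reg_lambda": trial.suggest_float("reg_lambda", 1e-4, 10.0, log=True),
    }
```

Log-scale suggestions for `learning_rate`, `reg_alpha`, and `reg_lambda` let the sampler spend trials evenly across orders of magnitude rather than clustering near the upper bound.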

What is fixed (not tuned)

  • Feature families and window sizes — controlled by features.* in params.yaml
  • Categorical feature handling — cat_cols passed as a fixed column list
  • Class weighting — fixed in the training wrapper to address imbalance
  • Preprocessing (imputation strategy) — fixed as median imputation
  • Random seed — fixed at 42 for reproducibility across all trials

Tuning protocol: time-aware CV

Each Optuna trial is evaluated using the same walk-forward CV folds produced by the split_data stage (data/splits/folds.parquet). For each fold:

  1. XGBoost is fit on the training window.
  2. Log-loss is computed on the validation window.
  3. Mean log-loss across all folds is returned as the objective value.

This is the same evaluation protocol used in classification_models. Using any other evaluation strategy (e.g., random holdout) during tuning would leak temporal information and invalidate the search results.


MLflow logging

Each Optuna trial is logged as a nested MLflow run under a parent run named xgb_tuning_frac-{frac}. Each trial child run records:

  • XGBoost parameters for that trial (xgb.*)
  • Mean CV log-loss (cv.logloss_mean)
  • Trial number

The parent run records the best parameters (best.*) and the best CV log-loss at the end of the study. This makes the full trial history visible in the MLflow UI alongside standard training runs.


How best parameters flow into final training

After tuning completes, best parameters are written to data/models/xgb_best_params.json (DVC-tracked). The final_train DVC stage reads this file and applies the parameters to the full training run:

[tune_xgb] → data/models/xgb_best_params.json → [final_train]

If xgb_best_params.json is not present or empty, final_train falls back to default hyperparameters and logs a warning. This allows final_train to run even when tune_xgb is skipped.
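The fallback can be sketched as below (the file path is from this document; the function name and the default values are illustrative — the real fallback defaults live in the training code):

```python
import json
import logging
from pathlib import Path

log = logging.getLogger(__name__)

# Illustrative defaults only; not the project's actual fallback values.
DEFAULT_XGB_PARAMS = {"n_estimators": 300, "max_depth": 6, "learning_rate": 0.1}

def load_best_params(path: str = "data/models/xgb_best_params.json") -> dict:
    """Read tuned parameters, falling back to defaults with a warning
    when the file is absent or empty."""
    p = Path(path)
    if not p.exists() or p.stat().st_size == 0:
        log.warning("%s missing or empty; using default hyperparameters", path)
        return dict(DEFAULT_XGB_PARAMS)
    return json.loads(p.read_text())
```

Returning a copy of the defaults (rather than the module-level dict) keeps callers from mutating shared state.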


Reproducibility

  • Optuna TPESampler is seeded with seed=42 (hardcoded in the study).
  • n_trials and frac are tracked as DVC parameters — changing either triggers a re-run.
  • Best parameters are DVC-tracked artifacts — downstream stages re-run if they change.

Implementation status

| Aspect | Status |
|---|---|
| Optuna study with TPE sampler | ✅ Implemented — src/models/tuning.py |
| Walk-forward CV objective | ✅ Implemented |
| Nested MLflow trial logging | ✅ Implemented |
| Best params to xgb_best_params.json | ✅ Implemented |
| DVC stage tune_xgb | ✅ Implemented |
| Tuning report | 📋 Planned — reports/qmd/08_hyperparameter_tuning.html |