
Hyperparameter Tuning

Purpose

Document why tuning is part of this system, what is tuned vs. what is fixed, how the search uses time-aware CV to prevent leakage, and how the best parameters flow into final training.


Why tuning is necessary

XGBoost's performance is sensitive to regularisation strength, tree depth, and learning rate. The correct values depend on dataset size, feature count, and class imbalance — all of which vary between experiments as window sizes, feature families, and data cutoffs change. Manual defaults produce a suboptimal model; systematic search with a correct evaluation protocol produces a model that generalises better to future matches.

Tuning is constrained by the same temporal validation rules as training. A search that uses random CV or a non-temporal holdout will overfit to temporal artifacts.


What is tuned

The following XGBoost hyperparameters are searched by Optuna:

| Parameter        | Search space                 |
|------------------|------------------------------|
| n_estimators     | int, 100–600, step 50        |
| max_depth        | int, 3–8                     |
| learning_rate    | float, 1e-3–0.2 (log scale)  |
| subsample        | float, 0.5–1.0               |
| colsample_bytree | float, 0.5–1.0               |
| min_child_weight | int, 1–20                    |
| reg_alpha        | float, 1e-4–10.0 (log scale) |
| reg_lambda       | float, 1e-4–10.0 (log scale) |

The number of trials is controlled by params.yaml → tuning.n_trials (default: 20).
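As a rough illustration, the search space above maps onto Optuna's `trial.suggest_*` API roughly as follows. This is a hedged sketch, not the project's actual code: the function name `suggest_params` is illustrative, and the `trial` argument is duck-typed so the sketch only assumes the standard `optuna.Trial` interface.

```python
def suggest_params(trial) -> dict:
    """Map the documented search space onto Optuna's suggest API.

    `trial` is expected to behave like an optuna.Trial; bounds mirror
    the table in this section.
    """
    return {
        "n_estimators": trial.suggest_int("n_estimators", 100, 600, step=50),
        "max_depth": trial.suggest_int("max_depth", 3, 8),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.2, log=True),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
        "min_child_weight": trial.suggest_int("min_child_weight", 1, 20),
        "reg_alpha": trial.suggest_float("reg_alpha", 1e-4, 10.0, log=True),
        "reg_lambda": trial.suggest_float("reg_lambda", 1e-4, 10.0, log=True),
    }
```

Log-scale suggestions for `learning_rate`, `reg_alpha`, and `reg_lambda` let the sampler spend trials evenly across orders of magnitude rather than clustering near the upper bound.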

What is fixed (not tuned)

  • Feature families and window sizes — controlled by features.* in params.yaml
  • Categorical feature handling — cat_cols passed as a fixed column list
  • Class weighting — fixed in the training wrapper to address imbalance
  • Preprocessing (imputation strategy) — fixed as median imputation
  • Random seed — fixed at 42 for reproducibility across all trials

Tuning protocol: time-aware CV

Each Optuna trial is evaluated using the same walk-forward CV folds produced by the split_data stage (data/splits/folds.parquet). For each fold:

  1. XGBoost is fit on the training window.
  2. Log-loss is computed on the validation window.
  3. Mean log-loss across all folds is returned as the objective value.

This is the same evaluation protocol used in classification_models. Using any other evaluation strategy (e.g., random holdout) during tuning would leak temporal information and invalidate the search results.


MLflow logging

Each Optuna trial is logged as a nested MLflow run under a parent run named xgb_tuning_frac-{frac}. Each trial child run records:

  • XGBoost parameters for that trial (xgb.*)
  • Mean CV log-loss (cv.logloss_mean)
  • Trial number

The parent run records the best parameters (best.*) and the best CV log-loss at the end of the study. This makes the full trial history visible in the MLflow UI alongside standard training runs.


How best parameters flow into final training

After tuning completes, best parameters are written to data/models/xgb_best_params.json (DVC-tracked). The final_train DVC stage reads this file and applies the parameters to the full training run:

[tune_xgb] → data/models/xgb_best_params.json → [final_train]

If xgb_best_params.json is not present or empty, final_train falls back to default hyperparameters and logs a warning. This allows final_train to run even when tune_xgb is skipped.
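The fallback can be sketched as below (the file path is from this document; the function name and the default values are illustrative — the real fallback defaults live in the training code):

```python
import json
import logging
from pathlib import Path

log = logging.getLogger(__name__)

# Illustrative defaults only; not the project's actual fallback values.
DEFAULT_XGB_PARAMS = {"n_estimators": 300, "max_depth": 6, "learning_rate": 0.1}

def load_best_params(path: str = "data/models/xgb_best_params.json") -> dict:
    """Read tuned parameters, falling back to defaults with a warning
    when the file is absent or empty."""
    p = Path(path)
    if not p.exists() or p.stat().st_size == 0:
        log.warning("%s missing or empty; using default hyperparameters", path)
        return dict(DEFAULT_XGB_PARAMS)
    return json.loads(p.read_text())
```

Returning a copy of the defaults (rather than the module-level dict) keeps callers from mutating shared state.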


Reproducibility

  • Optuna TPESampler is seeded with seed=42 (hardcoded in the study).
  • n_trials and frac are tracked as DVC parameters — changing either triggers a re-run.
  • Best parameters are DVC-tracked artifacts — downstream stages re-run if they change.

Implementation status

| Aspect | Status |
|---|---|
| Optuna study with TPE sampler | ✅ Implemented — src/models/tuning.py |
| Walk-forward CV objective | ✅ Implemented |
| Nested MLflow trial logging | ✅ Implemented |
| Best params to xgb_best_params.json | ✅ Implemented |
| DVC stage tune_xgb | ✅ Implemented |
| Tuning report | 📋 Planned — reports/qmd/08_hyperparameter_tuning.html |