# Hyperparameter Tuning

## Purpose
Document why tuning is part of this system, what is tuned vs. what is fixed, how the search uses time-aware CV to prevent leakage, and how the best parameters flow into final training.
## Why tuning is necessary
XGBoost's performance is sensitive to regularisation strength, tree depth, and learning rate. The correct values depend on dataset size, feature count, and class imbalance — all of which vary between experiments as window sizes, feature families, and data cutoffs change. Manual defaults produce a suboptimal model; systematic search with a correct evaluation protocol produces a model that generalises better to future matches.
Tuning is constrained by the same temporal validation rules as training. A search that uses random CV or a non-temporal holdout will overfit to temporal artifacts.
## What is tuned

The following XGBoost hyperparameters are searched by Optuna:

| Parameter | Search space |
|---|---|
| `n_estimators` | int, 100–600, step 50 |
| `max_depth` | int, 3–8 |
| `learning_rate` | float, 1e-3–0.2 (log scale) |
| `subsample` | float, 0.5–1.0 |
| `colsample_bytree` | float, 0.5–1.0 |
| `min_child_weight` | int, 1–20 |
| `reg_alpha` | float, 1e-4–10.0 (log scale) |
| `reg_lambda` | float, 1e-4–10.0 (log scale) |
The number of trials is controlled by `params.yaml` → `tuning.n_trials` (default: 20).
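As a minimal sketch, the table above maps onto Optuna's suggest API roughly as follows (the function name `suggest_xgb_params` is illustrative, not the actual `tuning.py` code):

```python
import optuna

def suggest_xgb_params(trial: optuna.Trial) -> dict:
    # Mirrors the search space table; log=True gives the log-scale ranges.
    return {
        "n_estimators": trial.suggest_int("n_estimators", 100, 600, step=50),
        "max_depth": trial.suggest_int("max_depth", 3, 8),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.2, log=True),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
        "min_child_weight": trial.suggest_int("min_child_weight", 1, 20),
        "reg_alpha": trial.suggest_float("reg_alpha", 1e-4, 10.0, log=True),
        "reg_lambda": trial.suggest_float("reg_lambda", 1e-4, 10.0, log=True),
    }
```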
## What is fixed (not tuned)

- Feature families and window sizes — controlled by `features.*` in `params.yaml`
- Categorical feature handling — `cat_cols` passed as a fixed column list
- Class weighting — fixed in the training wrapper to address imbalance
- Preprocessing (imputation strategy) — fixed as `median` imputation
- Random seed — fixed at 42 for reproducibility across all trials
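Wired together, the fixed pieces might look like the sketch below (names are illustrative, and the real wrapper's `cat_cols` handling is omitted; only `trial_params` varies between trials):

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.utils.class_weight import compute_sample_weight
from xgboost import XGBClassifier

def fit_with_fixed_settings(X_train: pd.DataFrame, y_train: pd.Series,
                            trial_params: dict) -> XGBClassifier:
    # Everything except trial_params is held constant across the study.
    X_imp = SimpleImputer(strategy="median").fit_transform(X_train)  # fixed imputation
    weights = compute_sample_weight("balanced", y_train)             # fixed class weighting
    model = XGBClassifier(**trial_params, random_state=42)           # fixed seed
    model.fit(X_imp, y_train, sample_weight=weights)
    return model
```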
## Tuning protocol: time-aware CV

Each Optuna trial is evaluated using the same walk-forward CV folds produced by the `split_data` stage (`data/splits/folds.parquet`). For each fold:
- XGBoost is fit on the training window.
- Log-loss is computed on the validation window.
- Mean log-loss across all folds is returned as the objective value.
This is the same evaluation protocol used in `classification_models`. Using any other evaluation strategy (e.g., random holdout) during tuning would leak temporal information and invalidate the search results.
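A sketch of the objective, assuming `folds.parquet` assigns each row a `fold` id and a train/valid `role` (both column names are assumptions about the split format):

```python
import numpy as np
import pandas as pd
from sklearn.metrics import log_loss
from xgboost import XGBClassifier

def cv_logloss(trial_params: dict, X: pd.DataFrame, y: pd.Series,
               folds: pd.DataFrame) -> float:
    """Mean validation log-loss across the walk-forward folds (lower is better)."""
    losses = []
    for _, fold in folds.groupby("fold"):
        train_idx = fold.index[fold["role"] == "train"]   # training window
        valid_idx = fold.index[fold["role"] == "valid"]   # validation window
        model = XGBClassifier(**trial_params, random_state=42)
        model.fit(X.loc[train_idx], y.loc[train_idx])
        proba = model.predict_proba(X.loc[valid_idx])
        losses.append(log_loss(y.loc[valid_idx], proba))
    return float(np.mean(losses))                         # Optuna objective value
```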
## MLflow logging

Each Optuna trial is logged as a nested MLflow run under a parent run named `xgb_tuning_frac-{frac}`. Each trial's child run records:
- XGBoost parameters for that trial (`xgb.*`)
- Mean CV log-loss (`cv.logloss_mean`)
- Trial number
The parent run records the best parameters (`best.*`) and the best CV log-loss at the end of the study. This makes the full trial history visible in the MLflow UI alongside standard training runs.
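The shape of the nested logging, as a sketch (the child-run names and the parent's best-metric key are illustrative, not confirmed from the source):

```python
import mlflow
import optuna

def run_tuning(objective, frac: float, n_trials: int = 20) -> optuna.Study:
    study = optuna.create_study(direction="minimize",
                                sampler=optuna.samplers.TPESampler(seed=42))
    with mlflow.start_run(run_name=f"xgb_tuning_frac-{frac}"):        # parent run
        def logged(trial: optuna.Trial) -> float:
            with mlflow.start_run(nested=True, run_name=f"trial-{trial.number}"):
                value = objective(trial)                              # runs the CV objective
                mlflow.log_params({f"xgb.{k}": v for k, v in trial.params.items()})
                mlflow.log_metric("cv.logloss_mean", value)
                mlflow.log_param("trial_number", trial.number)
                return value
        study.optimize(logged, n_trials=n_trials)
        # Parent run: best parameters and best CV log-loss at the end of the study.
        mlflow.log_params({f"best.{k}": v for k, v in study.best_params.items()})
        mlflow.log_metric("best.cv_logloss_mean", study.best_value)
    return study
```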
## How best parameters flow into final training

After tuning completes, the best parameters are written to `data/models/xgb_best_params.json` (DVC-tracked). The `final_train` DVC stage reads this file and applies the parameters to the full training run.
If `xgb_best_params.json` is not present or empty, `final_train` falls back to default hyperparameters and logs a warning. This allows `final_train` to run even when `tune_xgb` is skipped.
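A minimal sketch of that fallback behaviour (the helper name and the merge-over-defaults choice are assumptions):

```python
import json
import logging
from pathlib import Path

BEST_PARAMS_PATH = Path("data/models/xgb_best_params.json")

def load_xgb_params(defaults: dict) -> dict:
    """Return tuned parameters when available, otherwise the defaults."""
    if BEST_PARAMS_PATH.exists():
        try:
            best = json.loads(BEST_PARAMS_PATH.read_text())
        except json.JSONDecodeError:      # treat an unreadable file as empty
            best = {}
        if best:
            return {**defaults, **best}   # tuned values override the defaults
    logging.warning("xgb_best_params.json missing or empty; "
                    "falling back to default hyperparameters")
    return defaults
```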
## Reproducibility

- The Optuna `TPESampler` is seeded with `seed=42` (hardcoded in the study).
- `n_trials` and `frac` are tracked as DVC parameters — changing either triggers a re-run.
- Best parameters are DVC-tracked artifacts — downstream stages re-run if they change.
## Implementation status

| Aspect | Status |
|---|---|
| Optuna study with TPE sampler | ✅ Implemented — `src/models/tuning.py` |
| Walk-forward CV objective | ✅ Implemented |
| Nested MLflow trial logging | ✅ Implemented |
| Best params to `xgb_best_params.json` | ✅ Implemented |
| DVC stage `tune_xgb` | ✅ Implemented |
| Tuning report | 📋 Planned — `reports/qmd/08_hyperparameter_tuning.html` |
## Related

- Validation Strategy — CV fold protocol
- Training Pipeline — stage dependencies
- MLflow — trial run structure
- Evidence