Model Retraining¶
How to trigger, monitor, and validate a model retraining run.
Work in progress
This page is a placeholder; it will be updated after the first automated retraining cycle.
When to Retrain¶
Retraining should be triggered when any of the following conditions are met:
- Scheduled: Weekly retraining job via the Airflow DAG (`retrain_model_dag`)
- Drift detected: Evidently reports feature drift above the configured threshold (see Monitoring)
- Manual: New season data is available, or model performance drops below baseline
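The trigger conditions above can be expressed as a single predicate. A minimal sketch, assuming a weekly schedule and an illustrative drift-share threshold of 0.3 (the actual threshold lives in the Monitoring configuration, not here):

```python
from datetime import date, timedelta

DRIFT_SHARE_THRESHOLD = 0.3   # assumption: fraction of drifted features that triggers retraining
MAX_AGE = timedelta(days=7)   # weekly schedule

def should_retrain(last_trained, today, drift_share, logloss, baseline_logloss, manual=False):
    """Return True if any of the retraining conditions is met."""
    return (
        today - last_trained >= MAX_AGE          # scheduled: weekly job
        or drift_share > DRIFT_SHARE_THRESHOLD   # drift detected by Evidently
        or logloss > baseline_logloss            # performance below baseline
        or manual                                # e.g. new season data available
    )
```

In the Airflow DAG this decision would typically gate the downstream training tasks.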
Steps¶
1. Pull Latest Data¶
2. Run Full Pipeline¶
This re-runs all changed stages: feature engineering → training → evaluation.
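Steps 1–2 can be wrapped in a small driver script. A sketch, assuming the data is synced with `dvc pull` (an assumption; adjust to however this project fetches the latest data):

```python
import subprocess

# Commands for steps 1-2. `dvc pull` syncing data from the DVC remote is an
# assumption; `dvc repro` re-runs all changed stages as described above.
RETRAIN_COMMANDS = [
    ["dvc", "pull"],
    ["dvc", "repro"],
]

def retrain(runner=subprocess.run):
    """Run each pipeline command in order, stopping on the first failure."""
    for cmd in RETRAIN_COMMANDS:
        runner(cmd, check=True)
```

The injectable `runner` keeps the wrapper testable without touching the real pipeline.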
3. Review Metrics in MLflow¶
Compare the new run against the current champion alias.
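The comparison can be sketched as a pure function, assuming both runs' metrics have already been fetched from MLflow (e.g. via `client.get_run(run_id).data.metrics`); only log-loss is named on this page, and the `accuracy` entry is an illustrative assumption:

```python
# Direction per metric: True means lower is better.
LOWER_IS_BETTER = {"log_loss": True, "accuracy": False}

def passes_promotion(candidate: dict, champion: dict) -> bool:
    """Candidate must match or beat the current champion on every tracked metric."""
    for name, lower in LOWER_IS_BETTER.items():
        c, ch = candidate[name], champion[name]
        if not (c <= ch if lower else c >= ch):
            return False
    return True
```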
4. Promote if Criteria Pass¶
```python
# Via the MLflow Python client
from mlflow import MlflowClient

client = MlflowClient()
# Point the "champion" alias at the newly validated version
client.set_registered_model_alias("soccer-predictor", "champion", version=<new_version>)
```
Promotion rules are defined in Model Registry & Promotion Rules.
5. Redeploy API¶
After promotion, restart the API pod to load the new model:
```shell
kubectl rollout restart deployment/soccer-api -n soccer
kubectl rollout status deployment/soccer-api -n soccer
```
Validation Checklist¶
- [ ] `dvc repro` completes without errors
- [ ] New model log-loss ≤ current champion
- [ ] No data leakage detected (temporal split audit)
- [ ] API `/ready` returns 200 after pod restart
- [ ] Smoke test prediction returns valid probabilities
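The last checklist item can be automated with a small validator. A sketch, assuming the model emits three outcome probabilities (home/draw/away, an assumption about the output shape):

```python
import math

def valid_probabilities(probs, n_outcomes=3, tol=1e-6):
    """Smoke-test check: correct length, each value in [0, 1], sums to ~1."""
    return (
        len(probs) == n_outcomes
        and all(0.0 <= p <= 1.0 for p in probs)
        and math.isclose(sum(probs), 1.0, abs_tol=tol)
    )
```

Run it against one known fixture after every redeploy to catch a model that loads but predicts garbage.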
Automated Retraining¶
Automated retraining via Airflow is planned. See Airflow DAGs.