Model Retraining

How to trigger, monitor, and validate a model retraining run.

Work in progress

This page is a placeholder; it will be updated after the first automated retraining cycle.

When to Retrain

Retraining should be triggered when any of the following conditions are met:

  1. Scheduled: Weekly retraining job via Airflow DAG (retrain_model_dag)
  2. Drift detected: Evidently reports feature drift above threshold (see Monitoring)
  3. Manual: New season data available or model performance drops below baseline
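The trigger conditions above can be sketched as a small decision helper. This is illustrative only: the threshold value, interval, and function name are assumptions, not taken from the actual DAG or monitoring config.

```python
from datetime import datetime, timedelta

# Illustrative values; the real ones live in the Airflow/monitoring config.
DRIFT_THRESHOLD = 0.2
RETRAIN_INTERVAL = timedelta(days=7)

def should_retrain(last_run: datetime, now: datetime,
                   drift_score: float, manual_request: bool) -> bool:
    """Return True if any retraining condition is met."""
    scheduled = now - last_run >= RETRAIN_INTERVAL  # weekly schedule
    drifted = drift_score > DRIFT_THRESHOLD         # Evidently drift report
    return scheduled or drifted or manual_request   # manual trigger
```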

Steps

1. Pull Latest Data

dvc pull data/processed/

2. Run Full Pipeline

dvc repro

This re-runs all changed stages: feature engineering → training → evaluation.
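For reference, a dvc.yaml describing such a three-stage pipeline might look like the sketch below. Stage names, scripts, and paths are illustrative and not taken from this repository.

```yaml
stages:
  features:
    cmd: python src/features.py
    deps: [data/processed/]
    outs: [data/features/]
  train:
    cmd: python src/train.py
    deps: [data/features/]
    outs: [models/model.pkl]
  evaluate:
    cmd: python src/evaluate.py
    deps: [models/model.pkl]
    metrics: [metrics.json]
```

`dvc repro` only re-executes stages whose dependencies changed, so an unchanged feature set skips straight to training.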

3. Review Metrics in MLflow

mlflow ui --backend-store-uri ./mlruns

Compare the new run against the model version currently holding the champion alias.
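The comparison can also be scripted. A minimal sketch: the metric key `log_loss` and the tie-breaking rule are assumptions about this project, and the commented-out retrieval code requires a reachable tracking store.

```python
def challenger_beats_champion(new_metrics: dict, champ_metrics: dict,
                              metric: str = "log_loss") -> bool:
    """Lower log-loss wins; ties go to the challenger (trained on newer data)."""
    return new_metrics[metric] <= champ_metrics[metric]

# Fetching the metrics via the MLflow client (needs a tracking store):
# from mlflow.tracking import MlflowClient
# client = MlflowClient()
# champ = client.get_model_version_by_alias("soccer-predictor", "champion")
# champ_metrics = client.get_run(champ.run_id).data.metrics
```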

4. Promote if Criteria Pass

# Via the MLflow Python client
from mlflow.tracking import MlflowClient

client = MlflowClient()
client.set_registered_model_alias("soccer-predictor", "champion", version=<new_version>)

Promotion rules are defined in Model Registry & Promotion Rules.

5. Redeploy API

After promotion, restart the API pod to load the new model:

kubectl rollout restart deployment/soccer-api -n soccer
kubectl rollout status deployment/soccer-api -n soccer

Validation Checklist

  • [ ] dvc repro completes without errors
  • [ ] New model log-loss ≤ current champion
  • [ ] No data leakage detected (temporal split audit)
  • [ ] API /ready returns 200 after pod restart
  • [ ] Smoke test prediction returns valid probabilities
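The last two checklist items can be automated with a small smoke test. The probability check below is self-contained; the commented-out HTTP calls assume the `/ready` and `/predict` paths and a `probabilities` response field, which should be verified against the actual API.

```python
import math

def valid_probabilities(probs: list[float], tol: float = 1e-6) -> bool:
    """Valid if every probability is in [0, 1] and they sum to 1."""
    return (all(0.0 <= p <= 1.0 for p in probs)
            and math.isclose(sum(probs), 1.0, abs_tol=tol))

# Smoke test against the live service (endpoints are assumptions):
# import requests
# assert requests.get("http://soccer-api/ready", timeout=5).status_code == 200
# resp = requests.post("http://soccer-api/predict", json=payload, timeout=5)
# assert valid_probabilities(resp.json()["probabilities"])
```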

Automated Retraining

Automated retraining via Airflow is planned. See Airflow DAGs.