Model Rollback & Recovery
This runbook describes how to recover from model-related incidents.
When to roll back a model
- a detected performance regression,
- data or prediction drift alerts,
- increased error rates or latency,
- unexpected shifts in business metrics.
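The technical triggers above can be expressed as a threshold check that compares the current model against a baseline window from the previous version. This is an illustrative sketch only: the `HealthSnapshot` fields, the `rollback_reasons` function, and every threshold default are hypothetical and should be replaced with the metrics and limits your team actually alerts on.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HealthSnapshot:
    """Metrics for one model version over a comparable time window (hypothetical schema)."""
    accuracy: float        # quality metric; higher is better
    error_rate: float      # fraction of failed requests, 0.0 to 1.0
    p95_latency_ms: float  # 95th-percentile latency in milliseconds


def rollback_reasons(baseline: HealthSnapshot, current: HealthSnapshot,
                     drift_alert: bool = False,
                     max_accuracy_drop: float = 0.02,
                     max_error_rate_increase: float = 0.01,
                     max_latency_ratio: float = 1.5) -> list[str]:
    """Return the triggers that fired; an empty list means no rollback is indicated.

    All threshold defaults are illustrative, not recommended values.
    """
    reasons = []
    if baseline.accuracy - current.accuracy > max_accuracy_drop:
        reasons.append("performance regression")
    if drift_alert:
        reasons.append("data or prediction drift")
    if current.error_rate - baseline.error_rate > max_error_rate_increase:
        reasons.append("increased error rate")
    if current.p95_latency_ms > baseline.p95_latency_ms * max_latency_ratio:
        reasons.append("increased latency")
    return reasons
```

Unexpected shifts in business metrics are deliberately left out, since they usually need human judgment rather than a fixed threshold.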
Rollback strategy
Model rollback is performed through the MLflow Model Registry.
Steps:
1. Identify the last known good model version.
2. Point the model alias or stage (e.g., Production) back to that version.
3. Redeploy the serving service if required, for example when it resolves the alias or stage only at startup.
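Steps 1 and 2 can be scripted against the registry client. This is a minimal sketch assuming MLflow 2.x with model aliases, a hypothetical `production` alias, and a hypothetical team convention of tagging each version that passed validation with `validated=true`; "last known good" is taken to mean the newest such version older than the one currently served. The client is passed in as a parameter, so in practice it would be `mlflow.MlflowClient()`, and in tests a stub.

```python
def pick_last_known_good(versions, current_version, tag_key="validated"):
    """Return the newest version older than `current_version` tagged `tag_key=true`.

    `versions` is any iterable of objects with a `.version` string and a `.tags`
    dict, such as the ModelVersion entities from MlflowClient.search_model_versions.
    """
    current = int(current_version)
    candidates = [
        v for v in versions
        if int(v.version) < current and v.tags.get(tag_key) == "true"
    ]
    if not candidates:
        raise LookupError(f"no version older than {current} is tagged {tag_key}=true")
    return max(candidates, key=lambda v: int(v.version))


def rollback_alias(client, model_name, alias="production", tag_key="validated"):
    """Repoint `alias` from the currently served version to the last known good one.

    `client` is expected to behave like mlflow.MlflowClient. The alias name and the
    `validated` tag are hypothetical conventions. Returns (previous, restored).
    """
    current = client.get_model_version_by_alias(model_name, alias)      # what is live now
    versions = client.search_model_versions(f"name='{model_name}'")
    good = pick_last_known_good(versions, current.version, tag_key)      # step 1
    client.set_registered_model_alias(model_name, alias, good.version)   # step 2
    return current.version, good.version
```

A call would look like `rollback_alias(MlflowClient(), "my-model")`, where `my-model` is a placeholder; check that your tracking server returns version tags from `search_model_versions`. If the deployment still uses stages, `client.transition_model_version_stage(name, version, "Production", archive_existing_versions=True)` is the stage-based equivalent of the alias update, although stages are deprecated in newer MLflow 2.x releases in favour of aliases.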
No retraining is required.
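Whether step 3 is needed depends on how the serving process loads the model. Loading `models:/<name>@<alias>` with `mlflow.pyfunc.load_model` resolves the alias once, at load time, so a process that is already running keeps serving the old version until it reloads or is redeployed. The sketch below shows one hypothetical way to avoid a full redeploy: a wrapper that re-resolves the alias on demand and swaps the model in place. `AliasTrackingModel` and its two callables are illustrative names, not an MLflow API.

```python
import threading


class AliasTrackingModel:
    """Holds a loaded model and reloads it when the registry alias moves.

    resolve_version: callable returning the version the alias currently points to.
    load: callable taking a version string and returning an object with .predict().
    """

    def __init__(self, resolve_version, load):
        self._resolve_version = resolve_version
        self._load = load
        self._lock = threading.Lock()
        self.version = resolve_version()
        self.model = load(self.version)

    def refresh(self) -> bool:
        """Reload if the alias now points elsewhere; return True if a reload happened."""
        target = self._resolve_version()
        if target == self.version:
            return False
        new_model = self._load(target)  # load outside the lock so requests keep flowing
        with self._lock:
            self.model, self.version = new_model, target
        return True

    def predict(self, data):
        with self._lock:
            model = self.model  # take a consistent reference, then predict unlocked
        return model.predict(data)
```

With MLflow, `resolve_version` could be `lambda: client.get_model_version_by_alias(name, "production").version` and `load` could be `lambda v: mlflow.pyfunc.load_model(f"models:/{name}/{v}")`. Calling `refresh()` from a periodic job or an admin endpoint would then pick up a rollback without restarting the service.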
Verification
After rollback:
- verify the active model version in logs and dashboards,
- monitor latency and error rate,
- confirm that the triggering alerts have resolved.
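The verification checklist can be collected into one function that a script or pipeline job runs after the alias update. This is a sketch: `verify_rollback`, its threshold defaults, and the idea of passing in the names of still-firing alerts are assumptions; the inputs would come from whatever logging, metrics, and alerting systems you use.

```python
def verify_rollback(expected_version, served_version, error_rate, p95_latency_ms,
                    open_alerts, max_error_rate=0.01, max_p95_latency_ms=300.0):
    """Return a list of failed checks; an empty list means the rollback looks healthy.

    Threshold defaults are illustrative placeholders, not recommended values.
    """
    failures = []
    # Compare as strings because registries and logs may report versions as str or int.
    if str(served_version) != str(expected_version):
        failures.append(f"serving version {served_version}, expected {expected_version}")
    if error_rate > max_error_rate:
        failures.append(f"error rate {error_rate:.2%} above {max_error_rate:.2%}")
    if p95_latency_ms > max_p95_latency_ms:
        failures.append(f"p95 latency {p95_latency_ms:.0f} ms above {max_p95_latency_ms:.0f} ms")
    failures.extend(f"alert still firing: {name}" for name in open_alerts)
    return failures
```

Because the checklist says to monitor rather than check once, this would be run repeatedly over an observation window, treating the rollback as verified only while it keeps returning an empty list.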
Escalation
If rollback does not resolve the issue:
- disable the affected inference endpoints,
- fall back to degraded mode, if available,
- escalate for root-cause investigation.
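Disabling an endpoint and falling back to degraded mode can be handled by a small wrapper around the model. The sketch assumes a hypothetical `fallback` callable, for example one returning a cached or rule-based answer; when no degraded mode exists, the disabled endpoint raises a hypothetical `EndpointDisabled` error that the serving layer would typically map to an HTTP 503.

```python
class EndpointDisabled(RuntimeError):
    """Raised when the endpoint is disabled and no degraded mode is configured."""


class DegradableEndpoint:
    """Wraps a model so operators can switch it off and serve a fallback instead."""

    def __init__(self, model, fallback=None):
        self.model = model
        self.fallback = fallback  # hypothetical callable: input -> degraded response
        self.enabled = True

    def disable(self):
        self.enabled = False

    def predict(self, data):
        if self.enabled:
            return self.model.predict(data)
        if self.fallback is None:
            raise EndpointDisabled("inference endpoint disabled; no degraded mode available")
        return self.fallback(data)
```

Keeping the switch in the serving code means the same deployment can be re-enabled once the investigation concludes, without rebuilding or redeploying it.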