Model Promotion Policy
Status: ✅ Implemented. `candidate` promotion is automated with a quality gate; `champion` promotion requires manual sign-off (see rationale below).
Alias scheme
| Alias | Assigned by | Gate | CI job |
|---|---|---|---|
| `ci-smoke` | `register_model` (toy data, `frac=0.001`) | None | `train:smoke` |
| `smoke` | `register_model` (real data) | None | `train:test` |
| `candidate` | `promote_model` DVC stage (automatic) | `final.logloss ≤ current_candidate + 0.005` | `train:test` |
| `champion` | Developer / scheduled DAG (manual) | Hard gates + manual checklist (below) | None |
ci-smoke is a CI wiring check only — toy model, never used by serving.
smoke is the lifecycle entry point for real models.
See Model Registry for full lifecycle description.
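
As an illustration of how the four aliases might be addressed in code, the sketch below builds MLflow registry URIs of the form `models:/<name>@<alias>`. The helper `model_uri` and the constant names are hypothetical; the model name is taken from the promotion snippet later on this page, and how the serving layer actually resolves its model is not specified here.

```python
MODEL_NAME = "soccer-match-outcome"
# Aliases from the table above, in lifecycle order.
LIFECYCLE_ALIASES = ("ci-smoke", "smoke", "candidate", "champion")


def model_uri(alias: str, name: str = MODEL_NAME) -> str:
    """Build an MLflow registry URI of the form models:/<name>@<alias>."""
    if alias not in LIFECYCLE_ALIASES:
        raise ValueError(f"unknown alias: {alias!r}")
    return f"models:/{name}@{alias}"
```

For example, `model_uri("champion")` yields `models:/soccer-match-outcome@champion`, which could then be passed to an MLflow model loader.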
Automated candidate gate (promote_model stage)
Runs automatically after register_model as a DVC stage.
Params in params.yaml under promote_model:
| Parameter | Value | Meaning |
|---|---|---|
| `metric` | `final.logloss` | Metric to compare |
| `tolerance` | `0.005` | Max allowed regression vs current candidate |
| `candidate_alias` | `candidate` | Alias to assign on pass |
Gate failure is not a pipeline error — data/models/promoted_model.json records
promoted: false and the candidate alias is left unchanged.
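
The gate logic described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual `promote_model` stage: the function names are hypothetical, the behaviour when no candidate exists yet is an assumption, and only the `promoted` key of the JSON record is taken from the text.

```python
import json
from pathlib import Path
from typing import Optional


def passes_candidate_gate(new_logloss: float,
                          current_logloss: Optional[float],
                          tolerance: float = 0.005) -> bool:
    """Return True if the new model may take the candidate alias.

    Lower log-loss is better, so the new model passes when it is at most
    `tolerance` worse than the current candidate.
    """
    if current_logloss is None:
        # Assumption: with no existing candidate, the first model passes.
        return True
    return new_logloss <= current_logloss + tolerance


def write_promotion_record(path: Path, promoted: bool, new_logloss: float) -> None:
    """Record the gate outcome instead of raising, so the pipeline still succeeds."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Only the `promoted` key comes from the text; `final.logloss` is illustrative.
    record = {"promoted": promoted, "final.logloss": new_logloss}
    path.write_text(json.dumps(record, indent=2))
```

With the documented tolerance of 0.005, a new model at 0.954 passes against a candidate at 0.950, while a model at 0.960 fails and the alias stays where it is.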
Hard gates for champion promotion
All three gates must be met before manual review is considered.
| Gate | Threshold | Rationale |
|---|---|---|
| Log-loss ≤ logreg_full baseline | log-loss must not exceed logistic regression on full features | Ensures the proposed champion is no worse than the simplest meaningful baseline |
| Brier ≤ current champion − 0.002 | challenger Brier at least 0.002 lower than champion | Forces a meaningful improvement, not just noise |
| ECE ≤ 0.05 | calibration error ≤ 5% | Ensures probabilities are usable for downstream betting simulation and API consumers |
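
The three hard gates can be expressed as one check per row of the table. The sketch below is illustrative only: the `Metrics` class, the function name, and the result keys are hypothetical, and the thresholds are the ones listed above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Metrics:
    logloss: float
    brier: float
    ece: float


def check_hard_gates(challenger: Metrics,
                     champion: Metrics,
                     logreg_full_logloss: float,
                     min_brier_gain: float = 0.002,
                     max_ece: float = 0.05) -> Dict[str, bool]:
    """Evaluate each hard gate; all three must be True before manual review."""
    return {
        "logloss_vs_logreg_full": challenger.logloss <= logreg_full_logloss,
        "brier_vs_champion": challenger.brier <= champion.brier - min_brier_gain,
        "ece": challenger.ece <= max_ece,
    }
```

Returning a per-gate dictionary rather than a single boolean makes it easy to report which gate blocked a promotion; `all(results.values())` gives the overall verdict.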
How to check gates
Key metrics: final.logloss, final.brier, final.ece.
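
For readers unfamiliar with these three metrics, here is a dependency-free sketch of how they are commonly computed from predicted class probabilities. The exact definitions used by this project's evaluation code are not stated on this page, so the choices below are assumptions: the Brier score sums squared errors over classes, and ECE is the top-label variant with ten equal-width confidence bins.

```python
import math
from typing import Sequence


def log_loss(probs: Sequence[Sequence[float]], labels: Sequence[int],
             eps: float = 1e-15) -> float:
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)


def brier_score(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Squared error against the one-hot label, summed over classes, averaged over samples."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(labels)


def expected_calibration_error(probs: Sequence[Sequence[float]], labels: Sequence[int],
                               n_bins: int = 10) -> float:
    """Top-label ECE: weighted mean gap between confidence and accuracy per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        conf = max(p)
        pred = list(p).index(conf)
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, 1.0 if pred == y else 0.0))
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(a for _, a in b) / len(b)
            ece += len(b) / len(labels) * abs(accuracy - avg_conf)
    return ece
```

Because a single aggregate ECE can hide miscalibration in one outcome class, the per-class calibration curves in the manual checklist remain necessary even when this number is below 0.05.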
Baseline reference values
Baseline metrics (from matches_clf_v1 experiment after Stage 1 prod run):
| Model | Log-loss | Brier | ECE |
|---|---|---|---|
| Marginal (class prior) | TBD (fill after prod run) | TBD | TBD |
| Elo-only (logreg) | TBD | TBD | TBD |
| LogReg full features | TBD | TBD | TBD |
| XGBoost (tuned, champion) | TBD | TBD | TBD |
Fill these values after running `dvc exp run -S 'experiment=prod'`.
Manual review checklist (before champion)
- [ ] All hard gates pass (logloss, brier, ECE)
- [ ] Calibration curve artifact (`calibration_curves.png`): no severe over/under-confidence in any class
- [ ] Confusion matrix (`confusion_matrix_final.png`): no unexpected degradation on specific outcome classes
- [ ] Segment metrics (`segment_metrics.csv`): no severe performance drop in major leagues vs previous champion
- [ ] Error analysis report (`reports/error_analysis/v1/index.md`): no newly introduced systematic error pattern
- [ ] Holdout slice is strictly 2024+ (no train/test contamination)
- [ ] Calibration used temporal split (not random split)
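
The holdout item in the checklist can be partly verified mechanically. The sketch below assumes that "strictly 2024+" means match dates on or after 2024-01-01 and that match dates are available as `datetime.date` values; both are assumptions, and the function name is hypothetical.

```python
from datetime import date
from typing import Iterable


def holdout_is_clean(train_dates: Iterable[date],
                     holdout_dates: Iterable[date],
                     cutoff: date = date(2024, 1, 1)) -> bool:
    """True if every holdout match is on/after the cutoff and every training match is before it."""
    holdout_ok = all(d >= cutoff for d in holdout_dates)
    train_ok = all(d < cutoff for d in train_dates)
    return holdout_ok and train_ok
```

A check like this only catches date leakage; it does not replace reviewing how the calibration split was constructed.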
Why champion promotion is NOT automated
- Dataset size — the football match dataset is small enough that a single bad season can produce gate-passing metrics by chance. Human review of segment metrics provides a sanity check that automated thresholds cannot.
- Calibration matters for downstream use — the API serves raw probabilities. A model with ECE just below 0.05 but with structural miscalibration in specific outcome classes should be caught in manual review.
- Operational safety — for a portfolio project with a live demo, a human checkpoint before swapping the champion model is appropriate.
Champion promotion procedure
- Verify all hard gates pass.
- Complete the manual review checklist.
- Run:

  ```python
  import mlflow

  client = mlflow.MlflowClient()
  # Look up the version that currently holds the candidate alias.
  candidate = client.get_model_version_by_alias("soccer-match-outcome", "candidate")
  # Point the champion alias at that same version.
  client.set_registered_model_alias("soccer-match-outcome", "champion", candidate.version)
  print(f"champion → version {candidate.version}")
  ```

- Restart the serving worker so that it reloads the new model.
- Monitor the first 24h in Prometheus/Grafana for latency regressions.
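
The first three steps of the procedure could be wrapped in a small guard so that the alias is never moved while a gate is failing. This is a sketch under assumptions: the function name and its `gate_results` / `checklist_done` arguments are hypothetical, and `client` is expected to be an `mlflow.MlflowClient` (any object with the same two methods works, which keeps the function testable without a registry).

```python
from typing import Any, Mapping


def promote_candidate_to_champion(client: Any,
                                  gate_results: Mapping[str, bool],
                                  checklist_done: bool,
                                  model_name: str = "soccer-match-outcome") -> str:
    """Point the champion alias at the current candidate version, or raise if not cleared."""
    failed = [name for name, ok in gate_results.items() if not ok]
    if failed:
        raise RuntimeError(f"hard gates failed: {', '.join(failed)}")
    if not checklist_done:
        raise RuntimeError("manual review checklist not completed")
    # Same two registry calls as the manual snippet in the procedure.
    candidate = client.get_model_version_by_alias(model_name, "candidate")
    client.set_registered_model_alias(model_name, "champion", candidate.version)
    return candidate.version
```

The sign-off itself stays manual: `checklist_done` has to be supplied by the reviewer, so the guard only prevents accidental promotion rather than automating the decision.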
See Model Registry for the full registry workflow and Known Limitations for current manual gate status.