
Model Promotion Policy

Status: ✅ Implemented. Candidate promotion is automated with a quality gate; champion promotion requires manual sign-off (see rationale below).


Alias scheme

| Alias | Assigned by | Gate | CI job |
|---|---|---|---|
| ci-smoke | register_model (toy data, frac=0.001) | None | train:smoke |
| smoke | register_model (real data) | None | train:test |
| candidate | promote_model DVC stage (automatic) | final.logloss ≤ current_candidate + 0.005 | train:test |
| champion | Developer / scheduled DAG (manual) | Hard gates + manual checklist (below) | |

ci-smoke is only a CI wiring check: it points to a toy model and is never used by serving. smoke is the lifecycle entry point for real models.
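In code, an alias is typically resolved through an MLflow model URI of the form `models:/<name>@<alias>`. The sketch below encodes the alias scheme as data and refuses to build a serving URI for `ci-smoke`. The model name `soccer-match-outcome` comes from the promotion procedure on this page; `ALIASES`, `NEVER_SERVED`, and `serving_model_uri` are hypothetical helpers, and passing the result to `mlflow.pyfunc.load_model` assumes an MLflow version with model-alias support.

```python
MODEL_NAME = "soccer-match-outcome"

# Alias -> how it gets assigned (mirrors the alias scheme table above).
ALIASES = {
    "ci-smoke": "register_model on toy data (CI wiring check only)",
    "smoke": "register_model on real data",
    "candidate": "promote_model DVC stage (automatic gate)",
    "champion": "manual promotion after hard gates + checklist",
}

# Aliases that must never reach a serving process.
NEVER_SERVED = {"ci-smoke"}


def serving_model_uri(alias: str, model_name: str = MODEL_NAME) -> str:
    """Build a models:/<name>@<alias> URI, rejecting unknown or CI-only aliases."""
    if alias not in ALIASES:
        raise ValueError(f"unknown alias: {alias!r}")
    if alias in NEVER_SERVED:
        raise ValueError(f"{alias!r} is a CI wiring check and must not be served")
    return f"models:/{model_name}@{alias}"
```

A serving worker would then call something like `mlflow.pyfunc.load_model(serving_model_uri("champion"))`; keeping the guard in one helper means a misconfigured alias fails loudly instead of silently loading the toy model.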

See Model Registry for the full lifecycle description.


Automated candidate gate (promote_model stage)

Runs automatically as a DVC stage after register_model. Parameters live in params.yaml under the promote_model key:

| Parameter | Value | Meaning |
|---|---|---|
| metric | final.logloss | Metric to compare |
| tolerance | 0.005 | Max allowed regression vs the current candidate |
| candidate_alias | candidate | Alias to assign on pass |

A gate failure is not a pipeline error: data/models/promoted_model.json records promoted: false and the candidate alias is left unchanged.
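The gate decision itself is a single comparison. This is a minimal sketch, assuming lower log-loss is better and that a first-ever run (no existing candidate) passes automatically. `candidate_gate`, `write_gate_result`, and every JSON key other than `promoted` are hypothetical; they are not the stage's actual schema.

```python
import json
from pathlib import Path
from typing import Optional

# Mirrors params.yaml -> promote_model (values from the table above).
PARAMS = {"metric": "final.logloss", "tolerance": 0.005, "candidate_alias": "candidate"}


def candidate_gate(new_value: float, current_value: Optional[float], params: dict = PARAMS) -> dict:
    """Decide whether a newly registered model may take the candidate alias.

    The new model passes if its metric regresses by at most `tolerance`
    relative to the current candidate. With no current candidate
    (current_value is None) we assume the model passes.
    """
    if current_value is None:
        promoted = True
    else:
        promoted = new_value <= current_value + params["tolerance"]
    return {
        "promoted": promoted,
        "metric": params["metric"],
        "new_value": new_value,
        "current_value": current_value,
        "alias": params["candidate_alias"] if promoted else None,
    }


def write_gate_result(result: dict, path: Path) -> None:
    """Record the outcome instead of raising, so a failed gate does not fail the DVC stage."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result, indent=2))
```

Writing the result to a file rather than exiting non-zero is what lets the pipeline keep running: downstream steps can read `promoted` and decide whether to touch the alias.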


Hard gates for champion promotion

All three gates must be met before manual review is considered.

| Gate | Threshold | Rationale |
|---|---|---|
| Log-loss ≤ logreg_full baseline | Log-loss must not exceed that of logistic regression on full features | Ensures the proposed champion is no worse than the simplest meaningful baseline |
| Brier ≤ current champion − 0.002 | Challenger Brier at least 0.002 lower than the champion's | Forces a meaningful improvement, not just noise |
| ECE ≤ 0.05 | Calibration error ≤ 5% | Ensures probabilities are usable for downstream betting simulation and API consumers |
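The three hard gates reduce to three comparisons. This is a minimal sketch, assuming the final.* metrics have already been read from MLflow for the challenger and the current champion and that the logreg_full baseline log-loss is known. `Metrics` and `hard_gates` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    logloss: float  # final.logloss
    brier: float    # final.brier
    ece: float      # final.ece


def hard_gates(challenger: Metrics, champion: Metrics, logreg_full_logloss: float,
               brier_margin: float = 0.002, ece_max: float = 0.05) -> dict:
    """Return pass/fail per hard gate; all must be True before manual review."""
    return {
        # Log-loss must not exceed the logistic-regression (full features) baseline.
        "logloss": challenger.logloss <= logreg_full_logloss,
        # Brier must beat the current champion by at least brier_margin.
        "brier": challenger.brier <= champion.brier - brier_margin,
        # Expected calibration error of at most 5%.
        "ece": challenger.ece <= ece_max,
    }
```

Returning one boolean per gate, rather than a single pass/fail, makes it obvious in the review which gate blocked promotion; `all(hard_gates(...).values())` gives the overall verdict.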

How to check gates

# Inspect a run's metrics with the MLflow CLI (run once for the candidate, once for the champion)
mlflow runs describe --run-id <final_run_id>

Key metrics: final.logloss, final.brier, final.ece.
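For reference, the three key metrics can be written out in plain Python. This is a minimal sketch assuming one probability vector per match (e.g. home/draw/away), a multiclass Brier score summed over classes (not divided by the number of classes), and top-label ECE with 10 equal-width bins. The pipeline's own implementation may normalize or bin differently, so treat numbers from this code as cross-checks, not as the logged values.

```python
import math
from typing import Sequence

Probs = Sequence[Sequence[float]]  # one probability vector per match


def log_loss(probs: Probs, labels: Sequence[int], eps: float = 1e-15) -> float:
    """Mean negative log-probability assigned to the observed outcome."""
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)


def brier(probs: Probs, labels: Sequence[int]) -> float:
    """Multiclass Brier: squared distance to the one-hot outcome, summed over classes, averaged over matches."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(labels)


def ece(probs: Probs, labels: Sequence[int], n_bins: int = 10) -> float:
    """Top-label ECE: bin matches by max probability, weight |accuracy - confidence| by bin size."""
    bins: list = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = max(range(len(p)), key=lambda k: p[k])
        conf = p[pred]
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes into the last bin
        bins[idx].append((conf, pred == y))
    total = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(1 for _, hit in b if hit) / len(b)
            total += len(b) / len(labels) * abs(accuracy - avg_conf)
    return total
```

Note that top-label ECE only looks at the most likely outcome per match; structural miscalibration in a less likely class (often the draw) can hide behind a low ECE, which is one reason the calibration-curve check stays in the manual checklist.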


Baseline reference values

Baseline metrics (from the matches_clf_v1 experiment after the Stage 1 prod run):

| Model | Log-loss | Brier | ECE |
|---|---|---|---|
| Marginal (class prior) | TBD | TBD | TBD |
| Elo-only (logreg) | TBD | TBD | TBD |
| LogReg full features | TBD | TBD | TBD |
| XGBoost (tuned, champion) | TBD | TBD | TBD |

Fill these values after running dvc exp run -S 'experiment=prod'.
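The Marginal (class prior) row needs no trained model: it predicts the same outcome frequencies for every match. One way to compute its log-loss, sketched under the assumptions that outcomes are encoded as integers 0, 1, 2 (for example home win, draw, away win) and that the prior is estimated on the training slice only. `class_prior` and `marginal_logloss` are hypothetical helpers, not part of the pipeline.

```python
import math
from collections import Counter
from typing import List, Sequence


def class_prior(train_labels: Sequence[int], n_classes: int = 3) -> List[float]:
    """Empirical outcome frequencies, taken from the training slice only to avoid holdout leakage."""
    counts = Counter(train_labels)
    return [counts.get(k, 0) / len(train_labels) for k in range(n_classes)]


def marginal_logloss(train_labels: Sequence[int], test_labels: Sequence[int],
                     n_classes: int = 3, eps: float = 1e-15) -> float:
    """Log-loss of predicting the class-prior vector for every holdout match."""
    prior = class_prior(train_labels, n_classes)
    return -sum(math.log(max(prior[y], eps)) for y in test_labels) / len(test_labels)
```

Any model worth reviewing should score clearly below this value; a challenger close to it has learned little beyond the base rates.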


Manual review checklist (before champion)

  • [ ] All hard gates pass (logloss, brier, ECE)
  • [ ] Calibration curve artifact (calibration_curves.png) — no severe over/under-confidence in any class
  • [ ] Confusion matrix (confusion_matrix_final.png) — no unexpected degradation on specific outcome classes
  • [ ] Segment metrics (segment_metrics.csv) — no severe performance drop in major leagues vs previous champion
  • [ ] Error analysis report (reports/error_analysis/v1/index.md) — no newly introduced systematic error pattern
  • [ ] Holdout slice is strictly 2024+ (no train/test contamination)
  • [ ] Calibration used temporal split (not random split)
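The last two checklist items can be partly automated. This is a minimal sketch, assuming the match dates of each slice are available as `datetime.date` values and that a temporal calibration split means the calibration slice lies strictly after all training matches and before the 2024 holdout. `check_temporal_split` is a hypothetical helper.

```python
from datetime import date
from typing import List, Sequence

HOLDOUT_START = date(2024, 1, 1)  # checklist: holdout slice is strictly 2024+


def check_temporal_split(train: Sequence[date], calib: Sequence[date], holdout: Sequence[date],
                         holdout_start: date = HOLDOUT_START) -> List[str]:
    """Return a list of problems; an empty list means the split looks clean."""
    problems = []
    if any(d >= holdout_start for d in train):
        problems.append("train contains holdout-period matches")
    if any(d < holdout_start for d in holdout):
        problems.append(f"holdout contains matches before {holdout_start}")
    if calib:
        # Assumed ordering for a temporal calibration split: train < calib < holdout.
        if train and min(calib) <= max(train):
            problems.append("calibration slice overlaps training period")
        if max(calib) >= holdout_start:
            problems.append("calibration slice overlaps holdout period")
    return problems
```

A random calibration split would typically trip the "overlaps training period" check, because its dates interleave with the training matches.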

Why champion promotion is NOT automated

  1. Dataset size — the football match dataset is small enough that a single bad season can produce gate-passing metrics by chance. Human review of segment metrics provides a sanity check that automated thresholds cannot.
  2. Calibration matters for downstream use — the API serves raw probabilities. A model with ECE just below 0.05 but with structural miscalibration in specific outcome classes should be caught in manual review.
  3. Operational safety — for a portfolio project with a live demo, a human checkpoint before swapping the champion model is appropriate.

Champion promotion procedure

  1. Verify all hard gates pass.
  2. Complete the manual review checklist.
  3. Run:
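import mlflow

client = mlflow.MlflowClient()
# Resolve the model version currently holding the "candidate" alias
candidate = client.get_model_version_by_alias("soccer-match-outcome", "candidate")
# Point "champion" at that version (an alias maps to one version, so this moves it)
client.set_registered_model_alias("soccer-match-outcome", "champion", candidate.version)
print(f"champion → version {candidate.version}")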
  4. Restart the serving worker to reload the new model:
kubectl rollout restart deployment/soccer-worker -n soccer
  5. Monitor the first 24h in Prometheus/Grafana for latency regressions.
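The promotion step can be wrapped so the previous champion version is recorded for rollback. This is a minimal sketch, assuming `client` exposes the two MlflowClient methods used in the procedure (`get_model_version_by_alias`, `set_registered_model_alias`) and that the lookup raises when the champion alias is not set yet. `promote_candidate`, `rollback`, and `InMemoryAliasClient` are hypothetical; the in-memory client is only a stand-in for dry runs.

```python
from typing import Dict, Optional, Tuple

MODEL_NAME = "soccer-match-outcome"


def promote_candidate(client, model_name: str = MODEL_NAME) -> Tuple[Optional[str], str]:
    """Point 'champion' at the current 'candidate' version; return (previous, new)."""
    try:
        previous = client.get_model_version_by_alias(model_name, "champion").version
    except Exception:
        # Assumed: the lookup raises when no champion alias exists yet.
        previous = None
    new = client.get_model_version_by_alias(model_name, "candidate").version
    client.set_registered_model_alias(model_name, "champion", new)
    return previous, new


def rollback(client, previous: Optional[str], model_name: str = MODEL_NAME) -> None:
    """Restore the champion alias recorded by promote_candidate."""
    if previous is None:
        raise ValueError("no previous champion version recorded")
    client.set_registered_model_alias(model_name, "champion", previous)


class _Version:
    def __init__(self, version: str):
        self.version = version


class InMemoryAliasClient:
    """Hypothetical stand-in for MlflowClient, for dry runs and tests only."""

    def __init__(self, aliases: Dict[str, str]):
        self.aliases = dict(aliases)

    def get_model_version_by_alias(self, name: str, alias: str) -> _Version:
        if alias not in self.aliases:
            raise LookupError(f"{name} has no alias {alias!r}")
        return _Version(self.aliases[alias])

    def set_registered_model_alias(self, name: str, alias: str, version: str) -> None:
        self.aliases[alias] = version
```

Against the real registry, pass `mlflow.MlflowClient()` instead of the stand-in and keep the returned previous version somewhere durable; rolling back is then `rollback(client, previous)` followed by the same `kubectl rollout restart`.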

See Model Registry for the full registry workflow and Known Limitations for current manual gate status.