Model Promotion Policy

Status: ✅ Implemented. Candidate promotion is automated with a quality gate; champion promotion requires manual sign-off (see rationale below).

Alias scheme

Alias	Assigned by	Gate	CI job
`ci-smoke`	`register_model` — toy data (`frac=0.001`)	None	`train:smoke`
`smoke`	`register_model` — real data	None	`train:test`
`candidate`	`promote_model` DVC stage — automatic	`final.logloss ≤ current_candidate + 0.005`	`train:test`
`champion`	Developer / scheduled DAG — manual	Hard gates + manual checklist (below)	—

ci-smoke is a CI wiring check only — toy model, never used by serving. smoke is the lifecycle entry point for real models.

See Model Registry for full lifecycle description.

Automated candidate gate (`promote_model` stage)

Runs automatically after register_model as a DVC stage. Its parameters live in params.yaml under promote_model:

Parameter	Value	Meaning
`metric`	`final.logloss`	Metric to compare
`tolerance`	`0.005`	Max allowed regression vs current candidate
`candidate_alias`	`candidate`	Alias to assign on pass

Gate failure is not a pipeline error — data/models/promoted_model.json records promoted: false and the candidate alias is left unchanged.
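
The gate logic described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual `promote_model` code: the function names, the handling of a missing candidate, and every field in the JSON record other than `promoted` are assumptions.

```python
import json
from pathlib import Path
from typing import Optional


def candidate_gate(new_logloss: float,
                   current_logloss: Optional[float],
                   tolerance: float = 0.005) -> bool:
    """Return True when the new model may take the candidate alias."""
    # Assumption: with no existing candidate there is nothing to regress against.
    if current_logloss is None:
        return True
    # Lower log-loss is better, so a regression of up to `tolerance` is allowed.
    return new_logloss <= current_logloss + tolerance


def write_promotion_record(path: Path, promoted: bool, new_logloss: float) -> None:
    """Record the outcome; a failed gate is written to disk, not raised."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Only the `promoted` key is documented; `final.logloss` is a hypothetical extra.
    record = {"promoted": promoted, "final.logloss": new_logloss}
    path.write_text(json.dumps(record, indent=2))
```

Returning a boolean and writing a record, rather than raising, matches the stated behaviour that a failed gate leaves the pipeline green and the alias untouched.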

Hard gates for `champion` promotion

All three gates must pass before manual review begins.

Gate	Threshold	Rationale
Log-loss ≤ `logreg_full` baseline	log-loss must not exceed logistic regression on full features	Ensures the proposed champion is at least as good as the simplest meaningful baseline
Brier ≤ current champion − 0.002	challenger Brier at least 0.002 lower than champion	Forces a meaningful improvement, not just noise
ECE ≤ 0.05	calibration error ≤ 5%	Ensures probabilities are usable for downstream betting simulation and API consumers
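
As a rough sketch, the three gates in the table could be evaluated together like this. The `Metrics` class and `hard_gate_results` function are hypothetical names introduced for illustration; only the thresholds come from the table.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Metrics:
    logloss: float
    brier: float
    ece: float


def hard_gate_results(challenger: Metrics,
                      champion: Metrics,
                      baseline_logloss: float,
                      brier_margin: float = 0.002,
                      ece_max: float = 0.05) -> Dict[str, bool]:
    """Evaluate each hard gate separately so a failure is easy to attribute."""
    return {
        # Gate 1: no worse than the logreg_full baseline on log-loss.
        "logloss_vs_baseline": challenger.logloss <= baseline_logloss,
        # Gate 2: Brier score at least `brier_margin` below the current champion.
        "brier_vs_champion": challenger.brier <= champion.brier - brier_margin,
        # Gate 3: expected calibration error within the allowed ceiling.
        "ece": challenger.ece <= ece_max,
    }
```

A challenger qualifies for manual review only when `all(hard_gate_results(...).values())` is true; keeping the per-gate booleans makes it clear which threshold blocked a promotion.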

How to check gates

# Print a run's details, including its metrics, with the MLflow CLI.
# Run once for the candidate's run and once for the champion's, then compare.
mlflow runs describe --run-id <final_run_id>

Key metrics: final.logloss, final.brier, final.ece.
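
The same comparison can be done from Python. The sketch below assumes both aliases exist on the `soccer-match-outcome` registered model and that each model version is linked to the run that logged the three key metrics; the helper names are hypothetical.

```python
from typing import Dict, Optional

KEY_METRICS = ("final.logloss", "final.brier", "final.ece")
MODEL_NAME = "soccer-match-outcome"


def fetch_key_metrics(alias: str) -> Dict[str, Optional[float]]:
    """Look up the run behind a registry alias and return its key metrics."""
    import mlflow  # imported lazily so metric_deltas works without MLflow installed

    client = mlflow.MlflowClient()
    version = client.get_model_version_by_alias(MODEL_NAME, alias)
    run = client.get_run(version.run_id)
    # A metric that was never logged comes back as None.
    return {name: run.data.metrics.get(name) for name in KEY_METRICS}


def metric_deltas(candidate: Dict[str, float],
                  champion: Dict[str, float]) -> Dict[str, float]:
    """Candidate minus champion; negative values mean the candidate is better."""
    return {name: candidate[name] - champion[name] for name in KEY_METRICS}
```

For example, `metric_deltas(fetch_key_metrics("candidate"), fetch_key_metrics("champion"))` gives one signed difference per metric, which can be read directly against the gate thresholds.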

Baseline reference values

Baseline metrics (from matches_clf_v1 experiment after Stage 1 prod run):

Model	Log-loss	Brier	ECE
Marginal (class prior)	`TBD — fill after prod run`	`TBD`	`TBD`
Elo-only (logreg)	`TBD`	`TBD`	`TBD`
LogReg full features	`TBD`	`TBD`	`TBD`
XGBoost (tuned, champion)	`TBD`	`TBD`	`TBD`

Fill these values after running dvc exp run -S 'experiment=prod'.

Manual review checklist (before `champion`)

[ ] All hard gates pass (logloss, brier, ECE)
[ ] Calibration curve artifact (calibration_curves.png) — no severe over/under-confidence in any class
[ ] Confusion matrix (confusion_matrix_final.png) — no unexpected degradation on specific outcome classes
[ ] Segment metrics (segment_metrics.csv) — no severe performance drop in major leagues vs previous champion
[ ] Error analysis report (reports/error_analysis/v1/index.md) — no newly introduced systematic error pattern
[ ] Holdout slice is strictly 2024+ (no train/test contamination)
[ ] Calibration used temporal split (not random split)
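
The last two checklist items can be partly backed by a mechanical check. The sketch below assumes match dates are available as `datetime.date` values and reads "2024+" as on or after 1 January 2024; the function name and messages are illustrative only.

```python
from datetime import date
from typing import Iterable, List

# Assumption: "2024+" means on or after 1 January 2024.
HOLDOUT_START = date(2024, 1, 1)


def temporal_split_problems(train_dates: Iterable[date],
                            holdout_dates: Iterable[date],
                            holdout_start: date = HOLDOUT_START) -> List[str]:
    """Return human-readable problems; an empty list means the split looks clean."""
    problems = []
    # Holdout rows dated before the cut-off mean the slice is not strictly 2024+.
    early = [d for d in holdout_dates if d < holdout_start]
    if early:
        problems.append(f"{len(early)} holdout match(es) before {holdout_start}")
    # Training rows on or after the cut-off overlap the holdout period.
    late = [d for d in train_dates if d >= holdout_start]
    if late:
        problems.append(f"{len(late)} training match(es) on or after {holdout_start}")
    return problems
```

A check like this only confirms the date boundaries; whether calibration itself was fitted on a temporal split still has to be verified against the training configuration.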

Why `champion` promotion is NOT automated

Dataset size — the football match dataset is small enough that a single bad season can produce gate-passing metrics by chance. Human review of segment metrics provides a sanity check that automated thresholds cannot.
Calibration matters for downstream use — the API serves raw probabilities. A model with ECE just below 0.05 but with structural miscalibration in specific outcome classes should be caught in manual review.
Operational safety — for a portfolio project with a live demo, a human checkpoint before swapping the champion model is appropriate.
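
To illustrate the calibration point, a single aggregate ECE can hide miscalibration concentrated in one outcome class. The sketch below computes a one-vs-rest ECE per class using equal-width bins. It is one common formulation and may not match how the project computes `final.ece`; the function name and bin count are assumptions.

```python
from typing import List, Sequence


def classwise_ece(probs: Sequence[Sequence[float]],
                  labels: Sequence[int],
                  n_bins: int = 10) -> List[float]:
    """One-vs-rest expected calibration error for each class.

    probs[i][k] is the predicted probability of class k for sample i;
    labels[i] is the index of the true class.
    """
    n = len(labels)
    n_classes = len(probs[0])
    result = []
    for k in range(n_classes):
        conf_sum = [0.0] * n_bins  # summed predicted probability per bin
        hit_sum = [0.0] * n_bins   # number of samples in the bin whose true class is k
        count = [0] * n_bins
        for p, y in zip(probs, labels):
            b = min(int(p[k] * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
            conf_sum[b] += p[k]
            hit_sum[b] += 1.0 if y == k else 0.0
            count[b] += 1
        # (n_b / n) * |hit_rate_b - mean_conf_b| simplifies to |hits_b - conf_b| / n.
        ece = sum(abs(hit_sum[b] - conf_sum[b]) / n
                  for b in range(n_bins) if count[b])
        result.append(ece)
    return result
```

If one class shows a much larger value than the others while the overall ECE stays under 0.05, that is the kind of structural miscalibration the manual review is meant to catch.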

Champion promotion procedure

Verify all hard gates pass.
Complete the manual review checklist.
Run:

import mlflow

client = mlflow.MlflowClient()
# Look up the version that currently holds the candidate alias.
candidate = client.get_model_version_by_alias("soccer-match-outcome", "candidate")
# Point the champion alias at that same version.
client.set_registered_model_alias("soccer-match-outcome", "champion", candidate.version)
print(f"champion → version {candidate.version}")

Restart the serving worker to reload the new model:

kubectl rollout restart deployment/soccer-worker -n soccer

Monitor the first 24h in Prometheus/Grafana for latency regressions.

See Model Registry for the full registry workflow and Known Limitations for current manual gate status.
