
Model Registry

Status: 🚧 Partial (Registration automated via DVC; Staging→Production gate is manual)

Role of the registry

The MLflow Model Registry serves as the single source of truth for deployable models.


Current Implementation

✅ MLflow Tracking

All training runs are logged to MLflow with:

- Hyperparameters
- Metrics (accuracy, precision, recall, F1)
- Artifacts (models, plots, confusion matrices)
- Code version (git commit)
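A minimal sketch of what each training run might log, assuming a binary classifier. `classification_metrics`, `current_git_commit` and `log_training_run` are hypothetical helpers, not the project's code. `log_training_run` accepts any object exposing MLflow's fluent logging functions (`log_params`, `log_metrics`, `log_artifact`, `set_tag`), so the `mlflow` module itself can be passed in.

```python
from __future__ import annotations

import subprocess
from typing import Any, Iterable, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for a binary task, computed from raw labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def current_git_commit() -> str | None:
    """Return the HEAD commit hash, or None when git or a checkout is unavailable."""
    try:
        result = subprocess.run(["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def log_training_run(
    tracker: Any,
    params: dict[str, Any],
    metrics: dict[str, float],
    artifact_paths: Iterable[str] = (),
    git_commit: str | None = None,
) -> None:
    """Log one run's hyperparameters, metrics, artifact files and code version."""
    tracker.log_params(params)
    tracker.log_metrics(metrics)
    for path in artifact_paths:
        tracker.log_artifact(path)  # each path must be an existing local file
    if git_commit:
        # Same tag key MLflow uses when it records the source commit itself.
        tracker.set_tag("mlflow.source.git.commit", git_commit)
```

Inside a run this would be called as `with mlflow.start_run(): log_training_run(mlflow, params, metrics, ["confusion_matrix.png"], current_git_commit())`. Passing the tracker in, rather than importing `mlflow` at module level, keeps the helpers testable without a tracking server.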

Evidence:

mlflow ui --port 5001
# Navigate to experiments tab

✅ Automated Registration via DVC

Model registration is the final stage of the DVC pipeline (dvc.yaml):

register_model:
  cmd: python -m src.pipelines cli-register-model data/models/run_id.json
  deps:
    - data/models/run_id.json
    - src/pipelines/register_model.py
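The stage's `cmd` implies a module entry point with a `cli-register-model` subcommand that takes the run-id file as its argument. The sketch below shows one way such an entry point could look with `argparse`. The JSON layout (`{"run_id": "..."}`), the function names and the `register` callback are assumptions, since the source shows neither `src/pipelines/__main__.py` nor the schema of `run_id.json`.

```python
from __future__ import annotations

import argparse
import json
from pathlib import Path
from typing import Callable, Sequence


def read_run_id(path: Path) -> str:
    """Read the training run id written by the previous pipeline stage.

    Assumes a JSON object of the form {"run_id": "..."}; adjust to the real schema.
    """
    payload = json.loads(Path(path).read_text())
    run_id = payload.get("run_id")
    if not isinstance(run_id, str) or not run_id:
        raise ValueError(f"{path} has no usable 'run_id'")
    return run_id


def main(argv: Sequence[str] | None = None, register: Callable[[str], None] = print) -> None:
    """Hypothetical CLI: `python -m src.pipelines cli-register-model <run_id.json>`."""
    parser = argparse.ArgumentParser(prog="python -m src.pipelines")
    commands = parser.add_subparsers(dest="command", required=True)
    reg = commands.add_parser("cli-register-model", help="register the model from a training run")
    reg.add_argument("run_id_file", type=Path)
    args = parser.parse_args(argv)
    if args.command == "cli-register-model":
        register(read_run_id(args.run_id_file))
```

In a real `__main__.py`, `main()` would be invoked at module level and `register` would point at the registration logic; the callback parameter exists only so argument parsing can be exercised without MLflow.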

src/pipelines/register_model.py performs three steps:

1. Creates the registered model if it doesn’t exist (idempotent).
2. Creates a new model version from the training run.
3. Transitions the version to settings.mlflow.model_stage (default: Staging).

Re-running dvc repro with the same run_id is safe (idempotent by design).
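The three steps can be sketched against an `MlflowClient`-like object. This is an assumed implementation, not the contents of `register_model.py`. In particular, looking up an existing version for the same run before creating a new one is a guess at how re-runs stay idempotent, and it assumes the tracking backend accepts a `run_id` filter in `search_model_versions`. The caller supplies `source` (the model's artifact location for that run) and `stage` (standing in for `settings.mlflow.model_stage`).

```python
from __future__ import annotations

from typing import Any


def register_run_model(client: Any, model_name: str, run_id: str, source: str, stage: str = "Staging") -> str:
    """Register the model produced by `run_id` and move it to `stage`.

    `client` is expected to behave like mlflow.tracking.MlflowClient.
    Returns the model version as a string.
    """
    # Step 1: create the registered model only if it does not exist yet.
    if not client.search_registered_models(filter_string=f"name='{model_name}'"):
        client.create_registered_model(model_name)

    # Step 2: reuse a version already created from this run so that re-runs
    # do not pile up duplicate versions (assumed idempotency mechanism).
    existing = client.search_model_versions(f"name='{model_name}' and run_id='{run_id}'")
    if existing:
        version = existing[0].version
    else:
        version = client.create_model_version(name=model_name, source=source, run_id=run_id).version

    # Step 3: transition the version to the configured stage.
    client.transition_model_version_stage(name=model_name, version=version, stage=stage)
    return str(version)
```

Recent MLflow releases mark registry stages as deprecated in favor of model version aliases; `transition_model_version_stage` is still available but may warn.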

🚧 Manual: Staging → Production Promotion

Promoting a model from Staging to Production requires manual approval. This is a deliberate quality gate — not a missing feature.
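A manual promotion can still be scripted so that the approval is recorded. The helper below is a hypothetical sketch: it refuses to promote anything not currently in Staging, stores the approver in an `approved_by` model-version tag (an invented convention, not something the source describes), and relies on MLflow's `archive_existing_versions=True` to move the outgoing Production version to Archived.

```python
from __future__ import annotations

from typing import Any


def promote_to_production(client: Any, model_name: str, version: str, approved_by: str) -> None:
    """Promote an approved Staging version to Production.

    `client` is expected to behave like mlflow.tracking.MlflowClient.
    """
    if not approved_by.strip():
        raise ValueError("promotion requires a named approver")
    mv = client.get_model_version(model_name, version)
    if mv.current_stage != "Staging":
        raise ValueError(f"version {version} is in {mv.current_stage!r}, not 'Staging'")
    # Hypothetical audit tag recording who approved the promotion.
    client.set_model_version_tag(model_name, version, "approved_by", approved_by)
    # Archives whatever version currently holds the Production stage.
    client.transition_model_version_stage(
        name=model_name, version=version, stage="Production", archive_existing_versions=True
    )
```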

Future workflow (Phase 5):

1. Validation metrics are compared against thresholds.
2. If better than the current production model → auto-promote to Staging.
3. After a canary test period → Production.
4. Old model → Archived.
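Steps 1 and 2 of the planned workflow reduce to a pure decision function. In this sketch the threshold values, the metric names and the choice of F1 as the comparison metric are all assumptions; Phase 5 may define the comparison differently.

```python
from __future__ import annotations

from typing import Mapping


def should_promote(
    candidate: Mapping[str, float],
    production: Mapping[str, float] | None,
    thresholds: Mapping[str, float],
    primary_metric: str = "f1",
) -> bool:
    """Hypothetical gate: meet every threshold, then beat the production model."""
    # Step 1: a missing metric counts as failing its threshold.
    if any(candidate.get(name, float("-inf")) < floor for name, floor in thresholds.items()):
        return False
    # No model in Production yet: passing the thresholds is enough.
    if production is None:
        return True
    # Step 2: require a strict improvement on the primary metric.
    return candidate[primary_metric] > production[primary_metric]
```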


Model Lifecycle

| Stage | Meaning |
|---|---|
| None | Freshly registered from a training run |
| Staging | Set by the register_model DVC stage; ready for testing |
| Production | Currently served by API workers |
| Archived | Deprecated; kept for reproducibility |

Deployment Coupling

PredictionService (in src/app/services/predict.py) loads the model via:

mlflow.pyfunc.load_model(f"models:/{model_name}/{stage}")

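The stage is read from settings.mlflow.model_stage. Transitioning a different version into that stage in the registry therefore switches the served model without a redeployment. Because the models:/ URI is resolved when the model is loaded, the switch takes effect the next time PredictionService loads the model.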
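A simplified, hypothetical stand-in for `PredictionService` illustrates the coupling: the URI is built from the model name and stage, and the registry is consulted only when the model is loaded. The class name `RegistryBackedPredictor`, the lazy loading and the `reload()` method are assumptions, not the service's actual interface; the `loader` parameter defaults to `mlflow.pyfunc.load_model`.

```python
from __future__ import annotations

from typing import Any, Callable


def model_uri(model_name: str, stage: str) -> str:
    """Registry URI that resolves to the latest version currently in `stage`."""
    return f"models:/{model_name}/{stage}"


class RegistryBackedPredictor:
    """Hypothetical, simplified version of a stage-driven prediction service."""

    def __init__(self, model_name: str, stage: str, loader: Callable[[str], Any] | None = None) -> None:
        if loader is None:
            import mlflow.pyfunc  # deferred so the class can be used without MLflow in tests

            loader = mlflow.pyfunc.load_model
        self._uri = model_uri(model_name, stage)
        self._loader = loader
        self._model: Any = None

    def reload(self) -> None:
        # The registry is queried here, so stage changes only appear after a (re)load.
        self._model = self._loader(self._uri)

    def predict(self, data: Any) -> Any:
        if self._model is None:
            self.reload()
        return self._model.predict(data)
```

Injecting the loader keeps the class testable without a registry; in production the default `mlflow.pyfunc.load_model` would be used.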


Rollback

To roll back, transition a previous version to Production in the MLflow UI or via the API. No retraining is required.
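A scripted rollback might look like the sketch below. Choosing the newest Archived version older than the current Production version is an assumed policy (the lifecycle table only says that old models are archived), and `client` is expected to behave like `mlflow.tracking.MlflowClient`.

```python
from __future__ import annotations

from typing import Any, Iterable


def pick_rollback_version(versions: Iterable[Any]) -> str | None:
    """Assumed policy: newest Archived version older than the current Production one."""
    by_number = sorted(versions, key=lambda mv: int(mv.version))
    production = [mv for mv in by_number if mv.current_stage == "Production"]
    if not production:
        return None
    current = int(production[-1].version)
    archived = [mv for mv in by_number if mv.current_stage == "Archived" and int(mv.version) < current]
    return str(archived[-1].version) if archived else None


def rollback(client: Any, model_name: str) -> str:
    """Move the chosen previous version back to Production, archiving the current one."""
    target = pick_rollback_version(client.search_model_versions(f"name='{model_name}'"))
    if target is None:
        raise RuntimeError(f"no archived version of {model_name!r} to roll back to")
    client.transition_model_version_stage(
        name=model_name, version=target, stage="Production", archive_existing_versions=True
    )
    return target
```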