Glossary

This glossary defines key terms as they are used within this project. Definitions are pragmatic and system-specific.


MLOps

A set of practices for building, deploying, monitoring, and maintaining machine learning systems in production.


DVC (Data Version Control)

A tool used to version datasets and pipeline stages, ensuring reproducibility of experiments.


MLflow

A platform for tracking experiments, storing artifacts, and managing model versions via a registry.


Model Registry

A centralized service that stores trained models, their metadata, and their lifecycle stages (e.g. staging, production).


Train / Serve Parity

A principle stating that the same feature logic and model artifacts must be used during training and online inference.
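The principle can be sketched as a single shared feature function imported by both the training pipeline and the serving path, so the logic cannot diverge. All names here (`build_features`, `train`, `serve`) are hypothetical, not from this project:

```python
def build_features(raw: dict) -> list[float]:
    # Single source of truth for feature logic, used offline and online.
    return [raw["amount"] / 100.0, float(raw["is_weekend"])]

def train(rows: list[dict]) -> list[list[float]]:
    # Offline path: batch feature construction for model fitting.
    return [build_features(r) for r in rows]

def serve(request: dict) -> list[float]:
    # Online path: reuses the exact same function at inference time.
    return build_features(request)
```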


Data Contract

A formal definition of expected data schema and quality constraints, validated using tools such as Great Expectations.
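A minimal sketch of the idea in plain Python; Great Expectations provides a richer, declarative version of the same checks. The fields and constraints below are illustrative, not this project's actual contract:

```python
# Expected schema: field name -> required type (illustrative).
CONTRACT = {
    "user_id": int,
    "amount": float,
}

def validate(row: dict) -> list[str]:
    """Return a list of contract violations; empty means the row passes."""
    errors = []
    for field, typ in CONTRACT.items():
        if field not in row:
            errors.append(f"missing field: {field}")
        elif not isinstance(row[field], typ):
            errors.append(f"wrong type for {field}")
    # Quality constraint on top of the schema.
    if isinstance(row.get("amount"), float) and row["amount"] < 0:
        errors.append("amount must be non-negative")
    return errors
```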


Leakage

The use of information during training that would not be available at prediction time, leading to overly optimistic evaluation results.
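A classic instance is computing normalization statistics over the full dataset before splitting, which lets test-set information leak into the training features. A toy illustration:

```python
def mean(xs: list[float]) -> float:
    return sum(xs) / len(xs)

train_x = [1.0, 2.0, 3.0]
test_x = [100.0]

# Leaky: the statistic sees test data it would not have at prediction time.
leaky_mean = mean(train_x + test_x)

# Clean: the statistic is computed from training data only.
clean_mean = mean(train_x)
```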


Golden Path

A minimal, deterministic sequence of steps that reproduces the full ML lifecycle from data to inference.


Sync Inference

A low-latency inference mode where predictions are returned immediately within the request lifecycle.
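A minimal sketch of the request flow, with a stub in place of the real model; the function names are hypothetical:

```python
def predict(features: list[float]) -> float:
    # Stand-in for a real model: a fixed linear scorer.
    weights = [0.5, 1.5]
    return sum(w * f for w, f in zip(weights, features))

def handle_request(payload: dict) -> dict:
    # The caller blocks; the prediction is computed and returned
    # within the same request lifecycle.
    score = predict(payload["features"])
    return {"score": score}
```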


Async Inference

A background inference mode using message queues and workers, suitable for heavier or batch workloads.
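The pattern can be sketched with an in-process queue and a worker thread; in production the queue would typically be an external broker (e.g. Redis, RabbitMQ) and the worker a separate process:

```python
import queue
import threading

jobs: "queue.Queue[dict]" = queue.Queue()
results: dict[str, float] = {}

def worker() -> None:
    while True:
        job = jobs.get()
        if job is None:  # sentinel to shut the worker down
            break
        # Stand-in for a heavy model call.
        results[job["id"]] = sum(job["features"])
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

# The client enqueues a job and returns immediately;
# the result is picked up later by job id.
jobs.put({"id": "job-1", "features": [1.0, 2.0, 3.0]})
jobs.join()
```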


Drift

A change in data or model behavior over time that can degrade model performance.
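A minimal drift check compares a live feature window against the training baseline; real systems use statistical tests such as PSI or Kolmogorov-Smirnov, and the tolerance below is illustrative:

```python
def mean(xs: list[float]) -> float:
    return sum(xs) / len(xs)

def drifted(baseline: list[float], window: list[float],
            tol: float = 0.5) -> bool:
    # Flag drift when the live mean moves beyond a fixed tolerance.
    return abs(mean(window) - mean(baseline)) > tol

baseline = [1.0, 1.2, 0.9, 1.1]
stable_window = [1.0, 1.1, 0.95]
shifted_window = [2.4, 2.6, 2.5]
```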


SLO (Service Level Objective)

A target level for service reliability or performance, such as latency or error rate.
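For example, a latency SLO of "p95 under 300 ms" can be checked against observed request latencies; the target and samples below are illustrative:

```python
import math

def p95(latencies_ms: list[float]) -> float:
    # Nearest-rank percentile: the value at or below which
    # 95% of samples fall.
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered)) - 1
    return ordered[rank]

def meets_slo(latencies_ms: list[float], target_ms: float = 300.0) -> bool:
    return p95(latencies_ms) <= target_ms

fast = [120.0] * 19 + [450.0]          # one slow outlier; p95 still 120 ms
slow = [120.0] * 18 + [450.0, 500.0]   # tail latency breaches the target
```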


Runbook

Operational documentation describing how to diagnose and resolve common failures in a production system.