
Implementation Status

This page provides an honest assessment of what's implemented, in progress, and planned.

Last updated: February 27, 2026


Implementation Matrix

| Component | Status | Notes |
| --- | --- | --- |
| **Data Engineering** | | |
| Airflow ETL | ✅ Operational | Scraping + PostgreSQL ingestion |
| MinIO Storage | ✅ Operational | S3-compatible object storage |
| DVC Versioning | ✅ Operational | Data + model artifacts tracked |
| PostgreSQL | ✅ Operational | Canonical data store |
| **ML Pipeline** | | |
| Feature Engineering | ✅ Complete | stats_matches.py — time-windowed stats |
| DVC Pipeline | ✅ Complete | dvc.yaml orchestration working |
| MLflow Tracking | ✅ Operational | Experiment logging functional |
| Train/Test Splitting | ✅ Complete | Time-based + CV folds |
| Model Training | ✅ Complete | Baseline + XGBoost classifiers |
| Model Registry | 🚧 Partial | Automated via register_model DVC stage; Staging→Production gate is manual |
| **Serving** | | |
| FastAPI App | ✅ Operational | Full app with routers, middleware, lifespan, CORS |
| POST /predict | ✅ Implemented | Sync inference via Celery ml queue, 30 s timeout |
| GET /predict/{match_id} | ✅ Implemented | Lookup from batch_inference parquet output |
| POST /predict/async/ | ✅ Implemented | Async Celery job; returns task_id for polling |
| GET /predict/model/info | ✅ Implemented | MLflow model metadata from registry |
| Request Validation | ✅ Implemented | Pydantic schemas in src/app/schemas/predict.py |
| Model Loading | ✅ Implemented | Lazy-loaded once per worker process via PredictionService |
| Batch Predictions API | 📋 Planned | DVC batch_inference stage exists; no HTTP batch endpoint yet |
| Streamlit UI | ✅ Operational | src/ui/ — match list, predictions, polling async results |
| **Monitoring** | | |
| Prometheus Metrics | ✅ Implemented | GET /metrics, _PrometheusMiddleware, 8 counters/histograms/gauges |
| Service Health | ✅ Implemented | GET /healthcheck/, liveness probes |
| Celery Queue Stats | ✅ Implemented | GET /monitoring/celery/queues, /celery/workers |
| Task Status Polling | ✅ Implemented | GET /monitoring/task_status/{task_id} |
| Grafana Dashboards | 📋 Planned | Architecture designed |
| Evidently Drift Detection | 📋 Planned | Not integrated |
| Alerting Rules | 📋 Planned | Runbooks documented |
| **Infrastructure** | | |
| Docker Images | ✅ Complete | Multi-stage builds for API + workers |
| K8s Manifests | ✅ Complete | Deployments + Services + ConfigMaps |
| Helm Charts | ✅ Complete | Values + templates parameterized |
| GitLab CI | ✅ Operational | Build + test + deploy pipeline |
| Secrets (SOPS) | ✅ Operational | age encryption for sensitive data |
| **Quality** | | |
| pytest Framework | ✅ Operational | ~200 tests across unit, property, service, contract, load |
| Unit Tests | ✅ Operational | tests/unit/ — splitting, schemas, preprocess |
| Property Tests | ✅ Operational | tests/property/ — Hypothesis: features, splitting, metrics |
| Service Tests | ✅ Operational | tests/service/ — prediction service, Celery tasks |
| Contract Tests | ✅ Operational | tests/contract/test_pipeline_contracts.py |
| Load Tests | ✅ Operational | tests/load/locustfile.py — Locust load scenarios |
| Integration Tests | 🚧 Partial | API mock tests; no live MLflow/Celery required |
| Pre-commit Hooks | ✅ Operational | ruff + basic linting |
| Data Validation | ✅ Operational | Great Expectations suites in 3 DVC stages (raw/interim/features) |

Legend

  • ✅ Operational: Implemented, tested, and working in practice
  • 🚧 In Progress: Partial implementation or active development
  • 📋 Planned: Designed but not yet implemented

Known Limitations

Current Limitations

  1. Inference Layer
     • ✅ Sync and async POST /predict implemented with MLflow Registry integration
     • ✅ Async inference via Celery ml queue with task status polling
     • Model promotion (Staging→Production) requires manual approval — no automated policy
     • No HTTP batch prediction endpoint (batch features computed by DVC pipeline only)

  2. Testing
     • ✅ ~200 tests: unit, property (Hypothesis), service, contract, load (Locust)
     • ✅ No-leakage invariant verified for rolling features
     • Integration tests use mocks — no live Celery/MLflow dependency in CI

  3. Monitoring
     • ✅ Prometheus /metrics endpoint operational, 8 metrics exported
     • ✅ Request latency and prediction counters instrumented
     • Grafana dashboards not yet deployed
     • Evidently drift detection not yet integrated
     • Alerting rules documented but not deployed

  4. Data Quality
     • ✅ Great Expectations suites run as mandatory DVC pipeline stages
     • No scheduled GE data docs refresh or CI-blocking quality gates yet

Architectural Debt

The following are deliberate trade-offs made for the MVP:

  • Feature Store: Currently file-based (Parquet); migration to dedicated store in roadmap
  • Config Management: params.yaml + DVC params; Hydra in roadmap for multi-env configs
  • Model Promotion: Registration automated; Staging→Production gate is manual
  • Drift Detection: Evidently wired into roadmap but not yet integrated
  • Real-time Monitoring: Prometheus metrics live; Grafana dashboards and alerting pending

What Works End-to-End

✅ Reproducible Training Pipeline

```bash
dvc pull        # Get versioned datasets
dvc repro       # Run full pipeline
mlflow ui       # Inspect experiments
```

Result: Deterministic model training with tracked experiments.
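
Determinism here comes from two things: DVC pinning the exact data versions, and every stage drawing its random seed from params.yaml. A minimal stdlib-only sketch of the seeding half — the helper name is illustrative; the real pipeline would also pass the same seed to numpy and XGBoost's random_state:

```python
import os
import random


def set_global_seed(seed: int = 42) -> None:
    """Seed every RNG the pipeline touches so `dvc repro` is repeatable.

    Only stdlib seeding is shown; in practice numpy and the model's
    random_state would be seeded from the same params.yaml value.
    """
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)


set_global_seed(42)
first = [random.random() for _ in range(3)]
set_global_seed(42)
second = [random.random() for _ in range(3)]
assert first == second  # identical draws on re-run
```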

✅ Data Ingestion

```bash
# Airflow DAGs running
# PostgreSQL populated
# MinIO artifacts stored
```

Result: Automated data updates from web sources.

✅ Infrastructure

```bash
docker build    # Multi-stage images
kubectl apply   # K8s deployment
helm install    # Parameterized configs
```

Result: Deployment-ready infrastructure.


What Doesn't Work Yet

❌ Drift Monitoring

```bash
# No Evidently reports generated
# No drift alerts triggered
```

Reason: Evidently integration designed but not yet implemented.
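
Until Evidently is integrated, the kind of signal it would produce can be approximated by hand. A sketch of a population stability index (PSI) check in plain Python — the bin edges and the 0.2 alert threshold are conventional choices, not project settings:

```python
import math


def psi(expected: list[float], actual: list[float], edges: list[float]) -> float:
    """Population Stability Index between a reference and a live sample.

    Each bin contributes (a - e) * ln(a / e), where e and a are the
    fractions of reference/live points in that bin. PSI > 0.2 is a
    common rule of thumb for meaningful drift.
    """
    def fractions(values: list[float]) -> list[float]:
        counts = [0] * (len(edges) - 1)
        for v in values:
            for i in range(len(edges) - 1):
                if edges[i] <= v < edges[i + 1]:
                    counts[i] += 1
                    break
        total = max(sum(counts), 1)
        # Floor each fraction to avoid log(0) on empty bins.
        return [max(c / total, 1e-6) for c in counts]

    e_frac = fractions(expected)
    a_frac = fractions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_frac, a_frac))
```

Evidently would replace this with per-feature reports and dashboards, but the underlying comparison of training-time vs. live feature distributions is the same.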

❌ Grafana Dashboards

```bash
# Prometheus scraping works, dashboards not deployed
```

Reason: Infrastructure ready; dashboard provisioning not automated.


Next Milestones

Current: Monitoring & Operations

Goal: Full observability layer

  • [ ] Grafana dashboard (latency, throughput, error rate)
  • [ ] Evidently drift detection integrated
  • [ ] Alerting rules deployed (Prometheus Alertmanager)
  • [ ] Automated model promotion policy

Success Criteria: Observable system with automated quality gates


Next: MLOps Maturity

Goal: Reduce operational toil

  • [ ] Hydra multi-environment config management
  • [ ] Feast feature store migration (replace file-based parquet)
  • [ ] A/B testing infrastructure
  • [ ] Load testing benchmarks validated

Success Criteria: Reproducible, configurable, and scalable ML system


How to Verify Claims

Data Pipeline

```bash
# Check DVC tracking
dvc status

# Verify dataset versions
cat data/raw/match.parquet.dvc

# Check Airflow DAGs
ls airflow/dags/
```

ML Training

```bash
# Run pipeline
dvc repro

# Check MLflow
mlflow ui --port 5001
# Navigate to experiments tab
```

Infrastructure

```bash
# Check Docker images
docker images | grep soccer

# Check K8s manifests
ls k8s/manifests/

# Check CI pipeline
grep stages .gitlab-ci.yml
```

Serving

```bash
# Start API + ML worker
uvicorn src.app.main:app

# Check healthcheck
curl http://localhost:8000/healthcheck/

# List upcoming matches
curl http://localhost:8000/predict/matches/

# Sync prediction by match ID
curl http://localhost:8000/predict/42

# Async prediction
curl -X POST http://localhost:8000/predict/async/ -H 'Content-Type: application/json' \
  -d '{"match_id": 42}'

# Check metrics
curl http://localhost:8000/metrics
```
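
The async flow (POST /predict/async/, then poll GET /monitoring/task_status/{task_id}) can be wrapped in a small client-side loop. A sketch with the HTTP call injected as a callable so the retry logic is testable without a live API — the response shape ({"status": ..., "result": ...}) is an assumption about the payload, not its documented contract:

```python
import time
from typing import Callable


def poll_task(fetch_status: Callable[[str], dict],
              task_id: str,
              interval: float = 1.0,
              max_attempts: int = 30) -> dict:
    """Poll the task-status endpoint until the Celery task settles.

    `fetch_status` abstracts the HTTP GET (e.g. a requests.get wrapper
    around /monitoring/task_status/{task_id}), so the loop can be
    unit-tested with a stub.
    """
    for attempt in range(max_attempts):
        payload = fetch_status(task_id)
        if payload.get("status") in {"SUCCESS", "FAILURE"}:
            return payload
        if attempt < max_attempts - 1:
            time.sleep(interval)
    raise TimeoutError(f"task {task_id} still pending after {max_attempts} polls")
```

The Streamlit UI's async-result polling follows the same pattern against these endpoints.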

Honest Assessment for Interviews

What to say:

"This is a portfolio project demonstrating a production-style MLOps lifecycle: data ingestion via Airflow, versioned datasets and reproducible pipelines via DVC, experiment tracking and model registry via MLflow, sync and async inference via FastAPI + Celery, Prometheus metrics, and CI/CD with GitLab. The full stack is Dockerized with K8s/Helm manifests."

What NOT to say:

❌ "Drift detection and Grafana dashboards are fully operational in production"
❌ "The model is production-calibrated and profitable"

What to highlight:

✅ "End-to-end pipeline: scraping → features → training → serving → monitoring"
✅ "Both sync (POST /predict) and async (POST /predict/async/) inference implemented"
✅ "DVC + MLflow ensure reproducibility and experiment traceability"
✅ "~200 tests: unit, property (Hypothesis), service, contract, and load (Locust)"
✅ "GitLab CI with SOPS secrets, Docker multi-stage builds, Helm-based K8s deploy"


Trade-offs & Learnings

Why Some Things Aren't Done

  1. Inference endpoint delay: Focused on getting the ML pipeline reproducible first (train before serve)
  2. Monitoring gaps: Infrastructure designed, integration deferred to avoid premature optimization
  3. Test coverage: Framework setup complete, expanding coverage systematically
  4. Model Registry manual: Automated promotion requires production traffic patterns to validate

What I'd Do Differently

  • Start with simpler deployment (FastAPI only, add workers later)
  • Write tests alongside features, not after
  • Implement basic monitoring earlier for feedback loop
  • Use feature flags for gradual rollout

References