
Implementation Status

This is the canonical source of truth for implementation readiness. All claims in other pages must be consistent with this page.

Last updated: April 28, 2026 (v1 cycle Phase 0–2 + Phase 4)

v1.0 scope: see Requirements — Definition of Done and Roadmap — v1.0 Demo Track for the binding 1–2-week scope. Items below that are not part of v1.0 remain at their current status until after v1.0.


Legend

  • ✅ Operational — Implemented, tested, and working in practice
  • 🚧 Partial — Partially implemented or requires manual steps
  • 📋 Planned — Designed but not yet implemented

Implementation Matrix

| Component | Status | Notes |
|---|---|---|
| **Data Engineering** | | |
| Airflow ETL | ✅ Operational | Scraping + PostgreSQL ingestion |
| MinIO Storage | ✅ Operational | S3-compatible object storage |
| DVC Versioning | ✅ Operational | Data + model artifacts tracked |
| PostgreSQL | ✅ Operational | Canonical data store |
| **ML Pipeline** | | |
| Feature Engineering | ✅ Operational | stats_matches.py — time-windowed stats |
| DVC Pipeline | ✅ Operational | dvc.yaml orchestration working |
| MLflow Tracking | ✅ Operational | Experiment logging functional |
| Train/Test Splitting | ✅ Operational | Time-based + CV folds |
| Model Training | 🚧 Partial | Baseline + XGBoost classifiers wired; smoke parameters active by default (classification.fracs_for_train=[0.001, 0.002], tuning.n_trials=2); a production-scale run is a v1.0 deliverable |
| Model Registry | 🚧 Partial | Registration automated; Staging→Production gate is manual |
| **Serving** | | |
| FastAPI App | ✅ Operational | Routers, middleware, lifespan; CORS allow-list driven by CORS_ALLOWED_ORIGINS, default empty = no cross-origin (config sketch below) |
| POST /predict | ✅ Operational | Sync inference via Celery ml queue, 30 s timeout (client sketch below) |
| GET /predict/{match_id} | ✅ Operational | Lookup from batch_inference parquet output |
| POST /predict/async/ | ✅ Operational | Async Celery job; returns task_id for polling |
| GET /predict/model/info | ✅ Operational | MLflow model metadata from registry |
| Request Validation | ✅ Operational | Pydantic schemas in src/app/schemas/predict.py |
| Model Loading | ✅ Operational | Lazy-loaded once per worker process via PredictionService |
| Batch Predictions API | 📋 Planned | DVC batch_inference stage exists; no HTTP batch endpoint |
| Streamlit UI | ✅ Operational | src/ui/app/main.py (livescores), pages/1_Predictions.py (1×2 outcome), pages/2_Model_Metrics.py (champion/challenger metadata + metrics, read-only); demo disclaimer rendered on every page |
| **Monitoring** | | |
| Prometheus Metrics | ✅ Operational | GET /metrics, _PrometheusMiddleware, 8 counters/histograms/gauges |
| Service Health | ✅ Operational | GET /healthcheck/, liveness probes |
| Celery Queue Stats | ✅ Operational | GET /monitoring/celery/queues, /celery/workers |
| Task Status Polling | ✅ Operational | GET /monitoring/task_status/{task_id} |
| Grafana Dashboards | 📋 Planned | Prometheus exporting; dashboards not yet deployed |
| Evidently Drift Detection | 📋 Planned | Not integrated |
| Alerting Rules | 📋 Planned | Runbooks documented; rules not deployed |
| **Infrastructure** | | |
| Docker Images | ✅ Operational | Multi-stage builds for API + workers |
| K8s Manifests | ✅ Operational | Deployments + Services + ConfigMaps |
| Helm Charts | ✅ Operational | Values + templates parameterized; nginx-ingress rate limit (rps/burst/connections) configurable via ingress.rateLimit |
| GitLab CI | ✅ Operational | Build + test + deploy pipeline |
| Secrets (SOPS) | ✅ Operational | age encryption for sensitive data |
| **Quality** | | |
| pytest Framework | ✅ Operational | 316 tests collected (unit, property, service, contract, load); see docs/planning/20260428_test_v2.md for the coverage-gap matrix |
| Unit Tests | ✅ Operational | tests/unit/ — splitting, schemas, preprocess |
| Property Tests | ✅ Operational | tests/property/ — Hypothesis: features, splitting, metrics |
| Service Tests | ✅ Operational | tests/service/ — prediction service, Celery tasks |
| Contract Tests | ✅ Operational | tests/contract/test_pipeline_contracts.py |
| Load Tests | ✅ Operational | tests/load/locustfile.py — Locust load scenarios |
| Integration Tests | 🚧 Partial | API mock tests; no live MLflow/Celery required in CI |
| Pre-commit Hooks | ✅ Operational | ruff + basic linting |
| Data Validation | ✅ Operational | Great Expectations suites at raw / finished / future / features stages |
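
The CORS behavior noted for the FastAPI app above follows a common env-driven allow-list pattern. Below is a minimal sketch, assuming CORS_ALLOWED_ORIGINS holds a comma-separated list of origins; the parsing and middleware wiring here are illustrative, not the app's actual code.

import os

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Empty default => no origins => middleware never added => no cross-origin access.
origins = [o.strip() for o in os.environ.get("CORS_ALLOWED_ORIGINS", "").split(",") if o.strip()]
if origins:
    app.add_middleware(
        CORSMiddleware,
        allow_origins=origins,  # explicit allow-list, no wildcard
        allow_methods=["*"],
        allow_headers=["*"],
    )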
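
For the sync and async prediction routes, a minimal client sketch. The endpoint paths and the task_id polling flow come from the matrix; the payload fields and response keys are illustrative assumptions, and src/app/schemas/predict.py remains authoritative.

import time

import requests

BASE = "http://localhost:8000"
payload = {"match_id": 12345}  # hypothetical field; see src/app/schemas/predict.py for the real schema

# Synchronous path: the API dispatches to the Celery ml queue and blocks
# until the result returns (30 s server-side timeout).
resp = requests.post(f"{BASE}/predict", json=payload, timeout=35)
resp.raise_for_status()
print(resp.json())

# Asynchronous path: submit the job, then poll task status with the returned task_id.
task_id = requests.post(f"{BASE}/predict/async/", json=payload, timeout=10).json()["task_id"]
for _ in range(30):  # poll for up to ~30 s
    status = requests.get(f"{BASE}/monitoring/task_status/{task_id}", timeout=10).json()
    if status.get("status") in {"SUCCESS", "FAILURE"}:  # Celery terminal states; response shape assumed
        print(status)
        break
    time.sleep(1)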

Known Limitations

  1. Model promotion — Registration is automated, but the Staging→Production gate requires manual approval; no automated promotion policy exists (a sketch of the manual step follows this list).
  2. Batch HTTP endpoint — batch_inference runs only as a DVC pipeline stage; there is no HTTP endpoint for batch requests.
  3. Grafana — Prometheus already exports 8 metrics; dashboards are not yet deployed.
  4. Evidently — Drift detection is architecturally designed; not yet integrated.
  5. Alerting — Alerting rules are documented in runbooks; not deployed in Alertmanager.
  6. Integration tests — All CI tests use mocks. No live Celery/MLflow dependency in CI.
  7. Feature store — Features are file-based Parquet; no dedicated online store.
  8. API authentication — All public endpoints are unauthenticated. Access is TLS-only.
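
On limitation 1: promotion is a deliberate human action against the MLflow registry. Below is a minimal sketch of that manual step using the standard MlflowClient API, where the registered model name "match_outcome_model" is a placeholder assumption.

from mlflow.tracking import MlflowClient

client = MlflowClient()  # resolves the registry from MLFLOW_TRACKING_URI

# Find the newest Staging version of the (hypothetically named) registered model.
staging = client.get_latest_versions("match_outcome_model", stages=["Staging"])[0]

# The manual gate: a human runs this after reviewing metrics in the MLflow UI.
client.transition_model_version_stage(
    name="match_outcome_model",
    version=staging.version,
    stage="Production",
    archive_existing_versions=True,  # demotes the previous Production version
)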

See Architecture Limitations for deployment-level constraints.


How to verify

# Reproduce the ML pipeline
dvc pull && dvc repro

# Inspect experiments
mlflow ui --port 5001

# Run the test suite
pytest tests/ -q

# Check API health
curl http://localhost:8000/healthcheck/

# Check Prometheus metrics
curl http://localhost:8000/metrics
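
# Inspect registered model metadata via the API
curl http://localhost:8000/predict/model/info

# Run the pipeline beyond the smoke-scale defaults noted in the matrix
# (the override value is illustrative, not a tuned choice)
dvc exp run --set-param tuning.n_trials=50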

See Quickstart for the full reproducibility path.