Review Guide¶
This page helps you navigate the documentation based on your role and time budget.
For what is built and what is not, see Implementation Status. For system design and architecture decisions, see Architecture Overview.
Who this is for¶
| Role | Where to start |
|---|---|
| Recruiter / Hiring Manager | 2-minute path |
| Technical Interviewer | 10-minute path, then drill into specifics |
| Engineer reviewing code | Quickstart → Code Structure |
| Author preparing for interview | Demo Guide |
2-minute path¶
"What is this project and is it real?"
- Implementation Status — canonical source on what is built vs planned.
- System diagram — how components connect.
- Key facts:
- Data source: WhoScored.com (Airflow scraper → PostgreSQL)
- ML: XGBoost classifier, temporal-split validation, MLflow tracking
- Serving: FastAPI + Celery async, 316 automated tests
- Infra: Docker, Kubernetes/Helm, GitLab CI, SOPS secrets
- Monitoring: Prometheus /metrics operational; Grafana/Evidently planned
10-minute path¶
"Is this person an ML engineer or just a data scientist?"
- Architecture Overview — design goals, C4 diagrams, layer separation
- Validation Strategy — temporal split, leakage prevention
- Training Pipeline (DVC) — reproducible, versioned, multi-stage
- Serving Status — what the inference API does today
- Evidence — MLflow runs, API responses, test output
Signal to look for: the system is designed around explicit contracts between layers — data schemas (Great Expectations), model signatures (MLflow), API schemas (Pydantic).
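As a minimal sketch of what such a boundary contract looks like (the model name and fields below are hypothetical illustrations, not the project's actual schema), a Pydantic model rejects malformed requests at the API edge before they can reach the classifier:

```python
from pydantic import BaseModel, Field, ValidationError


class MatchFeatures(BaseModel):
    """Hypothetical request schema for a match-outcome prediction endpoint."""
    home_team_id: int = Field(gt=0)
    away_team_id: int = Field(gt=0)
    home_form_last5: float = Field(ge=0.0, le=1.0)  # rolling win rate, 0..1


# A valid payload passes the boundary check.
ok = MatchFeatures(home_team_id=1, away_team_id=2, home_form_last5=0.6)

# An out-of-range payload is rejected at the edge, never reaching the model.
try:
    MatchFeatures(home_team_id=1, away_team_id=2, home_form_last5=1.7)
except ValidationError:
    print("rejected")
```

The same idea repeats at every layer boundary: the schema, not the consumer, is the source of truth for what crosses it.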
Technical deep-dive (15–20 minutes)¶
"Can this person design systems, not just use frameworks?"
- Architecture Trade-offs — documented decisions with alternatives considered
- ML Problem & Baseline — task formulation, why beating the bookmaker matters
- Feature Engineering — leakage-safe design, offline/online parity
- Model Contract & Signature — input/output schema enforced at boundary
- CI/CD Quality Gates — what runs before code ships
- ADR Decisions — orchestration, data versioning, serving modes
- Lessons Learned — honest retrospective
What this project proves (by competency)¶
| Competency | Evidence |
|---|---|
| Reproducibility | `dvc repro` from a clean checkout → same model. Verified by DVC lock + MLflow run IDs. |
| Validation rigor | Temporal split enforced in code, tested with hypothesis property tests. |
| Serving design | FastAPI with Pydantic schemas, sync + async via Celery, health endpoints. |
| Deployment readiness | Docker multi-stage, K8s manifests, Helm charts, GitLab CI pipeline. |
| Observability thinking | Prometheus /metrics, Celery queue stats, alerting runbooks. Grafana/Evidently planned. |
| Operational maturity | 316 tests (unit / property / service / contract / load), SOPS + age secrets. |
| System thinking | C4 diagrams, ADRs, explicit layer contracts, no cross-layer shortcuts. |
See Implementation Status for the full readiness matrix.
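The temporal-split invariant from the table above can be sketched in plain Python (the function and field names here are illustrative, not the project's actual code): every training sample must strictly predate every test sample, which is the property the project's tests reportedly enforce.

```python
from datetime import date


def temporal_split(matches, cutoff):
    """Split (match_date, features) records by date.

    Matches strictly before `cutoff` go to train, the rest to test,
    so no future information can leak into training.
    """
    train = [m for m in matches if m[0] < cutoff]
    test = [m for m in matches if m[0] >= cutoff]
    return train, test


matches = [
    (date(2023, 8, 12), {"home_form": 0.4}),
    (date(2023, 9, 3), {"home_form": 0.6}),
    (date(2024, 1, 20), {"home_form": 0.5}),
]
train, test = temporal_split(matches, cutoff=date(2024, 1, 1))

# Leakage check: the newest training match is older than the oldest test match.
assert max(d for d, _ in train) < min(d for d, _ in test)
```

A property-based test would generate random match lists and cutoffs and assert this same inequality for every split.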
Running the system locally¶
- Quickstart — environment setup and reproducible golden path
- Local Development & Debugging — day-to-day workflow
- Code Structure — directory layout and conventions
- Configuration Reference — all params and overrides
- Common Failures & Troubleshooting
Interview preparation¶
Use the Demo Guide — it contains a scripted 2/5/10-minute walkthrough, click paths, and answers to common questions.