
Review Guide

This page helps you navigate the documentation based on your role and time budget.

For what is built and what is not, see Implementation Status. For system design and architecture decisions, see Architecture Overview.


Who this is for

Role                             Where to start
Recruiter / Hiring Manager       2-minute path
Technical Interviewer            10-minute path, then drill into specifics
Engineer reviewing code          Quickstart, then Code Structure
Author preparing for interview   Demo Guide

2-minute path

"What is this project and is it real?"

  1. Implementation Status — canonical source on what is built vs planned.
  2. System diagram — how components connect.
  3. Key facts:
     - Data source: WhoScored.com (Airflow scraper → PostgreSQL)
     - ML: XGBoost classifier, temporal-split validation, MLflow tracking
     - Serving: FastAPI + Celery async, 316 automated tests
     - Infra: Docker, Kubernetes/Helm, GitLab CI, SOPS secrets
     - Monitoring: Prometheus /metrics operational; Grafana/Evidently planned

10-minute path

"Is this person an ML engineer or just a data scientist?"

  1. Architecture Overview — design goals, C4 diagrams, layer separation
  2. Validation Strategy — temporal split, leakage prevention
  3. Training Pipeline (DVC) — reproducible, versioned, multi-stage
  4. Serving Status — what the inference API does today
  5. Evidence — MLflow runs, API responses, test output
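The leakage-prevention idea behind the temporal split (item 2) fits in a few lines. Everything below is an illustrative sketch, not the project's actual code; the function and field names are hypothetical:

```python
from datetime import date

def temporal_split(matches, cutoff):
    """Train only on matches played before the cutoff and evaluate on the
    rest, so no training example can carry future information.
    Illustrative helper; not taken from the project's codebase."""
    train = [m for m in matches if m["kickoff"] < cutoff]
    test = [m for m in matches if m["kickoff"] >= cutoff]
    return train, test

# Made-up fixtures spanning one cutoff date.
matches = [
    {"id": 1, "kickoff": date(2023, 8, 12)},
    {"id": 2, "kickoff": date(2024, 1, 20)},
    {"id": 3, "kickoff": date(2024, 5, 4)},
]
train, test = temporal_split(matches, cutoff=date(2024, 1, 1))

# Every training match predates every evaluation match.
assert max(m["kickoff"] for m in train) < min(m["kickoff"] for m in test)
```

A random shuffle split would violate that final assertion, which is exactly the property the project's Hypothesis tests are described as checking.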

Signal to look for: the system is designed around explicit contracts between layers — data schemas (Great Expectations), model signatures (MLflow), API schemas (Pydantic).
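As a rough illustration of such a boundary contract — using stdlib dataclasses as a stand-in for Pydantic, with hypothetical field names rather than the project's real schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PredictionResponse:
    """Stdlib stand-in for a Pydantic response schema; the field names
    are invented for illustration, not the project's actual contract."""
    home_win: float
    draw: float
    away_win: float

    def __post_init__(self):
        probs = (self.home_win, self.draw, self.away_win)
        if any(not 0.0 <= p <= 1.0 for p in probs):
            raise ValueError("probabilities must lie in [0, 1]")
        if abs(sum(probs) - 1.0) > 1e-6:
            raise ValueError("probabilities must sum to 1")

# A well-formed response passes validation at construction time.
resp = PredictionResponse(home_win=0.48, draw=0.27, away_win=0.25)
```

The point of enforcing the schema at the boundary is that a malformed payload fails loudly at construction rather than propagating downstream.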


Technical deep-dive (15–20 minutes)

"Can this person design systems, not just use frameworks?"

  1. Architecture Trade-offs — documented decisions with alternatives considered
  2. ML Problem & Baseline — task formulation, why beating the bookmaker matters
  3. Feature Engineering — leakage-safe design, offline/online parity
  4. Model Contract & Signature — input/output schema enforced at boundary
  5. CI/CD Quality Gates — what runs before code ships
  6. ADR Decisions — orchestration, data versioning, serving modes
  7. Lessons Learned — honest retrospective
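On item 2: a common baseline against a bookmaker is the implied probability of the published odds, after normalising away the margin. A hedged sketch (the decimal odds here are made up):

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds to probabilities, normalising away the
    bookmaker's margin (overround). Illustrative helper, not project code."""
    raw = [1.0 / o for o in decimal_odds]
    overround = sum(raw)  # > 1.0 because of the bookmaker's margin
    return [p / overround for p in raw]

# Hypothetical home / draw / away odds for one match.
probs = implied_probabilities([2.10, 3.40, 3.60])
assert abs(sum(probs) - 1.0) < 1e-9
```

A model that cannot beat this normalised market baseline on a held-out temporal split adds no predictive value over the bookmaker, which is why the baseline comparison matters.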

What this project proves (by competency)

Competency               Evidence
Reproducibility          dvc repro from a clean checkout yields the same model; verified by the DVC lock file and MLflow run IDs.
Validation rigor         Temporal split enforced in code, tested with Hypothesis property tests.
Serving design           FastAPI with Pydantic schemas, sync + async via Celery, health endpoints.
Deployment readiness     Multi-stage Docker builds, K8s manifests, Helm charts, GitLab CI pipeline.
Observability thinking   Prometheus /metrics, Celery queue stats, alerting runbooks; Grafana/Evidently planned.
Operational maturity     316 tests (unit / property / service / contract / load), SOPS + age secrets.
System thinking          C4 diagrams, ADRs, explicit layer contracts, no cross-layer shortcuts.
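On the observability row: a Prometheus /metrics endpoint returns the text exposition format. A minimal parser for its simplest lines shows what that payload looks like (the sample is invented, and real metrics may carry labels this sketch ignores):

```python
def parse_metrics(text):
    """Parse a tiny subset of the Prometheus text exposition format:
    bare `name value` lines, skipping comments and blanks.
    Illustrative only; real scrapes use a proper client library."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        metrics[parts[0]] = float(parts[1])
    return metrics

# Invented sample payload in exposition format.
sample = """\
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total 1027
process_open_fds 12
"""
m = parse_metrics(sample)
```

Checking that the endpoint emits parseable exposition-format output is a cheap smoke test even before Grafana dashboards exist.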

See Implementation Status for the full readiness matrix.


Running the system locally

  1. Quickstart — environment setup and reproducible golden path
  2. Local Development & Debugging — day-to-day workflow
  3. Code Structure — directory layout and conventions
  4. Configuration Reference — all params and overrides
  5. Common Failures & Troubleshooting

Interview preparation

Use the Demo Guide — it contains scripted 2-, 5-, and 10-minute walkthroughs, click paths, and answers to common questions.