Review Guide¶
This page helps you navigate the documentation based on your role and time budget.
For what is built and what is not, see Implementation Status. For system design and architecture decisions, see Architecture Overview.
Who this is for¶
| Role | Where to start |
|---|---|
| Recruiter / Hiring Manager | 2-minute path |
| Technical Interviewer | 10-minute path, then drill into specifics |
| Engineer reviewing code | Quickstart → Code Structure |
| Author preparing for interview | Demo Guide |
2-minute path¶
"What is this project and is it real?"
- Implementation Status — canonical source on what is built vs planned.
- System diagram — how components connect.
- Key facts:
- Data source: WhoScored.com (Airflow scraper → PostgreSQL)
- ML: XGBoost classifier, temporal-split validation, MLflow tracking
- Serving: FastAPI + Celery async, 316 automated tests
- Infra: Docker, Kubernetes/Helm, GitLab CI, SOPS secrets
- Monitoring: Prometheus /metrics operational; Grafana/Evidently planned
10-minute path¶
"Is this person an ML engineer or just a data scientist?"
- Architecture Overview — design goals, C4 diagrams, layer separation
- Validation Strategy — temporal split, leakage prevention
- Training Pipeline (DVC) — reproducible, versioned, multi-stage
- Serving Status — what the inference API does today
- Evidence — MLflow runs, API responses, test output
Signal to look for: the system is designed around explicit contracts between layers — data schemas (Great Expectations), model signatures (MLflow), API schemas (Pydantic).
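As a minimal sketch of what such a boundary contract looks like (the model name and fields below are hypothetical illustrations, not the project's actual schema), a Pydantic model rejects malformed requests at the API edge before they can reach the classifier:

```python
from pydantic import BaseModel, Field, ValidationError


class MatchFeatures(BaseModel):
    """Hypothetical request schema for a match-outcome prediction endpoint."""
    home_team_id: int = Field(gt=0)
    away_team_id: int = Field(gt=0)
    home_form_last5: float = Field(ge=0.0, le=1.0)  # rolling win rate, 0..1


# A valid payload passes the boundary check.
ok = MatchFeatures(home_team_id=1, away_team_id=2, home_form_last5=0.6)

# An out-of-range payload is rejected at the edge, never reaching the model.
try:
    MatchFeatures(home_team_id=1, away_team_id=2, home_form_last5=1.7)
except ValidationError:
    print("rejected")
```

The same idea repeats at every layer boundary: the schema, not the consumer, is the source of truth for what crosses it.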
Technical deep-dive (15–20 minutes)¶
"Can this person design systems, not just use frameworks?"
- Architecture Trade-offs — documented decisions with alternatives considered
- ML Problem & Baseline — task formulation, why beating the bookmaker matters
- Feature Engineering — leakage-safe design, offline/online parity
- Model Contract & Signature — input/output schema enforced at boundary
- CI/CD Quality Gates — what runs before code ships
- ADR Decisions — orchestration, data versioning, serving modes
- Lessons Learned — honest retrospective
What this project proves (by competency)¶
| Competency | Evidence |
|---|---|
| Reproducibility | `dvc repro` from a clean checkout → same model. Verified by DVC lock + MLflow run IDs. |
| Validation rigor | Temporal split enforced in code, tested with hypothesis property tests. |
| Serving design | FastAPI with Pydantic schemas, sync + async via Celery, health endpoints. |
| Deployment readiness | Docker multi-stage, K8s manifests, Helm charts, GitLab CI pipeline. |
| Observability thinking | Prometheus /metrics, Celery queue stats, alerting runbooks. Grafana/Evidently planned. |
| Operational maturity | 316 tests (unit / property / service / contract / load), SOPS + age secrets. |
| System thinking | C4 diagrams, ADRs, explicit layer contracts, no cross-layer shortcuts. |
See Implementation Status for the full readiness matrix.
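The temporal-split invariant from the table above can be sketched in plain Python (the function and field names here are illustrative, not the project's actual code): every training sample must strictly predate every test sample, which is the property the project's tests reportedly enforce.

```python
from datetime import date


def temporal_split(matches, cutoff):
    """Split (match_date, features) records by date.

    Matches strictly before `cutoff` go to train, the rest to test,
    so no future information can leak into training.
    """
    train = [m for m in matches if m[0] < cutoff]
    test = [m for m in matches if m[0] >= cutoff]
    return train, test


matches = [
    (date(2023, 8, 12), {"home_form": 0.4}),
    (date(2023, 9, 3), {"home_form": 0.6}),
    (date(2024, 1, 20), {"home_form": 0.5}),
]
train, test = temporal_split(matches, cutoff=date(2024, 1, 1))

# Leakage check: the newest training match is older than the oldest test match.
assert max(d for d, _ in train) < min(d for d, _ in test)
```

A property-based test would generate random match lists and cutoffs and assert this same inequality for every split.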
Running the system locally¶
- Quickstart — environment setup and reproducible golden path
- Local Development & Debugging — day-to-day workflow
- Code Structure — directory layout and conventions
- Configuration Reference — all params and overrides
- Common Failures & Troubleshooting
Interview preparation¶
Use the Demo Guide — it contains a scripted 2/5/10-minute walkthrough, click paths, and answers to common questions.