
Orchestration Audit Report — SoccerPredictAI

Date: 2026-04-28
Auditor: GitHub Copilot (Claude Opus 4.7), /skill-ml-system-audit full (audit 08/12)
Scope: Airflow DAGs (scheduling, dependency graph, retrain loop, failure handling)
Baseline: docs/validation/20260424/08_orchestration_audit.md


Delta vs baseline

No changes to airflow/dags/*.py since 2026-04-26, so all baseline findings remain in force.


Confirmed inventory (5 DAGs)

| DAG | Schedule | Purpose |
|---|---|---|
| soccer_etl_livescores_01 | @hourly | 3-day rolling livescores scraping |
| soccer_etl_livescores_02_next_matches | @daily | Upcoming matches, 15-day forward window |
| soccer_etl_livescores_backfill_monthly | None (manual) | Historical backfill (1998 → 2026-02) |
| soccer_etl_livescores_manual_trigger | None (manual) | Manual trigger over a 90-day window |
| soccer_etl_export_matches_to_source | None (manual) | PostgreSQL → MinIO export (match, match_raw) |

All DAGs use retries=3 and retry_delay=5min. All HttpSensors use poke_interval=60 and timeout=3600 (both in seconds) with mode="reschedule". Authentication sends an X-Token header whose value is read from the Airflow Variable SOCCER_FASTAPI_HEADER_TOKEN.
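
The sketch below is a plain-Python, back-of-the-envelope check of what these settings allow. It assumes retry_exponential_backoff is left at its Airflow default (False), which the audit does not state. It also treats poke_interval and timeout as seconds, which is Airflow's unit for both. The helper names are hypothetical.

```python
from datetime import timedelta

# Settings as reported in this audit (shared by all DAGs / HttpSensors).
RETRIES = 3
RETRY_DELAY = timedelta(minutes=5)
POKE_INTERVAL_S = 60
SENSOR_TIMEOUT_S = 3600


def max_attempts(retries: int) -> int:
    """The first try plus one extra try per configured retry."""
    return retries + 1


def total_retry_wait(retries: int, delay: timedelta) -> timedelta:
    """Waiting time between tries, assuming a constant delay (no exponential backoff)."""
    return retries * delay


def pokes_per_window(poke_interval_s: int, timeout_s: int) -> int:
    """Approximate number of poke cycles that fit into one sensor timeout window."""
    return timeout_s // poke_interval_s


if __name__ == "__main__":
    print(max_attempts(RETRIES))                                # 4
    print(total_retry_wait(RETRIES, RETRY_DELAY))               # 0:15:00
    print(pokes_per_window(POKE_INTERVAL_S, SENSOR_TIMEOUT_S))  # 60
```

In other words, a persistently failing task gets four tries with 15 minutes of scheduled waiting between them. Each sensor has roughly 60 chances per one-hour window to see the FastAPI endpoint respond. Reschedule mode releases the worker slot between pokes instead of holding it.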

Missing DAGs: there is no DAG for dvc repro, batch_inference, model retraining, alias promotion, or worker reload.
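
The backfill DAG's upper bound (2026-02) is hardcoded; the risk register tracks this as OR-06. The sketch below derives the monthly windows instead. It rests on three assumptions: the backfill iterates over calendar months, the 1998 lower bound means January 1998, and the upper bound should be the last complete month relative to the run date. All names are hypothetical.

```python
from __future__ import annotations

from datetime import date

# Assumption: the historical lower bound is January 1998.
BACKFILL_START = date(1998, 1, 1)


def last_complete_month(today: date) -> date:
    """First day of the calendar month before the month containing `today`."""
    if today.month == 1:
        return date(today.year - 1, 12, 1)
    return date(today.year, today.month - 1, 1)


def month_windows(start: date, last_month: date) -> list[tuple[date, date]]:
    """Half-open [month_start, next_month_start) windows from start's month through last_month."""
    windows = []
    year, month = start.year, start.month
    while (year, month) <= (last_month.year, last_month.month):
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        windows.append((date(year, month, 1), date(next_year, next_month, 1)))
        year, month = next_year, next_month
    return windows


if __name__ == "__main__":
    windows = month_windows(BACKFILL_START, last_complete_month(date.today()))
    print(windows[0], windows[-1])
```

On the report date (2026-04-28), last_complete_month returns 2026-03-01. The range would therefore extend one month past the hardcoded 2026-02 without a code change.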


Risk register (re-confirmed)

| ID | Severity | Description | Status |
|---|---|---|---|
| OR-01 | P0 | No DAG for dvc repro; the retrain loop is fully manual | Open (= R2) |
| OR-02 | P0 | No DAG for batch_inference; serving features go stale | Open (= R5) |
| OR-03 | P0 | soccer_etl_export_matches_to_source is manual-only; there is no automatic raw-data refresh | Open (= D-03) |
| OR-04 | P1 | No DAG-level failure alerting (email/Slack) | Open |
| OR-05 | P1 | No automatic retrain trigger (drift, metric, or volume) | Open |
| OR-06 | P2 | backfill_monthly uses hardcoded date bounds (1998 → 2026-02) | Open |
| OR-07 | P2 | No DAG monitors freshness across pipeline stages | Open |
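
OR-05 calls for an automatic retrain trigger on drift, metric, or volume signals. The sketch below is a minimal decision function that assumes those three signals are already computed somewhere upstream. The field names, the "lower metric is better" convention (as for log-loss), and every threshold are illustrative placeholders, not values from the project.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RetrainSignals:
    drift_score: float      # hypothetical drift statistic over key features
    live_metric: float      # hypothetical recent metric on settled matches (lower is better)
    baseline_metric: float  # same metric recorded for the currently served model
    new_matches: int        # matches ingested since the last training run


@dataclass(frozen=True)
class RetrainPolicy:
    max_drift: float = 0.2                  # placeholder threshold
    max_relative_degradation: float = 0.05  # placeholder: 5% worse than baseline
    min_new_matches: int = 500              # placeholder volume trigger


def retrain_reasons(signals: RetrainSignals, policy: RetrainPolicy) -> list[str]:
    """Return the triggers that fired; an empty list means 'do not retrain'."""
    reasons: list[str] = []
    if signals.drift_score > policy.max_drift:
        reasons.append("drift")
    if signals.live_metric > signals.baseline_metric * (1 + policy.max_relative_degradation):
        reasons.append("metric")
    if signals.new_matches >= policy.min_new_matches:
        reasons.append("volume")
    return reasons


if __name__ == "__main__":
    signals = RetrainSignals(drift_score=0.31, live_metric=1.02, baseline_metric=1.00, new_matches=120)
    print(retrain_reasons(signals, RetrainPolicy()))  # ['drift']
```

In Airflow, a check like this could back a ShortCircuitOperator at the head of a retrain DAG. Downstream tasks would then be skipped whenever the list of reasons is empty.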

Pipeline → Registry → Serving gap

[Airflow] scraping → PostgreSQL                ✅ automatic (@hourly, @daily)
[Manual]  PostgreSQL → MinIO export            ⚠
[Manual]  MinIO → DVC repro                    ⚠
[Manual]  DVC → MLflow Registry                ⚠
[Manual]  Registry → Celery worker restart     ⚠

Four manual hops (export, DVC repro, MLflow registration, worker restart) separate fresh data from updated predictions.
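
Because each hop is triggered by hand, nothing currently notices when one stage falls behind the others (OR-07). The sketch below is a minimal freshness check that compares each stage's last-update time with a per-stage maximum age. The stage names and age limits are hypothetical. It assumes the timestamps are collected elsewhere, for example from the newest row in PostgreSQL or the newest exported object in MinIO.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Hypothetical per-stage freshness limits, ordered along the pipeline.
MAX_AGE = {
    "postgres_livescores": timedelta(hours=2),  # fed by the @hourly DAG
    "minio_export": timedelta(days=7),
    "dvc_repro": timedelta(days=7),
    "mlflow_registry": timedelta(days=7),
    "worker_model": timedelta(days=7),
}


def stale_stages(last_updated: dict[str, datetime], max_age: dict[str, timedelta], now: datetime) -> list[str]:
    """Stages whose last update is missing or older than their allowed age, in max_age order."""
    stale = []
    for stage, limit in max_age.items():
        ts = last_updated.get(stage)
        if ts is None or now - ts > limit:
            stale.append(stage)
    return stale


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    observed = {
        "postgres_livescores": now - timedelta(minutes=30),
        "minio_export": now - timedelta(days=10),
    }
    print(stale_stages(observed, MAX_AGE, now))
    # ['minio_export', 'dvc_repro', 'mlflow_registry', 'worker_model']
```

A stage with no recorded timestamp counts as stale. A manual hop that has never run therefore surfaces immediately instead of passing silently.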


Summary

| Aspect | Status |
|---|---|
| DAG inventory + scheduling | ✅ |
| Retry / timeout / reschedule semantics | ✅ |
| Sequential dependencies in export DAG | ✅ |
| Automated retrain loop | ❌ (OR-01, OR-02, OR-03) |
| Failure alerting | ❌ (OR-04) |
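
One option for OR-04 is an on_failure_callback set in each DAG's default_args. Airflow calls the callback with the task context, and its task_instance exposes dag_id, task_id, try_number and log_url. The callback fires once retries are exhausted, not on each retry. The sketch below formats that context into a Slack incoming-webhook payload and posts it using only the standard library. The webhook URL and where it is stored are assumptions. Airflow's built-in email_on_failure or the Slack provider package are alternatives.

```python
import json
import urllib.request


def build_failure_message(context: dict) -> dict:
    """Build a Slack incoming-webhook payload from an Airflow task context."""
    ti = context["task_instance"]
    text = (
        f"Airflow task failed: {ti.dag_id}.{ti.task_id} (try {ti.try_number})\n"
        f"Logs: {ti.log_url}"
    )
    return {"text": text}


def post_json(url: str, payload: dict, timeout_s: float = 10.0) -> None:
    """POST a JSON payload; raises urllib.error.URLError / HTTPError on failure."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        response.read()


def make_failure_callback(webhook_url: str):
    """Return a callable suitable for on_failure_callback (called with the context)."""
    def on_failure(context: dict) -> None:
        post_json(webhook_url, build_failure_message(context))
    return on_failure


# Hypothetical wiring in a DAG file (SLACK_WEBHOOK_URL is a placeholder):
# default_args = {"retries": 3, "on_failure_callback": make_failure_callback(SLACK_WEBHOOK_URL)}
```

Keeping message formatting separate from the HTTP call lets the formatting be tested without network access or a running Airflow.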

Recommendation: OR-01, OR-02 and OR-03 together form the most operationally painful gap. Address them with a single soccer_retrain_pipeline DAG (export → repro → batch_inference → registry validation → worker rolling restart), and add failure alerting (OR-04) in parallel.
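
The recommended chain can be sanity-checked before any Airflow code exists. The sketch below models it as a dependency graph with the standard-library graphlib module, following the order given in the recommendation. It also simulates Airflow's default all_success trigger rule: a failed task leaves everything downstream unrun, and Airflow marks those tasks upstream_failed. The task ids are hypothetical. In a DAG file, the same chain would be declared with the >> operator.

```python
from __future__ import annotations

from graphlib import TopologicalSorter

# Hypothetical task ids for the proposed soccer_retrain_pipeline DAG,
# mapped to the set of tasks each one depends on (its upstream tasks).
RETRAIN_PIPELINE: dict[str, set[str]] = {
    "export_matches_to_minio": set(),
    "dvc_repro": {"export_matches_to_minio"},
    "batch_inference": {"dvc_repro"},
    "validate_registry": {"batch_inference"},
    "rolling_restart_workers": {"validate_registry"},
}


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Dependency-respecting order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(graph).static_order())


def tasks_attempted(graph: dict[str, set[str]], failing: set[str]) -> list[str]:
    """Tasks that would run under all_success if the tasks in `failing` fail."""
    attempted: list[str] = []
    blocked: set[str] = set()  # failed tasks and everything downstream of them
    for task in execution_order(graph):
        if graph.get(task, set()) & blocked:
            blocked.add(task)  # an upstream task failed: this one never runs
            continue
        attempted.append(task)
        if task in failing:
            blocked.add(task)
    return attempted
```

For example, tasks_attempted(RETRAIN_PIPELINE, {"validate_registry"}) stops before rolling_restart_workers. If registry validation fails, no worker is restarted onto an unvalidated model.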

See baseline §1–§5 for code-level detail.