Sync vs Async Inference Modes

Status: ✅ Implemented


Motivation

Different consumers have different requirements:

  • Interactive users require low latency.
  • Batch or heavy workloads tolerate higher latency.


Synchronous inference

Endpoints: POST /predict/ (inline features) and GET /predict/{match_id} (precomputed feature lookup)

Implementation:

  • The FastAPI handler submits a predict_match task to the Celery ml queue.
  • It blocks for up to _SYNC_TIMEOUT = 30 s via loop.run_in_executor, which keeps the asyncio event loop free while waiting.
  • On success it returns a PredictResponse directly; on timeout it returns 504.
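The blocking-wait-with-timeout pattern above can be sketched with the standard library alone. Here, blocking_predict is a hypothetical stand-in for fetching the Celery task result; the real handler would call the task's AsyncResult.get() instead:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

_SYNC_TIMEOUT = 30  # seconds; matches the handler's SLO

_executor = ThreadPoolExecutor(max_workers=4)

def blocking_predict(match_id: int) -> dict:
    # Hypothetical stand-in for blocking on the Celery task result.
    return {"match_id": match_id, "home_win_prob": 0.61}

async def predict_sync(match_id: int, timeout: float = _SYNC_TIMEOUT) -> dict:
    loop = asyncio.get_running_loop()
    try:
        # run_in_executor moves the blocking call to a worker thread,
        # so the asyncio event loop stays free to serve other requests.
        return await asyncio.wait_for(
            loop.run_in_executor(_executor, blocking_predict, match_id),
            timeout=timeout,
        )
    except asyncio.TimeoutError:
        # The real handler maps this to an HTTP 504 response.
        return {"error": "timeout", "status_code": 504}

result = asyncio.run(predict_sync(42))
```

The timeout is enforced on the awaiting side, so a slow worker never holds an HTTP connection open beyond the SLO.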

Endpoint list:

POST /predict/                # inline features
GET  /predict/{match_id}      # precomputed feature lookup
GET  /predict/matches/        # list upcoming matches
GET  /predict/model/info      # MLflow registry metadata

Characteristics:

  • Strict 30 s SLO.
  • Bounded payload size via the Pydantic schema.
  • Immediate failure feedback (4xx/5xx).

When to use:

  • UI-driven predictions.
  • Real-time decision support.


Asynchronous inference

Endpoint: POST /predict/async/

Implementation:

  • FastAPI submits a predict_match task to the RabbitMQ ml queue and returns a task_id immediately.
  • PredictionService is initialised once per worker process via the worker_process_init signal, avoiding a reload of the MLflow model for every task.
  • Results are stored in the Celery result backend and retrieved via GET /monitoring/task_status/{task_id}.
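The once-per-worker initialisation can be sketched as a process-local singleton. In the real app, on_worker_process_init would be connected to celery.signals.worker_process_init; the PredictionService body here is a hypothetical stand-in for the expensive MLflow model load:

```python
LOAD_COUNT = 0  # counts how many times the "model" is loaded

class PredictionService:
    def __init__(self):
        global LOAD_COUNT
        LOAD_COUNT += 1          # stands in for the expensive MLflow model load
        self.model = object()

_service = None  # one instance per worker process

def on_worker_process_init(**kwargs):
    # In the real app: @celery.signals.worker_process_init.connect
    global _service
    _service = PredictionService()

def predict_match(match_id: int) -> dict:
    # Task body reuses the process-wide service instead of reloading per task.
    assert _service is not None, "worker_process_init has not run"
    return {"match_id": match_id}

on_worker_process_init()
results = [predict_match(i) for i in range(3)]
```

Three tasks run, but the model loads exactly once, which is the point of hooking initialisation to the worker-process lifecycle rather than the task lifecycle.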

Polling:

# Submit
curl -X POST /predict/async/ -d '{"match_id": 42}'
# → {"task_id": "abc-123", "status": "submitted", "status_url": "/monitoring/task_status/abc-123"}

# Poll
curl /monitoring/task_status/abc-123
# → {"status": "SUCCESS", "result": {...}}
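A client-side polling loop for the flow above might look like this. get_task_status is a simulated stand-in for an HTTP GET against /monitoring/task_status/{task_id}; here it returns PENDING, then STARTED, then SUCCESS:

```python
import time

# Simulated status endpoint: the real client would issue an HTTP GET
# to /monitoring/task_status/{task_id} instead.
_states = iter(["PENDING", "STARTED", "SUCCESS"])

def get_task_status(task_id: str) -> dict:
    status = next(_states)
    result = {"match_id": 42} if status == "SUCCESS" else None
    return {"status": status, "result": result}

def poll(task_id: str, interval: float = 0.01, max_attempts: int = 10) -> dict:
    """Poll until the task reaches a terminal Celery state."""
    for _ in range(max_attempts):
        payload = get_task_status(task_id)
        if payload["status"] in ("SUCCESS", "FAILURE"):
            return payload
        time.sleep(interval)  # a real client might back off exponentially
    raise TimeoutError(f"task {task_id} did not finish")

final = poll("abc-123")
```

Bounding the number of attempts keeps a stuck task from pinning the client forever; the Streamlit UI applies the same idea on a timer.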

Characteristics:

  • Higher throughput via the task queue.
  • Retries and backoff managed by Celery.
  • Decoupled request/response lifecycle.

When to use:

  • Streamlit UI polling for results.
  • Computationally expensive feature assembly.
  • Batch workloads.


Operational trade-offs

Aspect        Sync                Async
Latency       Low (≤30 s SLO)     Higher (queue wait)
Throughput    Limited             High
Complexity    Lower               Higher
Failure mode  Immediate (504)     Deferred
UX            Direct response     Poll status_url

Safety considerations

  • Async jobs are idempotent: re-submitting the same match_id is safe.
  • Retries are bounded by Celery config.
  • Dead-letter queue configured for failed tasks.
  • Prometheus counters: prediction_requests_total{source="sync|async"}, prediction_timeouts_total.
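The bounded-retry behaviour can be sketched without a broker. The decorator below mirrors what Celery's autoretry_for/max_retries/retry_backoff task options provide; flaky_predict and the specific numbers are hypothetical:

```python
import functools
import time

def bounded_retries(max_retries: int = 3, base_delay: float = 0.01):
    """Retry with exponential backoff, capped at max_retries attempts."""
    def deco(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_retries:
                        raise  # exhausted: Celery would route this to the DLQ
                    time.sleep(base_delay * 2 ** attempt)
        return wrapper
    return deco

ATTEMPTS = 0

@bounded_retries(max_retries=3)
def flaky_predict(match_id: int) -> dict:
    global ATTEMPTS
    ATTEMPTS += 1
    if ATTEMPTS < 3:
        raise ConnectionError("transient broker error")  # simulated failure
    return {"match_id": match_id}

result = flaky_predict(42)
```

Because re-submission is idempotent, retrying a prediction task (or replaying it from the dead-letter queue) cannot corrupt state, only repeat work.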