# Inference API Contract
This page is the canonical reference for the inference API surface: all implemented endpoints, their request/response schemas, and error semantics.
For concrete examples see Examples. For model input/output contract see ML: Model Contract.
## Implemented endpoints
### POST /predict/
Synchronous prediction endpoint.
Submits an inference task to the Celery ml queue and blocks until the result is returned
(30 s hard timeout).
Request
```json
{
  "match_id": 99,
  "features": {
    "diff_win_5_mean": 0.3,
    "diff_goals_for_3_mean": 0.6,
    "home_elo_pre": 1520.0,
    "sex": 0
  }
}
```
| Field | Type | Description |
|---|---|---|
| `match_id` | `int` | Identifier for traceability |
| `features` | `dict[str, float \| int \| null]` | Feature dict — validated against MLflow model signature |
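A minimal client-side sketch of this request shape. The `validate_payload` helper is illustrative, not part of the service; the real validation is done server-side by Pydantic.

```python
def validate_payload(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []
    if not isinstance(payload.get("match_id"), int):
        problems.append("match_id must be an int")
    features = payload.get("features")
    if not isinstance(features, dict):
        problems.append("features must be a dict")
    else:
        # Feature values may be numeric or null, per the table above.
        for name, value in features.items():
            if value is not None and not isinstance(value, (int, float)):
                problems.append(f"feature {name!r} must be numeric or null")
    return problems

payload = {
    "match_id": 99,
    "features": {"diff_win_5_mean": 0.3, "home_elo_pre": 1520.0, "sex": 0},
}
print(validate_payload(payload))  # → []
```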
Response — 200 OK
```json
{
  "match_id": 99,
  "prediction": {
    "predicted_class": 0,
    "probabilities": {"0": 0.58, "1": 0.27, "2": 0.15},
    "model_version": "Production",
    "model_run_id": "3f7a1c9d2e4b"
  }
}
```
| Field | Type | Description |
|---|---|---|
| `predicted_class` | `int` | Argmax of probabilities: 0 = Home Win, 1 = Draw, 2 = Away Win |
| `probabilities` | `dict[str, float]` | Per-class probabilities |
| `model_version` | `str` | MLflow model stage/alias used |
| `model_run_id` | `str` | MLflow run ID for full traceability |
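Since `predicted_class` is defined as the argmax of `probabilities`, a client can sanity-check a response in a few lines. A sketch using the sample response above:

```python
response = {
    "predicted_class": 0,
    "probabilities": {"0": 0.58, "1": 0.27, "2": 0.15},
    "model_version": "Production",
    "model_run_id": "3f7a1c9d2e4b",
}

probs = response["probabilities"]
# Keys are class labels serialized as strings; convert back to int to compare.
argmax_class = int(max(probs, key=probs.get))
assert argmax_class == response["predicted_class"]
assert abs(sum(probs.values()) - 1.0) < 1e-9  # probabilities form a distribution
print(argmax_class)  # → 0
```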
Error responses
| Code | Condition |
|---|---|
| 422 Unprocessable Entity | Pydantic schema validation failure |
| 504 Gateway Timeout | Celery worker did not respond within 30 s |
| 500 Internal Server Error | Unhandled inference error |
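One reasonable client-side retry policy derived from this table (an assumption, not part of the contract): retry only the transient 504, never the 422 client error. A sketch:

```python
# Hypothetical retry policy for /predict/ clients:
# - 504: Celery worker timeout, transient, safe to retry
# - 422: client error (bad payload), retrying cannot help
# - 500: unhandled server error, treated here as non-retryable
RETRYABLE_STATUS = {504}

def should_retry(status_code: int) -> bool:
    return status_code in RETRYABLE_STATUS

print([should_retry(code) for code in (504, 422, 500)])  # → [True, False, False]
```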
### GET /predict/{match_id}
Batch-lookup endpoint. Returns a pre-computed prediction from the batch_inference DVC
pipeline output stored as a parquet file.
Response — 200 OK: same schema as POST /predict/ response.
Error responses
| Code | Condition |
|---|---|
| 404 Not Found | `match_id` not found in batch parquet |
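A sketch of the lookup semantics, with a plain dict standing in for the batch parquet (the real service reads the `batch_inference` pipeline output from disk):

```python
# The batch_inference pipeline output, modeled here as a dict keyed by match_id.
batch_predictions = {
    99: {
        "predicted_class": 0,
        "probabilities": {"0": 0.58, "1": 0.27, "2": 0.15},
        "model_version": "Production",
        "model_run_id": "3f7a1c9d2e4b",
    },
}

def lookup(match_id: int) -> tuple[int, dict]:
    """Return (status_code, body) mimicking GET /predict/{match_id}."""
    if match_id not in batch_predictions:
        return 404, {"detail": f"match_id {match_id} not found"}
    return 200, {"match_id": match_id, "prediction": batch_predictions[match_id]}

print(lookup(99)[0], lookup(12345)[0])  # → 200 404
```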
### GET /predict/matches/
Returns a list of upcoming matches available for prediction.
Response — 200 OK
### GET /predict/model/info
Returns metadata about the currently loaded model from the MLflow Registry.
Response — 200 OK
```json
{
  "model_name": "soccer_model",
  "model_version": "Production",
  "model_run_id": "3f7a1c9d2e4b",
  "loaded": true
}
```
### POST /predict/async/
Asynchronous prediction endpoint. Enqueues an inference task on the Celery ml queue
and returns a task_id immediately without waiting for the result.
Request: same schema as POST /predict/.
Response — 202 Accepted
```json
{
  "task_id": "abc-123-def-456",
  "status": "submitted",
  "status_url": "/monitoring/task_status/abc-123-def-456"
}
```
### GET /monitoring/task_status/{task_id}
Polls the result of an async task submitted via POST /predict/async/.
Response — pending
Response — success
```json
{
  "task_id": "abc-123-def-456",
  "status": "success",
  "result": {
    "predicted_class": 0,
    "probabilities": {"0": 0.58, "1": 0.27, "2": 0.15},
    "model_version": "Production",
    "model_run_id": "3f7a1c9d2e4b"
  }
}
```
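A typical async client submits to `POST /predict/async/`, then polls the returned `status_url` until the task completes. A sketch of the polling loop, with the HTTP call faked so the flow is self-contained (`fetch_status` stands in for the real GET request):

```python
import time

def poll_task(fetch_status, task_id: str, interval: float = 0.5, max_polls: int = 60):
    """Poll until the task reports success; fetch_status(task_id) returns the JSON body."""
    for _ in range(max_polls):
        body = fetch_status(task_id)
        if body["status"] == "success":
            return body["result"]
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} still pending after {max_polls} polls")

# Fake transport: two pending responses, then success.
responses = iter([
    {"status": "pending"},
    {"status": "pending"},
    {"status": "success", "result": {"predicted_class": 0}},
])
result = poll_task(lambda _tid: next(responses), "abc-123-def-456", interval=0.0)
print(result)  # → {'predicted_class': 0}
```

Production clients would add jittered backoff and handle a possible `"failure"` status as well.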
### GET /healthcheck/
Health probe. Used by Kubernetes probes: a readiness check gates whether the pod receives traffic, while a liveness check restarts the pod if it becomes unhealthy.
Response — 200 OK
### GET /metrics
Prometheus-compatible metrics endpoint. Scraped by the in-cluster Prometheus instance.
Returns plain-text exposition format with 8 counters, histograms, and gauges:
- `prediction_requests_total{source="sync|async"}`
- `prediction_timeouts_total`
- `prediction_latency_seconds` (histogram)
- `model_version_info`
- related worker/queue gauges
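Each sample in the exposition format is a `name{labels} value` line. A tiny sketch of how one such line is rendered (the value 42 is made up):

```python
def sample_line(name: str, labels: dict[str, str], value) -> str:
    """Render one Prometheus text-format sample, e.g. name{k="v"} 1."""
    label_str = ",".join(f'{key}="{val}"' for key, val in labels.items())
    return f"{name}{{{label_str}}} {value}"

print(sample_line("prediction_requests_total", {"source": "sync"}, 42))
# → prediction_requests_total{source="sync"} 42
```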
## Planned endpoints
| Endpoint | Status | Notes |
|---|---|---|
| `POST /predict/batch` | 📋 Planned | HTTP batch endpoint; batch parquet exists but no HTTP API yet |
## Validation semantics
- All requests are validated against Pydantic schemas (`src/app/schemas/predict.py`) before any inference logic runs.
- Unknown fields in `features` are not rejected; they are passed to the model signature validator.
- Invalid types or missing required fields return `422` with structured error detail.
- Input validation failures are client errors — they are not retried.
## Schema boundary
The `features` dict keys must match the feature names recorded in the MLflow model signature.
The serving layer does not transform or impute missing features.
See ML: Model Contract for the full input/output contract.
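A sketch of that boundary check: compare incoming feature keys against the signature's recorded names. The `SIGNATURE_FEATURES` set here is illustrative (taken from the request example above), not the real signature, which lives in the MLflow Registry.

```python
# Illustrative signature; the real one is recorded with the MLflow model.
SIGNATURE_FEATURES = {"diff_win_5_mean", "diff_goals_for_3_mean", "home_elo_pre", "sex"}

def check_signature(features: dict) -> tuple[set, set]:
    """Return (missing, unexpected) feature names; both empty means a match."""
    keys = set(features)
    return SIGNATURE_FEATURES - keys, keys - SIGNATURE_FEATURES

missing, unexpected = check_signature({"diff_win_5_mean": 0.3, "unknown_feat": 1.0})
print(sorted(missing))     # → ['diff_goals_for_3_mean', 'home_elo_pre', 'sex']
print(sorted(unexpected))  # → ['unknown_feat']
```

Because the serving layer neither drops extras nor imputes missing values, either non-empty set means the signature validation fails rather than being silently repaired.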