# Inference API Contract
This page is the canonical reference for the inference API surface: all implemented endpoints, their request/response schemas, and error semantics.
For concrete examples see Examples. For model input/output contract see ML: Model Contract.
## Implemented endpoints
### POST /predict/
Synchronous prediction endpoint.
Submits an inference task to the Celery ml queue and blocks until the result is returned
(30 s hard timeout).
Request
```json
{
  "match_id": 99,
  "features": {
    "diff_win_5_mean": 0.3,
    "diff_goals_for_3_mean": 0.6,
    "home_elo_pre": 1520.0,
    "sex": 0
  }
}
```
| Field | Type | Description |
|---|---|---|
| `match_id` | `int` | Identifier for traceability |
| `features` | `dict[str, float \| int \| null]` | Feature dict — validated against MLflow model signature |
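A minimal client-side sketch of this request shape. The `validate_payload` helper is illustrative, not part of the service; the real validation is done server-side by Pydantic.

```python
def validate_payload(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []
    if not isinstance(payload.get("match_id"), int):
        problems.append("match_id must be an int")
    features = payload.get("features")
    if not isinstance(features, dict):
        problems.append("features must be a dict")
    else:
        # Feature values may be numeric or null, per the table above.
        for name, value in features.items():
            if value is not None and not isinstance(value, (int, float)):
                problems.append(f"feature {name!r} must be numeric or null")
    return problems

payload = {
    "match_id": 99,
    "features": {"diff_win_5_mean": 0.3, "home_elo_pre": 1520.0, "sex": 0},
}
print(validate_payload(payload))  # → []
```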
Response — 200 OK
```json
{
  "match_id": 99,
  "prediction": {
    "predicted_class": 0,
    "probabilities": {"0": 0.58, "1": 0.27, "2": 0.15},
    "model_version": "Production",
    "model_run_id": "3f7a1c9d2e4b"
  }
}
```
| Field | Type | Description |
|---|---|---|
| `predicted_class` | `int` | Argmax of probabilities: 0 = Home Win, 1 = Draw, 2 = Away Win |
| `probabilities` | `dict[str, float]` | Per-class probabilities |
| `model_version` | `str` | MLflow model stage/alias used |
| `model_run_id` | `str` | MLflow run ID for full traceability |
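Since `predicted_class` is defined as the argmax of `probabilities`, a client can sanity-check a response in a few lines. A sketch using the sample response above:

```python
response = {
    "predicted_class": 0,
    "probabilities": {"0": 0.58, "1": 0.27, "2": 0.15},
    "model_version": "Production",
    "model_run_id": "3f7a1c9d2e4b",
}

probs = response["probabilities"]
# Keys are class labels serialized as strings; convert back to int to compare.
argmax_class = int(max(probs, key=probs.get))
assert argmax_class == response["predicted_class"]
assert abs(sum(probs.values()) - 1.0) < 1e-9  # probabilities form a distribution
print(argmax_class)  # → 0
```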
Error responses
| Code | Condition |
|---|---|
| 422 Unprocessable Entity | Pydantic schema validation failure |
| 504 Gateway Timeout | Celery worker did not respond within 30 s |
| 500 Internal Server Error | Unhandled inference error |
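One reasonable client-side retry policy derived from this table (an assumption, not part of the contract): retry only the transient 504, never the 422 client error. A sketch:

```python
# Hypothetical retry policy for /predict/ clients:
# - 504: Celery worker timeout, transient, safe to retry
# - 422: client error (bad payload), retrying cannot help
# - 500: unhandled server error, treated here as non-retryable
RETRYABLE_STATUS = {504}

def should_retry(status_code: int) -> bool:
    return status_code in RETRYABLE_STATUS

print([should_retry(code) for code in (504, 422, 500)])  # → [True, False, False]
```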
### GET /predict/{match_id}
Batch-lookup endpoint. Returns a pre-computed prediction from the batch_inference DVC
pipeline output stored as a parquet file.
Response — 200 OK: same schema as POST /predict/ response.
Error responses
| Code | Condition |
|---|---|
| 404 Not Found | `match_id` not found in batch parquet |
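A sketch of the lookup semantics, with a plain dict standing in for the batch parquet (the real service reads the `batch_inference` pipeline output from disk):

```python
# The batch_inference pipeline output, modeled here as a dict keyed by match_id.
batch_predictions = {
    99: {
        "predicted_class": 0,
        "probabilities": {"0": 0.58, "1": 0.27, "2": 0.15},
        "model_version": "Production",
        "model_run_id": "3f7a1c9d2e4b",
    },
}

def lookup(match_id: int) -> tuple[int, dict]:
    """Return (status_code, body) mimicking GET /predict/{match_id}."""
    if match_id not in batch_predictions:
        return 404, {"detail": f"match_id {match_id} not found"}
    return 200, {"match_id": match_id, "prediction": batch_predictions[match_id]}

print(lookup(99)[0], lookup(12345)[0])  # → 200 404
```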
### GET /predict/matches/
Returns a list of upcoming matches available for prediction.
Response — 200 OK
### GET /predict/model/info
Returns metadata about the currently loaded model from the MLflow Registry.
Response — 200 OK
```json
{
  "model_name": "soccer_model",
  "model_version": "Production",
  "model_run_id": "3f7a1c9d2e4b",
  "loaded": true
}
```
### POST /predict/async/
Asynchronous prediction endpoint. Enqueues an inference task on the Celery ml queue
and returns a task_id immediately without waiting for the result.
Request: same schema as POST /predict/.
Response — 202 Accepted
```json
{
  "task_id": "abc-123-def-456",
  "status": "submitted",
  "status_url": "/monitoring/task_status/abc-123-def-456"
}
```
### GET /monitoring/task_status/{task_id}
Polls the result of an async task submitted via POST /predict/async/.
Response — pending
Response — success
```json
{
  "task_id": "abc-123-def-456",
  "status": "success",
  "result": {
    "predicted_class": 0,
    "probabilities": {"0": 0.58, "1": 0.27, "2": 0.15},
    "model_version": "Production",
    "model_run_id": "3f7a1c9d2e4b"
  }
}
```
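A typical async client submits to `POST /predict/async/`, then polls the returned `status_url` until the task completes. A sketch of the polling loop, with the HTTP call faked so the flow is self-contained (`fetch_status` stands in for the real GET request):

```python
import time

def poll_task(fetch_status, task_id: str, interval: float = 0.5, max_polls: int = 60):
    """Poll until the task reports success; fetch_status(task_id) returns the JSON body."""
    for _ in range(max_polls):
        body = fetch_status(task_id)
        if body["status"] == "success":
            return body["result"]
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} still pending after {max_polls} polls")

# Fake transport: two pending responses, then success.
responses = iter([
    {"status": "pending"},
    {"status": "pending"},
    {"status": "success", "result": {"predicted_class": 0}},
])
result = poll_task(lambda _tid: next(responses), "abc-123-def-456", interval=0.0)
print(result)  # → {'predicted_class': 0}
```

Production clients would add jittered backoff and handle a possible `"failure"` status as well.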
### GET /healthcheck/
Health probe. Used by Kubernetes probes: a readiness check gates whether the pod receives traffic, while a liveness check restarts the pod if it becomes unhealthy.
Response — 200 OK
### GET /metrics
Prometheus-compatible metrics endpoint. Scraped by the in-cluster Prometheus instance.
Returns plain-text exposition format with 8 counters, histograms, and gauges:
- `prediction_requests_total{source="sync|async"}`
- `prediction_timeouts_total`
- `prediction_latency_seconds` (histogram)
- `model_version_info`
- related worker/queue gauges
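Each sample in the exposition format is a `name{labels} value` line. A tiny sketch of how one such line is rendered (the value 42 is made up):

```python
def sample_line(name: str, labels: dict[str, str], value) -> str:
    """Render one Prometheus text-format sample, e.g. name{k="v"} 1."""
    label_str = ",".join(f'{key}="{val}"' for key, val in labels.items())
    return f"{name}{{{label_str}}} {value}"

print(sample_line("prediction_requests_total", {"source": "sync"}, 42))
# → prediction_requests_total{source="sync"} 42
```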
## Planned endpoints
| Endpoint | Status | Notes |
|---|---|---|
| `POST /predict/batch` | 📋 Planned | HTTP batch endpoint; batch parquet exists but no HTTP API yet |
## Validation semantics
- All requests are validated against Pydantic schemas (`src/app/schemas/predict.py`) before any inference logic runs.
- Unknown fields in `features` are not rejected; they are passed to the model signature validator.
- Invalid types or missing required fields return `422` with structured error detail.
- Input validation failures are client errors — they are not retried.
## Schema boundary
The `features` dict keys must match the feature names recorded in the MLflow model signature.
The serving layer does not transform or impute missing features.
See ML: Model Contract for the full input/output contract.
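A sketch of that boundary check: compare incoming feature keys against the signature's recorded names. The `SIGNATURE_FEATURES` set here is illustrative (taken from the request example above), not the real signature, which lives in the MLflow Registry.

```python
# Illustrative signature; the real one is recorded with the MLflow model.
SIGNATURE_FEATURES = {"diff_win_5_mean", "diff_goals_for_3_mean", "home_elo_pre", "sex"}

def check_signature(features: dict) -> tuple[set, set]:
    """Return (missing, unexpected) feature names; both empty means a match."""
    keys = set(features)
    return SIGNATURE_FEATURES - keys, keys - SIGNATURE_FEATURES

missing, unexpected = check_signature({"diff_win_5_mean": 0.3, "unknown_feat": 1.0})
print(sorted(missing))     # → ['diff_goals_for_3_mean', 'home_elo_pre', 'sex']
print(sorted(unexpected))  # → ['unknown_feat']
```

Because the serving layer neither drops extras nor imputes missing values, either non-empty set means the signature validation fails rather than being silently repaired.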