Inference API Contract

This page is the canonical reference for the inference API surface: all implemented endpoints, their request/response schemas, and error semantics.

For concrete examples see Examples. For model input/output contract see ML: Model Contract.


Implemented endpoints

POST /predict/

Synchronous prediction endpoint.

Submits an inference task to the Celery ml queue and blocks until the result is returned (30 s hard timeout).

Request

{
  "match_id": 99,
  "features": {
    "diff_win_5_mean": 0.3,
    "diff_goals_for_3_mean": 0.6,
    "home_elo_pre": 1520.0,
    "sex": 0
  }
}
| Field | Type | Description |
| --- | --- | --- |
| match_id | int | Identifier for traceability |
| features | dict[str, float \| int \| null] | Feature dict, validated against the MLflow model signature |

Response — 200 OK

{
  "match_id": 99,
  "prediction": {
    "predicted_class": 0,
    "probabilities": {"0": 0.58, "1": 0.27, "2": 0.15},
    "model_version": "Production",
    "model_run_id": "3f7a1c9d2e4b"
  }
}
| Field | Type | Description |
| --- | --- | --- |
| predicted_class | int | Argmax of probabilities: 0 = Home Win, 1 = Draw, 2 = Away Win |
| probabilities | dict[str, float] | Per-class probabilities |
| model_version | str | MLflow model stage/alias used |
| model_run_id | str | MLflow run ID for full traceability |
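As a consistency check on the response, predicted_class should always equal the argmax over the probabilities dict. A small illustrative helper (not part of the API):

```python
def argmax_class(probabilities: dict[str, float]) -> int:
    """Return the class label with the highest probability.

    Keys are the string class labels from the response body:
    "0" = Home Win, "1" = Draw, "2" = Away Win.
    """
    return int(max(probabilities, key=probabilities.get))


# The example response above: predicted_class == argmax of probabilities.
probs = {"0": 0.58, "1": 0.27, "2": 0.15}
assert argmax_class(probs) == 0
```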

Error responses

| Code | Condition |
| --- | --- |
| 422 Unprocessable Entity | Pydantic schema validation failure |
| 504 Gateway Timeout | Celery worker did not respond within 30 s |
| 500 Internal Server Error | Unhandled inference error |
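A minimal client sketch for the synchronous endpoint, using only the standard library. The base URL is an assumption (adjust to your deployment); the client timeout is set above the server's 30 s hard timeout so a 504 from the server arrives before the client gives up:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed deployment URL


def build_predict_payload(match_id: int, features: dict) -> dict:
    """Assemble the body for POST /predict/ (fields per the request table)."""
    return {"match_id": match_id, "features": features}


def predict_sync(match_id: int, features: dict, timeout: float = 35.0) -> dict:
    """Call POST /predict/ and return the parsed 200 body.

    Error statuses (422, 500, 504) surface as urllib.error.HTTPError.
    """
    body = json.dumps(build_predict_payload(match_id, features)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/predict/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```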

GET /predict/{match_id}

Batch-lookup endpoint. Returns a pre-computed prediction from the batch_inference DVC pipeline output stored as a parquet file.

Response — 200 OK: same schema as POST /predict/ response.

Error responses

| Code | Condition |
| --- | --- |
| 404 Not Found | match_id not found in the batch parquet |

GET /predict/matches/

Returns a list of upcoming matches available for prediction.

Response — 200 OK

[
  {"match_id": 99, "home_team": "...", "away_team": "...", "match_date": "..."},
  ...
]

GET /predict/model/info

Returns metadata about the currently loaded model from the MLflow Registry.

Response — 200 OK

{
  "model_name": "soccer_model",
  "model_version": "Production",
  "model_run_id": "3f7a1c9d2e4b",
  "loaded": true
}

POST /predict/async/

Asynchronous prediction endpoint. Enqueues an inference task on the Celery ml queue and returns a task_id immediately without waiting for the result.

Request: same schema as POST /predict/.

Response — 202 Accepted

{
  "task_id": "abc-123-def-456",
  "status": "submitted",
  "status_url": "/monitoring/task_status/abc-123-def-456"
}

GET /monitoring/task_status/{task_id}

Returns the current status (and, on completion, the result) of an async task submitted via POST /predict/async/. Clients poll this endpoint until the status leaves pending.

Response — pending

{"task_id": "abc-123-def-456", "status": "pending"}

Response — success

{
  "task_id": "abc-123-def-456",
  "status": "success",
  "result": {
    "predicted_class": 0,
    "probabilities": {"0": 0.58, "1": 0.27, "2": 0.15},
    "model_version": "Production",
    "model_run_id": "3f7a1c9d2e4b"
  }
}
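The submit-then-poll flow can be sketched as follows, again with the standard library only. The base URL and the polling interval are assumptions; the status_url comes straight from the 202 response of POST /predict/async/:

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed deployment URL


def is_terminal(status: str) -> bool:
    """True once a task has left the 'pending' state."""
    return status != "pending"


def submit_async(match_id: int, features: dict) -> dict:
    """POST /predict/async/; returns the 202 body with task_id and status_url."""
    body = json.dumps({"match_id": match_id, "features": features}).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/predict/async/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_result(status_url: str, interval: float = 1.0, max_polls: int = 30) -> dict:
    """Poll GET /monitoring/task_status/{task_id} until the task finishes."""
    for _ in range(max_polls):
        with urllib.request.urlopen(f"{BASE_URL}{status_url}") as resp:
            payload = json.load(resp)
        if is_terminal(payload["status"]):
            return payload
        time.sleep(interval)
    raise TimeoutError(f"task still pending after {max_polls} polls")
```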

GET /healthcheck/

Liveness probe. Used by Kubernetes to determine whether the pod is healthy; a repeatedly failing probe causes the pod to be restarted.

Response — 200 OK

{
  "status": "ok",
  "worker_id": "...",
  "memory_usage_mb": 210.4
}

GET /metrics

Prometheus-compatible metrics endpoint. Scraped by the in-cluster Prometheus instance.

Returns the Prometheus plain-text exposition format, exposing 8 metrics across counters, histograms, and gauges, including:

  • prediction_requests_total{source="sync|async"}
  • prediction_timeouts_total
  • prediction_latency_seconds (histogram)
  • model_version_info
  • and related worker/queue gauges
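For ad-hoc inspection of the scraped output, a minimal parser for one metric family can look like this (illustrative only; for real use, prefer an established exposition-format parser such as the one in the prometheus_client library):

```python
def extract_metric(exposition_text: str, metric_name: str) -> dict[str, float]:
    """Pull samples of one metric family out of Prometheus text format.

    Returns a mapping of label string (e.g. '{source="sync"}', or '' for
    an unlabeled sample) to the sample value.
    """
    samples: dict[str, float] = {}
    for line in exposition_text.splitlines():
        # Skip comments (# HELP / # TYPE) and unrelated metric families.
        if line.startswith("#") or not line.startswith(metric_name):
            continue
        name_and_labels, _, value = line.rpartition(" ")
        labels = ""
        if "{" in name_and_labels:
            base, _, rest = name_and_labels.partition("{")
            if base != metric_name:
                continue  # e.g. prediction_requests_total_created
            labels = "{" + rest
        elif name_and_labels != metric_name:
            continue
        samples[labels] = float(value)
    return samples
```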

Planned endpoints

| Endpoint | Status | Notes |
| --- | --- | --- |
| POST /predict/batch | 📋 Planned | HTTP batch endpoint; batch parquet exists but no HTTP API yet |

Validation semantics

  • All requests are validated against Pydantic schemas (src/app/schemas/predict.py) before any inference logic runs.
  • Unknown fields in features are not rejected; they are passed to the model signature validator.
  • Invalid types or missing required fields return 422 with structured error detail.
  • Input validation failures are client errors — they are not retried.
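The checks that produce a 422 can be sketched in plain Python. This is illustrative only; the real validation is the Pydantic schemas in src/app/schemas/predict.py, and the function name here is hypothetical:

```python
def validate_predict_request(body: dict) -> list[str]:
    """Mimic the request-side checks that trigger a 422.

    Returns a list of error detail strings; an empty list means the
    request passes schema validation (signature checks come later).
    """
    errors = []
    if not isinstance(body.get("match_id"), int):
        errors.append("match_id: int required")
    features = body.get("features")
    if not isinstance(features, dict):
        errors.append("features: dict required")
    else:
        for key, value in features.items():
            # Values may be float, int, or null, per the request table.
            if value is not None and not isinstance(value, (int, float)):
                errors.append(f"features.{key}: number or null required")
    return errors  # a non-empty list maps to a 422 response body
```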

Schema boundary

The features dict keys must match the feature names recorded in the MLflow model signature. The serving layer does not transform or impute missing features. See ML: Model Contract for the full input/output contract.
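Because the serving layer does not impute, it can be useful to diff the request's feature keys against the signature before submitting. A hedged sketch, where signature_names stands in for the feature names read from the loaded MLflow model's signature:

```python
def check_feature_keys(
    features: dict, signature_names: list[str]
) -> tuple[list[str], list[str]]:
    """Compare request feature keys against the MLflow signature names.

    Returns (missing, extra): signature features absent from the request,
    and request keys the signature does not know about. Missing features
    will fail downstream since the serving layer never imputes them.
    """
    missing = sorted(set(signature_names) - set(features))
    extra = sorted(set(features) - set(signature_names))
    return missing, extra
```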