Performance, Capacity & SLOs

Why SLOs matter

Serving systems must have explicit performance targets. Without SLOs, reliability cannot be measured or improved.

Target values (initial):

Key signals:

- request rate
- CPU/memory utilization
- queue depth
- worker execution time

Scaling decisions are driven by metrics, not by manual intervention.
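As a rough illustration of a metric-driven scaling decision, the sketch below turns the key signals into a desired replica count. Everything here is an assumption made for the example rather than part of the documented system: the `Signals` and `desired_replicas` names, the threshold values, and the simplification that each replica runs one worker that handles one request at a time. The utilization step uses the proportional rule `ceil(current * observed / target)`, which is the same shape as the rule used by the Kubernetes Horizontal Pod Autoscaler.

```python
import math
from dataclasses import dataclass


@dataclass
class Signals:
    request_rate: float         # requests per second
    cpu_utilization: float      # fraction, 0.0 to 1.0
    memory_utilization: float   # fraction, 0.0 to 1.0
    queue_depth: int            # requests waiting for a worker
    worker_exec_seconds: float  # mean execution time per request


# Hypothetical thresholds, chosen only for illustration.
TARGET_UTILIZATION = 0.75
MAX_QUEUE_WAIT_SECONDS = 2.0


def desired_replicas(current: int, s: Signals,
                     min_replicas: int = 1, max_replicas: int = 20) -> int:
    """Return a replica count derived from the observed signals."""
    # Workers kept busy by the arrival rate (rate * service time),
    # padded so that they run at the target utilization.
    busy_workers = s.request_rate * s.worker_exec_seconds
    by_load = math.ceil(busy_workers / TARGET_UTILIZATION)

    # Proportional scaling on the more constrained of CPU and memory.
    utilization = max(s.cpu_utilization, s.memory_utilization)
    by_utilization = math.ceil(current * utilization / TARGET_UTILIZATION)

    desired = max(by_load, by_utilization)

    # If the backlog would take too long to drain, add at least one replica.
    estimated_wait = s.queue_depth * s.worker_exec_seconds / max(current, 1)
    if estimated_wait > MAX_QUEUE_WAIT_SECONDS:
        desired = max(desired, current + 1)

    return max(min_replicas, min(max_replicas, desired))
```

Taking the maximum across signals means any single saturated resource is enough to trigger a scale-up, while a scale-down happens only when every signal agrees there is headroom.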

In overload scenarios:

- async inference is preferred
- non-critical requests may be rate-limited
- alerts notify operators before SLO violation
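The overload behaviour above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the `route`, `TokenBucket`, and `should_alert` names are hypothetical, a token bucket stands in for whatever rate limiter is really used, and the SLO is assumed to be a latency target whose value is passed in by the caller.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    SYNC = "sync"      # serve inline, normal operation
    ASYNC = "async"    # enqueue for asynchronous inference
    REJECT = "reject"  # rate-limited


@dataclass
class TokenBucket:
    rate: float        # tokens added per second
    capacity: float    # maximum burst size
    tokens: float      # tokens currently available
    last: float = 0.0  # timestamp of the previous refill

    def allow(self, now: float) -> bool:
        """Refill based on elapsed time, then try to spend one token."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def route(critical: bool, overloaded: bool,
          bucket: TokenBucket, now: float) -> Decision:
    """Pick a handling path for one request."""
    if not overloaded:
        return Decision.SYNC
    # Under overload, only non-critical traffic is subject to the limiter.
    if not critical and not bucket.allow(now):
        return Decision.REJECT
    # Everything admitted during overload goes through the async path.
    return Decision.ASYNC


def should_alert(p95_latency_s: float, slo_latency_s: float,
                 warn_fraction: float = 0.8) -> bool:
    """Fire once latency reaches a fraction of the SLO, before it is breached."""
    # warn_fraction is a hypothetical early-warning margin.
    return p95_latency_s >= warn_fraction * slo_latency_s
```

Passing `now` in explicitly, for example from `time.monotonic()`, keeps the limiter deterministic and easy to test.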

SLOs are reviewed and adjusted based on:

- observed traffic patterns
- model complexity
- infrastructure changes