Name origin: SLO (service level objective) was popularized by Google's Site Reliability Engineering (SRE) book (2016), which formalized the SLI/SLO/SLA hierarchy; the underlying concepts trace back to telecom's "five nines" (99.999%) availability targets. The term "error budget" is generally credited to Ben Treynor Sloss, founder of Google SRE, who used it to reframe reliability as a finite, spendable resource rather than an absolute goal.
SLI (indicator) → What you measure
SLO (objective) → What you target internally
SLA (agreement) → What you promise contractually
SLO should be stricter than SLA (buffer zone)
Type              | Example
Availability SLI  | % of requests returning a non-5xx status
Latency SLI       | % of requests completing in < 300 ms
Availability SLO  | 99.9% over a 30-day rolling window
Availability SLA  | 99.5% (service credits if breached)
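The SLIs in the table can be sketched in a few lines. This is a minimal Python example, assuming each request is recorded as a hypothetical `Request(status, latency_ms)` record and that the SLI is computed over whatever list of requests you pass in; a real system would aggregate over the 30-day rolling window, usually in a metrics backend rather than in application code.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int        # HTTP status code returned to the client
    latency_ms: float  # end-to-end latency in milliseconds


def availability_sli(requests: list[Request]) -> float:
    """Fraction of requests that did not return a 5xx status."""
    if not requests:
        return 1.0  # assumption: an empty window counts as fully available
    good = sum(1 for r in requests if not 500 <= r.status <= 599)
    return good / len(requests)


def latency_sli(requests: list[Request], threshold_ms: float = 300.0) -> float:
    """Fraction of requests that completed in under threshold_ms."""
    if not requests:
        return 1.0  # same empty-window assumption as above
    fast = sum(1 for r in requests if r.latency_ms < threshold_ms)
    return fast / len(requests)


SLO = 0.999  # internal target (stricter)
SLA = 0.995  # contractual promise (looser)


def classify(sli: float, slo: float = SLO, sla: float = SLA) -> str:
    """Place a measured SLI relative to the SLO and the SLA."""
    if sli >= slo:
        return "meeting SLO"
    if sli >= sla:
        return "SLO missed, SLA still met (buffer in use)"
    return "SLA breached (credits owed)"


# Usage: 997 successes and 3 server errors out of 1,000 requests
window = [Request(200, 120.0)] * 997 + [Request(503, 50.0)] * 3
print(availability_sli(window))            # 0.997
print(classify(availability_sli(window)))  # SLO missed, SLA still met (buffer in use)
```

At 99.7% availability the service misses its 99.9% SLO but still honors the 99.5% SLA: the gap between the two targets is the buffer zone.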
Remember: Each layer builds on the one before it: the SLI (raw measurement) feeds the SLO (internal target), and the SLO must be stricter than the SLA (contractual promise). Mnemonic: "I-O-A": Indicator measures, Objective targets, Agreement promises. If your SLO equals your SLA, you have zero buffer for unexpected incidents: every SLO miss is also an SLA breach.
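The size of that buffer is simple arithmetic. A sketch, assuming a time-based availability target over the 30-day window from the table (43,200 minutes) and treating each target as the fraction of minutes the service must be up:

```python
# 30-day rolling window expressed in minutes: 30 * 24 * 60 = 43,200
WINDOW_MINUTES = 30 * 24 * 60


def allowed_downtime_minutes(target: float, window_minutes: int = WINDOW_MINUTES) -> float:
    """Minutes of full outage a time-based availability target tolerates."""
    return (1 - target) * window_minutes


slo_budget = allowed_downtime_minutes(0.999)  # about 43.2 minutes
sla_budget = allowed_downtime_minutes(0.995)  # about 216 minutes
buffer = sla_budget - slo_budget              # about 172.8 minutes of slack

print(f"SLO budget: {slo_budget:.1f} min")
print(f"SLA budget: {sla_budget:.1f} min")
print(f"Buffer:     {buffer:.1f} min")
```

With these targets the team can miss its internal objective by roughly 172.8 minutes of downtime before any credits are owed. Set both targets to the same value and the buffer drops to zero.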
Formal review; reliability work takes priority over features next quarter.
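Gotcha: Wall-clock error budget math assumes uniform traffic. A 99.9% SLO over 30 days allows 43.2 minutes of downtime (0.1% of 43,200 minutes), but if 80% of your traffic arrives during business hours, a 20-minute outage at peak hurts far more users than 20 minutes at 3 AM. A time-based SLI charges both outages the same 20 minutes; a request-based SLI (good requests / total requests) weights each failure by traffic, so the peak outage burns far more budget. Consider weighting your SLI by traffic volume, not wall-clock time.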
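To make the difference concrete, here is a sketch comparing a time-based and a request-based budget for the same 20-minute full outage. The traffic numbers (`MONTHLY_REQUESTS`, `PEAK_RPM`, `OFF_PEAK_RPM`) are hypothetical, chosen for round arithmetic rather than taken from any real service.

```python
WINDOW_MINUTES = 30 * 24 * 60  # 30-day window in minutes


def time_budget_burn(outage_minutes: float, slo: float = 0.999) -> float:
    """Share of a time-based budget consumed: every minute counts the same."""
    return outage_minutes / ((1 - slo) * WINDOW_MINUTES)


def request_budget_burn(failed: int, total: int, slo: float = 0.999) -> float:
    """Share of a request-based budget consumed: failures weighted by traffic."""
    return failed / ((1 - slo) * total)


# Hypothetical traffic profile (assumed numbers, not from a real service)
MONTHLY_REQUESTS = 20_000_000
PEAK_RPM = 1_000   # requests per minute during business hours
OFF_PEAK_RPM = 50  # requests per minute at 3 AM
OUTAGE_MIN = 20

for label, rpm in [("peak", PEAK_RPM), ("3 AM", OFF_PEAK_RPM)]:
    failed = rpm * OUTAGE_MIN  # full outage: every request in the window fails
    print(f"{label}: time-based {time_budget_burn(OUTAGE_MIN):.0%}, "
          f"request-based {request_budget_burn(failed, MONTHLY_REQUESTS):.0%}")
# peak: time-based 46%, request-based 100%
# 3 AM: time-based 46%, request-based 5%
```

The time-based budget charges about 46% for either outage, while the request-based budget consumes the entire month's budget at peak and only 5% at 3 AM, which matches the difference in user impact.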
Under the hood: "Blameless" does not mean "without accountability." A blameless postmortem focuses on systemic causes (why did the system allow this failure?) rather than individual blame (who caused this?). The goal is to make it psychologically safe to report honestly, which leads to better root-cause analysis. If people fear punishment, they hide information, and the real causes go unfixed.