Quiz: SRE Practices
4 questions
L0 (1 question)
1. What is toil in the SRE context, and what is the target threshold?
Show answer
Toil is manual, repetitive, automatable work that scales linearly with service growth and has no enduring value (e.g., manually restarting pods, rotating certs by hand). The SRE target is to spend no more than 50% of an SRE's time on toil.
L1 (1 question)
1. What is an error budget and how does it drive engineering decisions?
Show answer
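An error budget is the amount of unreliability an SLO allows: 100% minus the SLO target. If your SLO is 99.9%, the error budget is the remaining 0.1%, or about 43 minutes of downtime in a 30-day window. When more than 50% of the budget remains, ship freely. When less than 25% remains, prioritize reliability work and reduce deploy frequency. When it is exhausted, freeze feature releases until reliability improves. The error budget bridges product velocity and operational stability.
L2 (1 question)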
1. Your team spends 70% of time on toil. What concrete steps do you take to bring it below 50%?
Show answer
1. Catalog all toil tasks with time estimates.
2. Rank by frequency × time cost.
3. Automate the top offenders first (e.g., replace manual cert rotation with cert-manager, add auto-remediation for known alerts).
4. Eliminate false/noisy alerts.
5. Track toil percentage weekly.
6. Negotiate with management to protect automation time.
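The cataloging, ranking, and tracking steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ToilTask` class, the task names, the per-task numbers, and the team size of 4 engineers at 40 hours a week are all hypothetical, chosen so the total matches the 70% scenario in the question.

```python
from dataclasses import dataclass


@dataclass
class ToilTask:
    """One cataloged toil task (hypothetical structure for illustration)."""
    name: str
    runs_per_week: float    # how often the task is performed
    minutes_per_run: float  # hands-on time per occurrence

    @property
    def weekly_minutes(self) -> float:
        # Frequency x time cost, the ranking key from step 2.
        return self.runs_per_week * self.minutes_per_run


def rank_by_cost(tasks):
    """Order tasks by weekly cost, most expensive first."""
    return sorted(tasks, key=lambda t: t.weekly_minutes, reverse=True)


def toil_percentage(tasks, team_minutes_per_week):
    """Share of the team's weekly time spent on the cataloged toil."""
    total = sum(t.weekly_minutes for t in tasks)
    return 100.0 * total / team_minutes_per_week


# Assumed team capacity: 4 engineers x 40 hours x 60 minutes.
TEAM_MINUTES = 4 * 40 * 60

# Made-up catalog entries; real numbers come from the team's own estimates.
catalog = [
    ToilTask("manual cert rotation", runs_per_week=6, minutes_per_run=120),
    ToilTask("restart stuck pods", runs_per_week=80, minutes_per_run=30),
    ToilTask("triage noisy alerts", runs_per_week=300, minutes_per_run=12),
]

ranked = rank_by_cost(catalog)
pct = toil_percentage(catalog, TEAM_MINUTES)

for task in ranked:
    print(f"{task.name}: {task.weekly_minutes:.0f} min/week")
print(f"toil: {pct:.1f}% of team time")
```

Rerunning the same calculation each week with updated estimates gives the trend line for step 5, and the top of the ranked list is where automation effort pays off first.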
L3 (1 question)
1. A product team wants to launch a new service in production. What does an SRE production readiness review cover?