Quiz: Capacity Planning¶
7 questions
L1 (3 questions)¶
1. What is the difference between utilization and saturation, and why does it matter for capacity planning?
Show answer
Utilization is the percentage of a resource being used. Saturation is whether work is queuing because the resource cannot keep up. A system can be at 70% utilization with zero saturation (steady load) or massive saturation (bursty load). Plan on saturation, not utilization, because bursts cause queuing even when average utilization is low.

2. What is the USE method and how do you apply it for capacity planning?
Show answer
USE (Utilization, Saturation, Errors) — for every resource (CPU, memory, disk, network), measure all three. Utilization: percentage busy (e.g., CPU at 70%). Saturation: degree of queuing (e.g., CPU run queue length > core count). Errors: error count (e.g., disk I/O errors, NIC drops). For capacity planning: graph utilization trends over 30-90 days and extrapolate. Alert on saturation (it causes user-visible latency before utilization hits 100%). Errors often signal hardware failure that reduces effective capacity. Apply systematically to every resource in the stack.

3. What is headroom and why should you never run infrastructure at high utilization?
Show answer
Headroom is the unused capacity reserved for spikes, failures, and growth. At high utilization (>80% CPU, >85% memory), there is no buffer for: traffic spikes (organic growth, viral events), cascading failures (losing a node shifts load to survivors, pushing them over the edge), deployments (rolling updates temporarily reduce capacity), and batch jobs (backups, ETL). Queuing theory shows latency increases exponentially as utilization approaches 100%. Target 60-70% utilization at peak for compute, 70-80% for storage. Plan to add capacity when projected utilization will exceed targets in 30-60 days.

L2 (4 questions)¶
1. How does Prometheus predict_linear help with capacity planning, and what is its key limitation?
Show answer
predict_linear extrapolates a metric's trend into the future. For example, predict_linear(node_filesystem_avail_bytes[7d], 30*24*3600) < 0 predicts whether disk will fill within 30 days based on the last 7 days. Its limitation is that it assumes linear growth — it fails for seasonal patterns, step-function changes (marketing campaigns), or logarithmic saturation.

2. In an N+1 headroom model, a 4-node cluster handles 1000 rps per node with a current peak of 2800 rps. Can it survive a single node failure? What should you do?
Show answer
With 1 node down, the 3 remaining nodes must handle 2800 rps — 933 rps each (93% utilization). This is too tight. Add a 5th node: at peak each node handles 560 rps (56% utilization), and losing 1 node leaves 4 nodes at 700 rps each (70%), which is safe. N+1 means you should always be able to lose one node without exceeding safe utilization thresholds (typically 60-70% for CPU).

3. How do you model capacity for a service with bursty traffic patterns that peak at 10x average?
Show answer
1. Characterize the burst: measure p50, p95, p99 request rates (not just average).
2. Size for p99 peak with headroom, not for average.
3. Use auto-scaling with pre-warming: HPA/KEDA for containers, ASG with scheduled scaling for VMs (pre-scale before known peaks like marketing campaigns).
4. Add a queue or rate limiter to absorb bursts beyond provisioned capacity (shed load gracefully).
5. Load test at 2x expected peak to find the breaking point.
6. Monitor saturation metrics (queue depth, p99 latency), not just utilization. A service sized for average load will fail at peak.
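The sizing arithmetic behind steps 1-2 can be sketched in a few lines. This is a minimal illustration, not a production sizing tool; the sample rates, per-node capacity, and 70% utilization target are hypothetical numbers:

```python
import math

def size_for_burst(rates_rps, per_node_capacity_rps, target_utilization=0.7):
    """Return percentiles of observed request rates and the node count needed
    so the p99 rate stays at or below the target utilization per node."""
    ordered = sorted(rates_rps)
    def pct(p):  # nearest-rank percentile
        return ordered[min(len(ordered) - 1, math.ceil(p / 100 * len(ordered)) - 1)]
    p50, p95, p99 = pct(50), pct(95), pct(99)
    nodes = math.ceil(p99 / (per_node_capacity_rps * target_utilization))
    return {"p50": p50, "p95": p95, "p99": p99, "nodes": nodes}

# Hypothetical mostly-quiet service with ~10x bursts: average ~100 rps,
# occasional peaks near 1000 rps.
samples = [100] * 95 + [400, 600, 800, 900, 1000]
print(size_for_burst(samples, per_node_capacity_rps=250))
```

Sizing for the average (100 rps) would suggest a single node; sizing for p99 with headroom yields several, which is the point of step 2.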
4. How do you build a capacity model for a Kubernetes cluster and when should you add nodes?
Show answer
1. Track allocatable vs requested vs actual usage per resource (CPU, memory) at the cluster level.
2. Alert when total requests exceed 70% of total allocatable (not actual usage — requests are what the scheduler uses).
3. Monitor pod scheduling failures (FailedScheduling events).
4. Track namespace-level resource quotas vs usage.
5. Model growth: plot requested resources over 90 days, extrapolate to determine when you will hit allocatable limits.
6. Factor in node failure: in a 10-node cluster, losing 1 node shifts ~10% load to survivors — ensure survivors can absorb it. Add nodes before you need them, not after pods start failing to schedule.
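The node-failure check in step 6 can be sketched as a simple ratio calculation. The node count, allocatable cores, requested cores, and the 70% request ceiling below are hypothetical values, not measurements from a real cluster:

```python
def cluster_headroom(nodes, allocatable_cpu_per_node, total_requested_cpu,
                     request_ceiling=0.70):
    """Report request pressure now and after losing one node (N-1)."""
    now = total_requested_cpu / (nodes * allocatable_cpu_per_node)
    after_failure = total_requested_cpu / ((nodes - 1) * allocatable_cpu_per_node)
    return {
        "requested_ratio": round(now, 2),
        "requested_ratio_n_minus_1": round(after_failure, 2),
        "add_nodes": after_failure > request_ceiling,  # add capacity before failure forces it
    }

# 10 nodes x 16 allocatable cores, 112 cores requested: fine today,
# over the ceiling if one node fails.
print(cluster_headroom(nodes=10, allocatable_cpu_per_node=16,
                       total_requested_cpu=112))
```

A cluster can look healthy against total allocatable yet be unable to reschedule a failed node's pods; this check surfaces that gap before FailedScheduling events do.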