Capacity Planning

20 cards — 🟢 5 easy | 🟡 9 medium | 🔴 6 hard

🟢 Easy (5)

1. What are the four resource dimensions that every system bottleneck lives in?

Answer: CPU, Memory, Disk (IOPS and bandwidth), and Network (bandwidth and packets per second).

Remember: a system is only as fast as its slowest component, so find the bottleneck first, then optimize it. Amdahl's Law makes this precise: the overall speedup from optimizing one component is limited by the fraction of total time that component accounts for.

2. What is the N+1 model in capacity planning?

Answer: N+1 means the system can survive the loss of one node. You provision enough capacity so that if one node fails, the remaining nodes can still handle the full peak load.

Remember: capacity planning inputs: historical metrics, growth projections, planned business events (launches, campaigns), and seasonality patterns. Review quarterly at minimum.

3. What is the formula for compound growth forecasting in capacity planning?

Answer: future = current * (1 + growth_rate) ^ months. For example, 12% monthly growth doubles capacity needs in roughly 6 months.

Example: use linear regression on 6 months of historical data to project when you'll hit 80% capacity. Add business events (launches, campaigns) as step functions.
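
A minimal sketch of the compound-growth formula in Python (the traffic and capacity numbers are invented for illustration):

```python
def forecast(current: float, monthly_growth_rate: float, months: int) -> float:
    """Compound growth: future = current * (1 + growth_rate) ** months."""
    return current * (1 + monthly_growth_rate) ** months

def months_until(current: float, capacity: float, rate: float) -> int:
    """First month in which projected demand reaches the given capacity."""
    months = 0
    while forecast(current, rate, months) < capacity:
        months += 1
    return months

# 12% monthly growth roughly doubles demand in 6 months:
assert 1.9 < forecast(1.0, 0.12, 6) < 2.0
# Hypothetical service: 400 req/s today, 1000 req/s capacity, 12%/month growth
assert months_until(400, 1000, 0.12) == 9
```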

4. What is right-sizing and how do you identify over-provisioned instances?

Answer: Right-sizing means matching instance size to actual workload needs. Identify over-provisioned instances by comparing actual usage to allocated resources over 2-4 weeks. Signals: CPU consistently below 20%, memory below 40%, network below 10% of capacity. Downsize by one instance type, monitor for a week, then repeat. Cloud tools such as AWS Compute Optimizer and GCP Recommender automate this analysis.


5. How do you plan for disk capacity and avoid running out of space?

Answer: Monitor disk usage percentage and growth rate. Set alerts at 70% (warning) and 85% (critical). Use predict_linear to forecast when disk will fill. Common disk consumers: logs (set rotation and retention), database WAL files, temp files, container images. Quick wins: enable log rotation, set database vacuum schedules, prune old container images. For production databases, never let disk exceed 80% — many databases behave unpredictably when disk is nearly full.

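
A sketch of the forecasting step, assuming roughly linear disk growth (the sizes are illustrative):

```python
def days_until_threshold(disk_gb: float, used_gb: float,
                         daily_growth_gb: float, limit: float = 0.85) -> float:
    """Days until usage crosses `limit` of the disk, assuming linear growth.
    This is the same idea predict_linear automates in Prometheus."""
    remaining_gb = disk_gb * limit - used_gb
    if daily_growth_gb <= 0:
        return float("inf")  # not growing: never fills
    return remaining_gb / daily_growth_gb

# 500 GB disk at 300 GB used, growing 5 GB/day -> 25 days to the 85% alert
assert days_until_threshold(500, 300, 5) == 25.0
```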

🟡 Medium (9)

1. What is the difference between utilization and saturation, and which should you plan on?

Answer: Utilization is the percentage of a resource in use. Saturation is whether work is queuing because the resource cannot keep up. Plan on saturation, not utilization, because a system can be at 70% utilization with zero queuing or 70% utilization with massive queuing depending on burstiness.

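
The burstiness point can be shown with a toy single-server queue: two arrival patterns with identical 70% utilization, one of which queues badly (the numbers are invented):

```python
def peak_queue(arrivals: list[int], capacity_per_tick: int) -> int:
    """Each tick, new work joins the queue and up to capacity_per_tick
    items are served; return the deepest queue seen (saturation)."""
    queue = peak = 0
    for batch in arrivals:
        queue = max(0, queue + batch - capacity_per_tick)
        peak = max(peak, queue)
    return peak

smooth = [7] * 100               # 700 jobs, evenly spread: 70% utilization
bursty = ([70] + [0] * 9) * 10   # same 700 jobs, same utilization, in bursts

assert peak_queue(smooth, 10) == 0    # never saturates
assert peak_queue(bursty, 10) == 60   # deep queues despite 70% utilization
```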

2. How does Prometheus predict_linear help with capacity planning?

Answer: predict_linear extrapolates a time-series forward. For example, predict_linear(node_filesystem_avail_bytes[7d], 30*24*3600) < 0 returns true if the disk will be full within 30 days based on the last 7 days of data.

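
For intuition, predict_linear is just a least-squares line extrapolated forward; a hand-rolled equivalent on synthetic samples (assuming at least two distinct timestamps):

```python
def predict_linear(samples: list[tuple[float, float]], seconds: float) -> float:
    """Least-squares line through (timestamp, value) pairs, evaluated
    `seconds` past the newest sample: the idea behind PromQL predict_linear."""
    n = len(samples)
    t_mean = sum(t for t, _ in samples) / n
    v_mean = sum(v for _, v in samples) / n
    slope = sum((t - t_mean) * (v - v_mean) for t, v in samples) / \
            sum((t - t_mean) ** 2 for t, _ in samples)
    intercept = v_mean - slope * t_mean
    newest = max(t for t, _ in samples)
    return slope * (newest + seconds) + intercept

# Free disk space dropping 1 GB/day over a 7-day window (values in bytes):
DAY = 24 * 3600
free = [(d * DAY, (100 - d) * 1e9) for d in range(7)]
assert abs(predict_linear(free, 30 * DAY) - 64e9) < 1e3  # ~64 GB left in 30 days
```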

3. What are the recommended target max utilization thresholds for CPU, Memory, Disk, and Network?

Answer: CPU: 60-70%, Memory: 70-80%, Disk: 70-75%, Network: 50-60%. These provide headroom for bursts, GC pauses, deployments, retransmissions, and failure scenarios.


4. How do you right-size container resource requests using Prometheus?

Answer: Compare actual CPU usage (rate of container_cpu_usage_seconds_total) to requested CPU (kube_pod_container_resource_requests). If the ratio is below 0.3, the container is massively over-provisioned. Set requests to p99 of actual usage plus 20% headroom.

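
The sizing rule from the card, sketched on synthetic usage samples:

```python
def recommended_request(usage_samples: list[float], headroom: float = 0.20) -> float:
    """Request = p99 of observed usage plus headroom, per the card's rule."""
    ordered = sorted(usage_samples)
    p99 = ordered[int(0.99 * (len(ordered) - 1))]
    return p99 * (1 + headroom)

# CPU cores used by a mostly-idle container with occasional spikes:
usage = [0.05] * 95 + [0.10] * 4 + [0.30]
assert abs(recommended_request(usage) - 0.12) < 1e-9  # p99 = 0.10 -> 0.12 cores
```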

5. How do you perform trend analysis for capacity planning?

Answer: Collect at least 30 days of resource metrics (ideally 90+ days). Plot peak daily usage over time. Fit a linear or exponential regression to identify growth rate. Extrapolate forward to find the date when usage exceeds your threshold (e.g., 70% CPU). Account for known events (product launches, seasonal spikes). Prometheus predict_linear automates simple linear extrapolation but misses seasonal patterns.


6. How does load testing fit into capacity planning?

Answer: Load testing validates your capacity model against reality. Steps: define target throughput (e.g., 5000 req/s), ramp gradually using tools like k6 or Locust, measure latency percentiles and error rates at each load level. Find the breaking point where p99 latency exceeds SLO or errors spike. Compare against your model's predictions. If the model predicted 8000 req/s but the system degrades at 5000, investigate the bottleneck (usually database, connection pool, or single-threaded component).

Example: use tools like k6, JMeter, or Locust to simulate peak traffic and find the breaking point before your users do.

7. What metrics should trigger auto-scaling and what are the common pitfalls?

Answer: Primary triggers: CPU utilization (>70%), request queue depth (>0 sustained), request latency (p99 > SLO). Pitfalls: scaling on average CPU misses hot-spot nodes, scaling too aggressively causes thrashing (repeated scale up and down), cool-down periods that are too short, and not accounting for application startup time. Set the scale-down threshold well below the scale-up threshold to create hysteresis. Always load-test your scaling policy before relying on it in production.

Remember: key capacity metrics: CPU utilization, memory usage, disk IOPS, network throughput, queue depth, request latency. Track all, alert on trends.
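
A sketch of the hysteresis idea, with invented thresholds (scale up above 70% CPU, scale down only below 40%):

```python
def desired_replicas(current: int, cpu_pct: float,
                     up_at: float = 70, down_at: float = 40,
                     lo: int = 2, hi: int = 20) -> int:
    """Hysteresis: the gap between up_at and down_at is a dead band,
    so a fleet hovering between the two thresholds is never thrashed."""
    if cpu_pct > up_at:
        return min(hi, current + 1)
    if cpu_pct < down_at:
        return max(lo, current - 1)
    return current

assert desired_replicas(4, 85) == 5   # overloaded: add a replica
assert desired_replicas(4, 55) == 4   # dead band: leave the fleet alone
assert desired_replicas(4, 30) == 3   # idle: remove one
assert desired_replicas(2, 10) == 2   # never below the floor
```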

8. What is 'headroom' in capacity planning and why is it important?

Answer: Headroom is the buffer between current usage and provisioned capacity. It absorbs traffic spikes and growth without requiring emergency scaling.

Remember: headroom = spare capacity for unexpected spikes. Typical target: 20-30% headroom. Less = risk of outage. More = wasted cost.
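
The arithmetic is one line; a sketch against the 20-30% target from the mnemonic:

```python
def headroom_pct(provisioned: float, used: float) -> float:
    """Spare capacity as a percentage of what is provisioned."""
    return 100 * (provisioned - used) / provisioned

assert headroom_pct(100, 75) == 25.0   # inside the 20-30% target
assert headroom_pct(100, 95) == 5.0    # too little: outage risk
```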

9. What are USE metrics and how do they inform capacity decisions?

Answer: USE = Utilization, Saturation, Errors. High utilization signals approaching limits, high saturation means work is queuing or delayed, and errors indicate the resource is failing or rejecting work. Together they show where to add capacity.


🔴 Hard (6)

1. What is seasonal decomposition and how is it used in capacity planning?

Answer: Decompose a metric into three components: trend (long-term direction), seasonal (repeating daily/weekly/monthly pattern), and residual (noise). Plan capacity for trend + seasonal_peak + safety_margin. This prevents both over-provisioning from using raw peaks and under-provisioning from using averages.

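
A hand-rolled additive decomposition (moving-average trend, per-phase seasonal means) on a synthetic weekly pattern; real work would typically use a library such as statsmodels:

```python
def decompose(series: list[float], period: int):
    """Additive decomposition: centered moving-average trend plus the mean
    detrended value at each phase of the cycle (period assumed odd here)."""
    n, half = len(series), period // 2
    trend: list = [None] * n
    for i in range(half, n - half):
        trend[i] = sum(series[i - half:i + half + 1]) / period
    phases = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            phases[i % period].append(series[i] - trend[i])
    seasonal = [sum(p) / len(p) for p in phases]
    return trend, seasonal

# Daily CPU%: slow upward trend plus a spike every 7th day
cpu = [30 + 0.5 * d + (15 if d % 7 == 0 else 0) for d in range(56)]
trend, seasonal = decompose(cpu, 7)
# Plan for trend peak + seasonal peak + safety margin, per the card:
plan = max(t for t in trend if t is not None) + max(seasonal) + 5
assert abs(max(seasonal) - 90 / 7) < 1e-6  # the weekly spike component
```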

2. Why is burst capacity different from sustained capacity, and what design targets should you set?

Answer: Sustained throughput is the steady-state rate a system handles without queuing. Burst capacity is higher but time-limited before queues build. Design so sustained capacity exceeds normal peak traffic, burst capacity exceeds 2x normal peak for flash crowds, and drain rate exceeds arrival rate so queues eventually empty.

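
The drain-rate condition can be checked with simple arithmetic (the traffic numbers are invented):

```python
def drain_seconds(backlog: float, capacity_rps: float, arrival_rps: float) -> float:
    """Time to empty a backlog while traffic keeps arriving: the queue
    drains at capacity minus arrivals, or never if arrivals >= capacity."""
    rate = capacity_rps - arrival_rps
    return backlog / rate if rate > 0 else float("inf")

# A 30 s flash crowd at 10,000 req/s against 6,000 req/s of capacity:
backlog = (10_000 - 6_000) * 30  # 120,000 queued requests
assert drain_seconds(backlog, 6_000, 5_000) == 120.0         # 2 min at normal peak
assert drain_seconds(backlog, 6_000, 6_000) == float("inf")  # never drains
```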

3. Describe the six-step capacity planning process.

Answer: 1. Measure current usage (CPU, memory, disk, network, application metrics, queue depths). 2. Model workload drivers (e.g., 1 active user = 3 req/s = 0.002 CPU cores). 3. Predict future demand using growth rates. 4. Plan supply factoring lead time, cost optimization, and step-function scaling. 5. Execute procurement or scaling. 6. Iterate quarterly.

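
Step 2's workload-driver model, sketched with the card's example figure (0.002 cores per active user) and an assumed 70% utilization target:

```python
import math

def cores_needed(active_users: int, cores_per_user: float = 0.002,
                 target_util: float = 0.70) -> int:
    """Translate a business driver (active users) into CPU supply,
    sized so demand sits at the target utilization rather than 100%."""
    demand = active_users * cores_per_user
    return math.ceil(demand / target_util)

# 10,000 active users -> 20 cores of raw demand -> 29 cores provisioned
assert cores_needed(10_000) == 29
```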

4. What are the key cost-capacity trade-offs in cloud infrastructure?

Answer: On-demand instances provide instant capacity but at premium cost. Reserved instances save 30-60% but commit for 1-3 years. Spot/preemptible saves 60-90% but can be interrupted. Right strategy: reserved for baseline load, on-demand for predictable peaks, spot for fault-tolerant batch jobs. Over-provisioning wastes money; under-provisioning causes outages. Target: spend enough that you never have a capacity-related outage, but not so much that you waste budget on idle resources.

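
The blended-pricing idea with invented rates (on-demand $100 per instance-month, reserved 40% off, spot 70% off):

```python
def blended_cost(baseline: int, peak_extra: int, batch: int,
                 on_demand: float = 100.0,
                 reserved_discount: float = 0.40,
                 spot_discount: float = 0.70) -> float:
    """Reserved covers the baseline, on-demand the peaks, spot the batch jobs."""
    return (baseline * on_demand * (1 - reserved_discount)
            + peak_extra * on_demand
            + batch * on_demand * (1 - spot_discount))

# 10 baseline + 4 peak + 6 batch instances: $1180 vs $2000 all on-demand
assert abs(blended_cost(10, 4, 6) - 1180.0) < 1e-6
assert blended_cost(20, 0, 0, reserved_discount=0.0) == 2000.0
```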

5. Why is queue depth a better capacity signal than CPU utilization?

Answer: CPU can be at 50% while requests queue because the bottleneck is elsewhere (disk I/O, database connections, locks). Queue depth directly measures whether work is waiting — if queues grow, the system cannot keep up regardless of what CPU shows. Monitor queue depths for message brokers (Kafka consumer lag, SQS ApproximateNumberOfMessages), thread pools, connection pools, and kernel run queues. Sustained queue growth is the most reliable signal that capacity is insufficient.


6. Describe one quantitative method for forecasting resource capacity needs.

Answer: Linear regression on historical usage trends — plot resource usage over time, fit a trend line, and project when usage will hit the provisioned threshold. Adjust for seasonal patterns.
