
Portal | Level: L2: Operations | Topics: Capacity Planning, SRE Practices, Prometheus | Domain: DevOps & Tooling

Capacity Planning - Primer

Why This Matters

Every outage post-mortem you've read that says "unexpected traffic spike" is really saying "we didn't plan capacity." Capacity planning is the discipline that prevents your infrastructure from becoming the bottleneck at the moment it matters most: when your product succeeds.

Good capacity planning means you can answer these questions at any time:

- When will we run out of resource X at current growth?
- How much headroom do we have for a traffic spike right now?
- What does the infrastructure cost look like in 6 months?

If you can't answer those, you're flying blind.


The Four Resource Dimensions

Every system bottleneck lives in one of four dimensions:

 ┌──────────────────────────────────────────────────────┐
 │                   System Resources                    │
 │                                                       │
 │   ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────┐ │
 │   │   CPU    │  │  Memory  │  │   Disk   │  │ Net  │ │
 │   │          │  │          │  │  (IOPS & │  │      │ │
 │   │ compute  │  │ capacity │  │   BW)    │  │ BW & │ │
 │   │ cycles   │  │ & speed  │  │          │  │ PPS  │ │
 │   └──────────┘  └──────────┘  └──────────┘  └──────┘ │
 └──────────────────────────────────────────────────────┘
| Dimension | Metrics to Track                        | Common Bottleneck Symptom     |
|-----------|-----------------------------------------|-------------------------------|
| CPU       | Utilization %, steal %, load average    | High latency, slow response   |
| Memory    | Used/available, swap usage, OOM kills   | OOM kills, heavy swapping     |
| Disk      | IOPS, throughput (MB/s), latency, queue | Slow writes, I/O wait         |
| Network   | Bandwidth (Mbps), packets/sec, errors   | Timeouts, dropped connections |

Utilization vs Saturation — The Critical Distinction

These are not the same thing, and confusing them will wreck your capacity model.

Utilization: what percentage of the resource is being used. "CPU is at 60% utilization" means 60% of cycles are doing work.

Saturation: whether work is queuing because the resource can't keep up. "CPU has 14 tasks in the run queue" means processes are waiting.

 Utilization: 70%              Utilization: 70%
 Saturation:  0                Saturation:  HIGH
 ┌──────────────────┐          ┌──────────────────┐
 │ ████████████░░░░ │          │ ████████████░░░░ │
 │                  │          │ Queue: ████████  │
 │ System is fine   │          │ System is hurting│
 └──────────────────┘          └──────────────────┘

A system can be at 70% utilization with zero saturation (steady workload, no queuing) or 70% utilization with massive saturation (bursty workload, requests pile up between bursts).

Plan on saturation, not utilization. A system that hits 100% utilization for 200ms during a burst will queue requests even if the average utilization is 40%.

Analogy: Utilization vs saturation is like a highway. A highway at 70% utilization with evenly spaced cars flows smoothly. A highway at 70% average utilization with rush-hour bursts has bumper-to-bumper traffic for hours. Same average, completely different experience. This is why averages lie in capacity planning.
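A toy queue simulation makes the distinction concrete (synthetic numbers; `max_queue_depth` is a hypothetical helper): two workloads with identical 70% average utilization, one steady and one bursty, produce very different saturation.

```python
def max_queue_depth(arrivals, capacity=10):
    """Single-server queue: each tick we process up to `capacity` units."""
    queue, worst = 0, 0
    for a in arrivals:
        queue = max(0, queue + a - capacity)  # leftover work queues up
        worst = max(worst, queue)
    return worst

ticks = 100
steady = [7] * ticks                                        # 7 units every tick
bursty = [70 if t % 10 == 0 else 0 for t in range(ticks)]   # same total work

print(sum(steady) == sum(bursty))   # True: both average 70% utilization
print(max_queue_depth(steady))      # 0  -> zero saturation
print(max_queue_depth(bursty))      # 60 -> heavy saturation between bursts
```

Same average load, but the bursty workload builds a 60-unit backlog every cycle: exactly the "averages lie" failure mode.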


Forecasting Methods

Linear Extrapolation

The simplest method: draw a line through historical data and extend it.

Usage
 ^
 │                          ╱ (projected)
 │                    ╱  ╱
 │              ╱  ╱
 │        ╱  ╱
 │   ╱ ╱
 │ ╱
 ├─────────────────────────── Capacity limit
 └──────────────────────────────> Time
       Now            Exhaustion
# Prometheus: predict_linear forecasts when a metric will hit a value
# "When will disk fill up based on last 7 days?"
predict_linear(node_filesystem_avail_bytes{mountpoint="/"}[7d], 30*24*3600) < 0
# Returns negative if disk will be full within 30 days

When it works: Steady, consistent growth (e.g., database size on a mature product).

When it lies: Anything with seasonal patterns, step-function growth (marketing campaigns), or logarithmic saturation.
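Under the hood, `predict_linear` is just a least-squares line extended forward. A standalone sketch of the same idea (the disk-shrinkage samples are made up):

```python
def predict_linear(samples, seconds_ahead):
    """Least-squares line through (timestamp, value) samples, extrapolated
    `seconds_ahead` past the newest sample, mirroring what Prometheus's
    predict_linear() does over a range vector."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    slope = (sum((t - mean_t) * (v - mean_v) for t, v in samples)
             / sum((t - mean_t) ** 2 for t, _ in samples))
    intercept = mean_v - slope * mean_t
    newest = samples[-1][0]
    return slope * (newest + seconds_ahead) + intercept

# Hypothetical disk: 100 GB free, shrinking ~5 GB/day over a week of samples
day = 24 * 3600
samples = [(i * day, (100 - 5 * i) * 1e9) for i in range(7)]
print(predict_linear(samples, 30 * day))  # negative: full well within 30 days
```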

Seasonal Decomposition

Most traffic has patterns: daily peaks, weekly cycles, monthly billing runs.

 Requests/sec
 ^
 │    ╱╲      ╱╲      ╱╲
 │   ╱  ╲    ╱  ╲    ╱  ╲    ← daily peak (2 PM)
 │  ╱    ╲  ╱    ╲  ╱    ╲
 │ ╱      ╲╱      ╲╱      ╲
 │                              ← baseline grows over weeks
 └──────────────────────────────> Time
   Mon    Tue    Wed    Thu

Decompose your metric into:

  1. Trend — the long-term direction (growing, flat, declining)
  2. Seasonal — the repeating pattern (daily, weekly, monthly)
  3. Residual — the noise

Plan capacity for: trend + seasonal_peak + safety_margin
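A minimal decomposition along these lines (a toy sketch with synthetic data; production forecasting usually reaches for a library such as statsmodels):

```python
def decompose(values, period):
    """Split a series into a trend (centered moving average) and a seasonal
    component (mean deviation from trend at each position in the cycle)."""
    n, half = len(values), period // 2
    trend = [sum(values[i - half:i + half + 1]) / (2 * half + 1)
             if half <= i < n - half else None
             for i in range(n)]
    detrended = [(i % period, values[i] - trend[i])
                 for i in range(n) if trend[i] is not None]
    seasonal = [sum(d for p, d in detrended if p == k)
                / sum(1 for p, _ in detrended if p == k)
                for k in range(period)]
    return trend, seasonal

# Toy series with a 4-step "day": baseline grows +1 per step,
# plus a +50 "2 PM" bump at position 2 of every cycle
period = 4
values = [100 + i + (50 if i % period == 2 else 0) for i in range(20)]
trend, seasonal = decompose(values, period)
print(seasonal.index(max(seasonal)))  # 2: the decomposition finds the daily peak
```

The capacity target is then the projected trend plus `max(seasonal)` plus your safety margin.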

Growth Rate Modeling

Current usage:     800 req/s peak
Monthly growth:    12%
Planning horizon:  6 months

Month 1:  800 * 1.12 =   896 req/s
Month 2:  896 * 1.12 = 1,003 req/s
Month 3: 1003 * 1.12 = 1,124 req/s
Month 4: 1124 * 1.12 = 1,259 req/s
Month 5: 1259 * 1.12 = 1,410 req/s
Month 6: 1410 * 1.12 = 1,579 req/s

Compound growth is deceptive. 12% monthly = 2x in 6 months = 4x in a year.

Remember: The Rule of 72 -- divide 72 by your growth rate percentage to get the doubling time. 12% monthly growth: 72/12 = 6 months to double. 10% monthly: 72/10 = 7.2 months. This mental math shortcut lets you estimate exhaustion dates in your head during meetings.

Formula: future = current * (1 + growth_rate) ^ months
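Both the formula and the Rule of 72 are one-liners (helper names are hypothetical; the 800 req/s at 12% figures come from the table above):

```python
import math

def project(current, monthly_growth, months):
    """future = current * (1 + growth_rate) ** months"""
    return current * (1 + monthly_growth) ** months

def months_until(current, monthly_growth, capacity):
    """Months until compound growth reaches the capacity limit."""
    return math.log(capacity / current) / math.log(1 + monthly_growth)

print(round(project(800, 0.12, 6)))             # 1579, matching the table
print(72 / 12)                                  # Rule of 72: ~6 months to double
print(round(months_until(800, 0.12, 1600), 1))  # exact doubling time: 6.1 months
```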


Headroom Planning

Headroom is the buffer between your current peak usage and your capacity limit. You need it for:

  1. Organic growth between capacity reviews
  2. Traffic spikes (flash sales, news coverage, DDoS)
  3. Failure scenarios (lose one node, traffic redistributes to survivors)
  4. Operational overhead (deployments, compactions, migrations)

The N+1 / N+2 Model

 ┌──────────────────────────────────────────┐
 │  Cluster: 4 nodes, each handles 1000 rps │
 │  Total capacity: 4000 rps                 │
 │  Current peak: 2800 rps                   │
 │                                           │
 │  N+1: Can survive 1 node failure          │
 │  → 3 nodes must handle 2800 rps           │
 │  → Each node: 933 rps (93% util) ← tight! │
 │                                           │
 │  N+2: Can survive 2 node failures         │
 │  → 2 nodes must handle 2800 rps           │
 │  → Each node: 1400 rps ← over capacity!  │
 │                                           │
 │  Decision: Add nodes so N+1 is safe       │
 │  → 5 nodes: each handles 700 rps at peak  │
 │  → Lose 1: 4 nodes at 700 rps = fine      │
 └──────────────────────────────────────────┘
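The arithmetic in the box is easy to encode as a survivability check (numbers from the example above; the function name and the 85% utilization ceiling are assumptions):

```python
def survives(nodes, per_node_capacity, peak_rps, failures, max_util=0.85):
    """Can the cluster absorb `failures` node losses and stay under max_util?"""
    survivors = nodes - failures
    if survivors <= 0:
        return False
    per_node_load = peak_rps / survivors
    return per_node_load / per_node_capacity <= max_util

print(survives(4, 1000, 2800, 1))  # False: 933 rps/node is 93% util, too tight
print(survives(4, 1000, 2800, 2))  # False: 1400 rps/node is over capacity
print(survives(5, 1000, 2800, 1))  # True: 700 rps/node with a node down
```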

Headroom Rules of Thumb

| Resource | Target Max Utilization | Why                                    |
|----------|------------------------|----------------------------------------|
| CPU      | 60-70%                 | Burst headroom, GC pauses, deployments |
| Memory   | 70-80%                 | Page cache, fork overhead, safety      |
| Disk     | 70-75%                 | Compaction, log spikes, recovery space |
| Network  | 50-60%                 | Retransmissions, burst absorption      |

These are starting points. Tune based on your workload's burstiness.


Burst Capacity

Sustained throughput and burst throughput are different numbers:

# Example: A 4-core system
# Sustained: 2000 req/s (50% CPU, steady state)
# Burst (10s): 3500 req/s (90% CPU, queues build)
# Burst (60s): 2800 req/s (70% CPU, some queueing)
# Burst (5min): 2400 req/s (caches warm, 60% CPU)

Design for:

- Sustained capacity > normal peak traffic
- Burst capacity > 2x normal peak (for flash crowds)
- Drain rate > arrival rate (queues must eventually empty)

If your burst capacity equals your sustained capacity, you have no shock absorber.
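The drain-rate rule can be sanity-checked with simple arithmetic: how deep does the queue get during a burst, and how long until it empties? (Illustrative numbers, loosely based on the 4-core example above; `burst_queue` is a hypothetical helper.)

```python
def burst_queue(sustained_rps, burst_rps, burst_seconds, normal_rps):
    """Queue built during a burst, and seconds to drain it afterwards."""
    backlog = max(0, (burst_rps - sustained_rps) * burst_seconds)
    drain_rate = sustained_rps - normal_rps   # spare capacity after the burst
    if drain_rate <= 0:
        return backlog, float("inf")          # the queue never empties
    return backlog, backlog / drain_rate

# 2000 req/s sustained capacity, a 10s flash crowd at 3500 req/s,
# then traffic returns to a 1500 req/s normal peak
backlog, drain = burst_queue(2000, 3500, 10, 1500)
print(backlog)  # 15000 queued requests
print(drain)    # 30.0 seconds to drain
```

If normal traffic already consumes all sustained capacity, `drain` is infinite: that is the missing shock absorber.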


The Capacity Planning Process

 ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
 │  1. Measure  │ ──▶ │  2. Model    │ ──▶ │  3. Predict  │
 │  Current     │     │  Workload    │     │  Future      │
 │  Usage       │     │  Drivers     │     │  Demand      │
 └──────────────┘     └──────────────┘     └──────────────┘
        │                                         │
        │                                         ▼
 ┌──────▼───────┐                          ┌──────────────┐
 │  6. Iterate  │ ◀───────────────────────  │  4. Plan     │
 │  (quarterly) │                          │  Supply      │
 └──────────────┘                          └──────┬───────┘
                                           ┌──────▼───────┐
                                           │  5. Execute  │
                                           │  (procure,   │
                                           │   scale)     │
                                           └──────────────┘

Step 1: Measure

Instrument everything. At minimum:

- CPU utilization (per-core and aggregate)
- Memory used/available (not just "free")
- Disk IOPS + throughput + latency + space
- Network bandwidth + packet rate + errors
- Application-level: requests/sec, latency p50/p95/p99, error rate
- Queue depths: connection pool, message queues, thread pools

Step 2: Model Workload Drivers

Find what drives your resource consumption:

"1 active user = 3 req/s = 0.002 CPU cores = 50MB RAM"
"1 message published = 0.5ms CPU + 4KB disk write + 1KB network"

Step 3: Predict Future Demand

Current: 10,000 active users → 30,000 req/s → 20 CPU cores
Growth: +15% users/month
In 6 months: 23,000 users → 69,000 req/s → 46 CPU cores
Add headroom (30%): 60 CPU cores needed
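Steps 2 and 3 compose into a few lines (constants taken from the example above; the function name is hypothetical):

```python
def predicted_cores(users, growth, months, req_per_user=3.0,
                    cores_per_user=0.002, headroom=0.30):
    """Project users forward, translate to req/s and CPU cores, add headroom."""
    future_users = users * (1 + growth) ** months
    req_s = future_users * req_per_user
    cores = future_users * cores_per_user
    return round(future_users), round(req_s), round(cores * (1 + headroom))

print(predicted_cores(10_000, 0.15, 6))
# (23131, 69392, 60): ~23k users, ~69k req/s, 60 cores with 30% headroom
```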

Step 4: Plan Supply

Match infrastructure to predicted demand. Factor in:

- Lead time for procurement (cloud: minutes; on-prem: weeks to months)
- Cost optimization (reserved instances, committed use discounts)
- Step-function scaling (you can't buy half a server)


Right-Sizing Containers

Containers make capacity planning both easier (flexible) and harder (death by a thousand paper cuts).

# Prometheus: Find containers that request 2 CPU but use 0.3
avg(rate(container_cpu_usage_seconds_total[5m])) by (pod)
/
avg(kube_pod_container_resource_requests{resource="cpu"}) by (pod)
# If this ratio is < 0.3, the container is massively over-provisioned

# Memory: actual vs requested
avg(container_memory_working_set_bytes) by (pod)
/
avg(kube_pod_container_resource_requests{resource="memory"}) by (pod)

Right-sizing rules:

- Requests = p99 of actual usage + 20% headroom
- Limits = requests * 1.5 (or 2x for bursty workloads)
- Review monthly. Usage patterns shift.
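Those rules translate directly into a sizing helper (a sketch; the percentile is computed naively from sorted samples, and the workload numbers are synthetic):

```python
def right_size(usage_samples, headroom=0.20, limit_factor=1.5):
    """Request = p99 of observed usage + headroom; limit = request * factor."""
    s = sorted(usage_samples)
    p99 = s[min(len(s) - 1, int(0.99 * len(s)))]
    request = p99 * (1 + headroom)
    return request, request * limit_factor

# CPU usage in cores sampled every minute: mostly 0.3, with rare 0.5 spikes
samples = [0.3] * 990 + [0.5] * 10
request, limit = right_size(samples)
print(request, limit)  # ~0.6 cores requested, ~0.9 core limit
```

Note the p99 catches the spikes that an average would hide, which is the whole point of sizing off peaks.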


The Capacity Planning Spreadsheet

Even with fancy tools, a spreadsheet model often communicates best to leadership:

| Resource      | Current  | Peak     | Capacity | Headroom | Exhaust Date |
|---------------|----------|----------|----------|----------|--------------|
| API CPU       | 42 cores | 58 cores | 80 cores | 27%      | Aug 2026     |
| DB Memory     | 180 GB   | 210 GB   | 256 GB   | 18%      | Jun 2026     |
| Disk (data)   | 3.2 TB   | -        | 5 TB     | 36%      | Nov 2026     |
| Disk IOPS     | 12,000   | 18,000   | 25,000   | 28%      | Sep 2026     |
| Network (ext) | 2.4 Gbps | 4.1 Gbps | 10 Gbps  | 59%      | 2027+        |

Update this quarterly. Present it to leadership. The resource with the earliest exhaust date is your priority.
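The Exhaust Date column falls out of current usage, capacity, and growth rate (hypothetical helper; the 3%/month growth figure below is an assumption, not something stated in the table):

```python
import math
from datetime import date, timedelta

def exhaust_date(today, current, capacity, monthly_growth):
    """Date when compound growth at `monthly_growth` hits the capacity limit."""
    if current >= capacity:
        return today
    months = math.log(capacity / current) / math.log(1 + monthly_growth)
    return today + timedelta(days=months * 30.44)  # average month length

# DB memory row: 210 GB peak vs 256 GB capacity, growing ~3%/month
print(exhaust_date(date(2025, 11, 1), 210, 256, 0.03))
```

At that growth rate the helper lands in late spring 2026, consistent with the table's "Jun 2026" row.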

Interview tip: When asked about capacity planning, lead with the exhaust-date table. Interviewers want to see that you can translate technical metrics into business-relevant timelines. "We'll run out of database memory in June" is actionable; "memory is at 70%" is not.


Key Takeaways

  1. Measure all four dimensions: CPU, memory, disk, network. Your bottleneck will be the one you forgot.
  2. Plan on peaks and saturation, never averages. Averages hide the pain.
  3. Compound growth is deceptive — 10% monthly is 3x annually.
  4. Headroom is not waste. It's your buffer for spikes, failures, and operations.
  5. N+1 minimum for any service that matters. Prove it by testing a node failure.
  6. Right-size containers by measuring actual usage, not guessing.
  7. A simple spreadsheet with exhaust dates communicates better than a Grafana dashboard to the people who approve budgets.

Fun fact: The "USE method" (Utilization, Saturation, Errors) for capacity analysis was formalized by performance engineer Brendan Gregg. It provides a systematic checklist: for every resource, check utilization, saturation, and errors. This prevents the common mistake of only checking the resource you suspect while the real bottleneck hides in a dimension you forgot.

