Capacity Planning - Street-Level Ops¶
Quick Diagnosis Commands¶
# ── CPU ──
mpstat -P ALL 1 5 # Per-core CPU utilization (5 samples)
uptime # Load averages (1, 5, 15 min)
sar -u 1 10 # CPU utilization over 10 seconds
pidstat -u 1 5 # Per-process CPU usage
# ── Memory ──
free -h # Memory summary (human-readable)
vmstat 1 5 # Memory, swap, I/O, CPU in 1-sec intervals
head -20 /proc/meminfo # Detailed memory breakdown
slabtop -o # Kernel slab cache usage
# ── Disk ──
iostat -xz 1 5 # Disk IOPS, throughput, latency, queue depth
df -h # Filesystem space usage
df -i # Inode usage (can exhaust before space)
lsblk # Block device layout
# ── Network ──
sar -n DEV 1 5 # Network throughput per interface
ss -s # Socket statistics summary
nstat # Kernel network counters
ip -s link show eth0 # Interface stats (errors, drops)
# ── Application Level ──
# Prometheus instant queries (via curl or UI)
# Request rate: rate(http_requests_total[5m])
# Latency p99: histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))
# Error rate: rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m])
Pattern: Prometheus-Based Forecasting with predict_linear¶
predict_linear is your most useful capacity planning function. It uses linear regression on a time range to predict future values.
# Alert: disk will be full within 7 days
- alert: DiskSpaceExhaustionPredicted
expr: |
predict_linear(
node_filesystem_avail_bytes{fstype=~"ext4|xfs"}[7d],
7 * 24 * 3600
) < 0
for: 1h
labels:
severity: warning
annotations:
summary: "Disk {{ $labels.mountpoint }} on {{ $labels.instance }} is predicted to fill within 7 days"
# Alert: memory exhaustion within 3 days
- alert: MemoryExhaustionPredicted
expr: |
predict_linear(
node_memory_MemAvailable_bytes[3d],
3 * 24 * 3600
) < 0
for: 1h
labels:
severity: warning
# Dashboard query: when will CPU hit 80%?
# "Days until 80% CPU utilization"
(0.80 - avg(rate(node_cpu_seconds_total{mode!="idle"}[1h])))
/
deriv(avg(rate(node_cpu_seconds_total{mode!="idle"}[1h]))[7d:1h])
/ 86400
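The arithmetic behind that query can be sanity-checked by hand. A minimal sketch with assumed numbers (55% current utilization, growing 0.4 percentage points per day), mirroring the query's structure of (threshold - current) / per-second slope / 86400:

```shell
# Assumed: CPU at 55%, climbing 0.4 percentage points per day.
awk 'BEGIN {
  current       = 0.55
  slope_per_sec = 0.004 / 86400    # deriv() returns change per second
  days = (0.80 - current) / slope_per_sec / 86400
  printf "%.1f days until 80%% CPU\n", days
}'
# prints "62.5 days until 80% CPU"
```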
Gotcha: predict_linear assumes linear growth. If your growth is exponential, the prediction will be too optimistic (it'll say you have more time than you do).
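To see how optimistic a linear fit gets, here is a toy calculation (assumed numbers: 100 GB used on day 0, compounding 10%/day). A regression over days 0-7 lands well under the actual day-14 usage:

```shell
# Toy data: 100 GB on day 0, growing 10%/day (compounding).
awk 'BEGIN {
  start = 100                    # GB on day 0
  u7    = start * 1.10 ^ 7       # actual usage on day 7
  u14   = start * 1.10 ^ 14      # actual usage on day 14
  slope = (u7 - start) / 7       # linear fit over days 0-7
  lin14 = u7 + slope * 7         # linear extrapolation to day 14
  printf "day 14 actual: %.0f GB, linear prediction: %.0f GB\n", u14, lin14
}'
# prints "day 14 actual: 380 GB, linear prediction: 290 GB"
```

The linear fit undershoots by ~90 GB: you have less time than it claims.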
Debug clue: If predict_linear gives wildly wrong results, check your input range. A 7-day window with a one-time spike (like a backup job) will skew the linear regression. Use a longer window (14-30 days) to smooth out periodic spikes, or pre-filter with avg_over_time before feeding into predict_linear.
Pattern: Load Testing to Establish Capacity Baseline¶
You can't plan capacity if you don't know your system's limits:
# Step 1: Establish baseline with hey (HTTP load generator)
hey -z 60s -c 50 -q 100 https://api.example.com/health
# 50 concurrent connections, 100 req/s for 60 seconds
# Step 2: Ramp up until latency degrades
for rate in 200 400 600 800 1000 1200; do
echo "=== Testing $rate req/s ==="
hey -z 30s -c 100 -q $rate https://api.example.com/endpoint 2>&1 | \
grep -E '(Requests/sec|Average|99%|Status)'
sleep 10 # cool down
done
# Step 3: Record the inflection point
# Example results:
# 400 rps → p99: 45ms (healthy)
# 600 rps → p99: 52ms (healthy)
# 800 rps → p99: 120ms (latency climbing)
# 1000 rps → p99: 450ms (saturated)
# 1200 rps → p99: 2.1s (degraded)
#
# Conclusion: Capacity = ~800 req/s per instance before degradation
Document results in a table:
| Load (rps) | p50 (ms) | p99 (ms) | CPU % | Error Rate |
|------------|----------|----------|-------|------------|
| 200 | 12 | 28 | 15% | 0% |
| 400 | 14 | 45 | 30% | 0% |
| 600 | 18 | 52 | 48% | 0% |
| 800 | 25 | 120 | 65% | 0.1% |
| 1000 | 85 | 450 | 82% | 1.2% |
| 1200 | 310 | 2100 | 95% | 8.5% |
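Picking the capacity number from results like these can be automated. A sketch that reads the table as CSV and takes the highest load whose p99 stays inside a latency budget (the 150 ms SLO here is an assumption):

```shell
# Columns: load_rps,p50_ms,p99_ms -- the table above as CSV.
cat <<'EOF' > /tmp/loadtest.csv
200,12,28
400,14,45
600,18,52
800,25,120
1000,85,450
1200,310,2100
EOF
# Highest load where p99 stays under the assumed 150 ms budget:
awk -F',' '$3 < 150 { cap = $1 } END { print "capacity: " cap " rps" }' /tmp/loadtest.csv
# prints "capacity: 800 rps"
```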
Pattern: Right-Sizing Kubernetes Containers¶
# Step 1: Get actual CPU usage (last 7 days, 99th percentile)
# PromQL:
max by (pod, container) (
  quantile_over_time(0.99,
    rate(container_cpu_usage_seconds_total{
      namespace="production",
      container!="POD"
    }[5m])[7d:5m]
  )
)
# Step 2: Compare to requests
kube_pod_container_resource_requests{resource="cpu", namespace="production"}
# Step 3: Kubectl for quick spot-check
kubectl top pods -n production --sort-by=cpu
kubectl top pods -n production --sort-by=memory
# Step 4: VPA recommendations (if VPA is installed)
kubectl get vpa -n production -o yaml | grep -A 10 recommendation
Right-sizing formula:
CPU request = p99 actual usage * 1.2 (20% headroom)
CPU limit = CPU request * 1.5 (burst allowance)
Mem request = p99 actual usage * 1.2
Mem limit = Mem request * 1.3 (less burst needed; OOM is hard failure)
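Plugging illustrative measurements into those multipliers (0.42 cores and 900 MiB at p99 are assumed numbers, not taken from the text above):

```shell
awk 'BEGIN {
  cpu_p99 = 0.42; mem_p99 = 900           # measured p99: cores / MiB
  cpu_req = cpu_p99 * 1.2                 # 20% headroom
  cpu_lim = cpu_req * 1.5                 # burst allowance
  mem_req = mem_p99 * 1.2
  mem_lim = mem_req * 1.3                 # tighter: OOM kill is a hard failure
  printf "cpu: request=%.2f limit=%.2f cores\n", cpu_req, cpu_lim
  printf "mem: request=%.0f limit=%.0f MiB\n", mem_req, mem_lim
}'
# prints "cpu: request=0.50 limit=0.76 cores"
#        "mem: request=1080 limit=1404 MiB"
```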
Gotcha: Monitoring Averages Instead of Percentiles¶
Your monitoring shows "average response time: 45ms." Looks great. But 5% of requests take 3 seconds, and those users are churning.
# Bad: average
avg(rate(http_request_duration_seconds_sum[5m]))
/
avg(rate(http_request_duration_seconds_count[5m]))
# Good: percentiles
histogram_quantile(0.50, rate(http_request_duration_seconds_bucket[5m])) # p50
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) # p95
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])) # p99
Capacity plan on p99, not p50. If p99 is high, you need more capacity even if the average looks fine.
Remember: Averages hide outliers. A service with a 45ms average and a 3s p99 means 1 in 100 users waits 60x longer than the median user. If each user session makes 20 API calls, the probability of hitting at least one slow request per session is 1 - 0.99^20 ≈ 18%. Nearly one in five users experiences degraded performance that the average never shows.
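The same compounding math for a few session lengths, assuming 1% of individual calls are slow:

```shell
# P(at least one slow call in an n-call session) = 1 - 0.99^n
awk 'BEGIN {
  split("1 5 20 50", ns)
  for (i = 1; i <= 4; i++) {
    n = ns[i]
    printf "n=%2d calls: %2.0f%% of sessions hit a slow request\n", n, (1 - 0.99 ^ n) * 100
  }
}'
# n=20 gives 18%, n=50 gives 39%
```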
Pattern: Correlating Business Metrics to Infrastructure¶
The most useful capacity model links business metrics to resources:
# Step 1: Find the correlation
# Plot "active users" against "CPU cores used" over 30 days
# Usually linear within a range
# Step 2: Establish the ratio
# Example: 1000 active users = 2.3 CPU cores = 4.1 GB RAM
# Step 3: Build the model
# Current: 50,000 users → 115 CPU cores → 205 GB RAM
# Growth: +8% users/month
# In 6 months: 79,000 users → 182 CPU cores → 324 GB RAM
# Step 4: Prometheus recording rules for the ratio
# record: capacity:cpu_per_1k_users
# expr: |
# sum(rate(container_cpu_usage_seconds_total{namespace="production"}[5m]))
# /
# (sum(active_users_gauge) / 1000)
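The growth model in steps 2-3 is plain compounding. A sketch using the example ratios above (2.3 cores and 4.1 GB per 1,000 users, 8% monthly growth):

```shell
awk 'BEGIN {
  users = 50000; growth = 1.08; months = 6   # figures from the example above
  cores_per_k = 2.3; gb_per_k = 4.1
  u = users * growth ^ months
  printf "in %d months: %.0f users -> %.0f cores, %.0f GB RAM\n",
         months, u, u / 1000 * cores_per_k, u / 1000 * gb_per_k
}'
# prints "in 6 months: 79344 users -> 182 cores, 325 GB RAM"
```

Small differences from the figures above come from rounding the user count first.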
Pattern: Disk Growth Tracking¶
#!/bin/bash
# /usr/local/bin/disk-tracker.sh: track disk usage daily from cron
DATE=$(date +%Y-%m-%d)
for MOUNT in / /var /data; do
USED=$(df --output=used -B1 "$MOUNT" | tail -1)
echo "$DATE,$MOUNT,$USED" >> /var/log/disk-growth.csv
done
# Analyze: calculate daily growth rate
# awk -F',' '$2=="/data" {if(prev) print $1, ($3-prev)/1073741824, "GB/day"; prev=$3}' \
# /var/log/disk-growth.csv
# Prometheus approach (better):
# -deriv(node_filesystem_avail_bytes{mountpoint="/data"}[7d]) * 86400
# Gives bytes/day growth (use deriv, not rate: avail is a gauge,
# and node_filesystem_size_bytes never changes, so rate() on it is 0)
Gotcha: Ignoring Inode Exhaustion¶
Disk has 500GB free but you can't create files. Inodes are exhausted — common with millions of small files (mail queues, session files, container layers).
# Check inode usage
df -i
# Find directories with massive file counts
find /var -xdev -printf '%h\n' | sort | uniq -c | sort -rn | head -20
# Prevention: monitor inode usage alongside space
# Prometheus: node_filesystem_files_free / node_filesystem_files * 100
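For hosts without Prometheus, a cron-able check can parse df -i output directly; the 90% threshold here is an assumption:

```shell
# Print any filesystem at or above 90% inode usage (POSIX df -iP columns:
# $5 = IUse%, $6 = mountpoint; "+0" coerces "95%" to 95).
df -iP | awk 'NR > 1 && $5 + 0 >= 90 { print "inode pressure on " $6 ": " $5 }'
```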
Pattern: Quick Capacity Math¶
Useful back-of-envelope calculations:
# Network bandwidth
1 Gbps = ~125 MB/s = ~120,000 req/s at 1KB each
# Disk IOPS
HDD: 100-200 random IOPS, 100-200 MB/s sequential
SSD: 10,000-100,000 random IOPS, 500-3,500 MB/s sequential
NVMe: 100,000-1,000,000 IOPS, 3,000-7,000 MB/s sequential
# Memory bandwidth
DDR4: ~25 GB/s per channel, 2-4 channels = 50-100 GB/s
# Connection limits
Linux default: ~28,000 ephemeral ports (tune net.ipv4.ip_local_port_range)
1 TCP connection = ~3.5KB kernel memory (established state)
1M connections = ~3.5 GB kernel memory for sockets alone
# Pod density
Rule of thumb: 30-110 pods per Kubernetes node (depends on CNI + IP range)
Each pod: ~256KB-1MB kubelet overhead
Scale note: The "1M connections = 3.5 GB kernel memory" calculation only covers socket buffers. Each connection also consumes application-level memory (goroutine stacks, request buffers, connection pool entries). Real-world per-connection overhead is typically 10-50 KB depending on the application, making 1M connections cost 10-50 GB of application memory.
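A quick calculator for these rules of thumb; the 1 KB request size and ~30 KB per-connection application overhead are assumptions:

```shell
awk 'BEGIN {
  # Bandwidth: 1 Gbps link, 1 KB requests (decimal units throughout)
  mb_s = 1 * 1000 / 8
  printf "1 Gbps = %.0f MB/s = %.0f req/s at 1 KB (before protocol overhead)\n",
         mb_s, mb_s * 1000
  # Connection memory: 1M conns, 3.5 KB kernel + ~30 KB app each (assumed)
  printf "1M conns: %.1f GB kernel, %.0f GB app memory\n",
         1e6 * 3.5 / 1e6, 1e6 * 30 / 1e6
}'
# prints "1 Gbps = 125 MB/s = 125000 req/s at 1 KB (before protocol overhead)"
#        "1M conns: 3.5 GB kernel, 30 GB app memory"
```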
Pattern: The Quarterly Capacity Review¶
Agenda:
1. Current state — resource utilization summary, hotspots
2. Growth review — actual growth vs last quarter's prediction
3. Forecast update — next 3/6/12 month projections
4. Bottleneck ID — which resource hits the wall first?
5. Action items — procure, scale, optimize, or defer
Inputs needed:
- Prometheus/Grafana dashboards for the quarter
- Business metrics (user growth, transaction volume)
- Cost reports (cloud spend by service)
- Upcoming product launches or campaigns
Output:
- Updated capacity spreadsheet with exhaust dates
- Purchase/scaling requests with justification
- Optimization opportunities (right-sizing, cleanup)