
Pattern: Memory Limit Equals Request

ID: FP-035 | Family: Configuration Landmine | Frequency: Common | Blast Radius: Single Pod to Single Service | Detection Difficulty: Moderate

The Shape

Setting resources.limits.memory equal to resources.requests.memory in Kubernetes provides no headroom for memory spikes. Under normal load, the pod runs fine. Under peak load, a GC pause, a large payload, or a traffic spike briefly pushes RSS above the limit. The OOM killer fires immediately. The pod is killed and restarted. What was a brief traffic spike becomes a service disruption.

How You'll See It

In Kubernetes

```yaml
resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "256Mi"    # ← no headroom
```

Under normal load: 220Mi RSS. During a traffic spike: 265Mi for 50ms. Exit code 137. kubectl describe pod shows OOMKilled. The pod restarts; the spike is gone by the time it comes back up; it then runs fine for hours until the next spike.

Under normal load, kubectl top pods output looks fine (220Mi). The OOMKill happens only during spikes too brief to register in averaged memory metrics.
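A quick simulation shows why scraped metrics miss the kill: a 50ms spike is far shorter than a typical 30-second scrape interval, so the dashboard never sees it. This is an illustrative sketch with made-up numbers matching the scenario above, not output from any real metrics system.

```python
# Illustrative only: why a 50 ms spike never appears in scraped metrics.
# Simulate one hour of RSS in 10 ms steps: 220 MiB baseline with a
# single 50 ms excursion to 265 MiB around the 30-minute mark.
STEP_MS = 10
rss = [220.0] * (3600 * 1000 // STEP_MS)
for i in range(5):                                 # 50 ms spike
    rss[180_007 + i] = 265.0

# A metrics scraper samples every 30 s; the odds of a scrape landing
# inside the spike window are 50 ms / 30 s ≈ 0.17 %.
scrape_every = 30_000 // STEP_MS
scraped = rss[::scrape_every]

print(f"true peak:    {max(rss):.0f} MiB")     # 265 — above a 256 MiB limit
print(f"scraped peak: {max(scraped):.0f} MiB") # 220 — looks perfectly healthy
print(f"scraped mean: {sum(scraped)/len(scraped):.0f} MiB")
```

The gap between "true peak" and "scraped peak" is exactly the gap between kubectl top looking fine and the pod dying anyway.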

In Linux/Infrastructure

cgroup memory.limit_in_bytes set to the 95th percentile of observed usage. The 99th percentile spike (GC, large request, batch processing) occasionally hits the hard limit. Process is killed. No visible trend in memory metrics — the kill is a point event.
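Because the kill is a point event, the cgroup's own event counters are often the only record of it. On cgroup v2 the file is memory.events (memory.limit_in_bytes is the v1 name; v2 uses memory.max), and it includes an oom_kill counter. A minimal sketch of reading it, with the file contents hard-coded here so the example is self-contained; on a real host you would read /sys/fs/cgroup/&lt;group&gt;/memory.events:

```python
# Sketch: detect point-event OOM kills from cgroup v2 memory.events.
# Sample content is hard-coded; substitute a real read of
# /sys/fs/cgroup/<group>/memory.events on an actual host.
sample = """low 0
high 0
max 41
oom 3
oom_kill 3
"""

events = dict(line.split() for line in sample.splitlines())
oom_kills = int(events["oom_kill"])
print(f"oom_kill events: {oom_kills}")
if oom_kills and int(events["max"]):
    print("usage hit the hard limit and the kernel killed a task "
          "- a point event no averaged metric will show")
```

A nonzero oom_kill with a flat-looking memory graph is this pattern's signature.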

In CI/CD

Build container with a 512Mi memory request and limit. A large compilation step (linking a big binary) uses 520Mi for 3 seconds. The container is OOMKilled mid-compile, and the build fails with no obvious error: just "container exited with code 137".
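Exit code 137 is worth decoding rather than memorizing: shells report a process killed by a signal as 128 plus the signal number, and the OOM killer delivers SIGKILL (signal 9). A two-line check:

```python
# Exit code 137 is not a compiler error: it is 128 + SIGKILL.
# Shells report death-by-signal as 128 + signal number, and the
# kernel's OOM killer sends SIGKILL (signal 9).
import signal

SIGKILL = int(signal.SIGKILL)   # 9 on Linux
assert 128 + SIGKILL == 137
print(f"137 = 128 + SIGKILL({SIGKILL}) -> killed by the kernel, not crashed")
```

Any intermittent 137 in CI logs should trigger a look at memory limits before a look at the code.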

The Tell

Pod OOMKilled despite kubectl top showing average memory well below the limit. Exit code 137 occurs intermittently, correlated with traffic spikes or specific operations. requests.memory == limits.memory in the pod spec. The kill timestamp correlates with a brief spike in request rate or payload size.
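The requests == limits tell can be checked mechanically rather than by eyeballing specs. A minimal sketch, assuming the pod document has already been parsed from `kubectl get pod -o json`; the `pod` dict below is a trimmed, hypothetical example:

```python
# Sketch: flag containers whose memory request equals the memory limit,
# given a pod document parsed from `kubectl get pod -o json`.
pod = {
    "metadata": {"name": "api-7f9c"},   # hypothetical pod
    "spec": {"containers": [{
        "name": "api",
        "resources": {
            "requests": {"memory": "256Mi"},
            "limits": {"memory": "256Mi"},
        },
    }]},
}

flagged = []
for c in pod["spec"]["containers"]:
    res = c.get("resources", {})
    req = res.get("requests", {}).get("memory")
    lim = res.get("limits", {}).get("memory")
    if req is not None and req == lim:
        flagged.append((pod["metadata"]["name"], c["name"], req))

for name, container, mem in flagged:
    print(f"{name}/{container}: requests == limits == {mem} (no headroom)")
```

String equality only catches same-unit pairs; a real audit tool would normalize resource quantities first ("256Mi" and "268435456" are the same amount).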

Common Misdiagnosis

| Looks Like | But Actually | How to Tell the Difference |
| --- | --- | --- |
| Memory leak | Spike exceeds tight limit | RSS returns to baseline after OOMKill; a leak would show monotonic growth |
| Random crash | Deterministic spike at ceiling | OOMKills correlate with specific load patterns |
| Flaky behavior | Limit too tight | Behavior is consistent: it always fails at peak load |

The Fix (Generic)

  1. Immediate: Increase limits.memory to 1.5x–2x the observed peak RSS.
  2. Short-term: Set requests.memory to the 95th percentile of steady-state usage; set limits.memory to 150%–200% of requests.
  3. Long-term: Use VPA (Vertical Pod Autoscaler) in recommendation mode to observe actual usage patterns; set limits based on 99th-percentile spikes with 20% buffer.
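Applied to the manifest shown earlier, the short-term fix might look like this (the numbers are illustrative; size yours from observed metrics):

```yaml
resources:
  requests:
    memory: "256Mi"    # ~95th-percentile steady-state usage
  limits:
    memory: "512Mi"    # 150-200% of requests: headroom for spikes
```

The pod keeps its scheduling footprint (requests unchanged) but gains room to absorb a brief excursion without being killed.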

Real-World Examples

  • Example 1: Java service: requests=512Mi, limits=512Mi. During GC (full collection): 530Mi for 100ms. OOMKilled 3–4 times per day during peak traffic. Changed to limits=768Mi: zero OOMKills.
  • Example 2: Python ML inference: requests=2Gi, limits=2Gi. Model loading during warm-up: 2.1Gi. Every pod restart failed at startup. Changed to limits=3Gi: service started successfully.
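The sizing arithmetic behind these fixes can be sketched in a few lines. This is a toy calculation over hypothetical RSS samples chosen to resemble Example 1 (steady ~480Mi, occasional GC spikes to 510Mi), not a general-purpose sizing tool:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile - crude, but adequate for capacity sizing."""
    ordered = sorted(samples)
    return ordered[max(0, math.ceil(p / 100 * len(ordered)) - 1)]

# Hypothetical RSS samples (MiB): steady ~480 with occasional GC spikes.
rss_mib = [480] * 95 + [500] * 4 + [510]

request_mib = percentile(rss_mib, 95)        # steady-state p95 (fix step 2)
raw_limit = 1.5 * max(rss_mib)               # 1.5x observed peak (fix step 1)
limit_mib = math.ceil(raw_limit / 64) * 64   # round up to a tidy 64 MiB step

print(f"requests.memory: {request_mib}Mi")   # 480Mi
print(f"limits.memory:   {limit_mib}Mi")     # 768Mi
```

On these inputs the formula lands on the same 768Mi limit that resolved the Java service above: the peak (510Mi) times 1.5, rounded up.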

War Story

We had "QoS Guaranteed" class pods (requests == limits) because we were told it was best practice for predictability. Our on-call was paged every Monday morning: OOMKills at 9am. Monday morning meant EU users coming online while US users were active. RSS would briefly spike from 480Mi to 510Mi. Pod would die. Restart took 30s. In those 30 seconds, the other pods got the extra traffic, also spiked, also died. We were crashing pods every Monday in a small cascade. Changed limits to 768Mi. Monday OOMKills: zero. "QoS Guaranteed" is only "best practice" if your limits are actually above your peak usage.

Cross-References

  • Topic Packs: k8s-ops, linux-memory-management
  • Footguns: k8s-ops/footguns.md — "Memory limit == request too tight"
  • Related Patterns: FP-004 (OOM without swap — the kill mechanism), FP-005 (cgroup soft/hard confusion — same underlying issue)