
Pattern: Cgroup Soft/Hard Limit Confusion

ID: FP-005 · Family: Resource Exhaustion · Frequency: Common · Blast Radius: Single Container/Pod · Detection Difficulty: Subtle

The Shape

Cgroups expose two memory limit mechanisms. In cgroup v2, memory.high is a soft limit: exceeding it throttles allocation and triggers reclaim, but never kills. memory.max (memory.limit_in_bytes in cgroup v1) is the hard limit: exceeding it triggers the OOM killer. In Kubernetes, requests inform scheduling and eviction ordering, while limits map directly to the hard ceiling. Operators who treat limits as "expected usage" rather than "absolute maximum" set them too close to actual usage, leaving no headroom for spikes.
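A minimal pod-spec fragment illustrating the mapping described above; the values are illustrative, not recommendations:

```yaml
# limits.memory becomes the cgroup hard ceiling (memory.max on cgroup v2);
# requests.memory only informs scheduling and eviction ordering unless the
# kubelet's Memory QoS feature is enabled.
resources:
  requests:
    memory: "256Mi"   # steady-state expectation
  limits:
    memory: "512Mi"   # absolute maximum; exceeding this is an OOM kill
```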

How You'll See It

In Kubernetes

Pod is OOMKilled during traffic spikes or GC cycles but runs fine under normal load. kubectl top pod shows RSS well below the limit under normal conditions. The spike is brief (a large request, a GC pause, a batch job) and exceeds the limit by only a small margin. Because the limit is a hard ceiling, the pod is killed even for a 5% overage.

In Linux/Infrastructure

cgroup v2's memory.high is set to, say, 80% of the container's intended memory. When the application exceeds it, the kernel throttles allocations and forces reclaim, so the process slows down. When it exceeds memory.max, the OOM killer fires. The throttling signal is often invisible in application metrics, so the slowdown looks like an unexplained latency regression.
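The kernel does record the throttling: cgroup v2 exposes per-cgroup counters in memory.events. A minimal sketch of parsing that file's format to surface the invisible signal (the sample text and thresholds here are illustrative):

```python
# Parse cgroup v2 memory.events (e.g. /sys/fs/cgroup/<path>/memory.events)
# to surface throttling ("high") and kill ("oom_kill") counters that
# rarely appear in application metrics.
def parse_memory_events(text: str) -> dict[str, int]:
    events = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value:
            events[key] = int(value)
    return events

# Illustrative file contents, not real output.
sample = "low 0\nhigh 4312\nmax 57\noom 3\noom_kill 1\n"
events = parse_memory_events(sample)
if events.get("high", 0) > 0:
    print(f"allocations throttled {events['high']} times (memory.high)")
if events.get("oom_kill", 0) > 0:
    print(f"{events['oom_kill']} process(es) OOM-killed (memory.max)")
```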

In CI/CD

Build steps that compile large projects (e.g., LLVM, Webpack bundles) spike memory during linking. If the build container's limit is set to the average compile memory, the linker step is OOMKilled sporadically, and only for large builds.
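Before choosing a limit for a build container, it helps to measure the peak rather than the average. A sketch using Python's standard resource module; the child command is a placeholder for the real build step (on Linux ru_maxrss is in KiB, on macOS in bytes):

```python
# Measure peak RSS across child processes to size a build container's
# memory limit from the worst step (usually linking), not the average.
import resource
import subprocess
import sys

def peak_child_rss_kib(cmd: list[str]) -> int:
    subprocess.run(cmd, check=True)
    # Max resident set size over all waited-for children.
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss

# Placeholder "build": allocate ~50 MB in a child process.
peak = peak_child_rss_kib([sys.executable, "-c", "x = bytearray(50_000_000)"])
print(f"peak RSS of child processes: {peak} KiB")
```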

The Tell

Pod is OOMKilled but RSS under normal load is significantly below the limit. Spikes (GC, large requests, batch) briefly push above the limit. The ratio of limit to request is 1:1 or very close.
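The 1:1 ratio is queryable. A sketch using standard kube-state-metrics series; metric and label names are the usual defaults but may vary by setup:

```promql
# Containers whose memory limit equals their request (no spike headroom).
kube_pod_container_resource_limits{resource="memory"}
  == kube_pod_container_resource_requests{resource="memory"}
```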

Common Misdiagnosis

| Looks Like | But Actually | How to Tell the Difference |
| --- | --- | --- |
| Memory leak | Spike exceeds a tight limit | RSS returns to baseline after the OOMKill; a leak would show persistent growth |
| Application bug | Limit too tight | No crash log or exception, only exit code 137 |
| Flaky behavior | Deterministic spike at the limit | OOMKills correlate with specific operations (GC, large payloads) |
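The leak-vs-spike distinction in the first row can be automated from an RSS time series. A toy heuristic with illustrative, untuned thresholds: a leak raises the memory floor over time, while a tight-limit spike pattern keeps returning to baseline:

```python
# Toy heuristic: compare the RSS floor early in the window to the floor
# late in the window. A rising floor suggests a leak; a stable floor
# with peaks suggests spikes against a tight limit.
def looks_like_leak(rss_samples: list[float], tolerance: float = 0.10) -> bool:
    quarter = len(rss_samples) // 4
    baseline = min(rss_samples[:quarter] or rss_samples)
    tail_floor = min(rss_samples[-quarter:] or rss_samples)
    return tail_floor > baseline * (1 + tolerance)

spiky = [420, 425, 520, 418, 430, 515, 422, 419]   # keeps returning to ~420
leaky = [420, 440, 460, 485, 510, 540, 575, 610]   # floor keeps rising
print(looks_like_leak(spiky), looks_like_leak(leaky))
```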

The Fix (Generic)

  1. Immediate: Increase limits.memory to 1.5x–2x the observed peak RSS.
  2. Short-term: Profile the application's memory under realistic peak load; use VPA (Vertical Pod Autoscaler) recommendations to set requests/limits.
  3. Long-term: Set requests to the 95th percentile of steady-state usage; set limits to 1.5x–2x the 99th percentile spike. Alert on container_memory_working_set_bytes at 85% of the limit.
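The arithmetic in step 3 can be sketched with the standard library. The multipliers follow the 1.5x–2x guidance above and are judgment calls, not hard rules; the sample series are synthetic:

```python
# Derive request/limit/alert values (in MiB) from observed usage samples.
import statistics

def recommend(steady: list[float], spikes: list[float]) -> dict[str, float]:
    p95_steady = statistics.quantiles(steady, n=100)[94]  # 95th percentile
    p99_spike = statistics.quantiles(spikes, n=100)[98]   # 99th percentile
    limit = 2.0 * p99_spike
    return {
        "request_mib": p95_steady,
        "limit_mib": limit,
        "alert_at_mib": 0.85 * limit,  # working-set alert threshold
    }

rec = recommend(steady=[180 + i % 20 for i in range(200)],
                spikes=[220 + i % 50 for i in range(200)])
print(rec)
```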

Real-World Examples

  • Example 1: Java service, limits: 512Mi, requests: 512Mi. Normal RSS: 420Mi. During full GC, the JVM needs extra buffer to copy objects. A 520Mi GC spike triggers OOMKill. Fix: set limits to 768Mi.
  • Example 2: Node.js service with limits: 256Mi. Under normal load, 180Mi. On Black Friday, large shopping carts caused 270Mi peaks. OOMKills spiked on the busiest 10-second windows. Fix: limits to 512Mi, requests remain 256Mi.
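For Example 1, the fix can also cap the JVM's heap below the new limit so heap plus non-heap memory stays under the ceiling. A hypothetical manifest fragment; JAVA_TOOL_OPTIONS and -XX:MaxRAMPercentage are standard JVM mechanisms, the percentage is illustrative:

```yaml
# Headroom above normal RSS for GC copying, plus a heap cap so the JVM
# leaves room for metaspace, thread stacks, and native buffers.
resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "768Mi"
env:
  - name: JAVA_TOOL_OPTIONS
    value: "-XX:MaxRAMPercentage=75.0"   # heap capped below the limit
```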

War Story

We set requests == limits for "predictability." The ops team told us it was best practice for the Guaranteed QoS class. What we didn't understand was that GC in our JVM spiked memory by 30% for 100–200ms during compaction. Guaranteed QoS didn't save us: the cgroup OOM killer doesn't care about QoS class, it fires the moment you exceed the hard limit. We changed to requests=256Mi, limits=512Mi. OOMKills dropped to zero.

Cross-References