# Pattern: Cgroup Soft/Hard Limit Confusion

ID: FP-005 | Family: Resource Exhaustion | Frequency: Common | Blast Radius: Single Container/Pod | Detection Difficulty: Subtle
## The Shape
Cgroups have two memory limit mechanisms: `memory.high` (soft: throttles allocation and
triggers reclaim, but doesn't kill) and `memory.limit_in_bytes` (cgroup v1) / `memory.max`
(cgroup v2) (hard: exceeding it triggers the OOM killer). Kubernetes maps `limits` to the
hard ceiling; `requests` guide scheduling and OOM scoring rather than enforcing a runtime
limit. Operators who treat limits as "expected usage" rather than "absolute maximum" set
them too close to actual usage, leaving no headroom for spikes.
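The two knobs can be set directly on a cgroup v2 group. A minimal sketch, assuming root access, a cgroup v2 mount at `/sys/fs/cgroup`, and a hypothetical group name:

```shell
# Hypothetical cgroup; creating it and writing these files requires root
# and a cgroup v2 mount.
CG=/sys/fs/cgroup/mygroup
mkdir -p "$CG"

# Soft limit: above this the kernel throttles and reclaims, but does not kill.
echo $((400 * 1024 * 1024)) > "$CG/memory.high"   # 400 MiB

# Hard limit: above this the OOM killer fires (the process exits with 137).
echo $((512 * 1024 * 1024)) > "$CG/memory.max"    # 512 MiB

# Move the current shell into the group by writing its PID.
echo $$ > "$CG/cgroup.procs"
```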
## How You'll See It

### In Kubernetes
Pod is OOMKilled during traffic spikes or GC cycles but runs fine under normal load.
`kubectl top pod` (which reports working-set memory, close to RSS) shows usage well below
the limit under normal conditions. The spike is brief (a large request, a GC pause, a
batch job) and exceeds the limit by only a small margin. Because the limit is a hard
ceiling, the pod is killed even for a 5% overage.
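One way to tell this apart from an application crash is the termination record Kubernetes keeps. A minimal sketch that checks for the OOMKilled reason and exit code 137, using an inline sample in place of live `kubectl describe pod` output:

```shell
# In a live cluster you would feed `kubectl describe pod <name>` into this;
# the sample below stands in for the well-known "Last State" block.
is_oomkilled() {
  # Report when the container's last termination was an OOM kill (exit 137).
  if echo "$1" | grep -q "OOMKilled" && echo "$1" | grep -Eq 'Exit Code:[[:space:]]*137'; then
    echo "pod was OOMKilled (hard limit exceeded)"
  fi
}

sample='Last State:  Terminated
  Reason:      OOMKilled
  Exit Code:   137'

is_oomkilled "$sample"
```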
### In Linux/Infrastructure
In cgroup v2, `memory.high` is set to 80% of the container's intended memory. When the
application exceeds it, the kernel throttles allocations and the process slows; when it
exceeds `memory.max`, the OOM killer fires. The throttling signal is often invisible
in application metrics.
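One place the throttling does show up is the cgroup's event counters. A minimal sketch, assuming cgroup v2 and using a sample file in place of a real `memory.events`:

```shell
# "high" counts breaches of memory.high (throttle/reclaim); "oom_kill"
# counts hard-limit kills. The sample stands in for
# /sys/fs/cgroup/<group>/memory.events.
check_memory_events() {
  high=$(awk '$1 == "high" {print $2}' "$1")
  oom_kill=$(awk '$1 == "oom_kill" {print $2}' "$1")
  if [ "${high:-0}" -gt 0 ]; then
    echo "throttled: memory.high breached $high times"
  fi
  if [ "${oom_kill:-0}" -gt 0 ]; then
    echo "killed: $oom_kill OOM kill(s)"
  fi
}

printf 'low 0\nhigh 42\nmax 3\noom 1\noom_kill 1\n' > /tmp/memory.events.sample
check_memory_events /tmp/memory.events.sample
```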
### In CI/CD
Build steps that compile large projects (e.g., LLVM, Webpack) spike memory during
linking. If the build container's limit is set to the average compile memory, the
linker step gets OOMKilled sporadically, and only on large builds.
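A hedged sketch of catching this before it ships: compare the build step's peak RSS, as reported by GNU `/usr/bin/time -v`, against the container limit. The captured sample and the 2Gi limit are illustrative:

```shell
# Sample of GNU `/usr/bin/time -v` output; a real run would capture this
# from the actual link step.
sample='	Command being timed: "make link"
	Maximum resident set size (kbytes): 3145728'

limit_kb=$((2048 * 1024))   # hypothetical 2Gi container limit
peak_kb=$(echo "$sample" | awk -F': ' '/Maximum resident set size/ {print $2}')

# Averages hide the spike; only the peak predicts the OOM kill.
if [ "$peak_kb" -gt "$limit_kb" ]; then
  echo "peak ${peak_kb} kB exceeds limit ${limit_kb} kB: expect exit code 137"
fi
```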
## The Tell

- Pod is OOMKilled, but RSS under normal load is significantly below the limit.
- Spikes (GC, large requests, batch jobs) briefly push usage above the limit.
- The ratio of limit to request is 1:1 or very close.
## Common Misdiagnosis
| Looks Like | But Actually | How to Tell the Difference |
|---|---|---|
| Memory leak | Spike exceeds tight limit | RSS returns to baseline after OOMKill; leak would show persistent growth |
| Application bug | Limit too tight | No crash log or exception — exit code 137 only |
| Flaky behavior | Deterministic spike at limit | OOMKills correlate with specific operations (GC, large payload) |
## The Fix (Generic)

- Immediate: Increase `limits.memory` to 1.5x–2x the observed peak RSS.
- Short-term: Profile the application's memory under realistic peak load; use VPA (Vertical Pod Autoscaler) recommendations to set requests/limits.
- Long-term: Set `requests` to the 95th percentile of steady-state usage; set `limits` to 1.5x–2x the 99th-percentile spike. Add a `container_memory_working_set_bytes` alert at 85% of the limit.
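The long-term sizing rule can be sketched with nearest-rank percentiles over sampled RSS readings; the sample values (MiB) and the 1.5x multiplier are illustrative, and real inputs would come from your monitoring system:

```shell
# Illustrative RSS samples in MiB: mostly steady-state with one spike.
samples="180 182 185 188 190 192 195 198 200 202 204 206 208 210 179 181 184 187 189 520"

percentile() {  # usage: percentile <p> <values...> -> nearest-rank percentile
  p=$1; shift
  echo "$@" | tr ' ' '\n' | sort -n | \
    awk -v p="$p" '{a[NR]=$1} END {r=int((p/100)*NR + 0.999); if (r<1) r=1; print a[r]}'
}

req=$(percentile 95 $samples)    # steady-state p95 -> requests
p99=$(percentile 99 $samples)    # spike p99
lim=$(( (p99 * 3) / 2 ))         # limits = 1.5x the p99 spike

echo "requests: ${req}Mi  limits: ${lim}Mi"
```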
## Real-World Examples

- Example 1: Java service, `limits: 512Mi`, `requests: 512Mi`. Normal RSS: 420Mi. During full GC, the JVM needs extra buffer to copy objects; a 520Mi GC spike triggers an OOMKill. Fix: set limits to 768Mi.
- Example 2: Node.js service with `limits: 256Mi`. Under normal load: 180Mi. On Black Friday, large shopping carts caused 270Mi peaks, and OOMKills spiked in the busiest 10-second windows. Fix: limits to 512Mi; requests remain 256Mi.
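The arithmetic in Example 1, spelled out (the numbers are taken from the example; the check itself is a trivial sketch):

```shell
# Numbers from Example 1: an overage of only 8Mi past the hard ceiling
# still kills the pod, while the raised limit absorbs the spike.
limit=512; new_limit=768; spike=520   # all in Mi

for l in $limit $new_limit; do
  if [ "$spike" -gt "$l" ]; then
    echo "limit ${l}Mi: OOMKilled (overage $((spike - l))Mi)"
  else
    echo "limit ${l}Mi: spike absorbed"
  fi
done
```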
## War Story
We set requests == limits for "predictability." The ops team told us it was best practice for QoS Guaranteed class. What we didn't understand was that GC in our JVM spiked memory by 30% for 100–200ms during compaction. Guaranteed QoS didn't save us — the OOM killer doesn't care about QoS class, only about whether you exceeded the hard limit. We changed to requests=256Mi, limits=512Mi. OOMKills dropped to zero.
## Cross-References
- Topic Packs: linux-memory-management, k8s-ops
- Case Studies: linux_ops/oom-killer-events/
- Footguns: k8s-ops/footguns.md — "Memory limit == request too tight"
- Related Patterns: FP-004 (OOM without swap — the kill mechanism), FP-035 (tight memory limit — the configuration cause)