Solution: Lab Runtime 08 -- Resource Limits OOM¶
SPOILER WARNING: Try to solve it yourself first.
Hint Ladder¶
Hint 1: The pod is crashing repeatedly. Check the termination reason -- is it OOMKilled?
Hint 2: kubectl describe pod <pod> -n grokdevops -- look at "Last State: Terminated, Reason: OOMKilled".
Hint 3: The memory limit was set to 4Mi, which is far too low for a Python application. The kernel's OOM killer terminates the process when it exceeds the cgroup memory limit.
Hint 4: Increase the memory limit to a reasonable value (e.g., 128Mi request, 256Mi limit). Run ./fix.sh or patch manually.
Minimal Solution¶
kubectl patch deployment grokdevops -n grokdevops --type=json \
-p='[{"op":"replace","path":"/spec/template/spec/containers/0/resources/limits/memory","value":"256Mi"},{"op":"replace","path":"/spec/template/spec/containers/0/resources/requests/memory","value":"128Mi"}]'
kubectl rollout status deployment/grokdevops -n grokdevops --timeout=120s
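Equivalently, the fix can be applied declaratively by editing the Deployment manifest and re-applying it. A sketch of the corrected `resources` block (the container name `app` is an assumption; match it to the actual name in your manifest):

```yaml
spec:
  template:
    spec:
      containers:
        - name: app            # assumed container name
          resources:
            requests:
              memory: "128Mi"  # what the scheduler reserves on the node
            limits:
              memory: "256Mi"  # cgroup hard cap; exceeding it triggers the OOM killer
```

Apply it with `kubectl apply -f deployment.yaml` and watch the rollout as above. Note that changing `requests`/`limits` always triggers a rolling restart of the pods.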
Explain¶
Symptom: Pod shows CrashLoopBackOff with increasing restart count. kubectl describe shows Reason: OOMKilled.
Evidence: kubectl describe pod shows "Last State: Terminated" with "Reason: OOMKilled" and "Exit Code: 137" (128 + SIGKILL=9).
Root cause: Kubernetes enforces memory limits via Linux cgroups. When a container's memory usage exceeds resources.limits.memory, the kernel's OOM killer sends SIGKILL (signal 9) to the container's main process. The container exits with code 137, and kubelet reports it as OOMKilled. The 4Mi limit is far below what a Python runtime needs just to initialize.
Key insight: OOMKilled is a kernel-level kill, not a Kubernetes decision. K8s just sets the cgroup limit; the kernel enforces it. Exit code 137 = 128 + 9 (SIGKILL). This is different from the pod being evicted due to node memory pressure.
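The exit-code arithmetic can be checked directly in a shell: codes above 128 mean the process died from a signal, and `kill -l` maps the signal number back to its name.

```shell
# Decode a container exit code: codes > 128 mean "killed by signal (code - 128)".
exit_code=137
sig=$((exit_code - 128))                # 137 - 128 = 9
echo "killed by SIG$(kill -l "$sig")"   # prints: killed by SIGKILL
```

The same decoding applies to other kills, e.g. exit code 143 is 128 + 15 (SIGTERM), the graceful shutdown signal kubelet sends first during a normal pod deletion.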
Prevent¶
- Set resource requests based on profiled application behavior
- Set limits with headroom (2x request is a common starting point)
- Monitor container memory usage in Grafana to right-size limits
- Use kubectl top pods to observe actual memory consumption before setting limits
- Consider Vertical Pod Autoscaler (VPA) for automatic right-sizing
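As a sketch, a VPA object for this lab's deployment might look like the following (this assumes the VPA CRDs and controllers are installed in the cluster, which the lab does not state):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: grokdevops
  namespace: grokdevops
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: grokdevops
  updatePolicy:
    updateMode: "Off"   # recommend-only: inspect suggestions with kubectl describe vpa
```

With `updateMode: "Off"` the VPA only publishes request/limit recommendations, which is a safe way to gather right-sizing data before letting it apply changes automatically.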