
Symptoms: Pod OOMKilled, Memory Leak Is in Sidecar, Fix Is Helm Values

Domains: kubernetes_ops | observability | devops_tooling
Level: L2
Estimated time: 30-45 min

Initial Alert

Prometheus Alertmanager fires at 14:32 UTC:

FIRING: KubePodCrashLooping
  pod: payment-service-7f8b9c6d4-xk2nm
  namespace: prod
  container: payment-service
  reason: OOMKilled
  restarts: 7 in last 30 minutes

The oncall dashboard shows:

payment-service pod restart rate: 14/hour (baseline: 0)
payment-service error rate: 23% (baseline: 0.1%)
payment-service p99 latency: 12.4s (baseline: 180ms)
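Before reaching for profilers, the alert can be cross-checked against the pod's own status. A minimal triage sketch, assuming only the pod name and namespace from the alert (everything else is standard kubectl):

```shell
# Show recent events and per-container state for the crashing pod
kubectl describe pod payment-service-7f8b9c6d4-xk2nm -n prod

# Print each container's last termination reason and exit code;
# OOMKilled with exit code 137 confirms a kernel memory kill
kubectl get pod payment-service-7f8b9c6d4-xk2nm -n prod \
  -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
```

The jsonpath form is useful in a loop across all three replicas, since the alert only names one pod.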

Observable Symptoms

  • The payment-service container is being OOMKilled every 2-4 minutes.
  • kubectl top pod --containers shows the payment-service container using 245Mi of its 256Mi limit.
  • The application logs show normal operation right up until the kill — no memory-related errors from the app itself.
  • The Deployment has 3 replicas, and all 3 are experiencing the same OOMKill cycle.
  • Recent deploy: a new version of payment-service was deployed 2 hours ago via Helm. The app team says "no memory-related changes in this release."
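The 245Mi figure above is only meaningful per container; plain kubectl top pod reports the pod-level total, which can hide one container starving another. A hedged sketch of the per-container view (requires metrics-server; pod name taken from the alert):

```shell
# Break memory usage down by container instead of the pod-level total
kubectl top pod payment-service-7f8b9c6d4-xk2nm -n prod --containers

# Compare usage against each container's configured request and limit
kubectl get pod payment-service-7f8b9c6d4-xk2nm -n prod \
  -o jsonpath='{range .spec.containers[*]}{.name}{"\trequest="}{.resources.requests.memory}{"\tlimit="}{.resources.limits.memory}{"\n"}{end}'
```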

The Misleading Signal

The OOMKill events clearly point to the payment-service container: the Kubernetes events say OOMKilled, and the container name is payment-service. The natural response is to investigate the application for a memory leak: pull heap dumps, review recent code changes, read garbage collection logs. But the app team's release notes show only a minor API endpoint addition, nothing memory-intensive. The engineer is now deep in application profiling.
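Before committing to application profiling, it is worth widening the view to every container in the pod; the title of this scenario already hints that the real leak lives in a sidecar and that the fix lands in Helm values. A sketch of that wider check, where the sidecar container name, chart path, values key, and new limit are all illustrative assumptions, not details from this incident:

```shell
# List every container in the pod -- a memory-hungry sidecar is easy to
# miss when the OOMKill event names only the main container
kubectl get pod payment-service-7f8b9c6d4-xk2nm -n prod \
  -o jsonpath='{.spec.containers[*].name}'

# If a sidecar turns out to be the culprit, the remediation is a Helm
# values change, not an app change (path, key, and 512Mi are hypothetical)
helm upgrade payment-service ./charts/payment-service -n prod \
  --reuse-values \
  --set sidecar.resources.limits.memory=512Mi
```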