Answer Key: The Session Store That Keeps Dying

The System

An e-commerce platform stores user sessions in a Redis instance running as a single pod in Kubernetes. The web application (multiple pods) connects to Redis to manage shopping cart state, login sessions, and checkout flow data.

[Web App Pods (312 clients)] --> [Redis Pod (session-cache)] --> [In-Memory Only]
         |                              |
    /api/checkout              namespace: ecommerce
    /api/cart                  48K session keys
    /api/login                 ~56MB memory used

Redis is configured as a pure ephemeral cache: no persistence (save "", appendonly no), no authentication, no replication. This is a single-instance session store — no HA.

What's Broken

Root cause: The Redis container has a memory limit of 64Mi (67,108,864 bytes), and Redis memory usage has reached 58,720,256 bytes (~56MB). With maxmemory-policy noeviction, Redis never evicts keys, and with no maxmemory ceiling set below the container limit (the fix adds one), nothing caps growth at the application level. As new sessions are created, Redis memory climbs until it hits the 64Mi container limit. The Linux kernel then OOM-kills the Redis process (exit code 137, i.e. 128 + SIGKILL). Kubernetes restarts it, Redis comes back empty (no persistence), clients reconnect and quickly refill memory, and the cycle repeats.

Key clue: The kubectl describe output shows OOMKilled with exit code 137 and a memory limit of only 64Mi. The Prometheus metrics confirm Redis is using 56MB of that 64MB budget — only 8MB of headroom with 48K keys and growing.
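The headroom figure is easy to verify from the raw numbers in the pod spec and the metrics (a quick sanity check using the byte values quoted above):

```shell
# Container memory limit (64Mi) minus current Redis usage, in bytes.
limit=67108864      # 64Mi from the container resource limits
used=58720256       # used_memory reported by Redis / Prometheus
echo $(( limit - used ))                    # remaining bytes
echo $(( (limit - used) / 1024 / 1024 ))    # remaining MiB (exactly 8)
```

Eight MiB of headroom with 48K keys and growing session traffic means the next OOM kill is minutes away, not hours.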

The Fix

Immediate (stop the bleeding)

  1. Increase the memory limit to something reasonable for a session store:

    kubectl patch deployment session-cache -n ecommerce \
      --type='json' -p='[
        {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/memory", "value": "256Mi"},
        {"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/memory", "value": "256Mi"}
      ]'
    

  2. Set a maxmemory ceiling and an eviction policy so Redis sheds old keys instead of growing into the kernel OOM killer. Note that CONFIG SET applies only to the running process and is lost on restart, so the permanent Helm change in the next section is still required:

    kubectl exec -n ecommerce deploy/session-cache -- redis-cli CONFIG SET maxmemory-policy allkeys-lru
    kubectl exec -n ecommerce deploy/session-cache -- redis-cli CONFIG SET maxmemory 200mb
    

Permanent (fix in Helm values)

Update values-prod.yaml:

sessionCache:
  resources:
    limits:
      memory: 256Mi
      cpu: 500m
    requests:
      memory: 256Mi
      cpu: 250m
  master:
    configuration: |-
      maxmemory 200mb
      maxmemory-policy allkeys-lru
      save ""
      appendonly no
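The 56Mi gap between maxmemory (Redis parses "200mb" as 200 × 2^20 bytes) and the 256Mi container limit is deliberate: it absorbs allocator fragmentation, client output buffers, and Redis's own process overhead, so LRU eviction kicks in well before the kernel OOM killer can. A quick check of that gap:

```shell
# Both values use binary units: 256Mi container limit vs. Redis maxmemory 200mb.
limit=$(( 256 * 1024 * 1024 ))    # container memory limit in bytes
maxmem=$(( 200 * 1024 * 1024 ))   # Redis maxmemory in bytes
echo $(( (limit - maxmem) / 1024 / 1024 ))   # MiB of overhead headroom (56)
```

If maxmemory were set at or above the container limit, the original failure mode would simply recur at the new, larger limit.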

Verification

# Confirm pod is stable
kubectl get pods -n ecommerce -l app=session-cache -w

# Check Redis memory config
kubectl exec -n ecommerce deploy/session-cache -- redis-cli INFO memory | grep maxmemory

# Check eviction policy
kubectl exec -n ecommerce deploy/session-cache -- redis-cli CONFIG GET maxmemory-policy

# Monitor memory usage
kubectl exec -n ecommerce deploy/session-cache -- redis-cli INFO memory | grep used_memory_human
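To turn the INFO output into a single at-a-glance number, the used/max ratio can be extracted with awk. This is a sketch with illustrative sample values standing in for live output; in practice pipe `redis-cli INFO memory` straight in (and strip carriage returns with `tr -d '\r'`, since INFO lines end in CRLF):

```shell
# Parse used_memory and maxmemory from INFO-style output and print the
# percentage of the memory budget currently consumed.
info='used_memory:58720256
maxmemory:209715200'
used=$(printf '%s\n' "$info" | awk -F: '/^used_memory:/ {print $2}')
max=$(printf '%s\n' "$info" | awk -F: '/^maxmemory:/ {print $2}')
echo "$(( used * 100 / max ))% of maxmemory in use"   # 28% with these samples
```

A healthy session store with allkeys-lru will hover near 100% under load while evicted_keys climbs; the dangerous signal is high usage with zero evictions, which is exactly what the original metrics showed.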

Artifact Decoder

  • CLI Output
    Revealed: OOMKilled with a 64Mi limit means the memory limit is too low for the workload.
    Misleading: The CPU limit (250m) is a distraction; this is a memory problem.

  • Metrics
    Revealed: 56MB used with 48K keys and 0 evictions means memory only grows.
    Misleading: 312 clients looks alarming but is normal for a shared session store.

  • IaC Snippet
    Revealed: The noeviction policy is the root config error; no persistence means data loss on restart.
    Misleading: save "" and appendonly no look suspicious but are correct for an ephemeral cache.

  • Log Lines
    Revealed: The webapp latency spike confirms Redis is degraded before crashing.
    Misleading: "Can't save in background: fork" is a red herring; persistence is disabled, so this is a stale config artifact.

Skills Demonstrated

  • Reading Kubernetes pod status and recognizing OOM exit codes
  • Correlating container resource limits with application memory metrics
  • Understanding Redis memory policies and their operational implications
  • Connecting Helm values to runtime behavior
  • Distinguishing red herring log lines from diagnostic clues

Prerequisite Topic Packs