
Lab 7: Pod Debugging

| Field | Value |
|---|---|
| Tier | 2 — Kubernetes Core |
| Estimated Time | 45 minutes |
| Prerequisites | k3s cluster, kubectl |
| Auto-Grade | Yes |

Scenario

You inherit a Kubernetes namespace from a departing colleague. It contains nine pod specifications that were supposed to form a working microservices stack. Each one is broken in a different way. Some fail to pull images because the tag is wrong. Others crash immediately due to OOM kills. Several have misconfigured probes that cause endless restart loops. A few reference Secrets or ConfigMaps that do not exist. One has an RBAC issue that prevents it from reading a required API resource.

Your manager wants all nine pods green by end of day. Diagnose each failure using `kubectl describe`, `kubectl logs`, and `kubectl events`, then fix the underlying issue. No pod should be restarting, crash-looping, or stuck in Pending.

Objectives

  • Fix the pod with ImagePullBackOff (wrong image tag)
  • Fix the pod with OOMKilled (memory limit too low)
  • Fix the pod with failing liveness probe (wrong port)
  • Fix the pod with failing readiness probe (wrong path)
  • Fix the pod referencing a missing Secret
  • Fix the pod referencing a missing ConfigMap
  • Fix the pod with wrong command (entrypoint error)
  • Fix the pod stuck in Pending (missing node selector label)
  • Fix the pod with security context issues (read-only root filesystem)
  • All 9 pods are Running and Ready (not restarting)
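Several objectives above (probes, command, security context) come down to comparing what the pod spec says with what the container actually does. A minimal sketch of a helper that prints those fields for each container follows; the function name `show_container_spec` is hypothetical and not part of the lab files.

```shell
# Hypothetical helper: print the spec fields most often involved in
# probe, entrypoint, and security-context failures.
# Usage: show_container_spec <pod-name> [namespace]
show_container_spec() (
  kubectl get pod "$1" -n "${2:-lab-pod-debug}" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\n  ports: "}{.ports[*].containerPort}{"\n  command: "}{.command}{"\n  liveness: "}{.livenessProbe}{"\n  readiness: "}{.readinessProbe}{"\n  securityContext: "}{.securityContext}{"\n"}{end}'
)
```

If a probe port does not appear among the container ports, or a probe path is one the application never serves, that is a likely cause of the restart loop or the pod never becoming Ready.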

Setup

./setup.sh

This creates the namespace `lab-pod-debug` with 9 broken pod specifications.

Hints

Hint 1: Diagnosing pod issues
Start with `kubectl get pods -n lab-pod-debug` to see the state of each pod, then run `kubectl describe pod <pod-name> -n lab-pod-debug` for events and `kubectl logs <pod-name> -n lab-pod-debug` for container output.
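The diagnostic sequence can be wrapped in a small function so it is easy to repeat for each pod. This is a sketch only; `triage_pod` is a hypothetical name, and the `--tail=50` limit is an arbitrary choice.

```shell
# Hypothetical helper: gather the usual diagnostics for one pod.
# Usage: triage_pod <pod-name> [namespace]
triage_pod() (
  pod="$1"
  ns="${2:-lab-pod-debug}"
  # State, last termination reason, and recent events for the pod.
  kubectl describe pod "$pod" -n "$ns"
  # Logs from the current container instance (may be empty if it never started).
  kubectl logs "$pod" -n "$ns" --tail=50 || true
  # Logs from the previous instance, which is usually what matters after a crash.
  kubectl logs "$pod" -n "$ns" --previous --tail=50 || true
  # Events that involve this pod, oldest first.
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.name=$pod" \
    --sort-by=.lastTimestamp
)
```

The `--previous` call fails harmlessly when the container has never restarted, which is why its exit status is ignored.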
Hint 2: ImagePullBackOff
Check the image name and tag in the pod spec. Use `kubectl describe` to see the exact pull error. Common fix: correct the tag to a valid version.
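The container image is one of the few fields that can be changed on an existing pod, so a wrong tag can usually be corrected in place. The helpers below are a sketch; the function names are hypothetical, and the pod, container, and image values are placeholders you need to take from your own cluster.

```shell
# Hypothetical helper: list each container in the pod with its image.
# Usage: show_images <pod-name> [namespace]
show_images() (
  kubectl get pod "$1" -n "${2:-lab-pod-debug}" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'
)

# Hypothetical helper: point one container at a corrected image.
# Usage: fix_image <pod-name> <container-name> <image:tag> [namespace]
fix_image() (
  kubectl set image "pod/$1" "$2=$3" -n "${4:-lab-pod-debug}"
)
```

For example, `fix_image <pod-name> <container-name> <image>:<valid-tag>` updates the image, after which the kubelet retries the pull.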
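Hint 3: OOMKilled
The container needs more memory than its limit allows. Raise `resources.limits.memory` for that container in the pod spec. In the output of `kubectl describe pod <pod-name> -n lab-pod-debug`, look for a last state of `Terminated` with reason `OOMKilled` (exit code 137); this confirms the kill but does not report how much memory the container was using.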
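On most cluster versions the resource limits of an existing pod cannot be edited in place, so the fix typically means editing a manifest and recreating the pod. The sketch below assumes you have a manifest file for the pod (the file name is hypothetical); both function names are illustrative.

```shell
# Hypothetical helper: show why each container last terminated.
# Usage: last_exit <pod-name> [namespace]
last_exit() (
  kubectl get pod "$1" -n "${2:-lab-pod-debug}" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
)

# Hypothetical helper: delete the pod and create it again from an edited manifest.
# Usage: recreate_pod <manifest.yaml> [namespace]
recreate_pod() (
  kubectl replace --force -f "$1" -n "${2:-lab-pod-debug}"
)
```

The same recreate step applies to other fields that cannot be changed on a running pod, such as probes, the command, or the security context.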
Hint 4: Missing Secret/ConfigMap
Create the missing resource. Check the pod spec for the expected name and keys, then run, for example, `kubectl create secret generic <secret-name> --from-literal=<key>=<value> -n lab-pod-debug`.
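A sketch of creating the resource only when it is absent, so the command can be re-run safely. The function names are hypothetical, and the name, key, and value must match what the pod spec actually references.

```shell
# Hypothetical helper: create a Secret with one key unless it already exists.
# Usage: ensure_secret <secret-name> <key> <value> [namespace]
ensure_secret() (
  ns="${4:-lab-pod-debug}"
  if kubectl get secret "$1" -n "$ns" >/dev/null 2>&1; then
    echo "secret $1 already exists"
  else
    kubectl create secret generic "$1" --from-literal="$2=$3" -n "$ns"
  fi
)

# Hypothetical helper: the same idea for a ConfigMap.
# Usage: ensure_configmap <configmap-name> <key> <value> [namespace]
ensure_configmap() (
  ns="${4:-lab-pod-debug}"
  if kubectl get configmap "$1" -n "$ns" >/dev/null 2>&1; then
    echo "configmap $1 already exists"
  else
    kubectl create configmap "$1" --from-literal="$2=$3" -n "$ns"
  fi
)
```

A pod waiting on a missing Secret or ConfigMap normally starts on its own once the resource exists, because the kubelet keeps retrying.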
Hint 5: Pending pods
`kubectl describe pod <pod-name> -n lab-pod-debug` shows scheduling failures. If the `FailedScheduling` event reports that no node matched the pod's node selector, either add the label to a node or remove the selector from the pod.
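A sketch of the "add the label to a node" route. The function names are hypothetical; the label key and value must be the ones the pod's `nodeSelector` asks for, and the node name comes from `kubectl get nodes`.

```shell
# Hypothetical helper: print the nodeSelector the pod is asking for.
# Usage: show_selector <pod-name> [namespace]
show_selector() (
  kubectl get pod "$1" -n "${2:-lab-pod-debug}" \
    -o jsonpath='{.spec.nodeSelector}{"\n"}'
)

# Hypothetical helper: add the matching label to a node.
# Usage: label_node <node-name> <key> <value>
label_node() (
  kubectl label node "$1" "$2=$3"
)
```

`kubectl get nodes --show-labels` lists the labels each node already carries, which helps spot a typo in the selector. Once a node matches, the scheduler places the Pending pod without further action.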

Grading

./grade.sh
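Before running the grader, it can help to check the namespace yourself. The sketch below is an assumption about what is useful to look at; it is not the grader's logic, which is not shown here, and `precheck` and the 120-second timeout are illustrative.

```shell
# Hypothetical pre-check: wait for every pod to be Ready, then list restart counts.
# Usage: precheck [namespace]
precheck() (
  ns="${1:-lab-pod-debug}"
  # Fails (non-zero exit) if any pod is not Ready within the timeout.
  kubectl wait --for=condition=Ready pod --all -n "$ns" --timeout=120s || exit 1
  # Restart counts are cumulative, so a recreated pod starts again from 0.
  kubectl get pods -n "$ns" \
    -o custom-columns='NAME:.metadata.name,PHASE:.status.phase,RESTARTS:.status.containerStatuses[*].restartCount'
)
```

Running `precheck` twice a minute or so apart and comparing the RESTARTS column shows whether any pod is still crash-looping.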

Solution

See the solution/ directory for fixes for each broken pod.