kubectl Debugging Cheat Sheet
Dense command reference grouped by scenario. One-line explanations. Cleanup snippets included.
For deeper coverage:
training/interactive/exercises/(k8s track)
Pod Status & Lifecycle
kubectl get pods -n grokdevops # List pods with status
kubectl get pods -n grokdevops -o wide # Include node + IP
kubectl get pods -n grokdevops -w # Watch for changes
kubectl get pods -A # All namespaces
kubectl get pods -n grokdevops --sort-by=.status.startTime # Sort by start time
kubectl get pods -n grokdevops -l app.kubernetes.io/name=grokdevops # By label
Pod Logs
kubectl logs -n grokdevops deploy/grokdevops # Current logs (from one pod of the deployment)
kubectl logs -n grokdevops deploy/grokdevops --previous # Previous container (after crash)
kubectl logs -n grokdevops deploy/grokdevops -f # Follow/stream
kubectl logs -n grokdevops deploy/grokdevops --tail=100 # Last 100 lines
kubectl logs -n grokdevops deploy/grokdevops --since=5m # Last 5 minutes
kubectl logs -n grokdevops -l app.kubernetes.io/name=grokdevops --all-containers # All matching pods (last 10 lines each unless --tail is set)
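Because `deploy/grokdevops` resolves to a single pod, crash output from other replicas can be missed. The sketch below loops over every pod matching a label and prints its previous-container logs. The function name `crash_logs` and its argument order are hypothetical, not part of kubectl.

```shell
# Hypothetical helper: previous-container logs for every pod matching a label.
# Usage: crash_logs <namespace> <label-selector> [tail-lines]
crash_logs() {
  # -o name prints one "pod/<name>" per line
  for pod in $(kubectl get pods -n "$1" -l "$2" -o name); do
    echo "=== $pod ==="
    # --previous fails when the container has never restarted; report and continue
    kubectl logs -n "$1" "$pod" --previous --tail="${3:-50}" 2>/dev/null \
      || echo "(no previous container logs)"
  done
}
```

Example: `crash_logs grokdevops app.kubernetes.io/name=grokdevops 100`.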
Describe & Events
kubectl describe pod -n grokdevops -l app.kubernetes.io/name=grokdevops # Full pod detail
kubectl describe deployment grokdevops -n grokdevops # Deployment detail
kubectl get events -n grokdevops --sort-by='.lastTimestamp' # Recent events
kubectl get events -n grokdevops --field-selector reason=Failed # Failed events only
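kubectl get events -n grokdevops --field-selector reason=OOMKilling # OOM events, if any are recorded in this namespace; container kills also show as OOMKilled under Last State in describe pod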
Exec Into Pods
kubectl exec -it -n grokdevops deploy/grokdevops -- sh # Interactive shell into pod (-it attaches stdin and a TTY)
kubectl exec -n grokdevops deploy/grokdevops -- cat /etc/resolv.conf # Check DNS config
kubectl exec -n grokdevops deploy/grokdevops -- wget -qO- http://localhost:8000/health # Test endpoint
kubectl exec -n grokdevops deploy/grokdevops -- env # Check env vars
kubectl exec -n grokdevops deploy/grokdevops -- ps aux # Check processes
Debug Pods (Ephemeral)
# Run a throwaway debug pod in the namespace (deleted on exit)
kubectl run debug -n grokdevops --rm -it --restart=Never --image=busybox:1.36 -- sh
# DNS test
kubectl run dns-test -n grokdevops --rm -i --restart=Never --image=busybox:1.36 -- nslookup grokdevops
# Curl test
kubectl run curl-test -n grokdevops --rm -i --restart=Never --image=curlimages/curl -- curl -s http://grokdevops/health
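The commands above start separate pods. When the problem only reproduces inside an existing pod (for example, an image with no shell), `kubectl debug` can attach an ephemeral container to it instead. This is a minimal sketch: the function name `debug_attach` is hypothetical, and it assumes the cluster and container runtime support ephemeral containers and process-namespace targeting.

```shell
# Hypothetical helper: attach an ephemeral debug container to a running pod.
# Usage: debug_attach <namespace> <pod> <target-container> [image]
debug_attach() {
  # --target joins the process namespace of the named container,
  # so its processes are visible from the debug shell
  kubectl debug -it -n "$1" "$2" --image="${4:-busybox:1.36}" --target="$3" -- sh
}
```

Example: `debug_attach grokdevops <pod-name> <container-name>`. Ephemeral containers cannot be removed from a pod once added; they disappear when the pod is recreated.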
Deployments & Rollouts
kubectl get deploy -n grokdevops # List deployments
kubectl rollout status deployment/grokdevops -n grokdevops # Rollout status
kubectl rollout history deployment/grokdevops -n grokdevops # Revision history
kubectl rollout undo deployment/grokdevops -n grokdevops # Rollback to previous
kubectl rollout restart deployment/grokdevops -n grokdevops # Restart all pods
kubectl scale deployment grokdevops -n grokdevops --replicas=3 # Scale manually
# Undo scale: kubectl scale deployment grokdevops -n grokdevops --replicas=1
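`rollout status` and `rollout undo` combine naturally: wait for a rollout with a deadline, and roll back if it does not finish. The following is a sketch only. The function name `rollout_or_undo` and the 120s default are assumptions, not project conventions.

```shell
# Hypothetical helper: wait for a rollout, roll back if it does not complete in time.
# Usage: rollout_or_undo <namespace> <deployment> [timeout]
rollout_or_undo() {
  # rollout status exits non-zero when the timeout expires or the rollout fails
  if kubectl rollout status "deployment/$2" -n "$1" --timeout="${3:-120s}"; then
    echo "rollout of $2 completed"
  else
    echo "rollout of $2 did not complete; rolling back" >&2
    kubectl rollout undo "deployment/$2" -n "$1"
    return 1
  fi
}
```

Example: `rollout_or_undo grokdevops grokdevops 90s`. The non-zero return code on rollback lets a calling script stop early.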
Services & Endpoints
kubectl get svc -n grokdevops # List services
kubectl get endpoints -n grokdevops # Check endpoints (IPs)
kubectl describe svc grokdevops -n grokdevops # Service detail
kubectl port-forward -n grokdevops svc/grokdevops 8000:80 # Forward localhost:8000 to service port 80
# Stop: Ctrl+C
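A Service with no endpoints usually means its selector matches no ready pods. The sketch below turns the endpoints check into a pass/fail test; the function name `svc_has_endpoints` is hypothetical.

```shell
# Hypothetical helper: succeed only if the Service has at least one ready endpoint IP.
# Usage: svc_has_endpoints <namespace> <service>
svc_has_endpoints() {
  # Ready addresses live under .subsets[].addresses[]; not-ready ones are excluded
  ips=$(kubectl get endpoints "$2" -n "$1" -o jsonpath='{.subsets[*].addresses[*].ip}')
  if [ -z "$ips" ]; then
    echo "service $2 has no ready endpoints; compare its selector with the pod labels" >&2
    return 1
  fi
  echo "$ips"
}
```

Example: `svc_has_endpoints grokdevops grokdevops || kubectl describe svc grokdevops -n grokdevops`.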
Resource Usage (requires metrics-server)
kubectl top nodes # Node CPU/memory
kubectl top pods -n grokdevops # Pod CPU/memory
kubectl top pods -n grokdevops --sort-by=memory # Sort by memory
kubectl top pods -A # All namespaces
HPA (Horizontal Pod Autoscaler)
kubectl get hpa -n grokdevops # HPA status
kubectl describe hpa grokdevops -n grokdevops # HPA detail + events
kubectl get hpa -n grokdevops -w # Watch scaling
ConfigMaps & Secrets
kubectl get configmap -n grokdevops # List ConfigMaps
kubectl describe configmap <name> -n grokdevops # View ConfigMap
kubectl get secret -n grokdevops # List secrets
kubectl get secret <name> -n grokdevops -o jsonpath='{.data.<key>}' | base64 -d # Decode one key of a secret
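To decode every key of a secret at once, kubectl's go-template output offers a `base64decode` function. This is a sketch with a hypothetical function name, `secret_dump`. It prints secret values to the terminal, so use it only where that is acceptable.

```shell
# Hypothetical helper: print every key of a Secret as key=decoded-value.
# Usage: secret_dump <namespace> <secret>
secret_dump() {
  # .data maps key -> base64 string; base64decode is provided by kubectl's go-template output
  kubectl get secret "$2" -n "$1" \
    -o go-template='{{range $k, $v := .data}}{{$k}}={{$v | base64decode}}{{"\n"}}{{end}}'
}
```

Example: `secret_dump grokdevops <name>`.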
Helm
helm list -n grokdevops # List releases
helm history grokdevops -n grokdevops # Release history
helm status grokdevops -n grokdevops # Release status
helm get values grokdevops -n grokdevops # Current values
helm get manifest grokdevops -n grokdevops # Rendered manifests
helm template grokdevops devops/helm/grokdevops -f devops/helm/values-dev.yaml --debug # Debug template
helm rollback grokdevops 0 -n grokdevops # Rollback to previous
Observability Stack
# Prometheus
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090
# → http://localhost:9090/targets
# Grafana
kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80
# → http://localhost:3000 (admin/prom-operator)
# Loki
kubectl port-forward -n monitoring svc/loki 3100:3100
curl http://localhost:3100/ready
# Check all monitoring pods
kubectl get pods -n monitoring
# ServiceMonitor
kubectl get servicemonitor -n grokdevops -o yaml
Network Debugging
kubectl get networkpolicy -n grokdevops # List policies
kubectl describe networkpolicy -n grokdevops # Policy detail
kubectl get ingress -n grokdevops # List ingress rules
kubectl describe ingress -n grokdevops # Ingress detail
# Test connectivity from pod
kubectl exec -n grokdevops deploy/grokdevops -- wget -qO- --timeout=3 http://grokdevops/health
RBAC
kubectl auth can-i --list -n grokdevops # What can I do?
kubectl auth can-i get pods -n grokdevops # Specific check
kubectl auth can-i get pods --as=system:serviceaccount:grokdevops:default -n grokdevops # As SA
kubectl get roles,rolebindings -n grokdevops # Namespace RBAC
kubectl get clusterroles,clusterrolebindings | grep grokdevops # Cluster RBAC
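When a workload gets `Forbidden` errors, checking several verbs for its service account in one pass is often quicker than running `can-i` repeatedly. A minimal sketch, assuming a hypothetical function name `sa_can_i` and an arbitrary default verb list:

```shell
# Hypothetical helper: print a verb/answer table for a service account.
# Usage: sa_can_i <namespace> <serviceaccount> <resource> [verbs...]
sa_can_i() {
  ns="$1"; sa="$2"; resource="$3"; shift 3
  # Default verb list when none are given (an arbitrary choice for this sketch)
  [ "$#" -gt 0 ] || set -- get list watch create update delete
  for verb in "$@"; do
    # can-i prints yes/no and exits non-zero on "no", so the exit code is ignored here
    answer=$(kubectl auth can-i "$verb" "$resource" -n "$ns" \
      --as="system:serviceaccount:$ns:$sa" 2>/dev/null) || true
    printf '%s\t%s\n' "$verb" "${answer:-unknown}"
  done
}
```

Example: `sa_can_i grokdevops default pods get list delete`. Impersonating with `--as` requires that your own user is allowed to impersonate service accounts.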
JSON Path & Output Formatting
kubectl get deploy grokdevops -n grokdevops -o yaml # Full YAML
kubectl get deploy grokdevops -n grokdevops -o json # Full JSON
kubectl get pods -n grokdevops -o jsonpath='{.items[*].metadata.name}' # Pod names
kubectl get deploy grokdevops -n grokdevops -o jsonpath='{.spec.template.spec.containers[0].image}' # Image
kubectl get pods -n grokdevops -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}' # Name + phase table
kubectl get pods -n grokdevops -o custom-columns='NAME:.metadata.name,STATUS:.status.phase,RESTARTS:.status.containerStatuses[0].restartCount' # Custom columns
jq & yq Patterns
# jq: filter and transform JSON output
kubectl get pods -n grokdevops -o json | jq -r '.items[].metadata.name' # Pod names
kubectl get pods -n grokdevops -o json | jq '.items[] | select(.status.phase != "Running")' # Non-running pods
kubectl get pods -n grokdevops -o json | jq -r '.items[] | "\(.metadata.name)\t\(.status.phase)"' # Name + phase
kubectl get pods -n grokdevops -o json | jq '[.items[] | .status.containerStatuses[]? | select(.restartCount > 0)] | length' # Count containers with restarts
kubectl get events -n grokdevops -o json | jq -r '.items | sort_by(.lastTimestamp) | .[-5:] | .[].message' # Last 5 event messages
kubectl get deploy -n grokdevops -o json | jq -r '.items[] | "\(.metadata.name)\t\(.spec.replicas)\t\(.status.readyReplicas // 0)"' # Deploy replica status
# yq: filter and transform YAML output directly (no JSON round-trip)
kubectl get deploy grokdevops -n grokdevops -o yaml | yq '.spec.template.spec.containers[0].image' # Image
kubectl get deploy grokdevops -n grokdevops -o yaml | yq '.spec.template.spec.containers[0].resources' # Resource limits
kubectl get cm -n grokdevops -o yaml | yq '.items[].metadata.name' # ConfigMap names
kubectl get svc grokdevops -n grokdevops -o yaml | yq '.spec.ports' # Service ports
kubectl get deploy grokdevops -n grokdevops -o yaml | yq '.spec.template.spec.containers[0].env' # Env vars
# yq: edit manifests in-place (useful for patching files before apply)
yq -i '.spec.replicas = 3' deployment.yaml # Change replica count
yq -i '.metadata.labels.version = "v2"' deployment.yaml # Add/update label
yq -i 'del(.spec.template.spec.containers[0].resources.limits)' deployment.yaml # Remove limits
# Combine: diff two deployments
diff <(kubectl get deploy app-v1 -n grokdevops -o yaml | yq 'del(.metadata.managedFields)') \
<(kubectl get deploy app-v2 -n grokdevops -o yaml | yq 'del(.metadata.managedFields)')
Cleanup & Nuclear Options
# Redeploy app from Helm (resets all patches)
helm upgrade grokdevops devops/helm/grokdevops -n grokdevops -f devops/helm/values-dev.yaml
# Remove everything this repo deployed
make undeploy-all
# Delete stuck pods
kubectl delete pod <name> -n grokdevops --grace-period=0 --force
# Remove chaos artifacts
kubectl delete networkpolicy -n grokdevops -l chaos=true
kubectl delete pod -n grokdevops -l chaos