Solution: Lab Runtime 05 -- Helm Upgrade Rollback¶
SPOILER WARNING: Try to solve it yourself first.
Hint Ladder¶
Hint 1: After the bad upgrade, check `helm history grokdevops -n grokdevops`. What does the latest revision's status say?
Hint 2: Check `kubectl get pods -n grokdevops`. Are pods in `ImagePullBackOff`? That means the image tag doesn't exist.
Hint 3: Use `helm rollback` to revert to the last working revision.
Hint 4: `helm rollback grokdevops -n grokdevops` (omitting the revision number rolls back to the previous revision).
Minimal Solution¶
```shell
helm rollback grokdevops -n grokdevops
kubectl rollout status deployment/grokdevops -n grokdevops --timeout=120s
```
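To confirm the fix, the release history should now show the rollback as its own entry; a quick check (revision numbers will vary by lab state):

```shell
# The rollback appears as a NEW revision with a "Rollback to N" description;
# the failed revision stays in the history for auditing.
helm history grokdevops -n grokdevops

# Pods should return to Running with the previous, working image
kubectl get pods -n grokdevops
```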
Explain¶
Symptom: After `helm upgrade`, pods are stuck in `ImagePullBackOff` or `ErrImagePull`. The app is down.
Evidence: `helm history` shows the latest revision as failed, or as deployed with broken pods (without `--wait`, Helm marks a release deployed as soon as manifests are applied, before pods are healthy). `kubectl describe pod` shows "Failed to pull image" with a nonexistent tag.
Root cause: The Helm upgrade used a values file with a nonexistent image tag. Helm created a new revision and updated the Deployment, which tried to roll out pods with the bad image. Since the image doesn't exist in the registry, pods fail to start.
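One way to trace this from symptom back to the bad value; the pod name and revision numbers below are placeholders for what the lab actually shows:

```shell
# Find the failing pod and read the image-pull error from its events
kubectl get pods -n grokdevops
kubectl describe pod <pod-name> -n grokdevops | grep -i -A3 "failed"

# Compare the values of the broken revision against the last working one
# (substitute the revision numbers reported by `helm history`)
helm get values grokdevops -n grokdevops
helm get values grokdevops -n grokdevops --revision <last-good-revision>
```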
Key insight: Helm stores each revision as a Secret in the release namespace. `helm rollback` reads the target revision's manifests and re-applies them using a three-way merge between the last-applied manifest, the live cluster state, and the rollback target's manifest. The rollback creates a new revision (it doesn't delete the failed one).
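The revision Secrets can be inspected directly. This sketch assumes Helm 3's default Secret storage backend, where the release record is gzipped and base64-encoded inside the Secret's own base64 encoding:

```shell
# One Secret per revision, named sh.helm.release.v1.<release>.v<N>
kubectl get secrets -n grokdevops -l owner=helm

# Peek at a revision's payload: decode the Kubernetes base64 layer,
# then Helm's base64 layer, then gunzip the release record
kubectl get secret sh.helm.release.v1.grokdevops.v1 -n grokdevops \
  -o jsonpath='{.data.release}' | base64 -d | base64 -d | gunzip | head -c 400
```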
Prevent¶
- Use `helm upgrade --dry-run` before applying
- Validate image tags exist in the registry as a CI step
- Set the `--atomic` flag to auto-rollback on failure: `helm upgrade --atomic`
- Set `--timeout` to fail fast instead of waiting
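Putting those flags together, a safer upgrade invocation might look like this (the chart path and values file name are illustrative, not from the lab):

```shell
# --atomic  : waits for resources to become ready and rolls back automatically
#             if they don't (implies --wait)
# --timeout : fail fast instead of waiting for the 5m default
helm upgrade grokdevops ./chart -n grokdevops \
  -f values.yaml --atomic --timeout 2m
```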