
Solution: Lab Runtime 05 -- Helm Upgrade Rollback

SPOILER WARNING: Try to solve it yourself first.


Hint Ladder

Hint 1: After the bad upgrade, check helm history grokdevops -n grokdevops. What does the latest revision's status say?

Hint 2: Check kubectl get pods -n grokdevops. Are pods in ImagePullBackOff? That means the image tag doesn't exist.

Hint 3: Use helm rollback to revert to the last working revision.

Hint 4: helm rollback grokdevops -n grokdevops (omitting the revision number rolls back to the previous revision).


Minimal Solution

helm rollback grokdevops -n grokdevops
kubectl rollout status deployment/grokdevops -n grokdevops --timeout=120s
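To confirm the fix, the release history should now show one more revision with status deployed, and the pods should be back to Running. A quick check, using the lab's release and namespace names:

```shell
# After rollback, Helm appends a NEW revision (described as "Rollback to <n>")
# rather than deleting the failed one
helm history grokdevops -n grokdevops

# All pods should return to Running with the restored image tag
kubectl get pods -n grokdevops
```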

Explain

Symptom: After helm upgrade, pods are in ImagePullBackOff or ErrImagePull. The app is down.

Evidence: helm history shows the latest revision as failed, or as deployed even though its pods are broken. kubectl describe pod shows "Failed to pull image" with the nonexistent tag.
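The evidence above can be gathered in three commands (the grep pattern is just one way to surface the pull error from the event stream):

```shell
# Triage: which revision is live, and why are pods failing?
helm history grokdevops -n grokdevops      # latest revision + its status
kubectl get pods -n grokdevops             # ImagePullBackOff / ErrImagePull

# Pull the exact error (including the bad tag) out of the pod events
kubectl describe pod -n grokdevops | grep -i "failed to pull"
```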

Root cause: The Helm upgrade used a values file with a nonexistent image tag. Helm created a new revision and updated the Deployment, which tried to roll out pods with the bad image. Since the image doesn't exist in the registry, pods fail to start.

Key insight: Helm stores each revision as a Secret in the namespace. helm rollback reads the target revision's manifests and re-applies them using a 3-way merge between the previous revision's manifests, the live cluster state, and the desired (rolled-back) state. The rollback creates a new revision (it doesn't delete the failed one).
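You can inspect those stored revisions directly. The Secret naming scheme and the gzip-plus-base64 encoding below reflect Helm 3's storage format (the second base64 layer is the Secret's own data encoding):

```shell
# Each revision is a Secret named sh.helm.release.v1.<release>.v<revision>
kubectl get secrets -n grokdevops -l owner=helm

# The "release" field is gzip-compressed and base64-encoded twice over,
# so a stored revision decodes like this:
kubectl get secret sh.helm.release.v1.grokdevops.v1 -n grokdevops \
  -o jsonpath='{.data.release}' | base64 -d | base64 -d | gunzip
```

This is why deleting Helm's Secrets destroys rollback history: the revisions live nowhere else.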


Prevent

  • Run helm upgrade --dry-run before applying to catch template and values errors
  • Validate that image tags exist in the registry as a CI step
  • Use the --atomic flag to auto-rollback on failure: helm upgrade --atomic
  • Set --timeout to bound how long Helm waits before declaring the upgrade failed
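Put together, a safer upgrade sequence might look like the sketch below (the chart path, values file, registry, and TAG variable are placeholders for the lab's actual values):

```shell
# 1. CI gate: fail fast if the tag doesn't exist in the registry
docker manifest inspect "registry.example.com/grokdevops:${TAG}" > /dev/null

# 2. Render without applying, to catch template/values errors early
helm upgrade grokdevops ./chart -n grokdevops -f values.yaml --dry-run

# 3. Apply with auto-rollback on failure and a bounded wait
helm upgrade grokdevops ./chart -n grokdevops -f values.yaml \
  --atomic --timeout 2m
```

Note that --atomic implies --wait, so Helm watches the rollout and reverts automatically if the pods never become Ready within --timeout.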

See Also