Portal | Level: L1: Foundations | Topics: Helm | Domain: DevOps & Tooling

Runbook: Helm Upgrade Failed

Symptoms

  • helm upgrade returns error or times out
  • Release stuck in pending-upgrade or failed state
  • New pods not rolling out

Fast Triage

helm list -n grokdevops
helm history grokdevops -n grokdevops
helm status grokdevops -n grokdevops
kubectl get pods -n grokdevops
kubectl get events -n grokdevops --sort-by='.lastTimestamp' | tail -20

Likely Causes (ranked)

  1. Bad values — invalid YAML, wrong image tag, missing required field
  2. Template rendering error — Helm template syntax issue
  3. Resource conflict — CRD not installed (e.g., ServiceMonitor without prometheus-operator)
  4. Timeout — pods didn't become ready in time (probe, resource, image issue)
  5. Stuck release — previous failed upgrade left release in bad state
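
Cause 3 can be ruled in or out before touching the release. The sketch below is one way to do that, assuming the chart renders a ServiceMonitor and that the CRD carries the usual prometheus-operator name servicemonitors.monitoring.coreos.com. The helper name require_crd is hypothetical, not part of any tool.

```shell
# Report whether a CRD is installed in the cluster.
# require_crd is an illustrative helper name, not a kubectl or Helm command.
require_crd() {
  if kubectl get crd "$1" > /dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Usage (assumed CRD name for ServiceMonitor):
#   require_crd servicemonitors.monitoring.coreos.com
```

If the CRD is missing, install the operator that provides it (or disable that template in values) before retrying the upgrade.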

Evidence Interpretation

What bad looks like:

$ helm history grokdevops -n grokdevops
REVISION  STATUS          DESCRIPTION
1         deployed        Install complete
2         failed          Upgrade "grokdevops" failed: timed out waiting for the condition
$ helm list -n grokdevops
NAME        STATUS          REVISION
grokdevops  pending-upgrade 3
  • failed means the upgrade was attempted but did not complete, most often because resources did not become healthy before the timeout (or a resource was rejected while being applied).
  • pending-upgrade means Helm started the upgrade but never finished. The release is locked, and further upgrades are rejected with "another operation (install/upgrade/rollback) is in progress" until you roll back.
  • Check helm status for the error message and kubectl get events for what went wrong at the Kubernetes level.
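
To pick a rollback target from the history, a small filter along these lines may help. It is a sketch that assumes the plain-text table printed by helm history and treats deployed and superseded revisions as candidates; last_good_revision is a hypothetical helper name. Confirm the result by eye before rolling back.

```shell
# Print the highest revision whose STATUS column is deployed or superseded.
# Reads the plain-text table from `helm history` on stdin.
# Exits non-zero if no such revision is found.
last_good_revision() {
  awk 'NR > 1 {
         for (i = 2; i <= NF; i++) {
           # First status word on the line decides; stop scanning after it.
           if ($i ~ /^(deployed|superseded)$/) { rev = $1; break }
           if ($i ~ /^(failed|pending-install|pending-upgrade|pending-rollback|uninstalling|uninstalled|unknown)$/) break
         }
       }
       END { if (rev != "") print rev; else exit 1 }'
}

# Usage:
#   helm history grokdevops -n grokdevops | last_good_revision
```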

Fix Steps

  1. If bad values, test template rendering:
    helm template grokdevops devops/helm/grokdevops -f devops/helm/values-dev.yaml --debug
    
  2. Roll back to the last working revision:
    helm rollback grokdevops 0 -n grokdevops  # 0 = the revision just before the current one; pass an explicit number if that one also failed
    
  3. If release is stuck in pending-upgrade:
    helm rollback grokdevops <last-good-revision> -n grokdevops
    
  4. Fix values and retry:
    helm upgrade grokdevops devops/helm/grokdevops -n grokdevops -f devops/helm/values-dev.yaml
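
Steps 1 and 4 can be chained so that an upgrade never starts with values that fail to render. The following is a minimal sketch assuming Helm 3; the function name safe_upgrade and the 5m timeout are illustrative choices, not part of this runbook's tooling.

```shell
# Render the chart first; upgrade only if rendering succeeds.
# Arguments: release, chart path, namespace, values file.
safe_upgrade() {
  helm template "$1" "$2" -n "$3" -f "$4" > /dev/null || {
    echo "render failed; not upgrading" >&2
    return 1
  }
  # --atomic rolls the release back automatically if the upgrade fails.
  # The 5m timeout is an assumed value; tune it to your pods' startup time.
  helm upgrade "$1" "$2" -n "$3" -f "$4" --atomic --timeout 5m
}

# Usage:
#   safe_upgrade grokdevops devops/helm/grokdevops grokdevops devops/helm/values-dev.yaml
```

Rendering locally catches template and YAML errors only. A wrong image tag still renders cleanly, which is why the upgrade itself runs with --atomic.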
    

Verification

helm status grokdevops -n grokdevops  # STATUS: deployed
kubectl rollout status deployment/grokdevops -n grokdevops
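
The two checks above can be combined into one pass/fail step, for example in a post-deploy script. This sketch assumes the Deployment has the same name as the release (as it does here) and uses a 120s rollout timeout as a placeholder; verify_release is a hypothetical helper name.

```shell
# Succeed only if the release is deployed and the rollout has finished.
# Arguments: release name (also used as the Deployment name), namespace.
verify_release() {
  helm status "$1" -n "$2" | grep -q '^STATUS: deployed$' || {
    echo "release $1 is not in deployed state" >&2
    return 1
  }
  # Assumed timeout; kubectl exits non-zero if the rollout is not done in time.
  kubectl rollout status "deployment/$1" -n "$2" --timeout=120s
}

# Usage:
#   verify_release grokdevops grokdevops
```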

Cleanup

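Review the history for leftover failed revisions. Helm prunes old revisions on later upgrades according to --history-max (default 10), so manual deletion is rarely needed:

helm history grokdevops -n grokdevops  # review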

Unknown Unknowns

  • Helm stores each release revision as a Secret in the namespace (type helm.sh/release.v1). You can inspect them with kubectl get secrets -l owner=helm.
  • helm rollback does not delete the failed revision — it creates a new revision with the old config. Revision numbers only go up.
  • The --atomic flag on helm upgrade auto-rolls back on failure, preventing stuck pending-upgrade states.
  • Helm uses a 3-way merge (old manifest, new manifest, live state). If someone edited a resource with kubectl edit, the merge can produce surprises.
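
The revision Secrets mentioned above follow a predictable naming and labeling scheme, which makes them easy to inspect. The sketch below assumes the default Secret storage driver; both helper names are hypothetical.

```shell
# Build the name of the Secret that stores one revision of a release.
release_secret_name() {
  printf 'sh.helm.release.v1.%s.v%s\n' "$1" "$2"
}

# List the revision Secrets for one release with their status and version labels.
# Arguments: release name, namespace.
list_release_secrets() {
  kubectl get secrets -n "$2" -l "owner=helm,name=$1" -L status,version
}

# Usage:
#   list_release_secrets grokdevops grokdevops
#   kubectl get secret "$(release_secret_name grokdevops 3)" -n grokdevops
```

Treat these Secrets as read-only during an incident. Editing or deleting them changes what Helm believes the release state to be.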

Warning: Never run helm uninstall to "fix" a failed upgrade. Uninstalling deletes all managed resources (Deployments, Services, PVCs if unprotected). Use helm rollback instead; it creates a new revision with the last known-good config without destroying anything.

Pitfalls

  • Running upgrade again without fixing values: the same bad config will fail again and add another failed revision.
  • Deleting the release instead of rolling back: helm uninstall removes all managed resources (including PVCs if not protected). Use helm rollback instead.
  • Not using --dry-run first: helm upgrade --dry-run catches template errors and many invalid values before touching the cluster. It does not catch runtime problems such as a wrong image tag.

See Also

  • training/library/guides/troubleshooting.md (Helm section)
  • training/interactive/runtime-labs/lab-runtime-05-helm-upgrade-rollback/
  • training/interview-scenarios/05-helm-upgrade-broke-prod.md
  • training/interactive/incidents/scenarios/helm-upgrade-bad-values.sh
