Helm Ops¶
12 cards — 🟢 4 easy | 🟡 4 medium | 🔴 4 hard
🟢 Easy (4)¶
1. You changed a template but helm install gives a YAML parse error referencing a line number that does not match your template. Why?
Show answer
The error line number refers to the rendered output, not the template source. Use helm template --debug to see the fully rendered YAML with line numbers and locate the actual breakage (usually a wrong nindent value or unquoted value injection).

2. You run --set replicaCount=true intending the string "true" but the template receives a boolean. How do you force a string?
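The rendered-output debugging from the first easy card can be sketched as follows (chart path ./mychart and release name myapp are assumed):

```shell
# Render the chart locally; the parse error's line number maps to this
# combined output, not to any single template file
helm template myapp ./mychart --debug | cat -n
```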
Show answer
Use --set-string replicaCount=true instead of --set. Helm's --set infers YAML types automatically (true becomes boolean, 123 becomes integer). The --set-string flag forces the value to remain a string regardless of content.

3. Your CI/CD pipeline runs helm install on every deploy and fails on the second run. What single command fixes this?
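The --set type inference from the previous answer, side by side (release and chart names assumed):

```shell
helm install myapp ./mychart --set replicaCount=true         # parsed as boolean true
helm install myapp ./mychart --set-string replicaCount=true  # stays the string "true"
# Same for numerics: --set-string imageTag=1.30 keeps "1.30" intact
```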
Show answer
Use helm upgrade --install (the idempotent form). It installs the release if it does not exist, or upgrades it if it does. Combine with --atomic and --timeout for safe CI/CD deploys.

4. How do you retrieve the exact Kubernetes manifests that Helm applied for a specific release revision?
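A typical CI/CD invocation combining the flags from the previous answer (release, chart, and namespace names assumed):

```shell
# Install if absent, upgrade if present; roll back automatically on failure
helm upgrade --install myapp ./mychart \
  --namespace prod --create-namespace \
  --atomic --timeout 5m
```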
Show answer
Run helm get manifest <release> --revision <n>. This prints the exact rendered manifests Helm stored for that revision in its release secret, i.e., what was actually submitted to the cluster.

🟡 Medium (4)¶
1. Your pre-upgrade hook Job fails on retry because the previous Job resource still exists. What annotation fixes this?
Show answer
Add helm.sh/hook-delete-policy: before-hook-creation to the Job metadata. This tells Helm to delete the previous hook resource before creating a new one, preventing name-collision failures on retry.

2. Explain the difference between --wait and --atomic on helm upgrade. When does --atomic add value over --wait alone?
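The hook-delete-policy fix from the first medium card, written out as a minimal hypothetical template file (Job name and image are placeholders):

```shell
# Write a pre-upgrade hook Job whose previous instance is deleted before retry
mkdir -p templates
cat <<'EOF' > templates/pre-upgrade-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-pre-upgrade
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: task
          image: busybox
          command: ["sh", "-c", "echo pre-upgrade task"]
EOF
```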
Show answer
--wait makes Helm wait for resources to become ready before marking success, but a failed deploy stays in the failed state. --atomic implies --wait and additionally auto-rolls back to the previous revision on failure or timeout. --atomic prevents the cluster from being left in a half-upgraded state.

3. An SRE manually scaled a Deployment to 5 replicas via kubectl. The Helm chart says 3 replicas. On next helm upgrade (chart unchanged), what happens to the replica count?
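The --wait versus --atomic distinction from the previous answer, as commands (names assumed):

```shell
helm upgrade myapp ./mychart --wait --timeout 5m    # failure leaves release in failed state
helm upgrade myapp ./mychart --atomic --timeout 5m  # failure auto-rolls back to previous revision
```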
Show answer
It stays at 5. Helm 3 uses a three-way merge comparing old manifest, new manifest, and live state. Since the old and new chart manifests both say 3 (no change in that field), Helm does not patch it. But if the chart changes replicas to 4, Helm patches it to 4, overwriting the manual edit.

4. A chart hardcodes namespace: monitoring in a ServiceMonitor template instead of using .Release.Namespace. What operational problem does this cause?
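The three-way-merge behavior described in the previous answer, played out as commands (deployment and value names assumed):

```shell
kubectl scale deployment/myapp --replicas=5         # manual out-of-band change
helm upgrade myapp ./mychart                        # chart field unchanged (3): live 5 is kept
helm upgrade myapp ./mychart --set replicaCount=4   # chart field changed: patched to 4
```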
Show answer
Helm tracks resources in the release namespace, but the ServiceMonitor is created in the monitoring namespace. When you run helm uninstall, Helm does not delete the ServiceMonitor because it only cleans up resources tracked in its release secret. The resource becomes orphaned. Always use {{ .Release.Namespace }} in templates.

🔴 Hard (4)¶
1. A release is stuck in pending-upgrade after a deploy crashed. helm rollback also fails. What is the recovery procedure?
Show answer
Check helm history <release> to find the stuck pending-upgrade revision. Since helm rollback also fails, delete the Helm release secret for that pending revision (sh.helm.release.v1.<release>.v<N> in the release namespace) with kubectl; Helm then treats the last successfully deployed revision as current, and a normal helm upgrade can be retried.

2. You have a pre-upgrade hook Job running database migrations that is not idempotent. The Job fails partway through, and the SRE retries the upgrade. What goes wrong and how should the hook be redesigned?
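A recovery sketch for the stuck pending-upgrade card above (release myapp, namespace prod, and stuck revision 7 are all assumed values):

```shell
helm history myapp -n prod                                  # identify the pending-upgrade revision
kubectl -n prod delete secret sh.helm.release.v1.myapp.v7   # remove the stuck revision's record
helm upgrade --install myapp ./mychart -n prod              # retry the deploy
```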
Show answer
The migration runs again from the start, potentially re-applying already-completed steps and corrupting data. Redesign: make migrations idempotent (use IF NOT EXISTS, migration versioning tables), set backoffLimit: 0 or 1 to prevent Kubernetes-level retries of the broken Job, add hook-delete-policy: before-hook-creation for clean retry, and set a hook-weight if ordering among multiple hooks matters.

3. How does the helm-diff plugin help catch problems before a helm upgrade, and what is its key limitation regarding out-of-band changes?
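The migration-hook redesign described in the previous answer, as a sketch of a template file (the image name and migration command are hypothetical; the migration tool itself must still be idempotent):

```shell
# Pre-upgrade migration hook: no k8s retries, old Job deleted before re-run
mkdir -p templates
cat <<'EOF' > templates/migrate-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-migrate
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-weight": "0"
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  backoffLimit: 0            # no Kubernetes-level retries of a broken run
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: myapp-migrations:latest   # assumed image
          # tool uses a migration-versioning table, so re-runs are safe
          command: ["./migrate", "up"]
EOF
```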
Show answer
helm diff upgrade <release> <chart> renders the proposed upgrade and diffs it against the manifest stored in the current release, so you see exactly which fields the upgrade would change before applying it. Key limitation: by default it compares against Helm's stored manifest, not live cluster state, so out-of-band kubectl edits do not appear in the diff.

4. Your chart has a dependency on redis version 17.x. A colleague runs helm dependency update and the Chart.lock changes from 17.3.2 to 17.5.0. The deploy fails in staging. How should you manage dependency versions to prevent this?

Show answer

Commit Chart.lock to version control and fetch dependencies with helm dependency build, which installs exactly the versions pinned in the lock file. helm dependency update re-resolves version ranges and rewrites Chart.lock, so reserve it for deliberate, reviewed upgrades. For stricter control, pin an exact version in Chart.yaml (version: 17.3.2) instead of a range like 17.x.
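The last two hard cards as commands (release and chart names assumed; the helm-diff plugin must be installed first):

```shell
# Preview the change set before upgrading
# (one-time setup: helm plugin install https://github.com/databus23/helm-diff)
helm diff upgrade myapp ./mychart

# Rebuild charts/ exactly from Chart.lock; does not re-resolve version ranges
helm dependency build ./mychart
```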