Solution
Triage
- Check pod events:
- Verify the imagePullSecrets reference in the pod spec:
- Check if the referenced secret exists:
- Decode and inspect the secret's credentials:
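The four checks above can be sketched as one shell helper. The function name `triage_image_pull` is hypothetical; the defaults `prod` and `registry-creds` come from this incident. The sketch assumes the secret is of type `kubernetes.io/dockerconfigjson`, so the credentials sit under the `.dockerconfigjson` key.

```shell
# Hypothetical helper bundling the four triage checks.
# Usage: triage_image_pull <pod> [namespace] [secret]
triage_image_pull() {
  pod="$1"
  ns="${2:-prod}"
  secret="${3:-registry-creds}"

  # 1. Pod events: look for "Failed to pull image" and "unauthorized".
  kubectl describe pod "$pod" -n "$ns"

  # 2. Names of the pull secrets referenced by the pod spec.
  kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.imagePullSecrets[*].name}'
  echo

  # 3. Confirm the referenced secret exists in the same namespace.
  kubectl get secret "$secret" -n "$ns"

  # 4. Decode the stored Docker config (assumes type kubernetes.io/dockerconfigjson).
  kubectl get secret "$secret" -n "$ns" -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d
  echo
}
```

In the decoded JSON, the `auth` value is itself base64 of `username:password`, so a second `base64 -d` on that field shows which credentials the cluster is actually presenting.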
Root Cause
The ops team rotated the private registry credentials as part of the quarterly security rotation, but the Kubernetes secret `registry-creds` in the `prod` namespace still contains the old credentials. When new pods attempt to pull the image, the registry rejects the old token with `unauthorized: authentication required`.
Existing pods continue running because the image was already pulled and cached on the node. Only new pods (or pods scheduled to nodes without the cached image) fail.
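One way to see this split between healthy and failing pods is to list each pod next to its node and its container waiting reason. This is a minimal sketch; the function name `pods_by_node` is hypothetical and the default namespace `prod` is taken from this incident.

```shell
# Hypothetical helper: list each pod with its node and container waiting reason.
# Running pods normally show <none> in the WAITING column; pods that cannot
# pull the image show ErrImagePull or ImagePullBackOff.
pods_by_node() {
  ns="${1:-prod}"
  kubectl get pods -n "$ns" \
    -o custom-columns='POD:.metadata.name,NODE:.spec.nodeName,WAITING:.status.containerStatuses[*].state.waiting.reason'
}
```

If the failures cluster on a few nodes, those are likely the nodes that never cached the image.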
Fix
- Update the secret with new credentials:
- Alternatively, update in place without deleting:
- Delete the failing pods to trigger fresh pulls:
- Verify new pods are running:
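The steps above can be sketched as the shell functions below. The registry host `registry.example.com`, the user `deploy-bot`, and all function names are hypothetical placeholders; `prod` and `registry-creds` come from this incident. The sketch assumes the failing pods are owned by a controller such as a Deployment, so deleted pods are recreated automatically.

```shell
# Placeholders (hypothetical values): adjust for your registry.
NAMESPACE="prod"
SECRET_NAME="registry-creds"
REGISTRY_SERVER="registry.example.com"
REGISTRY_USER="deploy-bot"
# REGISTRY_PASSWORD is expected in the environment; do not hard-code it.

# Shared "kubectl create secret" call; extra flags are passed through.
create_secret_cmd() {
  kubectl create secret docker-registry "$SECRET_NAME" -n "$NAMESPACE" \
    --docker-server="$REGISTRY_SERVER" \
    --docker-username="$REGISTRY_USER" \
    --docker-password="$REGISTRY_PASSWORD" "$@"
}

# Step 1: delete and recreate the secret with the new credentials.
recreate_secret() {
  kubectl delete secret "$SECRET_NAME" -n "$NAMESPACE" --ignore-not-found
  create_secret_cmd
}

# Step 2 (alternative): render the new secret client-side and apply it in place.
update_secret_in_place() {
  create_secret_cmd --dry-run=client -o yaml | kubectl apply -f -
}

# Step 3: delete only the pods stuck on image pulls.
delete_failing_pods() {
  kubectl get pods -n "$NAMESPACE" --no-headers |
    awk '$3 ~ /ImagePullBackOff|ErrImagePull/ { print $1 }' |
    while read -r pod; do
      kubectl delete pod "$pod" -n "$NAMESPACE"
    done
}

# Step 4: list the pods and confirm the replacements reach Running.
verify_pods() {
  kubectl get pods -n "$NAMESPACE"
}
```

Run either `recreate_secret` or `update_secret_in_place`, not both, then `delete_failing_pods` and `verify_pods`. The in-place variant avoids the short window in which the secret does not exist.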
Rollback / Safety
- If the new credentials are also wrong, the pods will continue in ImagePullBackOff. Test credentials locally first.
- The secret update does not affect running pods; they do not re-pull images unless restarted.
- If the secret is managed by Helm or a GitOps tool, update the source of truth, not just the live object.
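Two of these precautions can be sketched in shell: testing the rotated credentials against the registry before touching the cluster, and inspecting the secret's metadata for signs of an owning tool. The function names are hypothetical, and the sketch assumes a local Docker CLI and the password in a `REGISTRY_PASSWORD` environment variable.

```shell
# Hypothetical helper: check the rotated credentials against the registry.
# The password is piped via stdin so it does not appear in the process list.
# Usage: test_registry_login <registry-server> <username>
test_registry_login() {
  server="$1"
  user="$2"
  printf '%s' "$REGISTRY_PASSWORD" |
    docker login "$server" --username "$user" --password-stdin
}

# Hypothetical helper: print the secret's labels and annotations.
# Helm-managed objects typically carry app.kubernetes.io/managed-by=Helm.
# Usage: show_secret_owner [secret] [namespace]
show_secret_owner() {
  kubectl get secret "${1:-registry-creds}" -n "${2:-prod}" \
    -o jsonpath='{.metadata.labels}{"\n"}{.metadata.annotations}{"\n"}'
}
```

If `show_secret_owner` reveals a Helm or GitOps marker, a manual edit to the live secret is likely to be overwritten on the next sync.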
Common Traps
- Updating the secret in one namespace but not others. If multiple namespaces pull from the same registry, all their secrets need updating.
- Forgetting service account imagePullSecrets. If the secret is attached to a service account rather than the pod spec, check `kubectl get sa default -n prod -o yaml`.
- Using `imagePullPolicy: IfNotPresent` masks the issue. Pods scheduled to nodes with cached images will work; pods on fresh nodes will fail. The problem appears intermittent.
- Not automating secret rotation. Use external-secrets-operator to sync credentials from a vault automatically, or sealed-secrets to keep the encrypted secret under version control.
- Registry token expiry vs. password rotation. Some registries issue time-limited tokens. Check if the secret contains a token with an expiry rather than a static password.
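The first two traps lend themselves to a quick audit, sketched below with hypothetical function names. The field selector matches only secrets of type `kubernetes.io/dockerconfigjson`; secrets of the legacy `kubernetes.io/dockercfg` type would need a separate query.

```shell
# List every registry pull secret in the cluster, across all namespaces,
# so none is missed during rotation.
list_pull_secrets() {
  kubectl get secrets --all-namespaces \
    --field-selector type=kubernetes.io/dockerconfigjson
}

# Show pull secrets attached to a service account instead of the pod spec.
# Usage: sa_pull_secrets [service-account] [namespace]
sa_pull_secrets() {
  kubectl get sa "${1:-default}" -n "${2:-prod}" \
    -o jsonpath='{.imagePullSecrets[*].name}'
  echo
}
```

Each namespace listed by `list_pull_secrets` that pulls from the rotated registry needs the same secret update as `prod`.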