Solution

Triage

  1. Check pod events (the wider grep window keeps the later Failed and BackOff events from being cut off):
    kubectl describe pod -n prod -l app=checkout-service | grep -A 15 Events
    
  2. Verify the imagePullSecrets reference in the pod spec:
    kubectl get pod <pod-name> -n prod -o jsonpath='{.spec.imagePullSecrets}'
    
  3. Check if the referenced secret exists:
    kubectl get secret registry-creds -n prod
    
  4. Decode and inspect the secret's credentials (this prints the password in clear text, so avoid shared terminals and logged sessions):
    kubectl get secret registry-creds -n prod -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | jq .
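
The decoded JSON from step 4 normally holds an auths map keyed by registry host, and each entry carries an auth field that is the base64 encoding of username:password. The following is a minimal sketch for confirming which account the secret holds without printing the password. The helper name auth_user is hypothetical, and the usage comment assumes the entry is keyed exactly as registry.company.io.

```shell
# Hypothetical helper: print only the username half of a dockerconfigjson
# "auth" value (base64 of "username:password"), keeping the password off-screen.
auth_user() {
  printf '%s' "$1" | base64 -d | cut -d: -f1
}

# Usage (assumes jq is installed, as in step 4):
#   auth=$(kubectl get secret registry-creds -n prod \
#            -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d \
#          | jq -r '.auths["registry.company.io"].auth')
#   auth_user "$auth"
```

If the printed username is not the rotated service account, or the auths key does not match the registry host in the image reference, the secret is a likely culprit.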
    

Root Cause

The ops team rotated the private registry credentials as part of the quarterly security rotation, but the Kubernetes secret registry-creds in the prod namespace still contains the old ones. When a new pod attempts to pull the image, the registry rejects the stale credentials with unauthorized: authentication required.

Existing pods continue running because the image was already pulled and cached on the node. Only pods that trigger a new pull fail: typically new or rescheduled pods that land on a node without the cached image, or any new pod when imagePullPolicy is Always.
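
To see which nodes are likely to hide the problem, the node status can be searched for the cached image. This is a rough sketch: the helper name nodes_with_image and the image repository in the usage comment are hypothetical, and the kubelet reports only a capped number of images per node, so a missing entry is not conclusive.

```shell
# Hypothetical helper: list nodes whose reported image cache mentions
# the given image reference (fixed-string match, not a regex).
nodes_with_image() {
  # $1: image repository or full reference to search for
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.images[*].names[*]}{"\n"}{end}' \
    | grep -F -- "$1" | cut -f1
}

# Usage (the image path is an assumption, not taken from the incident):
#   nodes_with_image registry.company.io/checkout-service
```

Nodes missing from the output are the ones where a newly scheduled pod would have to pull, and therefore fail, while the secret is stale.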

Fix

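  1. Recreate the secret with the new credentials. Any pull that starts between the delete and the create fails because the secret is missing, so run both commands back to back: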

    kubectl delete secret registry-creds -n prod
    kubectl create secret docker-registry registry-creds \
      --docker-server=registry.company.io \
      --docker-username=svc-k8s-pull \
      --docker-password='<new-rotated-password>' \
      --docker-email=devops@company.io \
      -n prod
    

  2. Alternatively, update the secret in place, which avoids any window in which it does not exist:

    kubectl create secret docker-registry registry-creds \
      --docker-server=registry.company.io \
      --docker-username=svc-k8s-pull \
      --docker-password='<new-rotated-password>' \
      -n prod --dry-run=client -o yaml | kubectl apply -f -
    

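  3. Delete the failing pods to trigger fresh pulls. The field selector limits the deletion to Pending pods (the phase of a pod stuck in ImagePullBackOff), so healthy replicas keep serving:

    kubectl delete pods -n prod -l app=checkout-service --field-selector=status.phase=Pending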
    

  4. Verify new pods are running:

    kubectl get pods -n prod -l app=checkout-service -w
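
For scripted runs, a non-interactive alternative to watching the pod list is kubectl wait. This is a minimal sketch; the helper name wait_for_checkout and the 120s default timeout are arbitrary choices, not part of the original procedure.

```shell
# Hypothetical helper: block until every matching pod reports Ready,
# or return non-zero once the timeout expires.
wait_for_checkout() {
  # $1: optional timeout, e.g. 30s or 5m (defaults to 120s)
  kubectl wait pod -n prod -l app=checkout-service \
    --for=condition=Ready --timeout="${1:-120s}"
}

# Usage:
#   wait_for_checkout 5m && echo "checkout-service pods are Ready"
```

The wait covers every pod matching the label at the time it starts, so a leftover pod that is still failing makes the command time out.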
    

Rollback / Safety

  • If the new credentials are also wrong, the pods stay in ImagePullBackOff. Test the credentials locally before updating the secret.
  • The secret update does not affect running pods; they do not re-pull images unless restarted.
  • If the secret is managed by Helm or a GitOps tool, update the source of truth, not just the live object. Otherwise the next release or sync can overwrite the live secret with the old credentials.
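
One way to test the credentials locally is a docker login against the registry from a workstation. This is a sketch assuming Docker is installed there; the helper name check_registry_login and the NEW_PASSWORD variable are hypothetical.

```shell
# Hypothetical helper: try the rotated credentials against the registry
# before touching the cluster. The password is read from stdin so it does
# not appear in shell history or the process list.
check_registry_login() {
  # $1: registry host, $2: username; password arrives on stdin
  docker login "$1" --username "$2" --password-stdin
}

# Usage (NEW_PASSWORD is assumed to be set in the environment):
#   printf '%s' "$NEW_PASSWORD" | check_registry_login registry.company.io svc-k8s-pull
```

A successful login stores the credentials in the local Docker config, so running docker logout registry.company.io afterwards keeps them off the workstation.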

Common Traps

  • Updating the secret in one namespace but not others. If multiple namespaces pull from the same registry, all their secrets need updating.
  • Forgetting service account imagePullSecrets. If the secret is attached to a service account rather than the pod spec, inspect it with kubectl get sa default -n prod -o yaml (substitute the service account the pod actually uses if it is not default).
  • Using imagePullPolicy: IfNotPresent masks the issue. Pods scheduled to nodes with cached images will work; pods on fresh nodes will fail. The problem appears intermittent.
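  • Not automating secret rotation. Use external-secrets-operator to sync credentials from a vault automatically. sealed-secrets does not sync from a vault; it encrypts the secret so it can be stored in Git, which at least makes each rotation a single reviewed change.
  • Registry token expiry vs. password rotation. Some registries issue time-limited tokens (Amazon ECR authorization tokens, for example, are valid for 12 hours). Check whether the secret contains a token with an expiry rather than a static password.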
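
The first two traps can be checked quickly from the command line. The sketch below assumes the other namespaces use the same secret name; the helper names namespaces_with_secret and sa_pull_secrets are hypothetical.

```shell
# Hypothetical helper: list every namespace that holds a secret with the
# given name, so none is missed when rotating.
namespaces_with_secret() {
  # $1: secret name
  kubectl get secrets --all-namespaces \
    --field-selector "metadata.name=$1" \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"\n"}{end}'
}

# Hypothetical helper: print the imagePullSecrets attached to a service account.
sa_pull_secrets() {
  # $1: namespace, $2: service account name (defaults to "default")
  kubectl get sa "${2:-default}" -n "$1" -o jsonpath='{.imagePullSecrets[*].name}'
}

# Usage:
#   namespaces_with_secret registry-creds
#   sa_pull_secrets prod
```

Listing secrets across all namespaces needs cluster-wide read access to secrets, which not every operator account has.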