Remediation: Deployment Stuck, ImagePull Auth Failure, Fix Is Vault Secret Rotation
Immediate Fix (DevOps Tooling, Domain C)
The fix restores the Vault policy, forces the ExternalSecret to resync so that fresh registry credentials are written to the cluster, and then restarts the rollout.
Step 1: Restore the Vault policy
$ vault policy write eso-order-service - <<'EOF'
path "registry-creds/creds/order-service" {
capabilities = ["read"]
}
path "registry-creds/data/order-service" {
capabilities = ["read"]
}
EOF
Success! Uploaded policy: eso-order-service
# Attach the policy to the ESO's Vault role
$ vault write auth/kubernetes/role/external-secrets-operator \
bound_service_account_names=external-secrets \
bound_service_account_namespaces=external-secrets \
policies=eso-default,eso-order-service,eso-payment-service,eso-inventory-service,eso-user-service \
ttl=1h
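Writing the role with `vault write` sets its policy list to exactly what is passed in `policies=`, so a policy left out of the command is detached from the role. A minimal pre-flight sketch, assuming the list is kept in a shell variable before it is written; the `policy_in_list` helper and the `POLICIES` and `REQUIRED` variables are hypothetical names, not part of the original runbook:

```shell
#!/usr/bin/env bash
# policy_in_list LIST NAME
# Succeeds when NAME is an exact element of the comma-separated LIST.
policy_in_list() {
  case ",$1," in
    *",$2,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical variables: the list passed to `policies=` in Step 1, and the
# policies that must never be dropped from the role.
POLICIES="eso-default,eso-order-service,eso-payment-service,eso-inventory-service,eso-user-service"
REQUIRED="eso-default eso-order-service"

for p in $REQUIRED; do
  if policy_in_list "$POLICIES" "$p"; then
    echo "ok: $p"
  else
    echo "missing: $p" >&2
  fi
done
```

The match is on whole elements, so a shorter name such as `eso-order` does not match `eso-order-service`.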
Step 2: Force the ExternalSecret to resync
$ kubectl annotate externalsecret order-service-regcred -n prod \
force-sync=$(date +%s) --overwrite
# Wait for sync
$ kubectl get externalsecret order-service-regcred -n prod -w
NAME STORE REFRESH STATUS
order-service-regcred vault 1h SecretSynced
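Watching with `-w` needs someone to interrupt it by hand. For scripted use, a bounded polling loop is one option. The sketch below is generic: `retry_until` and `is_synced` are hypothetical helpers, and the commented check assumes the ExternalSecret reports a `Ready` condition in its status.

```shell
#!/usr/bin/env bash
# retry_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS attempts have been made.
retry_until() {
  local max=$1 delay=$2 attempt=1
  shift 2
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    # Only sleep when another attempt will follow.
    if [ "$attempt" -lt "$max" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Example against the cluster (not run here):
# is_synced() {
#   [ "$(kubectl get externalsecret order-service-regcred -n prod \
#        -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')" = "True" ]
# }
# retry_until 12 10 is_synced || echo "ExternalSecret did not sync in time" >&2
```

Where it is available, `kubectl wait externalsecret/order-service-regcred -n prod --for=condition=Ready --timeout=120s` may achieve the same result without a helper.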
Step 3: Verify the new credentials work
$ kubectl get secret regcred-order-service -n prod -o jsonpath='{.data.\.dockerconfigjson}' \
| base64 -d | jq -r '.auths["registry.internal:5000"].password' \
| head -c 20
hvs.CAESINxw2...
# Test authentication
$ kubectl get secret regcred-order-service -n prod -o jsonpath='{.data.\.dockerconfigjson}' \
| base64 -d | jq -r '.auths["registry.internal:5000"] | .username + ":" + .password' \
| xargs -I{} curl -u {} https://registry.internal:5000/v2/
{} # Empty JSON = success
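The `curl` call above prints the response body but not the HTTP status, and the status is what separates a working credential from a rejected one. In the Docker Registry HTTP API V2, `GET /v2/` returns 200 to an authenticated client and 401 when credentials are missing or rejected. A small sketch that maps the status to an exit code; the function name `check_registry_status` and the `USER_PASS` placeholder are hypothetical:

```shell
#!/usr/bin/env bash
# check_registry_status HTTP_CODE
# Maps the status code of GET /v2/ to a message and an exit code.
check_registry_status() {
  case "$1" in
    200) echo "auth ok"; return 0 ;;
    401) echo "auth rejected: credentials missing or invalid" >&2; return 1 ;;
    *)   echo "unexpected status: $1" >&2; return 2 ;;
  esac
}

# Example (not run here); USER_PASS stands for "username:password":
# code=$(curl -s -o /dev/null -w '%{http_code}' -u "$USER_PASS" \
#        https://registry.internal:5000/v2/)
# check_registry_status "$code"
```

Treating anything other than 200 or 401 as a separate case keeps network or TLS problems from being misread as a credential failure.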
Step 4: Restart the rollout
$ kubectl rollout restart deployment/order-service -n prod
deployment.apps/order-service restarted
$ kubectl rollout status deployment/order-service -n prod
Waiting for deployment "order-service" rollout to finish: 1 of 3 updated replicas are available...
deployment "order-service" successfully rolled out
Verification
Domain A (Kubernetes): Deployment healthy
$ kubectl get pods -n prod -l app=order-service
NAME READY STATUS RESTARTS AGE
order-service-9d8e7f6a5-k3m2n 1/1 Running 0 2m
order-service-9d8e7f6a5-j8p4q 1/1 Running 0 2m
order-service-9d8e7f6a5-h7r1s 1/1 Running 0 2m
Domain B (Security): Vault policy active, ESO syncing
$ vault policy read eso-order-service
path "registry-creds/creds/order-service" {
capabilities = ["read"]
}
path "registry-creds/data/order-service" {
capabilities = ["read"]
}
$ kubectl get externalsecret order-service-regcred -n prod
NAME STORE REFRESH STATUS
order-service-regcred vault 1h SecretSynced
Domain C (DevOps Tooling): Vault policy in IaC
# Ensure the policy is also in the Terraform/Vault IaC so it survives future cleanups
$ grep -r "eso-order-service" devops/terraform/modules/vault/
devops/terraform/modules/vault/policies.tf: name = "eso-order-service"
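The `grep` confirms that the policy is declared, not that the live Vault state matches the declaration. One way to check the latter is `terraform plan -detailed-exitcode`, which exits 0 when the plan is empty, 1 on error, and 2 when changes are pending. A sketch of interpreting that code in CI; the `plan_result` helper and the `TF_ROOT` variable are hypothetical, and the example assumes the Vault module is planned from some root configuration whose path is not given in this runbook:

```shell
#!/usr/bin/env bash
# plan_result EXIT_CODE
# Translates the exit code of `terraform plan -detailed-exitcode`.
plan_result() {
  case "$1" in
    0) echo "in-sync" ;;
    2) echo "drift" ;;
    *) echo "error" ;;
  esac
}

# Example (not run here); TF_ROOT is a placeholder for the root module path:
# terraform -chdir="$TF_ROOT" plan -detailed-exitcode -input=false >/dev/null
# plan_result "$?"
```

A scheduled job that fails on `drift` would surface a manually deleted policy before the next credential refresh depends on it.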
Prevention
- Monitoring: Add an ExternalSecret sync status alert. Fire WARNING when any ExternalSecret has a status other than SecretSynced for more than 2 hours.
  # ESO exposes the condition type "Ready"; SecretSynced is the reason reported while Ready is True.
  - alert: ExternalSecretSyncFailed
    expr: externalsecret_status_condition{condition="Ready",status="False"} == 1
    for: 2h
    labels:
      severity: warning
- Runbook: Vault policy changes must be tested against all ESO ExternalSecrets before committing. Add a CI check that verifies every ExternalSecret can still read its Vault path with the proposed policies.
- Architecture: Define all Vault policies in Terraform/IaC so that manual deletions show up as drift on the next plan and are restored on the next apply. Use Vault Sentinel policies (a Vault Enterprise feature) to prevent deletion of policies that are referenced by active Kubernetes auth roles.
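The CI check from the Runbook item could start as a simple gate that fails when any ExternalSecret is not ready. A minimal sketch, assuming each ExternalSecret exposes a `Ready` condition and that the pipeline has read access to the cluster; `find_unsynced` is a hypothetical helper, and it reports on sync state only, so it does not by itself prove that a proposed policy change is safe:

```shell
#!/usr/bin/env bash
# find_unsynced reads "NAMESPACE NAME READY" rows on stdin, prints
# NAMESPACE/NAME for every row whose READY column is not "True",
# and exits non-zero if it printed anything.
find_unsynced() {
  awk 'NF >= 2 && $3 != "True" { print $1 "/" $2; bad = 1 }
       END { exit bad ? 1 : 0 }'
}

# Example (not run here):
# kubectl get externalsecret -A --no-headers \
#   -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status' \
#   | find_unsynced
```

Rows with a missing condition are also reported, since anything other than `True` in the third column counts as not synced.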