Portal | Level: L2: Operations | Topics: Secrets Management | Domain: Security

Runbook: Secret Rotation (Zero Downtime)

When to Use

  • Credential leaked to Git or logs
  • Scheduled rotation (compliance)
  • Suspected compromise
  • Employee offboarding with access to secrets

Pre-Flight Check

# Identify which services use the secret (mounted as a volume)
# any() lists each pod once, even if several volumes match
kubectl get pods -A -o json | jq -r '
  .items[] | select(any(.spec.volumes[]?; .secret.secretName == "db-creds")) |
  "\(.metadata.namespace)/\(.metadata.name)"'

# Or via environment variables (envFrom only; env[].valueFrom.secretKeyRef is not matched)
kubectl get pods -A -o json | jq -r '
  .items[] | select(any(.spec.containers[].envFrom[]?; .secretRef.name == "db-creds")) |
  "\(.metadata.namespace)/\(.metadata.name)"'
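
A Secret can also be referenced key by key through env[].valueFrom.secretKeyRef, which neither query above matches. The sketch below folds all three reference styles into one filter. The function name pods_using_secret is hypothetical, and it assumes jq 1.5 or later for any().

```shell
# List pods that reference a Secret through volumes, envFrom, or env[].valueFrom.
# Reads `kubectl get pods -o json` output on stdin; prints namespace/name per pod.
pods_using_secret() {
  local secret_name="$1"
  jq -r --arg name "$secret_name" '
    .items[]
    | select(
        any(.spec.volumes[]?; .secret.secretName == $name)
        or any(.spec.containers[]; any(.envFrom[]?; .secretRef.name == $name))
        or any(.spec.containers[]; any(.env[]?; .valueFrom.secretKeyRef.name == $name))
      )
    | "\(.metadata.namespace)/\(.metadata.name)"'
}

# Usage (assumes cluster access):
#   kubectl get pods -A -o json | pods_using_secret db-creds
```

Init containers are not inspected here; add .spec.initContainers[]? to the filter if they consume the secret.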

Rotation Procedure

If Credential Is Compromised (Emergency)

# STEP 1: Rotate NOW (don't investigate first)
# Generate new credential
NEW_PASSWORD=$(openssl rand -base64 24)

# STEP 2: Update in the credential source
# Vault:
vault kv put secret/db-creds password="$NEW_PASSWORD"

# AWS Secrets Manager:
aws secretsmanager update-secret --secret-id prod/db-password \
  --secret-string "$NEW_PASSWORD"

# STEP 3: Update Kubernetes Secret
kubectl create secret generic db-creds -n production \
  --from-literal=password="$NEW_PASSWORD" \
  --dry-run=client -o yaml | kubectl apply -f -

# STEP 4: Update the database password
psql -U admin -c "ALTER USER app_user PASSWORD '$NEW_PASSWORD';"

# STEP 5: Rolling restart all consuming pods
kubectl rollout restart deployment svc-1 svc-2 svc-3 -n production

# STEP 6: Verify
kubectl logs deploy/svc-1 -n production --tail=10 | grep -i "error\|connect"
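
Grepping logs only shows what the new pods have printed so far; it does not confirm that every deployment finished rolling out. A minimal sketch of a rollout gate follows. The function name wait_for_rollouts and the 120s timeout are assumptions, not part of this runbook.

```shell
# Wait for each named deployment to finish rolling out.
# Returns non-zero if any rollout fails or exceeds the timeout.
wait_for_rollouts() {
  local namespace="$1"
  shift
  local failed=0 name
  for name in "$@"; do
    if ! kubectl rollout status "deployment/${name}" -n "$namespace" --timeout=120s; then
      echo "rollout did not complete: ${name}" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
#   wait_for_rollouts production svc-1 svc-2 svc-3
```

Run it right after the rolling restart, and grep logs only once it returns zero.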

If Scheduled Rotation (Planned)

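# STEP 1: Create new credential alongside old one
# (dual-credential period: old and new must both authenticate until STEP 5)
# PostgreSQL roles hold a single password, so this usually means a second role.
NEW_PASSWORD=$(openssl rand -base64 24)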

# STEP 2: Update secret with new credential
kubectl create secret generic db-creds -n production \
  --from-literal=password="$NEW_PASSWORD" \
  --dry-run=client -o yaml | kubectl apply -f -

# STEP 3: Rolling restart ONE service to verify
kubectl rollout restart deployment svc-1 -n production
kubectl rollout status deployment svc-1 -n production

# STEP 4: If verified, restart remaining services
kubectl rollout restart deployment svc-2 svc-3 -n production

# STEP 5: Revoke old credential
# (OLD_DB_USER is a placeholder for the role that still holds the old password)
psql -U admin -c "ALTER ROLE $OLD_DB_USER NOLOGIN;"

# STEP 6: Verify no service uses old credential
# (monitor for auth errors for 30 minutes)
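
The 30-minute watch in STEP 6 can be scripted rather than done by eye. The sketch below is one way to do it, assuming the same grep pattern used under Verification; the function names, the 60-second polling interval, and the choice to stop at the first hit are all assumptions.

```shell
# Count log lines that look like authentication failures (reads stdin).
count_auth_errors() {
  grep -ciE 'auth|denied|refused' || true
}

# Poll a deployment's recent logs until the window ends or a suspicious line appears.
# Args: deployment, namespace, minutes (default 30), interval seconds (default 60).
watch_auth_errors() {
  local deploy="$1" namespace="$2" minutes="${3:-30}" interval="${4:-60}"
  local deadline hits
  deadline=$(( $(date +%s) + minutes * 60 ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    hits=$(kubectl logs "deploy/${deploy}" -n "$namespace" --since="${interval}s" 2>/dev/null | count_auth_errors)
    if [ "$hits" -gt 0 ]; then
      echo "${deploy}: ${hits} suspicious log line(s) in the last ${interval}s" >&2
      return 1
    fi
    sleep "$interval"
  done
  return 0
}

# Usage:
#   watch_auth_errors svc-1 production 30 60
```

kubectl logs deploy/NAME reads from a single pod of the deployment, so a label selector (-l app=svc-1) may be a better fit when replicas differ. The pattern is broad and will also match benign lines containing "auth".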

If Using External Secrets Operator

# ESO auto-syncs from the external store
# Just update the external store, then:

# Force immediate sync
kubectl annotate externalsecret db-creds \
  force-sync=$(date +%s) --overwrite -n production

# Verify K8s Secret updated
kubectl get secret db-creds -n production -o jsonpath='{.metadata.resourceVersion}'

# Restart pods to pick up new secret
kubectl rollout restart deployment -l uses-db=true -n production
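
Reading resourceVersion once does not show whether the sync actually changed the Secret. A minimal sketch that compares the value before and after the forced sync follows. It assumes the target Secret has the same name as the ExternalSecret (as in the commands above); the function name and retry defaults are hypothetical.

```shell
# Force an ExternalSecret sync, then poll until the Secret's resourceVersion changes.
# Args: name, namespace, attempts (default 30), delay seconds (default 2).
force_sync_and_wait() {
  local name="$1" namespace="$2" attempts="${3:-30}" delay="${4:-2}"
  local before after i
  before=$(kubectl get secret "$name" -n "$namespace" -o jsonpath='{.metadata.resourceVersion}')
  kubectl annotate externalsecret "$name" -n "$namespace" \
    "force-sync=$(date +%s)" --overwrite >/dev/null
  i=0
  while [ "$i" -lt "$attempts" ]; do
    after=$(kubectl get secret "$name" -n "$namespace" -o jsonpath='{.metadata.resourceVersion}')
    if [ "$after" != "$before" ]; then
      echo "secret ${namespace}/${name} updated: ${before} -> ${after}"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "secret ${namespace}/${name} unchanged after sync" >&2
  return 1
}

# Usage:
#   force_sync_and_wait db-creds production && \
#     kubectl rollout restart deployment -l uses-db=true -n production
```

If the value in the external store was not changed first, the resourceVersion stays the same and the function times out, which is a useful signal before restarting pods.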

Verification

# Check for auth errors
kubectl logs -l app=svc-1 -n production --tail=50 | grep -i "auth\|denied\|refused"

# Check application health
kubectl get pods -n production
curl -s https://api.example.com/health | jq .status
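
Pods can take a few seconds to reconnect after a restart, so a single curl may report a transient failure. The sketch below retries the health endpoint. The function name, attempt count, and delay are assumptions, and it only checks the HTTP status because the response schema is not specified here.

```shell
# Retry an HTTP health check until it succeeds or attempts run out.
# Args: url, attempts (default 10), delay seconds (default 3).
wait_healthy() {
  local url="$1" attempts="${2:-10}" delay="${3:-3}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP 4xx/5xx responses
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      echo "healthy after ${i} attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "still unhealthy after ${attempts} attempts: ${url}" >&2
  return 1
}

# Usage:
#   wait_healthy https://api.example.com/health
```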

Post-Rotation

  • Audit: check database access logs during the exposure window
  • Clean Git history if credential was in a commit
  • Update any hardcoded references (CI/CD variables, documentation)
  • Add pre-commit hook to prevent future leaks
  • Schedule next rotation
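
The pre-commit hook item can start as a simple pattern scan over staged changes. The sketch below is a heuristic only: the function name and the three patterns (AWS access key IDs, PEM private key headers, and password-style assignments) are assumptions, and a dedicated secret scanner will catch far more.

```shell
# Return non-zero if added lines in a unified diff (stdin) look like credentials.
scan_added_lines() {
  # Keep only added lines ("+..."), dropping the "+++ b/file" headers.
  if grep -E '^\+' | grep -vE '^\+\+\+ ' | grep -nEi \
      -e 'AKIA[0-9A-Z]{16}' \
      -e '-----BEGIN [A-Z ]*PRIVATE KEY-----' \
      -e '(password|passwd|secret|token)[[:space:]]*[:=][[:space:]]*[^[:space:]]{8,}'; then
    echo "possible secret in staged changes; commit blocked" >&2
    return 1
  fi
  return 0
}

# Usage, from .git/hooks/pre-commit (must be executable):
#   git diff --cached -U0 | scan_added_lines || exit 1
```

Expect false positives, for example on lines that assign a variable reference rather than a literal value.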
