Portal | Level: L2: Operations | Topics: TLS & PKI | Domain: Security

Runbook: Certificate Renewal Failed

Symptoms

  • Browser: NET::ERR_CERT_DATE_INVALID
  • curl: SSL certificate problem: certificate has expired
  • kubectl get certificate shows READY: False
  • cert-manager logs show renewal errors

Fast Triage (under 2 minutes)

# Check cert status
kubectl get certificate -A

# Check expiry of the current cert
kubectl get secret <tls-secret> -n <ns> -o jsonpath='{.data.tls\.crt}' | \
  base64 -d | openssl x509 -noout -dates

# Check cert-manager logs for errors
kubectl logs -n cert-manager deploy/cert-manager --tail=50 | grep -i error
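
On a cluster with many certificates, it can help to filter the first command down to the ones that are not ready. The following is a minimal sketch, not part of the original runbook: the function name not_ready_certs is hypothetical, and it assumes the default column order of kubectl get certificate -A (NAMESPACE, NAME, READY, SECRET, AGE).

```shell
# Print "namespace/name" for every Certificate whose READY column is not "True".
# Assumes the default column order: NAMESPACE NAME READY SECRET AGE.
not_ready_certs() {
  kubectl get certificate -A --no-headers |
    awk '$3 != "True" { print $1 "/" $2 }'
}

# Example usage:
#   not_ready_certs
```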

Causes and Fixes

1. ACME Challenge Failed (Most Common)

Symptoms: CertificateRequest pending, Challenge in state pending or invalid.

kubectl get challenges -A
kubectl describe challenge <name> -n <ns>

Fixes:

  • HTTP-01: Check that the Ingress is serving /.well-known/acme-challenge/ and that the firewall allows port 80.
  • DNS-01: Check the DNS provider credentials and verify that the TXT record is created.
  • Rate limit: Check Let's Encrypt rate limit status; use a staging issuer for testing.
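
The HTTP-01 and DNS-01 checks can be run by hand from outside the cluster. The sketch below only builds the URL and record name that the ACME server looks up; the helper names acme_http01_url and acme_dns01_name are hypothetical, and example.com and the token are placeholders. The token itself is shown in the output of kubectl describe challenge.

```shell
# URL the ACME server fetches during an HTTP-01 challenge (plain HTTP, port 80).
acme_http01_url() {  # usage: acme_http01_url <host> <token>
  printf 'http://%s/.well-known/acme-challenge/%s\n' "$1" "$2"
}

# TXT record name checked during a DNS-01 challenge.
# A wildcard name such as *.example.com is validated at its base domain.
acme_dns01_name() {  # usage: acme_dns01_name <domain>
  printf '_acme-challenge.%s\n' "${1#\*.}"
}

# Example usage (run manually; host, domain, and token are placeholders):
#   curl -sS -o /dev/null -w '%{http_code}\n' "$(acme_http01_url example.com TOKEN)"
#   dig +short TXT "$(acme_dns01_name '*.example.com')"
```

A 200 response from curl and a non-empty dig answer suggest the challenge is reachable; anything else points at the Ingress, firewall, or DNS provider.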

2. cert-manager Pod Down

kubectl get pods -n cert-manager
kubectl describe deploy cert-manager -n cert-manager

Fix: Restart the cert-manager deployment. If the pod keeps restarting or is being OOM-killed, check its resource requests and limits.
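
One way to perform the restart and inspect the configured resources is sketched below. It assumes the deployment is named cert-manager in the cert-manager namespace, as in the commands above; the function names and the 120s timeout are illustrative choices.

```shell
# Restart the controller and block until the rollout finishes or times out.
restart_cert_manager() {
  kubectl rollout restart deployment/cert-manager -n cert-manager &&
    kubectl rollout status deployment/cert-manager -n cert-manager --timeout=120s
}

# Print the resource requests/limits configured on the controller containers.
cert_manager_resources() {
  kubectl get deployment cert-manager -n cert-manager \
    -o jsonpath='{.spec.template.spec.containers[*].resources}{"\n"}'
}

# Example usage:
#   cert_manager_resources
#   restart_cert_manager
```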

3. Issuer Misconfigured

kubectl describe clusterissuer letsencrypt-prod
# Check: account key secret exists, email is set, server URL is correct

Fix: Correct the Issuer configuration. If the ACME account key is missing or invalid, recreate the ACME account.
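
The items in the check list can also be read directly from the ClusterIssuer object instead of scanning the describe output. This is a sketch assuming an ACME ClusterIssuer named letsencrypt-prod (override with the ISSUER variable); the function names are hypothetical.

```shell
ISSUER="${ISSUER:-letsencrypt-prod}"

# Prints "True" when the issuer reports a Ready condition.
issuer_ready() {
  kubectl get clusterissuer "$ISSUER" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}

# Prints the ACME server URL, contact email, and account key secret name, one per line.
issuer_acme_settings() {
  kubectl get clusterissuer "$ISSUER" \
    -o jsonpath='{.spec.acme.server}{"\n"}{.spec.acme.email}{"\n"}{.spec.acme.privateKeySecretRef.name}{"\n"}'
}

# Example usage:
#   issuer_ready
#   issuer_acme_settings
```

For a ClusterIssuer, the account key secret normally lives in the namespace where cert-manager itself runs, so that is the place to look for it.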

4. Force Renewal

Warning: Deleting the TLS secret to force re-issuance causes an immediate TLS outage until cert-manager provisions a new certificate. If the underlying issue (ACME challenge, issuer config) is not fixed first, the secret stays deleted and the outage persists. Prefer kubectl cert-manager renew, which triggers renewal without deleting the existing cert.

# Using cert-manager kubectl plugin (preferred — no downtime)
kubectl cert-manager renew <certificate-name> -n <ns>

# Or delete the secret to force re-issuance (causes downtime until new cert is issued)
kubectl delete secret <tls-secret> -n <ns>
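
After triggering a renewal, it is useful to block until the Certificate reports Ready rather than polling by hand. The sketch below assumes the kubectl cert-manager plugin is installed; on installs that ship the standalone cmctl binary, cmctl renew should be the equivalent. The function name and the 5m timeout are illustrative.

```shell
# Trigger a renewal, then wait for the Certificate to become Ready.
# Note: if the Certificate is already Ready (old cert still valid),
# the wait returns immediately, so confirm the new dates afterwards.
renew_and_wait() {  # usage: renew_and_wait <certificate-name> <namespace>
  kubectl cert-manager renew "$1" -n "$2" &&
    kubectl wait --for=condition=Ready "certificate/$1" -n "$2" --timeout=5m
}

# Example usage:
#   renew_and_wait web-tls prod
```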

Verification

# Check new cert dates
kubectl get secret <tls-secret> -n <ns> -o jsonpath='{.data.tls\.crt}' | \
  base64 -d | openssl x509 -noout -dates

# Check from outside
openssl s_client -connect <host>:443 -servername <host> </dev/null 2>/dev/null | \
  openssl x509 -noout -dates

# Check Certificate status
kubectl get certificate <name> -n <ns>
# Should show READY: True
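
The raw notAfter date is easy to misread under pressure, so converting it to days remaining can help. This is a minimal sketch assuming GNU date (it will not work with BSD/macOS date as written); days_until is a hypothetical helper, and the optional second argument exists only to make the calculation reproducible.

```shell
# Whole days (truncated toward zero) between now and an openssl-style
# notAfter timestamp such as "Jun  1 12:00:00 2030 GMT".
days_until() {  # usage: days_until "<notAfter>" [now_epoch]
  local end now
  end="$(date -u -d "$1" +%s)" || return 1
  now="${2:-$(date -u +%s)}"
  echo $(( (end - now) / 86400 ))
}

# Example usage (replace <host>):
#   not_after="$(openssl s_client -connect <host>:443 -servername <host> </dev/null 2>/dev/null |
#     openssl x509 -noout -enddate | cut -d= -f2)"
#   days_until "$not_after"
```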

Prevention

  • Alert on certmanager_certificate_expiration_timestamp_seconds - time() < 14 * 24 * 3600
  • Use renewBefore: 360h (15 days) in Certificate spec
  • Monitor cert-manager pod health
  • Test renewal in staging before production
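
For an existing Certificate, the renewBefore setting can be applied with a merge patch. This is a sketch only: set_renew_before is a hypothetical helper, and if the Certificate is managed by GitOps or generated from Ingress annotations, the change should be made at that source instead, since a direct patch may be reverted.

```shell
# Set spec.renewBefore on a Certificate (defaults to 360h, i.e. 15 days).
set_renew_before() {  # usage: set_renew_before <certificate-name> <namespace> [duration]
  kubectl patch certificate "$1" -n "$2" --type merge \
    -p "{\"spec\":{\"renewBefore\":\"${3:-360h}\"}}"
}

# Example usage:
#   set_renew_before web-tls prod
#   set_renew_before web-tls prod 720h
```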
