Runbook: Certificate Renewal Failed¶
Symptoms¶
- Browser: NET::ERR_CERT_DATE_INVALID
- curl: SSL certificate problem: certificate has expired
- kubectl get certificate shows READY: False
- cert-manager logs show renewal errors
Fast Triage (under 2 minutes)¶
# Check cert status
kubectl get certificate -A
# Check expiry of the current cert
kubectl get secret <tls-secret> -n <ns> -o jsonpath='{.data.tls\.crt}' | \
base64 -d | openssl x509 -noout -dates
# Check cert-manager logs for errors
kubectl logs -n cert-manager deploy/cert-manager --tail=50 | grep -i error
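The expiry check above prints dates that still have to be compared by eye. As a minimal sketch, `openssl x509 -checkend` can turn that into a pass/fail answer. The function name `cert_expires_within` and the 14-day default are assumptions for illustration, not part of this runbook's tooling.

```shell
# Hypothetical helper: reads a PEM certificate on stdin.
# Returns 0 if the certificate expires within $1 days (default 14), 1 otherwise.
# Note: unreadable input also makes openssl fail, which is reported as "expiring".
cert_expires_within() {
  days="${1:-14}"
  # -checkend N exits 0 when the certificate is still valid N seconds from now
  if openssl x509 -noout -checkend "$((days * 86400))" >/dev/null 2>&1; then
    return 1   # still valid beyond the window
  fi
  return 0     # expires, or has already expired, within the window
}
```

It could be fed from the same pipeline as the expiry check, with `NS` and `SECRET` set to the namespace and TLS secret name: `kubectl get secret "$SECRET" -n "$NS" -o jsonpath='{.data.tls\.crt}' | base64 -d | cert_expires_within 14 && echo "renewal overdue"`.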
Causes and Fixes¶
1. ACME Challenge Failed (Most Common)¶
Symptoms: CertificateRequest pending, Challenge in state pending or invalid.
Fixes:
- HTTP-01: Confirm the Ingress serves /.well-known/acme-challenge/ and that the firewall allows inbound port 80.
- DNS-01: Check the DNS provider credentials and verify that the TXT record is created.
- Rate limit: Check Let's Encrypt rate-limit status and use the staging issuer for testing.
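To see where the ACME flow is stuck, it may help to walk the resource chain cert-manager creates (Certificate, CertificateRequest, Order, Challenge) and read the reason recorded on each Challenge. The sketch below wraps standard `kubectl` and `dig` calls; the function names are hypothetical, and it assumes `dig` is installed on the machine running the check.

```shell
# Hypothetical helper: list the ACME resource chain in one namespace.
acme_chain() {
  ns="$1"
  # Certificate -> CertificateRequest -> Order -> Challenge
  kubectl get certificate,certificaterequest,order,challenge -n "$ns"
}

# Hypothetical helper: print name, state and reason for each Challenge.
challenge_reasons() {
  ns="$1"
  kubectl get challenge -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.state}{"\t"}{.status.reason}{"\n"}{end}'
}

# Hypothetical helper: check that the DNS-01 TXT record is publicly visible.
dns01_txt() {
  domain="$1"
  dig +short TXT "_acme-challenge.${domain}"
}
```

For example, `challenge_reasons prod` lists why each challenge in the `prod` namespace is pending or invalid, and `dns01_txt example.com` shows whether the TXT record has propagated.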
2. cert-manager Pod Down¶
Fix: Restart cert-manager and check its resource limits.
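One way to carry out the restart is sketched below. It assumes the controller runs as a Deployment named `cert-manager` in the `cert-manager` namespace, matching the log command in Fast Triage; the function name and the 120-second timeout are illustrative choices.

```shell
# Hypothetical helper: inspect, restart, and wait for the cert-manager controller.
restart_cert_manager() {
  ns="${1:-cert-manager}"
  # Pod state before the restart (look for CrashLoopBackOff or OOMKilled)
  kubectl get pods -n "$ns"
  # Configured requests and limits on the controller container(s)
  kubectl get deployment cert-manager -n "$ns" \
    -o jsonpath='{.spec.template.spec.containers[*].resources}'
  # Restart, then block until the new pod has rolled out
  kubectl rollout restart deployment/cert-manager -n "$ns" &&
    kubectl rollout status deployment/cert-manager -n "$ns" --timeout=120s
}
```

If the pod was being OOM-killed, a restart alone only buys time; the limits printed by the second command would need to be raised in the Deployment's source manifest or Helm values.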
3. Issuer Misconfigured¶
kubectl describe clusterissuer letsencrypt-prod
# Check: account key secret exists, email is set, server URL is correct
Fix: Verify Issuer configuration, recreate ACME account if needed.
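The three items in the check comment can also be pulled out directly instead of scanning the `describe` output. This is a sketch only: the function names are hypothetical, and it assumes ClusterIssuer secrets live in the `cert-manager` namespace, which depends on how cert-manager was installed.

```shell
# Hypothetical helper: print the ACME server URL, email, and account key secret name.
issuer_summary() {
  issuer="${1:-letsencrypt-prod}"
  kubectl get clusterissuer "$issuer" \
    -o jsonpath='{.spec.acme.server}{"\n"}{.spec.acme.email}{"\n"}{.spec.acme.privateKeySecretRef.name}{"\n"}'
}

# Hypothetical helper: confirm the account key secret exists.
# Assumption: the cluster resource namespace is cert-manager.
issuer_key_secret() {
  secret="$1"
  ns="${2:-cert-manager}"
  kubectl get secret "$secret" -n "$ns"
}
```

Pass the third line printed by `issuer_summary` to `issuer_key_secret`; a NotFound error there points at a missing ACME account key.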
4. Force Renewal¶
[!WARNING] Deleting the TLS secret to force re-issuance causes an immediate TLS outage until cert-manager provisions a new certificate. If the underlying issue (ACME challenge, issuer config) is not fixed first, the secret stays deleted and the outage persists. Prefer kubectl cert-manager renew, which triggers renewal without deleting the existing cert.
# Using cert-manager kubectl plugin (preferred — no downtime)
kubectl cert-manager renew <certificate-name> -n <ns>
# Or delete the secret to force re-issuance (causes downtime until new cert is issued)
kubectl delete secret <tls-secret> -n <ns>
Verification¶
# Check new cert dates
kubectl get secret <tls-secret> -n <ns> -o jsonpath='{.data.tls\.crt}' | \
base64 -d | openssl x509 -noout -dates
# Check from outside
openssl s_client -connect <host>:443 -servername <host> </dev/null 2>/dev/null | \
openssl x509 -noout -dates
# Check Certificate status
kubectl get certificate <name> -n <ns>
# Should show READY: True
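Rather than re-running the status check by hand, `kubectl wait` can block until the Certificate reports the Ready condition. A minimal sketch; the function name and the 300-second default timeout are assumptions.

```shell
# Hypothetical helper: block until the Certificate is Ready or the timeout expires.
# Exits non-zero on timeout, so it can gate a follow-up step in a script.
wait_for_cert() {
  name="$1"
  ns="$2"
  wait_timeout="${3:-300s}"
  kubectl wait --for=condition=Ready "certificate/${name}" -n "$ns" \
    --timeout="$wait_timeout"
}
```

For example, `wait_for_cert web-tls prod 120s` returns as soon as the `web-tls` Certificate in `prod` becomes Ready, or fails after two minutes.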
Prevention¶
- Alert on certmanager_certificate_expiration_timestamp_seconds - time() < 14 * 24 * 3600
- Use renewBefore: 360h (15 days) in Certificate spec
- Monitor cert-manager pod health
- Test renewal in staging before production
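The renewBefore setting can be applied to an existing Certificate with a merge patch, as sketched below. The function name is hypothetical. If the Certificate is generated from a Helm chart, GitOps repository, or Ingress annotations, a live patch may be reverted, so the same value should also go into the source manifest.

```shell
# Hypothetical helper: set spec.renewBefore on an existing Certificate.
# 360h is 15 days, one day ahead of the 14-day alert threshold (14 * 24 * 3600 s).
set_renew_before() {
  name="$1"
  ns="$2"
  window="${3:-360h}"
  kubectl patch certificate "$name" -n "$ns" --type merge \
    -p "{\"spec\":{\"renewBefore\":\"${window}\"}}"
}
```

With these values cert-manager starts renewing a day before the alert would fire, so the alert signals a renewal that has already been failing rather than one that has not yet been attempted.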