
cert-manager Footguns

Mistakes that cause certificate issuance failures, unexpected expiry, outages, and rate limit exhaustion.


1. Using Let's Encrypt Production for Development/Testing

You set up cert-manager, point it at Let's Encrypt production, and iterate on your Ingress configuration while debugging DNS. Every attempt requests a real certificate, and you exhaust the rate limits (5 duplicate certificates per week usually bites first, then 50 certificates per registered domain per week) before the configuration is right. Now your production services can't renew their certificates for up to a week.

Fix: Always use the Let's Encrypt staging environment for non-production workloads. Staging certs are not browser-trusted, but the rate limits are far higher. Only point at production after you have confirmed everything works end-to-end.

# Staging issuer for dev/test
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
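For context, a complete staging ClusterIssuer might look like the following sketch (the issuer name, email, account-key Secret name, and ingress class are illustrative):

```yaml
# Illustrative staging ClusterIssuer; adapt names to your cluster
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: ops@example.com          # expiry notices go here
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx
```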

2. Wildcard Certificate With HTTP-01 Issuer

You create a Certificate resource for *.example.com and reference a ClusterIssuer configured for HTTP-01. The Challenge immediately goes to invalid state: "Wildcard domain names require dns01". You do not understand why it failed until you spend 45 minutes reading ACME spec documentation.

Fix: Wildcard certificates require DNS-01 validation. Period. Let's Encrypt enforces this. Either switch to a DNS-01 issuer for wildcards or enumerate specific subdomains for HTTP-01.

# Wildcards: always use dns01
spec:
  dnsNames:
    - "*.example.com"
  issuerRef:
    name: letsencrypt-dns01   # must be DNS-01 configured
    kind: ClusterIssuer
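For reference, the DNS-01 side of such an issuer adds a dns01 solver to the ClusterIssuer. A Route53 sketch, with zone ID, credentials, and Secret names as assumptions:

```yaml
# Sketch of a dns01 solver block inside spec.acme (values are illustrative)
solvers:
  - dns01:
      route53:
        region: us-east-1
        hostedZoneID: Z1XXXXX          # optional; avoids a zone lookup
        accessKeyID: AKIAXXXXXXXX
        secretAccessKeySecretRef:
          name: route53-credentials
          key: secret-access-key
    selector:
      dnsNames:
        - "*.example.com"
```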

3. Ingress Auth Middleware Blocking HTTP-01 Challenge Path

Your Ingress is protected by oauth2-proxy or basic auth. The Let's Encrypt validation server tries to reach /.well-known/acme-challenge/ and receives a 401 or 302 redirect to your auth provider. The challenge fails. You see "Error: 403 urn:ietf:params:acme:error:unauthorized" and spend hours checking DNS.

Fix: The ACME challenge path must bypass authentication. Add an nginx configuration snippet to pass the challenge path directly, or switch to DNS-01 to avoid the HTTP reachability requirement entirely.

# Nginx snippet to bypass auth for the challenge path
# (requires allow-snippet-annotations: true in the ingress-nginx ConfigMap)
nginx.ingress.kubernetes.io/configuration-snippet: |
  location ^~ /.well-known/acme-challenge/ {
    auth_request off;
  }

4. Cert Not Picked Up by Pods After Renewal

cert-manager renews the certificate and updates the Kubernetes Secret. But pods that mounted the Secret as a volume at startup time are still using the old certificate. They are not restarted. Traffic fails six hours later when the old cert expires and the pod is still serving it.

Fix: Kubernetes volumes backed by Secrets do eventually sync — but this can take up to a minute by default (kubelet sync period). For applications that cache the cert in memory (e.g., nginx, Envoy), you must restart or send a signal. Use Reloader (Stakater) or wave to watch Secrets and trigger rolling restarts:

# Add the Stakater chart repo first, then install Reloader
helm repo add stakater https://stakater.github.io/stakater-charts
helm install reloader stakater/reloader

Then annotate the Deployment:

metadata:
  annotations:
    reloader.stakater.com/auto: "true"
    # or watch a specific Secret:
    secret.reloader.stakater.com/reload: "myapp-tls"


5. Deleting the ClusterIssuer ACME Account Key Secret

The ClusterIssuer stores its ACME account key in a Secret (e.g., letsencrypt-prod-account-key). If you delete this Secret, to "clean up" or during a namespace migration, cert-manager loses the key for its registered ACME account. It attempts to re-register, which can trip account-registration rate limits and orphans any in-flight orders tied to the old account. Pending certificates stall.

Fix: Never delete ACME account key Secrets. Treat them as persistent state. Back them up:

kubectl get secret letsencrypt-prod-account-key -n cert-manager -o yaml > \
  backup/letsencrypt-prod-account-key.yaml

If you accidentally delete it: delete the ClusterIssuer and recreate it. cert-manager will register a fresh ACME account.


6. Not Monitoring Certificate Expiry — Relying Only on Auto-Renewal

cert-manager will renew automatically — unless it can't. DNS-01 provider credentials expire. IAM roles get rotated. Issuers go into error state. You find out the renewal failed only when the certificate expires and users start seeing TLS errors.

Fix: Always monitor certificate expiry independently of cert-manager itself. Three layers:

  1. Prometheus alert on certmanager_certificate_expiration_timestamp_seconds - time() < 14 * 24 * 3600
  2. External synthetic monitor (e.g., curl -v https://yourdomain.com from outside the cluster, alert if cert expires < 7 days)
  3. kubectl get certificate -A in a weekly ops review

Never trust auto-renewal without independent monitoring.
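For layer 2, openssl x509 -checkend is a convenient primitive: it exits non-zero when a certificate expires within the given number of seconds. A self-contained sketch, where a throwaway self-signed cert stands in for one fetched from a live endpoint with openssl s_client -connect yourdomain.com:443:

```shell
# Generate a short-lived self-signed cert to stand in for a live one
openssl req -x509 -newkey rsa:2048 -keyout /tmp/demo.key -out /tmp/demo.crt \
  -days 5 -nodes -subj "/CN=example.com" 2>/dev/null

# -checkend exits non-zero if the cert expires within N seconds (7 days here)
if ! openssl x509 -checkend $((7 * 24 * 3600)) -noout -in /tmp/demo.crt; then
  echo "ALERT: certificate expires within 7 days"
fi
```

Since the demo cert is valid for only 5 days, the check fires and prints the alert line.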


7. Certificate Stuck Because DNS-01 Credentials Have Wrong IAM Permissions

You configure Route53 DNS-01 with an IAM user. The Challenge resource sits in pending for hours. cert-manager logs show InvalidClientTokenId or AccessDenied. You assume it's a cert-manager bug, open a GitHub issue, and lose two hours before checking IAM.

Fix: Before setting up the ClusterIssuer, validate the credentials manually:

# Test Route53 access with the exact permissions cert-manager needs
aws sts get-caller-identity --profile cert-manager-user

# cert-manager needs these specific Route53 actions:
# route53:GetChange
# route53:ChangeResourceRecordSets
# route53:ListResourceRecordSets
# route53:ListHostedZonesByName (if not specifying hostedZoneID)

# Test a record change (remember to DELETE the test record afterwards)
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1XXXXX \
  --change-batch '{"Changes": [{"Action": "UPSERT", "ResourceRecordSet": {"Name": "_test.example.com", "Type": "TXT", "TTL": 1, "ResourceRecords": [{"Value": "\"test\""}]}}]}'
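The actions above translate into a minimal IAM policy along these lines (the hosted zone ID is illustrative; the cert-manager documentation publishes an equivalent policy):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "route53:GetChange",
      "Resource": "arn:aws:route53:::change/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "route53:ChangeResourceRecordSets",
        "route53:ListResourceRecordSets"
      ],
      "Resource": "arn:aws:route53:::hostedzone/Z1XXXXX"
    },
    {
      "Effect": "Allow",
      "Action": "route53:ListHostedZonesByName",
      "Resource": "*"
    }
  ]
}
```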

8. Using Issuer Instead of ClusterIssuer Across Namespaces

Your Certificate resource in the production namespace references an Issuer that lives in the cert-manager namespace. It doesn't work: issuerRef has no namespace field, so cert-manager always resolves an Issuer in the Certificate's own namespace. The logs show "referenced Issuer not found".

Fix: Use ClusterIssuer for anything that needs to span namespaces. Use Issuer only when the issuer configuration is truly namespace-specific (e.g., different Let's Encrypt credentials per namespace, internal namespace CA).

# Wrong — Issuer in different namespace won't work
issuerRef:
  name: letsencrypt-prod
  kind: Issuer     # ← namespace-scoped

# Correct for cross-namespace use
issuerRef:
  name: letsencrypt-prod
  kind: ClusterIssuer   # ← cluster-scoped

9. Apex Domain Excluded From Wildcard Certificate

You create a certificate for *.example.com expecting it to cover example.com too. It does not. Wildcard certificates match subdomains only — *.example.com covers app.example.com but not example.com. The apex domain gets a TLS error.

Fix: Always include the apex domain explicitly alongside the wildcard:

spec:
  dnsNames:
    - "*.example.com"
    - example.com      # apex must be listed separately
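You can reproduce the mismatch locally with openssl x509 -checkhost. Here a throwaway cert whose only SAN is the wildcard matches a subdomain but not the apex (requires OpenSSL 1.1.1+ for -addext):

```shell
# Self-signed cert whose only SAN is the wildcard
openssl req -x509 -newkey rsa:2048 -keyout /tmp/wc.key -out /tmp/wc.crt \
  -days 1 -nodes -subj "/CN=*.example.com" \
  -addext "subjectAltName=DNS:*.example.com" 2>/dev/null

openssl x509 -in /tmp/wc.crt -noout -checkhost app.example.com  # matches
openssl x509 -in /tmp/wc.crt -noout -checkhost example.com      # does NOT match
```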

10. Installing cert-manager into an Existing Cluster Without Installing CRDs First

You run helm install cert-manager jetstack/cert-manager without --set installCRDs=true. The cert-manager pods come up but immediately start logging "no kind is registered for the type...". Certificate resources you apply return "no matches for kind Certificate in group cert-manager.io/v1". Nothing works.

Fix: Always install CRDs before or during cert-manager installation:

# Option 1: Install CRDs via Helm flag
helm install cert-manager jetstack/cert-manager \
  --set installCRDs=true   # on chart v1.15+ prefer --set crds.enabled=true

# Option 2: Install CRDs separately (better for GitOps — avoids Helm CRD upgrade issues)
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.4/cert-manager.crds.yaml

11. Vault PKI Issuer Fails Because of Token Expiry

You configure cert-manager with a static Vault token for PKI signing. It works on day one. On day 32, the Vault token expires. Every Certificate renewal then fails quietly; the only symptom is 403 permission denied from Vault in the cert-manager logs. You do not notice until a certificate expires.

Fix: Use Vault Kubernetes auth instead of static tokens. cert-manager can use a service account token to authenticate to Vault, and Vault issues short-lived tokens on demand. The credentials stay valid for as long as the Vault role and service account exist:

spec:
  vault:
    server: https://vault.example.com
    path: pki/sign/example-com
    auth:
      kubernetes:
        mountPath: /v1/auth/kubernetes
        role: cert-manager   # Vault role that maps to the cert-manager SA
        secretRef:
          name: cert-manager-vault-token   # Secret holding a cert-manager SA token
          key: token
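On the Vault side, the corresponding auth mount and role might be configured along these lines (role name, bound service account, policy name, and TTL are assumptions):

```shell
# Enable Kubernetes auth and bind a role to the cert-manager service account
vault auth enable kubernetes
vault write auth/kubernetes/role/cert-manager \
    bound_service_account_names=cert-manager \
    bound_service_account_namespaces=cert-manager \
    policies=pki-sign \
    ttl=20m
```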

12. Not Backing Up the CA Secret for CA Issuers

You use cert-manager's CA issuer, signing with your own root CA. The CA's private key is in a Kubernetes Secret. The cluster is destroyed (DR, migration, accidental deletion). You recreate the cluster and reinstall cert-manager, but you cannot recreate the CA Secret — you never exported the key. All previously issued certificates are now unverifiable against a CA that no longer exists.

Fix: The CA private key Secret is critical state — treat it like an HSM key. Export it and store it in a secrets manager (Vault, AWS Secrets Manager) or encrypt it and store in a secure backup:

# Export the CA secret (note: kubeseal output can only be decrypted by this
# cluster's sealed-secrets controller; for DR, SOPS or a secrets manager is safer)
kubectl get secret my-ca-secret -n cert-manager -o yaml | \
  kubeseal --format yaml > sealed-ca-secret.yaml   # or encrypt with SOPS

# Or export raw and store in Vault
kubectl get secret my-ca-secret -n cert-manager \
  -o jsonpath='{.data.tls\.key}' | base64 -d | \
  vault kv put secret/cert-manager/ca-key value=-