Answer Key: The Certificate That Works Sometimes

The System

An API platform serving external partners and internal services over TLS:

[Browsers] ----\
[Python apps] ----> [nginx-ingress] --TLS--> [api.megacorp.io]
[Java apps] ---/         |                         |
[Go services]-/     [TLS Secret: api-tls]    [api-service pods]
                         |
                    Certificate chain:
                    - Leaf: CN=api.megacorp.io (PRESENT)
                    - Intermediate: R11 (MISSING)
                    - Root: ISRG Root X1 (in trust stores)

[cert-manager] --issues--> [Certificate: api-tls-cert] --stores--> [Secret: api-tls]
     |
[Let's Encrypt ACME]

What's Broken

Root cause: The TLS secret (api-tls) contains only the leaf certificate without the intermediate certificate (Let's Encrypt R11). The nginx ingress controller serves whatever certificate chain is in the secret. Without the intermediate, the chain is incomplete.

Why some clients work and others don't:

  • Browsers (Chrome, Firefox): Chrome implements AIA (Authority Information Access) fetching: it follows the caIssuers URL in the leaf certificate to download the missing intermediate. Firefox does not fetch via AIA, but it reuses intermediates cached from other sites and preloads known intermediates, so it usually completes the chain anyway.
  • curl on macOS: Typically verifies through the system trust facilities, which may have the intermediate cached from previous connections.
  • Python requests, Go net/http, Java HttpClient: Do NOT fetch missing intermediates via AIA by default (Java does so only when the com.sun.security.enableAIAcaIssuers system property is set). They require the server to present the full chain. When the intermediate is missing, they cannot build a path from the leaf to a trusted root, and verification fails.
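To see what an AIA-capable client would follow, the caIssuers URL can be pulled out of the leaf certificate's text dump. The helper below is a minimal sketch; the function name aia_ca_issuers_urls is hypothetical, and it assumes the standard "CA Issuers - URI:" line that openssl x509 -noout -text prints.

```shell
# Print every AIA "CA Issuers" URL found in `openssl x509 -noout -text`
# output supplied on stdin. (Hypothetical helper name, not part of any tool.)
aia_ca_issuers_urls() {
  sed -n 's/^[[:space:]]*CA Issuers - URI:\(.*\)$/\1/p'
}

# Possible usage against the endpoint from this scenario:
#   openssl s_client -connect api.megacorp.io:443 -servername api.megacorp.io </dev/null 2>/dev/null \
#     | openssl x509 -noout -text | aia_ca_issuers_urls
# An AIA-capable client downloads the certificate at that URL and uses it as
# the missing intermediate; a strict client never makes that request.
```

If the function prints a URL but strict clients still fail, that is consistent with the diagnosis here: the information needed to repair the chain exists, but only some clients act on it.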

The 14% failure rate corresponds to the share of traffic that comes from programmatic API clients (partner integrations, internal microservices) rather than browsers.

Key clue: The openssl s_client output shows depth=0 only — no depth=1 intermediate in the chain. The error verify error:num=21:unable to verify the first certificate confirms the chain is incomplete.
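The same clue can be checked mechanically from saved s_client output. The following is a rough sketch under the assumption that the verification lines (depth=N and verify error:num=...) were captured along with stdout, for example with 2>&1; the function name summarize_s_client is hypothetical.

```shell
# Summarise the verification lines that `openssl s_client` prints.
# Reads captured s_client output on stdin, then reports how many distinct
# depths were seen and echoes any "verify error" lines.
summarize_s_client() {
  awk '
    /^depth=[0-9]+/ {
      split($1, kv, "=")        # $1 looks like "depth=0"
      seen[kv[2]] = 1
    }
    /^verify error:num=/ { errors[$0] = 1 }
    END {
      n = 0
      for (d in seen) n++
      printf "distinct depths: %d\n", n
      for (e in errors) print e
      if (n < 2) print "chain looks incomplete: no issuer above depth 0"
    }
  '
}

# Possible usage:
#   openssl s_client -connect api.megacorp.io:443 -servername api.megacorp.io </dev/null 2>&1 \
#     | summarize_s_client
```

A single distinct depth together with error 21 matches the symptom described above; a complete chain normally shows the leaf, the intermediate, and the root as separate depths.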

The Fix

Immediate (rebuild the secret with full chain)

# Check what is in the current secret
kubectl get secret api-tls -n api -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -subject -issuer

# Count certificates in the chain (should be 2, is probably 1)
kubectl get secret api-tls -n api -o jsonpath='{.data.tls\.crt}' | base64 -d | grep -c "BEGIN CERTIFICATE"

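# Option A: delete the secret; cert-manager detects that it is missing and re-issues.
# Until the new secret exists, the ingress falls back to its default certificate.
kubectl delete secret api-tls -n api

# Option B: trigger re-issuance without deleting the secret
cmctl renew api-tls-cert -n api

# Do not set cert-manager.io/issue-temporary-certificate="true" here: it requests a
# temporary self-signed certificate, which strict clients also reject.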
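Note that openssl x509 reads only the first certificate from its input, so the subject/issuer check above describes the leaf alone. To inspect every certificate in the secret, the bundle can be split first. This is a minimal sketch; the function name split_pem_bundle and the /tmp/api-tls directory are illustrative assumptions.

```shell
# Split a PEM bundle read from stdin into cert-0.pem, cert-1.pem, ...
# inside the directory given as $1. (Hypothetical helper name.)
split_pem_bundle() {
  local outdir="$1"
  mkdir -p "$outdir"
  awk -v dir="$outdir" '
    /-----BEGIN CERTIFICATE-----/ { file = sprintf("%s/cert-%d.pem", dir, n++) }
    file != ""                    { print > file }
    /-----END CERTIFICATE-----/   { close(file); file = "" }
  '
}

# Possible usage:
#   kubectl get secret api-tls -n api -o jsonpath='{.data.tls\.crt}' | base64 -d \
#     | split_pem_bundle /tmp/api-tls
#   for f in /tmp/api-tls/cert-*.pem; do
#     openssl x509 -in "$f" -noout -subject -issuer
#   done
```

After a successful re-issue, cert-0.pem should be the leaf and cert-1.pem the intermediate whose subject matches the leaf's issuer.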

Permanent

  1. Verify cert-manager is configured to include the full chain. Modern cert-manager versions include the full chain by default. If using an older version:

    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: api-tls-cert
      namespace: api
    spec:
      secretName: api-tls
      dnsNames:
        - api.megacorp.io
      issuerRef:
        name: letsencrypt-prod
        kind: ClusterIssuer
      # Ensure full chain is stored (default in cert-manager >= 1.7)
      # For older versions, check the ACME issuer config
    

  2. Add a chain validation check to the deployment pipeline:

    # Verify the chain has at least 2 certificates
    CERT_COUNT=$(kubectl get secret api-tls -n api -o jsonpath='{.data.tls\.crt}' | base64 -d | grep -c "BEGIN CERTIFICATE")
    if [ "$CERT_COUNT" -lt 2 ]; then
      echo "ERROR: TLS chain incomplete — only $CERT_COUNT certificate(s) in chain"
      exit 1
    fi
    

  3. Add external SSL monitoring:

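    # Use SSL Labs or similar to continuously verify chain completeness.
    # The assessment runs asynchronously: repeat the request until .status is "READY";
    # before that, .endpoints[0].grade is null.
    curl -s "https://api.ssllabs.com/api/v3/analyze?host=api.megacorp.io" | jq '.endpoints[0].grade'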
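The pipeline check in step 2 inspects the secret, not what the ingress actually serves. As a complement to external monitoring, a probe can count the certificates presented on the wire. The sketch below assumes openssl is available on the probing host; the function names count_served_certs and probe_chain are hypothetical.

```shell
# Count the certificates in `openssl s_client -showcerts` output read from stdin.
# `|| true` keeps the exit status zero when grep finds no matches (it still prints 0).
count_served_certs() {
  grep -c "BEGIN CERTIFICATE" || true
}

# Probe host $1 (port $2, default 443) and fail unless at least two
# certificates (leaf plus intermediate) are served.
probe_chain() {
  local host="$1" port="${2:-443}" count
  count=$(openssl s_client -connect "${host}:${port}" -servername "$host" \
            -showcerts </dev/null 2>/dev/null | count_served_certs)
  if [ "$count" -lt 2 ]; then
    echo "CRITICAL: ${host} served ${count} certificate(s); intermediate missing" >&2
    return 1
  fi
  echo "OK: ${host} served ${count} certificates"
}

# Possible usage from a cron job or monitoring check:
#   probe_chain api.megacorp.io
```

A failed connection also yields a count of 0, so the probe reports CRITICAL rather than passing silently.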
    

Verification

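# Verify chain completeness
openssl s_client -connect api.megacorp.io:443 -servername api.megacorp.io </dev/null 2>/dev/null | grep -A8 "Certificate chain"
# Should list entry 0 (leaf) AND entry 1 (intermediate)

# Test with a strict client
python3 -c "import requests; r = requests.get('https://api.megacorp.io/health', timeout=10); print(r.status_code)"

# Check OpenSSL's own verification result (expect "Verify return code: 0 (ok)")
openssl s_client -connect api.megacorp.io:443 -servername api.megacorp.io </dev/null 2>/dev/null | grep "Verify return code"

# Show the AIA caIssuers URL in the leaf (what AIA-capable clients would fetch)
openssl s_client -connect api.megacorp.io:443 -servername api.megacorp.io </dev/null 2>/dev/null | openssl x509 -noout -text | grep "CA Issuers"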

Artifact Decoder

| Artifact | What It Revealed | What Was Misleading |
|----------|------------------|---------------------|
| CLI Output | openssl s_client shows depth=0 only (missing intermediate); cert-manager says Ready | Certificate shows Ready/True, so everything looks green in Kubernetes |
| Metrics | 0% browser failures, 14% API client failures = partial chain validation | Zero 502s from nginx: the ingress is serving responses fine; the TLS error happens before HTTP |
| IaC Snippet | Standard cert-manager + ingress setup; config looks correct | The YAML is textbook-perfect; the issue is in the generated secret contents, not the config |
| Log Lines | Java SSLHandshakeException confirms certificate chain validation failure | cert-manager "re-queuing item due to optimistic locking" sounds alarming but is normal controller behavior |

Skills Demonstrated

  • Understanding TLS certificate chains (leaf, intermediate, root)
  • Knowing how different TLS clients handle incomplete chains (AIA vs strict)
  • Using openssl s_client to diagnose chain issues
  • Understanding cert-manager certificate lifecycle in Kubernetes
  • Recognizing that "works in browser" does not mean "works for all clients"

Prerequisite Topic Packs