TLS & PKI Drills

Remember: The TLS handshake in 4 simplified steps: ClientHello (supported ciphers + SNI hostname) -> ServerHello (chosen cipher + certificate) -> Key Exchange (agree on session key) -> Encrypted Data. Most TLS failures surface at step 2, when the client validates the server's certificate: wrong cert, expired cert, or CA not trusted. Mnemonic: "CSKE" (Client, Server, Key, Encrypted).

Debug clue: openssl s_client -connect host:443 -servername host is the Swiss Army knife for TLS debugging. It shows the full certificate chain, expiry dates, cipher negotiated, and any verification errors. Add -showcerts to see intermediate certificates — a missing intermediate is the #1 cause of "works in Chrome, fails in curl."

Gotcha: cert-manager's Certificate resource creates a Secret containing tls.crt and tls.key. If you delete the Secret manually, cert-manager recreates it — but if you delete the Certificate resource, the Secret is orphaned and stops being renewed. Always manage the Certificate resource, not the Secret directly.

Drill 1: Check Certificate Expiry

Difficulty: Easy

Q: Check when the TLS certificate stored in the Kubernetes Secret myapp-tls (namespace production) expires.

Answer
kubectl get secret myapp-tls -n production -o jsonpath='{.data.tls\.crt}' | \
  base64 -d | openssl x509 -noout -dates
Output shows `notBefore` and `notAfter` dates.
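For scripts it can help to turn the `notAfter` line into a number of days. The following is a minimal sketch, assuming GNU `date` (BSD/macOS `date` needs different flags); `days_left` is a hypothetical helper name, not part of openssl or kubectl.

```shell
# days_left: hypothetical helper (assumption, not an openssl/kubectl feature).
# $1: a "notAfter=..." line as printed by `openssl x509 -noout -enddate`
# $2: optional "now" in epoch seconds (defaults to the current time)
# Assumes GNU date, which can parse the OpenSSL date format directly.
days_left() {
  local not_after="${1#notAfter=}"
  local now="${2:-$(date -u +%s)}"
  local expiry
  expiry="$(date -u -d "$not_after" +%s)" || return 1
  # Integer division; the result is negative once the cert has expired.
  echo $(( (expiry - now) / 86400 ))
}
```

Possible usage: `days_left "$(kubectl get secret myapp-tls -n production -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -enddate)"`.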

Drill 2: Check Live Server Certificate

Difficulty: Easy

Q: Check the TLS certificate of a live server at api.example.com:443 from the command line.

Answer
# View cert details
openssl s_client -connect api.example.com:443 -servername api.example.com </dev/null 2>/dev/null | \
  openssl x509 -noout -text

# Just expiry dates
openssl s_client -connect api.example.com:443 -servername api.example.com </dev/null 2>/dev/null | \
  openssl x509 -noout -dates

# Check the full chain
openssl s_client -connect api.example.com:443 -servername api.example.com -showcerts </dev/null
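`-servername` sets SNI (Server Name Indication). Without it, a server hosting several sites may return the wrong cert. OpenSSL 1.1.1 and later send SNI from the `-connect` hostname by default, but passing it explicitly keeps the command portable to older versions.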
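To spot a missing intermediate quickly, the `-showcerts` output can be piped through a counter. This is a rough sketch; `chain_cert_count` and `chain_hint` are hypothetical helper names, and a count of 1 is only a hint, since a self-signed certificate or one issued directly by a root also produces a single certificate.

```shell
# chain_cert_count: hypothetical helper; counts PEM certificates on stdin,
# e.g. the output of `openssl s_client ... -showcerts </dev/null`.
chain_cert_count() {
  grep -c -e '-----BEGIN CERTIFICATE-----'
}

# chain_hint: prints a rough interpretation of the count read from stdin.
chain_hint() {
  local count
  count="$(chain_cert_count)"
  if [ "$count" -le 1 ]; then
    echo "only $count certificate sent: intermediate may be missing"
  else
    echo "$count certificates sent"
  fi
}
```

Possible usage: `openssl s_client -connect api.example.com:443 -servername api.example.com -showcerts </dev/null 2>/dev/null | chain_hint`.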

Drill 3: Verify Key Matches Certificate

Difficulty: Easy

Q: How do you verify that a private key file matches a certificate file?

Answer
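# RSA keys: compare modulus hashes (they must match)
openssl x509 -noout -modulus -in cert.pem | md5sum
openssl rsa -noout -modulus -in key.pem | md5sum

# Any key type (RSA, ECDSA, Ed25519): compare public key hashes instead
openssl x509 -noout -pubkey -in cert.pem | openssl sha256
openssl pkey -pubout -in key.pem | openssl sha256

# If the two hashes in a pair are identical, the key matches the cert
This is essential when debugging "certificate/key mismatch" errors in Ingress or load balancers. The modulus comparison only works for RSA; ECDSA keys (such as the one requested in Drill 4) have no modulus, so use the public key comparison for those.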

Drill 4: Create a cert-manager Certificate

Difficulty: Medium

Q: Write a cert-manager Certificate resource for app.example.com and www.example.com using a Let's Encrypt ClusterIssuer, with auto-renewal 15 days before expiry.

Answer
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: app-tls
  namespace: production
spec:
  secretName: app-tls-secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
  - app.example.com
  - www.example.com
  renewBefore: 360h    # 15 days
  privateKey:
    algorithm: ECDSA
    size: 256
cert-manager creates a CertificateRequest → Order → Challenge, then stores the signed cert in `app-tls-secret`.
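The `renewBefore: 360h` value is plain arithmetic on days. A small sketch of that conversion follows; the helper names are hypothetical, and the 90-day lifetime is an assumption based on the usual Let's Encrypt certificate lifetime.

```shell
# days_to_hours: hypothetical helper; formats a number of days as an
# hours-based duration such as "360h" (the style used for renewBefore above).
days_to_hours() {
  echo "$(( $1 * 24 ))h"
}

# renewal_day: day of the certificate's lifetime on which renewal would start.
# $1: total lifetime in days, $2: renewBefore in days
renewal_day() {
  echo $(( $1 - $2 ))
}

days_to_hours 15    # prints 360h
renewal_day 90 15   # prints 75 (assuming a 90-day certificate)
```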

Drill 5: Debug cert-manager Renewal Failure

Difficulty: Hard

Q: A Certificate shows READY: False and hasn't renewed. Walk through the debugging chain.

Answer
# 1. Certificate status
kubectl describe certificate app-tls -n production
# Look for: conditions, lastTransitionTime, message

# 2. CertificateRequest
kubectl get certificaterequest -n production
kubectl describe certificaterequest <name> -n production
# Look for: conditions, approval status

# 3. Order (ACME)
kubectl get orders -n production
kubectl describe order <name> -n production

# 4. Challenge (where it usually fails)
kubectl get challenges -n production
kubectl describe challenge <name> -n production
# Look for: state, reason, presented

# 5. cert-manager controller logs
kubectl logs -n cert-manager deploy/cert-manager --tail=200 | grep -i error

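# 6. Force renewal (needs the cert-manager kubectl plugin; with the standalone CLI use: cmctl renew app-tls -n production)
kubectl cert-manager renew app-tls -n production
Common failures:
- HTTP-01: challenge solver can't be reached (firewall, wrong ingress class)
- DNS-01: credentials for DNS provider expired
- Rate limit: Let's Encrypt allows 5 duplicate certificates (identical set of hostnames) per week and 50 certificates per registered domain per week
- Issuer: ACME account key secret deleted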
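Before walking the chain for one Certificate, it can be useful to list every Certificate that is not Ready. A minimal sketch, assuming `kubectl` is on the PATH and the cert-manager CRDs are installed; `not_ready_certs` is a hypothetical helper name.

```shell
# not_ready_certs: hypothetical helper; lists Certificates whose Ready
# condition is anything other than "True", across all namespaces.
# Output columns (tab-separated): namespace, name, Ready status.
not_ready_certs() {
  kubectl get certificates --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' |
    awk -F'\t' 'NF && $3 != "True"'
}
```

Each line of output is a starting point for step 1 of the chain above (`kubectl describe certificate <name> -n <namespace>`).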

Drill 6: HTTP-01 vs DNS-01 Challenge

Difficulty: Easy

Q: When would you use DNS-01 instead of HTTP-01 for ACME challenges?

Answer

Use DNS-01 when:
- You need **wildcard certificates** (`*.example.com`); only DNS-01 supports this
- The cluster is **not publicly accessible** (private/internal clusters)
- Port 80 is **blocked** by firewall or security policy
- You're behind a **CDN** that caches the challenge path

Use HTTP-01 when:
- Simple setup, cluster is publicly accessible on port 80
- You don't have DNS API access
- Quick setup without DNS provider integration
# HTTP-01 solver
solvers:
- http01:
    ingress:
      class: nginx

# DNS-01 solver (example: Cloudflare)
solvers:
- dns01:
    cloudflare:
      email: ops@example.com
      apiTokenSecretRef:
        name: cloudflare-api-token
        key: api-token

Drill 7: Internal CA with cert-manager

Difficulty: Medium

Q: Set up cert-manager to issue certificates from an internal CA for service-to-service TLS within the cluster.

Answer
# 1. Create a self-signed issuer to bootstrap
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-bootstrap
spec:
  selfSigned: {}
---
# 2. Create the CA certificate
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: internal-ca
  namespace: cert-manager
spec:
  isCA: true
  secretName: internal-ca-secret
  commonName: Internal CA
  duration: 87600h    # 10 years
  issuerRef:
    name: selfsigned-bootstrap
    kind: ClusterIssuer
---
# 3. Create a CA issuer using the CA cert
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: internal-ca-issuer
spec:
  ca:
    secretName: internal-ca-secret
Now any Certificate referencing `internal-ca-issuer` gets signed by the internal CA. Distribute the CA cert to clients that need to trust it.

Drill 8: Ingress TLS Termination

Difficulty: Easy

Q: Configure an Ingress to terminate TLS using cert-manager auto-provisioning.

Answer
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  namespace: production
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - app.example.com
    secretName: app-tls-secret    # cert-manager creates this
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp
            port:
              number: 80
The annotation `cert-manager.io/cluster-issuer` triggers automatic certificate provisioning. cert-manager watches for Ingress resources with this annotation.

Drill 9: Cert Expiry Alerting

Difficulty: Medium

Q: Write a Prometheus alert rule that fires when any cert-manager certificate will expire within 14 days.

Answer
groups:
- name: tls-alerts
  rules:
  - alert: CertificateExpiringSoon
    expr: |
      certmanager_certificate_expiration_timestamp_seconds - time() < 14 * 24 * 3600
    for: 1h
    labels:
      severity: warning
    annotations:
      summary: "Certificate {{ $labels.name }} in {{ $labels.namespace }} expires in < 14 days"
      description: "Expires in {{ $value | humanizeDuration }}"

  - alert: CertificateExpiryCritical
    expr: |
      certmanager_certificate_expiration_timestamp_seconds - time() < 3 * 24 * 3600
    for: 10m
    labels:
      severity: critical
    annotations:
      summary: "Certificate {{ $labels.name }} expires in < 3 days!"

  - alert: CertificateNotReady
    expr: |
      certmanager_certificate_ready_status{condition="False"} == 1
    for: 30m
    labels:
      severity: warning
    annotations:
      summary: "Certificate {{ $labels.name }} is not ready"
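The two expiry thresholds can be checked by hand with the same arithmetic. A small sketch follows; `expiry_severity` is a hypothetical helper, and it ignores the `for:` hold times that Prometheus applies before firing.

```shell
# expiry_severity: hypothetical helper mirroring the two expiry thresholds above.
# $1: certificate expiry in epoch seconds
# $2: current time in epoch seconds
expiry_severity() {
  local remaining=$(( $1 - $2 ))
  if [ "$remaining" -lt $(( 3 * 24 * 3600 )) ]; then
    echo "critical"    # less than 3 days (259200 s) left
  elif [ "$remaining" -lt $(( 14 * 24 * 3600 )) ]; then
    echo "warning"     # less than 14 days (1209600 s) left
  else
    echo "ok"
  fi
}
```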

Drill 10: Debug "Certificate Not Valid For" Error

Difficulty: Medium

Q: curl returns SSL: certificate subject name 'old.example.com' does not match target host name 'app.example.com'. How do you fix this?

Answer
# 1. Check the current cert's SANs (Subject Alternative Names)
kubectl get secret app-tls-secret -n production -o jsonpath='{.data.tls\.crt}' | \
  base64 -d | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"

# 2. Check the Certificate resource
kubectl get certificate app-tls -n production -o yaml | grep -A5 dnsNames

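# 3. Fix: update the Certificate to include the correct hostname
kubectl edit certificate app-tls -n production
# Add app.example.com to dnsNames; cert-manager normally re-issues on its own when the spec changes

# 4. If no re-issuance happens, delete the old secret to force it
#    (the Ingress controller cannot serve this cert until the new one is issued)
kubectl delete secret app-tls-secret -n production

# 5. Or force renewal without deleting the secret (kubectl plugin; standalone CLI: cmctl renew app-tls -n production)
kubectl cert-manager renew app-tls -n production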

# 6. Verify new cert
kubectl get secret app-tls-secret -n production -o jsonpath='{.data.tls\.crt}' | \
  base64 -d | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"
The cert was issued for `old.example.com` but the hostname changed. Update dnsNames and re-issue.
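The SAN check in steps 1 and 6 can be automated. The sketch below is a simplified matcher, not a full hostname verification: `san_covers` is a hypothetical helper, comparison is case-sensitive, and only `DNS:` entries are considered.

```shell
# san_covers: hypothetical helper; succeeds if a hostname is covered by a SAN list.
# $1: SAN text as printed by openssl, e.g. "DNS:old.example.com, DNS:*.example.com"
# $2: hostname to check
# A wildcard matches exactly one leftmost label, so *.example.com covers
# app.example.com but not a.b.example.com or example.com.
# The body runs in a subshell so that `set -f` does not leak to the caller.
san_covers() (
  set -f   # disable globbing so "*.example.com" is not expanded against files
  host="$2"
  for entry in $(printf '%s' "$1" | tr ',' ' '); do
    case "$entry" in
      DNS:*) name="${entry#DNS:}" ;;
      *) continue ;;   # skip headers and non-DNS entries such as IP: or email:
    esac
    [ "$name" = "$host" ] && return 0
    case "$name" in
      '*.'*)
        # Compare the host minus its first label with the wildcard's base domain.
        [ "${host#*.}" != "$host" ] && [ "${host#*.}" = "${name#\*.}" ] && return 0
        ;;
    esac
  done
  return 1
)
```

Possible usage: pass the output of the step 1 pipeline as the first argument, for example `san_covers "$(kubectl get secret app-tls-secret -n production -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -text | grep -A1 'Subject Alternative Name')" app.example.com && echo covered`.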
