Remediation: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager

Immediate Fix (Kubernetes Ops — Domain C)

The fix is in Kubernetes/cert-manager configuration, not in DNS or TLS directly.

Step 1: Force cert-manager to re-issue

# Delete the failed Certificate resource to trigger a fresh issuance
$ kubectl delete certificate api-prod-tls -n prod
certificate.cert-manager.io/api-prod-tls deleted

# Re-apply the Certificate manifest
$ kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-prod-tls
  namespace: prod
spec:
  secretName: api-prod-tls-secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - api.prod.example.com
EOF
certificate.cert-manager.io/api-prod-tls created
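
If the recreated Certificate stays in Ready=False, cert-manager's intermediate resources usually show where issuance is stuck. A diagnostic sketch (the CertificateRequest/Order/Challenge names are generated per issuance, so yours will differ):

```shell
# Follow the issuance chain: Certificate -> CertificateRequest -> Order -> Challenge
kubectl get certificaterequest,order,challenge -n prod

# The Certificate's conditions and Events carry the failure message, if any
kubectl describe certificate api-prod-tls -n prod
```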

Step 2: Wait for issuance and verify

$ kubectl get certificate api-prod-tls -n prod -w
NAME           READY   SECRET                AGE
api-prod-tls   False   api-prod-tls-secret   5s
api-prod-tls   True    api-prod-tls-secret   38s

$ kubectl get secret api-prod-tls-secret -n prod
NAME                  TYPE                DATA   AGE
api-prod-tls-secret   kubernetes.io/tls   3      42s

Step 3: Annotate the secret to survive Helm upgrades

$ kubectl annotate secret api-prod-tls-secret -n prod \
    helm.sh/resource-policy=keep \
    meta.helm.sh/release-name=api-prod \
    meta.helm.sh/release-namespace=prod
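
Note that Helm only adopts a pre-existing resource into a release when it also carries the app.kubernetes.io/managed-by=Helm label alongside the two meta.helm.sh annotations (Helm >= 3.2 adoption rules). If the goal is full adoption rather than just the keep policy, add the label as well:

```shell
# Without this label, Helm ignores the meta.helm.sh annotations above
kubectl label secret api-prod-tls-secret -n prod \
    app.kubernetes.io/managed-by=Helm
```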

Step 4: Update the Helm chart to include the annotation

In devops/helm/grokdevops/templates/certificate.yaml:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: {{ .Release.Name }}-tls
  namespace: {{ .Release.Namespace }}
  annotations:
    helm.sh/resource-policy: keep
spec:
  secretName: {{ .Release.Name }}-tls-secret
  issuerRef:
    name: {{ .Values.tls.issuerName | default "letsencrypt-prod" }}
    kind: ClusterIssuer
  dnsNames:
    - {{ .Values.ingress.host }}
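
The template references two values; a matching values.yaml fragment would look like this (key names are taken from the template above, the values themselves are illustrative):

```yaml
# values.yaml (illustrative defaults)
tls:
  issuerName: letsencrypt-prod
ingress:
  host: api.prod.example.com
```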

Verification

Domain A (Networking) — DNS and connectivity

$ curl -v https://api.prod.example.com 2>&1 | grep "SSL certificate verify ok"
* SSL certificate verify ok.

$ echo | openssl s_client -connect api.prod.example.com:443 -servername api.prod.example.com 2>/dev/null | openssl x509 -noout -dates
notBefore=Mar 19 03:05:22 2026 GMT
notAfter=Jun 17 03:05:21 2026 GMT
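
To turn those dates into a days-remaining number for ad-hoc checks, a small sketch (assumes GNU date for the -d flag; the notAfter value is the one reported by openssl above):

```shell
# Paste the notAfter value reported by openssl
not_after="Jun 17 03:05:21 2026 GMT"

# Difference between expiry and now, in whole days
days_left=$(( ($(date -d "$not_after" +%s) - $(date +%s)) / 86400 ))
echo "$days_left days until certificate expiry"
```

For a simple pass/fail threshold, openssl x509 -noout -checkend 604800 does the same 7-day check directly (exit status 1 if the cert expires within the given number of seconds).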

Domain B (Security) — Certificate validity

$ kubectl get certificate api-prod-tls -n prod
NAME           READY   SECRET                AGE
api-prod-tls   True    api-prod-tls-secret   5m

$ kubectl describe certificate api-prod-tls -n prod | grep "Not After"
  Not After:  2026-06-17T03:05:21Z

Domain C (Kubernetes Ops) — Secret persistence

$ kubectl get secret api-prod-tls-secret -n prod -o jsonpath='{.metadata.annotations}' | jq .
{
  "helm.sh/resource-policy": "keep",
  "meta.helm.sh/release-name": "api-prod",
  "meta.helm.sh/release-namespace": "prod"
}

Prevention

  • Monitoring: Add a cert-manager certificate readiness alert. Fire WARNING when any Certificate object has Ready=False for more than 1 hour. Fire CRITICAL at 7 days before expiry.
# Prometheus alert rule
- alert: CertManagerCertNotReady
  expr: certmanager_certificate_ready_status{condition="False"} == 1
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "Certificate {{ $labels.name }} in {{ $labels.namespace }} is not ready"
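
The rule above covers the WARNING case; the 7-day CRITICAL mentioned alongside it could be expressed with cert-manager's expiry-timestamp metric (a sketch — assumes cert-manager's metrics endpoint is scraped and the metric name matches your cert-manager version):

```yaml
- alert: CertManagerCertExpiringSoon
  expr: (certmanager_certificate_expiration_timestamp_seconds - time()) < 7 * 24 * 3600
  for: 15m
  labels:
    severity: critical
  annotations:
    summary: "Certificate {{ $labels.name }} in {{ $labels.namespace }} expires in under 7 days"
```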
  • Runbook: Document the Helm/cert-manager secret lifecycle interaction: any Certificate (and its issued secret) tracked by a Helm release must carry helm.sh/resource-policy: keep so that upgrades and uninstalls do not delete it.

  • Architecture: Consider issuing certificates into a dedicated namespace that Helm does not manage, or define the Certificate resource in the Helm chart itself (as in Step 4) so Helm tracks its lifecycle and the helm.sh/resource-policy: keep annotation protects it across upgrades and uninstalls.