Remediation: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager
Immediate Fix (Kubernetes Ops — Domain C)
The fix is in Kubernetes/cert-manager configuration, not in DNS or TLS directly.
Step 1: Force cert-manager to re-issue
# Delete the failed Certificate resource; re-applying it (next command) starts a fresh issuance
$ kubectl delete certificate api-prod-tls -n prod
certificate.cert-manager.io/api-prod-tls deleted
# Re-apply the Certificate manifest
$ kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-prod-tls
  namespace: prod
spec:
  secretName: api-prod-tls-secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - api.prod.example.com
EOF
certificate.cert-manager.io/api-prod-tls created
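If the same re-issue has to be repeated for other hosts, the heredoc can be wrapped in a small function. This is a minimal sketch that assumes bash. render_certificate is a hypothetical helper and is not part of cert-manager or kubectl. It only prints the manifest, so you can review the output before piping it to kubectl apply -f -.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a cert-manager or kubectl command): prints a
# Certificate manifest with the same shape as the one applied in Step 1.
# It does not contact the cluster; pipe the output to kubectl yourself.
render_certificate() {
  local name=$1 namespace=$2 secret=$3 issuer=$4 host=$5
  cat <<EOF
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  secretName: ${secret}
  issuerRef:
    name: ${issuer}
    kind: ClusterIssuer
  dnsNames:
    - ${host}
EOF
}

# Usage: review first, then apply.
# render_certificate api-prod-tls prod api-prod-tls-secret letsencrypt-prod api.prod.example.com
# render_certificate api-prod-tls prod api-prod-tls-secret letsencrypt-prod api.prod.example.com | kubectl apply -f -
```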
Step 2: Wait for issuance and verify
$ kubectl get certificate api-prod-tls -n prod -w
NAME           READY   SECRET                AGE
api-prod-tls   False   api-prod-tls-secret   5s
api-prod-tls   True    api-prod-tls-secret   38s
$ kubectl get secret api-prod-tls-secret -n prod
NAME                  TYPE                DATA   AGE
api-prod-tls-secret   kubernetes.io/tls   3      42s
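You have to stop the -w watch by hand. In a script, kubectl wait --for=condition=Ready certificate/api-prod-tls -n prod --timeout=300s blocks until the Certificate reports Ready or the timeout expires. If you need a custom check instead, a generic polling loop is enough. This sketch assumes bash. wait_for and cert_ready are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-runs a check command until it succeeds or the
# timeout (seconds) is reached. Returns 0 on success, 1 on timeout.
# "waited" counts sleep time only, so real elapsed time is somewhat longer.
# Use an interval of at least 1; with 0 the timeout never advances.
wait_for() {
  local timeout=$1 interval=$2
  shift 2
  local waited=0
  until "$@"; do
    if (( waited >= timeout )); then
      return 1
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
  return 0
}

# Hypothetical check: succeeds when the Certificate's Ready condition is "True".
cert_ready() {
  [ "$(kubectl get certificate "$1" -n "$2" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')" = "True" ]
}

# Usage (needs cluster access):
# wait_for 300 5 cert_ready api-prod-tls prod || echo "certificate not Ready after 5 minutes" >&2
```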
Step 3: Annotate the secret to survive Helm upgrades
$ kubectl annotate secret api-prod-tls-secret -n prod \
helm.sh/resource-policy=keep \
meta.helm.sh/release-name=api-prod \
meta.helm.sh/release-namespace=prod
Step 4: Update the Helm chart to include the annotation
In devops/helm/grokdevops/templates/certificate.yaml:
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: {{ .Release.Name }}-tls
  namespace: {{ .Release.Namespace }}
  annotations:
    helm.sh/resource-policy: keep
spec:
  secretName: {{ .Release.Name }}-tls-secret
  issuerRef:
    name: {{ .Values.tls.issuerName | default "letsencrypt-prod" }}
    kind: ClusterIssuer
  dnsNames:
    - {{ .Values.ingress.host }}
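A later chart change could silently drop the annotation. To guard against that, CI can render the template and check for it. This sketch assumes bash and a helm binary on the PATH. helm template with --show-only renders a single template file. has_keep_policy is a hypothetical helper that only inspects the rendered text; it does not validate the chart.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads rendered YAML on stdin and succeeds only if it
# contains a Certificate and an indented helm.sh/resource-policy: keep line.
# This is a text check, not a YAML parse.
has_keep_policy() {
  local rendered
  rendered=$(cat)
  grep -q '^kind: Certificate$' <<<"$rendered" &&
    grep -Eq '^[[:space:]]+helm\.sh/resource-policy:[[:space:]]*"?keep"?[[:space:]]*$' <<<"$rendered"
}

# Usage in CI (chart path from Step 4; the --set values are example overrides):
# helm template api-prod devops/helm/grokdevops \
#   --show-only templates/certificate.yaml \
#   --set tls.issuerName=letsencrypt-prod \
#   --set ingress.host=api.prod.example.com | has_keep_policy
```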
Verification
Domain A (Networking) — DNS and connectivity
$ curl -v https://api.prod.example.com 2>&1 | grep "SSL certificate verify ok"
* SSL certificate verify ok.
$ echo | openssl s_client -connect api.prod.example.com:443 -servername api.prod.example.com 2>/dev/null | openssl x509 -noout -dates
notBefore=Mar 19 03:05:22 2026 GMT
notAfter=Jun 17 03:05:21 2026 GMT
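For scripted checks, the notAfter line can be turned into a days-remaining figure. If you only need a pass/fail answer, openssl x509 -noout -checkend 604800 exits non-zero when the certificate expires within that many seconds. This sketch assumes bash and GNU date, because date -d parsing is a GNU coreutils extension and BSD/macOS date behaves differently. days_left is a hypothetical helper. Its optional second argument fixes "now" so that the result is reproducible.

```shell
#!/usr/bin/env bash
# Hypothetical helper: converts an openssl "notAfter=..." line into whole days
# remaining. Requires GNU date for -d parsing.
# $1: e.g. "notAfter=Jun 17 03:05:21 2026 GMT"
# $2: optional "now" as a Unix timestamp (defaults to the current time)
days_left() {
  local not_after=${1#notAfter=}
  local now=${2:-$(date -u +%s)}
  local expiry
  expiry=$(date -u -d "$not_after" +%s) || return 1
  echo $(( (expiry - now) / 86400 ))
}

# Usage against the live endpoint:
# line=$(echo | openssl s_client -connect api.prod.example.com:443 \
#   -servername api.prod.example.com 2>/dev/null | openssl x509 -noout -enddate)
# days_left "$line"
```

Bash integer division truncates toward zero. As a result, a certificate that expired less than a day ago also reports 0, so treat 0 as "expired or about to expire".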
Domain B (Security) — Certificate validity
$ kubectl get certificate api-prod-tls -n prod
NAME           READY   SECRET                AGE
api-prod-tls   True    api-prod-tls-secret   5m
$ kubectl describe certificate api-prod-tls -n prod | grep "Not After"
Not After: 2026-06-17T03:05:21Z
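Domain A reads the expiry from the certificate actually served on port 443. Domain B reads it from the Certificate resource. If the two differ, the ingress is probably still serving the old Secret. This sketch assumes bash and GNU date. same_expiry is a hypothetical helper that normalizes both formats to Unix timestamps before comparing them. status.notAfter is the Certificate status field that kubectl describe shows as "Not After".

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the expiry served on the wire matches the
# expiry recorded on the Certificate resource. Requires GNU date.
# $1: openssl line, e.g. "notAfter=Jun 17 03:05:21 2026 GMT"
# $2: RFC 3339 timestamp, e.g. "2026-06-17T03:05:21Z"
# Returns 0 on match, 1 on mismatch, 2 if either date fails to parse.
same_expiry() {
  local served issued
  served=$(date -u -d "${1#notAfter=}" +%s) || return 2
  issued=$(date -u -d "$2" +%s) || return 2
  [ "$served" -eq "$issued" ]
}

# Usage:
# served=$(echo | openssl s_client -connect api.prod.example.com:443 \
#   -servername api.prod.example.com 2>/dev/null | openssl x509 -noout -enddate)
# issued=$(kubectl get certificate api-prod-tls -n prod -o jsonpath='{.status.notAfter}')
# same_expiry "$served" "$issued" && echo "ingress serves the re-issued certificate"
```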
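Domain C (Kubernetes Ops) — Secret persistence
cert-manager also writes its own cert-manager.io/* annotations to the Secret, so the jq filter keeps only the Helm ones:
$ kubectl get secret api-prod-tls-secret -n prod -o jsonpath='{.metadata.annotations}' | jq 'with_entries(select(.key | startswith("helm.sh/") or startswith("meta.helm.sh/")))'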
{
"helm.sh/resource-policy": "keep",
"meta.helm.sh/release-name": "api-prod",
"meta.helm.sh/release-namespace": "prod"
}
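Where jq is not installed (for example, in a minimal CI image), you can check the raw jsonpath output directly. This sketch assumes bash. It also assumes that kubectl prints the annotations map as compact JSON, which is the same assumption the jq pipeline relies on. check_helm_annotations is a hypothetical helper. It does plain substring matching, not real JSON parsing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if every expected key/value pair appears
# in the compact JSON annotations map read from stdin. Substring matching is a
# smoke check, not a JSON parser.
check_helm_annotations() {
  local json expected
  json=$(cat)
  for expected in \
    '"helm.sh/resource-policy":"keep"' \
    '"meta.helm.sh/release-name":"api-prod"' \
    '"meta.helm.sh/release-namespace":"prod"'; do
    if [[ $json != *"$expected"* ]]; then
      echo "missing annotation: $expected" >&2
      return 1
    fi
  done
  return 0
}

# Usage:
# kubectl get secret api-prod-tls-secret -n prod \
#   -o jsonpath='{.metadata.annotations}' | check_helm_annotations
```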
Prevention
- Monitoring: Add a cert-manager certificate readiness alert. Fire WARNING when any Certificate object has Ready=False for more than 1 hour. Fire CRITICAL when any certificate is within 7 days of expiry.
# Prometheus alert rule
- alert: CertManagerCertNotReady
  expr: certmanager_certificate_ready_status{condition="False"} == 1
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "Certificate {{ $labels.name }} in {{ $labels.namespace }} is not ready"
- Runbook: Document the Helm/cert-manager secret lifecycle interaction. Every cert-manager-managed secret in a Helm-managed namespace must carry the helm.sh/resource-policy: keep annotation.
- Architecture: Consider deploying cert-manager secrets into a dedicated namespace that Helm does not manage, or use Helm's --no-hooks flag carefully. Alternatively, define the Certificate resource in the Helm chart itself so that Helm tracks it.
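The Monitoring item calls for a CRITICAL alert 7 days before expiry, but the readiness rule only covers the WARNING case. cert-manager exports certmanager_certificate_expiration_timestamp_seconds, a Unix timestamp per Certificate, so you can express the threshold by subtracting time(). This sketch assumes bash and writes the CRITICAL rule as its own rule group. The function, file, group, and alert names are hypothetical. How the file gets loaded (a Prometheus rule_files entry, a PrometheusRule object, or similar) depends on your monitoring stack.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes a Prometheus rule group with a CRITICAL alert for
# certificates expiring within 7 days. Names are placeholders.
# The quoted 'EOF' keeps the shell from expanding $labels inside the heredoc.
write_cert_expiry_rule() {
  local out=${1:-cert-manager-expiry-rules.yaml}
  cat >"$out" <<'EOF'
groups:
  - name: cert-manager-expiry
    rules:
      - alert: CertManagerCertExpiresSoon
        expr: certmanager_certificate_expiration_timestamp_seconds - time() < 7 * 24 * 3600
        labels:
          severity: critical
        annotations:
          summary: "Certificate {{ $labels.name }} in {{ $labels.namespace }} expires in under 7 days"
EOF
}

# Usage: write the file, then validate it with promtool if it is installed.
# write_cert_expiry_rule cert-manager-expiry-rules.yaml && promtool check rules cert-manager-expiry-rules.yaml
```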