Symptoms

Application pods intermittently fail to resolve DNS names for both internal services and external domains.
Error logs from the application show: dial tcp: lookup api.stripe.com: i/o timeout.
The failures are sporadic: roughly 10-15% of DNS lookups fail.
The issue worsens during peak traffic hours (9am-11am).
CoreDNS pods show elevated CPU usage and request queuing.
The cluster has 200+ pods across 8 nodes and only 2 CoreDNS replicas.
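Pod resolv.conf shows ndots:5, which is the Kubernetes default. With this setting, any name containing fewer than five dots (api.stripe.com has two) is first tried against every search domain before it is queried as-is, so a single external lookup produces several CoreDNS queries.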
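
To see why ndots:5 matters for the load on CoreDNS, the search-list expansion can be sketched in a few lines of Python. This is an illustration, not the resolver's actual code: the function name candidate_queries is hypothetical, the ordering follows glibc-style stub resolver behavior, and the search list is an assumption (what a pod in the default namespace would typically see with the cluster domain cluster.local; the real list is in the pod's /etc/resolv.conf).

```python
def candidate_queries(name, search_domains, ndots=5):
    """Return the names a glibc-style stub resolver would try, in order.

    Hypothetical helper for illustration only.
    """
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: the search list is skipped.
        return [name]

    expanded = [f"{name}.{domain}." for domain in search_domains]
    absolute = f"{name}."

    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then fall back to the search list.
        return [absolute] + expanded
    # Too few dots: walk the search list first, then try the name as-is.
    return expanded + [absolute]


# Assumed search list for a pod in the "default" namespace.
SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

for query in candidate_queries("api.stripe.com", SEARCH):
    print(query)
```

Under these assumptions, api.stripe.com is tried against all three search domains before the real name is queried, so one lookup becomes four queries, and more if the client asks for A and AAAA records separately. Writing the name with a trailing dot (api.stripe.com.) skips the search list in this model.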