Pattern: ndots:5 Query Amplification

ID: FP-036 | Family: Configuration Landmine | Frequency: Common | Blast Radius: Multi-Service | Detection Difficulty: Subtle

The Shape

Kubernetes defaults pod DNS configuration to ndots:5, meaning any name with fewer than 5 dots goes through search-domain expansion before the literal name is tried. A lookup for api.example.com (2 dots) first tries api.example.com.default.svc.cluster.local, then api.example.com.svc.cluster.local, then api.example.com.cluster.local, and only then api.example.com: 4 DNS queries for what should be 1 (more if the resolver issues separate A and AAAA queries per name). CoreDNS receives roughly 4× the expected query volume. Under load, this amplification overwhelms CoreDNS and causes DNS resolution failures for all pods.
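The expansion order can be sketched with a short simulation. This is a simplified model of the resolver's search logic, not real resolver code; an actual resolver stops at the first successful answer and also consults /etc/hosts:

```python
# Simplified model of glibc-style search-list expansion under ndots.
# (Illustrative only; real resolvers stop at the first name that resolves.)
def candidate_names(name, search_domains, ndots=5):
    """Return the DNS names tried, in order, for a given lookup."""
    if name.endswith("."):           # trailing dot: already fully qualified
        return [name]
    if name.count(".") >= ndots:     # enough dots: literal name tried first
        return [name] + [f"{name}.{d}" for d in search_domains]
    # fewer dots than ndots: search-list variants first, literal name last
    return [f"{name}.{d}" for d in search_domains] + [name]

# Default search list for a pod in the "default" namespace
search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

print(candidate_names("api.example.com", search))   # 4 names for 1 lookup
print(candidate_names("api.example.com.", search))  # trailing dot: 1 name
```

Note that the trailing-dot form short-circuits expansion entirely, which is why it appears below as the immediate fix.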

How You'll See It

In Kubernetes

# In a pod, trace network syscalls during an external DNS lookup:
$ strace -e trace=network getent hosts api.example.com 2>&1 | grep sendto
# Shows 4 sendto() calls (one per candidate name) instead of 1

CoreDNS metrics show coredns_dns_request_duration_seconds spiking. Pods report intermittent dial tcp: lookup api.example.com: no such host errors, not consistently, because some queries succeed before CoreDNS is overloaded.

In Linux/Infrastructure

The ndots option itself comes from the resolver library (resolv.conf), not Kubernetes; Kubernetes just defaults it to 5 instead of the usual 1. The same pattern exists on any Linux host whose /etc/resolv.conf lists multiple search domains: every name below the ndots threshold is tried against each search domain before the literal name, multiplying DNS queries by the number of search domains.
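A minimal illustration with hypothetical domains: on a host with a resolv.conf like the one below, every lookup below the ndots threshold generates up to three extra queries, one per search entry, before the literal name is tried:

```
# /etc/resolv.conf (hypothetical values)
search corp.example.com dc1.example.com example.com
nameserver 10.0.0.2
options ndots:1
```

With ndots:1 here, only dotless names (e.g. a bare hostname) are expanded; Kubernetes' ndots:5 extends that expansion to almost every realistic name.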

In CI/CD

CI jobs running in Kubernetes pods make many external API calls (package registries, notification services). Each call generates 4–8 DNS queries instead of 1. CoreDNS is the bottleneck during parallel CI builds.

The Tell

CoreDNS's coredns_dns_requests_total (coredns_dns_request_count_total in older releases) is 4–6× higher than the number of application-level DNS lookups. DNS failures are intermittent and appear under load, not at low traffic. strace on a pod shows multiple consecutive DNS queries for variants of the same hostname before the literal name is tried.
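One way to quantify the tell, assuming CoreDNS's prometheus plugin is enabled: the search-list variants of external names (api.example.com.cluster.local and friends) all resolve to NXDOMAIN, so a high NXDOMAIN share of total responses is consistent with ndots amplification:

```
# Fraction of CoreDNS responses that are NXDOMAIN (sketch; exact metric
# names depend on your CoreDNS version)
sum(rate(coredns_dns_responses_total{rcode="NXDOMAIN"}[5m]))
  / sum(rate(coredns_dns_responses_total[5m]))
```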

Common Misdiagnosis

Looks Like                        | But Actually            | How to Tell the Difference
CoreDNS overloaded (underpowered) | ndots amplification     | Adding CoreDNS replicas reduces symptoms but does not remove the 4x amplification
Network instability               | DNS query amplification | DNS failures are intermittent and load-correlated, not random
External DNS outage               | Local CoreDNS overload  | The upstream public resolver is healthy while CoreDNS metrics show overload

The Fix (Generic)

  1. Immediate: Use an FQDN with a trailing dot in application configs (api.example.com.); the trailing dot tells the resolver the name is already fully qualified, so no search-domain expansion happens.
  2. Short-term: In the pod spec, set dnsConfig.options: [{name: ndots, value: "1"}] for pods that primarily make external DNS lookups.
  3. Long-term: Rely on CoreDNS caching (built in), and tune ndots per deployment based on whether the service primarily calls internal names (needs ndots:5 for short service names) or external names (needs ndots:1).
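The short-term fix from the list above looks like this in a Deployment manifest (illustrative fragment; the dnsConfig field is standard Kubernetes, the deployment name is made up):

```yaml
# deployment fragment: cap search expansion for an external-heavy workload
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service   # hypothetical name
spec:
  template:
    spec:
      dnsConfig:
        options:
          - name: ndots
            value: "1"    # value must be a quoted string
```

With ndots:1, names containing at least one dot (virtually all external hostnames) are tried literally first; dotless in-cluster service names still go through the search list.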

Real-World Examples

  • Example 1: Microservice making 1,000 external API calls/min. With ndots:5: 4,000–5,000 CoreDNS queries/min. CoreDNS at 2 replicas was overwhelmed. Reducing ndots to 1 for that deployment cut DNS queries by 75%.
  • Example 2: Black Friday: 10× normal traffic. External payment API lookups amplified by ndots:5. CoreDNS overloaded; payment DNS resolution failures. Intermittent payment failures for 12 minutes until CoreDNS was scaled up.

War Story

Payment service was failing intermittently, maybe 2% of requests getting DNS failures. We scaled CoreDNS from 2 to 5 replicas: improved but not fixed. Then someone ran strace on the payment pod and showed us: every lookup of api.stripe.com (2 dots) was triggering 4 DNS queries (the 3 search-list variants plus the literal name). At our request rate, the payment service alone was generating 12,000 DNS queries/min from "3,000 actual lookups." We added ndots: "1" to the payment deployment's dnsConfig. DNS query rate dropped 75%. CoreDNS load dropped immediately. We reverted to 2 CoreDNS replicas, which were now sufficient.

Cross-References