Progressive Hints
Hint 1 (after 5 min)
Compare the two nslookup results: payment-api.payments.svc.cluster.local resolves successfully, but payment-api.payments returns NXDOMAIN. In Kubernetes, short names are expanded using the search domains in the pod's /etc/resolv.conf, so the short form should resolve too. Look at the ndots option in that file.
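To make the check above concrete, here is a minimal sketch that pulls the search domains and the ndots value out of resolv.conf-style text. The sample content is reconstructed from the search domains named in these hints; the nameserver address is a hypothetical placeholder, and the parser handles only the `search` and `options` lines.

```python
def parse_resolv_conf(text):
    """Return (search_domains, ndots) from resolv.conf-style text."""
    search, ndots = [], 1  # resolvers default to ndots:1 when the option is absent
    for raw in text.splitlines():
        fields = raw.split()
        # Skip blank lines and comment lines (starting with '#' or ';').
        if not fields or fields[0][0] in "#;":
            continue
        if fields[0] == "search":
            search = fields[1:]
        elif fields[0] == "options":
            for opt in fields[1:]:
                if opt.startswith("ndots:"):
                    ndots = int(opt.split(":", 1)[1])
    return search, ndots


# Assumed sample: search domains taken from the hints, nameserver IP is hypothetical.
SAMPLE = """\
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local ec2.internal
options ndots:15
"""

search_domains, ndots = parse_resolv_conf(SAMPLE)
print(search_domains, ndots)
```

Inside a pod, the same function could be pointed at the real file, for example `parse_resolv_conf(open("/etc/resolv.conf").read())`.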
Hint 2 (after 10 min)
The ndots value is 15, which is absurdly high. The ndots option tells the resolver: "if the query name has fewer than N dots, append the search domains and try those first." The name payment-api.payments has 1 dot, which is fewer than 15, so the resolver appends each search domain in order. It tries payment-api.payments.default.svc.cluster.local (wrong namespace!), then payment-api.payments.svc.cluster.local, then payment-api.payments.cluster.local, then payment-api.payments.ec2.internal. The CoreDNS log confirms that it tried payment-api.payments.default.svc.cluster.local and got NXDOMAIN.
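The expansion order described above can be sketched as a small function. This is a simplified model of glibc-style behaviour, not the resolver's actual code: names with fewer than ndots dots go through the search list first and are tried as-is last, while names with at least ndots dots are tried as-is first. Other resolver implementations may differ. The function name `candidate_names` is made up for this illustration.

```python
def candidate_names(name, search, ndots):
    """List the names a glibc-style resolver would try, in order (simplified model)."""
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no search expansion.
        return [name[:-1]]
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + expanded  # enough dots: try the name as-is first
    return expanded + [name]      # too few dots: search list first, bare name last


SEARCH = ["default.svc.cluster.local", "svc.cluster.local",
          "cluster.local", "ec2.internal"]

for candidate in candidate_names("payment-api.payments", SEARCH, ndots=15):
    print(candidate)
```

With ndots set to 15, the first candidate is payment-api.payments.default.svc.cluster.local, matching the NXDOMAIN seen in the CoreDNS log, and the second candidate is the name that actually exists.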
Hint 3 (after 15 min)
This is a payment processing system where payment-worker pods call the payment-api and inventory-api services. The pod spec sets ndots: 15, which forces all DNS queries through the full search domain list. For payment-api.payments (1 dot, fewer than the 15 required by ndots), the resolver first tries payment-api.payments.default.svc.cluster.local, which is the wrong namespace (default instead of payments) and returns NXDOMAIN. The next candidate, payment-api.payments.svc.cluster.local, succeeds. However, some DNS client libraries stop at the first NXDOMAIN, and in others the search domain expansion causes intermittent failures depending on response timing and UDP packet ordering. The fix is to set ndots: 2. Kubernetes defaults to ndots: 5, but 2 is enough here: service.namespace names have 1 dot, so they are still expanded through the search list, while names with 2 or more dots are tried as-is first.
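As a rough illustration of the fix, the sketch below builds a patch body that sets ndots through the pod's dnsConfig. It assumes payment-worker is managed by a Deployment (the hints only mention a pod spec), and the helper name `ndots_patch` is hypothetical. The `dnsConfig.options` field with `name`/`value` string entries is part of the Kubernetes pod spec.

```python
import json


def ndots_patch(ndots):
    """Build a patch body that sets the ndots resolver option on a pod template.

    Assumes a Deployment-style object, where the pod spec lives under
    spec.template.spec. Option values are strings in the Kubernetes API.
    """
    return {
        "spec": {
            "template": {
                "spec": {
                    "dnsConfig": {
                        "options": [{"name": "ndots", "value": str(ndots)}]
                    }
                }
            }
        }
    }


print(json.dumps(ndots_patch(2), indent=2))
```

The printed JSON could be saved to a file and applied as a patch, or the same `dnsConfig` stanza could be written directly into the workload's manifest. Pods need to be recreated before the new /etc/resolv.conf takes effect.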