Solution: DNS Split-Horizon Confusion

Summary

The internal DNS zone for acme.com is missing an A record for payments.acme.com. When an internal host queries the internal DNS server (10.100.1.10), the server finds no matching record in its own zone and, in this configuration, falls through to external resolution, which returns the public IP 203.0.113.50. Internal hosts cannot reach that address because the firewall does not allow hairpin NAT, so connections to the payments service time out even though the service itself is healthy at 10.100.8.30.

Senior Workflow

Step 1: Confirm DNS resolution from the affected host

# Check which DNS server the host uses
cat /etc/resolv.conf
# Expected: nameserver 10.100.1.10 (internal DNS)

# Query the internal DNS server explicitly
dig payments.acme.com @10.100.1.10
# Returns: 203.0.113.50 (external IP -- this is wrong for internal clients)

# Query external DNS for comparison
dig payments.acme.com @8.8.8.8
# Returns: 203.0.113.50 (correct for external)

# Check a working service for comparison
dig inventory.acme.com @10.100.1.10
# Returns: 10.100.8.25 (internal IP -- correct)
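
Step 1 comes down to checking whether the internal resolver hands back a private or a public address. The sketch below automates that check by eye. The helper names is_private_ipv4 and classify_answer are hypothetical, and the check assumes "internal" means one of the RFC 1918 ranges, which fits the 10.100.x.x addresses in this scenario.

```shell
#!/bin/sh
# Return 0 if the IPv4 address falls in an RFC 1918 private range, 1 otherwise.
is_private_ipv4() {
    case "$1" in
        10.*) return 0 ;;
        192.168.*) return 0 ;;
        172.1[6-9].*|172.2[0-9].*|172.3[0-1].*) return 0 ;;
        *) return 1 ;;
    esac
}

# Label an answer as "internal" or "external", mirroring the manual check in Step 1.
classify_answer() {
    if is_private_ipv4 "$1"; then
        echo "internal"
    else
        echo "external"
    fi
}

# Example use against the resolver from this scenario (needs network access):
#   classify_answer "$(dig +short payments.acme.com @10.100.1.10 | head -n 1)"
```

An "external" result for a name that should be served internally is the symptom this runbook describes.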

Step 2: Verify the internal DNS zone

# On the internal DNS server (10.100.1.10), check the zone file
cat /etc/bind/zones/internal.acme.com.zone

# Look for payments.acme.com -- it will be MISSING
# Compare with inventory.acme.com which is present
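
Reading a long zone file by eye is error-prone, so the lookup in Step 2 can be scripted. This is a rough sketch only: zone_a_records is a hypothetical helper, and it assumes each record is written on one line with an explicit relative owner name (as in "payments IN A 10.100.8.30"). It does not handle records that inherit the owner from the previous line or that use fully qualified owner names.

```shell
#!/bin/sh
# Print the A records for one owner name in a BIND-style zone file.
# $1 = zone file path, $2 = owner label relative to the zone (e.g. "payments")
# Exit status is 0 if at least one A record matched, 1 if none did.
zone_a_records() {
    awk -v name="$2" '
        /^[[:space:]]*;/ { next }          # skip comment lines
        $1 == name {
            # The type may follow an optional TTL and class, so scan the fields.
            for (i = 2; i < NF; i++)
                if ($i == "A") { print $1, $(i + 1); found = 1 }
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Example use on the DNS server from this scenario:
#   zone_a_records /etc/bind/zones/internal.acme.com.zone payments || echo "payments is missing"
```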

Step 3: Confirm the service is reachable via internal IP

# Bypass DNS, connect directly
curl -k --resolve payments.acme.com:443:10.100.8.30 https://payments.acme.com/health
# Expected: 200 OK -- confirms the service is fine, only DNS is wrong

Step 4: Confirm the external IP is unreachable from inside

curl -v --connect-timeout 5 https://203.0.113.50/health
# Expected: connection timeout -- firewall blocks hairpin NAT

Step 5: Apply the fix

# Add the missing A record to the internal zone
# /etc/bind/zones/internal.acme.com.zone:
# payments    IN  A  10.100.8.30

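# Increment the serial number in the SOA record
# Reload the zone
# (if the zone is defined inside a BIND view, name the class and view as well:
#  rndc reload acme.com IN <view-name>)
rndc reload acme.com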
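
Picking the new serial is easy to get wrong by hand. The sketch below computes it under the assumption that the zone uses the common YYYYMMDDNN serial convention; the source does not say which convention this zone follows, so check the existing SOA first. next_serial is a hypothetical helper name.

```shell
#!/bin/sh
# Compute the next SOA serial, ASSUMING the YYYYMMDDNN convention.
# $1 = current serial, $2 = today's date as YYYYMMDD (defaults to the system date)
next_serial() {
    current=$1
    today=${2:-$(date +%Y%m%d)}
    base="${today}00"
    if [ "$current" -lt "$base" ]; then
        # First change today: start a fresh counter for this date.
        echo "$base"
    else
        # Already changed today (or serial is ahead): just add one.
        echo $((current + 1))
    fi
}

# Before reloading, the edited file can be syntax-checked with BIND's own tool:
#   named-checkzone acme.com /etc/bind/zones/internal.acme.com.zone
```

Whatever convention is in use, the only hard requirement is that the new serial is greater than the old one.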

Step 6: Flush caches and verify

# Flush the systemd-resolved cache if applicable
# (older systemd releases ship this command as: systemd-resolve --flush-caches)
resolvectl flush-caches

# Flush nscd if running
nscd -i hosts

# Verify resolution
dig payments.acme.com @10.100.1.10
# Should now return: 10.100.8.30

# Test the application
curl https://payments.acme.com/health
# Expected: 200 OK
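
Where a cache cannot be flushed, the remaining TTL on the stale answer shows how long clients may keep using it. A small sketch for reading that value from dig output follows; max_a_ttl is a hypothetical helper, and it assumes the standard five-column answer format printed by "dig +noall +answer".

```shell
#!/bin/sh
# Read `dig +noall +answer` output on stdin and print the largest TTL (in seconds)
# among the A records. Prints 0 if no A record is present.
max_a_ttl() {
    awk '$4 == "A" && ($2 + 0) > (max + 0) { max = $2 } END { print max + 0 }'
}

# Example use from a client in this scenario (needs network access):
#   dig +noall +answer payments.acme.com | max_a_ttl
```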

Step 7: Audit for other missing records

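# Compare the names that have A records in the internal zone against the external list.
# Compare names only: internal and external IPs differ by design.
# dig separates fields with tabs, so match on the type column instead of grepping "IN A".
# Assumes external-records.txt holds one fully qualified name per line (trailing dot included).
diff <(dig @10.100.1.10 axfr acme.com | awk '$4 == "A" {print $1}' | sort -u) \
     <(sort -u external-records.txt)
# Lines marked ">" are names present externally but missing from the internal zone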
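
For a larger audit it can help to print only the names that are missing internally, without the rest of the diff output. The sketch below does that with comm. missing_internally is a hypothetical helper, and both input files are assumed to hold one name per line in the same form (for example, fully qualified with a trailing dot).

```shell
#!/bin/sh
# Print names that appear in the external list but not in the internal list.
# $1 = file of internal names, $2 = file of external names (one name per line)
missing_internally() {
    int_sorted=$(mktemp)
    ext_sorted=$(mktemp)
    sort -u "$1" > "$int_sorted"
    sort -u "$2" > "$ext_sorted"
    # comm -13 hides lines unique to file 1 and lines common to both,
    # leaving only the lines unique to file 2 (the external list).
    comm -13 "$int_sorted" "$ext_sorted"
    rm -f "$int_sorted" "$ext_sorted"
}

# Example use, with the internal names produced from a zone transfer:
#   dig @10.100.1.10 axfr acme.com | awk '$4 == "A" {print $1}' > internal-names.txt
#   missing_internally internal-names.txt external-records.txt
```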

Common Pitfalls

  • Adding /etc/hosts entries: This works as a temporary fix but does not scale and is not maintained centrally. Fix the DNS zone instead.
  • Not flushing DNS caches: After fixing the zone, clients may keep serving the old answer from cache until its TTL expires. Flush caches so the fix takes effect immediately.
  • Overlooking the forwarding behavior: A server that is authoritative for a zone normally returns NXDOMAIN for a name missing from it, but some configurations forward the query to external resolvers instead, which is how the public IP reached internal clients here. Understand your DNS server's forwarding policy.
  • Not auditing after consolidation: The DNS consolidation likely missed other records too. Do a full audit.
  • Forgetting to increment the SOA serial: Zone transfers to secondary DNS servers will not pick up changes without a serial increment.
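
One way to tell a forwarded answer from one served out of the internal zone is the "aa" (authoritative answer) flag in the dig header: an answer served from the server's own zone data carries it, while an answer obtained by forwarding or recursion does not. The sketch below checks for that flag. has_aa_flag is a hypothetical helper and assumes the usual ";; flags: ..." header line that dig prints by default.

```shell
#!/bin/sh
# Read full `dig` output on stdin; exit 0 if the response header carries the
# "aa" (authoritative answer) flag, 1 otherwise.
has_aa_flag() {
    awk '/^;; flags:/ {
             sub(/^;; flags:/, "")     # drop the label
             sub(/;.*/, "")            # drop the counters after the flag list
             for (i = 1; i <= NF; i++) if ($i == "aa") found = 1
         }
         END { exit (found ? 0 : 1) }'
}

# Example use against the internal server from this scenario (needs network access):
#   dig payments.acme.com @10.100.1.10 | has_aa_flag || echo "answer was not authoritative"
```

In this scenario, a non-authoritative answer for payments.acme.com from 10.100.1.10 would be consistent with the query having fallen through to external resolution.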