

DNS Deep Dive - Primer

Why This Matters

Every service you operate depends on DNS. When DNS works, it is invisible. When it breaks, everything breaks — and the failure mode is deceptive. Applications time out. API calls hang. Authentication fails. Load balancers route to nowhere. The symptoms masquerade as application bugs or network partitions until someone finally asks "is it DNS?" The answer is almost always yes.

DNS is also where infrastructure decisions compound. A bad TTL choice during a migration means hours of stale traffic. A misconfigured zone transfer exposes your entire internal topology. A missing PTR record silently breaks email deliverability for weeks. Understanding DNS deeply — from the root servers to your pod's /etc/resolv.conf — is one of the highest-leverage skills in operations.

The DNS Hierarchy

DNS is a distributed, hierarchical database. No single server knows everything. Instead, authority is delegated from the root down through a tree of zones.

. (Root)
├── com.
│   ├── example.com.
│   │   ├── app.example.com.
│   │   ├── api.example.com.
│   │   └── mail.example.com.
│   └── google.com.
├── org.
│   └── wikipedia.org.
├── net.
├── io.
└── arpa.
    └── in-addr.arpa.     (reverse DNS)
        └── 10.in-addr.arpa.

The Players

Fun fact: There are exactly 13 root server addresses (A through M) because that is the maximum that fits in a single 512-byte DNS UDP response without EDNS. In reality there are over 1,700 physical root server instances worldwide, distributed via anycast. The number 13 is a protocol constraint from 1987, not a design choice.

Root servers: 13 logical root server addresses (a.root-servers.net through m.root-servers.net), operated by different organizations. They cannot answer for individual hostnames — they only know which TLD servers are authoritative for .com, .org, .net, and the other top-level domains.

TLD (Top-Level Domain) servers: Operated by registries (Verisign for .com/.net, PIR for .org). They know which nameservers are authoritative for each registered domain under their TLD.

Authoritative nameservers: These are the servers that actually hold the DNS records for a zone. When you configure DNS for example.com, you are configuring authoritative nameservers.

Recursive resolvers: The workhorses that clients actually talk to. They walk the hierarchy on behalf of clients, starting from the root if necessary, and cache results. Examples: your ISP's resolver, 8.8.8.8 (Google), 1.1.1.1 (Cloudflare), or your internal resolver running BIND/Unbound.

Stub resolver: The DNS client library on your machine. It reads /etc/resolv.conf, sends queries to the configured recursive resolver, and returns answers to the application.

DNS Resolution Step by Step

When your application looks up app.example.com:

1. Application calls getaddrinfo("app.example.com")
2. Stub resolver checks /etc/nsswitch.conf for resolution order
   (typically: files → dns, meaning /etc/hosts first, then DNS)
3. Stub resolver checks /etc/hosts — no match
4. Stub resolver reads /etc/resolv.conf for nameserver IP
5. Stub resolver sends query to recursive resolver (e.g., 10.0.1.10)
6. Recursive resolver checks its cache — cache miss
7. Recursive resolver queries a root server:
   "Who handles .com?"  →  "Ask a.gtld-servers.net"
8. Recursive resolver queries the .com TLD server:
   "Who handles example.com?"  →  "Ask ns1.example.com (10.0.1.50)"
9. Recursive resolver queries ns1.example.com:
   "What is app.example.com?"  →  "10.0.2.100, TTL 300"
10. Recursive resolver caches the answer for 300 seconds
11. Recursive resolver returns 10.0.2.100 to stub resolver
12. Application connects to 10.0.2.100

Each layer in this chain caches responses according to the TTL. A response with TTL 3600 means every resolver in the chain can reuse that answer for up to 3600 seconds without asking again.

Record Types

Core Records

Type    Purpose                              Example
A       IPv4 address mapping                 app.example.com. 300 IN A 10.0.2.100
AAAA    IPv6 address mapping                 app.example.com. 300 IN AAAA 2001:db8::1
CNAME   Canonical name (alias)               www.example.com. 300 IN CNAME app.example.com.
MX      Mail exchange server                 example.com. 3600 IN MX 10 mail.example.com.
NS      Nameserver delegation                example.com. 86400 IN NS ns1.example.com.
SOA     Start of Authority                   Serial number, refresh intervals, zone metadata
TXT     Arbitrary text data                  SPF, DKIM, domain verification tokens
SRV     Service location                     _http._tcp.example.com. 300 IN SRV 10 0 8080 app.example.com.
PTR     Reverse lookup (IP to name)          100.2.0.10.in-addr.arpa. 3600 IN PTR app.example.com.
CAA     Certificate Authority Authorization  example.com. 3600 IN CAA 0 issue "letsencrypt.org"

SOA Record in Detail

The SOA (Start of Authority) record is mandatory for every zone. It contains zone-level metadata:

example.com. IN SOA ns1.example.com. admin.example.com. (
    2026031901  ; Serial — MUST increment on every change
    3600        ; Refresh — how often secondaries check for updates (1h)
    900         ; Retry — how often to retry if refresh fails (15m)
    604800      ; Expire — when secondaries stop serving if primary is unreachable (7d)
    300         ; Minimum TTL — negative caching TTL (5m)
)

The serial number is the most operationally important field. If you edit a zone file and forget to increment the serial, secondary nameservers will ignore the update because they think they already have the latest version.

Remember: Use the date-based serial format YYYYMMDDNN (e.g., 2026031901 for the first change on March 19, 2026). This is human-readable and naturally increments, and the NN suffix allows up to 99 changes per day. Never use arbitrary numbers — the serial is an unsigned 32-bit counter compared using serial number arithmetic (RFC 1982), so you cannot simply set it lower. If you accidentally set it far too high, recovery means either stepping the serial forward past the wrap point in stages or manually deleting and re-transferring the zone on every secondary.
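The date-based bump is easy to script; a minimal sketch (the current serial value is hypothetical — in practice you would parse it from the zone's SOA):

```shell
# Bump a YYYYMMDDNN serial: same day increments NN, a new day resets NN to 01.
today=$(date -u +%Y%m%d)
current=2024010101    # hypothetical: parsed from the zone's SOA in practice
case $current in
    "$today"*) new=$((current + 1)) ;;    # another change today: bump NN
    *)         new="${today}01" ;;        # first change of the day
esac
echo "$new"
```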

SRV Records and Service Discovery

SRV records encode service location with priority, weight, port, and target:

_service._proto.name. TTL IN SRV priority weight port target.

_http._tcp.example.com.  300 IN SRV 10 60 8080 app1.example.com.
_http._tcp.example.com.  300 IN SRV 10 40 8080 app2.example.com.
_http._tcp.example.com.  300 IN SRV 20 0  8080 app3.example.com.
  • Priority: Lower number = preferred (like MX). Priority 10 servers are tried before priority 20.
  • Weight: For load distribution among same-priority records. 60/40 split means 60% of traffic to app1, 40% to app2.
  • Port: The port the service listens on.
  • Target: The hostname of the service.

Consul uses SRV records for service discovery. Kubernetes uses them for headless services.
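The weight selection amounts to a cumulative-weight pick: the client rolls a number in [0, total weight) and walks the records until the running sum exceeds the roll. A sketch with a hypothetical helper (not any particular client's implementation; the roll is fixed here for reproducibility):

```shell
# Pick a target from same-priority SRV records by weight.
# Input lines: "<weight> <target>"; roll must be in [0, sum of weights).
pick_target() {
    roll=$1
    acc=0
    while read -r weight target; do
        acc=$((acc + weight))
        if [ "$roll" -lt "$acc" ]; then
            echo "$target"
            return
        fi
    done
}

records='60 app1.example.com
40 app2.example.com'

# Rolls 0-59 land on app1 (weight 60), rolls 60-99 on app2 (weight 40).
echo "$records" | pick_target 59
echo "$records" | pick_target 60
```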

TTL and Caching

TTL (Time-To-Live) is the single most important operational parameter in DNS. It controls how long resolvers cache a response before asking the authoritative server again.

High TTL (3600-86400 seconds):
  + Less query load on authoritative servers
  + Faster responses for clients (served from cache)
  - Slow propagation when you change records
  - Long recovery time during incidents (clients stuck on old IP)

Low TTL (30-300 seconds):
  + Fast propagation of changes
  + Quick failover during incidents
  - Higher query load on authoritative servers
  - Slightly more latency on cache misses
  - Some resolvers enforce a minimum TTL (often 30-60s) regardless of what you set

TTL Strategy for Migrations

Normal state:     TTL 3600 (1 hour) or higher
48 hours before:  Lower TTL to 60 seconds
                  (wait for old cached entries to expire)
Migration time:   Change the record to new IP (TTL still 60)
Verify:           Check from multiple vantage points
Post-migration:   Raise TTL back to 3600 after 24-48 hours

The 48-hour lead time matters. If your current TTL is 86400 (24 hours), lowering it to 60 seconds only takes effect after existing cached entries expire — up to 24 hours later. Plan ahead.
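The lead-time arithmetic is worth putting in your migration checklist; a minimal sketch (the TTL value is hypothetical):

```shell
# The old TTL bounds how long resolvers may keep serving the old answer:
# a TTL lowered at time T is only guaranteed visible everywhere at T + old_ttl.
old_ttl=86400                      # TTL before you lowered it
lowered_at=$(date -u +%s)          # when the TTL change was published
safe_at=$((lowered_at + old_ttl))
echo "record changes are safe after epoch $safe_at"
```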

Negative Caching

When a name does not exist (NXDOMAIN), resolvers cache that negative result too. The cache duration is the lesser of the SOA record's own TTL and its minimum field (RFC 2308). If that value is 3600 seconds and someone queries a name before you create it, the NXDOMAIN is cached for up to an hour even after you add the record.

EDNS (Extension Mechanisms for DNS)

Standard DNS messages are limited to 512 bytes over UDP. EDNS0 (RFC 6891) extends this limit, typically to 4096 bytes. This matters because:

  • DNSSEC responses are larger than 512 bytes (signatures add bulk)
  • Responses with many records (round-robin A records, large TXT records) may exceed 512 bytes
  • Without EDNS, the server returns a truncated response and the client retries over TCP (slower)
# Check EDNS support
dig +edns=0 +bufsize=4096 example.com

# If you see "EDNS: version: 0, flags:; udp: 4096" in output, EDNS is working

Some broken middleboxes (firewalls, load balancers) strip or block EDNS. This causes DNSSEC failures and truncation issues. If you see intermittent DNS failures that correlate with response size, suspect EDNS problems.

DNS over HTTPS (DoH) and DNS over TLS (DoT)

Traditional DNS is unencrypted UDP on port 53. Anyone on the network path can see and modify DNS queries. DoH and DoT encrypt DNS traffic.

DNS over TLS (DoT):
  - Port 853
  - TLS wrapper around standard DNS wire format
  - Easy to block (dedicated port)
  - Used by: systemd-resolved, Android 9+, Unbound

DNS over HTTPS (DoH):
  - Port 443 (same as HTTPS)
  - DNS queries inside HTTP/2 or HTTP/3
  - Harder to block (mixed with regular HTTPS traffic)
  - Used by: Firefox, Chrome, Cloudflare, Google

Configuring DoT with systemd-resolved

# /etc/systemd/resolved.conf
[Resolve]
DNS=1.1.1.1#cloudflare-dns.com 8.8.8.8#dns.google
DNSOverTLS=yes

# Apply and verify
systemctl restart systemd-resolved
resolvectl status  # Verify "DNS over TLS" shows "yes"

Split-Horizon DNS

Split-horizon DNS returns different answers based on the source of the query. Internal clients get private IPs; external clients get public IPs.

BIND Views

view "internal" {
    match-clients { 10.0.0.0/8; 172.16.0.0/12; 192.168.0.0/16; 127.0.0.0/8; };
    zone "example.com" {
        type master;
        file "zones/example.com.internal";
    };
};

view "external" {
    match-clients { any; };
    zone "example.com" {
        type master;
        file "zones/example.com.external";
    };
};
# zones/example.com.internal
app     IN  A   10.0.2.100

# zones/example.com.external
app     IN  A   203.0.113.50

Cloud Split-Horizon

In AWS, use Route 53 private hosted zones associated with your VPCs. Queries from within the VPC resolve to private IPs; queries from outside resolve via the public hosted zone.

In GCP, use Cloud DNS private zones. In Azure, use Azure Private DNS zones.

Zone Files and Zone Transfers

Zone File Format

$TTL 300
$ORIGIN example.com.

@   IN  SOA ns1.example.com. admin.example.com. (
        2026031901  ; Serial
        3600        ; Refresh
        900         ; Retry
        604800      ; Expire
        300         ; Minimum TTL
    )

; Nameservers
    IN  NS  ns1.example.com.
    IN  NS  ns2.example.com.

; Mail
    IN  MX  10  mail.example.com.
    IN  MX  20  mail-backup.example.com.

; A records for the in-zone nameservers (the parent zone needs matching glue for the delegation)
ns1     IN  A   10.0.1.10
ns2     IN  A   10.0.1.11

; Services
app     IN  A   10.0.2.100
app     IN  A   10.0.2.101     ; Round-robin
mail    IN  A   10.0.2.200
www     IN  CNAME   app.example.com.

; TXT records
@       IN  TXT "v=spf1 mx -all"
@       IN  CAA 0 issue "letsencrypt.org"

Key syntax rules:
  • @ means the zone origin (example.com.)
  • Names without a trailing dot are relative to $ORIGIN
  • Names WITH a trailing dot are absolute (fully qualified)
  • www means www.example.com., but www.example.com (no trailing dot) means www.example.com.example.com.

Zone Transfers (AXFR/IXFR)

Zone transfers replicate zone data from primary to secondary nameservers:

  • AXFR (full transfer): Transfers the entire zone. Used for initial replication or when incremental data is unavailable.
  • IXFR (incremental transfer): Transfers only changes since a given serial number. More efficient for large zones with small changes.
# BIND primary configuration
zone "example.com" {
    type master;
    file "zones/example.com";
    allow-transfer { 10.0.1.11; 10.0.1.12; };  # Only secondaries
    also-notify { 10.0.1.11; 10.0.1.12; };      # Push notifications
};

# BIND secondary configuration
zone "example.com" {
    type slave;
    file "zones/example.com.slave";
    masters { 10.0.1.10; };
};
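Restricting transfers by IP alone is weak, since source addresses can be spoofed; TSIG-signed transfers are the usual hardening. A hedged sketch for BIND (the key name and secret are placeholders — generate a real secret with tsig-keygen):

```
# On both primary and secondary (same shared secret):
key "xfer-key" {
    algorithm hmac-sha256;
    secret "PASTE-BASE64-SECRET-HERE";
};

# Primary: only signed transfers are allowed
zone "example.com" {
    type master;
    file "zones/example.com";
    allow-transfer { key "xfer-key"; };
};
```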

DNS Server Software

BIND (named)

The oldest and most widely deployed DNS server. Runs both authoritative and recursive. Configuration is complex but extremely flexible.

# Check config syntax
named-checkconf /etc/named.conf
named-checkzone example.com /var/named/zones/example.com

# Reload a zone without restarting
rndc reload example.com

# Flush resolver cache
rndc flush

# Dump cache to file for inspection
rndc dumpdb -cache
cat /var/named/data/cache_dump.db

Unbound

Purpose-built recursive resolver. Faster and more secure than BIND for pure recursion. It can serve limited local data (local-zone/local-data) but is not a full authoritative server. Excellent for internal resolver infrastructure.

# /etc/unbound/unbound.conf
server:
    interface: 0.0.0.0
    access-control: 10.0.0.0/8 allow
    access-control: 127.0.0.0/8 allow

    # Performance
    num-threads: 4
    msg-cache-size: 128m
    rrset-cache-size: 256m

    # Security
    hide-identity: yes
    hide-version: yes
    harden-glue: yes
    harden-dnssec-stripped: yes

    # DNSSEC
    auto-trust-anchor-file: "/var/lib/unbound/root.key"

# Forward specific zones to internal DNS (forward-zone is a top-level
# clause, not part of server:)
forward-zone:
    name: "internal.example.com."
    forward-addr: 10.0.1.10
    forward-addr: 10.0.1.11

# Forward everything else to upstream
forward-zone:
    name: "."
    forward-addr: 1.1.1.1
    forward-addr: 8.8.8.8

dnsmasq

Lightweight DNS forwarder and DHCP server. Common in home routers, development environments, and small networks. Not suitable for large-scale authoritative DNS.

# /etc/dnsmasq.conf
listen-address=127.0.0.1,10.0.1.1
cache-size=10000
no-resolv
server=8.8.8.8
server=8.8.4.4
# Override specific names
address=/app.local/10.0.2.100
# Forward specific domain to internal DNS
server=/internal.example.com/10.0.1.10

CoreDNS

Cloud-native, plugin-based DNS server. Written in Go. Default DNS server in Kubernetes since 1.13. Configuration is a Corefile — a chain of plugins that process queries in order.

# Corefile
.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf {
        max_concurrent 1000
    }
    cache 30
    loop
    reload
    loadbalance
}

Plugin execution order is fixed at compile time (in plugin.cfg), not by position in the Corefile — cache runs before forward in the chain, so upstream responses are cached for up to 30 seconds here. loop detects forwarding loops and halts CoreDNS if one is found. loadbalance randomizes A/AAAA record order for round-robin distribution.

The Linux DNS Stack

/etc/resolv.conf

# Traditional resolv.conf
nameserver 10.0.1.10
nameserver 10.0.1.11
search example.com internal.example.com
options timeout:2 attempts:3 ndots:1
  • nameserver: Up to 3 recursive resolvers. Tried in order, failover on timeout.
  • search: Suffix list appended to short names. app becomes app.example.com, then app.internal.example.com.
  • ndots: If a name has fewer dots than ndots, search domains are tried first. Default is 1. Kubernetes sets this to 5.
  • timeout: Seconds to wait for a response before trying the next nameserver.
  • attempts: Number of times to try the entire nameserver list.
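The interaction of search and ndots can be sketched with a small function that lists the candidate names a glibc-style stub resolver tries, in order (a simplified model, not the actual resolver code; the Kubernetes-style search list is illustrative):

```shell
# List lookup candidates for a name, mimicking glibc search/ndots behavior:
# fewer dots than ndots => try the search suffixes first, then the name as-is.
candidates() {
    name=$1; ndots=$2; shift 2
    dots=$(( $(printf '%s' "$name" | tr -cd '.' | wc -c) ))
    if [ "$dots" -ge "$ndots" ]; then
        echo "$name"
        for d in "$@"; do echo "$name.$d"; done
    else
        for d in "$@"; do echo "$name.$d"; done
        echo "$name"
    fi
}

# With ndots:5 (the Kubernetes default), even app.example.com (2 dots) hits
# the search list first: three wasted queries before the real name is tried.
candidates app.example.com 5 production.svc.cluster.local svc.cluster.local cluster.local
```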

/etc/nsswitch.conf

Controls the order of name resolution sources:

hosts: files dns myhostname

This means: check /etc/hosts first, then DNS, then fall back to the local hostname. If /etc/hosts has an entry for app.example.com, DNS is never queried for that name.
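This is why dig and your application can disagree: dig queries the nameserver directly and skips nsswitch entirely, while getent follows the same path applications do. When debugging "works in dig, fails in the app", check what the NSS stack actually returns:

```shell
# getent resolves through the NSS stack (getaddrinfo), honoring
# /etc/nsswitch.conf and /etc/hosts, exactly as applications see it.
# An /etc/hosts entry will appear here but never in dig output.
getent hosts localhost
```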

systemd-resolved

Modern Linux systems often use systemd-resolved as a local caching stub resolver. It listens on 127.0.0.53 and manages DNS configuration per-interface.

# Check status
resolvectl status

# Query through resolved
resolvectl query app.example.com

# Flush cache
resolvectl flush-caches

# Show cache statistics
resolvectl statistics

# Set DNS for an interface
resolvectl dns eth0 10.0.1.10 10.0.1.11
resolvectl domain eth0 example.com

# Check current DNS configuration
resolvectl dns
resolvectl domain

When systemd-resolved is active, /etc/resolv.conf is typically a symlink to /run/systemd/resolve/stub-resolv.conf (pointing to 127.0.0.53). Do not edit it directly — configure via systemd-resolved or /etc/systemd/resolved.conf.
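A quick way to see which mode a host is in (output varies by distro; on a host without systemd-resolved, /etc/resolv.conf is a plain file):

```shell
# A symlink into /run/systemd/resolve/ means systemd-resolved owns the file;
# a plain file means something else (or nothing) manages it.
ls -l /etc/resolv.conf
grep -E '^(nameserver|search|options)' /etc/resolv.conf
```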

DNS Load Balancing

Round-Robin

The simplest approach: multiple A records for the same name.

app.example.com.  300  IN  A  10.0.2.100
app.example.com.  300  IN  A  10.0.2.101
app.example.com.  300  IN  A  10.0.2.102

Resolvers rotate the order of returned records. Clients typically connect to the first address. This gives rough load distribution but no health checking — if 10.0.2.101 is down, ~33% of clients will try it first and experience delays.

Weighted Routing (Route 53)

{
  "Name": "app.example.com",
  "Type": "A",
  "SetIdentifier": "primary",
  "Weight": 70,
  "TTL": 60,
  "ResourceRecords": [{"Value": "10.0.2.100"}]
}

{
  "Name": "app.example.com",
  "Type": "A",
  "SetIdentifier": "secondary",
  "Weight": 30,
  "TTL": 60,
  "ResourceRecords": [{"Value": "10.0.2.101"}]
}

70% of DNS responses return the primary IP, 30% return the secondary. Useful for canary deployments and gradual traffic shifting.
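Both record sets are applied together in a single change batch; a hedged sketch of the file you would pass to aws route53 change-resource-record-sets --change-batch (the zone ID and IPs are the hypothetical values from above):

```json
{
  "Comment": "Shift 30% of traffic to the secondary",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "app.example.com",
        "Type": "A",
        "SetIdentifier": "primary",
        "Weight": 70,
        "TTL": 60,
        "ResourceRecords": [{"Value": "10.0.2.100"}]
      }
    },
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "app.example.com",
        "Type": "A",
        "SetIdentifier": "secondary",
        "Weight": 30,
        "TTL": 60,
        "ResourceRecords": [{"Value": "10.0.2.101"}]
      }
    }
  ]
}
```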

Latency-Based Routing

Route 53 and Cloud DNS can return the IP of the region with the lowest measured network latency from the querying resolver. This is based on measured latency, not geographic distance — the nearest datacenter on the map is not always the fastest.

Failover Routing

Active-passive: return the primary record unless a health check fails, then return the secondary.

# Route 53 health check + failover
aws route53 create-health-check --caller-reference $(date +%s) \
  --health-check-config '{
    "IPAddress": "10.0.2.100",
    "Port": 443,
    "Type": "HTTPS",
    "ResourcePath": "/health",
    "RequestInterval": 10,
    "FailureThreshold": 3
  }'

DNSSEC Basics

DNSSEC adds cryptographic signatures to DNS responses, allowing resolvers to verify that answers have not been tampered with.

Without DNSSEC:
  Resolver asks: "What is app.example.com?"
  Attacker intercepts, returns malicious IP
  Resolver has no way to detect the forgery

With DNSSEC:
  Authoritative server signs each record with a private key
  Resolver verifies signature using the published DNSKEY record
  Forged or modified responses fail signature verification

Key Types

  • ZSK (Zone Signing Key): Signs individual records in the zone. Rotated frequently (every 1-3 months).
  • KSK (Key Signing Key): Signs the DNSKEY record set. Rotated less frequently (every 1-2 years).
  • DS (Delegation Signer): A hash of the KSK, published in the parent zone. Creates the chain of trust from parent to child.

Chain of Trust

Root (.)          — signs .com DS record
  |
.com TLD          — signs example.com DS record
  |
example.com       — KSK signs DNSKEY RRset
                  — ZSK signs all other records

Checking DNSSEC

# Check if a domain has DNSSEC (RRSIG data appears alongside the answer)
dig example.com +dnssec +short

# Full DNSSEC validation trace
dig example.com +trace +dnssec

# Check DS record in parent zone
dig DS example.com @a.gtld-servers.net +short

Private DNS Zones

AWS Route 53

# Create a private hosted zone
aws route53 create-hosted-zone \
  --name internal.example.com \
  --vpc VPCRegion=us-east-1,VPCId=vpc-abc123 \
  --caller-reference $(date +%s) \
  --hosted-zone-config PrivateZone=true

# Associate with additional VPCs
aws route53 associate-vpc-with-hosted-zone \
  --hosted-zone-id Z123456 \
  --vpc VPCRegion=us-west-2,VPCId=vpc-def456

GCP Cloud DNS

gcloud dns managed-zones create internal-zone \
  --description="Internal DNS" \
  --dns-name="internal.example.com." \
  --visibility=private \
  --networks=my-vpc

Azure Private DNS

az network private-dns zone create \
  --resource-group myRG \
  --name internal.example.com

az network private-dns link vnet create \
  --resource-group myRG \
  --zone-name internal.example.com \
  --name mylink \
  --virtual-network myVnet \
  --registration-enabled false

Service Discovery via DNS

Consul DNS Interface

Consul provides a DNS interface for service discovery on port 8600 by default:

# Query a service
dig @127.0.0.1 -p 8600 web.service.consul SRV

# Returns:
# web.service.consul. 0 IN SRV 1 1 8080 node1.node.dc1.consul.
# web.service.consul. 0 IN SRV 1 1 8080 node2.node.dc1.consul.

# Query with tag
dig @127.0.0.1 -p 8600 production.web.service.consul

# Forward .consul domain from your resolver
# dnsmasq:
server=/consul/127.0.0.1#8600
# Unbound:
forward-zone:
    name: "consul."
    forward-addr: 127.0.0.1@8600

Kubernetes DNS

Kubernetes assigns DNS names to Services and Pods automatically:

Service:  <service>.<namespace>.svc.cluster.local
Pod:      <pod-ip-dashed>.<namespace>.pod.cluster.local
Headless: <pod-name>.<service>.<namespace>.svc.cluster.local

Examples:
  api-server.production.svc.cluster.local        → ClusterIP
  10-0-2-100.production.pod.cluster.local         → Pod IP
  api-server-0.api-server.production.svc.cluster.local  → StatefulSet pod

Headless services (ClusterIP: None) return individual Pod IPs instead of a single virtual IP, enabling client-side load balancing and StatefulSet addressing.
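A minimal headless Service sketch (names match the examples above; the port and selector are illustrative):

```yaml
# clusterIP: None makes this Service headless: DNS for
# api-server.production.svc.cluster.local returns the pod IPs directly.
apiVersion: v1
kind: Service
metadata:
  name: api-server
  namespace: production
spec:
  clusterIP: None
  selector:
    app: api-server
  ports:
    - name: http
      port: 8080
```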

DNS in Cloud Environments

Route 53 Key Concepts

  • Hosted zones: Containers for DNS records. Public zones serve the internet; private zones serve VPCs.
  • Alias records: Route 53 extension that maps to AWS resources (ELB, CloudFront, S3) without a CNAME. Works at the zone apex.
  • Health checks: Probe endpoints and remove unhealthy records from responses.
  • Routing policies: Simple, weighted, latency, failover, geolocation, multivalue answer.

Cloud DNS Resolver Endpoints

In hybrid environments (on-premises + cloud), you need DNS forwarding between networks:

On-premises DNS server → Route 53 Resolver Inbound Endpoint
  (resolves cloud private zones from on-prem)

Route 53 Resolver Outbound Endpoint → On-premises DNS server
  (resolves on-prem zones from cloud)

This replaces hacks like running BIND forwarders in EC2 instances.

