DNS Deep Dive - Primer¶
Why This Matters¶
Every service you operate depends on DNS. When DNS works, it is invisible. When it breaks, everything breaks — and the failure mode is deceptive. Applications time out. API calls hang. Authentication fails. Load balancers route to nowhere. The symptoms masquerade as application bugs, network partitions, or authentication failures until someone finally asks "is it DNS?" The answer is almost always yes.
DNS is also where infrastructure decisions compound. A bad TTL choice during a migration means hours of stale traffic. A misconfigured zone transfer exposes your entire internal topology. A missing PTR record silently breaks email deliverability for weeks. Understanding DNS deeply — from the root servers to your pod's /etc/resolv.conf — is one of the highest-leverage skills in operations.
The DNS Hierarchy¶
DNS is a distributed, hierarchical database. No single server knows everything. Instead, authority is delegated from the root down through a tree of zones.
. (Root)
├── com.
│ ├── example.com.
│ │ ├── app.example.com.
│ │ ├── api.example.com.
│ │ └── mail.example.com.
│ └── google.com.
├── org.
│ └── wikipedia.org.
├── net.
├── io.
└── arpa.
└── in-addr.arpa. (reverse DNS)
└── 10.in-addr.arpa.
The Players¶
Fun fact: There are exactly 13 root server addresses (A through M) because that is the maximum that fits in a single 512-byte DNS UDP response without EDNS. In reality there are over 1,700 physical root server instances worldwide, distributed via anycast. The number 13 is a protocol constraint from 1987, not a design choice.
Root servers: 13 logical root server addresses (a.root-servers.net through m.root-servers.net), operated by different organizations. They don't know IP addresses — they only know which TLD servers are authoritative for .com, .org, .net, etc. As the fun fact above notes, well over a thousand physical instances serve these 13 addresses globally via anycast.
TLD (Top-Level Domain) servers: Operated by registries (Verisign for .com/.net, PIR for .org). They know which nameservers are authoritative for each registered domain under their TLD.
Authoritative nameservers: These are the servers that actually hold the DNS records for a zone. When you configure DNS for example.com, you are configuring authoritative nameservers.
Recursive resolvers: The workhorses that clients actually talk to. They walk the hierarchy on behalf of clients, starting from the root if necessary, and cache results. Examples: your ISP's resolver, 8.8.8.8 (Google), 1.1.1.1 (Cloudflare), or your internal resolver running BIND/Unbound.
Stub resolver: The DNS client library on your machine. It reads /etc/resolv.conf, sends queries to the configured recursive resolver, and returns answers to the application.
DNS Resolution Step by Step¶
When your application looks up app.example.com:
1. Application calls getaddrinfo("app.example.com")
2. Stub resolver checks /etc/nsswitch.conf for resolution order
(typically: files → dns, meaning /etc/hosts first, then DNS)
3. Stub resolver checks /etc/hosts — no match
4. Stub resolver reads /etc/resolv.conf for nameserver IP
5. Stub resolver sends query to recursive resolver (e.g., 10.0.1.10)
6. Recursive resolver checks its cache — cache miss
7. Recursive resolver queries a root server:
"Who handles .com?" → "Ask a.gtld-servers.net"
8. Recursive resolver queries the .com TLD server:
"Who handles example.com?" → "Ask ns1.example.com (10.0.1.50)"
9. Recursive resolver queries ns1.example.com:
"What is app.example.com?" → "10.0.2.100, TTL 300"
10. Recursive resolver caches the answer for 300 seconds
11. Recursive resolver returns 10.0.2.100 to stub resolver
12. Application connects to 10.0.2.100
Each layer in this chain caches responses according to the TTL. A response with TTL 3600 means every resolver in the chain can reuse that answer for up to 3600 seconds without asking again.
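You can reproduce steps 7-9 yourself: dig +trace performs the same iterative walk, starting at the root and following each delegation, bypassing the recursive resolver's cache entirely.
# Walk the delegation chain manually, starting at the root
dig +trace app.example.com
# Output shows each referral: root NS -> .com TLD NS -> ns1.example.com -> final answer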
Record Types¶
Core Records¶
| Type | Purpose | Example |
|---|---|---|
| A | IPv4 address mapping | app.example.com. 300 IN A 10.0.2.100 |
| AAAA | IPv6 address mapping | app.example.com. 300 IN AAAA 2001:db8::1 |
| CNAME | Canonical name (alias) | www.example.com. 300 IN CNAME app.example.com. |
| MX | Mail exchange server | example.com. 3600 IN MX 10 mail.example.com. |
| NS | Nameserver delegation | example.com. 86400 IN NS ns1.example.com. |
| SOA | Start of Authority | Serial number, refresh intervals, zone metadata |
| TXT | Arbitrary text data | SPF, DKIM, domain verification tokens |
| SRV | Service location | _http._tcp.example.com. 300 IN SRV 10 0 8080 app.example.com. |
| PTR | Reverse lookup (IP to name) | 100.2.0.10.in-addr.arpa. 3600 IN PTR app.example.com. |
| CAA | Certificate Authority Authorization | example.com. 3600 IN CAA 0 issue "letsencrypt.org" |
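Each of these types can be queried directly with dig. A few representative lookups against the example records in the table (the commented answers assume those records exist):
dig app.example.com A +short      # 10.0.2.100
dig example.com MX +short         # 10 mail.example.com.
dig example.com TXT +short        # "v=spf1 mx -all"
dig -x 10.0.2.100 +short          # PTR: app.example.com.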
SOA Record in Detail¶
The SOA (Start of Authority) record is mandatory for every zone. It contains zone-level metadata:
example.com. IN SOA ns1.example.com. admin.example.com. (
2026031901 ; Serial — MUST increment on every change
3600 ; Refresh — how often secondaries check for updates (1h)
900 ; Retry — how often to retry if refresh fails (15m)
604800 ; Expire — when secondaries stop serving if primary is unreachable (7d)
300 ; Minimum TTL — negative caching TTL (5m)
)
The serial number is the most operationally important field. If you edit a zone file and forget to increment the serial, secondary nameservers will ignore the update because they think they already have the latest version.
Remember: Use the date-based serial format YYYYMMDDNN (e.g., 2026031901 for the first change on March 19, 2026). This is human-readable and naturally increments. The NN suffix allows up to 99 changes per day. Never use arbitrary numbers — you cannot go backward, and if you accidentally set the serial to 9999999999, recovery requires manual intervention on every secondary.
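To confirm that secondaries actually picked up a change, compare serials directly (using the nameserver addresses this page uses elsewhere, 10.0.1.10 for the primary and 10.0.1.11 for the secondary):
# The serial is the third field of the SOA answer
dig @10.0.1.10 example.com SOA +short   # primary
dig @10.0.1.11 example.com SOA +short   # secondary
# If the secondary's serial lags, the zone transfer has not happened yet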
SRV Records and Service Discovery¶
SRV records encode service location with priority, weight, port, and target:
_service._proto.name. TTL IN SRV priority weight port target.
_http._tcp.example.com. 300 IN SRV 10 60 8080 app1.example.com.
_http._tcp.example.com. 300 IN SRV 10 40 8080 app2.example.com.
_http._tcp.example.com. 300 IN SRV 20 0 8080 app3.example.com.
- Priority: Lower number = preferred (like MX). Priority 10 servers are tried before priority 20.
- Weight: For load distribution among same-priority records. 60/40 split means 60% of traffic to app1, 40% to app2.
- Port: The port the service listens on.
- Target: The hostname of the service.
Consul uses SRV records for service discovery. Kubernetes uses them for headless services.
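You can inspect the full priority/weight/port/target tuples with a direct SRV query; a sketch against the three records above:
dig _http._tcp.example.com SRV +short
# 10 60 8080 app1.example.com.
# 10 40 8080 app2.example.com.
# 20 0 8080 app3.example.com.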
TTL and Caching¶
TTL (Time-To-Live) is the single most important operational parameter in DNS. It controls how long resolvers cache a response before asking the authoritative server again.
High TTL (3600-86400 seconds):
+ Less query load on authoritative servers
+ Faster responses for clients (served from cache)
- Slow propagation when you change records
- Long recovery time during incidents (clients stuck on old IP)
Low TTL (30-300 seconds):
+ Fast propagation of changes
+ Quick failover during incidents
- Higher query load on authoritative servers
- Slightly more latency on cache misses
- Some resolvers enforce a minimum TTL (often 30-60s) regardless of what you set
TTL Strategy for Migrations¶
Normal state: TTL 3600 (1 hour) or higher
48 hours before: Lower TTL to 60 seconds
(wait for old cached entries to expire)
Migration time: Change the record to new IP (TTL still 60)
Verify: Check from multiple vantage points
Post-migration: Raise TTL back to 3600 after 24-48 hours
The 48-hour lead time matters. If your current TTL is 86400 (24 hours), lowering it to 60 seconds only takes effect after existing cached entries expire — up to 24 hours later. Plan ahead.
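You can watch this play out: a cached answer's TTL counts down on every repeated query, and the resolver only re-fetches once it hits zero. Assuming the internal resolver at 10.0.1.10 used elsewhere on this page:
# Repeat this query; the TTL (second column) decreases toward zero
dig @10.0.1.10 app.example.com A +noall +answer
# app.example.com.   284   IN   A   10.0.2.100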
Negative Caching¶
When a name does not exist (NXDOMAIN), resolvers cache that negative result too. Per RFC 2308, the negative-caching duration is the lesser of the SOA record's own TTL and its minimum TTL field. If your SOA minimum is 3600 seconds and someone queries a name before you create it, the NXDOMAIN can stay cached for up to an hour even after you add the record.
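The negative-cache TTL travels with the NXDOMAIN itself, as the TTL on the SOA record in the AUTHORITY section of the response:
dig nosuch.example.com A
# ->>HEADER<<- ... status: NXDOMAIN
# AUTHORITY SECTION:
# example.com.   300   IN   SOA   ns1.example.com. admin.example.com. ...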
EDNS (Extension Mechanisms for DNS)¶
Standard DNS messages are limited to 512 bytes over UDP. EDNS0 (RFC 6891) extends this limit, typically to 4096 bytes. This matters because:
- DNSSEC responses are larger than 512 bytes (signatures add bulk)
- Responses with many records (round-robin A records, large TXT records) may exceed 512 bytes
- Without EDNS, the server returns a truncated response and the client retries over TCP (slower)
# Check EDNS support
dig +edns=0 +bufsize=4096 example.com
# If you see "EDNS: version: 0, flags:; udp: 4096" in output, EDNS is working
Some broken middleboxes (firewalls, load balancers) strip or block EDNS. This causes DNSSEC failures and truncation issues. If you see intermittent DNS failures that correlate with response size, suspect EDNS problems.
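One way to probe for this is to compare a small and a large advertised buffer; DNSKEY responses for a signed zone are reliably large. Behavior varies by network path, so treat this as a diagnostic sketch rather than a definitive test:
# Small buffer: expect the tc (truncated) flag and a TCP retry
dig +bufsize=512 example.com DNSKEY
# Large buffer: if this times out while the small query succeeds,
# a middlebox is likely dropping large or fragmented UDP responses
dig +bufsize=4096 example.com DNSKEY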
DNS over HTTPS (DoH) and DNS over TLS (DoT)¶
Traditional DNS is unencrypted UDP on port 53. Anyone on the network path can see and modify DNS queries. DoH and DoT encrypt DNS traffic.
DNS over TLS (DoT):
- Port 853
- TLS wrapper around standard DNS wire format
- Easy to block (dedicated port)
- Used by: systemd-resolved, Android 9+, Unbound
DNS over HTTPS (DoH):
- Port 443 (same as HTTPS)
- DNS queries inside HTTP/2 or HTTP/3
- Harder to block (mixed with regular HTTPS traffic)
- Used by: Firefox, Chrome, Cloudflare, Google
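For quick shell testing, Cloudflare and Google expose a JSON flavor of DoH (browsers use the binary wire format over the same transport, but the JSON API is easier to read):
# Query Cloudflare's DoH endpoint and get a JSON answer
curl -s -H 'accept: application/dns-json' \
  'https://cloudflare-dns.com/dns-query?name=app.example.com&type=A'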
Configuring DoT with systemd-resolved¶
# /etc/systemd/resolved.conf
[Resolve]
DNS=1.1.1.1#cloudflare-dns.com 8.8.8.8#dns.google
DNSOverTLS=yes
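After restarting systemd-resolved, verify that DoT is actually in use. Exact output varies by systemd version, but the Protocols line should show +DNSOverTLS:
systemctl restart systemd-resolved
resolvectl status | grep -iE 'protocols|dns servers'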
Split-Horizon DNS¶
Split-horizon DNS returns different answers based on the source of the query. Internal clients get private IPs; external clients get public IPs.
BIND Views¶
view "internal" {
match-clients { 10.0.0.0/8; 172.16.0.0/12; 192.168.0.0/16; 127.0.0.0/8; };
zone "example.com" {
type master;
file "zones/example.com.internal";
};
};
view "external" {
match-clients { any; };
zone "example.com" {
type master;
file "zones/example.com.external";
};
};
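A sanity check after deploying views: the same name should answer differently depending on the client's source address. Addresses here follow this page's running examples; the public IP is whatever your external zone file publishes:
# From a 10.0.0.0/8 client, the "internal" view matches:
dig @10.0.1.10 app.example.com +short    # expect the private IP (10.0.2.100)
# From an external vantage point, the query falls through to "external":
dig app.example.com +short               # expect the public IP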
Cloud Split-Horizon¶
In AWS, use Route 53 private hosted zones associated with your VPCs. Queries from within the VPC resolve to private IPs; queries from outside resolve via the public hosted zone.
In GCP, use Cloud DNS private zones. In Azure, use Azure Private DNS zones.
Zone Files and Zone Transfers¶
Zone File Format¶
$TTL 300
$ORIGIN example.com.
@ IN SOA ns1.example.com. admin.example.com. (
2026031901 ; Serial
3600 ; Refresh
900 ; Retry
604800 ; Expire
300 ; Minimum TTL
)
; Nameservers
IN NS ns1.example.com.
IN NS ns2.example.com.
; Mail
IN MX 10 mail.example.com.
IN MX 20 mail-backup.example.com.
; Nameserver address records (the parent zone holds matching glue for the delegation)
ns1 IN A 10.0.1.10
ns2 IN A 10.0.1.11
; Services
app IN A 10.0.2.100
app IN A 10.0.2.101 ; Round-robin
mail IN A 10.0.2.200
www IN CNAME app.example.com.
; TXT records
@ IN TXT "v=spf1 mx -all"
@ IN CAA 0 issue "letsencrypt.org"
Key syntax rules:
- @ means the zone origin (example.com.)
- Names without a trailing dot are relative to $ORIGIN
- Names WITH a trailing dot are absolute (fully qualified)
- www means www.example.com. but www.example.com (no dot) means www.example.com.example.com.
Zone Transfers (AXFR/IXFR)¶
Zone transfers replicate zone data from primary to secondary nameservers:
- AXFR (full transfer): Transfers the entire zone. Used for initial replication or when incremental data is unavailable.
- IXFR (incremental transfer): Transfers only changes since a given serial number. More efficient for large zones with small changes.
# BIND primary configuration
zone "example.com" {
type master;
file "zones/example.com";
allow-transfer { 10.0.1.11; 10.0.1.12; }; # Only secondaries
also-notify { 10.0.1.11; 10.0.1.12; }; # Push notifications
};
# BIND secondary configuration
zone "example.com" {
type slave;
file "zones/example.com.slave";
masters { 10.0.1.10; };
};
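To verify the allow-transfer ACL, request a full transfer from an allowed secondary and from anywhere else:
# From 10.0.1.11 (listed in allow-transfer): dumps the entire zone
dig @10.0.1.10 example.com AXFR
# From any other host, the same command should print "Transfer failed."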
DNS Server Software¶
BIND (named)¶
The oldest and most widely deployed DNS server. It can act as both an authoritative server and a recursive resolver. Configuration is complex but extremely flexible.
# Check config syntax
named-checkconf /etc/named.conf
named-checkzone example.com /var/named/zones/example.com
# Reload a zone without restarting
rndc reload example.com
# Flush resolver cache
rndc flush
# Dump cache to file for inspection
rndc dumpdb -cache
cat /var/named/data/cache_dump.db
Unbound¶
Purpose-built recursive resolver. Generally faster than BIND for pure recursion, with a much smaller attack surface. Does not serve authoritative zones. Excellent for internal resolver infrastructure.
# /etc/unbound/unbound.conf
server:
interface: 0.0.0.0
access-control: 10.0.0.0/8 allow
access-control: 127.0.0.0/8 allow
# Performance
num-threads: 4
msg-cache-size: 128m
rrset-cache-size: 256m
# Security
hide-identity: yes
hide-version: yes
harden-glue: yes
harden-dnssec-stripped: yes
# DNSSEC
auto-trust-anchor-file: "/var/lib/unbound/root.key"
# Forward specific zones to internal DNS
forward-zone:
name: "internal.example.com."
forward-addr: 10.0.1.10
forward-addr: 10.0.1.11
# Forward everything else to upstream
forward-zone:
name: "."
forward-addr: 1.1.1.1
forward-addr: 8.8.8.8
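Unbound ships with tooling to validate and inspect this config (unbound-control additionally requires the remote-control section to be enabled, which is not shown above):
# Syntax-check before reloading
unbound-checkconf /etc/unbound/unbound.conf
# Runtime statistics: cache hit rate, recursion time, etc.
unbound-control stats_noreset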
dnsmasq¶
Lightweight DNS forwarder and DHCP server. Common in home routers, development environments, and small networks. Not suitable for large-scale authoritative DNS.
# /etc/dnsmasq.conf
listen-address=127.0.0.1,10.0.1.1
cache-size=10000
no-resolv
server=8.8.8.8
server=8.8.4.4
# Override specific names
address=/app.local/10.0.2.100
# Forward specific domain to internal DNS
server=/internal.example.com/10.0.1.10
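A quick functional test from a client of this dnsmasq instance, matching the rules above (host1 is a placeholder name):
# address=/app.local/... answers locally
dig @10.0.1.1 app.local +short                   # expect 10.0.2.100
# server=/internal.example.com/... forwards to 10.0.1.10
dig @10.0.1.1 host1.internal.example.com +short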
CoreDNS¶
Cloud-native, plugin-based DNS server. Written in Go. Default DNS server in Kubernetes since 1.13. Configuration is a Corefile — a chain of plugins that process queries in order.
# Corefile
.:53 {
errors
health {
lameduck 5s
}
ready
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
ttl 30
}
prometheus :9153
forward . /etc/resolv.conf {
max_concurrent 1000
}
cache 30
loop
reload
loadbalance
}
Plugin order matters, but it is fixed at build time in plugin.cfg, not by the order lines appear in the Corefile. cache sits before forward in that chain, so responses returned by the upstream are cached on their way back to the client. loop detects forwarding loops and halts CoreDNS if one is found. loadbalance randomizes A/AAAA record order for round-robin distribution.
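To exercise this Corefile from inside a cluster, a throwaway pod is the usual approach (the image tag is illustrative):
# One-off pod; its /etc/resolv.conf points at the cluster's CoreDNS service
kubectl run dnstest --rm -it --image=busybox:1.36 --restart=Never -- \
  nslookup kubernetes.default.svc.cluster.local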
The Linux DNS Stack¶
/etc/resolv.conf¶
# Traditional resolv.conf
nameserver 10.0.1.10
nameserver 10.0.1.11
search example.com internal.example.com
options timeout:2 attempts:3 ndots:1
- nameserver: Up to 3 recursive resolvers. Tried in order, failover on timeout.
- search: Suffix list appended to short names. app becomes app.example.com, then app.internal.example.com.
- ndots: If a name has fewer dots than ndots, search domains are tried first. Default is 1. Kubernetes sets this to 5.
- timeout: Seconds to wait for a response before trying the next nameserver.
- attempts: Number of times to try the entire nameserver list.
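One debugging trap: dig ignores the search list by default, so it behaves differently from the stub resolver that applications use. The +search flag restores stub-like behavior:
dig app +short            # queries the literal name "app." (likely NXDOMAIN)
dig +search app +short    # applies the search list: tries app.example.com. first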
/etc/nsswitch.conf¶
Controls the order of name resolution sources:
hosts:      files dns myhostname
This means: check /etc/hosts first, then DNS, then fall back to the local hostname. If /etc/hosts has an entry for app.example.com, DNS is never queried for that name.
systemd-resolved¶
Modern Linux systems often use systemd-resolved as a local caching stub resolver. It listens on 127.0.0.53 and manages DNS configuration per-interface.
# Check status
resolvectl status
# Query through resolved
resolvectl query app.example.com
# Flush cache
resolvectl flush-caches
# Show cache statistics
resolvectl statistics
# Set DNS for an interface
resolvectl dns eth0 10.0.1.10 10.0.1.11
resolvectl domain eth0 example.com
# Check current DNS configuration
resolvectl dns
resolvectl domain
When systemd-resolved is active, /etc/resolv.conf is typically a symlink to /run/systemd/resolve/stub-resolv.conf (pointing to 127.0.0.53). Do not edit it directly — configure via systemd-resolved or /etc/systemd/resolved.conf.
DNS Load Balancing¶
Round-Robin¶
The simplest approach: multiple A records for the same name.
app.example.com. 300 IN A 10.0.2.100
app.example.com. 300 IN A 10.0.2.101
app.example.com. 300 IN A 10.0.2.102
Resolvers rotate the order of returned records. Clients typically connect to the first address. This gives rough load distribution but no health checking — if 10.0.2.101 is down, ~33% of clients will try it first and experience delays.
Weighted Routing (Route 53)¶
{
"Name": "app.example.com",
"Type": "A",
"SetIdentifier": "primary",
"Weight": 70,
"TTL": 60,
"ResourceRecords": [{"Value": "10.0.2.100"}]
}
{
"Name": "app.example.com",
"Type": "A",
"SetIdentifier": "secondary",
"Weight": 30,
"TTL": 60,
"ResourceRecords": [{"Value": "10.0.2.101"}]
}
70% of DNS responses return the primary IP, 30% return the secondary. Useful for canary deployments and gradual traffic shifting.
Latency-Based Routing¶
Route 53 and Cloud DNS can route based on the resolver's geographic location, returning the IP of the nearest datacenter. This is measured latency, not geographic distance.
Failover Routing¶
Active-passive: return the primary record unless a health check fails, then return the secondary.
# Route 53 health check + failover
aws route53 create-health-check --caller-reference $(date +%s) \
--health-check-config '{
"IPAddress": "10.0.2.100",
"Port": 443,
"Type": "HTTPS",
"ResourcePath": "/health",
"RequestInterval": 10,
"FailureThreshold": 3
}'
DNSSEC Basics¶
DNSSEC adds cryptographic signatures to DNS responses, allowing resolvers to verify that answers have not been tampered with.
Without DNSSEC:
Resolver asks: "What is app.example.com?"
Attacker intercepts, returns malicious IP
Resolver has no way to detect the forgery
With DNSSEC:
Authoritative server signs each record with a private key
Resolver verifies signature using the published DNSKEY record
Forged or modified responses fail signature verification
Key Types¶
- ZSK (Zone Signing Key): Signs individual records in the zone. Rotated frequently (every 1-3 months).
- KSK (Key Signing Key): Signs the DNSKEY record set. Rotated less frequently (every 1-2 years).
- DS (Delegation Signer): A hash of the KSK, published in the parent zone. Creates the chain of trust from parent to child.
Chain of Trust¶
Root (.) — signs .com DS record
|
.com TLD — signs example.com DS record
|
example.com — KSK signs DNSKEY RRset
— ZSK signs all other records
Checking DNSSEC¶
# Check if a domain has DNSSEC
dig example.com +dnssec +short
# Full DNSSEC validation trace
dig example.com +trace +dnssec
# Check DS record in parent zone
dig DS example.com @a.gtld-servers.net +short
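A validating resolver sets the ad (authenticated data) flag on answers that passed DNSSEC validation; a quick spot-check against a public validating resolver:
dig @1.1.1.1 example.com A +dnssec | grep flags
# ;; flags: qr rd ra ad; ...    <- "ad" means the answer validated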
Private DNS Zones¶
AWS Route 53¶
# Create a private hosted zone
aws route53 create-hosted-zone \
--name internal.example.com \
--vpc VPCRegion=us-east-1,VPCId=vpc-abc123 \
--caller-reference $(date +%s) \
--hosted-zone-config PrivateZone=true
# Associate with additional VPCs
aws route53 associate-vpc-with-hosted-zone \
--hosted-zone-id Z123456 \
--vpc VPCRegion=us-west-2,VPCId=vpc-def456
GCP Cloud DNS¶
gcloud dns managed-zones create internal-zone \
--description="Internal DNS" \
--dns-name="internal.example.com." \
--visibility=private \
--networks=my-vpc
Azure Private DNS¶
az network private-dns zone create \
--resource-group myRG \
--name internal.example.com
az network private-dns link vnet create \
--resource-group myRG \
--zone-name internal.example.com \
--name mylink \
--virtual-network myVnet \
--registration-enabled false
Service Discovery via DNS¶
Consul DNS Interface¶
Consul provides a DNS interface for service discovery on port 8600 by default:
# Query a service
dig @127.0.0.1 -p 8600 web.service.consul SRV
# Returns:
# web.service.consul. 0 IN SRV 1 1 8080 node1.node.dc1.consul.
# web.service.consul. 0 IN SRV 1 1 8080 node2.node.dc1.consul.
# Query with tag
dig @127.0.0.1 -p 8600 production.web.service.consul
# Forward .consul domain from your resolver
# dnsmasq:
server=/consul/127.0.0.1#8600
# Unbound:
forward-zone:
name: "consul."
forward-addr: 127.0.0.1@8600
Kubernetes DNS¶
Kubernetes assigns DNS names to Services and Pods automatically:
Service: <service>.<namespace>.svc.cluster.local
Pod: <pod-ip-dashed>.<namespace>.pod.cluster.local
Headless: <pod-name>.<service>.<namespace>.svc.cluster.local
Examples:
api-server.production.svc.cluster.local → ClusterIP
10-0-2-100.production.pod.cluster.local → Pod IP
api-server-0.api-server.production.svc.cluster.local → StatefulSet pod
Headless services (ClusterIP: None) return individual Pod IPs instead of a single virtual IP, enabling client-side load balancing and StatefulSet addressing.
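The difference is easy to see with dig from inside the cluster. The names and the ClusterIP below are illustrative, following the examples above:
# Regular Service: a single virtual IP
dig +short api-server.production.svc.cluster.local
# 10.96.0.17
# Headless Service (clusterIP: None): one A record per ready Pod
dig +short api-server-headless.production.svc.cluster.local
# 10.0.2.100
# 10.0.2.101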
DNS in Cloud Environments¶
Route 53 Key Concepts¶
- Hosted zones: Containers for DNS records. Public zones serve the internet; private zones serve VPCs.
- Alias records: Route 53 extension that maps to AWS resources (ELB, CloudFront, S3) without a CNAME. Works at the zone apex.
- Health checks: Probe endpoints and remove unhealthy records from responses.
- Routing policies: Simple, weighted, latency, failover, geolocation, multivalue answer.
Cloud DNS Resolver Endpoints¶
In hybrid environments (on-premises + cloud), you need DNS forwarding between networks:
On-premises DNS server → Route 53 Resolver Inbound Endpoint
(resolves cloud private zones from on-prem)
Route 53 Resolver Outbound Endpoint → On-premises DNS server
(resolves on-prem zones from cloud)
This replaces hacks like running BIND forwarders in EC2 instances.
Wiki Navigation¶
Prerequisites¶
- Networking Deep Dive (Topic Pack, L1)
Next Steps¶
- AWS Route 53 (Topic Pack, L2)
Related Content¶
- AWS Route 53 (Topic Pack, L2) — DNS
- Case Study: CoreDNS Timeout Pod DNS (Case Study, L2) — DNS
- Case Study: DNS Looks Broken — TLS Expired, Fix Is Cert-Manager (Case Study, L2) — DNS
- Case Study: DNS Resolution Slow (Case Study, L1) — DNS
- Case Study: DNS Split Horizon Confusion (Case Study, L2) — DNS
- DHCP & IP Address Management (Topic Pack, L1) — DNS
- DNS Flashcards (CLI) (flashcard_deck, L1) — DNS
- DNS Operations (Topic Pack, L2) — DNS
- Incident Simulator (18 scenarios) (CLI) (Exercise Set, L2) — DNS
- Networking Deep Dive (Topic Pack, L1) — DNS
Pages that link here¶
- Anti-Primer: DNS Deep Dive
- DHCP & IP Address Management
- DNS Deep Dive
- DNS Operations
- DNS Resolution Taking 5+ Seconds Intermittently
- DNS Split-Horizon Confusion
- Incident Replay: DNS Resolution Slow
- Networking Deep Dive
- Networking Drills
- Production Readiness Review: Answer Key
- Production Readiness Review: Study Plans
- Symptoms
- Symptoms: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager