Portal | Level: L2: Operations | Topics: DNS | Domain: Networking

DNS Operations - Primer

Why This Matters

DNS is the invisible foundation under every service you run. When DNS works, nobody thinks about it. When it breaks, everything breaks — and the symptoms look like application failures, network issues, or authentication problems until someone finally thinks to check name resolution. I have spent more hours debugging "application issues" that turned out to be DNS problems than I care to admit. Understanding DNS deeply — from BIND zone files to CoreDNS in Kubernetes — is a core ops skill that pays off every single week.

DNS is also one of the most commonly misconfigured pieces of infrastructure. A bad TTL decision can mean hours of stale records after a migration. A missing PTR record can break email delivery. A split-horizon mistake can make internal services unreachable from the wrong network.

Core Concepts

1. DNS Resolution Flow

User types app.example.com
         |
         v
Local stub resolver and cache (servers listed in /etc/resolv.conf)
         |  (cache miss)
         v
Recursive Resolver (ISP or 8.8.8.8)
         |
         v
Root Servers (.)
         |  "Ask .com servers"
         v
TLD Servers (.com)
         |  "Ask example.com nameservers"
         v
Authoritative Server (ns1.example.com)
         |  "app.example.com = 10.0.1.50"
         v
Answer cached at each layer (per TTL)
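The walk in the diagram can be sketched as a toy simulation. This is a minimal sketch, not a real resolver: the `SERVERS` table, the server labels, and the `resolve` function are hypothetical stand-ins for the root, TLD, and authoritative servers, and the cache here models only a single layer.

```python
import time

# Hypothetical in-memory stand-ins for the servers in the diagram.
# Each server either refers the query to a more specific zone or answers it.
SERVERS = {
    "root": {"referrals": {"com.": "tld-com"}, "answers": {}},
    "tld-com": {"referrals": {"example.com.": "ns1.example.com"}, "answers": {}},
    "ns1.example.com": {
        "referrals": {},
        "answers": {"app.example.com.": ("10.0.1.50", 300)},  # (address, TTL)
    },
}


def resolve(name, cache, now=None):
    """Return (address, path) where path lists the servers that were asked."""
    now = time.time() if now is None else now

    # Serve from cache while the stored expiry time is still in the future.
    hit = cache.get(name)
    if hit and hit[1] > now:
        return hit[0], ["cache"]

    server, path = "root", []
    while True:
        path.append(server)
        zone = SERVERS[server]
        if name in zone["answers"]:
            address, ttl = zone["answers"][name]
            cache[name] = (address, now + ttl)  # cache for TTL seconds
            return address, path
        # Follow the most specific referral that covers the name.
        matches = [z for z in zone["referrals"] if name == z or name.endswith("." + z)]
        if not matches:
            raise LookupError(f"NXDOMAIN: {name}")
        server = zone["referrals"][max(matches, key=len)]
```

The first lookup walks root, TLD, and authoritative servers; repeat lookups within the 300-second TTL are served from the cache, and the walk happens again once the TTL has expired.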

Name origin: DNS was invented by Paul Mockapetris in 1983 (RFCs 882 and 883, later superseded by RFCs 1034/1035). Before DNS, hostname-to-IP mappings lived in a single file called HOSTS.TXT maintained by the Stanford Research Institute. Every machine on the ARPANET fetched a fresh copy via FTP. By the early 1980s, the file was changing so frequently that it was already stale by the time it was downloaded. DNS replaced this with a distributed, hierarchical database.

2. Record Types That Matter

| Type  | Purpose                    | Example |
|-------|----------------------------|---------|
| A     | IPv4 address               | app.example.com. 300 IN A 10.0.1.50 |
| AAAA  | IPv6 address               | app.example.com. 300 IN AAAA 2001:db8::1 |
| CNAME | Alias to another name      | www.example.com. 300 IN CNAME app.example.com. |
| MX    | Mail exchange              | example.com. 3600 IN MX 10 mail.example.com. |
| NS    | Nameserver delegation      | example.com. 86400 IN NS ns1.example.com. |
| PTR   | Reverse lookup             | 50.1.0.10.in-addr.arpa. 3600 IN PTR app.example.com. |
| SOA   | Start of authority         | Serial, refresh, retry, expire, minimum TTL |
| SRV   | Service location           | _http._tcp.example.com. 300 IN SRV 10 0 8080 app.example.com. |
| TXT   | Arbitrary text             | SPF records, DKIM, domain verification |
| CAA   | Certificate authority auth | example.com. 3600 IN CAA 0 issue "letsencrypt.org" |
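The example column uses the standard presentation format: owner name, TTL, class, type, then type-specific data. As a rough illustration, the sketch below splits such a line into fields and derives the PTR owner name for an address using the standard library's `ipaddress` module. The helper names `parse_rr` and `ptr_owner_name` are my own, and the parser assumes every field is present on one line (real zone files allow omitted fields and multi-line records).

```python
import ipaddress
from collections import namedtuple

ResourceRecord = namedtuple("ResourceRecord", "name ttl rclass rtype rdata")


def parse_rr(line):
    """Split a single-line record with all five fields present."""
    name, ttl, rclass, rtype, rdata = line.split(None, 4)
    return ResourceRecord(name, int(ttl), rclass, rtype, rdata)


def ptr_owner_name(address):
    """Return the fully qualified reverse-lookup name for an IPv4 or IPv6 address."""
    # reverse_pointer omits the trailing dot, so add it to make the name absolute.
    return ipaddress.ip_address(address).reverse_pointer + "."


mx = parse_rr("example.com. 3600 IN MX 10 mail.example.com.")
print(mx.rtype, mx.rdata)           # type and type-specific data
print(ptr_owner_name("10.0.1.50"))  # owner name of the PTR record
```

For MX records the data field carries both the preference and the target, which is why the parser stops splitting after the fourth field.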

3. BIND Configuration

BIND (named) is the most widely deployed authoritative DNS server. It has been around since the 1980s and runs a significant portion of the internet's DNS infrastructure.

Name origin: BIND stands for Berkeley Internet Name Domain. It was written by four UC Berkeley graduate students in the early 1980s as part of a DARPA grant. The daemon is called named — literally "name daemon." BIND is now maintained by ISC (Internet Systems Consortium) and is the most widely deployed DNS software on Earth.

# /etc/named.conf (main config)
options {
    listen-on port 53 { 127.0.0.1; 10.0.1.10; };
    directory       "/var/named";
    allow-query     { localhost; 10.0.0.0/8; };
    allow-transfer  { 10.0.1.11; };  # Secondary DNS
    recursion no;                      # Authoritative only
    dnssec-validation auto;
};

zone "example.com" IN {
    type master;
    file "example.com.zone";
    allow-update { none; };
    notify yes;
};

zone "1.0.10.in-addr.arpa" IN {
    type master;
    file "10.0.1.rev";
};

; /var/named/example.com.zone
$TTL 300
@   IN  SOA ns1.example.com. admin.example.com. (
        2026031501  ; Serial (YYYYMMDDNN)
        3600        ; Refresh (1 hour)
        900         ; Retry (15 min)
        604800      ; Expire (1 week)
        300         ; Minimum (negative-caching TTL, 5 min)
    )

    IN  NS      ns1.example.com.
    IN  NS      ns2.example.com.
    IN  MX  10  mail.example.com.

ns1         IN  A       10.0.1.10
ns2         IN  A       10.0.1.11
app         IN  A       10.0.1.50
app         IN  A       10.0.1.51    ; Round-robin
mail        IN  A       10.0.1.60
www         IN  CNAME   app.example.com.
staging     IN  A       10.0.2.50
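The zone file above mixes relative names (ns1, app) with fully qualified ones (app.example.com.). A minimal sketch of the qualification rule, assuming the origin is the zone name example.com.; the function name `qualify` is hypothetical:

```python
def qualify(name, origin):
    """Expand a zone-file name the way the origin rule does."""
    if not origin.endswith("."):
        raise ValueError("origin must be fully qualified (end with a dot)")
    if name == "@":
        return origin              # "@" is shorthand for the origin itself
    if name.endswith("."):
        return name                # already fully qualified, left untouched
    return f"{name}.{origin}"      # relative name: the origin is appended


print(qualify("app", "example.com."))
print(qualify("app.example.com", "example.com."))  # missing dot: origin appended again
```

The second call shows why a forgotten trailing dot on a CNAME or NS target silently produces a name inside your own zone.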

Critical: The serial number MUST increase with every change. If it does not, secondary servers will not pick up the update.
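One way to make the serial rule hard to forget is to compute the next serial instead of typing it. The sketch below assumes the YYYYMMDDNN convention used in the zone file above, with 01 as the first change of the day; `next_serial` is a hypothetical helper, not part of BIND.

```python
from datetime import date


def next_serial(current, today):
    """Return a serial that is strictly greater than the current one."""
    todays_first = int(today.strftime("%Y%m%d")) * 100 + 1
    # Never go backwards: if the current serial is already at or past today's
    # range (many edits, or a clock problem), just add one.
    candidate = max(current + 1, todays_first)
    if candidate > 0xFFFFFFFF:
        raise OverflowError("SOA serial is an unsigned 32-bit value")
    return candidate


print(next_serial(2026031501, date(2026, 3, 15)))  # second change on the same day
print(next_serial(2026031501, date(2026, 3, 16)))  # first change on the next day
```

Taking the maximum keeps the serial increasing even when it has drifted ahead of the calendar date, which matters more to secondaries than keeping the date readable.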

4. Split-Horizon DNS

Split-horizon DNS returns different answers to internal and external clients. It is essential in environments where internal services use private IPs but external clients need public IPs.

# /etc/named.conf with views
view "internal" {
    match-clients { 10.0.0.0/8; 172.16.0.0/12; 192.168.0.0/16; };
    zone "example.com" {
        type master;
        file "example.com.internal.zone";
    };
};

view "external" {
    match-clients { any; };
    zone "example.com" {
        type master;
        file "example.com.external.zone";
    };
};

; Internal zone file: private IPs
app     IN  A   10.0.1.50

; External zone file: public IPs
app     IN  A   203.0.113.50
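Views are evaluated in the order they appear, and the first view whose match-clients list contains the client address wins. That is why the catch-all view must come last. A minimal sketch of that selection logic, with the view table copied from the config above; `select_view` is a hypothetical name and 0.0.0.0/0 stands in for `any` (IPv4 only in this sketch):

```python
import ipaddress

# Ordered list: first matching view wins.
VIEWS = [
    ("internal", ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]),
    ("external", ["0.0.0.0/0"]),  # stand-in for "any", IPv4 only
]


def select_view(client_ip):
    """Return the name of the first view whose networks contain the client."""
    address = ipaddress.ip_address(client_ip)
    for view_name, networks in VIEWS:
        if any(address in ipaddress.ip_network(net) for net in networks):
            return view_name
    return None


print(select_view("10.0.1.50"))    # a client on the private range
print(select_view("203.0.113.7"))  # a client on the public internet
```

If the two entries in `VIEWS` were swapped, every client would match the catch-all first and internal clients would receive public addresses.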

5. DNS Debugging

# dig - the primary DNS debugging tool
dig app.example.com                    # Basic A record lookup
dig app.example.com @8.8.8.8          # Query specific server
dig app.example.com +short            # Just the answer
dig app.example.com +trace            # Full resolution path
dig -x 10.0.1.50                      # Reverse lookup
dig example.com MX                    # MX records
dig example.com NS                    # Nameservers
dig example.com SOA                   # SOA record (serial)
dig example.com ANY +noall +answer    # ANY query (often a minimal answer, per RFC 8482)
dig example.com +dnssec               # Show DNSSEC info

# Check zone transfer
dig @ns1.example.com example.com AXFR

# nslookup (simpler, widely available)
nslookup app.example.com
nslookup -type=MX example.com
nslookup app.example.com 10.0.1.10   # Query specific server

# DNS traffic capture
tcpdump -n -i eth0 port 53 -w dns.pcap
tcpdump -n -i eth0 port 53 -l        # Live output

# Check /etc/resolv.conf
cat /etc/resolv.conf
# nameserver 10.0.1.10
# nameserver 10.0.1.11
# search example.com
# options timeout:2 attempts:3
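When checking many hosts, it can help to read resolv.conf programmatically rather than by eye. The sketch below parses the directives shown above (nameserver, search, options) and ignores everything else; `parse_resolv_conf` is a hypothetical helper and deliberately does not cover every directive the resolver supports.

```python
def parse_resolv_conf(text):
    """Parse nameserver, search, and options lines from resolv.conf content."""
    config = {"nameservers": [], "search": [], "options": {}}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # blank lines and comment lines
            continue
        keyword, *args = line.split()
        if keyword == "nameserver" and args:
            config["nameservers"].append(args[0])
        elif keyword == "search":
            config["search"] = args      # a later search line replaces earlier ones
        elif keyword == "options":
            for option in args:
                key, _, value = option.partition(":")
                config["options"][key] = int(value) if value.isdigit() else True
    return config


sample = """\
nameserver 10.0.1.10
nameserver 10.0.1.11
search example.com
options timeout:2 attempts:3
"""
print(parse_resolv_conf(sample))
```

On a live host the same function can be fed the contents of /etc/resolv.conf, for example to flag machines still pointing at a decommissioned resolver.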

6. TTL Strategy

TTL (Time-To-Live) = how long resolvers cache the answer

High TTL (3600-86400):
  + Less load on authoritative servers
  + Faster resolution for clients (cached)
  - Slow propagation of changes
  - Long outage if you need to change an IP quickly

Low TTL (30-300):
  + Fast propagation of changes
  + Quick failover during incidents
  - More load on authoritative servers
  - More cache misses, so slightly higher average lookup latency

Strategy:
  Normal operations: 300-3600 seconds
  Before a migration: Lower to 60 seconds at least one old TTL ahead (48 hours covers TTLs up to 172800)
  During migration: Keep at 60 seconds
  After migration verified: Raise back to 300-3600
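The timing in the strategy follows from simple arithmetic: a resolver that cached the record just before you lowered the TTL keeps the old TTL for its full duration. A minimal sketch, assuming resolvers honor TTLs exactly (some clamp them); both function names are hypothetical:

```python
from datetime import datetime, timedelta


def earliest_safe_cutover(lowered_at, old_ttl_seconds):
    """Earliest time at which every cache has picked up the lowered TTL."""
    return lowered_at + timedelta(seconds=old_ttl_seconds)


def worst_case_stale_until(changed_at, ttl_seconds):
    """Latest time a cache may still serve the record's previous value."""
    return changed_at + timedelta(seconds=ttl_seconds)


lowered = datetime(2026, 3, 13, 9, 0)
print(earliest_safe_cutover(lowered, 86400))   # old TTL was 24 hours

cutover = datetime(2026, 3, 15, 9, 0)
print(worst_case_stale_until(cutover, 60))     # TTL is now 60 seconds
```

With an old TTL of 86400, lowering the TTL 48 hours ahead leaves a full day of margin; with the TTL at 60, clients see the new address within about a minute of the change.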

7. CoreDNS in Kubernetes

CoreDNS is the default DNS server in Kubernetes. It resolves service names, pod names, and external names for all cluster traffic.

Kubernetes DNS resolution:
  my-service                  → my-service.default.svc.cluster.local  (from a pod in namespace default)
  my-service.other-ns         → my-service.other-ns.svc.cluster.local
  my-service.other-ns.svc     → my-service.other-ns.svc.cluster.local

Pod DNS policy:
  dnsPolicy: ClusterFirst     → Use CoreDNS (default)
  dnsPolicy: Default          → Use node's /etc/resolv.conf
  dnsPolicy: None             → Use custom dnsConfig

# CoreDNS Corefile (from ConfigMap)
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
          lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
          ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
          max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }

8. DNSSEC Basics

DNSSEC adds cryptographic signatures to DNS responses.

Without DNSSEC:
  Client asks: "What is app.example.com?"
  Attacker intercepts and returns: "192.168.1.1" (malicious IP)
  Client has no way to verify the answer

With DNSSEC:
  Authoritative server signs responses with private key
  Resolver verifies signature using published public key (DNSKEY)
  Forged responses fail signature verification

Keys and records:
  KSK (Key Signing Key) → signs the zone's DNSKEY records
  ZSK (Zone Signing Key) → signs all other records in the zone
  DS (Delegation Signer) → hash of the KSK, published in the parent zone, chains trust
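To make the chain of trust concrete, the sketch below computes a SHA-256 DS digest from a DNSKEY, following my reading of RFC 4034 section 5.1.4: the digest covers the owner name in wire format followed by the DNSKEY RDATA. The key bytes are a placeholder, not a real key, and the function names are hypothetical. Algorithm 13 (ECDSA P-256) and flags 257 (KSK) are assumed for the example.

```python
import hashlib
import struct


def name_to_wire(name):
    """Encode a domain name as lowercase length-prefixed labels plus a root byte."""
    wire = b""
    for label in name.lower().rstrip(".").split("."):
        if not label:
            continue
        raw = label.encode("ascii")
        if len(raw) > 63:
            raise ValueError("label longer than 63 bytes")
        wire += bytes([len(raw)]) + raw
    return wire + b"\x00"


def dnskey_rdata(flags, protocol, algorithm, public_key):
    """Build DNSKEY RDATA: 16-bit flags, 8-bit protocol, 8-bit algorithm, key."""
    return struct.pack("!HBB", flags, protocol, algorithm) + public_key


def ds_digest_sha256(owner, rdata):
    """Digest that the parent zone publishes in the DS record (digest type 2)."""
    return hashlib.sha256(name_to_wire(owner) + rdata).hexdigest().upper()


placeholder_key = bytes(range(64))  # NOT a real key, illustration only
ksk = dnskey_rdata(flags=257, protocol=3, algorithm=13, public_key=placeholder_key)
print(ds_digest_sha256("example.com.", ksk))
```

A validating resolver performs the same computation on the DNSKEY it receives from the child zone and compares the result with the DS record it already trusts from the parent.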

Common Pitfalls

Remember: A mnemonic for the core record types A, CNAME, MX, NS, SOA is "A Cat Might Nap Soundly On Anything": A (address), CNAME (alias), MX (mail), NS (nameserver), SOA (start of authority). These are the types you will run into most often.

Debug clue: When dig returns NXDOMAIN, the name does not exist at all. When it returns NOERROR with an empty answer section, the name exists but has no record of the type you asked for (e.g., querying AAAA for a name that only has an A record). These two conditions look identical to applications but have very different causes.
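The distinction can be checked mechanically from dig's header lines. The sketch below is a rough classifier that assumes the usual dig header layout (a `status:` field and an `ANSWER:` count); the sample text is illustrative rather than captured output, and `classify_dig_output` is a hypothetical helper. Note that an empty NOERROR answer from an authoritative server can also be a referral, which this sketch does not distinguish.

```python
import re


def classify_dig_output(text):
    """Classify a dig response as NXDOMAIN, NODATA, ANSWER, or another status."""
    status = re.search(r"status:\s*(\w+)", text)
    answers = re.search(r"ANSWER:\s*(\d+)", text)
    if not status or not answers:
        raise ValueError("dig header lines not found")
    rcode, count = status.group(1), int(answers.group(1))
    if rcode == "NXDOMAIN":
        return "NXDOMAIN"   # the name does not exist at all
    if rcode == "NOERROR" and count == 0:
        return "NODATA"     # the name exists, but not with this record type
    if rcode == "NOERROR":
        return "ANSWER"
    return rcode            # SERVFAIL, REFUSED, and so on


sample = """\
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4242
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
"""
print(classify_dig_output(sample))
```

NXDOMAIN usually points at a typo, a missing record, or the wrong zone; NODATA usually points at querying the wrong type.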

War story: One of the most infamous DNS outages was the 2016 Dyn DDoS attack. The Mirai botnet flooded Dyn's authoritative DNS infrastructure with traffic from compromised IoT devices, making Twitter, Reddit, Netflix, and GitHub unreachable for hours. The root cause was not a DNS misconfiguration. It was that too many major services depended on a single DNS provider without a secondary provider.

  1. Forgetting the trailing dot: In zone files, app.example.com (no dot) becomes app.example.com.example.com. because the zone origin is appended. The trailing dot means "fully qualified."
  2. Not incrementing the SOA serial: You edit the zone file but forget to increment the serial, so secondary servers never pick up the change. Use YYYYMMDDNN format.
  3. CNAME at the zone apex: example.com. IN CNAME other.com. is illegal because a CNAME cannot coexist with the SOA and NS records the apex must carry (RFC 1034). Use ALIAS or ANAME if your DNS provider supports it, or use A records.
  4. TTL too high before a migration: Your records have 86400 TTL. You change the IP. Some clients cache the old IP for 24 hours. Lower the TTL before you need fast changes.
  5. Missing PTR records: Forward lookup works but reverse does not. This can break email delivery, SSH host-based checks, and some logging systems.
  6. Allowing zone transfers to anyone: allow-transfer { any; }; lets anyone dump your entire zone. Restrict transfers to secondary nameservers only.
  7. Search domain appended unexpectedly: /etc/resolv.conf has search example.com. A lookup for app first tries app.example.com. This causes confusion when internal and external names collide.
  8. Kubernetes ndots causing slow external lookups: The Kubernetes default ndots:5 means api.github.com (two dots, fewer than five) is tried against every search domain, typically three cluster suffixes plus any inherited from the node, before the real lookup. Override with dnsConfig for pods that make many external calls, or use a trailing dot on external names.
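Pitfalls 7 and 8 both come from how the stub resolver expands names using the search list and ndots. A minimal sketch of that ordering, based on the behavior described in resolv.conf(5); `candidate_queries` is a hypothetical name and the search list shown is the typical one for a pod in the default namespace:

```python
def candidate_queries(name, search, ndots):
    """Return the names the stub resolver tries, in order."""
    if name.endswith("."):
        return [name]  # fully qualified: the search list is skipped entirely
    expanded = [f"{name}.{domain.rstrip('.')}." for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        return [absolute] + expanded  # enough dots: try the name as given first
    return expanded + [absolute]      # too few dots: search domains first


pod_search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
for query in candidate_queries("api.github.com", pod_search, ndots=5):
    print(query)
```

With ndots set to 5, the external name costs three failed lookups before the real one; adding a trailing dot or lowering ndots in the pod's dnsConfig removes them.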
