DNSSEC & DNS Security — Street-Level Ops

Quick Diagnosis Commands

# Check if a domain has DNSSEC enabled (DS record in parent)
dig DS example.com
dig DS example.com @a.gtld-servers.net.   # ask parent directly

# Check zone's DNSKEY records
dig DNSKEY example.com +short

# Verify resolver is validating (AD flag = Authenticated Data)
dig +dnssec A www.example.com | grep "flags:"
# flags: qr rd ra ad  ← AD present = validation succeeded

# Test resolution with validation disabled (bypass DNSSEC)
dig +cd A www.example.com    # +cd = checking disabled

# Test intentionally broken DNSSEC domain
dig A www.dnssec-failed.org
# Should return SERVFAIL if resolver is validating

# Check RRSIG validity (expiry and inception dates)
# Query the covered type with +dnssec; the RRSIG comes back alongside the SOA
dig +dnssec SOA example.com

> **One-liner:** The AD (Authenticated Data) flag in a DNS response means the resolver validated the entire DNSSEC chain from the root to the answer. No AD flag means the zone is unsigned (or has no DS in the parent) or the resolver is not validating. A broken chain on a validating resolver shows up as SERVFAIL, not as an answer without AD.

# Trace full DNSSEC chain
dig +trace +dnssec A example.com

# Check NSEC3 configuration
dig NSEC3PARAM example.com
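
The AD-flag check above can be wrapped in a small helper for use in scripts. This is a minimal sketch rather than part of any tool: `has_ad_flag` is a hypothetical name, and it assumes dig's usual `;; flags: ...;` header line arrives on stdin.

```shell
# has_ad_flag (hypothetical helper): succeed if the dig header on stdin carries the AD flag.
# Assumes dig's usual header line, e.g. ";; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, ..."
has_ad_flag() {
    # Keep only the flag list between "flags:" and the first ";",
    # then look for "ad" as a whole word.
    sed -n 's/^;; flags:\([^;]*\);.*/\1/p' | grep -qw 'ad'
}

# Usage (needs network access, so left commented out):
# dig +dnssec A www.example.com | has_ad_flag && echo "validated" || echo "not validated"
```

Cutting the line at the first semicolon keeps the word match away from the `ADDITIONAL` counter later in the same header line.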

Gotcha: SERVFAIL After Adding DNSSEC

Symptom: Domain was working, you added DNSSEC, now everything returns SERVFAIL.

Rule: The most common cause is a mismatch between the DS record published in the parent zone and the DNSKEY in the child zone. The resolver validates the chain and fails.

# Step 1: Get the DS records from the parent
dig DS example.com @a.gtld-servers.net.
# example.com.  IN DS 12345 13 2 AABC1234...

# Step 2: Get the DNSKEY from the child zone and compute the expected DS
dig DNSKEY example.com | grep -v "^;" | dnssec-dsfromkey -f - example.com
# example.com. IN DS 12345 13 2 AABC1234...
# Compare key tag (a 16-bit value, 0-65535), algorithm, digest type, and digest.
# If these don't match → DS mismatch. Update the DS at the registrar.

# Step 3: Check if the DNSKEY is actually being served
dig DNSKEY example.com @ns1.example.com.   # query authoritative directly
# If DNSKEY is missing at authoritative but DS exists in parent → chain broken

# Step 4: Check RRSIG covering the DNSKEY
dig +dnssec DNSKEY example.com | grep RRSIG
# If no RRSIG on DNSKEY → zone is not being signed

# Step 5: Temporarily disable validation at resolver to confirm it's DNSSEC
dig +cd A example.com
# If this succeeds but non-+cd fails → definitely a DNSSEC chain break
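
Comparing the Step 1 and Step 2 output by eye is error-prone, because dig adds a TTL and may print long digests split by a space. The sketch below normalizes both forms before comparing. `normalize_ds` and `common_ds` are hypothetical helper names, and the approach assumes the usual dig and dnssec-dsfromkey line layouts.

```shell
# normalize_ds (hypothetical helper): reduce DS lines to "keytag alg digesttype DIGEST".
# Accepts dig answer lines (with TTL) and dnssec-dsfromkey output (without TTL),
# and rejoins digests that dig prints split by a space.
normalize_ds() {
    awk '/^;/ { next }
    {
        for (i = 1; i <= NF; i++) if ($i == "DS") break
        if (i + 4 > NF) next            # not a complete DS line
        digest = ""
        for (j = i + 4; j <= NF; j++) digest = digest $j
        print $(i + 1), $(i + 2), $(i + 3), toupper(digest)
    }' | sort -u
}

# common_ds (hypothetical helper): print DS records present in both inputs.
# $1 = DS text from the parent, $2 = DS text computed from the child DNSKEYs.
# Exit status 0 means at least one DS links parent and child.
common_ds() {
    parent=$(printf '%s\n' "$1" | normalize_ds)
    child=$(printf '%s\n' "$2" | normalize_ds)
    [ -n "$parent" ] || return 1
    printf '%s\n' "$child" | grep -Fx -e "$parent"
}

# Usage (needs network access and BIND tools, so left commented out):
# common_ds "$(dig +noall +answer DS example.com @a.gtld-servers.net.)" \
#           "$(dig DNSKEY example.com | dnssec-dsfromkey -f - example.com)" \
#     || echo "DS mismatch"
```

A non-empty result is enough for the chain to validate. During a rollover the two sets legitimately differ, so the check looks for overlap rather than equality.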

Gotcha: Signature Expiry Breaking Resolution

Symptom: Zone was working, now SERVFAIL, no recent changes. RRSIG records are present but old.

Rule: RRSIG records expire. If the signing process (cron job, auto-signer) stopped running, signatures expire and the zone becomes unresolvable.

# Check signature expiry (the RRSIG is returned alongside the SOA)
dig +dnssec SOA example.com
# RRSIG SOA 13 2 3600 20260115000000 20260101000000 ...
#           sig-expiry=20260115  sig-inception=20260101
# If today > sig-expiry → expired

# Check BIND signing status
rndc dnssec -status example.com

# Force re-sign immediately
rndc sign example.com
# or, to pick up changed key files and let the zone re-sign incrementally
rndc loadkeys example.com

# Check if the inline signing process is running
journalctl -u named | grep "zone.*signed"

# Prevent recurrence: verify the signing interval
# In named.conf dnssec-policy ("default" is a built-in policy name,
# so a custom policy needs its own; "standard" is only an example):
dnssec-policy "standard" {
    signatures-validity 30d;          # signatures valid 30 days
    signatures-validity-dnskey 30d;
    signatures-refresh 5d;            # re-sign when 5 days remain
};
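
The expiry comparison from the rule above ("if today > sig-expiry → expired") can be automated against dig answer lines. A minimal sketch, assuming dig's standard answer layout; `rrsig_status` is a hypothetical helper name, and the current time is passed in as an argument so the function is easy to test.

```shell
# rrsig_status (hypothetical helper): classify each RRSIG in dig answer output.
# $1 = current UTC time as YYYYMMDDHHMMSS, e.g. "$(date -u +%Y%m%d%H%M%S)".
# Answer-line layout: name ttl class RRSIG covered alg labels origttl expiry inception ...
rrsig_status() {
    awk -v now="$1" '$4 == "RRSIG" {
        expiry = $9; inception = $10
        # Fixed-width timestamps compare correctly as numbers.
        if (now + 0 > expiry + 0)          state = "expired"
        else if (now + 0 < inception + 0)  state = "not-yet-valid"
        else                               state = "valid"
        print $5, state, "expiry=" expiry
    }'
}

# Usage (needs network access, so left commented out):
# dig +dnssec +noall +answer SOA example.com | rrsig_status "$(date -u +%Y%m%d%H%M%S)"
```

A `not-yet-valid` result usually points at clock skew on the signer or the machine running the check, not at a stalled signing job.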

Gotcha: Zone Enumeration via NSEC Walking

Symptom: Security audit shows all internal subdomains exposed via NSEC chain walking.

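Rule: NSEC reveals the sorted list of all names in the zone. Switch to NSEC3 with a random salt, which publishes hashed names instead. This stops trivial walking, although collected hashes can still be attacked offline with a dictionary.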

# Demonstrate NSEC walking (educational — run on test zone only)
ldns-walk example.com 2>/dev/null | head -20
# Returns every name in the zone in alphabetical order

# Migrate to NSEC3
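# In BIND named.conf dnssec-policy ("default" is reserved for the built-in
# policy; "standard" is only an example name):
dnssec-policy "standard" {
    nsec3param iterations 0 optout no salt-length 8;
};

# Apply (reconfig re-reads named.conf; reloading only the zone does not)
rndc reconfig
rndc sign example.com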

# Verify NSEC3 is in use (no more NSEC records)
dig +dnssec NSEC example.com    # answer section should be empty (NSEC3 records appear in the authority section)
dig NSEC3PARAM example.com      # should return NSEC3PARAM record
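
The NSEC3PARAM check can be scripted as well. The sketch below reads one line of `dig +short NSEC3PARAM` output and flags a non-zero iteration count, which RFC 9276 advises against. `check_nsec3param` is a hypothetical helper name, and the return codes are an arbitrary choice for illustration.

```shell
# check_nsec3param (hypothetical helper): review one NSEC3PARAM record.
# stdin: a line from `dig +short NSEC3PARAM <zone>`: "hash-alg flags iterations salt"
# ("-" in the salt field means no salt).
check_nsec3param() {
    alg="" flags="" iterations="" salt=""
    read -r alg flags iterations salt || true
    if [ -z "$iterations" ]; then
        echo "no NSEC3PARAM: zone uses NSEC or is unsigned"
        return 1
    fi
    if [ "$iterations" -gt 0 ]; then
        echo "NSEC3 in use, but iterations=$iterations (RFC 9276 recommends 0)"
        return 2
    fi
    echo "NSEC3 in use: iterations=0 salt=$salt"
}

# Usage (needs network access, so left commented out):
# dig +short NSEC3PARAM example.com | check_nsec3param
```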

Pattern: DNSSEC Monitoring with Nagios/Icinga

Automate signature expiry alerts:

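#!/bin/bash
# check_dnssec_expiry.sh: alert when signatures expire within N days
ZONE=$1
WARN_DAYS=${2:-14}
CRIT_DAYS=${3:-7}

if [ -z "$ZONE" ]; then
    echo "UNKNOWN: usage: $0 zone [warn_days] [crit_days]"
    exit 3
fi

# Get SOA RRSIG expiry. With +short, the RRSIG line reads
# "SOA alg labels ttl expiry inception keytag signer signature",
# so matching $1 == "SOA" skips the SOA record itself, which is also printed.
EXPIRY=$(dig +short +dnssec SOA "$ZONE" | awk '$1 == "SOA" && length($5) == 14 {print $5; exit}')
# Format: 20260415000000

if [ -z "$EXPIRY" ]; then
    echo "UNKNOWN: no RRSIG found for the SOA of $ZONE"
    exit 3
fi

# GNU date syntax; RRSIG timestamps are UTC
EXPIRY_DATE=$(date -u -d "${EXPIRY:0:8}" +%s)
NOW=$(date +%s)
DAYS_LEFT=$(( (EXPIRY_DATE - NOW) / 86400 ))

if [ "$DAYS_LEFT" -lt "$CRIT_DAYS" ]; then
    echo "CRITICAL: DNSSEC signature for $ZONE expires in $DAYS_LEFT days"
    exit 2
elif [ "$DAYS_LEFT" -lt "$WARN_DAYS" ]; then
    echo "WARNING: DNSSEC signature for $ZONE expires in $DAYS_LEFT days"
    exit 1
else
    echo "OK: DNSSEC signature for $ZONE expires in $DAYS_LEFT days"
    exit 0
fi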

Pattern: Pre-Flight Check Before KSK Rollover

A failed KSK rollover can take down your entire domain. Always run this before submitting a new DS to the registrar:

# 1. Confirm new KSK is in the DNSKEY RRset (KSKs carry flags value 257, ZSKs 256)
dig +noall +answer DNSKEY example.com | awk '$4 == "DNSKEY" && $5 == 257' | wc -l
# Should be 2 (old KSK + new KSK during double-sign phase)

# 2. Confirm DNSKEY RRset is signed by both KSKs
dig +dnssec DNSKEY example.com | grep RRSIG

# 3. Compute DS record for the new KSK
dnssec-dsfromkey Kexample.com.+013+newKeyTag.key
# example.com. IN DS 54321 13 2 XXYYZZ...   (key tags are 16-bit, 0-65535)

# 4. Double-check old DS still exists at parent (must remain until TTL expires)
dig DS example.com @a.gtld-servers.net.

# 5. Query both parent DS and child DNSKEY from multiple vantage points
for ns in 8.8.8.8 1.1.1.1 9.9.9.9; do
    echo "=== $ns ==="
    dig DS example.com @$ns +short
    dig DNSKEY example.com @$ns +short
done

# 6. After submitting the new DS, wait for the parent to publish it and drop the old one
dig DS example.com @a.gtld-servers.net. +short
# Once only the new DS appears here, wait one more DS TTL so caches forget the old DS,
# then remove the old KSK from DNSKEY
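
Step 1 depends on telling KSKs and ZSKs apart, which the flags field of each DNSKEY record allows (257 for KSK, 256 for ZSK). A minimal sketch of that tally, assuming dig's `+noall +answer` line layout; `count_dnskeys` is a hypothetical helper name.

```shell
# count_dnskeys (hypothetical helper): tally KSKs and ZSKs in dig answer output.
# Answer-line layout: name ttl class DNSKEY flags protocol algorithm key...
count_dnskeys() {
    awk '$4 == "DNSKEY" && $5 == 257 { ksk++ }
         $4 == "DNSKEY" && $5 == 256 { zsk++ }
         END { printf "ksk=%d zsk=%d\n", ksk, zsk }'
}

# Usage (needs network access, so left commented out):
# dig +noall +answer DNSKEY example.com | count_dnskeys
```

During the double-sign phase the expected result is `ksk=2`, plus however many ZSKs the zone publishes.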

Scenario: DNS Cache Poisoning Suspected

Symptom: Users report being redirected to a wrong IP for a site. DNSSEC validation shows the correct record, but some users see different IPs.

# Step 1: Check what your resolver returns vs authoritative
dig A example.com @8.8.8.8            # Google's resolver
dig A example.com @your-resolver       # your resolver
dig A example.com @ns1.example.com.   # authoritative

# Step 2: Check if the resolver is validating (AD flag)
dig +dnssec A example.com @your-resolver | grep "flags:"
# No AD flag for a signed zone = resolver not validating = vulnerable

# Step 3: Enable DNSSEC validation on BIND
# /etc/named.conf:
options {
    dnssec-validation auto;
};
# Reload:
rndc reload

# Step 4: Check resolver randomizes source ports
# Capture outbound DNS queries and tally the source port
# (field 3 is the source addr.port; field 5 is the destination, always port 53)
tcpdump -i eth0 -n -c 100 'udp dst port 53' | \
    awk '{n = split($3, p, "."); print p[n]}' | sort -n | uniq -c
# If all queries use the same source port → randomization broken → vulnerable
# Source ports should be spread across the high port range (1024-65535)

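# Step 5: Check resolver software version (version.bind is a TXT record in class CHAOS;
# many resolvers hide or override it)
dig +short chaos txt version.bind @your-resolver
# BIND releases older than the July 2008 patches (9.3.5-P1, 9.4.2-P1, 9.5.0-P1) don't randomize ports. Upgrade immediately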
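
The source-port check in Step 4 boils down to counting distinct source ports in a capture. A minimal sketch that works on saved tcpdump text, assuming the default `tcpdump -n` line format with the source `addr.port` in field 3; `distinct_src_ports` is a hypothetical helper name.

```shell
# distinct_src_ports (hypothetical helper): count distinct source ports in tcpdump -n text output.
# Line layout: time IP src.port > dst.port: ...
distinct_src_ports() {
    awk '$2 == "IP" || $2 == "IP6" {
        n = split($3, part, ".")       # the port is the last dot-separated piece
        seen[part[n]] = 1
    }
    END { c = 0; for (p in seen) c++; print c }'
}

# Usage (needs root and live traffic, so left commented out):
# tcpdump -i eth0 -n -c 100 'udp dst port 53' | distinct_src_ports
```

A result of 1 over a hundred queries suggests a fixed source port. A healthy resolver should show a count close to the number of captured queries.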

War story: A registrar's web UI had a "DNSSEC" toggle that looked like an on/off switch. An admin clicked it to "update" the DS record. The UI deleted the old DS and published a new one with a different algorithm — but the zone was still signing with the old algorithm. Every validating resolver on the internet returned SERVFAIL for the domain. It took 48 hours to fully propagate the fix because of parent zone TTLs.

Emergency: DNSSEC Misconfiguration Taking Down Production Zone

Domain returning SERVFAIL, all services unreachable. DNSSEC chain is broken.

# Option 1: Disable DNSSEC temporarily (nuclear option — removes protection)
# Remove the DS record at the registrar
# Most registrars have a web UI — remove ALL DS records
# Wait for parent TTL to expire (could be 24-48 hours for .com)
# After TTL expires, zone resolves without validation
# Fix the underlying signing problem before re-enabling

# Option 2: Fix the DNSKEY/DS mismatch (preferred)
# Get current DNSKEY being served
dig DNSKEY example.com @ns1.example.com. | dnssec-dsfromkey -f - example.com
# Submit the correct DS hash to the registrar

# Option 3: Force re-sign if signatures expired
rndc sign example.com
rndc reload example.com
# Verify new signatures (the RRSIG is returned alongside the SOA)
dig +dnssec SOA example.com

# While fixing, check if a specific subdomain is reachable
dig +cd A www.example.com   # +cd bypasses validation
# If +cd works but normal resolution fails → DNSSEC chain broken (not DNS itself)

# Check propagation to major resolvers
# dig exits 0 even on SERVFAIL, so test for an empty answer instead of the exit code
for ns in 8.8.8.8 1.1.1.1 9.9.9.9 208.67.222.222; do
    ans=$(dig +short A www.example.com @$ns)
    echo "$ns: ${ans:-FAILED}"
done
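
The `+cd` comparison above can be turned into a quick triage helper: extract the status from each dig header, then interpret the pair. This is a sketch under the assumption that dig prints its usual `->>HEADER<<-` line; `rcode_of` and `classify_failure` are hypothetical helper names.

```shell
# rcode_of (hypothetical helper): extract the status from dig's header on stdin.
# Header line looks like: ";; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 4242"
rcode_of() {
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# classify_failure (hypothetical helper): interpret a normal lookup versus a +cd lookup.
# $1 = status of the normal query, $2 = status of the same query with +cd.
classify_failure() {
    case "$1/$2" in
        NOERROR/*)         echo "resolves normally" ;;
        SERVFAIL/NOERROR)  echo "DNSSEC validation failure" ;;
        SERVFAIL/SERVFAIL) echo "not DNSSEC-specific: check authoritative servers and delegation" ;;
        *)                 echo "inconclusive: $1 vs $2" ;;
    esac
}

# Usage (needs network access, so left commented out):
# normal=$(dig A www.example.com | rcode_of)
# bypass=$(dig +cd A www.example.com | rcode_of)
# classify_failure "$normal" "$bypass"
```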

Useful One-Liners

# Validate a zone file locally before deploying
named-checkzone example.com /etc/bind/zones/example.com.zone

# Verify signed zone file
named-checkzone -i full example.com /etc/bind/zones/example.com.zone.signed

# Check all signatures in a zone file for expiry
dnssec-verify -o example.com example.com.zone.signed

# Query with DNSSEC flags explicitly
dig +dnssec +noall +answer A example.com

# Show full DNSSEC chain for debugging
dig +trace +dnssec A www.example.com 2>&1 | grep -E "RRSIG|DS|DNSKEY|NSEC"

# Check if resolver supports EDNS0 with DNSSEC OK bit
dig +dnssec +noall +comments A example.com | grep "EDNS"
# Should show: ; EDNS: version: 0, flags: do;  ← DO=DNSSEC OK bit set

# Test DoT with kdig
kdig -d @1.1.1.1 +tls-ca A example.com

# Test DoH
curl -s "https://cloudflare-dns.com/dns-query?name=example.com&type=A" \
  -H "accept: application/dns-json" | jq '.Answer[].data'

# Check if split-horizon internal zone is leaking
dig A internal.corp.example.com @8.8.8.8
# Should return NXDOMAIN — if it returns an IP, zone is leaking

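Default trap: BIND's default dnssec-validation setting is auto, which uses the built-in root trust anchor. But if your BIND installation is old and the root KSK has been rolled since it was installed, validation may silently fail. Run rndc secroots to verify the trust anchors are current (by default it writes named.secroots to BIND's working directory; rndc secroots - prints to stdout instead). Managed keys should show the current root KSK (key tag 20326, KSK-2017); newer installations may also list its successor, KSK-2024 (key tag 38696).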