Level: L2: Operations | Topics: MTU, Linux Networking Tools | Domain: Networking
Scenario: MTU Black Hole - Large Packets Silently Dropped
Situation
At 11:40 UTC, users report that the internal wiki (served over HTTPS) loads partially or hangs indefinitely. Small API responses work fine, but any page with substantial content never finishes loading. SSH to the wiki server works normally. The issue began after the infrastructure team migrated the wiki server into a new VLAN that traverses a VPN tunnel to reach the corporate network.
What You Know
- SSH to the wiki server works (small packets)
- Small HTTPS responses (health check endpoint returning {"status":"ok"}) work fine
- Large HTTPS responses (actual wiki pages) hang mid-transfer or never complete
- The wiki server was recently moved behind a site-to-site VPN tunnel (IPsec or WireGuard)
- No recent application or OS changes on the wiki server itself
- ICMP is filtered by an intermediate firewall (corporate security policy)
Investigation Steps
1. Confirm the problem is packet-size dependent
Command(s):
# Send pings of increasing size with Don't Fragment bit set
# -M do = set DF bit (do not fragment)
# -s = payload size (add 28 bytes for IP+ICMP headers)
# 1472 + 28 = 1500, so 1473 is expected to fail even on a healthy 1500-byte path
ping -M do -s 1400 -c 3 wiki.internal.example.com
ping -M do -s 1450 -c 3 wiki.internal.example.com
ping -M do -s 1472 -c 3 wiki.internal.example.com
ping -M do -s 1473 -c 3 wiki.internal.example.com
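# If ping is blocked, try tracepath, which probes with UDP rather than ICMP echo
tracepath wiki.internal.example.com
tracepath attempts to discover the path MTU and reports the hop where it drops. On a healthy path, the router in front of the smaller link returns ICMP Type 3, Code 4 ("Fragmentation Needed"). If a firewall blocks that message, the sender never learns to reduce its packet size, and oversized packets vanish without any error: a black hole. Because tracepath depends on those same ICMP replies, expect it to report "no reply" for hops beyond the filtering firewall.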
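Stepping through sizes by hand gets tedious, so the probing above can be sketched as a binary search. This is a minimal sketch rather than a definitive tool: the function names probe and find_max_payload are hypothetical, the 1200 to 1472 search window is an assumption, and it only helps if ICMP echo itself gets through the path.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: binary-search the largest ping payload that passes with DF set.

# probe HOST SIZE: succeeds if one DF-flagged ping with SIZE payload bytes gets a reply.
probe() {
  ping -M do -s "$2" -c 1 -W 1 "$1" >/dev/null 2>&1
}

# find_max_payload HOST [LOW] [HIGH]: prints the largest payload in [LOW, HIGH] that passes.
# LOW must be a size that already works; 1200 and 1472 are assumed defaults.
find_max_payload() {
  local host=$1 lo=${2:-1200} hi=${3:-1472} mid
  if ! probe "$host" "$lo"; then
    echo "payload $lo already fails; echo may be filtered entirely" >&2
    return 1
  fi
  while [ "$lo" -lt "$hi" ]; do
    mid=$(( (lo + hi + 1) / 2 ))   # round up so the loop always makes progress
    if probe "$host" "$mid"; then
      lo=$mid
    else
      hi=$(( mid - 1 ))
    fi
  done
  echo "$lo"
}

# Usage:
#   find_max_payload wiki.internal.example.com
```

Add 28 bytes to the printed payload to get the path MTU. A result of 1392, for example, corresponds to a path MTU of 1420.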
2. Check the local interface MTU and look for a tunnel
Command(s):
# Check MTU on all interfaces
ip link show
ip -d link show
# Check if there is a tunnel interface with reduced MTU
ip tunnel show
wg show 2>/dev/null
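# Check the route MTU (ip route expects an IP address, not a hostname)
WIKI_IP=$(getent ahostsv4 wiki.internal.example.com | awk 'NR==1 {print $1}')
ip route get "$WIKI_IP"
ip route show to match "$WIKI_IP"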
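To spot a tunnel or any other interface with a reduced MTU without reading the full ip link output, the kernel's per-interface values can be read from sysfs. A minimal sketch follows; the function name list_small_mtus is hypothetical, and the 1500-byte threshold assumes a standard Ethernet baseline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "interface mtu" for every interface below a threshold.
# Arguments: [SYSFS_ROOT] [THRESHOLD]; defaults are /sys/class/net and 1500.
list_small_mtus() {
  local root=${1:-/sys/class/net} threshold=${2:-1500} f mtu
  for f in "$root"/*/mtu; do
    [ -r "$f" ] || continue            # skip if the glob matched nothing readable
    mtu=$(cat "$f")
    if [ "$mtu" -lt "$threshold" ]; then
      printf '%s %s\n' "$(basename "$(dirname "$f")")" "$mtu"
    fi
  done
}

# Usage:
#   list_small_mtus
```

Any interface this prints is a candidate for the link that lowers the path MTU.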
3. Capture traffic to confirm retransmits and stalled transfers
Command(s):
# On the wiki server, capture the HTTPS session
tcpdump -nn -i eth0 host 10.10.5.100 and port 443 -w /tmp/mtu_debug.pcap
# Trigger a large response
curl -v -o /dev/null https://wiki.internal.example.com/large-page
# Analyze the capture; a retransmission shows up as the same "seq" range sent more than once
tcpdump -nn -r /tmp/mtu_debug.pcap | head -50
# Show only SYN packets to see the MSS each side advertised during the handshake
tcpdump -nn -r /tmp/mtu_debug.pcap 'tcp[tcpflags] & tcp-syn != 0'
# Check for TCP retransmits in kernel stats
ss -ti dst wiki.internal.example.com
netstat -s | grep -i retransmit
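On an affected connection, ss -ti shows a climbing retrans counter, and the capture shows the same full-size segment sent again and again while the peer's ACK number never advances past it.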
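Spotting repeated segments by eye in a long capture is error-prone, so the check can be sketched as a small filter over tcpdump's text output. This is a rough sketch under assumptions: the function name count_repeated_segments is hypothetical, and it expects tcpdump's default one-line format with a timestamp, so that the source endpoint is the third field.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read tcpdump text output on stdin and print
# "count source seq-range" for every data segment seen more than once.
count_repeated_segments() {
  awk '
    {
      for (i = 1; i < NF; i++) {
        # Data segments print "seq START:END,"; bare SYN/FIN print a single number.
        if ($i == "seq" && $(i + 1) ~ /:/) {
          range = $(i + 1)
          sub(/,$/, "", range)
          seen[$3 " " range]++
        }
      }
    }
    END {
      for (k in seen) if (seen[k] > 1) print seen[k], k
    }
  ' | sort -rn
}

# Usage:
#   tcpdump -nn -r /tmp/mtu_debug.pcap | count_repeated_segments
```

In an MTU black hole, the segments at the top of this list are typically the full-size ones, while small segments from the same connection appear only once.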
4. Verify PMTUD is broken by checking for ICMP unreachable messages
Command(s):
# Listen for ICMP "need to fragment" messages that should be coming back
tcpdump -nn -i eth0 icmp
# In another terminal, generate large packets
ping -M do -s 1472 -c 5 wiki.internal.example.com
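# Check if the kernel has cached a lower PMTU (ip route expects an IP address, not a hostname)
ip route get "$(getent ahostsv4 wiki.internal.example.com | awk 'NR==1 {print $1}')"
# Look for "mtu" in the output: if PMTUD worked, it would show a reduced MTU
If PMTUD were working, the ICMP capture would show "need to fragment" messages arriving and ip route get would show a cached lower MTU value. If the intermediate firewall is blocking all ICMP, you see nothing: no errors, no cached MTU reduction. This silence is the black hole.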
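For scripted checks, the cached value can be pulled out of the ip route get output rather than read by eye. A minimal sketch, assuming the output contains a token pair of the form "mtu N" when a path MTU is cached; the function name extract_route_mtu is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ip route get` output on stdin and print the cached
# MTU, or "none" when no mtu attribute is present.
extract_route_mtu() {
  awk '
    {
      for (i = 1; i < NF; i++) {
        if ($i == "mtu") {
          v = $(i + 1)
          if (v == "lock") v = $(i + 2)   # a locked route metric prints "mtu lock N"
          print v
          found = 1
          exit
        }
      }
    }
    END { if (!found) print "none" }
  '
}

# Usage (resolving the name with getent is an assumption about the host's resolver setup):
#   ip route get "$(getent ahostsv4 wiki.internal.example.com | awk 'NR==1 {print $1}')" | extract_route_mtu
```

A result of "none" while large transfers are stalling is consistent with the black hole described above: the kernel was never told about the smaller path MTU.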
Root Cause
The wiki server was moved to a network segment that reaches users through a VPN tunnel (IPsec). The tunnel adds 50-80 bytes of encapsulation overhead, reducing the effective path MTU to approximately 1420-1450 bytes. When the server sends a full 1500-byte packet (up to 1460 bytes of TCP payload plus 40 bytes of IPv4 and TCP headers) with the DF bit set, the tunnel endpoint cannot forward it without fragmentation. Normally, the endpoint would send back an ICMP "Fragmentation Needed" message so the server can reduce its segment size (Path MTU Discovery). However, the corporate firewall blocks all ICMP traffic, including these essential PMTUD messages. The server never learns to send smaller packets, so it retransmits the same too-large packet repeatedly until the connection times out. Small packets (SSH keystrokes, short API responses, TCP handshakes) fit within the reduced MTU and work fine.
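The arithmetic behind those numbers can be sketched in a few lines of shell. The helper names are hypothetical, and the header sizes assume IPv4 with no IP or TCP options.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the overhead arithmetic (IPv4, no options assumed).
effective_mtu() { echo $(( $1 - $2 )); }   # link MTU minus tunnel overhead
tcp_mss()       { echo $(( $1 - 40 )); }   # minus IPv4 (20) and TCP (20) headers
ping_payload()  { echo $(( $1 - 28 )); }   # minus IPv4 (20) and ICMP (8) headers

# Work through both ends of the 50-80 byte overhead range on a 1500-byte link.
for overhead in 50 80; do
  mtu=$(effective_mtu 1500 "$overhead")
  echo "overhead=$overhead path_mtu=$mtu tcp_mss=$(tcp_mss "$mtu") ping_payload=$(ping_payload "$mtu")"
done
# Prints:
#   overhead=50 path_mtu=1450 tcp_mss=1410 ping_payload=1422
#   overhead=80 path_mtu=1420 tcp_mss=1380 ping_payload=1392
```

The ping_payload value is the largest -s argument that should pass with -M do once the path MTU is as computed.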
Fix
Immediate:
# Option 1: Reduce the MTU on the server's interface to fit within the tunnel
ip link set dev eth0 mtu 1400
# Option 2: Clamp TCP MSS at the tunnel endpoint to avoid the problem
# On the Linux router/firewall performing the tunneling:
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
  -j TCPMSS --clamp-mss-to-pmtu
# Option 3: If you control the tunnel, set the tunnel interface MTU correctly
ip link set dev tun0 mtu 1400
# Verify the fix
ping -M do -s 1372 -c 3 wiki.internal.example.com
curl -v -o /dev/null https://wiki.internal.example.com/large-page
Preventive:
- Never block ICMP Type 3 (Destination Unreachable) at firewalls. At minimum, allow Type 3 Code 4 (Fragmentation Needed). Filtering that message is the single most common cause of MTU black holes.
- Configure --clamp-mss-to-pmtu on all tunnel endpoints. This rewrites the TCP MSS option during the handshake so endpoints agree on a safe segment size without relying on PMTUD.
- Document the MTU for every network segment, especially those involving tunnels, VPNs, or overlay networks (VXLAN overhead is 50 bytes, IPsec is 50-80 bytes, GRE is 24 bytes, WireGuard is 60 bytes over IPv4 or 80 bytes over IPv6).
- Add monitoring that tests large transfers, not just ping. A health check that downloads a 10KB payload will catch MTU issues that a simple connectivity check misses.
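The large-transfer health check from the last item can be sketched with curl. This is an illustrative sketch, not a finished monitor: the function name check_large_transfer is hypothetical, and the 10240-byte minimum and 10-second timeout are assumed values to tune for the real page.

```shell
#!/usr/bin/env bash
# Hypothetical check: fail unless URL delivers at least MIN_BYTES within TIMEOUT seconds.
# Arguments: URL [MIN_BYTES] [TIMEOUT]
check_large_transfer() {
  local url=$1 min_bytes=${2:-10240} timeout=${3:-10} bytes
  # -w '%{size_download}' prints the number of body bytes curl actually received.
  bytes=$(curl -sS -o /dev/null --max-time "$timeout" -w '%{size_download}' "$url") || {
    echo "FAIL: transfer from $url did not complete within ${timeout}s" >&2
    return 1
  }
  if [ "$bytes" -lt "$min_bytes" ]; then
    echo "FAIL: only $bytes bytes received, expected at least $min_bytes" >&2
    return 1
  fi
  echo "OK: $bytes bytes"
}

# Usage:
#   check_large_transfer https://wiki.internal.example.com/large-page
```

The timeout matters as much as the size check: a black-holed transfer usually stalls rather than failing outright, so without --max-time the probe itself would hang.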
Common Mistakes
- Thinking "SSH works, so the network is fine." SSH interactive sessions send small packets that fit under the reduced MTU. The problem only manifests with larger payloads.
- Reducing MTU too aggressively. Setting MTU to 1200 "to be safe" wastes bandwidth. Calculate the actual tunnel overhead and subtract it from 1500.
- Blaming the application or TLS. The stalled transfer looks like an app hang or TLS negotiation failure, leading engineers down the wrong path for hours.
- Not understanding why ICMP matters. Blocking all ICMP "for security" breaks Path MTU Discovery. This is one of the most impactful misconfigurations in corporate networks.
- Forgetting to persist the MTU change. An MTU set with ip link set is lost on reboot. Update the interface configuration file or networkd/netplan config.
Interview Angle
Q: HTTPS to a server hangs but SSH works. What do you check?
Good answer shape: Immediately identify this as a potential MTU/PMTUD issue because interactive SSH sends small packets while HTTPS transfers large payloads. Explain Path MTU Discovery: when a packet is too large and has the DF bit set, routers should send back ICMP "Fragmentation Needed" so the sender can reduce segment size. If a firewall blocks that ICMP message, the sender never adapts and keeps retransmitting the same oversized packet, creating a black hole. Describe testing with ping -M do -s <size> to find the threshold, checking for tunnel interfaces that reduce effective MTU, and using tcpdump to confirm retransmissions of large segments. The fix is to reduce the interface MTU, clamp TCP MSS at the tunnel endpoint, or (ideally) allow ICMP Type 3 Code 4 through the firewall.
Wiki Navigation
Prerequisites
- Networking Deep Dive (Topic Pack, L1)
Related Content
- Case Study: Jumbo Frames Partial (Case Study, L2) — Linux Networking Tools, MTU
- Networking Deep Dive (Topic Pack, L1) — Linux Networking Tools, MTU
- Case Study: API Latency Spike — BGP Route Leak, Fix Is Network ACL (Case Study, L2) — Linux Networking Tools
- Case Study: ARP Flux Duplicate IP (Case Study, L2) — Linux Networking Tools
- Case Study: DHCP Relay Broken (Case Study, L1) — Linux Networking Tools
- Case Study: Duplex Mismatch Symptoms (Case Study, L1) — Linux Networking Tools
- Case Study: IPTables Blocking Unexpected (Case Study, L2) — Linux Networking Tools
- Case Study: MTU Blackhole TLS Stalls (Case Study, L2) — MTU
- Case Study: SSH Timeout — MTU Mismatch, Fix Is Terraform Variable (Case Study, L2) — MTU
- Case Study: Service Mesh 503s — Envoy Misconfigured, RBAC Policy (Case Study, L2) — Linux Networking Tools