
Runbook: MTU Mismatch

Field Value
Domain Networking
Alert Large packet drops, TCP connections working but large file transfers failing, ICMP unreachable fragmentation-needed messages in logs
Severity P2
Est. Resolution Time 30-60 minutes
Escalation Timeout 45 minutes — page if not resolved
Last Tested 2026-03-19
Prerequisites SSH access to cluster nodes, ability to run ping/tracepath on nodes, kubectl access

Quick Assessment (30 seconds)

# Run this first — it tells you the scope of the problem
ip link show && ping -M do -s 1400 <TARGET_IP>
If output shows: ping -M do -s 1400 succeeds but ping -M do -s 1450 fails → the path MTU is between 1428 and 1477 bytes (IP packet size = payload + 28 bytes of IP/ICMP headers); continue from Step 2
If output shows: All pings fail regardless of size → This is a routing issue, not MTU; see Network Partition

Step 1: Identify the Symptom Pattern

Why: MTU mismatches produce a very specific pattern — small requests succeed, large transfers silently stall or fail. Confirming this pattern before chasing MTU saves time.

# Test that small requests work
curl -v --max-time 10 http://<SERVICE_IP>:<PORT>/

# Test that a large download stalls or fails
curl -v --max-time 30 http://<SERVICE_IP>:<PORT>/large-file -o /dev/null

# In Kubernetes: test from a pod hitting a service that does large responses
kubectl exec -it <POD_NAME> -n <NAMESPACE> -- \
  curl -v --max-time 30 http://<SERVICE_NAME>.<NAMESPACE>.svc.cluster.local/large-endpoint
Expected output (confirming MTU issue):
# Small request: completes in < 1s
# Large request: starts, transfers some bytes, then hangs
If this fails: If both small and large fail completely, it is not an MTU issue. Rule out firewall, DNS, or routing problems first.
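The two curl checks above can be combined into a one-shot pattern test. A minimal sketch, assuming SERVICE_IP and PORT are set and the service exposes a /large-file path as in this step:

```shell
# Sketch: confirm the "small works, large stalls" pattern in one pass.
# SERVICE_IP, PORT, and /large-file are placeholders from this step.
small=$(curl -so /dev/null --max-time 10 -w '%{http_code}' "http://$SERVICE_IP:$PORT/") || small=fail
large=$(curl -so /dev/null --max-time 30 -w '%{http_code}' "http://$SERVICE_IP:$PORT/large-file") || large=fail
echo "small=$small large=$large"
# MTU pattern: small returns 200 while large times out (fail).
# Both failing points at firewall/DNS/routing instead, not MTU.
```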

Step 2: Find the MTU Break Point with Ping

Why: PMTUD (Path MTU Discovery) relies on ICMP "fragmentation needed" messages. Testing with different packet sizes identifies the exact MTU ceiling across the path.

# Run from a node (not a pod) to find the physical MTU limit
# Start high and work down until ping succeeds
# -M do = don't fragment, -s = payload size (actual IP packet size = -s + 28)
ping -M do -s 1472 <REMOTE_NODE_IP>   # Tests 1500 byte IP packet (standard Ethernet)
ping -M do -s 1422 <REMOTE_NODE_IP>   # Tests 1450 byte IP packet
ping -M do -s 1372 <REMOTE_NODE_IP>   # Tests 1400 byte IP packet

# Run from inside a pod to find the overlay (CNI tunnel) MTU limit
kubectl exec -it <POD_NAME> -n <NAMESPACE> -- \
  ping -M do -s 1372 <REMOTE_POD_IP>
kubectl exec -it <POD_NAME> -n <NAMESPACE> -- \
  ping -M do -s 1422 <REMOTE_POD_IP>
Expected output:
# The largest size that succeeds tells you the effective MTU
PING 10.0.1.5 (10.0.1.5) 1400(1428) bytes of data.
1408 bytes from 10.0.1.5: icmp_seq=1 ttl=64 time=0.5 ms   # Success
PING 10.0.1.5 (10.0.1.5) 1450(1478) bytes of data.
--- 10.0.1.5 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss         # Fail
If this fails: If every size fails with ping: local error: message too long, mtu=<N> (or From <HOP_IP> Frag needed and DF set (mtu = <N>)), that N is your effective MTU ceiling. Use it in Step 4.
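Rather than stepping through sizes by hand, the sweep can be turned into a binary search that pins the ceiling to the exact byte. A sketch, assuming REMOTE_NODE_IP is set and ICMP is permitted on the path:

```shell
# Sketch: binary-search the largest payload that passes with DF set.
# Bounds are payload bytes; REMOTE_NODE_IP is a placeholder from this step.
lo=1200   # a size assumed to pass (lower it if even this fails)
hi=1473   # one above the standard Ethernet maximum payload (1472)
while [ $((hi - lo)) -gt 1 ]; do
  mid=$(( (lo + hi) / 2 ))
  if ping -M do -c 1 -W 1 -s "$mid" "$REMOTE_NODE_IP" >/dev/null 2>&1; then
    lo=$mid    # mid passed: raise the floor
  else
    hi=$mid    # mid was dropped: lower the ceiling
  fi
done
echo "largest passing payload: $lo (effective path MTU $((lo + 28)) bytes)"
```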

Step 3: Check Interface MTU on Nodes

Why: The node's physical NIC MTU sets the ceiling for everything above it. The CNI overlay MTU must be strictly lower to leave room for encapsulation headers, never equal or higher.

# SSH to affected nodes and check all interface MTUs
ssh <NODE_USERNAME>@<NODE_IP>

# Show all interfaces with MTU
ip link show

# Or just the primary and tunnel interfaces
ip link show eth0
ip link show flannel.1   # Flannel VXLAN
ip link show cilium_vxlan  # Cilium VXLAN
ip link show tunl0       # Calico IPIP
ip link show vxlan.calico  # Calico VXLAN
Expected output:
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP
   ...
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue
If this fails: If the tunnel interface MTU equals the physical MTU (e.g., both at 1500), the encapsulation overhead is not being accounted for — this is the root cause. Proceed to Step 4.
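The eyeball comparison above can be automated by reading MTUs from /sys. A sketch, with eth0 and flannel.1 as example interface names; substitute your NIC and CNI tunnel interface:

```shell
# Sketch: flag a tunnel interface whose MTU leaves no room for encapsulation.
# PHYS_IF/TUN_IF defaults are examples; override them for your environment.
PHYS_IF=${PHYS_IF:-eth0}
TUN_IF=${TUN_IF:-flannel.1}
phys_mtu=$(cat "/sys/class/net/$PHYS_IF/mtu" 2>/dev/null || echo 0)
tun_mtu=$(cat "/sys/class/net/$TUN_IF/mtu" 2>/dev/null || echo 0)
if [ "$tun_mtu" -gt 0 ] && [ "$tun_mtu" -ge "$phys_mtu" ]; then
  echo "WARNING: tunnel MTU ($tun_mtu) >= physical MTU ($phys_mtu): overhead not accounted for"
else
  echo "physical=$phys_mtu tunnel=$tun_mtu"
fi
```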

Step 4: Check CNI Overlay MTU (Must Be Lower than Node MTU)

Why: VXLAN adds 50 bytes of overhead; Geneve adds 60 bytes; IPIP adds 20 bytes. The CNI overlay MTU must be node_MTU - overhead or packets will be silently dropped.

# Check Flannel CNI config
cat /run/flannel/subnet.env

# Check Calico VXLAN MTU config
kubectl get configmap calico-config -n kube-system -o yaml | grep -i mtu

# Check Cilium MTU
kubectl exec -n kube-system <CILIUM_POD> -- cilium config | grep mtu

# Check the actual MTU in the CNI config file
cat /etc/cni/net.d/<CNI_CONFIG_FILE>
Expected output:
# For a node with 1500 MTU using VXLAN (50 byte overhead):
# CNI MTU should be 1450 or lower
mtu: 1450
If this fails: If CNI MTU matches node MTU, encapsulation is causing packet drops. Calculate correct value:
VXLAN:         node_MTU - 50 = pod_MTU
Geneve:        node_MTU - 60 = pod_MTU
IPIP:          node_MTU - 20 = pod_MTU
WireGuard:     node_MTU - 60 = pod_MTU (use - 80 if the underlay is IPv6)
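Applied to a standard 1500-byte node MTU, the arithmetic works out as follows (a sketch; plug in your own node MTU, and note the WireGuard figure is this runbook's IPv4 value):

```shell
# Sketch: derive the pod MTU from the node MTU using the overheads above.
node_mtu=1500                    # substitute your node's physical MTU
vxlan_mtu=$((node_mtu - 50))     # VXLAN
ipip_mtu=$((node_mtu - 20))      # IPIP
wg_mtu=$((node_mtu - 60))        # WireGuard over IPv4
echo "VXLAN=$vxlan_mtu IPIP=$ipip_mtu WireGuard=$wg_mtu"
# → VXLAN=1450 IPIP=1480 WireGuard=1440
```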

Step 5: Update CNI Configuration MTU

Why: The fix is to set the correct MTU in the CNI configuration so that pods never generate packets too large for the overlay.

# For Flannel — edit the net-conf.json ConfigMap
kubectl edit configmap kube-flannel-cfg -n kube-flannel
# Find the "Backend" section and set "MTU": <CORRECT_MTU>

# For Calico — patch the calico-config ConfigMap
kubectl patch configmap calico-config -n kube-system \
  --type merge \
  -p '{"data":{"veth_mtu":"<CORRECT_MTU>"}}'

# For Cilium — update via cilium-config or Helm values
# (the exact key varies by Cilium version; list your current keys first with
#  kubectl get configmap cilium-config -n kube-system -o yaml)
kubectl patch configmap cilium-config -n kube-system \
  --type merge \
  -p '{"data":{"mtu":"<CORRECT_MTU>"}}'
Expected output:
configmap/kube-flannel-cfg edited
If this fails: If the CNI uses a Helm release for configuration, update the values file and re-run helm upgrade. Do not edit the ConfigMap directly on a Helm-managed release or it will be overwritten on the next upgrade.

Step 6: Restart CNI Pods to Apply New MTU

Why: CNI pods read MTU configuration at startup. A rolling restart is needed to apply the new value; it also re-creates tunnel interfaces with the correct MTU.

# Restart CNI DaemonSet pods one at a time to avoid network downtime
# Flannel
kubectl rollout restart daemonset kube-flannel-ds -n kube-flannel

# Calico
kubectl rollout restart daemonset calico-node -n calico-system

# Cilium
kubectl rollout restart daemonset cilium -n kube-system

# Watch rollout progress
kubectl rollout status daemonset <CNI_DAEMONSET_NAME> -n <CNI_NAMESPACE>

# Verify MTU on tunnel interface after restart
ssh <NODE_USERNAME>@<NODE_IP> ip link show <TUNNEL_INTERFACE>
Expected output:
daemonset.apps/calico-node successfully rolled out
If this fails: If rollout gets stuck (pods won't start), the new MTU value may be invalid. Check CNI logs: kubectl logs -n <CNI_NAMESPACE> <CNI_POD> --previous.

Verification

# Confirm the issue is resolved — test large packets between pods
kubectl exec -it <POD_NAME> -n <NAMESPACE> -- \
  ping -M do -s 1400 <REMOTE_POD_IP> -c 5

# Confirm large downloads work
kubectl exec -it <POD_NAME> -n <NAMESPACE> -- \
  curl -o /dev/null --max-time 30 http://<SERVICE_NAME>/large-endpoint
Success looks like: Large ping succeeds with 0% packet loss. Large download completes without hanging.
If still broken: Escalate; see below.
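The verification can be collapsed into a single pass/fail gate, convenient for pasting into an incident channel. A sketch, assuming POD_NAME, NAMESPACE, and REMOTE_POD_IP are set as in the commands above:

```shell
# Sketch: one-shot verification gate. PASS only if a 1400-byte DF payload
# crosses the overlay; anything else (including kubectl errors) is FAIL.
if kubectl exec "$POD_NAME" -n "$NAMESPACE" -- \
     ping -M do -c 5 -s 1400 "$REMOTE_POD_IP" >/dev/null 2>&1; then
  result=PASS
else
  result=FAIL
fi
echo "verification: $result"
```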

Escalation

Condition Who to Page What to Say
Not resolved in 45 min Platform/Network on-call "MTU mismatch confirmed in cluster <CLUSTER_NAME>, large transfers failing cluster-wide, CNI MTU fix not resolving"
Data loss suspected SRE lead "MTU mismatch may have caused silent data truncation for services receiving large payloads"
Scope expanding to cloud network Infrastructure team "MTU issue may be at cloud provider level (jumbo frames not enabled), requires VPC/NIC configuration change"

Post-Incident

  • Update monitoring if alert was noisy or missing
  • File postmortem if P1/P2
  • Update this runbook if steps were wrong or incomplete

Common Mistakes

  1. Not accounting for encapsulation overhead: The single most common mistake is setting CNI MTU equal to node MTU. VXLAN adds 50 bytes; IPIP adds 20 bytes. The overlay MTU must always be lower than the physical MTU by at least the encapsulation overhead.
  2. Changing MTU without a rolling restart plan: Updating the ConfigMap has no effect until the CNI pods restart and re-create tunnel interfaces. Plan the rollout to avoid taking down pod networking on multiple nodes simultaneously.

Cross-References

  • Runbook: Network Partition (referenced in Quick Assessment)