- networking
- l2
- runbook
- mtu
- networking-troubleshooting

Portal | Level: L2: Operations | Topics: MTU, Networking Troubleshooting | Domain: Networking
Runbook: MTU Mismatch¶
| Field | Value |
|---|---|
| Domain | Networking |
| Alert | Large packet drops, TCP connections working but large file transfers failing, ICMP unreachable fragmentation-needed messages in logs |
| Severity | P2 |
| Est. Resolution Time | 30-60 minutes |
| Escalation Timeout | 45 minutes — page if not resolved |
| Last Tested | 2026-03-19 |
| Prerequisites | SSH access to cluster nodes, ability to run ping/tracepath on nodes, kubectl access |
Quick Assessment (30 seconds)¶
# Run this first — it tells you the scope of the problem
ip link show && ping -M do -s 1400 <TARGET_IP>
If output shows: ping -M do -s 1400 succeeds but ping -M do -s 1450 fails → path MTU is at least 1428 but less than 1478 bytes (the -s payload plus 28 bytes of IP/ICMP headers) somewhere in the path; continue from Step 2
If output shows: All pings fail regardless of size → This is a routing issue, not MTU — see Network Partition
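The payload-to-MTU arithmetic behind the quick-assessment pings can be captured in a small helper. This is an illustrative sketch (the function name `mtu_bounds` is not part of any tool); it reports the path-MTU range implied by the largest payload that succeeds and the smallest that fails:

```shell
# Given the largest -s payload that succeeds and the smallest that fails,
# report the path-MTU bounds the pings imply. IPv4 header (20 bytes) plus
# ICMP header (8 bytes) add 28 bytes to every -s payload.
mtu_bounds() {
  ok=$1    # largest -s value that succeeded
  fail=$2  # smallest -s value that failed
  echo "path MTU >= $(( ok + 28 )) and < $(( fail + 28 ))"
}

mtu_bounds 1400 1450   # the quick-assessment sizes
# -> path MTU >= 1428 and < 1478
```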
Step 1: Identify the Symptom Pattern¶
Why: MTU mismatches produce a very specific pattern — small requests succeed, large transfers silently stall or fail. Confirming this pattern before chasing MTU saves time.
# Test that small requests work
curl -v --max-time 10 http://<SERVICE_IP>:<PORT>/
# Test that a large download stalls or fails
curl -v --max-time 30 http://<SERVICE_IP>:<PORT>/large-file -o /dev/null
# In Kubernetes: test from a pod hitting a service that does large responses
kubectl exec -it <POD_NAME> -n <NAMESPACE> -- \
curl -v --max-time 30 http://<SERVICE_NAME>.<NAMESPACE>.svc.cluster.local/large-endpoint
Step 2: Find the MTU Break Point with Ping¶
Why: PMTUD (Path MTU Discovery) relies on ICMP "fragmentation needed" messages. Testing with different packet sizes identifies the exact MTU ceiling across the path.
# Run from a node (not a pod) to find the physical MTU limit
# Start high and work down until ping succeeds
# -M do = don't fragment, -s = payload size (actual IP packet size = -s + 28)
ping -M do -s 1472 <REMOTE_NODE_IP> # Tests 1500 byte IP packet (standard Ethernet)
ping -M do -s 1422 <REMOTE_NODE_IP> # Tests 1450 byte IP packet
ping -M do -s 1372 <REMOTE_NODE_IP> # Tests 1400 byte IP packet
# Run from inside a pod to find the overlay (CNI tunnel) MTU limit
kubectl exec -it <POD_NAME> -n <NAMESPACE> -- \
ping -M do -s 1372 <REMOTE_POD_IP>
kubectl exec -it <POD_NAME> -n <NAMESPACE> -- \
ping -M do -s 1422 <REMOTE_POD_IP>
# The largest size that succeeds tells you the effective MTU
PING 10.0.1.5 (10.0.1.5) 1400(1428) bytes of data.
1408 bytes from 10.0.1.5: icmp_seq=1 ttl=64 time=0.5 ms # Success
PING 10.0.1.5 (10.0.1.5) 1450(1478) bytes of data.
ping: local error: message too long, mtu=1450
--- 10.0.1.5 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss # Fail
If ping reports "message too long, mtu=<N>" (or "Frag needed and DF set (mtu = <N>)" from an intermediate hop), that N is your effective MTU ceiling. Use it in Step 4.
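The manual sweep above can be automated with a binary search for the largest payload that fits. The sketch below simulates the probe against a known path MTU so the search logic is self-contained; on a real node, replace `probe` with an actual ping, e.g. `probe() { ping -M do -c 1 -s "$1" <REMOTE_NODE_IP> >/dev/null 2>&1; }`:

```shell
# Binary-search the largest -s payload that fits through the path.
PATH_MTU=1450                          # simulated path MTU (assumption for the demo)
probe() { [ $(( $1 + 28 )) -le "$PATH_MTU" ]; }   # stands in for ping -M do

lo=0; hi=9000                          # invariant: probe(lo) succeeds, probe(hi) fails
while [ $(( hi - lo )) -gt 1 ]; do
  mid=$(( (lo + hi) / 2 ))
  if probe "$mid"; then lo=$mid; else hi=$mid; fi
done
echo "largest payload: $lo (effective path MTU: $(( lo + 28 )))"
# -> largest payload: 1422 (effective path MTU: 1450)
```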
Step 3: Check Interface MTU on Nodes¶
Why: The node's physical NIC MTU sets the ceiling for everything above it. The CNI overlay MTU must be strictly lower than the node MTU, never equal or higher, to leave room for encapsulation headers.
# SSH to affected nodes and check all interface MTUs
ssh <NODE_USERNAME>@<NODE_IP>
# Show all interfaces with MTU
ip link show
# Or just the primary and tunnel interfaces
ip link show eth0
ip link show flannel.1 # Flannel VXLAN
ip link show cilium_vxlan # Cilium VXLAN
ip link show tunl0 # Calico IPIP
ip link show vxlan.calico # Calico VXLAN
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP
...
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue
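To compare MTUs across many interfaces at a glance, the `ip link show` output can be reduced to name/MTU pairs with awk. Shown here against the captured sample output above so it is self-contained; on a live node, pipe the real command instead (`ip link show | awk -F': ' '...'`):

```shell
# Summarize interface name -> MTU from `ip link show` output.
summary=$(awk -F': ' '/mtu [0-9]/ {
            iface=$2; sub(/@.*/, "", iface)      # strip @ifN suffix on veths
            match($0, /mtu [0-9]+/)
            print iface, substr($0, RSTART+4, RLENGTH-4)
          }' <<'EOF'
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue
EOF
)
echo "$summary"
# -> eth0 9001
#    flannel.1 8951
```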
Step 4: Check CNI Overlay MTU (Must Be Lower than Node MTU)¶
Why: Encapsulation adds overhead: VXLAN and Geneve add 50 bytes (Geneve more if options are carried), IPIP adds 20 bytes, and WireGuard adds 60 bytes. The CNI overlay MTU must be node_MTU minus the overhead or oversized packets will be silently dropped.
# Check Flannel CNI config
cat /run/flannel/subnet.env
# Check Calico VXLAN MTU config
kubectl get configmap calico-config -n kube-system -o yaml | grep -i mtu
# Check Cilium MTU
kubectl exec -n kube-system <CILIUM_POD> -- cilium config | grep mtu
# Check the actual MTU in the CNI config file
cat /etc/cni/net.d/<CNI_CONFIG_FILE>
# For a node with 1500 MTU using VXLAN (50 byte overhead):
# CNI MTU should be 1450 or lower
mtu: 1450
VXLAN/Geneve: node_MTU - 50 = pod_MTU
IPIP: node_MTU - 20 = pod_MTU
WireGuard: node_MTU - 60 = pod_MTU
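The formulas above can be wrapped in a small calculator so the target MTU is computed rather than done by hand. An illustrative sketch (the `pod_mtu` helper is not part of any CNI tooling), using the overhead values from this step:

```shell
# Compute the correct pod/CNI MTU from the node MTU and encapsulation type
# (VXLAN/Geneve: 50 bytes, IPIP: 20 bytes, WireGuard: 60 bytes).
pod_mtu() {
  node_mtu=$1
  case "$2" in
    vxlan|geneve) echo $(( node_mtu - 50 )) ;;
    ipip)         echo $(( node_mtu - 20 )) ;;
    wireguard)    echo $(( node_mtu - 60 )) ;;
    *) echo "unknown encapsulation: $2" >&2; return 1 ;;
  esac
}

pod_mtu 1500 vxlan      # standard Ethernet -> 1450
pod_mtu 9001 vxlan      # jumbo frames -> 8951
```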
Step 5: Update CNI Configuration MTU¶
Why: The fix is to set the correct MTU in the CNI configuration so that pods never generate packets too large for the overlay.
# For Flannel — edit the net-conf.json ConfigMap
kubectl edit configmap kube-flannel-cfg -n kube-flannel
# Find the "Backend" section and set "MTU": <CORRECT_MTU>
# For Calico — patch the calico-config ConfigMap
kubectl patch configmap calico-config -n kube-system \
--type merge \
-p '{"data":{"veth_mtu":"<CORRECT_MTU>"}}'
# For Cilium — update via cilium-config or Helm values
kubectl patch configmap cilium-config -n kube-system \
--type merge \
-p '{"data":{"tunnel-mtu":"<CORRECT_MTU>"}}'
# If the CNI is managed by Helm, set the MTU in the Helm values and apply with helm upgrade. Do not edit the ConfigMap directly on a Helm-managed release or it will be overwritten on the next upgrade.
Step 6: Restart CNI Pods to Apply New MTU¶
Why: CNI pods read MTU configuration at startup. A rolling restart is needed to apply the new value; it also re-creates tunnel interfaces with the correct MTU.
# Restart CNI DaemonSet pods one at a time to avoid network downtime
# Flannel
kubectl rollout restart daemonset kube-flannel-ds -n kube-flannel
# Calico
kubectl rollout restart daemonset calico-node -n calico-system
# Cilium
kubectl rollout restart daemonset cilium -n kube-system
# Watch rollout progress
kubectl rollout status daemonset <CNI_DAEMONSET_NAME> -n <CNI_NAMESPACE>
# Verify MTU on tunnel interface after restart
ssh <NODE_USERNAME>@<NODE_IP> ip link show <TUNNEL_INTERFACE>
# If a CNI pod crash-loops after the restart, check its previous logs
kubectl logs -n <CNI_NAMESPACE> <CNI_POD> --previous
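The post-restart MTU check can be scripted so it fails loudly on a mismatch. Shown against a sample `ip -o link show` line so it is self-contained; on a node, substitute the live command: `actual=$(ip -o link show <TUNNEL_INTERFACE> | sed -n 's/.*mtu \([0-9]*\).*/\1/p')`:

```shell
# Confirm the tunnel interface picked up the newly configured MTU.
line='5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN'
expected=1450
actual=$(echo "$line" | sed -n 's/.*mtu \([0-9]*\).*/\1/p')
if [ "$actual" -eq "$expected" ]; then
  echo "MTU OK ($actual)"
else
  echo "MTU MISMATCH: got $actual, want $expected"
fi
# -> MTU OK (1450)
```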
Verification¶
# Confirm the issue is resolved — test large packets between pods
kubectl exec -it <POD_NAME> -n <NAMESPACE> -- \
ping -M do -s 1400 <REMOTE_POD_IP> -c 5
# Confirm large downloads work
kubectl exec -it <POD_NAME> -n <NAMESPACE> -- \
curl -o /dev/null --max-time 30 http://<SERVICE_NAME>/large-endpoint
Escalation¶
| Condition | Who to Page | What to Say |
|---|---|---|
| Not resolved in 45 min | Platform/Network on-call | "MTU mismatch confirmed in cluster, overlay MTU fix applied but large-packet drops persist" |
| Data loss suspected | SRE lead | "MTU mismatch may have caused silent data truncation for services receiving large payloads" |
| Scope expanding to cloud network | Infrastructure team | "MTU issue may be at cloud provider level (jumbo frames not enabled), requires VPC/NIC configuration change" |
Post-Incident¶
- Update monitoring if alert was noisy or missing
- File postmortem if P1/P2
- Update this runbook if steps were wrong or incomplete
Common Mistakes¶
- Not accounting for encapsulation overhead: The single most common mistake is setting CNI MTU equal to node MTU. VXLAN adds 50 bytes; IPIP adds 20 bytes. The overlay MTU must always be lower than the physical MTU by at least the encapsulation overhead.
- Changing MTU without a rolling restart plan: Updating the ConfigMap has no effect until the CNI pods restart and re-create tunnel interfaces. Plan the rollout to avoid taking down pod networking on multiple nodes simultaneously.
Cross-References¶
- Topic Pack: Kubernetes Networking and CNI (deep background)
- Related Runbook: Network Partition