Symptoms: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable
Domains: linux_ops | networking | cloud
Level: L2
Estimated time: 30-45 min
Initial Alert
Engineer reports via Slack at 15:30 UTC:
Can't SSH into the new batch of EC2 instances in us-east-1.
SSH hangs after "Authenticated" — never gets a shell prompt.
The old instances in the same VPC work fine.
Observable Symptoms
- ssh -v ec2-user@10.0.12.45 shows "Authenticated to 10.0.12.45", then hangs indefinitely.
- ssh -v ec2-user@10.0.4.10 (old instance in same VPC) connects instantly.
- ICMP ping to the new instances works: 64 bytes from 10.0.12.45: icmp_seq=1 ttl=64 time=0.8 ms.
- AWS console shows the instances as "running" with status checks passed.
- Security groups allow SSH (port 22) inbound from the VPN CIDR.
- The new instances are in a new subnet, 10.0.12.0/24, created last week. Old instances are in 10.0.4.0/24.
- curl http://10.0.12.45:8080/health also hangs (does not time out, just hangs).
- Small HTTP requests work: curl http://10.0.12.45:8080/ping returns "pong" instantly.
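The symptoms above split by payload size: ping and the small /ping response succeed, while the interactive SSH session and the larger /health response hang. One way to test that pattern directly is to send pings of increasing size with fragmentation forbidden. The following is a minimal sketch, not part of the original scenario. It assumes Linux iputils ping (where -M do sets the Don't Fragment bit) and IPv4. The function names and the list of candidate sizes are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: check which packet sizes reach a host when fragmentation is forbidden.
# Assumes Linux iputils ping and IPv4; function names are hypothetical.

# ICMP payload size that yields an IP packet of exactly the given size:
# subtract the 20-byte IPv4 header and the 8-byte ICMP header.
icmp_payload_for_mtu() {
  local mtu="$1"
  echo $(( mtu - 28 ))
}

# Send three Don't-Fragment pings sized for the given MTU.
# Returns success if at least one reply comes back.
probe_mtu() {
  local host="$1" mtu="$2"
  ping -M do -c 3 -W 2 -s "$(icmp_payload_for_mtu "$mtu")" "$host" >/dev/null 2>&1
}

# Print one line per candidate size (the size list is an assumption).
report_mtu() {
  local host="$1" mtu
  for mtu in 576 1400 1500 9001; do
    if probe_mtu "$host" "$mtu"; then
      echo "$mtu ok"
    else
      echo "$mtu blocked"
    fi
  done
}
```

Running report_mtu 10.0.12.45 and then report_mtu 10.0.4.10 from the same client would show whether the two subnets differ in the largest packet that gets through. A size larger than the local interface MTU is rejected by ping itself and is also reported as blocked, so the result should be read against the client's own MTU.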
The Misleading Signal
SSH hanging after authentication looks like an SSH server configuration issue: maybe a PAM module, a slow reverse DNS lookup (UseDNS yes in sshd_config), a broken /etc/profile script, or a systemd unit dependency issue. Because the instances are new, it also looks like an AMI or user-data problem, so attention goes to Linux-level SSH troubleshooting.