
Symptoms: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable

Domains: linux_ops | networking | cloud
Level: L2
Estimated time: 30-45 min

Initial Alert

Engineer reports via Slack at 15:30 UTC:

Can't SSH into the new batch of EC2 instances in us-east-1.
SSH hangs after "Authenticated" — never gets a shell prompt.
The old instances in the same VPC work fine.

Observable Symptoms

  • ssh -v ec2-user@10.0.12.45 shows Authenticated to 10.0.12.45 then hangs indefinitely.
  • ssh -v ec2-user@10.0.4.10 (old instance in same VPC) connects instantly.
  • ICMP ping to the new instances works: 64 bytes from 10.0.12.45: icmp_seq=1 ttl=64 time=0.8 ms.
  • AWS console shows the instances as running with status checks passed.
  • Security groups allow SSH (port 22) inbound from the VPN CIDR.
  • The new instances are in a new subnet 10.0.12.0/24 created last week. Old instances are in 10.0.4.0/24.
  • curl http://10.0.12.45:8080/health also hangs (does not time out, just hangs).
  • Small HTTP requests work: curl http://10.0.12.45:8080/ping returns pong instantly.

The Misleading Signal

SSH hanging after authentication looks like an SSH server configuration issue: maybe a PAM module, a slow DNS lookup (UseDNS yes in sshd_config), a broken /etc/profile script, or a systemd unit dependency issue. Because the instances are new, it also looks like an AMI or user-data problem. Attention therefore goes to Linux-level SSH troubleshooting.
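
Before digging into sshd, the symptom pattern above (small replies return, larger ones hang, the old subnet is fine) can be tested directly by sending don't-fragment pings of increasing size to a working and a failing instance. This is an added example, not part of the original scenario; it assumes Linux iputils ping and that ICMP is allowed as in the symptoms, and the function names probe, find_path_mtu and compare_path_mtu are hypothetical.

```shell
# Added example: find the largest IPv4 packet that reaches a host with the
# Don't Fragment bit set. Function names are hypothetical, not from the page.

# Send one ICMP echo with DF set. $2 is the total IP packet size in bytes;
# ICMP payload = size - 20 (IPv4 header) - 8 (ICMP header).
probe() {
    local host="$1" size="$2"
    ping -c 1 -W 1 -M do -s "$((size - 28))" "$host" >/dev/null 2>&1
}

# Binary search over IP packet sizes between lo and hi.
# Prints the largest size that gets a reply; returns 1 if even lo fails.
find_path_mtu() {
    local host="$1" lo="${2:-576}" hi="${3:-9001}" mid
    probe "$host" "$lo" || return 1
    while [ "$lo" -lt "$hi" ]; do
        # Upper midpoint so the loop always makes progress.
        mid=$(( (lo + hi + 1) / 2 ))
        if probe "$host" "$mid"; then
            lo="$mid"            # mid fits: answer is mid or larger
        else
            hi=$((mid - 1))      # mid dropped: answer is below mid
        fi
    done
    echo "$lo"
}

# Print one result line per host so a working and a failing instance
# can be compared side by side.
compare_path_mtu() {
    local h m
    for h in "$@"; do
        if m=$(find_path_mtu "$h"); then
            echo "$h largest_df_packet=$m"
        else
            echo "$h no_reply_at_576"
        fi
    done
}
```

Run `compare_path_mtu 10.0.4.10 10.0.12.45` from the client that sees the hang. A smaller result for the new instance than for the old one points at the network path rather than at sshd.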