Symptoms: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule¶
Domains: devops_tooling | linux_ops | networking Level: L2 Estimated time: 30-45 min
Initial Alert¶
CI/CD notification at 16:30 UTC:
:warning: Ansible Playbook TIMEOUT — infrastructure-update
Playbook: devops/ansible/playbooks/rolling-update.yml
Status: Hung at task "Pull updated config from git repo" on app-server-03
Duration: 15 minutes (timeout: 10 minutes)
Previous 2 servers completed successfully
Observable Symptoms¶
- The Ansible playbook hangs on
app-server-03at a task that clones a Git repository from an internal GitLab server. - The same playbook ran successfully on
app-server-01andapp-server-02(the first two hosts in the inventory). - SSH from the Ansible control node to
app-server-03works fine (Ansible can connect and run simple tasks). - Tasks before the Git clone (package updates, config file copies) completed successfully on
app-server-03. - Manually running
git clone git@gitlab.internal:infra/configs.gitfromapp-server-03hangs indefinitely. - The same
git clonefromapp-server-01andapp-server-02works fine. app-server-03was freshly provisioned last week. The other servers have been running for months.
The Misleading Signal¶
An Ansible playbook hanging on a specific host looks like a host-level issue — maybe SSH is slow, a task has a dependency that is missing on this host, or there is a resource exhaustion problem. The fact that it is a newly provisioned server makes engineers suspect a provisioning issue — missing packages, wrong SSH keys, or an incomplete configuration. The focus goes to comparing app-server-03's configuration against the working servers.