Remediation: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
Immediate Fix (Domain C: Networking; Domain B: Linux Ops)
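The fix has two parts. First, add an egress rule to security group sg-0abc123 that allows SSH (TCP 22) to the tools subnet (10.0.5.0/24), where GitLab lives. Second, configure sudo to preserve SSH_AUTH_SOCK so that the forwarded agent survives Ansible's become.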
Step 1: Update the security group
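# The rule description goes inside IpRanges. The command has no top-level --description option.
$ aws ec2 authorize-security-group-egress \
--group-id sg-0abc123 \
--ip-permissions '[{"IpProtocol":"tcp","FromPort":22,"ToPort":22,"IpRanges":[{"CidrIp":"10.0.5.0/24","Description":"Allow SSH to tools subnet (GitLab)"}]}]'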
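The new rule only helps if gitlab.internal resolves to an address inside 10.0.5.0/24. Step 2's output shows it resolving to 10.0.5.100. The sketch below uses two hypothetical pure-bash helpers, ip_to_int and cidr_contains, to check an IPv4 address against a CIDR block. This is handy if GitLab is ever re-addressed or DNS returns something unexpected. It assumes bash with 64-bit arithmetic, which is standard on current Linux.

```bash
#!/usr/bin/env bash
# ip_to_int A.B.C.D: print the IPv4 address as a 32-bit integer.
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  # 10# forces base 10 so octets such as "010" are not read as octal.
  echo $(( (10#${o[0]} << 24) | (10#${o[1]} << 16) | (10#${o[2]} << 8) | 10#${o[3]} ))
}

# cidr_contains NET/BITS ADDR: exit 0 if ADDR lies inside the IPv4 CIDR block.
cidr_contains() {
  local net=${1%/*} bits=${1#*/}
  # Netmask with the top BITS bits set; BITS=0 yields 0, which matches everything.
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  local a n
  a=$(ip_to_int "$2")
  n=$(ip_to_int "$net")
  (( (a & mask) == (n & mask) ))
}

# Example, on app-server-03:
#   cidr_contains 10.0.5.0/24 "$(getent hosts gitlab.internal | awk '{print $1}')" && echo "covered by the egress rule"
```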
Step 2: Verify connectivity
$ ssh deploy@app-server-03 "nc -zv -w 5 gitlab.internal 22"
Connection to gitlab.internal (10.0.5.100) 22 port [tcp/ssh] succeeded!
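Minimal images do not always ship nc. The hypothetical tcp_check function below is an alternative. It assumes bash was built with /dev/tcp support (the default on mainstream distributions) and that GNU coreutils timeout is installed. It attempts the same connection and gives up after a fixed number of seconds. The time limit matters because a security group drops unmatched packets silently. A blocked connection therefore hangs rather than being refused, which is the same symptom behind the hanging playbook.

```bash
#!/usr/bin/env bash
# tcp_check HOST PORT [SECONDS]: exit 0 if a TCP connection to HOST:PORT opens
# within SECONDS (default 5). Uses bash's /dev/tcp redirection, so nc is not needed.
tcp_check() {
  local host=$1 port=$2 secs=${3:-5}
  # The inner bash opens fd 3 to /dev/tcp/HOST/PORT ($0 and $1 inside -c).
  # timeout kills it if the attempt hangs, e.g. when SYNs are silently dropped.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example, on app-server-03:
#   tcp_check gitlab.internal 22 5 && echo "gitlab.internal:22 reachable"
```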
Step 3: Fix the sudoers configuration
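# On app-server-03: stage the drop-in, validate it with visudo, and only then install it with mode 0440
# (a file sudo cannot parse would break sudo on the host)
$ printf '%s\n' 'Defaults env_keep += "SSH_AUTH_SOCK"' | sudo tee /tmp/ansible.sudoers >/dev/null
$ sudo visudo -cf /tmp/ansible.sudoers && sudo install -m 0440 /tmp/ansible.sudoers /etc/sudoers.d/ansible
$ sudo rm -f /tmp/ansible.sudoers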
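Or push the same file with Ansible's copy module. Its validate option runs visudo against the staged copy before that copy replaces the destination:
$ ansible app-server-03 -m copy -a "content='Defaults env_keep += \"SSH_AUTH_SOCK\"\n' dest=/etc/sudoers.d/ansible mode=0440 validate='visudo -cf %s'" --become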
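Before re-running the playbook, or when auditing other servers for the same gap, a quick text check can confirm the setting is present. The sketch below uses a hypothetical function, sudoers_keeps_agent. It is a heuristic grep, not a sudoers parser: it does not evaluate Defaults scoping or later entries that remove SSH_AUTH_SOCK again. Running sudo -l on the host lists the matching Defaults entries and remains the authoritative view.

```bash
#!/usr/bin/env bash
# sudoers_keeps_agent FILE...
# Exit 0 if any FILE contains an uncommented line of the form
#   Defaults ... env_keep += "... SSH_AUTH_SOCK ..."   (or env_keep = ...)
# Heuristic only: Defaults scoping and later removals are not considered.
sudoers_keeps_agent() {
  # -E extended regex, -q quiet, -s ignore missing or unreadable files
  grep -Eqs '^[[:space:]]*Defaults.*env_keep[[:space:]]*\+?=.*SSH_AUTH_SOCK' "$@"
}

# Example, as root (sudoers files are readable only by root):
#   sudoers_keeps_agent /etc/sudoers /etc/sudoers.d/* && echo "SSH_AUTH_SOCK preserved"
```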
Step 4: Re-run the playbook
$ ansible-playbook devops/ansible/playbooks/rolling-update.yml --limit app-server-03
PLAY RECAP ****
app-server-03 : ok=12 changed=5 unreachable=0 failed=0
Step 5: Update the provisioning template
# In the Terraform/CloudFormation template for new servers, make sure that:
# - the app-server security group includes the egress rule for TCP 22 to the tools subnet (10.0.5.0/24)
# - the user-data script installs /etc/sudoers.d/ansible with SSH_AUTH_SOCK preserved
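The sketch below covers the user-data side of the template. It assumes the script runs as root under cloud-init and that /etc/sudoers includes /etc/sudoers.d, which is the stock configuration on most distributions. The hypothetical write_sudoers_dropin stages the file and validates it with visudo -cf. Only then does it install the file with mode 0440, so a typo cannot leave a fresh server without working sudo.

```bash
#!/usr/bin/env bash
# write_sudoers_dropin [DEST]
# Hypothetical user-data helper. Assumes it runs as root (as cloud-init
# user-data does), so the installed file ends up owned by root.
write_sudoers_dropin() {
  local dest=${1:-/etc/sudoers.d/ansible}
  local tmp rc=0
  tmp=$(mktemp) || return 1
  printf '%s\n' 'Defaults env_keep += "SSH_AUTH_SOCK"' > "$tmp"
  # Validate the staged copy first: a drop-in sudo cannot parse would
  # break sudo on the new server.
  if ! visudo -cf "$tmp" >/dev/null; then
    rm -f "$tmp"
    return 1
  fi
  # Copy into place with the conventional 0440 mode for sudoers files.
  install -m 0440 "$tmp" "$dest" || rc=$?
  rm -f "$tmp"
  return "$rc"
}

# In the user-data script:
#   write_sudoers_dropin /etc/sudoers.d/ansible
```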
Verification
Domain A (DevOps Tooling): Playbook completes
$ ansible-playbook devops/ansible/playbooks/rolling-update.yml --check
PLAY RECAP ****
app-server-01 : ok=12 changed=0 unreachable=0 failed=0
app-server-02 : ok=12 changed=0 unreachable=0 failed=0
app-server-03 : ok=12 changed=0 unreachable=0 failed=0
Domain B (Linux Ops): SSH agent forwarding works with sudo
$ ssh -A deploy@app-server-03 "sudo -u deploy ssh-add -l"
256 SHA256:abc123... /home/ansible/.ssh/id_ed25519 (ED25519)
Domain C (Networking): Firewall allows SSH to tools subnet
$ aws ec2 describe-security-groups --group-ids sg-0abc123 \
--query 'SecurityGroups[].IpPermissionsEgress[?ToPort==`22`]' --output table
| FromPort | ToPort | IpProtocol | CidrIp |
| 22 | 22 | tcp | 10.0.1.0/24 |
| 22 | 22 | tcp | 10.0.5.0/24 |
Prevention
- Monitoring: Add connectivity checks as part of the Ansible playbook pre-flight. Test SSH to all required internal services before starting the main tasks.
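- name: "Pre-flight: verify SSH connectivity to GitLab"
  ansible.builtin.wait_for:
    host: gitlab.internal
    port: 22
    timeout: 10
  # No delegate_to needed: the task already runs on each managed host,
  # which is where the GitLab connection has to originate.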
- Runbook: New server provisioning must include security group verification against a known-good template. Document all required egress rules (SSH to GitLab, HTTP to package repos, HTTPS to monitoring).
- Architecture: Use a deploy key stored on each server (via Vault or Secrets Manager) instead of SSH agent forwarding. This removes the dependency on agent forwarding and, with it, the need to preserve SSH_AUTH_SOCK through sudo. Alternatively, use HTTPS with a token for Git operations. Either way, the server still needs an egress rule to GitLab: TCP 22 for SSH, or TCP 443 for HTTPS.
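The runbook's security group check can be scripted as a set difference between a known-good rule list and the group's current egress rules. The sketch below assumes a hypothetical plain-text format with one "<protocol> <port> <cidr>" rule per line. The comparison is literal, which has two consequences. It will not notice that a broader rule (such as AWS's default allow-all egress) already covers a required one. It also ignores rules that reference security groups or prefix lists rather than IPv4 CIDRs.

```bash
#!/usr/bin/env bash
# missing_egress_rules REQUIRED ACTUAL
# Both files hold one rule per line as "<protocol> <port> <cidr>", for example
# "tcp 22 10.0.5.0/24". Prints each required rule absent from ACTUAL and
# returns 1 if anything is missing, 0 otherwise.
missing_egress_rules() {
  local missing
  # comm -23 keeps lines that appear only in the first (required) list.
  missing=$(comm -23 <(sort -u "$1") <(sort -u "$2"))
  if [[ -n $missing ]]; then
    printf '%s\n' "$missing"
    return 1
  fi
}

# Example (assumes jq is installed; required-egress.txt is a hypothetical
# known-good list kept alongside the provisioning template):
#   aws ec2 describe-security-groups --group-ids sg-0abc123 --output json \
#     | jq -r '.SecurityGroups[].IpPermissionsEgress[] | . as $p
#              | .IpRanges[] | "\($p.IpProtocol) \($p.FromPort) \(.CidrIp)"' \
#     > actual-egress.txt
#   missing_egress_rules required-egress.txt actual-egress.txt
```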