
Remediation: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule

Immediate Fix (Networking — Domain C)

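The fix has two parts: add a security group egress rule that allows SSH (TCP port 22) to the tools subnet (10.0.5.0/24), and configure sudo to preserve SSH_AUTH_SOCK so that agent forwarding survives privilege escalation.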

Step 1: Update the security group

$ aws ec2 authorize-security-group-egress \
    --group-id sg-0abc123 \
    --ip-permissions IpProtocol=tcp,FromPort=22,ToPort=22,IpRanges='[{CidrIp=10.0.5.0/24,Description="Allow SSH to tools subnet (GitLab)"}]'

Step 2: Verify connectivity

$ ssh deploy@app-server-03 "nc -zv -w 5 gitlab.internal 22"
Connection to gitlab.internal (10.0.5.100) 22 port [tcp/ssh] succeeded!
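
Where nc is not installed on the app server, the same check can be sketched with bash's built-in /dev/tcp redirection. This is a minimal illustration, not part of the original fix: the function names check_tcp and preflight are hypothetical, and it assumes bash with network redirections enabled plus the coreutils timeout command.

```shell
#!/usr/bin/env bash

# check_tcp HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection can be opened within the timeout,
# non-zero otherwise. Uses bash's /dev/tcp, so nc is not required.
check_tcp() {
    local host="$1" port="$2" wait="${3:-5}"
    # $0 and $1 inside the child shell are the host and port passed below.
    timeout "$wait" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# preflight HOST:PORT [HOST:PORT ...]
# Reports each target and returns 1 if any of them is unreachable.
preflight() {
    local failed=0 target
    for target in "$@"; do
        if check_tcp "${target%:*}" "${target##*:}" 5; then
            echo "ok      $target"
        else
            echo "BLOCKED $target"
            failed=1
        fi
    done
    return "$failed"
}
```

Run on app-server-03, `preflight gitlab.internal:22` would report whether the new egress rule has taken effect; a blocked target makes the function return non-zero, so it can gate a deploy script.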

Step 3: Fix the sudoers configuration

# On app-server-03
$ sudo tee /etc/sudoers.d/ansible <<'EOF'
Defaults env_keep += "SSH_AUTH_SOCK"
EOF
$ sudo chmod 440 /etc/sudoers.d/ansible
$ sudo visudo -cf /etc/sudoers.d/ansible    # confirm the drop-in parses cleanly

Or via Ansible itself:

$ ansible app-server-03 -m copy -a "content='Defaults env_keep += \"SSH_AUTH_SOCK\"\n' dest=/etc/sudoers.d/ansible mode=0440" --become

Step 4: Re-run the playbook

$ ansible-playbook devops/ansible/playbooks/rolling-update.yml --limit app-server-03
PLAY RECAP ****
app-server-03    : ok=12   changed=5    unreachable=0    failed=0

Step 5: Update the provisioning template

# In the Terraform/CloudFormation template for new servers,
# ensure the security group includes the tools subnet egress rule
# and the user-data script includes the sudoers SSH_AUTH_SOCK preservation
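
For the user-data half of this step, a minimal sketch of the sudoers portion is shown below. It assumes the user-data script runs as root under bash; the function name install_agent_sudoers and its optional directory argument are hypothetical and exist only so the function can be dry-run against a scratch directory. The security group half belongs in the template itself and is not shown.

```shell
#!/usr/bin/env bash

# install_agent_sudoers [DIR]
# Writes a sudoers drop-in named "ansible" that preserves SSH_AUTH_SOCK
# across sudo. DIR defaults to /etc/sudoers.d.
install_agent_sudoers() {
    local dir="${1:-/etc/sudoers.d}"
    local tmp
    tmp="$(mktemp)" || return 1
    printf 'Defaults env_keep += "SSH_AUTH_SOCK"\n' > "$tmp"
    chmod 0440 "$tmp"
    # Validate before installing when visudo is available, so a
    # malformed file never lands in the sudoers directory.
    if command -v visudo >/dev/null 2>&1; then
        visudo -cf "$tmp" >/dev/null || { rm -f "$tmp"; return 1; }
    fi
    mv -f "$tmp" "$dir/ansible"
}
```

In user-data the function would be called with no arguments. Writing to a temporary file and moving it into place means sudo never reads a half-written drop-in.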

Verification

Domain A (DevOps Tooling) — Playbook completes

$ ansible-playbook devops/ansible/playbooks/rolling-update.yml --check
PLAY RECAP ****
app-server-01    : ok=12   changed=0    unreachable=0    failed=0
app-server-02    : ok=12   changed=0    unreachable=0    failed=0
app-server-03    : ok=12   changed=0    unreachable=0    failed=0

Domain B (Linux Ops) — SSH agent forwarding works with sudo

$ ssh -A deploy@app-server-03 "sudo -u deploy ssh-add -l"
256 SHA256:abc123... /home/ansible/.ssh/id_ed25519 (ED25519)

Domain C (Networking) — Firewall allows SSH to tools subnet

$ aws ec2 describe-security-group-rules \
    --filters Name=group-id,Values=sg-0abc123 \
    --query 'SecurityGroupRules[?IsEgress && ToPort==`22`].{FromPort:FromPort,ToPort:ToPort,IpProtocol:IpProtocol,CidrIp:CidrIpv4}' --output table
| FromPort | ToPort | IpProtocol | CidrIp         |
| 22       | 22     | tcp        | 10.0.1.0/24    |
| 22       | 22     | tcp        | 10.0.5.0/24    |

Prevention

  • Monitoring: Add connectivity checks as part of the Ansible playbook pre-flight. Test SSH to all required internal services before starting the main tasks.

    - name: "Pre-flight: verify connectivity to GitLab"
      wait_for:
        host: gitlab.internal
        port: 22
        timeout: 10
      delegate_to: "{{ inventory_hostname }}"

  • Runbook: New server provisioning must include security group verification against a known-good template. Document all required egress rules (SSH to GitLab, HTTP to package repos, HTTPS to monitoring).

  • Architecture: Use a deploy key stored on each server (via Vault or Secrets Manager) instead of SSH agent forwarding. This eliminates the agent forwarding dependency and the sudoers issue. Alternatively, use HTTPS with a token for Git operations.
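
The runbook item above, verifying a security group against a known-good template, can be sketched as a set difference between two rule lists. This is an illustration under stated assumptions: the function name missing_rules and the one-rule-per-line "protocol port cidr" format are hypothetical, and exporting the live egress rules into that format (for example from aws ec2 describe-security-group-rules output) is left out.

```shell
#!/usr/bin/env bash

# missing_rules REQUIRED_FILE ACTUAL_FILE
# Both files hold one egress rule per line as "protocol port cidr",
# e.g. "tcp 22 10.0.5.0/24". Prints every required rule that is
# absent from the actual rules; prints nothing when all are present.
missing_rules() {
    # comm -23 keeps lines unique to the first (required) input.
    # Both inputs are sorted under the same locale, as comm expects.
    LC_ALL=C comm -23 <(LC_ALL=C sort -u "$1") <(LC_ALL=C sort -u "$2")
}
```

A provisioning check could fail the build whenever the output is non-empty. In this incident, a required list containing "tcp 22 10.0.5.0/24" would have flagged app-server-03 before the playbook ever ran.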