Runbook: Ansible Playbook Failure

Symptoms

  • Playbook exits with non-zero return code
  • Tasks report UNREACHABLE or FAILED, or report CHANGED where no change was expected
  • Ansible hangs on a task or host
  • Partial runs leave hosts in inconsistent state

Severity

Scope                        Severity
Single host, non-prod        Low
Multiple hosts or prod       Medium
Fleet-wide, customer-facing  High

Triage (first 5 minutes)

  1. Read the error output: Ansible prints the failing task, host, and error message
  2. Check connectivity: ansible all -m ping -i inventory (this tests SSH login and Python on the target, not ICMP)
  3. Re-run from the failing task on the affected host: ansible-playbook playbook.yml --start-at-task="<task name>" --limit=<failing_host> (this also runs every task after it; add --step to confirm each task interactively)
  4. Increase verbosity: ansible-playbook playbook.yml -vvv
  5. Check for stuck or concurrent Ansible runs on the control node: ps aux | grep ansible
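
When step 1 involves a long run, the hosts that need attention can be pulled out of the PLAY RECAP instead of scrolling. The following is a minimal sketch, assuming the default stdout callback and uncoloured output; the function name failed_hosts is hypothetical.

```shell
#!/bin/sh
# Print every host whose PLAY RECAP line shows failed>0 or unreachable>0.
# Reads ansible-playbook output on stdin.
failed_hosts() {
    awk '
        /^PLAY RECAP/ { in_recap = 1; next }   # recap section starts here
        in_recap && NF > 2 && $2 == ":" {      # lines look like: host : ok=3 changed=1 ...
            bad = 0
            for (i = 3; i <= NF; i++) {
                split($i, kv, "=")             # split key=value counters
                if ((kv[1] == "failed" || kv[1] == "unreachable") && kv[2] + 0 > 0)
                    bad = 1
            }
            if (bad) print $1
        }
    '
}
```

One way to use it: ansible-playbook playbook.yml -i inventory | tee run.log | failed_hosts, then feed the resulting names to --limit for the re-run in step 3.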

Common Causes

SSH / Connectivity

Symptom                       Likely cause                              Fix
UNREACHABLE                   SSH key not loaded, wrong user, firewall  ssh-add, check ansible_user, check port 22
Connection timeout            Network issue, host down                  ping <host>, check security groups
Host key verification failed  Known hosts mismatch                      Confirm the host key really changed, then ssh-keygen -R <host> and ssh-keyscan <host> >> ~/.ssh/known_hosts
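
The lookup in this table can be sketched as a small helper that maps the SSH error text to the first thing to check. The function name ssh_fix_hint and the hint wording are illustrative; the matched substrings are common OpenSSH messages, and anything else falls through to -vvv.

```shell
#!/bin/sh
# Map an SSH error message (first argument) to the first fix to try.
ssh_fix_hint() {
    case "$1" in
        *"Host key verification failed"*)
            echo "known_hosts: confirm the new host key, then refresh the known_hosts entry" ;;
        *"Permission denied"*)
            echo "auth: run ssh-add and check ansible_user" ;;
        *"timed out"*|*"No route to host"*)
            echo "network: ping the host, check firewall and security groups" ;;
        *"Connection refused"*)
            echo "port: check that sshd is running and the SSH port is correct" ;;
        *)
            echo "unknown: re-run with -vvv and read the full SSH error" ;;
    esac
}
```

For example, ssh_fix_hint "Permission denied (publickey)" prints the auth hint.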

Authentication / Privilege

Symptom                Likely cause                           Fix
Permission denied      Wrong sudo password or missing become  Add become: true, check ansible_become_pass
Missing sudo password  NOPASSWD not set                       Configure sudoers or pass --ask-become-pass
Vault decrypt failure  Wrong vault password                   Check --vault-password-file or --ask-vault-pass

Task Failures

Symptom                Likely cause               Fix
Package install fails  Stale cache, missing repo  apt update / dnf makecache before install task
Template render error  Missing variable           Check defaults/main.yml, use {{ var | default('') }}
Service won't start    Config syntax error        ansible-playbook --check --diff to preview the change; add validate: to the template/copy task to reject a bad config
Idempotency violation  Task not idempotent        Use creates:, when:, or module-native idempotency
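
An idempotency violation is easiest to confirm by running the playbook twice and checking that the second run changes nothing. This is a minimal sketch of that check, assuming the default recap format; the function name recap_is_idempotent is hypothetical.

```shell
#!/bin/sh
# Exit 0 only if a PLAY RECAP is present and every host reports changed=0.
# Reads the output of the SECOND ansible-playbook run on stdin.
recap_is_idempotent() {
    awk '
        /^PLAY RECAP/ { in_recap = 1; next }
        in_recap && $2 == ":" {
            seen = 1                            # at least one host line was parsed
            for (i = 3; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == "changed" && kv[2] + 0 > 0) dirty = 1
            }
        }
        END { exit ((seen && !dirty) ? 0 : 1) }
    '
}
```

A possible use: run the playbook once, then ansible-playbook playbook.yml -i inventory | recap_is_idempotent || echo "second run still reports changes". Missing recap output is treated as a failure rather than a pass.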

Inventory Issues

Symptom                  Likely cause                 Fix
No hosts matched         Wrong group name or pattern  ansible-inventory --list, check group names
Wrong hosts targeted     Inventory file mismatch      Verify -i <inventory> points to correct file
Dynamic inventory empty  Cloud API auth failure       Check credentials, test ansible-inventory --list

Investigation Commands

# Connectivity check
ansible all -m ping -i inventory

# List effective inventory
ansible-inventory -i inventory --list

# Dry run with diff
ansible-playbook playbook.yml --check --diff

# Resume from a named task on a single host (runs that task and every task after it)
ansible-playbook playbook.yml --start-at-task="Install packages" --limit=web01

# Dump the inventory variables Ansible resolves for one host
ansible web01 -i inventory -m debug -a "var=hostvars[inventory_hostname]"

# Syntax check only
ansible-playbook playbook.yml --syntax-check

# Show total playbook run time (the profile_tasks callback gives per-task timing instead)
ANSIBLE_CALLBACKS_ENABLED=timer ansible-playbook playbook.yml
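
The commands above can be chained so the cheapest checks run first and the sequence stops at the first failure. This is a sketch under stated assumptions rather than a fixed procedure: the function name preflight and its argument order are hypothetical, and it only uses flags already shown in this section.

```shell
#!/bin/sh
# preflight PLAYBOOK INVENTORY [LIMIT]
# Runs syntax check, inventory listing, ping, then a dry run.
# The body runs in a subshell so its variables do not leak into the caller.
preflight() (
    playbook=$1
    inventory=$2
    limit=${3:-all}    # default to every host when no limit is given

    echo "== syntax check" &&
    ansible-playbook "$playbook" -i "$inventory" --syntax-check &&
    echo "== inventory" &&
    ansible-inventory -i "$inventory" --list >/dev/null &&
    echo "== connectivity" &&
    ansible "$limit" -i "$inventory" -m ping &&
    echo "== dry run" &&
    ansible-playbook "$playbook" -i "$inventory" --limit "$limit" --check --diff
)
```

Calling preflight playbook.yml inventory web01 returns non-zero as soon as one step fails, and the last "==" line printed shows which step that was.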

Rollback

  • If the playbook has a matching rollback playbook, run it
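  • If using serial: and the play aborted mid-run, hosts in later batches are untouched; a batch where only some hosts failed does not stop the play unless max_fail_percentage or any_errors_fatal is set, so confirm which hosts ran from the PLAY RECAP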
  • For config file changes: restore from backup (backup: yes in template/copy modules)
  • For package changes: remove the package with ansible <hosts> -b -m package -a "name=<pkg> state=absent" (or the distro-specific dnf/apt module), or reinstall a pinned known-good version
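
For the config-file case, the backup to restore has to be located on the managed host first. The sketch below assumes that backup: yes writes the copy next to the original as <file>.<pid>.<timestamp>~; verify that naming on your Ansible version before relying on it. The function name latest_backup is hypothetical.

```shell
#!/bin/sh
# Print the most recently modified backup of FILE, or nothing if none exist.
# Assumption: backups are named FILE.<pid>.<timestamp>~ in the same directory.
latest_backup() {
    # ls -t sorts newest first; errors from an unmatched glob are discarded
    ls -t "$1".*~ 2>/dev/null | head -n 1
}
```

On the affected host, compare the result against the live file with diff before copying it back with cp -p, so a stale backup is not restored by accident.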

Prevention

  • Run --check --diff before applying to prod; command and shell tasks are skipped in check mode, so a clean dry run is not a guarantee
  • Use serial: to limit blast radius on fleet operations
  • Set max_fail_percentage so a play aborts once too many hosts in a batch have failed
  • Test playbooks with Molecule or a staging inventory first
  • Pin package versions in production playbooks
  • Use ansible-lint in CI
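
When choosing values for serial: and max_fail_percentage, note that the threshold must be exceeded, not merely reached: with a batch of 4 and max_fail_percentage set to 50, two failures (50%) let the play continue and three (75%) abort it. The arithmetic can be sketched as below; the function name batch_would_abort is hypothetical and the check uses integer math only.

```shell
#!/bin/sh
# batch_would_abort FAILED BATCH_SIZE MAX_FAIL_PERCENTAGE
# Exit 0 if the failed share of the batch strictly exceeds the threshold.
batch_would_abort() (
    failed=$1
    batch_size=$2
    max_pct=$3
    # failed/batch_size*100 > max_pct, rearranged to avoid division
    [ $((failed * 100)) -gt $((max_pct * batch_size)) ]
)
```

For example, batch_would_abort 2 4 50 returns non-zero (the play continues), while batch_would_abort 3 4 50 returns zero (the play aborts).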
