Portal | Level: L1: Foundations | Topics: Ansible | Domain: DevOps & Tooling
Runbook: Ansible Playbook Failure
Symptoms
Playbook exits with non-zero return code
Tasks report UNREACHABLE or FAILED, or report CHANGED when no change was expected
Ansible hangs on a task or host
Partial runs leave hosts in inconsistent state
Severity
Scope | Severity
Single host, non-prod | Low
Multiple hosts or prod | Medium
Fleet-wide, customer-facing | High
Triage (first 5 minutes)
Read the error output: Ansible prints the failing task, host, and error message
Check connectivity: ansible all -m ping -i inventory
Re-run from the failing task on the failing host: ansible-playbook playbook.yml --start-at-task="<task name>" --limit=<failing_host> (this resumes at that task and continues through the rest of the play; it does not run the task alone)
Increase verbosity: ansible-playbook playbook.yml -vvv
Check for hung or concurrent Ansible runs on the control node: ps aux | grep ansible
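The connectivity and re-run steps above can be chained so that a connectivity failure is not mistaken for a task failure. This is a minimal sketch, not a standard tool: the function name triage_host and the defaults inventory and playbook.yml are placeholders.

```shell
# Hypothetical helper: ping one host, then resume the playbook at the failing
# task on that host only, with verbose output.
# Usage: triage_host <host> "<task name>" [inventory] [playbook]
triage_host() {
  t_host="$1"
  t_task="$2"
  t_inventory="${3:-inventory}"     # placeholder default
  t_playbook="${4:-playbook.yml}"   # placeholder default

  # Step 1: confirm the control node can reach and log in to the host.
  ansible "$t_host" -i "$t_inventory" -m ping || return 1

  # Step 2: resume at the failing task, limited to that host, with -vvv.
  ansible-playbook "$t_playbook" -i "$t_inventory" \
    --limit "$t_host" --start-at-task "$t_task" -vvv || return 2
}
```

A return code of 1 points at SSH or connectivity, 2 at the playbook itself, for example: triage_host web01 "Install packages".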
Common Causes
SSH / Connectivity
Symptom | Likely cause | Fix
UNREACHABLE | SSH key not loaded, wrong user, firewall | ssh-add, check ansible_user, check that the SSH port (22 by default) is reachable
Connection timeout | Network issue, host down | ping <host>, check security groups
Host key verification failed | Known hosts mismatch | Verify the new fingerprint, remove the stale entry with ssh-keygen -R <host>, then ssh-keyscan <host> >> ~/.ssh/known_hosts
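Appending ssh-keyscan output on its own leaves the stale known_hosts entry in place and trusts whatever key the network returns. The sketch below replaces the entry only when the scanned key matches a fingerprint obtained from a trusted source. The function name refresh_host_key and the choice of ed25519 keys are assumptions for illustration.

```shell
# Hypothetical helper: replace a host's known_hosts entry only if the key it
# presents now matches a fingerprint you obtained out of band.
# Usage: refresh_host_key <host> <expected SHA256:... fingerprint> [known_hosts]
refresh_host_key() {
  hk_host="$1"
  hk_expected="$2"
  hk_file="${3:-$HOME/.ssh/known_hosts}"

  hk_tmp=$(mktemp) || return 1

  # Fetch the key the host presents now and compute its fingerprint.
  ssh-keyscan -t ed25519 "$hk_host" > "$hk_tmp" 2>/dev/null
  hk_actual=$(ssh-keygen -l -f "$hk_tmp" 2>/dev/null | awk '{print $2}')

  if [ -z "$hk_actual" ] || [ "$hk_actual" != "$hk_expected" ]; then
    echo "fingerprint mismatch for $hk_host: got '${hk_actual:-none}'" >&2
    rm -f "$hk_tmp"
    return 1
  fi

  # Only now drop the stale entry and record the verified key.
  ssh-keygen -R "$hk_host" -f "$hk_file" >/dev/null 2>&1
  cat "$hk_tmp" >> "$hk_file"
  rm -f "$hk_tmp"
}
```

If the host key changed unexpectedly, treat the mismatch as a possible security event rather than routine cleanup.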
Authentication / Privilege
Symptom | Likely cause | Fix
Permission denied | Wrong sudo password or missing become | Add become: true, check ansible_become_pass
Missing sudo password | NOPASSWD not set | Configure sudoers or pass --ask-become-pass
Vault decrypt failure | Wrong vault password | Check --vault-password-file or --ask-vault-pass
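Both the vault and the privilege-escalation causes can be checked before re-running the whole playbook. The sketch below is one way to do that; check_auth is a hypothetical name, and it assumes a vault password file and passwordless or preconfigured sudo (it does not prompt).

```shell
# Hypothetical pre-run check for vault and become problems.
# Usage: check_auth <host> <vault-encrypted file> <vault password file> [inventory]
check_auth() {
  ca_host="$1"
  ca_vault_file="$2"
  ca_pass_file="$3"
  ca_inventory="${4:-inventory}"   # placeholder default

  # Vault: a successful decrypt to /dev/null shows the password file matches.
  if ! ansible-vault view --vault-password-file "$ca_pass_file" \
       "$ca_vault_file" >/dev/null; then
    echo "vault: cannot decrypt $ca_vault_file" >&2
    return 1
  fi

  # Privilege: ping under --become succeeds only if escalation works.
  if ! ansible "$ca_host" -i "$ca_inventory" --become -m ping >/dev/null; then
    echo "become: privilege escalation failed on $ca_host" >&2
    return 2
  fi
}
```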
Task Failures
Symptom | Likely cause | Fix
Package install fails | Stale cache, missing repo | Run apt update / dnf makecache (or set update_cache: true) before the install task
Template render error | Missing variable | Check defaults/main.yml, use {{ var | default('') }}
Service won't start | Config syntax error | Preview the change with ansible-playbook --check --diff; add validate: to the template task so a broken config is never installed
Idempotency violation | Task not idempotent | Use creates:, when:, or module-native idempotency
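The Task Failures fixes are easier to compare side by side in a playbook. The snippet below writes an illustrative one to a scratch directory. Every name in it (the web group, nginx, nginx_port, the script and marker paths) is hypothetical, and the validate command is a common pattern that may need adjusting when the config uses relative includes.

```shell
# Write an example playbook to a scratch directory; the path is arbitrary.
example_dir=$(mktemp -d)
cat > "$example_dir/task-fixes-example.yml" <<'EOF'
- name: Illustrative fixes for common task failures
  hosts: web
  become: true
  vars:
    # default() keeps templates renderable when nginx_port is undefined
    listen_port: "{{ nginx_port | default(80) }}"
  tasks:
    - name: Install nginx after refreshing the package cache
      ansible.builtin.apt:
        name: nginx
        state: present
        update_cache: true

    - name: Render the config and reject it if nginx cannot parse it
      ansible.builtin.template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        validate: nginx -t -c %s

    - name: Run a one-time script only while its marker file is absent
      ansible.builtin.command:
        cmd: /usr/local/bin/init-app.sh
        creates: /var/lib/app/.initialized
EOF
echo "wrote $example_dir/task-fixes-example.yml"
```

With validate: set, a template that fails the check is not copied into place, so the running service keeps its last good config.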
Inventory Issues
Symptom | Likely cause | Fix
No hosts matched | Wrong group name or pattern | ansible-inventory --list, check group names
Wrong hosts targeted | Inventory file mismatch | Verify -i <inventory> points to correct file
Dynamic inventory empty | Cloud API auth failure | Check credentials, test ansible-inventory --list
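Before running a playbook it can help to confirm how many hosts a pattern resolves to. The sketch below wraps ansible --list-hosts for that purpose. The name count_matched_hosts is hypothetical, and the parsing assumes the usual "hosts (N):" header line in that command's output.

```shell
# Hypothetical helper: print how many hosts a pattern matches in an inventory.
# Usage: count_matched_hosts <pattern> [inventory]
count_matched_hosts() {
  # --list-hosts resolves the pattern without connecting to any host.
  ansible "$1" -i "${2:-inventory}" --list-hosts 2>/dev/null |
    sed -n 's/^ *hosts (\([0-9][0-9]*\)):.*/\1/p'
}
```

A result of 0 for a group that should have members points at the group name or the -i path; an empty result for a dynamic inventory suggests the inventory plugin itself failed.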
Investigation Commands
# Connectivity check
ansible all -m ping -i inventory
# List effective inventory
ansible-inventory -i inventory --list
# Dry run with diff
ansible-playbook playbook.yml --check --diff
# Run single task on single host
ansible-playbook playbook.yml --start-at-task="Install packages" --limit=web01
# Debug a variable
ansible web01 -i inventory -m debug -a "var=hostvars[inventory_hostname]"
# Syntax check only
ansible-playbook playbook.yml --syntax-check
# Show execution with timing
ANSIBLE_CALLBACKS_ENABLED=timer ansible-playbook playbook.yml
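On a large run the quickest summary of what broke is the PLAY RECAP block at the end of the output. The sketch below reads playbook output on stdin and prints the hosts whose failed or unreachable count is non-zero. The function name failed_hosts is hypothetical, and it assumes the default recap format with color disabled (which is normally the case when output is piped).

```shell
# Hypothetical helper: list hosts with failed>0 or unreachable>0 in PLAY RECAP.
# Usage: ansible-playbook playbook.yml | tee run.log | failed_hosts
failed_hosts() {
  awk '
    /^PLAY RECAP/ { in_recap = 1; next }
    in_recap && /unreachable=/ {
      bad = 0
      for (i = 1; i <= NF; i++) {
        # Fields look like failed=1 or unreachable=0.
        if ($i ~ /^(failed|unreachable)=/) {
          split($i, kv, "=")
          if (kv[2] + 0 > 0) bad = 1
        }
      }
      if (bad) print $1
    }
  '
}
```

The resulting names can be joined with commas and passed to --limit to re-run only the affected hosts.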
Rollback
If the playbook has a matching rollback playbook, run it
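If using serial: and the play aborted mid-batch, hosts in later batches are untouched; hosts in the failed batch may be partially changed
For config file changes: restore from the timestamped backup written next to the file on the managed host (only present if backup: yes was set on the template/copy task)
For package changes: remove the package with ansible <hosts> -b -m package -a "name=<pkg> state=absent", or reinstall a pinned known-good version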
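Restoring a config file means finding the newest backup the module left beside it. The sketch below does that on the managed host itself. It assumes backups are named <dest>.<pid>.<timestamp>~, which is how the template and copy modules name them as far as I know; confirm the pattern on your hosts. Both function names are hypothetical.

```shell
# Hypothetical helpers, meant to run on the managed host.
# Print the most recently modified backup of a file, if any.
latest_backup() {
  # Backups sit next to the original and end in "~".
  ls -t "$1".*~ 2>/dev/null | head -n 1
}

# Copy the newest backup over the current file, preserving mode and times.
restore_latest_backup() {
  rb_src=$(latest_backup "$1")
  if [ -z "$rb_src" ]; then
    echo "no backup found for $1" >&2
    return 1
  fi
  cp -p "$rb_src" "$1"
}
```

After restoring, restart or reload the affected service and re-run the playbook with --check --diff to see what it would change again.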
Prevention
Always use --check --diff before applying to prod
Use serial: to limit blast radius on fleet operations
Gate on max_fail_percentage to abort early
Test playbooks with Molecule or a staging inventory first
Pin package versions in production playbooks
Use ansible-lint in CI
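The serial:, max_fail_percentage, and version-pinning items fit together in a single play. The snippet below writes an illustrative one to a scratch directory. The web group, the batch size, the threshold, and the nginx version string are all made-up values, not recommendations.

```shell
# Write an example rolling-update play to a scratch directory; path is arbitrary.
rollout_dir=$(mktemp -d)
cat > "$rollout_dir/rollout-example.yml" <<'EOF'
- name: Rolling update with a bounded blast radius
  hosts: web
  become: true
  serial: 2                  # update two hosts per batch
  max_fail_percentage: 25    # abort once more than 25% of a batch has failed
  tasks:
    - name: Install a pinned package version (version string is hypothetical)
      ansible.builtin.apt:
        name: nginx=1.24.0-2ubuntu7
        state: present
EOF
echo "wrote $rollout_dir/rollout-example.yml"
```

With two hosts per batch, a single failure is 50% of the batch, which exceeds the threshold and stops the play before later batches are touched. A file like this can go through ansible-lint and ansible-playbook --check --diff against a staging inventory before it reaches prod.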
Related Content
Ansible Automation (Topic Pack, L1) — Ansible
Ansible Core Flashcards (CLI) (flashcard_deck, L1) — Ansible
Ansible Deep Dive (Topic Pack, L2) — Ansible
Ansible Drills (Drill, L1) — Ansible
Ansible Exercises (Quest Ladder) (CLI) (Exercise Set, L1) — Ansible
Ansible Lab: Conditionals and Loops (Lab, L1) — Ansible
Ansible Lab: Facts and Variables (Lab, L0) — Ansible
Ansible Lab: Install Nginx (Idempotency) (Lab, L1) — Ansible
Ansible Lab: Ping and Debug (Lab, L0) — Ansible
Ansible Lab: Roles (Lab, L1) — Ansible
April 3, 2026 16:54:59
March 25, 2026 18:50:27