Ansible Playbook Debugging: Why Did It Do That?

Topics: Ansible variable precedence, check mode, diff mode, verbosity, handlers, facts
Level: L2 (Operations)
Time: 45–60 minutes
Prerequisites: Basic Ansible usage (have run a playbook before)


The Mission

Your Ansible playbook ran. Something changed that shouldn't have. Or something didn't change that should have. The output says "changed" but you can't tell what actually changed. Or worse: it says "ok" but the server isn't configured correctly.

Ansible's declarative model is powerful but opaque. When things go wrong, you need to look inside the execution to understand what happened.


The Verbosity Ladder

# Normal: show task names and status
ansible-playbook playbook.yml

# -v: show task results (module return values)
ansible-playbook playbook.yml -v

# -vv: also show each task's file path and line number
ansible-playbook playbook.yml -vv

# -vvv: also show connection details (SSH commands) and the arguments passed to each module
ansible-playbook playbook.yml -vvv

# -vvvv: also enable connection plugin debugging (full SSH debug output)
ansible-playbook playbook.yml -vvvv

For most debugging, start with -v to see what each task returned, and move to -vvv when you need the exact module arguments or the SSH commands. Reserve -vvvv for SSH key and authentication problems.
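Verbosity can also gate your own debug output, so extra detail appears only when you ask for it. A minimal sketch, assuming nginx is installed on the target and using a hypothetical registered variable named nginx_test; the verbosity parameter of the debug module makes the second task print only at -vv or higher:

```yaml
- name: Validate nginx configuration (read-only, never reports changed)
  ansible.builtin.command: nginx -t
  register: nginx_test        # hypothetical variable name
  changed_when: false

- name: Show validation output only at -vv or higher
  ansible.builtin.debug:
    var: nginx_test.stderr_lines
    verbosity: 2
```

At normal verbosity the debug task is skipped, which keeps routine runs quiet.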


Check Mode and Diff Mode: Preview Changes

# Check mode: show what WOULD change without changing anything
ansible-playbook playbook.yml --check

# Diff mode: show the actual file content differences
ansible-playbook playbook.yml --check --diff

# Diff output example:
# TASK [Copy nginx config]
# --- before: /etc/nginx/nginx.conf
# +++ after: /tmp/ansible-generated
# @@ -10,3 +10,3 @@
# -    worker_connections 768;
# +    worker_connections 1024;
# changed: [webserver]

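Gotcha: Not every module supports check mode. A module without check-mode support is skipped under --check, so it reports nothing. This includes command and shell (unless creates: or removes: is set) and many cloud and custom modules that make API calls. Tasks that depend on a result registered by a skipped task can then fail or behave differently than in a real run. Each module's documentation states whether check mode is supported.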
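One way to keep a playbook usable under --check is to mark read-only tasks so they still run, and to skip tasks that cannot be previewed. The sketch below is an illustration, not a recommended layout: the variable name current_setting is hypothetical, check_mode: false is the task keyword that forces a task to run for real, and ansible_check_mode is the magic variable that is true during a --check run.

```yaml
- name: Read the current worker_connections setting
  ansible.builtin.command: grep worker_connections /etc/nginx/nginx.conf
  register: current_setting            # hypothetical variable name
  changed_when: false                  # read-only, so never "changed"
  failed_when: current_setting.rc > 1  # rc 1 only means "no match"
  check_mode: false                    # run even under --check

- name: Reload nginx (skipped while previewing)
  ansible.builtin.service:
    name: nginx
    state: reloaded
  when: not ansible_check_mode
```

Only use check_mode: false on tasks that change nothing, since they execute for real during a preview.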


The Variable Precedence Nightmare

Ansible documents 22 levels of variable precedence. Yes, twenty-two. When the same variable is defined in multiple places, the highest-precedence definition wins. The levels that cause the most confusion:

(lowest)
  1. command-line values (for example -u my_user; these are not variables)
  2. role defaults (roles/x/defaults/main.yml)
 ...
 12. play vars (vars: in the play)
 13. play vars_prompt
 14. play vars_files
 15. role vars (roles/x/vars/main.yml)      ← higher than play vars!
 16. block vars
 17. task vars
 18. include_vars
 19. set_fact / registered vars
 20. role and include_role params
 21. include params
 22. extra vars (-e on command line)         ← ALWAYS wins
(highest)

The most common trap: role vars/ beats play vars:.

# roles/nginx/vars/main.yml
nginx_worker_connections: 768     # ← This wins (precedence 15)

# playbook.yml
- hosts: webservers
  vars:
    nginx_worker_connections: 1024  # ← This loses (precedence 12)
  roles:
    - nginx
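# Result: 768. Not 1024. Surprise.

# Debug: dump every variable Ansible resolves for a host (final values only)
ansible webserver -m debug -a "var=hostvars[inventory_hostname]"

# Or in a playbook:
- debug:
    var: nginx_worker_connections
    # Prints the final value only; it does not say which source it came from.
    # To find the source, search for the name in defaults/, vars/, group_vars/,
    # host_vars/, and the play itself.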
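The usual fix for the trap above is to move the value from the role's vars/ to its defaults/, where almost anything can override it. A minimal sketch reusing the nginx role and variable from the example (the file layout is the standard role layout; the values are illustrative):

```yaml
# roles/nginx/defaults/main.yml
# Role defaults sit at the bottom of the precedence list, so the play can override them.
nginx_worker_connections: 768
---
# playbook.yml
- hosts: webservers
  vars:
    nginx_worker_connections: 1024   # play vars now win over the role default
  roles:
    - nginx
```

Keep vars/main.yml for values the role genuinely needs to pin, since only a handful of higher levels (such as extra vars) can override them.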

Remember: Extra vars (-e) ALWAYS win. If something isn't working, override it on the command line to confirm the value is the issue:

ansible-playbook playbook.yml -e "nginx_worker_connections=1024"
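Values passed as key=value on the command line arrive as strings, which matters when a template or condition expects a number. One option is to put overrides in a YAML file and pass it with -e @file. A sketch, assuming a hypothetical file named overrides.yml and a debug task that prints the value with its type via the type_debug filter:

```yaml
# overrides.yml (hypothetical file name): values keep their YAML types
nginx_worker_connections: 1024
---
# Task to add to the play while debugging
- name: Show the effective value and its type
  ansible.builtin.debug:
    msg: "{{ nginx_worker_connections }} ({{ nginx_worker_connections | type_debug }})"
```

Run it with ansible-playbook playbook.yml -e @overrides.yml and compare the reported type against the key=value form.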


Common Ansible Mistakes

Handler not firing

tasks:
  - name: Update nginx config
    template:
      src: nginx.conf.j2
      dest: /etc/nginx/nginx.conf
    notify: restart nginx    # ← only fires if task reports "changed"

handlers:
  - name: restart nginx
    service:
      name: nginx
      state: restarted

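If the config file is already identical to the template, the task reports "ok" (not "changed"), and the handler never fires. This is correct behavior. It also hides naming mistakes: if you renamed the handler and forgot to update notify:, nothing happens while the task reports "ok". With default settings, Ansible raises an error that the requested handler was not found only once the task actually reports "changed".

# Run handlers that were already notified even if a later task fails on that host
# (this does not trigger handlers that were never notified):
ansible-playbook playbook.yml --force-handlers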
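Two handler features help with the problems described here: listen lets tasks notify a topic instead of a handler name, so renaming the handler does not break notify:, and meta: flush_handlers runs pending handlers immediately instead of at the end of the play. A minimal sketch; the topic name "nginx config changed" is a made-up label:

```yaml
tasks:
  - name: Update nginx config
    ansible.builtin.template:
      src: nginx.conf.j2
      dest: /etc/nginx/nginx.conf
    notify: nginx config changed   # matches the handler's listen topic

  - name: Run any notified handlers now, not at the end of the play
    ansible.builtin.meta: flush_handlers

handlers:
  - name: restart nginx            # safe to rename; tasks notify the topic
    ansible.builtin.service:
      name: nginx
      state: restarted
    listen: nginx config changed
```

The handler still runs only if the template task reports "changed"; flush_handlers changes when it runs, not whether.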

Task reports "changed" every run

If a task reports "changed" on every run, either it is not idempotent or Ansible cannot tell whether it changed anything:

# BAD: always reports changed (Ansible can't know what a shell command did)
- name: Set timezone
  shell: timedatectl set-timezone America/New_York

# GOOD: only changes if the timezone is different
# (the timezone module ships in the community.general collection)
- name: Set timezone
  community.general.timezone:
    name: America/New_York

Use purpose-built modules instead of shell/command whenever possible. They check the current state and change only what's needed.
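When no suitable module exists, a command task can still be made to report accurately. The sketch below shows two common patterns under stated assumptions: the first assumes a systemd version whose timedatectl supports the show subcommand, and the script and marker paths in the second are hypothetical.

```yaml
# Pattern 1: read the current state first, then act only when it differs
- name: Read current timezone
  ansible.builtin.command: timedatectl show --property=Timezone --value
  register: current_tz
  changed_when: false     # reading state never counts as a change
  check_mode: false       # safe to run under --check

- name: Set timezone only when it differs
  ansible.builtin.command: timedatectl set-timezone America/New_York
  when: current_tz.stdout != "America/New_York"

# Pattern 2: skip the command once a marker file exists
- name: Run installer once
  ansible.builtin.command: /opt/app/install.sh   # hypothetical script
  args:
    creates: /opt/app/.installed                 # hypothetical marker file
```

In both patterns the command runs, and reports "changed", only when there is work to do.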


Flashcard Check

Q1: Role vars/main.yml vs play vars: — which wins?

Role vars (precedence 15) beats play vars (precedence 12). This surprises everyone. Use role defaults/ (precedence 2, the lowest level that holds variables) for values users should override.

Q2: --check --diff — what does it show?

What WOULD change without changing anything. Diff shows actual content differences in files. Essential for previewing playbook effects.

Q3: Handler not firing — most common cause?

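Task reports "ok" instead of "changed" (config already matches). A handler that was renamed without updating notify: goes unnoticed for the same reason, until the task next reports "changed" and Ansible fails because the handler cannot be found.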


Cheat Sheet

| Task | Command |
|---|---|
| Preview changes | ansible-playbook play.yml --check --diff |
| Debug variables | ansible host -m debug -a "var=VAR" |
| Override a variable | ansible-playbook play.yml -e "var=value" |
| Run notified handlers even if a task fails | ansible-playbook play.yml --force-handlers |
| Show facts | ansible host -m setup |
| Verbose output | -v (results), -vv (task paths), -vvv (SSH commands, module arguments) |
| Step-by-step | ansible-playbook play.yml --step |
| Start at task | ansible-playbook play.yml --start-at-task "Task Name" |
| List tasks | ansible-playbook play.yml --list-tasks |
| Syntax check | ansible-playbook play.yml --syntax-check |

Takeaways

  1. Variable precedence has 22 levels. Role vars/ beats play vars:. Use role defaults/ for overridable values. Extra vars (-e) always win.

  2. --check --diff before every production run. See what would change before it changes. Non-negotiable for production playbooks.

  3. Handlers only fire on "changed." If the task reports "ok," the handler is skipped. This is correct, but it also means a stale notify: name goes unnoticed until the task next reports "changed."

  4. Use modules, not shell. Most modules are idempotent by design. shell and command report "changed" every time they run unless you add creates:, removes:, or changed_when:.


Exercises

  1. Inspect variable precedence. Create a minimal playbook with a role that defines my_var: "from role defaults" in defaults/main.yml and my_var: "from role vars" in vars/main.yml. Set my_var: "from play vars" in the play's vars: section. Add a debug task to print my_var. Run the playbook and confirm which value wins. Then run again with -e "my_var=from_cli" and confirm extra vars override everything.

  2. Preview changes with check and diff mode. Write a playbook that uses the copy module to write a known string to /tmp/ansible-test.txt. Run it once to create the file. Change the string in the playbook, then run with --check --diff. Verify the diff output shows the old and new content without actually modifying the file. Confirm by cat-ing the file.

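  3. Debug a handler that doesn't fire. Write a playbook with a template task that notifies a handler named restart service. Intentionally misspell the handler name in the notify: directive (e.g., restart_service). Run the playbook: because the template task reports "changed", Ansible (with default settings) should stop with an error saying the requested handler was not found. Run it again without fixing anything and check whether the play now passes quietly, since the file is already in place and the task reports "ok". Fix the name, edit the template so the task reports "changed" again, and confirm the handler executes.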

  4. Fix a non-idempotent task. Write a playbook that uses shell: echo "hello" >> /tmp/ansible-idem.txt. Run it three times and check the file — it grows each run. Replace the shell task with copy: content="hello\n" dest=/tmp/ansible-idem.txt. Run three times and confirm the task reports "changed" only once, then "ok" on subsequent runs.

  5. Step through a playbook. Create a playbook with 4-5 tasks. Run it with --step and practice answering y/n/c (yes, no, continue) to selectively execute tasks. Then use --start-at-task "Task Name" to skip to a specific task. Finally, use --list-tasks to see all task names without running anything.

