Portal | Level: L2: Operations | Topics: Ansible Deep Dive, Ansible | Domain: DevOps & Tooling
Ansible Deep Dive - Primer¶
Beyond the basics. This covers the internals, advanced patterns, and production-grade techniques that separate playbook authors from Ansible operators.
Inventory Deep Dive¶
Static Inventory Formats¶
The INI format is simple; YAML gives you structure:
# inventory/hosts.yml
all:
  children:
    webservers:
      hosts:
        web1.example.com:
          http_port: 8080
          ansible_host: 10.0.1.10
        web2.example.com:
          http_port: 8081
      vars:
        app_env: production
        deploy_user: deployer
    dbservers:
      hosts:
        db1.example.com:
          ansible_port: 2222
        db2.example.com:
    production:
      children:
        webservers:
        dbservers:
Dynamic Inventory¶
For cloud infrastructure, static inventory doesn't scale. Dynamic inventory plugins query cloud APIs at runtime.
AWS EC2 inventory plugin:
# inventory/aws_ec2.yml
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
  - us-west-2
keyed_groups:
  - key: tags.Environment
    prefix: env
  - key: instance_type
    prefix: type
  - key: placement.availability_zone
    prefix: az
filters:
  tag:ManagedBy: ansible
  instance-state-name: running
compose:
  ansible_host: private_ip_address
# Test dynamic inventory
ansible-inventory -i inventory/aws_ec2.yml --list
ansible-inventory -i inventory/aws_ec2.yml --graph
GCP inventory plugin:
# inventory/gcp.yml
plugin: google.cloud.gcp_compute
projects:
  - myproject-123456
zones:
  - us-central1-a
  - us-central1-b
filters:
  - labels.managed_by = ansible
keyed_groups:
  - key: labels.environment
    prefix: env
compose:
  ansible_host: networkInterfaces[0].networkIP
host_vars and group_vars¶
Directory-based variable organization:
inventory/
  hosts.yml
  group_vars/
    all.yml               # Variables for all hosts
    webservers.yml        # Variables for webservers group
    dbservers.yml         # Variables for dbservers group
    production.yml        # Variables for production group
  host_vars/
    web1.example.com.yml  # Variables for specific host
    db1.example.com.yml
# inventory/group_vars/all.yml
ntp_servers:
  - 0.pool.ntp.org
  - 1.pool.ntp.org
timezone: UTC
monitoring_agent: prometheus-node-exporter

# inventory/group_vars/webservers.yml
nginx_worker_processes: auto
nginx_worker_connections: 2048
ssl_certificate_path: /etc/ssl/certs/app.pem
Variables in host_vars/ override group_vars/. This is part of the 22-level precedence hierarchy.
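A minimal illustration of that override (the port values are hypothetical):

```yaml
# inventory/group_vars/webservers.yml
http_port: 8080   # group-level default for every webserver

# inventory/host_vars/web1.example.com.yml
http_port: 9090   # host-level value; wins over group_vars for web1 only
```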
Variable Precedence¶
Remember: the only two precedence levels most people need to memorize are role defaults at the bottom (meant to be overridden) and extra vars (-e) at the top (always win). Everything else falls in between. When debugging "why does this variable have this value?", run ansible <host> -m debug -a "var=my_var" to see the resolved value.
Ansible has 22 levels of variable precedence. The most important ones, from lowest to highest:
1. command line values (e.g., -u user)
2. role defaults (defaults/main.yml)
3. inventory file or script group vars
4. inventory group_vars/all
5. playbook group_vars/all
6. inventory group_vars/*
7. playbook group_vars/*
8. inventory file or script host vars
9. inventory host_vars/*
10. playbook host_vars/*
11. host facts / cached set_facts
12. play vars
13. play vars_prompt
14. play vars_files
15. role vars (vars/main.yml)
16. block vars (only for tasks in block)
17. task vars (only for the task)
18. include_vars
19. set_facts / registered vars
20. role (and include_role) params
21. include params
22. extra vars (-e "key=value") — ALWAYS WIN
The practical rules:
- Role defaults (defaults/main.yml): Low precedence. Meant to be overridden. Put sensible defaults here.
- Role vars (vars/main.yml): High precedence. Hard to override. Use for values that should NOT change per environment.
- Extra vars (-e): Always win. Use for one-time overrides. Don't rely on them for regular operation.
- group_vars/all: Middle ground. Good for organization-wide settings.
# Debug variable precedence: see what value a host gets
ansible -i inventory/ web1.example.com -m debug -a "var=http_port"
# Dump all variables for a host
ansible -i inventory/ web1.example.com -m setup
ansible -i inventory/ web1.example.com -m debug -a "var=hostvars[inventory_hostname]"
Playbook Anatomy¶
Plays, Tasks, and Handlers¶
---
# A playbook contains one or more plays
- name: Configure web servers
  hosts: webservers
  become: true
  gather_facts: true

  vars:
    app_version: "2.4.1"

  pre_tasks:
    - name: Update apt cache
      ansible.builtin.apt:
        update_cache: true
        cache_valid_time: 3600

  roles:
    - role: common
    - role: nginx
      vars:
        nginx_port: 8080
    - role: app_deploy
      tags: [deploy]

  tasks:
    - name: Ensure application config
      ansible.builtin.template:
        src: app.conf.j2
        dest: /etc/app/config.yml
        owner: appuser
        group: appuser
        mode: "0640"
      notify: Restart application

    - name: Ensure application is running
      ansible.builtin.systemd:
        name: myapp
        state: started
        enabled: true

  post_tasks:
    - name: Verify application health
      ansible.builtin.uri:
        url: "http://localhost:8080/health"
        return_content: true
      register: health_check
      failed_when: health_check.status != 200

  handlers:
    - name: Restart application
      ansible.builtin.systemd:
        name: myapp
        state: restarted
Execution order: pre_tasks -> roles -> tasks -> post_tasks. Handlers run at the end of each section (or when flushed).
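If a later task depends on a notified handler having already run, you can flush pending handlers mid-play; a minimal sketch (the template and health endpoint are hypothetical):

```yaml
tasks:
  - name: Update proxy config
    ansible.builtin.template:
      src: proxy.conf.j2          # hypothetical template
      dest: /etc/proxy/proxy.conf
    notify: Reload proxy

  # Run all pending handlers NOW instead of waiting for the end of the section
  - name: Flush handlers so the reload happens before the smoke test
    ansible.builtin.meta: flush_handlers

  - name: Smoke-test through the freshly reloaded proxy
    ansible.builtin.uri:
      url: http://localhost:8081/health   # hypothetical endpoint
```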
imports vs includes¶
This distinction matters for control flow:
# IMPORT: static, parsed at playbook load time
- name: Import common tasks
  ansible.builtin.import_tasks: common.yml
  # Conditionals apply to EACH task inside the file
  when: ansible_os_family == "Debian"
  # Tags apply to EACH task inside the file
  tags: [setup]

# INCLUDE: dynamic, parsed at runtime
- name: Include OS-specific tasks
  ansible.builtin.include_tasks: "{{ ansible_os_family | lower }}.yml"
  # Conditionals apply to the INCLUDE itself (all or nothing)
  when: setup_os_packages
  # Tags apply to the INCLUDE itself, not inner tasks
  tags: [setup]
Rule of thumb: Use import_tasks when the file is always the same. Use include_tasks when the filename is dynamic or you need conditional inclusion of the entire file.
Import quirks:
- import_tasks + when: the condition is applied to every task inside the file (can be confusing)
- import_tasks cannot use loops
- import_role makes the role's handlers available to the whole play
Include quirks:
- include_tasks + tags: inner tasks don't inherit the tag (use apply block to force it)
- include_tasks with loops: runs the entire file once per loop iteration
- Variables set inside included files may not be available outside
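The `apply` workaround from the first quirk looks like this; a minimal sketch, assuming a hypothetical setup.yml task file:

```yaml
- name: Include setup tasks and force tag inheritance
  ansible.builtin.include_tasks:
    file: setup.yml      # hypothetical task file
    apply:
      tags: [setup]      # applied to every task inside setup.yml
      become: true
  tags: [setup]          # still needed so the include itself matches --tags setup
```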
Jinja2 Templating¶
Filters¶
tasks:
  - name: Demonstrate filters
    ansible.builtin.debug:
      msg: |
        Default value: {{ missing_var | default('fallback') }}
        Mandatory: {{ required_var | mandatory }}
        To JSON: {{ my_dict | to_nice_json }}
        To YAML: {{ my_dict | to_nice_yaml }}
        Regex replace: {{ hostname | regex_replace('\.example\.com$', '') }}
        IP address: {{ ansible_default_ipv4.address | ansible.utils.ipaddr('address') }}
        Join list: {{ my_list | join(', ') }}
        Unique: {{ my_list | unique }}
        Flatten: {{ nested_list | flatten }}
        Select: {{ users | selectattr('active', 'equalto', true) | list }}
        Map: {{ users | map(attribute='name') | list }}
        Combine dicts: {{ defaults | combine(overrides, recursive=True) }}
        Hash: {{ 'password' | password_hash('sha512') }}
        B64 encode: {{ 'hello' | b64encode }}
        B64 decode: {{ encoded_string | b64decode }}
        Path basename: {{ '/etc/nginx/nginx.conf' | basename }}
        Path dirname: {{ '/etc/nginx/nginx.conf' | dirname }}
Tests¶
tasks:
  - name: Check if variable is defined
    ansible.builtin.debug:
      msg: "Variable exists"
    when: my_var is defined

  - name: Check string matching
    ansible.builtin.debug:
      msg: "It's a web server"
    when: inventory_hostname is match("web.*")

  - name: Check if path exists
    ansible.builtin.stat:
      path: /etc/app/config.yml
    register: config_file

  - name: Template only if file missing
    ansible.builtin.template:
      src: config.yml.j2
      dest: /etc/app/config.yml
    when: not config_file.stat.exists
Lookups¶
Lookups read data from external sources on the control node:
vars:
  # Read a file
  ssh_key: "{{ lookup('file', '~/.ssh/id_rsa.pub') }}"

  # Read an environment variable
  home_dir: "{{ lookup('env', 'HOME') }}"

  # Read from HashiCorp Vault
  db_password: "{{ lookup('community.hashi_vault.hashi_vault', 'secret/data/prod/db:password') }}"

  # Read from AWS SSM Parameter Store
  api_key: "{{ lookup('amazon.aws.aws_ssm', '/prod/api_key') }}"

  # Generate a password
  app_secret: "{{ lookup('password', '/dev/null length=32 chars=ascii_letters,digits') }}"

  # Read from a CSV file
  users: "{{ lookup('csvfile', 'jdoe file=users.csv delimiter=, col=2') }}"

  # Read lines from a file
  allowed_ips: "{{ lookup('file', 'allowed_ips.txt').splitlines() }}"
Roles¶
Standard Role Structure¶
roles/
  nginx/
    defaults/main.yml   # Default variables (lowest precedence)
    vars/main.yml       # Role variables (high precedence)
    tasks/main.yml      # Task list
    handlers/main.yml   # Handlers
    templates/          # Jinja2 templates
    files/              # Static files
    meta/main.yml       # Role metadata and dependencies
    library/            # Custom modules for this role
    module_utils/       # Custom module utilities
    tests/              # Role tests
# roles/nginx/meta/main.yml
---
dependencies:
  - role: common
  - role: ssl_certificates
    vars:
      cert_domain: "{{ nginx_server_name }}"

galaxy_info:
  author: ops-team
  description: NGINX web server role
  min_ansible_version: "2.14"
  platforms:
    - name: Ubuntu
      versions: [jammy, noble]
    - name: EL
      versions: [8, 9]
# roles/nginx/defaults/main.yml
---
nginx_worker_processes: auto
nginx_worker_connections: 1024
nginx_server_name: "{{ inventory_hostname }}"
nginx_listen_port: 80
nginx_ssl_enabled: false
nginx_ssl_port: 443
nginx_access_log: /var/log/nginx/access.log
nginx_error_log: /var/log/nginx/error.log
# roles/nginx/tasks/main.yml
---
- name: Include OS-specific variables
  ansible.builtin.include_vars: "{{ ansible_os_family | lower }}.yml"

- name: Install NGINX
  ansible.builtin.package:
    name: nginx
    state: present
  become: true

- name: Configure NGINX
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: "0644"
    validate: nginx -t -c %s
  notify: Reload NGINX
  become: true

- name: Enable and start NGINX
  ansible.builtin.systemd:
    name: nginx
    state: started
    enabled: true
  become: true
Role Dependencies¶
Roles can depend on other roles. Dependencies run before the role:
# roles/app_deploy/meta/main.yml
dependencies:
  - role: common
  - role: nginx
    vars:
      nginx_listen_port: "{{ app_port }}"
  - role: monitoring
    when: monitoring_enabled | default(true)
Gotcha: By default, role dependencies only run once per play, even if multiple roles depend on the same role. Set allow_duplicates: true in meta to override.
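A sketch of that override, set in the depended-on role's own meta file:

```yaml
# roles/common/meta/main.yml
---
allow_duplicates: true   # run again each time another role lists `common` as a dependency
dependencies: []
```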
Collections¶
Collections package roles, modules, plugins, and playbooks into distributable units:
# Install a collection from Ansible Galaxy
ansible-galaxy collection install amazon.aws
# Install from a requirements file
ansible-galaxy collection install -r requirements.yml
# List installed collections
ansible-galaxy collection list
# requirements.yml
collections:
  - name: amazon.aws
    version: ">=7.0.0"
  - name: community.general
    version: ">=8.0.0"
  - name: ansible.posix
  - name: community.docker

roles:
  - name: geerlingguy.docker
    version: "7.1.0"
Using collection modules in playbooks:
tasks:
  # Fully qualified collection name (FQCN) -- recommended
  - name: Create S3 bucket
    amazon.aws.s3_bucket:
      name: my-bucket
      state: present

  # Short name (works if the collection is in configured paths)
  - name: Manage Docker container
    docker_container:
      name: myapp
      image: myapp:latest
      state: started
Always use FQCNs in production playbooks. Short names are ambiguous if multiple collections provide a module with the same name.
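If you must use short names, the `collections` play keyword at least makes the search order explicit; a sketch (the group name and module arguments are illustrative):

```yaml
- hosts: docker_hosts
  collections:
    - community.docker   # short names below resolve against this list first
  tasks:
    - name: Manage container with a short module name
      docker_container:
        name: myapp
        image: myapp:latest
        state: started
```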
Ansible Vault¶
Under the hood: Ansible Vault uses AES-256 in CTR mode with HMAC-SHA256 for integrity. The vault password is stretched using PBKDF2 with a random salt. Each encrypted file or string is self-contained — the salt, HMAC, and ciphertext are all embedded in the $ANSIBLE_VAULT;1.1;AES256 header block.
Vault encrypts sensitive data so it can live in version control safely.
# Encrypt a file
ansible-vault encrypt secrets.yml
# Decrypt a file
ansible-vault decrypt secrets.yml
# Edit an encrypted file (decrypts to a temp file for your editor, re-encrypts on save)
ansible-vault edit secrets.yml
# View without decrypting to disk
ansible-vault view secrets.yml
# Encrypt a string (for inline use)
ansible-vault encrypt_string 'supersecret' --name 'db_password'
# Output:
# db_password: !vault |
# $ANSIBLE_VAULT;1.1;AES256
# 3863616...
# Re-key (change the vault password)
ansible-vault rekey secrets.yml
Multiple Vault IDs¶
Different teams, different secrets, different passwords:
# Encrypt with a vault ID
ansible-vault encrypt --vault-id dev@prompt secrets-dev.yml
ansible-vault encrypt --vault-id prod@/path/to/prod-password-file secrets-prod.yml
# Run playbook with multiple vault IDs
ansible-playbook site.yml \
--vault-id dev@prompt \
--vault-id prod@/path/to/prod-password-file
# Inline vault-encrypted variables with vault IDs
db_password: !vault |
  $ANSIBLE_VAULT;1.2;AES256;prod
  3863616...
api_key: !vault |
  $ANSIBLE_VAULT;1.2;AES256;dev
  6162636...
Vault in Practice¶
# group_vars/production/vault.yml (encrypted)
vault_db_password: "s3cr3t_pr0d_p4ss"
vault_api_key: "ak_prod_abc123"
# group_vars/production/vars.yml (plaintext, references vault vars)
db_password: "{{ vault_db_password }}"
api_key: "{{ vault_api_key }}"
This pattern keeps encrypted values in a separate file and references them from plaintext files. You can grep for variable usage without decrypting.
Connection Plugins¶
# SSH (default)
- hosts: linux_servers
  connection: ssh

# Local (run on the control node)
- hosts: localhost
  connection: local

# WinRM (Windows)
- hosts: windows_servers
  vars:
    ansible_connection: winrm
    ansible_winrm_transport: ntlm
    ansible_winrm_server_cert_validation: ignore

# Docker (connect to running containers)
- hosts: my_container
  vars:
    ansible_connection: community.docker.docker
    ansible_docker_extra_args: "--tls"

# Network devices
- hosts: switches
  vars:
    ansible_connection: ansible.netcommon.network_cli
    ansible_network_os: cisco.ios.ios
Privilege Escalation¶
# Play-level become
- hosts: webservers
  become: true
  become_user: root
  become_method: sudo

  tasks:
    # This runs as root
    - name: Install package
      ansible.builtin.apt:
        name: nginx
        state: present

    # Override to run as a different user
    - name: Deploy application
      ansible.builtin.copy:
        src: app.jar
        dest: /opt/app/app.jar
      become_user: appuser

    # Disable become for this task
    - name: Check local file
      ansible.builtin.stat:
        path: /tmp/marker
      become: false
Other become methods: su, pbrun, pfexec, doas, machinectl, runas (Windows).
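For example, a task-level switch to doas (a sketch: the doas become plugin ships in the community.general collection, and the host must have doas configured for the connecting user):

```yaml
- name: Restart httpd on an OpenBSD host
  ansible.builtin.service:
    name: httpd
    state: restarted
  become: true
  become_method: community.general.doas   # requires the community.general collection
```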
Error Handling¶
block/rescue/always¶
tasks:
  - name: Handle deployment with rollback
    block:
      - name: Deploy new version
        ansible.builtin.copy:
          src: "app-{{ new_version }}.jar"
          dest: /opt/app/app.jar
        notify: Restart application

      - name: Verify health
        ansible.builtin.uri:
          url: http://localhost:8080/health
        register: health
        retries: 5
        delay: 10
        until: health.status == 200

    rescue:
      - name: Rollback to previous version
        ansible.builtin.copy:
          src: "app-{{ old_version }}.jar"
          dest: /opt/app/app.jar
        notify: Restart application

      - name: Alert on failure
        community.general.slack:
          token: "{{ slack_token }}"
          msg: "Deployment of {{ new_version }} failed on {{ inventory_hostname }}, rolled back."

    always:
      - name: Clean up temp files
        ansible.builtin.file:
          path: /tmp/deploy-staging
          state: absent
Fine-Grained Error Control¶
tasks:
  # Override what counts as "failed"
  - name: Check if service exists
    ansible.builtin.command: systemctl status myapp
    register: service_status
    failed_when: service_status.rc not in [0, 3, 4]
    # rc 3 = inactive, rc 4 = not found -- not failures for this check

  # Override what counts as "changed"
  - name: Check current version
    ansible.builtin.command: /opt/app/version.sh
    register: current_version
    changed_when: false  # This command never changes anything

  # Ignore errors and handle manually
  - name: Try to stop old service
    ansible.builtin.systemd:
      name: legacy-app
      state: stopped
    ignore_errors: true
    register: stop_result

  - name: Remove old service if it was stopped
    ansible.builtin.file:
      path: /etc/systemd/system/legacy-app.service
      state: absent
    when: stop_result is succeeded
Tags¶
Tags control which tasks run:
tasks:
  - name: Install packages
    ansible.builtin.apt:
      name: "{{ item }}"
      state: present
    loop: "{{ packages }}"
    tags: [install, setup]

  - name: Configure application
    ansible.builtin.template:
      src: config.yml.j2
      dest: /etc/app/config.yml
    tags: [configure]

  - name: Deploy code
    ansible.builtin.copy:
      src: app.jar
      dest: /opt/app/app.jar
    tags: [deploy]
# Run only tagged tasks
ansible-playbook site.yml --tags deploy
# Run everything except tagged tasks
ansible-playbook site.yml --skip-tags install
# List all tags in a playbook
ansible-playbook site.yml --list-tags
Special tags:
- always -- runs unless explicitly skipped with --skip-tags always
- never -- only runs if explicitly requested with --tags never_tag_name
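A sketch of both special tags in one task list (the vars file and script names are hypothetical):

```yaml
tasks:
  - name: Load variables that every tagged run needs
    ansible.builtin.include_vars: common.yml        # hypothetical vars file
    tags: [always]        # runs even under --tags deploy

  - name: Reset the database (destructive)
    ansible.builtin.command: /opt/app/reset-db.sh   # hypothetical script
    tags: [never, reset]  # skipped unless you pass --tags reset
```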
Delegation¶
Run a task on a different host than the play target:
tasks:
  # Run on the load balancer to remove this host
  - name: Remove from load balancer
    ansible.builtin.command: /usr/local/bin/lb-remove {{ inventory_hostname }}
    delegate_to: loadbalancer.example.com

  # Run on localhost (control node)
  - name: Add DNS record
    amazon.aws.route53:
      state: present
      zone: example.com
      record: "{{ inventory_hostname }}"
      type: A
      value: "{{ ansible_default_ipv4.address }}"
    delegate_to: localhost

  # Run once for the whole group (not per host)
  - name: Send deployment notification
    community.general.slack:
      token: "{{ slack_token }}"
      msg: "Deploying to {{ ansible_play_hosts | length }} servers"
    delegate_to: localhost
    run_once: true
Async Tasks¶
For long-running operations that might exceed SSH timeout:
tasks:
  - name: Run long database migration
    ansible.builtin.command: /opt/app/migrate.sh
    async: 3600   # Maximum runtime: 1 hour
    poll: 0       # Don't wait (fire and forget)
    register: migration

  - name: Do other work while migration runs
    ansible.builtin.apt:
      name: monitoring-agent
      state: latest

  - name: Wait for migration to complete
    ansible.builtin.async_status:
      jid: "{{ migration.ansible_job_id }}"
    register: migration_result
    until: migration_result.finished
    retries: 120
    delay: 30
Strategy Plugins¶
Control how tasks execute across hosts:
# Linear (default): each task runs on all hosts before moving to the next task
- hosts: webservers
  strategy: linear

# Free: each host runs through tasks as fast as it can, independently
- hosts: webservers
  strategy: free

# Debug: interactive debugger on task failure
- hosts: webservers
  strategy: debug
The free strategy is faster when host speeds vary (some hosts finish tasks quickly, others lag), but its output is interleaved and harder to read.
Check Mode¶
Dry-run without making changes:
# Dry run the entire playbook
ansible-playbook site.yml --check
# Dry run with diff (show what would change)
ansible-playbook site.yml --check --diff
tasks:
  # Force this task to always run, even in check mode
  - name: Gather application state
    ansible.builtin.command: /opt/app/status.sh
    check_mode: false
    register: app_status
    changed_when: false

  # Force this task to only run in check mode
  - name: Report planned changes
    ansible.builtin.debug:
      msg: "Would deploy version {{ new_version }}"
    check_mode: true
Not all modules support check mode. Modules that don't will be skipped during --check runs. The command and shell modules never support check mode (they always show "skipped").
Wiki Navigation¶
Prerequisites¶
- Ansible Automation (Topic Pack, L1)
Related Content¶
- Ansible Lab: Roles (Lab, L1) — Ansible, Ansible Deep Dive
- Ansible Lab: Templates and Handlers (Lab, L1) — Ansible, Ansible Deep Dive
- Ansible Lab: Vault (Secrets Management) (Lab, L2) — Ansible, Ansible Deep Dive
- Break/Fix: Handler Name Mismatch (Scenario, L1) — Ansible, Ansible Deep Dive
- Break/Fix: Jinja2 Syntax Error in Template (Scenario, L1) — Ansible, Ansible Deep Dive
- Break/Fix: Undefined Variable + Bare Jinja2 (Scenario, L1) — Ansible, Ansible Deep Dive
- Ansible Automation (Topic Pack, L1) — Ansible
- Ansible Core Flashcards (CLI) (flashcard_deck, L1) — Ansible
- Ansible Drills (Drill, L1) — Ansible
- Ansible Exercises (Quest Ladder) (CLI) (Exercise Set, L1) — Ansible
Pages that link here¶
- Ansible - Skill Check
- Ansible Automation
- Ansible Deep Dive
- Ansible Drills
- Anti-Primer: Ansible Deep Dive
- Comparison: Configuration Management
- Fleet Operations at Scale
- Production Readiness Review: Answer Key
- Production Readiness Review: Study Plans
- RHCE (EX294) Exam Preparation
- Symptoms: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
- Symptoms: Node NotReady, NIC Firmware Bug, Fix Is Ansible Playbook
- Track: Infrastructure & Data Center Operations