Ansible Deep Dive - Street Ops
What experienced Ansible operators deal with daily: debugging, testing, performance, and production patterns.
Debugging Failed Tasks
Increasing Verbosity
```bash
# Standard verbosity levels
ansible-playbook site.yml -v      # Show task results
ansible-playbook site.yml -vv     # Show task input parameters
ansible-playbook site.yml -vvv    # Show SSH connection details
ansible-playbook site.yml -vvvv   # Show connection plugin internals
# Start at a specific task (skip everything before it)
ansible-playbook site.yml --start-at-task="Deploy application"
# Step through tasks one at a time (interactive confirmation)
ansible-playbook site.yml --step
# List all tasks without executing
ansible-playbook site.yml --list-tasks
# List all hosts that would be targeted
ansible-playbook site.yml --list-hosts
```
Register and Debug
```yaml
tasks:
  - name: Check disk space
    ansible.builtin.command: df -h /
    register: disk_result
    changed_when: false
  - name: Show full result object
    ansible.builtin.debug:
      var: disk_result
  - name: Show the last line of stdout
    ansible.builtin.debug:
      msg: "Disk usage: {{ disk_result.stdout_lines[-1] }}"
  - name: Show return code
    ansible.builtin.debug:
      msg: "Return code: {{ disk_result.rc }}"
```
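For command and shell tasks, the registered variable contains stdout, stdout_lines, stderr, stderr_lines, rc (return code), changed, and failed. A skipped key appears when the task was skipped, and other modules add their own module-specific keys.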
Common Debug Patterns
```yaml
tasks:
  # Print all facts for a host
  - name: Dump all facts
    ansible.builtin.debug:
      var: ansible_facts
  # Print a specific fact
  - name: Show OS family
    ansible.builtin.debug:
      var: ansible_os_family
  # Print variable with its type
  - name: Inspect variable
    ansible.builtin.debug:
      msg: "Value: {{ my_var }} (type: {{ my_var | type_debug }})"
  # Conditional debug
  - name: Alert on low memory
    ansible.builtin.debug:
      msg: "WARNING: Only {{ ansible_memfree_mb }}MB free on {{ inventory_hostname }}"
    when: ansible_memfree_mb < 512
```
Testing Playbooks
ansible-lint
```bash
# Install
pip install ansible-lint
# Run on a playbook
ansible-lint site.yml
# Run on all YAML in a directory
ansible-lint roles/
# Skip specific rules
ansible-lint -x command-instead-of-module,no-changed-when site.yml
# Show all available rules
ansible-lint -L
```
Common lint rules that catch real problems:
- command-instead-of-module: Running a tool such as apt-get or systemctl through command/shell when a module does the same job
- no-changed-when: Command/shell tasks without changed_when
- risky-file-permissions: Files created without explicit mode
- yaml[truthy]: Using yes/no instead of true/false
- name[missing]: Tasks without names
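Molecule (Role Testing Framework)
```bash
# Install Molecule and its Docker driver (the driver ships in molecule-plugins)
pip install molecule 'molecule-plugins[docker]'
# Scaffold a new role (Molecule 6 removed "molecule init role"; older releases still have it)
ansible-galaxy role init my_role
# Add a Molecule scenario to an existing role
cd roles/nginx
molecule init scenario
```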
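```yaml
# molecule/default/molecule.yml
---
driver:
  name: docker
platforms:
  # pre_build_image: true uses the images as-is, so they must already contain
  # systemd and python3; the stock ubuntu:noble image ships neither.
  - name: ubuntu-noble
    image: ubuntu:noble
    pre_build_image: true
    command: /lib/systemd/systemd
    privileged: true
    cgroupns_mode: host
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:rw
  - name: rocky-9
    image: rockylinux:9
    pre_build_image: true
    command: /lib/systemd/systemd
    privileged: true
    cgroupns_mode: host
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:rw
provisioner:
  name: ansible
  playbooks:
    converge: converge.yml
    verify: verify.yml
verifier:
  name: ansible
```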
```yaml
# molecule/default/verify.yml
---
- name: Verify
  hosts: all
  tasks:
    - name: Gather package facts
      ansible.builtin.package_facts:
        manager: auto
    - name: Assert nginx is installed
      ansible.builtin.assert:
        that: "'nginx' in ansible_facts.packages"
    - name: Gather service facts
      ansible.builtin.service_facts:
    - name: Assert nginx is running
      ansible.builtin.assert:
        that: "ansible_facts.services['nginx.service'].state == 'running'"
    - name: Check nginx responds
      ansible.builtin.uri:
        url: http://localhost:80
      register: result
    - name: Assert nginx responds with 200
      ansible.builtin.assert:
        that: result.status == 200
```
```bash
# Run the full test sequence
molecule test
# Default sequence: dependency → cleanup → destroy → syntax → create → prepare →
# converge → idempotence → side_effect → verify → cleanup → destroy
# (current Molecule releases no longer run a lint step; run ansible-lint separately)
# Just converge (create + run playbook, leave containers up)
molecule converge
# Run verification
molecule verify
# Open a shell in the test container
molecule login -h ubuntu-noble
# Destroy test containers
molecule destroy
```
Check Mode Testing
```bash
# Dry run -- shows what would change without changing anything
ansible-playbook site.yml --check --diff
# Useful in CI after a real run: check mode should then report nothing to change
# (tasks that cannot run in check mode, such as plain command tasks, are skipped)
set -o pipefail
ansible-playbook site.yml --check --diff 2>&1 | tee check_output.txt || exit 1
# PLAY RECAP always prints "changed=0", so only a non-zero count means drift
if grep -qE 'changed=[1-9]' check_output.txt; then
  echo "NOT IDEMPOTENT: check mode reports pending changes"
  exit 1
fi
```
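Check mode only predicts changes. A stricter idempotency test applies the playbook for real twice and requires the second run's PLAY RECAP to report zero changes. Below is a minimal bash sketch of that gate. It assumes the default stdout callback, whose recap lines contain fields like changed=0. The helper names recap_changed_total and assert_idempotent are hypothetical, not Ansible commands.

```bash
#!/usr/bin/env bash
# Sketch: idempotency gate based on the PLAY RECAP of a second, real run.
# Assumes the default stdout callback, whose recap lines look like:
#   web1 : ok=5  changed=0  unreachable=0  failed=0  skipped=0
# recap_changed_total and assert_idempotent are hypothetical helpers.

# Sum every "changed=N" field in the text read from stdin.
recap_changed_total() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^changed=[0-9]+$/) { split($i, kv, "="); total += kv[2] }
  } END { print total + 0 }'
}

# Return non-zero if the saved output of a run reports any changed tasks.
assert_idempotent() {
  local log_file=$1 total
  total=$(recap_changed_total < "$log_file")
  if (( total != 0 )); then
    echo "NOT IDEMPOTENT: $total change(s) reported in $log_file" >&2
    return 1
  fi
  echo "Idempotent: no changes reported in $log_file"
}

# Usage (requires Ansible and a reachable inventory):
#   ansible-playbook site.yml                        # first run converges the hosts
#   ansible-playbook site.yml | tee second_run.txt   # second run should change nothing
#   assert_idempotent second_run.txt
```

Summing the counts, rather than grepping for the string "changed=", avoids the false positive caused by the changed=0 that every recap line prints.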
Rolling Updates
Serial Execution
```yaml
- name: Rolling update web servers
  hosts: webservers
  serial: 2                  # Update 2 hosts at a time
  max_fail_percentage: 25    # Abort if >25% of a batch fails (with serial: 2, any failure)
  pre_tasks:
    - name: Remove from load balancer
      ansible.builtin.command: /usr/local/bin/lb-remove {{ inventory_hostname }}
      delegate_to: loadbalancer.example.com
  tasks:
    - name: Stop application
      ansible.builtin.systemd:
        name: myapp
        state: stopped
    # No "notify: Restart application" here: the explicit stop/start already
    # restarts the app, and a handler would only run after the health check
    # below, restarting the app just before it rejoins the load balancer.
    - name: Deploy new version
      ansible.builtin.copy:
        src: "app-{{ app_version }}.jar"
        dest: /opt/app/app.jar
    - name: Start application
      ansible.builtin.systemd:
        name: myapp
        state: started
    - name: Wait for application to be ready
      ansible.builtin.uri:
        url: http://localhost:8080/health
      register: health
      retries: 10
      delay: 5
      until: health.status == 200
  post_tasks:
    - name: Add back to load balancer
      ansible.builtin.command: /usr/local/bin/lb-add {{ inventory_hostname }}
      delegate_to: loadbalancer.example.com
```
Serial can be a number, a percentage, or a list:
```yaml
serial:
  - 1        # First batch: 1 host (canary)
  - 5        # Second batch: 5 hosts
  - "25%"    # Every later batch: 25% of the play's total hosts
```
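To see how a list like this splits a real inventory, here is a small bash sketch that approximates Ansible's batching. Percentages are taken of the play's total host count, rounded down but never below one host. The last list entry repeats until every host is covered, and the final batch takes whatever is left. The function name serial_batches is hypothetical. It handles plain counts and percentages only, and treats a zero or missing value as "all remaining hosts".

```bash
#!/usr/bin/env bash
# Approximation of Ansible's serial batching (hypothetical helper, not an Ansible command).
# Usage: serial_batches TOTAL_HOSTS [SERIAL_ITEM...]
# Items are host counts (e.g. 5) or percentages (e.g. 25%); prints batch sizes.
serial_batches() {
  local total=$1
  shift
  local -a spec=("$@")
  local last=$(( ${#spec[@]} - 1 ))
  local remaining=$total idx=0 size item out=""
  while (( remaining > 0 )); do
    item=${spec[idx]}
    if [[ $item == *% ]]; then
      # Percent of the play's total host count, rounded down, minimum 1 host
      size=$(( ${item%\%} * total / 100 ))
      (( size == 0 )) && size=1
    else
      size=$item
      # A zero or empty value means "everything that is left"
      (( size <= 0 )) && size=$remaining
    fi
    # The final batch takes whatever hosts remain
    (( size > remaining )) && size=$remaining
    out+="${out:+ }$size"
    remaining=$(( remaining - size ))
    # Once the list is exhausted, its last entry repeats
    (( idx < last )) && idx=$(( idx + 1 ))
  done
  echo "$out"
}
```

For a 20-host webservers group, serial_batches 20 1 5 25% prints 1 5 5 5 4: a one-host canary, then batches of five, with the last four hosts in the final batch.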
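Secret Management with Vault
Operational Patterns
```bash
# Use a vault password file (no interactive prompt at run time).
# Read the password without echoing it or saving it in shell history,
# and create the file with owner-only permissions from the start.
read -rsp 'Vault password: ' vault_pw && echo
( umask 077 && printf '%s\n' "$vault_pw" > ~/.vault_pass.txt )
unset vault_pw
export ANSIBLE_VAULT_PASSWORD_FILE=~/.vault_pass.txt
# Or set in ansible.cfg
# [defaults]
# vault_password_file = ~/.vault_pass.txt
# Encrypt specific variables in a file (not the whole file)
# (--stdin-name 'db_password' reads the value from stdin instead of the command line)
ansible-vault encrypt_string 'db_p@ssw0rd' --name 'db_password' >> vars/secrets.yml
# View encrypted variable values
ansible localhost -m debug -a "var=db_password" -e @vars/secrets.yml --vault-password-file ~/.vault_pass.txt
```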
Vault in CI/CD
```yaml
# GitHub Actions example
- name: Run Ansible playbook
  env:
    ANSIBLE_VAULT_PASSWORD: ${{ secrets.ANSIBLE_VAULT_PASSWORD }}
  run: |
    vault_pass_file=$(mktemp)
    # Delete the password file even if the playbook fails
    trap 'rm -f "$vault_pass_file"' EXIT
    printf '%s\n' "$ANSIBLE_VAULT_PASSWORD" > "$vault_pass_file"
    ansible-playbook -i inventory/production site.yml \
      --vault-password-file "$vault_pass_file"
```
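The CI job can also avoid writing the secret to disk at all. When --vault-password-file points at an executable, Ansible runs it and reads the password from its standard output. The bash sketch below writes such a script, which reads the same ANSIBLE_VAULT_PASSWORD variable the workflow step sets. The helper name write_vault_pass_script and the ./vault-pass.sh path are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: create an executable vault password "file". When --vault-password-file
# points at an executable, Ansible runs it and uses its stdout as the password.
# write_vault_pass_script is a hypothetical helper name.
write_vault_pass_script() {
  local path=$1
  cat > "$path" <<'EOF'
#!/bin/sh
# Print the vault password from the environment; fail if it is missing.
if [ -z "${ANSIBLE_VAULT_PASSWORD:-}" ]; then
  echo "ANSIBLE_VAULT_PASSWORD is not set" >&2
  exit 1
fi
printf '%s\n' "$ANSIBLE_VAULT_PASSWORD"
EOF
  chmod 700 "$path"
}

# Usage:
#   write_vault_pass_script ./vault-pass.sh
#   ansible-playbook -i inventory/production site.yml --vault-password-file ./vault-pass.sh
```

The generated script contains no secret, so it can be committed next to the playbooks. The password then lives only in the CI secret store and the job's environment.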
Dynamic Inventory for Cloud
AWS EC2 Pattern
```yaml
# inventory/aws_ec2.yml (the plugin only loads files ending in aws_ec2.yml or aws_ec2.yaml)
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
keyed_groups:
  - key: tags.Role
    prefix: role
  - key: tags.Environment
    prefix: env
filters:
  tag:ManagedBy: ansible
  instance-state-name: running
hostnames:
  - private-ip-address
compose:
  ansible_host: private_ip_address
  ansible_user: "'ubuntu'"
```
```bash
# Test the inventory
ansible-inventory -i inventory/aws_ec2.yml --graph
# Output (abridged):
# @all:
#   |--@env_production:
#   |  |--10.0.1.10
#   |  |--10.0.1.11
#   |--@role_webserver:
#   |  |--10.0.1.10
#   |--@role_database:
#   |  |--10.0.1.11
# Use in playbook
ansible-playbook -i inventory/aws_ec2.yml site.yml --limit role_webserver
```
GCP Compute Pattern
```yaml
# inventory/gcp_compute.yml
plugin: google.cloud.gcp_compute
projects:
  - my-project-123456
auth_kind: serviceaccount
service_account_file: /path/to/sa.json
keyed_groups:
  - key: labels.role
    prefix: role
  - key: zone
    prefix: zone
filters:
  - labels.managed_by = ansible
  - status = RUNNING
compose:
  ansible_host: networkInterfaces[0].accessConfigs[0].natIP | default(networkInterfaces[0].networkIP)
```
Handling Different OS Families
```yaml
tasks:
  - name: Include OS-specific variables
    ansible.builtin.include_vars: "{{ ansible_os_family | lower }}.yml"
  - name: Install packages (Debian/Ubuntu)
    ansible.builtin.apt:
      name: "{{ packages }}"
      state: present
      update_cache: true
      cache_valid_time: 3600
    when: ansible_os_family == "Debian"
  - name: Install packages (RHEL/Rocky)
    ansible.builtin.dnf:
      name: "{{ packages }}"
      state: present
    when: ansible_os_family == "RedHat"
  # Or use the generic package module (less control but cross-platform);
  # common_packages must only list names that are identical on every distro
  - name: Install packages (any OS)
    ansible.builtin.package:
      name: "{{ common_packages }}"
      state: present
```
```yaml
# vars/debian.yml
packages:
  - nginx
  - python3-pip
  - libpq-dev
service_name: nginx
```
```yaml
# vars/redhat.yml
packages:
  - nginx
  - python3-pip
  - libpq-devel
service_name: nginx
```
Idempotency Checking
```yaml
tasks:
  # BAD: always shows "changed"
  - name: Set hostname
    ansible.builtin.command: hostnamectl set-hostname {{ inventory_hostname }}
  # GOOD: only "changed" when hostname actually changes
  - name: Set hostname
    ansible.builtin.hostname:
      name: "{{ inventory_hostname }}"
  # When you must use command, add changed_when
  - name: Check current hostname
    ansible.builtin.command: hostname
    register: current_hostname
    changed_when: false
  - name: Set hostname if different
    ansible.builtin.command: hostnamectl set-hostname {{ inventory_hostname }}
    when: current_hostname.stdout != inventory_hostname
```
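Performance Optimization
SSH Pipelining
Pipelining feeds each module to the remote Python interpreter over the existing SSH session instead of first copying it to a temporary file on the target, which reduces the number of SSH operations per task.
When tasks use become with sudo, pipelining requires requiretty to be disabled in sudoers on the target hosts; most modern distributions no longer enable it by default.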
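One way to switch pipelining on is sketched below as a shell-session setting. ANSIBLE_PIPELINING is Ansible's environment variable for this option, and the equivalent ansible.cfg lines appear in the comments. This is a configuration sketch, not a tuning recommendation.

```bash
# Enable pipelining for the current shell session
export ANSIBLE_PIPELINING=True
# ansible.cfg equivalent:
#   [ssh_connection]
#   pipelining = True
# Confirm the effective setting (lists only values changed from the defaults):
#   ansible-config dump --only-changed
```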
Forks
Forks set how many hosts Ansible works on in parallel. The default is 5, which is often too low for large inventories.
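Forks can be set per shell session, in ansible.cfg, or per run. The value 50 in this sketch is an example, not a recommendation. Each fork is a separate worker process on the control node, so size the value to its CPU and memory.

```bash
# Example value only: size forks to the control node's CPU and memory
export ANSIBLE_FORKS=50
# ansible.cfg equivalent:
#   [defaults]
#   forks = 50
# Per-run override (takes precedence over the environment):
#   ansible-playbook site.yml --forks 50
```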
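Fact Caching
Gathering facts on 500 hosts is slow. Cache them:
```ini
# ansible.cfg
# Keep comments on their own lines: ansible.cfg only accepts ';' for inline comments
[defaults]
# Only gather facts for hosts that are not already cached
gathering = smart
# jsonfile, or e.g. redis / memcached
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts_cache
# Seconds before cached facts expire (86400 = 24 hours)
fact_caching_timeout = 86400
```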
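The same caching settings can be applied through environment variables, which is handy in CI jobs that should not edit ansible.cfg. The sketch below assumes the jsonfile plugin and the cache directory used in the ansible.cfg example.

```bash
# Fact caching scoped to the current shell (mirrors the ansible.cfg settings)
export ANSIBLE_GATHERING=smart
export ANSIBLE_CACHE_PLUGIN=jsonfile
export ANSIBLE_CACHE_PLUGIN_CONNECTION=/tmp/ansible_facts_cache
export ANSIBLE_CACHE_PLUGIN_TIMEOUT=86400
# Warm the cache once; later runs with gathering=smart skip cached hosts:
#   ansible all -i inventory/ -m setup --forks 50
```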
Selective Fact Gathering
```yaml
# Disable facts entirely if you don't need them
- hosts: webservers
  gather_facts: false
  tasks:
    - name: Deploy config
      ansible.builtin.copy:
        src: app.conf
        dest: /etc/app/config
# Or gather only specific facts
- hosts: webservers
  gather_facts: false
  tasks:
    # setup also collects the small 'min' subset unless you add '!min'
    - name: Gather only network and hardware facts
      ansible.builtin.setup:
        gather_subset:
          - network
          - hardware
```
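Mitogen (Third-Party Accelerator)
Mitogen replaces Ansible's usual per-task module transfer and execution over SSH with a persistent Python RPC connection to each host. The project reports playbooks running several times faster (up to about 7x), depending on the workload.
```ini
# ansible.cfg
[defaults]
strategy_plugins = /path/to/mitogen/ansible_mitogen/plugins/strategy
strategy = mitogen_linear
```
Caveat: Mitogen is a third-party project that supports only specific Ansible releases. Check its compatibility notes and test thoroughly before using it in production. It may not support all connection plugins or become methods.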
Async for Long Tasks
```yaml
tasks:
  # Start the upgrade in the background: poll: 0 returns immediately, so later
  # tasks run while it works and a long upgrade doesn't hold an SSH session open
  - name: Update all packages
    ansible.builtin.apt:
      upgrade: dist
    async: 1800   # Allow up to 30 minutes
    poll: 0       # Don't wait; check back with async_status
    register: update_job
  # Continue with other tasks...
  - name: Wait for updates to complete
    ansible.builtin.async_status:
      jid: "{{ update_job.ansible_job_id }}"
    register: update_result
    until: update_result.finished
    retries: 60   # 60 x 30 s = up to 30 minutes, matching the async limit
    delay: 30
```
Managing Large Inventories
```bash
# Test connectivity to all hosts
ansible all -i inventory/ -m ping --forks 50
# Run ad-hoc commands
ansible webservers -i inventory/ -m command -a "uptime" --forks 30
# Gather facts and cache them
ansible all -i inventory/ -m setup --forks 50
# Check which hosts match a pattern
ansible-inventory -i inventory/ --list --limit 'webservers:&production'
# Graph the inventory structure
ansible-inventory -i inventory/ --graph
```
Inventory Patterns
```bash
# All hosts in webservers AND production
ansible 'webservers:&production' -m ping
# All hosts in webservers OR dbservers
ansible 'webservers:dbservers' -m ping
# All hosts in webservers but NOT in maintenance
ansible 'webservers:!maintenance' -m ping
# Regex match
ansible '~web[0-9]+\.example\.com' -m ping
```
ansible-navigator vs ansible-playbook
ansible-navigator is a newer front end for running playbooks. It covers what ansible-playbook does and adds a text UI, inventory and documentation browsing, and replay of saved runs:
```bash
# Install
pip install ansible-navigator
# Run a playbook (interactive text UI by default)
ansible-navigator run site.yml -i inventory/
# Run in stdout mode (output like ansible-playbook)
ansible-navigator run site.yml -i inventory/ --mode stdout
# Explore inventory
ansible-navigator inventory -i inventory/ --mode interactive
# View documentation
ansible-navigator doc ansible.builtin.copy
# Replay a previous run
ansible-navigator replay /path/to/artifact.json
```
ansible-navigator uses execution environments (container images with Ansible and dependencies). This ensures consistent execution regardless of the control node's installed packages.
```yaml
# ansible-navigator.yml
---
ansible-navigator:
  execution-environment:
    image: quay.io/ansible/creator-ee:latest
    pull:
      policy: missing
  mode: stdout
  playbook-artifact:
    enable: true
    save-as: artifacts/{playbook_name}-{time_stamp}.json
```
Ad-Hoc Commands
Quick tasks without writing a playbook:
```bash
# Check connectivity
ansible all -m ping
# Run a command
ansible webservers -m command -a "df -h /"
# Copy a file
ansible webservers -m copy -a "src=hotfix.conf dest=/etc/app/config.conf backup=yes" --become
# Install a package
ansible webservers -m apt -a "name=htop state=present" --become
# Restart a service
ansible webservers -m systemd -a "name=nginx state=restarted" --become
# Gather facts
ansible web1.example.com -m setup -a "filter=ansible_distribution*"
```