Ansible Deep Dive - Primer

Beyond the basics. This covers the internals, advanced patterns, and production-grade techniques that separate playbook authors from Ansible operators.

Inventory Deep Dive

Static Inventory Formats

INI format is simple; YAML gives you structure:

# inventory/hosts.yml
all:
  children:
    webservers:
      hosts:
        web1.example.com:
          http_port: 8080
          ansible_host: 10.0.1.10
        web2.example.com:
          http_port: 8081
      vars:
        app_env: production
        deploy_user: deployer
    dbservers:
      hosts:
        db1.example.com:
          ansible_port: 2222
        db2.example.com:
    production:
      children:
        webservers:
        dbservers:

Dynamic Inventory

For cloud infrastructure, static inventory doesn't scale. Dynamic inventory plugins query cloud APIs at runtime.

AWS EC2 inventory plugin:

# inventory/aws_ec2.yml
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
  - us-west-2
keyed_groups:
  - key: tags.Environment
    prefix: env
  - key: instance_type
    prefix: type
  - key: placement.availability_zone
    prefix: az
filters:
  tag:ManagedBy: ansible
  instance-state-name: running
compose:
  ansible_host: private_ip_address

# Test dynamic inventory
ansible-inventory -i inventory/aws_ec2.yml --list
ansible-inventory -i inventory/aws_ec2.yml --graph

GCP inventory plugin:

# inventory/gcp.yml
plugin: google.cloud.gcp_compute
projects:
  - myproject-123456
zones:
  - us-central1-a
  - us-central1-b
filters:
  - labels.managed_by = ansible
keyed_groups:
  - key: labels.environment
    prefix: env
compose:
  ansible_host: networkInterfaces[0].networkIP
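
The groups these plugins generate can be targeted like any static group. A minimal sketch, assuming at least one discovered host carries an environment tag or label with the value production (that value and the playbook file name are hypothetical); with the prefix env and the default underscore separator, either configuration above should yield a group named env_production:

```yaml
# playbooks/ping_prod.yml (hypothetical file name)
- name: Check connectivity to dynamically discovered production hosts
  hosts: env_production        # prefix "env" + "_" + tag/label value "production"
  gather_facts: false
  tasks:
    - name: Confirm Ansible can reach each host
      ansible.builtin.ping:
```

Run it against the plugin config rather than a static file, for example ansible-playbook -i inventory/aws_ec2.yml playbooks/ping_prod.yml. If the group comes up empty, ansible-inventory --graph shows the group names the plugin actually produced.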

host_vars and group_vars

Directory-based variable organization:

inventory/
  hosts.yml
  group_vars/
    all.yml           # Variables for all hosts
    webservers.yml    # Variables for webservers group
    dbservers.yml     # Variables for dbservers group
    production.yml    # Variables for production group
  host_vars/
    web1.example.com.yml   # Variables for specific host
    db1.example.com.yml

# inventory/group_vars/all.yml
ntp_servers:
  - 0.pool.ntp.org
  - 1.pool.ntp.org
timezone: UTC
monitoring_agent: prometheus-node-exporter

# inventory/group_vars/webservers.yml
nginx_worker_processes: auto
nginx_worker_connections: 2048
ssl_certificate_path: /etc/ssl/certs/app.pem

Variables in host_vars/ override group_vars/. This is part of the 22-level precedence hierarchy.
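
To make the override concrete, here is a small sketch built on the files above. The host_vars value of 4096 is an assumption for illustration only; the group value comes from the example above.

```yaml
# inventory/group_vars/webservers.yml
nginx_worker_connections: 2048   # applies to every host in webservers
---
# inventory/host_vars/web1.example.com.yml
nginx_worker_connections: 4096   # hypothetical override; wins for web1 only
```

With both files in place, web1.example.com resolves nginx_worker_connections to 4096, while web2.example.com keeps the group value of 2048.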

Variable Precedence

Remember: Most people only need to memorize the two ends of the precedence list: role defaults are the lowest-precedence variables (meant to be overridden), and extra vars (-e) are the highest (they always win). Everything else falls in between. When debugging "why does this variable have this value," run ansible <host> -m debug -a "var=my_var" to see the resolved value.

Ansible has 22 levels of variable precedence. The full list, from lowest to highest (level 1 covers connection options such as -u, which are not variables):

1.  command line values (e.g., -u user)
2.  role defaults (defaults/main.yml)
3.  inventory file or script group vars
4.  inventory group_vars/all
5.  playbook group_vars/all
6.  inventory group_vars/*
7.  playbook group_vars/*
8.  inventory file or script host vars
9.  inventory host_vars/*
10. playbook host_vars/*
11. host facts / cached set_facts
12. play vars
13. play vars_prompt
14. play vars_files
15. role vars (vars/main.yml)
16. block vars (only for tasks in block)
17. task vars (only for the task)
18. include_vars
19. set_facts / registered vars
20. role (and include_role) params
21. include params
22. extra vars (-e "key=value") — ALWAYS WIN

The practical rules:

  • Role defaults (defaults/main.yml): Low precedence. Meant to be overridden. Put sensible defaults here.
  • Role vars (vars/main.yml): High precedence. Hard to override. Use for values that should NOT change per environment.
  • Extra vars (-e): Always win. Use for one-time overrides. Don't rely on them for regular operation.
  • group_vars/all: Low precedence, just above role defaults. Good for organization-wide settings that groups and hosts can override.

# Debug variable precedence: see what value a host gets
ansible -i inventory/ web1.example.com -m debug -a "var=http_port"

# Dump gathered facts for a host
ansible -i inventory/ web1.example.com -m setup

# Dump all resolved variables for a host
ansible -i inventory/ web1.example.com -m debug -a "var=hostvars[inventory_hostname]"
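
A throwaway playbook is often the quickest way to see the ordering in action. The sketch below is not from the original material: the file name and demo_var are hypothetical, and it runs only against localhost. It defines the same variable at three levels of the list above.

```yaml
# precedence_demo.yml (hypothetical file name)
- name: Show which definition of a variable wins
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    demo_var: "from play vars"          # level 12: play vars
  tasks:
    - name: Task vars beat play vars for this task only
      ansible.builtin.debug:
        var: demo_var
      vars:
        demo_var: "from task vars"      # level 17: task vars

    - name: set_fact beats both for the rest of the play
      ansible.builtin.set_fact:
        demo_var: "from set_fact"       # level 19: set_facts

    - name: Print the resolved value
      ansible.builtin.debug:
        var: demo_var
```

Run plainly, the first debug task should report the task-level value and the last one the set_fact value. Adding -e demo_var=cli should make both report cli, since extra vars sit at the top of the list.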

Playbook Anatomy

Plays, Tasks, and Handlers

---
# A playbook contains one or more plays
- name: Configure web servers
  hosts: webservers
  become: true
  gather_facts: true
  vars:
    app_version: "2.4.1"

  pre_tasks:
    - name: Update apt cache
      ansible.builtin.apt:
        update_cache: true
        cache_valid_time: 3600

  roles:
    - role: common
    - role: nginx
      vars:
        nginx_port: 8080
    - role: app_deploy
      tags: [deploy]

  tasks:
    - name: Ensure application config
      ansible.builtin.template:
        src: app.conf.j2
        dest: /etc/app/config.yml
        owner: appuser
        group: appuser
        mode: "0640"
      notify: Restart application

    - name: Ensure application is running
      ansible.builtin.systemd:
        name: myapp
        state: started
        enabled: true

  post_tasks:
    - name: Verify application health
      ansible.builtin.uri:
        url: "http://localhost:8080/health"
        return_content: true
      register: health_check
      failed_when: health_check.status != 200

  handlers:
    - name: Restart application
      ansible.builtin.systemd:
        name: myapp
        state: restarted

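Execution order: pre_tasks -> roles -> tasks -> post_tasks. Notified handlers run at the end of pre_tasks, at the end of tasks (which also covers handlers notified by roles), and at the end of post_tasks, or earlier when flushed with meta: flush_handlers.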
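
When a later task in the same section depends on a handler having already run, the handler queue can be flushed explicitly. A minimal sketch reusing the names from the playbook above (the play itself is illustrative, not part of the original):

```yaml
- name: Restart before verifying (sketch)
  hosts: webservers
  become: true
  tasks:
    - name: Ensure application config
      ansible.builtin.template:
        src: app.conf.j2
        dest: /etc/app/config.yml
      notify: Restart application

    - name: Run pending handlers now instead of at the end of the section
      ansible.builtin.meta: flush_handlers

    - name: Verify application health against the restarted process
      ansible.builtin.uri:
        url: "http://localhost:8080/health"

  handlers:
    - name: Restart application
      ansible.builtin.systemd:
        name: myapp
        state: restarted
```

Without the flush, the health check would run before the restart and could pass against the old configuration.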

imports vs includes

This distinction matters for control flow:

# IMPORT: static, parsed at playbook load time
- name: Import common tasks
  ansible.builtin.import_tasks: common.yml
  # Conditionals apply to EACH task inside the file
  when: ansible_os_family == "Debian"
  # Tags apply to EACH task inside the file
  tags: [setup]

# INCLUDE: dynamic, parsed at runtime
- name: Include OS-specific tasks
  ansible.builtin.include_tasks: "{{ ansible_os_family | lower }}.yml"
  # Conditionals apply to the INCLUDE itself (all or nothing)
  when: setup_os_packages
  # Tags apply to the INCLUDE itself, not inner tasks
  tags: [setup]

Rule of thumb: Use import_tasks when the file is always the same. Use include_tasks when the filename is dynamic or you need conditional inclusion of the entire file.

Import quirks:

  • import_tasks + when: the condition is applied to every task inside the file (can be confusing).
  • import_tasks cannot use loops.
  • import_role makes the role's handlers available to the whole play.

Include quirks:

  • include_tasks + tags: inner tasks don't inherit the tag (use an apply block to force it).
  • include_tasks with loops: runs the entire file once per loop iteration.
  • Variables passed to include_tasks with vars: are visible only to the included tasks; facts set with set_fact inside the file persist for the host.
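
The two include behaviors that trip people up most, tag inheritance and looping, can be sketched as follows. The task file names, app_users, and app_user are hypothetical placeholders.

```yaml
- name: Include tasks and push the tag down to every inner task
  ansible.builtin.include_tasks:
    file: setup.yml            # hypothetical task file
    apply:
      tags: [setup]            # applied to each task inside setup.yml
  tags: [setup]                # still needed so the include itself is selected

- name: Run the whole file once per user
  ansible.builtin.include_tasks: create_user.yml   # hypothetical task file
  loop: "{{ app_users }}"      # hypothetical list variable
  loop_control:
    loop_var: app_user         # avoids clashing with "item" inside the file
```

Renaming the loop variable is a defensive choice: if create_user.yml contains its own loops, both would otherwise compete for item.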

Jinja2 Templating

Filters

tasks:
  - name: Demonstrate filters
    ansible.builtin.debug:
      msg: |
        Default value: {{ missing_var | default('fallback') }}
        Mandatory:     {{ required_var | mandatory }}
        To JSON:       {{ my_dict | to_nice_json }}
        To YAML:       {{ my_dict | to_nice_yaml }}
        Regex replace: {{ hostname | regex_replace('\.example\.com$', '') }}
        IP address:    {{ ansible_default_ipv4.address | ansible.utils.ipaddr('address') }}
        Join list:     {{ my_list | join(', ') }}
        Unique:        {{ my_list | unique }}
        Flatten:       {{ nested_list | flatten }}
        Select:        {{ users | selectattr('active', 'equalto', true) | list }}
        Map:           {{ users | map(attribute='name') | list }}
        Combine dicts: {{ defaults | combine(overrides, recursive=True) }}
        Hash:          {{ 'password' | password_hash('sha512') }}
        B64 encode:    {{ 'hello' | b64encode }}
        B64 decode:    {{ encoded_string | b64decode }}
        Path basename: {{ '/etc/nginx/nginx.conf' | basename }}
        Path dirname:  {{ '/etc/nginx/nginx.conf' | dirname }}

Tests

tasks:
  - name: Check if variable is defined
    ansible.builtin.debug:
      msg: "Variable exists"
    when: my_var is defined

  - name: Check string matching
    ansible.builtin.debug:
      msg: "It's a web server"
    when: inventory_hostname is match("web.*")

  - name: Check if path exists
    ansible.builtin.stat:
      path: /etc/app/config.yml
    register: config_file

  - name: Template only if file missing
    ansible.builtin.template:
      src: config.yml.j2
      dest: /etc/app/config.yml
    when: not config_file.stat.exists

Lookups

Lookups read data from external sources on the control node:

vars:
  # Read a file
  ssh_key: "{{ lookup('file', '~/.ssh/id_rsa.pub') }}"

  # Read an environment variable
  home_dir: "{{ lookup('env', 'HOME') }}"

  # Read from HashiCorp Vault
  db_password: "{{ lookup('community.hashi_vault.hashi_vault', 'secret/data/prod/db:password') }}"

  # Read from AWS SSM Parameter Store
  api_key: "{{ lookup('amazon.aws.aws_ssm', '/prod/api_key') }}"

  # Generate a password
  app_secret: "{{ lookup('password', '/dev/null length=32 chars=ascii_letters,digits') }}"

  # Read one column from the CSV row whose first column is jdoe
  users: "{{ lookup('csvfile', 'jdoe file=users.csv delimiter=, col=2') }}"

  # Read lines from a file
  allowed_ips: "{{ lookup('file', 'allowed_ips.txt').splitlines() }}"
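
Because lookups are evaluated on the control node, their results can be pushed to managed hosts like any other variable. A hedged sketch, assuming the ssh_key variable defined above and the ansible.posix collection being installed; the deployer account name is taken from the inventory example and may not match your environment:

```yaml
tasks:
  - name: Install the control node's public key for the deploy user
    ansible.posix.authorized_key:
      user: deployer             # assumed account on the managed host
      key: "{{ ssh_key }}"       # file content read on the control node
      state: present
```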

Roles

Standard Role Structure

roles/
  nginx/
    defaults/main.yml     # Default variables (lowest precedence)
    vars/main.yml         # Role variables (high precedence)
    tasks/main.yml        # Task list
    handlers/main.yml     # Handlers
    templates/            # Jinja2 templates
    files/                # Static files
    meta/main.yml         # Role metadata and dependencies
    library/              # Custom modules for this role
    module_utils/         # Custom module utilities
    tests/                # Role tests

# roles/nginx/meta/main.yml
---
dependencies:
  - role: common
  - role: ssl_certificates
    vars:
      cert_domain: "{{ nginx_server_name }}"

galaxy_info:
  author: ops-team
  description: NGINX web server role
  min_ansible_version: "2.14"
  platforms:
    - name: Ubuntu
      versions: [jammy, noble]
    - name: EL
      versions: [8, 9]

# roles/nginx/defaults/main.yml
---
nginx_worker_processes: auto
nginx_worker_connections: 1024
nginx_server_name: "{{ inventory_hostname }}"
nginx_listen_port: 80
nginx_ssl_enabled: false
nginx_ssl_port: 443
nginx_access_log: /var/log/nginx/access.log
nginx_error_log: /var/log/nginx/error.log

# roles/nginx/tasks/main.yml
---
- name: Include OS-specific variables
  ansible.builtin.include_vars: "{{ ansible_os_family | lower }}.yml"

- name: Install NGINX
  ansible.builtin.package:
    name: nginx
    state: present
  become: true

- name: Configure NGINX
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: "0644"
    validate: nginx -t -c %s
  notify: Reload NGINX
  become: true

- name: Enable and start NGINX
  ansible.builtin.systemd:
    name: nginx
    state: started
    enabled: true
  become: true

Role Dependencies

Roles can depend on other roles. Dependencies run before the role:

# roles/app_deploy/meta/main.yml
dependencies:
  - role: common
  - role: nginx
    vars:
      nginx_listen_port: "{{ app_port }}"
  - role: monitoring
    when: monitoring_enabled | default(true)

Gotcha: By default, a role runs only once per play, even if several roles depend on it, unless it is called with different parameters each time. Set allow_duplicates: true in the depended-on role's meta/main.yml to override.
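
The setting belongs in the meta file of the role that should be allowed to repeat, not in the role that depends on it. A minimal sketch for the common role used above:

```yaml
# roles/common/meta/main.yml
---
allow_duplicates: true   # run this role every time it is listed or depended on
dependencies: []
```

Only enable this for roles that are safe and cheap to repeat, since every dependent role will then trigger another run.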

Collections

Collections package roles, modules, plugins, and playbooks into distributable units:

# Install a collection from Ansible Galaxy
ansible-galaxy collection install amazon.aws

# Install from a requirements file
ansible-galaxy collection install -r requirements.yml

# List installed collections
ansible-galaxy collection list

# requirements.yml
collections:
  - name: amazon.aws
    version: ">=7.0.0"
  - name: community.general
    version: ">=8.0.0"
  - name: ansible.posix
  - name: community.docker

roles:
  - name: geerlingguy.docker
    version: "7.1.0"

Using collection modules in playbooks:

tasks:
  # Fully qualified collection name (FQCN) -- recommended
  - name: Create S3 bucket
    amazon.aws.s3_bucket:
      name: my-bucket
      state: present

  # Short name (resolves only through the play's collections: keyword or a runtime redirect)
  - name: Manage Docker container
    docker_container:
      name: myapp
      image: myapp:latest
      state: started

Always use FQCNs in production playbooks. Short names are ambiguous if multiple collections provide a module with the same name.

Ansible Vault

Under the hood: Ansible Vault uses AES-256 in CTR mode for encryption and HMAC-SHA256 for integrity. The vault password is stretched with PBKDF2 and a random salt. Each encrypted file or string is self-contained: the salt, HMAC, and ciphertext are hex-encoded in the payload that follows the $ANSIBLE_VAULT;1.1;AES256 header line.

Vault encrypts sensitive data so it can live in version control safely.

# Encrypt a file
ansible-vault encrypt secrets.yml

# Decrypt a file
ansible-vault decrypt secrets.yml

# Edit an encrypted file (decrypts to a temporary file, opens $EDITOR, re-encrypts on save)
ansible-vault edit secrets.yml

# View without decrypting to disk
ansible-vault view secrets.yml

# Encrypt a string (for inline use)
ansible-vault encrypt_string 'supersecret' --name 'db_password'
# Output:
# db_password: !vault |
#   $ANSIBLE_VAULT;1.1;AES256
#   3863616...

# Re-key (change the vault password)
ansible-vault rekey secrets.yml

Multiple Vault IDs

Different teams, different secrets, different passwords:

# Encrypt with a vault ID
ansible-vault encrypt --vault-id dev@prompt secrets-dev.yml
ansible-vault encrypt --vault-id prod@/path/to/prod-password-file secrets-prod.yml

# Run playbook with multiple vault IDs
ansible-playbook site.yml \
  --vault-id dev@prompt \
  --vault-id prod@/path/to/prod-password-file

# Inline vault-encrypted variables with vault IDs
db_password: !vault |
  $ANSIBLE_VAULT;1.2;AES256;prod
  3863616...

api_key: !vault |
  $ANSIBLE_VAULT;1.2;AES256;dev
  6162636...

Vault in Practice

# group_vars/production/vault.yml (encrypted)
vault_db_password: "s3cr3t_pr0d_p4ss"
vault_api_key: "ak_prod_abc123"

# group_vars/production/vars.yml (plaintext, references vault vars)
db_password: "{{ vault_db_password }}"
api_key: "{{ vault_api_key }}"

This pattern keeps encrypted values in a separate file and references them from plaintext files. You can grep for variable usage without decrypting.
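
Tasks then consume the plaintext-named variable and never reference the vault_ name directly. A sketch of what that can look like; the template name and destination path are hypothetical, and no_log is an extra precaution not covered above:

```yaml
tasks:
  - name: Render database credentials without leaking them into logs
    ansible.builtin.template:
      src: db.conf.j2              # hypothetical template that uses db_password
      dest: /etc/app/db.conf       # hypothetical destination
      owner: appuser
      group: appuser
      mode: "0600"
    no_log: true                   # hides task arguments and results in output
```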

Connection Plugins

# SSH (default)
- hosts: linux_servers
  connection: ssh

# Local (run on the control node)
- hosts: localhost
  connection: local

# WinRM (Windows)
- hosts: windows_servers
  vars:
    ansible_connection: winrm
    ansible_winrm_transport: ntlm
    ansible_winrm_server_cert_validation: ignore

# Docker (connect to running containers)
- hosts: my_container
  vars:
    ansible_connection: community.docker.docker
    ansible_docker_extra_args: "--tls"

# Network devices
- hosts: switches
  vars:
    ansible_connection: ansible.netcommon.network_cli
    ansible_network_os: cisco.ios.ios

Privilege Escalation

# Play-level become
- hosts: webservers
  become: true
  become_user: root
  become_method: sudo

  tasks:
    # This runs as root
    - name: Install package
      ansible.builtin.apt:
        name: nginx
        state: present

    # Override to run as a different user
    - name: Deploy application
      ansible.builtin.copy:
        src: app.jar
        dest: /opt/app/app.jar
      become_user: appuser

    # Disable become for this task
    - name: Check local file
      ansible.builtin.stat:
        path: /tmp/marker
      become: false

Other become methods: su, pbrun, pfexec, doas, machinectl, runas (Windows).

Error Handling

block/rescue/always

tasks:
  - name: Handle deployment with rollback
    block:
      - name: Deploy new version
        ansible.builtin.copy:
          src: "app-{{ new_version }}.jar"
          dest: /opt/app/app.jar
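        notify: Restart application

      # Handlers normally run at the end of the section; flush now so the
      # health check below hits the restarted process
      - name: Restart now if the deploy changed anything
        ansible.builtin.meta: flush_handlers
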
      - name: Verify health
        ansible.builtin.uri:
          url: http://localhost:8080/health
        register: health
        retries: 5
        delay: 10
        until: health.status == 200

    rescue:
      - name: Rollback to previous version
        ansible.builtin.copy:
          src: "app-{{ old_version }}.jar"
          dest: /opt/app/app.jar
        notify: Restart application

      - name: Alert on failure
        community.general.slack:
          token: "{{ slack_token }}"
          msg: "Deployment of {{ new_version }} failed on {{ inventory_hostname }}, rolled back."

    always:
      - name: Clean up temp files
        ansible.builtin.file:
          path: /tmp/deploy-staging
          state: absent

Fine-Grained Error Control

tasks:
  # Override what counts as "failed"
  - name: Check if service exists
    ansible.builtin.command: systemctl status myapp
    register: service_status
    failed_when: service_status.rc not in [0, 3, 4]
    # rc 3 = inactive, rc 4 = not found -- not failures for this check

  # Override what counts as "changed"
  - name: Check current version
    ansible.builtin.command: /opt/app/version.sh
    register: current_version
    changed_when: false  # This command never changes anything

  # Ignore errors and handle manually
  - name: Try to stop old service
    ansible.builtin.systemd:
      name: legacy-app
      state: stopped
    ignore_errors: true
    register: stop_result

  - name: Remove old service if it was stopped
    ansible.builtin.file:
      path: /etc/systemd/system/legacy-app.service
      state: absent
    when: stop_result is succeeded

Tags

Tags control which tasks run:

tasks:
  - name: Install packages
    ansible.builtin.apt:
      name: "{{ packages }}"   # pass the whole list: one apt transaction instead of one per item
      state: present
    tags: [install, setup]

  - name: Configure application
    ansible.builtin.template:
      src: config.yml.j2
      dest: /etc/app/config.yml
    tags: [configure]

  - name: Deploy code
    ansible.builtin.copy:
      src: app.jar
      dest: /opt/app/app.jar
    tags: [deploy]

# Run only tagged tasks
ansible-playbook site.yml --tags deploy

# Run everything except tagged tasks
ansible-playbook site.yml --skip-tags install

# List all tags in a playbook
ansible-playbook site.yml --list-tags

Special tags:

  • always: the task runs on every invocation unless explicitly skipped with --skip-tags always.
  • never: the task is skipped unless --tags requests never or one of the task's other tags.
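
A short sketch of both special tags side by side. The vars file, cache path, and purge tag are hypothetical.

```yaml
tasks:
  - name: Load shared variables on every run
    ansible.builtin.include_vars: common.yml     # hypothetical vars file
    tags: [always]

  - name: Wipe the application cache
    ansible.builtin.file:
      path: /var/cache/myapp                     # hypothetical path
      state: absent
    tags: [never, purge]                         # "purge" is a hypothetical tag
```

With these tags, ansible-playbook site.yml --tags deploy still loads the shared variables, while the cache is wiped only when --tags purge is passed. Pairing never with a second tag is what gives a destructive task an explicit opt-in.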

Delegation

Run a task on a different host than the play target:

tasks:
  # Run on the load balancer to remove this host
  - name: Remove from load balancer
    ansible.builtin.command: /usr/local/bin/lb-remove {{ inventory_hostname }}
    delegate_to: loadbalancer.example.com

  # Run on localhost (control node)
  - name: Add DNS record
    amazon.aws.route53:
      state: present
      zone: example.com
      record: "{{ inventory_hostname }}"
      type: A
      value: "{{ ansible_default_ipv4.address }}"
    delegate_to: localhost

  # Run once for the whole group (not per host)
  - name: Send deployment notification
    community.general.slack:
      token: "{{ slack_token }}"
      msg: "Deploying to {{ ansible_play_hosts | length }} servers"
    delegate_to: localhost
    run_once: true

Async Tasks

For long-running operations that might exceed SSH timeout:

tasks:
  - name: Run long database migration
    ansible.builtin.command: /opt/app/migrate.sh
    async: 3600        # Maximum runtime: 1 hour
    poll: 0            # Don't wait (fire and forget)
    register: migration

  - name: Do other work while migration runs
    ansible.builtin.apt:
      name: monitoring-agent
      state: latest

  - name: Wait for migration to complete
    ansible.builtin.async_status:
      jid: "{{ migration.ansible_job_id }}"
    register: migration_result
    until: migration_result.finished
    retries: 120
    delay: 30

Strategy Plugins

Control how tasks execute across hosts:

# Linear (default): each task runs on all hosts before moving to the next task
- hosts: webservers
  strategy: linear

# Free: each host runs through tasks as fast as it can, independently
- hosts: webservers
  strategy: free

# Debug: interactive debugger on task failure
- hosts: webservers
  strategy: debug

The free strategy is faster when hosts run at different speeds, because fast hosts don't wait for slow ones at each task. The trade-off is interleaved output that is harder to read.

Check Mode

Dry-run without making changes:

# Dry run the entire playbook
ansible-playbook site.yml --check

# Dry run with diff (show what would change)
ansible-playbook site.yml --check --diff

tasks:
  # Force this task to always run, even in check mode
  - name: Gather application state
    ansible.builtin.command: /opt/app/status.sh
    check_mode: false
    register: app_status
    changed_when: false

  # Force this task to run in check mode, even when the playbook runs without --check
  - name: Report planned changes
    ansible.builtin.debug:
      msg: "Would deploy version {{ new_version }}"
    check_mode: true

Not all modules support check mode. Modules that don't are skipped during --check runs. The command and shell modules are skipped in check mode unless creates or removes is set, in which case they report whether they would have run.
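
To make a task conditional on whether the run is a dry run, test the ansible_check_mode magic variable instead of setting check_mode. A minimal sketch, reusing new_version from the examples above:

```yaml
tasks:
  - name: Report planned changes (check mode only)
    ansible.builtin.debug:
      msg: "Would deploy version {{ new_version }}"
    when: ansible_check_mode

  - name: Announce the deployment (skipped during dry runs)
    ansible.builtin.debug:
      msg: "Deployed version {{ new_version }}"
    when: not ansible_check_mode
```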
