Ansible Interview Cheatsheet

Model: Control node → SSH/WinRM → managed nodes | Inventory = who | Playbook = what | Module = idempotent unit | Role = reusable package

  • ansible-core = engine + builtins | ansible = engine + community collections | Always use FQCNs: ansible.builtin.package
  • Agentless: targets need only Python + SSH | Push-based: you initiate from control node
  • Ansible configures servers (packages, files, services). Terraform builds infrastructure. Helm deploys K8s apps.


Golden Rules

  1. Modules over shell — modules know state, shell doesn't | ansible.builtin.package not shell: apt install
  2. Desired state, not imperative scripts | Chase idempotence: 2nd run should be mostly ok, not changed
  3. template for files you own entirely | lineinfile for surgical edits | blockinfile for managed blocks
  4. validate before replacing critical configs: validate: nginx -t -c %s catches broken templates before writing
  5. Use handlers for restart/reload (fire only on change, only once) — not inline service restarts
  6. Test second-run idempotence: a task that still reports "changed" on an unchanged host is a bug (or a command/shell task missing changed_when)
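
As a rough illustration of rules 1 and 2, the sketch below writes the same intent two ways. The package name nginx is only an example; the point is what each task reports on a second run.

```yaml
# Sketch only: "nginx" is an illustrative package name.
- name: Install nginx the imperative way (avoid)
  ansible.builtin.shell: apt-get install -y nginx
  # shell cannot tell whether anything was installed, so this reports
  # "changed" on every run unless you add changed_when yourself

- name: Install nginx declaratively (prefer)
  ansible.builtin.package:
    name: nginx
    state: present   # reports "ok" on the second run if already installed
```

Running the play twice and reading the recap is the quickest idempotence check: the second task should settle at "ok", while the first keeps counting as "changed".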

Inventory

  • Static YAML or dynamic plugins (AWS EC2, VMware, K8s, constructed) | group_vars/ host_vars/ for env-specific data
  • ansible-inventory --graph to visualize | ansible-inventory --list for full variable dump
  • Group by function AND environment | Keep names stable even if IPs change

Key inventory variables: ansible_host ansible_user ansible_port ansible_connection ansible_python_interpreter ansible_become
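
A minimal static inventory sketch, grouping by function (web, db) and by environment (prod). The host names, addresses, and the deploy user are hypothetical placeholders, not values from any real setup.

```yaml
# inventory/hosts.yml (hypothetical hosts; 192.0.2.0/24 is a documentation range)
all:
  children:
    web:                      # grouped by function
      hosts:
        web1:
          ansible_host: 192.0.2.10
        web2:
          ansible_host: 192.0.2.11
    db:
      hosts:
        db1:
          ansible_host: 192.0.2.20
          ansible_port: 2222  # non-default SSH port for this host only
    prod:                     # grouped by environment; hosts may sit in several groups
      hosts:
        web1:
        web2:
        db1:
      vars:
        ansible_user: deploy
```

With a layout like this, ansible-inventory -i inventory/hosts.yml --graph shows the group tree, and a pattern such as web:&prod targets only hosts that are in both groups.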


Variables — Precedence Matters

(lowest)  role defaults → inventory vars → group_vars → host_vars → play vars
          → role vars (HIGHER than play vars!) → task vars → extra vars -e  (highest, always wins)
  • defaults/main.yml = knobs users should turn (low precedence) | vars/main.yml = constants (high precedence)
  • Never put the same variable in both — you'll debug precedence at 2 AM
  • Debug: ansible host -m debug -a "var=app_port" | Modern access: ansible_facts['distribution']
  • "When behavior is weird, it's usually precedence, inventory targeting, or an idempotence bug"
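
To make the defaults-versus-overrides split concrete, here is a small sketch with the same variable defined at two precedence levels. The file paths follow the usual layout; the port numbers are made up for illustration.

```yaml
---
# roles/myapp/defaults/main.yml  (low precedence: a knob meant to be overridden)
app_port: 8080
---
# group_vars/web.yml  (beats the role default for hosts in the web group)
app_port: 9090
```

Under the ordering above, hosts in web resolve app_port from group_vars rather than from the role default, and passing -e app_port=7000 on the command line would override both.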

Safe Rollout

- hosts: web
  serial: [1, "10%", "25%"]      # canary → 10% → 25% batches, repeated until all hosts are done
  max_fail_percentage: 10         # circuit breaker: abort if more than 10% of a batch fails
  • --check --diff before every prod run (non-negotiable) | --limit web1 = single-host test
  • delegate_to: localhost for API/LB calls — changes WHERE task runs, not WHOSE variables it sees
  • block/rescue/always = try/catch/finally for structured rollback

Minimal safe example:

- hosts: web
  serial: 25%
  tasks:
    - name: Render nginx config safely
      ansible.builtin.template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        validate: "nginx -t -c %s"
      notify: reload nginx

  handlers:
    - name: reload nginx
      ansible.builtin.service:
        name: nginx
        state: reloaded
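
The block/rescue/always and delegate_to points can be sketched together as a one-host-at-a-time deploy. Everything named here is an assumption for illustration: the lb-ctl command, the /opt/myapp paths, and the release and previous_release variables are hypothetical.

```yaml
- hosts: web
  serial: 1
  tasks:
    - name: Deploy with structured rollback
      block:
        - name: Drain this host from the load balancer (runs on the control node)
          ansible.builtin.command: "/usr/local/bin/lb-ctl drain {{ inventory_hostname }}"  # hypothetical CLI
          delegate_to: localhost     # runs locally, still sees this host's variables
          changed_when: true

        - name: Point "current" at the new release
          ansible.builtin.file:
            src: "/opt/myapp/releases/{{ release }}"            # hypothetical variable
            dest: /opt/myapp/current
            state: link
      rescue:
        - name: Point "current" back at the previous release
          ansible.builtin.file:
            src: "/opt/myapp/releases/{{ previous_release }}"   # hypothetical variable
            dest: /opt/myapp/current
            state: link
      always:
        - name: Put this host back in rotation
          ansible.builtin.command: "/usr/local/bin/lb-ctl enable {{ inventory_hostname }}"  # hypothetical CLI
          delegate_to: localhost
          changed_when: true
```

The always section runs whether the block succeeded or the rescue fired, so the host is not left drained after a failed deploy.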


Handlers

  • Run at end of play, not after the notifying task
  • Only fire if task reports "changed" (not "ok")
  • meta: flush_handlers forces immediate execution
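  • Handler name mismatch with notify: is an error by default (error_on_missing_handler = True), but it typically surfaces only when the notifying task reports "changed"; on an "ok" run the typo stays hidden
  • If play fails before handler phase → handler never runs → config changed but service still on old version | Mitigate with --force-handlers or force_handlers: true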
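
A short sketch of the timing rules above: the handler is flushed right after the config change so that a later check runs against the restarted service. The myapp service, the template name, and app_port are illustrative assumptions.

```yaml
- hosts: web
  force_handlers: true          # run notified handlers even if a later task fails
  tasks:
    - name: Render app config
      ansible.builtin.template:
        src: app.conf.j2
        dest: /etc/myapp/app.conf
      notify: Restart myapp     # must match the handler name exactly

    - name: Run pending handlers now instead of at end of play
      ansible.builtin.meta: flush_handlers

    - name: Check the app is listening after the restart
      ansible.builtin.wait_for:
        port: "{{ app_port }}"
        timeout: 30

  handlers:
    - name: Restart myapp
      ansible.builtin.service:
        name: myapp
        state: restarted
```

Without the flush, the wait_for check would run against the old process, because the restart would be deferred to the end of the play.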

Vault

  • AES-256 encryption makes secrets safe to commit to git
  • vault/vars split pattern: vault.yml (encrypted values) + vars.yml (plaintext references like db_pass: "{{ vault_db_pass }}")
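  • Keep secrets out of shell history: let encrypt_string read the value from stdin rather than putting it on the command line (echo 'secret' | ... still lands in history):
    ansible-vault encrypt_string --stdin-name db_password   # type or paste the secret, then Ctrl-D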
  • --vault-password-file ~/.vault_pass for CI | no_log: true on tasks that handle secrets
  • Multiple vault IDs for team separation: --vault-id dev@prompt --vault-id prod@/path/to/pass
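
The vault/vars split might look like the sketch below. The group_vars/prod paths, the db.env file, and the placeholder password are assumptions for illustration; the vault file is shown decrypted for readability.

```yaml
---
# group_vars/prod/vars.yml  (plaintext, so variable names stay greppable)
db_user: app
db_pass: "{{ vault_db_pass }}"
---
# group_vars/prod/vault.yml  (encrypted at rest with ansible-vault; shown decrypted)
vault_db_pass: "example-only-not-a-real-secret"
---
# tasks that touch the secret should keep it out of logs
- name: Write DB credentials file
  ansible.builtin.copy:
    content: "password={{ db_pass }}\n"
    dest: /etc/myapp/db.env
    mode: "0600"
  no_log: true      # hides the task's arguments and result in output
```

The design choice is that reviewers can see which variables exist by reading vars.yml, while only vault.yml needs the vault password to read.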

Error Handling

Tool | Use when
failed_when: | Define custom failure conditions
changed_when: | Control what counts as "changed" (e.g., changed_when: false for read-only commands)
block/rescue/always | Structured rollback (try/catch/finally)
ignore_errors: true | Last resort: the failure is printed as "...ignoring" and the play continues, which hides bugs
  • ignore_errors does NOT catch: syntax errors, undefined variables, connection failures
  • Check mode: excellent for declarative modules, limited for command/shell (partial support with creates/removes)
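
A small sketch of failed_when, changed_when, and creates on command tasks. The healthcheck and init-data scripts, their paths, and the DEGRADED marker are hypothetical.

```yaml
- name: Read the app health status (read-only, so never "changed")
  ansible.builtin.command: /opt/myapp/bin/healthcheck      # hypothetical script
  register: health
  changed_when: false
  failed_when: "health.rc != 0 or 'DEGRADED' in health.stdout"

- name: Initialise the data directory only if that has not happened yet
  ansible.builtin.command:
    cmd: /opt/myapp/bin/init-data /var/lib/myapp           # hypothetical script
    creates: /var/lib/myapp/.initialized   # task is skipped once this file exists
```

The creates guard is also what gives a command task something to evaluate in check mode, in line with the partial support noted above.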

Debugging

# Validate without running
ansible-playbook site.yml --syntax-check
ansible-playbook site.yml --list-hosts --list-tasks --list-tags

# Dry run with diffs
ansible-playbook site.yml --check --diff --limit web1

# Inspect host state
ansible host -m setup                                       # all facts
ansible host -m debug -a "var=hostvars[inventory_hostname]"  # all variables
ansible host -m debug -a "var=app_port"                      # specific variable

# Verbosity: -v (results) -vv (inputs) -vvv (SSH) -vvvv (SSH protocol)

# Resume after failure
ansible-playbook site.yml --start-at-task="Deploy app"

Performance

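Setting | Effect
forks = 20 | Parallel hosts (default 5 is too low for most fleets)
pipelining = True | Fewer SSH round trips per task, often 2-3x faster (needs requiretty disabled in sudoers)
gathering = smart | Skip fact gathering if facts are cached
fact_caching = jsonfile | Cache facts to disk (also set fact_caching_connection to a directory)
ControlPersist = 60s | Reuse SSH connections (set via ssh_args)
interpreter_python = auto_silent | Suppress interpreter discovery warnings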

Roles and Collections

Role structure:

roles/myapp/
├── defaults/main.yml     # Overridable knobs (low precedence)
├── vars/main.yml         # Constants (high precedence)
├── tasks/main.yml        # The work
├── handlers/main.yml     # Restart/reload actions
├── templates/            # Jinja2 (.j2) files
├── files/                # Static files
└── meta/main.yml         # Dependencies, metadata

  • include_role = dynamic (runtime) | import_role = static (parse time) — affects tag/condition propagation
  • Collections = modern packaging (roles + modules + plugins) | Install: ansible-galaxy collection install amazon.aws
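
The difference in tag and condition propagation can be sketched as follows. The myapp role comes from the structure above; the enable_myapp variable is a hypothetical toggle.

```yaml
- hosts: web
  tasks:
    - name: Import statically (resolved at parse time)
      ansible.builtin.import_role:
        name: myapp
      tags: [myapp]        # inherited by every task inside the role

    - name: Include dynamically (resolved at runtime)
      ansible.builtin.include_role:
        name: myapp
      when: enable_myapp | default(false)   # decides whether to include at all
      tags: [myapp]        # applies to the include itself, not to the role's tasks
```

In practice this means --tags myapp runs the imported role's tasks directly, while for the dynamic include the inner tasks need their own tags (or the apply keyword) to be selected.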

Modern Ecosystem

Tool | Purpose
ansible-core | Engine + CLI + builtins
ansible | Engine + curated community collections
Collections | Modern packaging model (FQCNs)
Execution Environments | Containerized ansible runtime (reproducible)
ansible-navigator | Modern CLI/TUI for EE workflows
ansible-builder | Build EE container images
ansible-lint | Static analysis and quality checks
Molecule | Role testing framework (idempotence check)

Footguns

Footgun | Consequence
Run against all without --limit | Half-tested changes hit every server
Global become: true | All files owned by root, app can't read own config
Restart services inline (not via handlers) | Restarts even when nothing changed
ignore_errors: true | Real failures scroll past as "...ignoring" for months
Templates without validate | Broken config deployed, service crashes on restart
Secrets on CLI or in plaintext vars | Shell history, CI logs capture credentials
Same var in defaults/ AND vars/ | Precedence silently stomps user overrides
Not running --check --diff before prod | Discover template typos by breaking production
Handler name doesn't match notify: | Errors only on runs where the notifying task changes; stays hidden on "ok" runs

30-second answer

"Ansible is agentless, declarative automation for configuration management and orchestration. Its power is idempotent modules, inventory-driven targeting, and reusable roles. The things that actually matter in production are variable precedence discipline, blast-radius control with serial and --limit, validate before replacing configs, and keeping shell out of your YAML. For safe rollouts I use serial with max_fail_percentage as a circuit breaker, --check --diff before every prod run, and block/rescue for rollback logic."