# Ansible Interview Cheatsheet
Model: Control node → SSH/WinRM → managed nodes | Inventory = who | Playbook = what | Module = idempotent unit | Role = reusable package
- `ansible-core` = engine + builtins | `ansible` = engine + community collections | Always use FQCNs: `ansible.builtin.package`
- Agentless: targets need only Python + SSH | Push-based: you initiate from control node
- Ansible configures servers (packages, files, services). Terraform builds infrastructure. Helm deploys K8s apps.
## Golden Rules
- Modules over shell: modules know state, shell doesn't | `ansible.builtin.package` not `shell: apt install`
- Desired state, not imperative scripts | Chase idempotence: 2nd run should be mostly `ok`, not `changed`
- `template` for files you own entirely | `lineinfile` for surgical edits | `blockinfile` for managed blocks
- `validate` before replacing critical configs: `validate: nginx -t -c %s` catches broken templates before writing
- Use handlers for restart/reload (fire only on change, only once), not inline service restarts
- Test second-run idempotence: if anything shows "changed" on re-run, you have a bug
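
The first two rules can be sketched roughly as follows. The `web` group and the `nginx` package are illustrative assumptions, and the commented-out task is shown only as the pattern to avoid.

```yaml
- hosts: web
  tasks:
    # Avoid: shell cannot tell whether anything changed,
    # so this task would report "changed" on every run.
    # - name: Install nginx (imperative)
    #   ansible.builtin.shell: apt-get install -y nginx

    # Prefer: the package module compares desired and actual state,
    # so a second run reports "ok" when nginx is already installed.
    - name: Ensure nginx is installed
      ansible.builtin.package:
        name: nginx
        state: present
      become: true
```

Running the play twice is the quickest idempotence check: the second run should show `changed=0` in the recap.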
## Inventory
- Static YAML or dynamic plugins (AWS EC2, VMware, K8s, constructed) | `group_vars/` and `host_vars/` for env-specific data
- `ansible-inventory --graph` to visualize | `ansible-inventory --list` for full variable dump
- Group by function AND environment | Keep names stable even if IPs change
- Key inventory variables: `ansible_host` `ansible_user` `ansible_port` `ansible_connection` `ansible_python_interpreter` `ansible_become`
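
A minimal static inventory along these lines might look like the sketch below. The host names, the documentation-range IPs, the `deploy` user, and the file name `inventory.yml` are all assumptions made for illustration.

```yaml
# inventory.yml (hypothetical): hosts grouped by function, then by environment
all:
  children:
    web:
      hosts:
        web1:
          ansible_host: 192.0.2.10   # stable name, IP can change
        web2:
          ansible_host: 192.0.2.11
    db:
      hosts:
        db1:
          ansible_host: 192.0.2.20
          ansible_port: 2222
    prod:
      children:
        web:
        db:
      vars:
        ansible_user: deploy
        ansible_python_interpreter: /usr/bin/python3
```

`ansible-inventory -i inventory.yml --graph` should then show `web` and `db` nested under `prod`.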
## Variables: Precedence Matters
```text
(lowest) role defaults → inventory vars → group_vars → host_vars → play vars
→ role vars (HIGHER than play vars!) → task vars → extra vars -e (highest, always wins)
```
- `defaults/main.yml` = knobs users should turn (low precedence) | `vars/main.yml` = constants (high precedence)
- Never put the same variable in both: you'll debug precedence at 2 AM
- Debug: `ansible host -m debug -a "var=app_port"` | Modern access: `ansible_facts['distribution']`
- "When behavior is weird, it's usually precedence, inventory targeting, or an idempotence bug"
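
To make the ordering concrete, here is a small sketch with two files in one listing. The role name `myapp`, the playbook name `site.yml`, and the port values are hypothetical.

```yaml
# roles/myapp/defaults/main.yml (role default, lowest precedence)
---
app_port: 8080

# site.yml (play vars beat role defaults)
---
- hosts: web
  vars:
    app_port: 9090
  roles:
    - myapp
```

With these files the role sees `app_port` as 9090, and `ansible-playbook site.yml -e app_port=7070` overrides both because extra vars always win.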
## Safe Rollout
```yaml
- hosts: web
  serial: [1, "10%", "25%"]   # canary → widen → full
  max_fail_percentage: 10     # circuit breaker: stop if >10% fail
```
- `--check --diff` before every prod run (non-negotiable) | `--limit web1` = single-host test
- `delegate_to: localhost` for API/LB calls: changes WHERE the task runs, not WHOSE variables it sees
- `block/rescue/always` = try/catch/finally for structured rollback
Minimal safe example:
```yaml
- hosts: web
  serial: "25%"
  tasks:
    - name: Render nginx config safely
      ansible.builtin.template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        validate: "nginx -t -c %s"
      notify: reload nginx
  handlers:
    - name: reload nginx
      ansible.builtin.service:
        name: nginx
        state: reloaded
```
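
The `delegate_to` and `block/rescue/always` points from this section could be combined roughly as in the sketch below. The load balancer URL, its `drain`/`enable` endpoints, and the `myapp` paths are invented for illustration and are not a real API.

```yaml
- hosts: web
  serial: 1
  tasks:
    - name: Deploy one host with structured rollback
      block:
        - name: Drain host from load balancer (hypothetical API)
          ansible.builtin.uri:
            url: "https://lb.example.com/api/drain/{{ inventory_hostname }}"
            method: POST
          delegate_to: localhost   # runs on the control node, still sees this host's vars

        - name: Render new app config, keeping a backup copy
          ansible.builtin.template:
            src: app.conf.j2
            dest: /etc/myapp/app.conf
            backup: true
      rescue:
        - name: Report the failure (a real rollback step would go here)
          ansible.builtin.debug:
            msg: "Deploy failed on {{ inventory_hostname }}, previous config kept as backup"
      always:
        - name: Put host back into load balancer (hypothetical API)
          ansible.builtin.uri:
            url: "https://lb.example.com/api/enable/{{ inventory_hostname }}"
            method: POST
          delegate_to: localhost
```

Note that a `rescue` section that completes successfully marks the host as rescued rather than failed, so the play continues for that host.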
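## Handlers
- Run at end of play, not after the notifying task
- Only fire if task reports "changed" (not "ok")
- `meta: flush_handlers` forces immediate execution
- Handler name mismatch with `notify:` goes unnoticed while the task reports "ok"; once the task reports "changed", the run fails with a missing-handler error (default `error_on_missing_handler = True`) instead of restarting the service
- If play fails before handler phase → handler never runs → config changed but service still on old version (unless `--force-handlers` is set)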
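
A sketch of `flush_handlers` in context, assuming an nginx host that answers on `http://localhost/` (that URL and the smoke test are illustrative additions):

```yaml
- hosts: web
  tasks:
    - name: Render nginx config
      ansible.builtin.template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        validate: "nginx -t -c %s"
      notify: reload nginx

    # Without this, the reload would wait until the end of the play
    # and the smoke test below would hit the old configuration.
    - name: Run pending handlers now
      ansible.builtin.meta: flush_handlers

    - name: Smoke-test the reloaded service
      ansible.builtin.uri:
        url: "http://localhost/"
        status_code: 200

  handlers:
    - name: reload nginx
      ansible.builtin.service:
        name: nginx
        state: reloaded
```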
## Vault
- AES-256 encryption makes secrets safe to commit to git
- vault/vars split pattern: `vault.yml` (encrypted values) + `vars.yml` (plaintext references like `db_pass: "{{ vault_db_pass }}"`)
- Keep secrets out of shell history: `--vault-password-file ~/.vault_pass` for CI | `no_log: true` on tasks that handle secrets
- Multiple vault IDs for team separation: `--vault-id dev@prompt --vault-id prod@/path/to/pass`
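
The split pattern might be laid out like this. The `group_vars/prod/` location and the placeholder value are assumptions; the first file is shown before encryption.

```yaml
# group_vars/prod/vault.yml (encrypt with: ansible-vault encrypt group_vars/prod/vault.yml)
---
vault_db_pass: "example-only-not-a-real-secret"

# group_vars/prod/vars.yml (stays plaintext, so variable names remain greppable)
---
db_pass: "{{ vault_db_pass }}"
```

Playbooks and templates then reference only `db_pass`, so a search for the variable name still works even though the value is encrypted.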
## Error Handling
| Tool | Use When |
|---|---|
| `failed_when:` | Define custom failure conditions |
| `changed_when:` | Control what counts as "changed" (e.g., `changed_when: false` for read-only commands) |
| `block/rescue/always` | Structured rollback (try/catch/finally) |
| `ignore_errors: true` | Last resort: swallows errors and hides bugs |

- `ignore_errors` does NOT catch: syntax errors, undefined variables, connection failures
- Check mode: excellent for declarative modules, limited for `command`/`shell` (partial support with `creates`/`removes`)
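
As an illustration of the first two rows, a read-only health check could be written as below. The `/usr/local/bin/myapp --health` command and the `DEGRADED` marker are hypothetical.

```yaml
- hosts: web
  tasks:
    - name: Check application health (read-only command)
      ansible.builtin.command: /usr/local/bin/myapp --health
      register: health
      # A check never changes the host, so do not report "changed".
      changed_when: false
      # Fail on a non-zero exit code or on a degraded status in the output.
      failed_when: health.rc != 0 or 'DEGRADED' in health.stdout
```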
## Debugging
```bash
# Validate without running
ansible-playbook site.yml --syntax-check
ansible-playbook site.yml --list-hosts --list-tasks --list-tags

# Dry run with diffs
ansible-playbook site.yml --check --diff --limit web1

# Inspect host state
ansible host -m setup                                          # all facts
ansible host -m debug -a "var=hostvars[inventory_hostname]"    # all variables
ansible host -m debug -a "var=app_port"                        # specific variable

# Verbosity: -v (results) -vv (inputs) -vvv (SSH) -vvvv (SSH protocol)

# Resume after failure
ansible-playbook site.yml --start-at-task="Deploy app"
```
## Performance
| Setting | Effect |
|---|---|
| `forks = 20` | Parallel hosts (default 5 is too low) |
| `pipelining = True` | 2-3x faster (fewer SSH round trips) |
| `gathering = smart` | Skip facts if cached |
| `fact_caching = jsonfile` | Cache facts to disk |
| `ControlPersist=60s` (set via `ssh_args`) | Reuse SSH connections |
| `interpreter_python = auto_silent` | Suppress discovery warnings |
## Roles and Collections
Role structure:
```text
roles/myapp/
├── defaults/main.yml    # Overridable knobs (low precedence)
├── vars/main.yml        # Constants (high precedence)
├── tasks/main.yml       # The work
├── handlers/main.yml    # Restart/reload actions
├── templates/           # Jinja2 (.j2) files
├── files/               # Static files
└── meta/main.yml        # Dependencies, metadata
```
- `include_role` = dynamic (runtime) | `import_role` = static (parse time): affects tag/condition propagation
- Collections = modern packaging (roles + modules + plugins) | Install: `ansible-galaxy collection install amazon.aws`
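
The static/dynamic difference can be sketched as follows, assuming a role named `myapp` and a hypothetical `deploy_app` toggle:

```yaml
- hosts: web
  tasks:
    # Static: resolved at parse time, so the tag and any "when"
    # are copied onto every task inside the role.
    - name: Import myapp statically
      ansible.builtin.import_role:
        name: myapp
      tags: [app]

    # Dynamic: resolved at runtime, so "when" is evaluated once
    # on the include itself and decides whether the role runs at all.
    - name: Include myapp dynamically
      ansible.builtin.include_role:
        name: myapp
      when: deploy_app | default(false)
```

A practical consequence is that `--list-tasks` shows the tasks of an imported role but not those of an included one, since the latter are not known until the play runs.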
## Modern Ecosystem
| Tool | Purpose |
|---|---|
| `ansible-core` | Engine + CLI + builtins |
| `ansible` | Engine + curated community collections |
| Collections | Modern packaging model (FQCNs) |
| Execution Environments | Containerized ansible runtime (reproducible) |
| `ansible-navigator` | Modern CLI/TUI for EE workflows |
| `ansible-builder` | Build EE container images |
| `ansible-lint` | Static analysis and quality checks |
| Molecule | Role testing framework (idempotence check) |
## Footguns
| Footgun | Consequence |
|---|---|
| Run against `all` without `--limit` | Half-tested changes hit every server |
| Global `become: true` | All files owned by root, app can't read own config |
| Restart services inline (not via handlers) | Restarts even when nothing changed |
| `ignore_errors: true` | Silently swallows real failures for months |
| Templates without `validate` | Broken config deployed, service crashes on restart |
| Secrets on CLI or in plaintext vars | Shell history, CI logs capture credentials |
| Same var in `defaults/` AND `vars/` | Precedence silently stomps user overrides |
| Not running `--check --diff` before prod | Discover template typos by breaking production |
| Handler name doesn't match `notify:` | Typo stays hidden while the task reports "ok"; the first real change fails with a missing-handler error and the service is never restarted |
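
One way to avoid the global `become` footgun is to escalate per task and set ownership explicitly, as in this sketch. The `myapp` user, group, and config path are assumptions.

```yaml
- hosts: web
  tasks:
    - name: Install package (needs root)
      ansible.builtin.package:
        name: nginx
        state: present
      become: true

    # Root writes the file, but explicit ownership keeps it
    # readable by the application user.
    - name: Write app config owned by the app user
      ansible.builtin.template:
        src: app.conf.j2
        dest: /etc/myapp/app.conf
        owner: myapp
        group: myapp
        mode: "0640"
      become: true
```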
## 30-second answer
"Ansible is agentless, declarative automation for configuration management and orchestration. Its power is idempotent modules, inventory-driven targeting, and reusable roles. The things that actually matter in production are variable precedence discipline, blast-radius control with serial and --limit, validate before replacing configs, and keeping shell out of your YAML. For safe rollouts I use serial with max_fail_percentage as a circuit breaker, --check --diff before every prod run, and block/rescue for rollback logic."