
Quiz: Ansible


28 questions

L1 (8 questions)

1. What does idempotent mean in Ansible and why does it matter?

Show answer Idempotent = running the same playbook multiple times produces the same result. Ansible modules check current state before acting (e.g., package module won't reinstall if already present). Non-idempotent tasks (shell/command) should use creates/removes guards.
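
A minimal sketch of guarding a non-idempotent task (the script path and marker file are illustrative):

```yaml
- name: Initialize the database only once
  ansible.builtin.command: /usr/local/bin/init-db.sh
  args:
    creates: /var/lib/myapp/.initialized   # skipped when this path already exists
```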

2. What is the difference between Ansible inventory and playbook?

Show answer Inventory defines WHAT to manage (hosts, groups, variables). Playbook defines WHAT TO DO (tasks, roles, handlers). Inventory can be static (INI/YAML file) or dynamic (script/plugin that queries cloud APIs). Keep them separate for reusability.

3. What is the difference between shell and command modules in Ansible?

Show answer command runs a binary directly (no shell features). shell runs through /bin/sh (supports pipes, redirects, env vars). Prefer command for security (no shell injection). Use shell only when you need shell features. Both are non-idempotent — add creates/removes guards.
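
A short sketch of the difference (paths are illustrative); the pipe only works under shell, and the non-idempotent task gets a guard:

```yaml
- name: Count app processes (pipes and redirects need shell)
  ansible.builtin.shell: ps aux | grep -c myapp > /tmp/proc_count
  args:
    creates: /tmp/proc_count   # guard - skip once the file exists

- name: Direct invocation, no shell, no injection surface
  ansible.builtin.command: /usr/bin/uptime
```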

4. What are Ansible roles and why use them?

Show answer Roles are reusable units of automation with a standard directory structure (tasks, handlers, defaults, vars, templates, files). They organize complex playbooks into composable parts. Use roles for repeatable components (nginx setup, monitoring agent install). Share via Ansible Galaxy.

5. What is the difference between vars, defaults, and extra vars in Ansible?

Show answer defaults (roles/x/defaults/): lowest precedence, easily overridden. vars (roles/x/vars/, play vars): higher precedence. extra vars (-e): highest precedence, override everything. Simplified precedence (highest to lowest): extra vars > task vars > role vars > play vars > inventory vars > role defaults.
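
A quick way to see this: ship a role default, then override it at the command line (role and variable names are illustrative):

```yaml
# roles/myrole/defaults/main.yml - lowest precedence, meant to be overridden
app_port: 8080
```

An inventory or play var for app_port beats this default, and `ansible-playbook site.yml -e app_port=9090` beats everything.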

6. What is Ansible's connection model and how does it differ from agent-based tools?

Show answer Ansible is agentless — connects via SSH (Linux) or WinRM (Windows), pushes tasks, and disconnects. No agent to install, update, or monitor. Pros: simple, no persistent process. Cons: slower than agents for large fleets, requires SSH access. Compare with Puppet/Chef which use persistent agents.

7. What is an Ansible dynamic inventory and when do you use it?

Show answer Dynamic inventory queries external sources (AWS EC2, Azure, GCP, CMDB) at runtime to build the host list. Use when your infrastructure changes frequently (cloud, auto-scaling). Plugins: aws_ec2, azure_rm, gcp_compute. Avoids maintaining a static file that drifts from reality.
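
A minimal aws_ec2 inventory plugin config, assuming illustrative tags (the filename must end in aws_ec2.yml):

```yaml
# inventory/prod.aws_ec2.yml
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
filters:
  tag:env: prod          # only instances tagged env=prod
keyed_groups:
  - key: tags.role       # builds groups like role_web, role_db
    prefix: role
```

Point ansible-playbook at it with -i inventory/prod.aws_ec2.yml.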

8. What is Ansible's check mode and how is it useful?

Show answer check mode (--check) is a dry run — shows what would change without making changes. Modules report 'changed' or 'ok' status. Useful for verifying intent before running in production. Not all modules support check mode (shell/command don't). Combine with --diff to see file content changes.
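
Check-mode behavior can also be pinned per task; a sketch with illustrative paths:

```yaml
- name: Always dry-run this task, even without --check
  ansible.builtin.template:
    src: app.conf.j2
    dest: /etc/myapp/app.conf
  check_mode: true

- name: Always execute, even under --check (use sparingly)
  ansible.builtin.command: /usr/local/bin/collect-report.sh
  check_mode: false
```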

L2 (15 questions)

1. How do you test Ansible playbooks before running them in production?

Show answer 1. --check (dry run, shows what would change).
2. --diff (shows file content changes).
3. Molecule for local testing with Docker/Vagrant.
4. Run against a staging inventory first.
5. Use assert module to verify expected state after tasks.
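
A sketch of step 5, with hypothetical variable names:

```yaml
- name: Verify the service actually came up
  ansible.builtin.wait_for:
    port: 80
    timeout: 10

- name: Assert the expected version was deployed
  ansible.builtin.assert:
    that:
      - deployed_version == expected_version
    fail_msg: "got {{ deployed_version }}, wanted {{ expected_version }}"
```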

2. How do you handle secrets in Ansible?

Show answer Use ansible-vault to encrypt sensitive files or variables. ansible-vault encrypt secrets.yml, then reference with --ask-vault-pass or --vault-password-file at runtime. For dynamic secrets, integrate with HashiCorp Vault via the hashi_vault lookup plugin.
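
A minimal consuming play, with illustrative file names:

```yaml
# secrets.yml was encrypted beforehand with: ansible-vault encrypt secrets.yml
# Run with: ansible-playbook site.yml --ask-vault-pass
- hosts: app
  vars_files:
    - secrets.yml          # decrypted transparently at runtime
  tasks:
    - name: Render the env file without leaking values to logs
      ansible.builtin.template:
        src: app.env.j2
        dest: /etc/myapp/app.env
      no_log: true
```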

3. How do you debug a failing Ansible playbook?

Show answer 1. -v / -vvv / -vvvv for increasing verbosity.
2. --start-at-task to skip to the failing task.
3. --step to execute one task at a time.
4. debug module to print variables.
5. Register task results and display with debug.
6. Check the failed host's state and compare with task expected state.
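
Steps 4 and 5 combined, sketched with an illustrative service name:

```yaml
- name: Capture the service status for inspection
  ansible.builtin.command: systemctl status myapp
  register: svc_status
  ignore_errors: true

- name: Print what the failing host actually reported
  ansible.builtin.debug:
    var: svc_status.stdout_lines
```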

4. How do you handle failures gracefully in Ansible?

Show answer 1. ignore_errors: yes to continue on failure (use sparingly).
2. failed_when to customize failure conditions.
3. block/rescue/always for try-catch-finally patterns.
4. any_errors_fatal to stop all hosts on first failure.
5. max_fail_percentage to tolerate partial failures in rolling updates.
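
The block/rescue/always pattern from point 3, sketched with illustrative scripts:

```yaml
- block:
    - name: Attempt the risky upgrade
      ansible.builtin.command: /usr/local/bin/upgrade.sh
  rescue:
    - name: Roll back on failure
      ansible.builtin.command: /usr/local/bin/rollback.sh
  always:
    - name: Re-enable monitoring either way
      ansible.builtin.command: /usr/local/bin/enable-alerts.sh
```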

5. How do you optimize Ansible playbook performance for large inventories?

Show answer 1. Use SSH pipelining (pipelining = True).
2. Increase forks (default 5).
3. Use async tasks for long operations.
4. Gather only needed facts (gather_subset).
5. Use strategy: free (don't wait for slowest host).
6. Cache facts (fact_caching).
7. Use mitogen plugin for faster task execution.
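
Several of these knobs live in ansible.cfg; a sketch covering points 1, 2, and 6:

```ini
# ansible.cfg - performance settings for large inventories
[defaults]
forks = 50                                  # parallel connections (default 5)
fact_caching = jsonfile                     # cache facts between runs
fact_caching_connection = /tmp/ansible_facts

[ssh_connection]
pipelining = True                           # fewer SSH round-trips per task
```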

6. How do you implement rolling updates with Ansible?

Show answer Use the serial keyword to limit how many hosts are updated at once (serial: 2 or serial: '25%'). Combine with pre_tasks/post_tasks (health checks, LB drain). Add max_fail_percentage to stop if too many hosts fail. Pattern: drain from LB, update, health check, add back to LB.
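
The full pattern, sketched with hypothetical LB helper scripts:

```yaml
- hosts: web
  serial: "25%"              # update a quarter of the fleet at a time
  max_fail_percentage: 10
  pre_tasks:
    - name: Drain this host from the load balancer
      ansible.builtin.command: /usr/local/bin/lb-drain.sh {{ inventory_hostname }}
  roles:
    - app_deploy             # illustrative role name
  post_tasks:
    - name: Health check before re-adding
      ansible.builtin.uri:
        url: "http://{{ inventory_hostname }}:8080/health"
        status_code: 200
    - name: Re-add to the load balancer
      ansible.builtin.command: /usr/local/bin/lb-undrain.sh {{ inventory_hostname }}
```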

7. What is the difference between include_tasks and import_tasks, and when does the distinction bite you?

Show answer import_tasks is static — parsed at playbook load time. Tags, when conditions, and --list-tasks all work as expected. include_tasks is dynamic — evaluated at runtime. Tags applied to include_tasks only affect the include statement itself, NOT the tasks inside it (use apply keyword to propagate). It bites you when:
1. You tag an include_tasks and expect inner tasks to inherit the tag.
2. You use --list-tasks and dynamic includes are invisible.
3. You use notify on a handler inside an include — the handler name must be fully resolved at parse time for imports but can be dynamic for includes.
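
The apply workaround for pitfall 1, sketched:

```yaml
# Tags on include_tasks alone do not reach the inner tasks; apply propagates them.
- name: Include setup tasks with tags that propagate
  ansible.builtin.include_tasks:
    file: setup.yml
    apply:
      tags:
        - setup
  tags:
    - setup   # still needed so the include statement itself matches --tags setup
```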

8. How does Ansible fact caching work, and what are the trade-offs of each backend?

Show answer Fact caching stores gathered facts between runs so subsequent plays skip gather_facts. Backends:
1. jsonfile — simple, writes per-host JSON to a directory. Fast reads, but stale if hosts change.
2. redis — shared across control nodes, supports TTL, good for multi-user environments.
3. memcached — similar to redis but no persistence.
Configure in ansible.cfg: fact_caching = redis, fact_caching_timeout = 86400. Trade-offs: caching speeds up large inventories dramatically but risks acting on stale facts (e.g., a disk was replaced but facts still show the old device). Always set a reasonable TTL and use gather_facts: yes on plays that need current data.
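
Assembled as an ansible.cfg fragment (redis backend with a 24-hour TTL):

```ini
# ansible.cfg - redis-backed fact cache
[defaults]
gathering = smart                           # skip gathering when cached facts exist
fact_caching = redis
fact_caching_connection = localhost:6379:0  # host:port:db
fact_caching_timeout = 86400                # seconds; stale facts expire after a day
```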

9. What is the Ansible callback plugin system and how do you use it operationally?

Show answer Callback plugins hook into playbook events (task start, task failure, play recap) to customize output or send notifications. Built-in callbacks:
1. default — standard output.
2. json — machine-readable output for CI.
3. profile_tasks — adds per-task timing (essential for performance tuning).
4. timer — total playbook duration.
5. mail — sends email on failure.
Enable via ansible.cfg: callbacks_enabled = profile_tasks, timer (callback_whitelist in older releases). Operationally: use profile_tasks to identify slow tasks, the json callback for pipeline integration, and custom callbacks to push run results to Slack or PagerDuty. Write custom callbacks by subclassing CallbackBase in a callback_plugins/ directory.
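
The enabling line lives in ansible.cfg; a sketch (in recent releases profile_tasks ships in the ansible.posix collection):

```ini
# ansible.cfg - enable timing callbacks for performance tuning
[defaults]
callbacks_enabled = ansible.posix.profile_tasks, timer
```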

10. Explain how Ansible connection plugins work and when you would use something other than SSH.

Show answer Connection plugins define how Ansible talks to managed nodes.
1. ssh (default) — OpenSSH, supports ControlPersist for connection reuse.
2. paramiko — pure Python SSH, fallback for old SSH versions.
3. local — runs on the control node itself (for localhost tasks).
4. docker — connects to Docker containers via docker exec.
5. kubectl — connects to Kubernetes pods.
6. winrm — for Windows hosts.
7. network_cli — for network devices (Cisco, Juniper).
8. httpapi — REST API-based for modern network gear.
Set per-host with the ansible_connection variable or per-play with the connection keyword. Common mistake: using ssh to manage containers when the docker connection is faster and does not require sshd in the container.
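
Connection choice is just an inventory variable; a sketch in YAML inventory form (host names are illustrative):

```yaml
all:
  hosts:
    app1.example.com: {}                           # default: ssh
    build-container:
      ansible_connection: community.docker.docker  # docker exec, no sshd needed
    localhost:
      ansible_connection: local                    # run on the control node
```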

11. How does Ansible Automation Platform (AAP) differ from running ansible-playbook on a jump box, and when is AAP justified?

Show answer AAP (formerly Tower/AWX) adds:
1. RBAC — fine-grained access control (who can run what against which inventory).
2. Credential management — stores SSH keys, cloud creds, vault passwords without exposing them to operators.
3. Job scheduling and workflows — chain playbooks with conditional branching.
4. Audit trail — every job run is logged with who triggered it, what changed, and the full output.
5. API-driven — integrate with ticketing systems, ChatOps, CI/CD.
6. Inventory sync — auto-import from cloud providers on a schedule.
Justified when: >3 operators share automation, compliance requires audit logging, or you need self-service automation for non-Ansible users. Overkill for: a single admin, a small fleet, or CI-only automation where the pipeline already provides RBAC and logging.

12. How do you test Ansible roles with Molecule, and what does a typical test matrix look like?

Show answer Molecule provides create/converge/verify/destroy lifecycle for role testing. Typical setup:
1. molecule init role myrole --driver-name docker.
2. molecule/default/molecule.yml defines platforms (e.g., Ubuntu 22.04, RHEL 9 containers).
3. converge.yml applies the role.
4. verify.yml runs assertions (Ansible assert module or testinfra Python tests).
Test matrix: multiple OS families × role parameters (e.g., with/without TLS, different package versions). In CI, run molecule test (full lifecycle) per matrix entry. Common pitfalls:
1. Docker containers lack systemd — use geerlingguy/docker-* images that include it.
2. Molecule caches images — stale base images cause flaky tests.
3. Side effects between scenarios if not using molecule destroy between them.
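
A minimal molecule.yml for a two-OS matrix (image names are illustrative; the geerlingguy images include systemd, addressing pitfall 1):

```yaml
# molecule/default/molecule.yml
driver:
  name: docker
platforms:
  - name: ubuntu2204
    image: geerlingguy/docker-ubuntu2204-ansible
    pre_build_image: true
  - name: rockylinux9
    image: geerlingguy/docker-rockylinux9-ansible
    pre_build_image: true
provisioner:
  name: ansible
```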

13. What is ansible-pull and when would you use it instead of the standard push model?

Show answer ansible-pull inverts the model — each managed node pulls its playbook from a git repo and runs it locally. Use cron or systemd timer to schedule pulls. Advantages:
1. Scales to thousands of nodes without SSH fan-out bottleneck.
2. New nodes self-configure on first boot (cloud-init runs ansible-pull).
3. No control node infrastructure needed.
Disadvantages:
1. Every node needs git and ansible installed.
2. No central run visibility (each node logs locally).
3. Harder to orchestrate cross-host tasks (rolling updates, drain/undrain).
Best for: large, homogeneous fleets (web servers, build agents) where each node converges independently. Bad for: complex orchestration, heterogeneous environments, or environments that need centralized audit logging.
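
A sketch of scheduling the pull with the cron module (repo URL and playbook name are illustrative):

```yaml
- name: Converge every 30 minutes via ansible-pull
  ansible.builtin.cron:
    name: ansible-pull
    minute: "*/30"
    job: "ansible-pull -o -U https://git.example.com/config.git local.yml >> /var/log/ansible-pull.log 2>&1"
```

The -o flag runs the playbook only when the repository has changed.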

14. How do you handle multi-tier application deployments with Ansible where order matters across host groups?

Show answer Use multiple plays in a single playbook, each targeting a different group. Example: Play 1 targets db servers (migrate schema), Play 2 targets app servers with serial (rolling deploy), Play 3 targets LB (update backend pool). Within each play, pre_tasks/roles/post_tasks control order. For complex dependencies:
1. Use delegate_to to run a task on a different host from within a play.
2. Use run_once: true for tasks that should execute on only one host (e.g., DB migration).
3. Use wait_for to block until a service on another tier is ready.
4. Use meta: refresh_inventory if hosts change mid-playbook (e.g., after scaling).
Anti-pattern: putting everything in one play with when conditions — this is hard to read and error-prone.
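
The multi-play skeleton, with illustrative group names and scripts:

```yaml
# site.yml - play order enforces tier order
- hosts: db
  tasks:
    - name: Run the schema migration on exactly one host
      ansible.builtin.command: /usr/local/bin/migrate.sh
      run_once: true

- hosts: app
  serial: 2                      # rolling deploy
  tasks:
    - name: Wait for the DB tier before deploying
      ansible.builtin.wait_for:
        host: "{{ groups['db'][0] }}"
        port: 5432
    - name: Deploy the application
      ansible.builtin.command: /usr/local/bin/deploy.sh

- hosts: lb
  tasks:
    - name: Update the backend pool
      ansible.builtin.command: /usr/local/bin/update-pool.sh
```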

15. What are Ansible inventory plugins and how do they differ from dynamic inventory scripts?

Show answer Inventory plugins are the modern replacement for dynamic inventory scripts. Key differences:
1. Plugins are configured via YAML files (e.g., aws_ec2.yml) instead of executable scripts with CLI args.
2. Plugins support caching natively (cache: true, cache_plugin: jsonfile).
3. Plugins can be composed — use multiple plugins in a single inventory directory.
4. Plugins integrate with the collection system (amazon.aws.aws_ec2 vs a standalone Python script).
5. Scripts return JSON on stdout; plugins use the Ansible inventory API.
Migrating: old scripts still work via the script inventory plugin, but new projects should use native plugins. Common plugins: aws_ec2, azure_rm, gcp_compute, vmware_vm_inventory, kubernetes.core.k8s. Write custom plugins by subclassing BaseInventoryPlugin.
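
Native caching (point 2) is a few lines in the plugin config; a sketch:

```yaml
# prod.aws_ec2.yml - cached dynamic inventory, no wrapper script required
plugin: amazon.aws.aws_ec2
regions:
  - eu-west-1
cache: true
cache_plugin: jsonfile
cache_connection: /tmp/aws_inventory_cache
cache_timeout: 3600              # seconds before the cloud API is queried again
```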

L3 (5 questions)

1. How do you write a custom Ansible module, and when is it worth the effort vs. using shell/command?

Show answer Write a Python script under library/ in your role or playbook directory. The module receives its arguments as JSON (parsed for you by the AnsibleModule helper), performs work, and returns JSON with changed, failed, and msg keys. Use the AnsibleModule class from ansible.module_utils.basic for argument parsing, check_mode support, and idempotent diff reporting. Worth it when:
1. You call the same shell commands in 5+ playbooks — wrap them in a module for idempotency and error handling.
2. You need check_mode and diff support (shell/command cannot provide this).
3. You interact with an API — a module gives clean parameters instead of curl commands.
Not worth it for one-off tasks or when an existing module covers 90% of the need.
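
A stripped-down sketch of the JSON-in/JSON-out contract, with illustrative parameter names. Real modules should use AnsibleModule from ansible.module_utils.basic rather than hand-rolling this:

```python
#!/usr/bin/env python3
# Hypothetical minimal module: idempotently ensure a file exists.
import json
import os
import sys


def run(params):
    """Do the work and report 'changed' honestly."""
    path = params["path"]
    if os.path.exists(path):
        return {"changed": False, "msg": "already present"}
    if params.get("_ansible_check_mode"):  # honor --check: report, do not act
        return {"changed": True, "msg": "would create"}
    open(path, "w").close()
    return {"changed": True, "msg": "created"}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Old-style modules receive a JSON args file as the first argument
    with open(sys.argv[1]) as f:
        result = run(json.load(f))
    print(json.dumps(result))
```

Rerunning against the same host reports changed: false, which is exactly the idempotency that shell/command cannot give you for free.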

2. How does Ansible's strategy plugin system work, and when would you use strategy other than linear?

Show answer The strategy controls how tasks are dispatched across hosts.
1. linear (default) — runs each task on all hosts before moving to the next task. Simple, predictable, but slow (waits for slowest host).
2. free — each host proceeds independently through the task list as fast as it can. Faster overall but harder to debug (output is interleaved).
3. host_pinned — like free, but each host runs the complete play on a dedicated worker; new hosts start only as workers free up.
4. debug — interactive debugger on failure.
Use free for large fleets where hosts have variable task durations (e.g., package installs that are cached on some hosts). Avoid free when task ordering across hosts matters (e.g., rolling DB migrations). You can write custom strategy plugins for complex orchestration like canary deploys with feedback loops.
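
Switching strategy is a one-line play keyword; a minimal sketch:

```yaml
- hosts: all
  strategy: free        # each host races ahead independently
  tasks:
    - name: Install nginx (hosts with a warm package cache finish early)
      ansible.builtin.package:
        name: nginx
        state: present
```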

3. How do you implement Ansible content collections, and what problems do they solve compared to the pre-collections era?

Show answer Collections bundle modules, plugins, roles, and playbooks into a namespace (e.g., community.general, amazon.aws). They solve:
1. Dependency hell — pin collection versions independently of ansible-core.
2. Release velocity — collections release on their own schedule, not tied to ansible-core releases.
3. Namespace collisions — FQCN (e.g., ansible.builtin.copy vs a same-named module in a third-party collection) prevents ambiguity.
Create a collection: ansible-galaxy collection init mynamespace.mycollection. Structure: plugins/modules/, plugins/inventory/, roles/, playbooks/. Publish to Galaxy or a private Automation Hub. In CI: pin collections in requirements.yml with version constraints. Operationally, always use FQCN in production playbooks to avoid breakage when the default module resolution changes between ansible-core versions.
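
Pinning in CI, sketched (versions are illustrative):

```yaml
# requirements.yml - install with: ansible-galaxy collection install -r requirements.yml
collections:
  - name: amazon.aws
    version: ">=6.0.0,<7.0.0"   # allow minor/patch updates within a major
  - name: community.general
    version: "8.2.0"            # exact pin for full reproducibility
```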

4. What are Ansible filter plugins and lookup plugins, and how do they differ in execution context?

Show answer Filter plugins transform data in Jinja2 expressions (e.g., {{ mylist | map("upper") | list }}). They run on the control node during template rendering. Built-in filters include ipaddr, regex_search, combine (merge dicts), to_json, from_yaml. Lookup plugins fetch data from external sources (e.g., {{ lookup("file", "/etc/hostname") }}, {{ lookup("env", "HOME") }}, {{ lookup("hashi_vault", "secret/data/app") }}). Key difference: lookups always run on the CONTROL node (not the managed host), while filters process data already in memory. This means lookup("file") reads files from the control node, not the target. To read files on the target, use the slurp module. Custom filter plugins go in filter_plugins/ directory; custom lookups in lookup_plugins/.
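
A minimal custom filter plugin, saved as filter_plugins/myfilters.py next to the playbook (the filter name and data shape are illustrative):

```python
# Filters run on the CONTROL node during Jinja2 template rendering.

def to_ports(services):
    """Extract port numbers from a list of service dicts."""
    return [s["port"] for s in services]


class FilterModule(object):
    """Ansible discovers custom filters through this class's filters() mapping."""

    def filters(self):
        return {"to_ports": to_ports}
```

In a template it is used like any built-in filter: {{ services | to_ports }}.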

5. How does Ansible's execution environment (EE) work and why was it introduced?

Show answer Execution Environments are container images that bundle ansible-core, Python dependencies, collections, and system libraries into a reproducible runtime. Built with ansible-builder from an execution-environment.yml that specifies:
1. base image (EE-minimal or EE-supported).
2. Galaxy requirements (collections).
3. Python requirements (pip).
4. System packages (bindep).
Introduced because: dependency conflicts were the #1 support issue — different collections needing different Python library versions on the same control node. EEs provide:
1. Reproducibility across teams and CI.
2. No more "works on my laptop" — same container runs everywhere.
3. Clean separation between ansible-core and collection dependencies.
ansible-navigator (a replacement for ansible-playbook) runs playbooks inside EEs by default. In AAP, every job template references an EE, ensuring consistent execution.
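
A sketch of the build definition consumed by ansible-builder (schema version 3; contents are illustrative):

```yaml
# execution-environment.yml
version: 3
images:
  base_image:
    name: quay.io/ansible/ansible-runner:latest
dependencies:
  galaxy:
    collections:
      - amazon.aws               # Galaxy requirements
  python:
    - boto3                      # pip requirements
  system:
    - git [platform:rpm]         # bindep-format system packages
```

Build with ansible-builder build -t my-ee:1.0 and point ansible-navigator at the resulting image.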