Ansible Footguns¶

Mistakes that brick servers, corrupt configs, or make your playbooks unreliable.

1. Running against `all` when you meant one host¶

You type ansible-playbook site.yml without -l staging. It runs against every host in inventory — including production. Your half-tested change is now on 200 servers.

Fix: Always use --limit or --check first. Add hosts: "{{ target }}" in playbooks and require the variable: ansible-playbook site.yml -e target=staging.

2. Using `shell` or `command` for everything¶

You write shell: apt-get install nginx instead of apt: name=nginx state=present. It runs every time, isn't idempotent, and won't report "changed" correctly. Your playbook takes 20 minutes because it reinstalls packages every run.

Fix: Use native modules. They're idempotent by design. Only use shell/command when there's no module for what you need, and add creates: or when: to make it idempotent.

3. `become: true` at the playbook level¶

You set become: true globally because one task needs root. Now every task runs as root, including file copies that should be owned by the app user. Files end up owned by root and your app can't read its own config.

Fix: Set become: true on individual tasks that need it, not the whole playbook.

4. Variable precedence surprises¶

You define app_port: 8080 in defaults/main.yml, group_vars/all.yml, and -e app_port=9090. Which one wins? Extra vars win. But the developer who set it in defaults doesn't know someone also set it in group_vars, and they're debugging why their change doesn't take effect.

Fix: Know the precedence order. Keep variables in one place per scope. Document where variables are set. Use ansible -m debug -a "var=app_port" hostname to check resolved values.

5. `notify` + `handlers` not running¶

You change a config file and expect the handler to restart the service. But you also have a failed_when on a later task that triggers. Handlers don't run if the play fails. Your config changed but the service is still running the old config.

Fix: Understand that handlers run at the end of the play, not after the task. Use meta: flush_handlers if you need them to run immediately. Don't rely on handlers for critical state changes.

6. `lineinfile` fighting with itself¶

You use lineinfile to manage /etc/hosts. Two tasks add different lines matching the same regex. They fight each other — one removes what the other adds. Every run shows changed.

Fix: Use blockinfile for multi-line content. Use template for files you fully manage. lineinfile is for surgical one-line changes, not managing whole files.

7. No `--check` before `--diff`¶

You run the playbook on production without --check first. A template had a typo that renders a broken config. The service restarts with the broken config and goes down.

Fix: Always run --check --diff first. Review the diffs. Then run for real.

8. Vault password in shell history¶

You type ansible-vault encrypt_string 'mypassword' and the password is in your bash history. Or you store the vault password in a plain text file on your laptop.

Fix: Use --ask-vault-pass or a vault password file with restrictive permissions (chmod 600). Use a password manager integration.

9. Forgetting `no_log: true` on sensitive tasks¶

Your playbook prints the database password in stdout when it runs. CI logs capture it. Anyone with CI access can read your production credentials.

Fix: Add no_log: true to tasks that handle secrets. Review CI output for leaked credentials.

10. Testing against production inventory by default¶

Your default inventory file points to production. New team members run ansible-playbook site.yml and hit prod. Your "dev" playbook run just modified production servers.

Fix: Don't set a default inventory that points to production. Require explicit -i inventory/staging.yml. Use different inventory files per environment with clear names.

11. Ignoring errors globally¶

You add ignore_errors: true because a task fails intermittently. Now every error in that task is silently swallowed. Six months later the disk is full because the cleanup task has been failing silently.

Fix: Use failed_when with specific conditions instead of blanket ignore_errors. Use register + when patterns to handle expected failures gracefully.

Ansible Footguns¶

1. Running against all when you meant one host¶

2. Using shell or command for everything¶

3. become: true at the playbook level¶

4. Variable precedence surprises¶

5. notify + handlers not running¶

6. lineinfile fighting with itself¶

7. No --check before --diff¶