Quiz: Fleet Operations

4 questions

L0 (1 question)

1. What is the difference between treating servers as 'pets' vs 'cattle'?

Pets are unique, hand-configured servers that are repaired when sick (e.g., db-master-01). Cattle are identical, automated servers that are replaced when sick (e.g., web-042). Fleet operations requires the cattle mindset: every server in a role should be interchangeable.

L1 (1 question)

1. Why should fleet changes use a rolling strategy with a canary batch instead of deploying to all servers at once?

Deploying to all servers at once means a bad change can take down the entire fleet. A rolling strategy (e.g., 1 canary -> 1% -> 10% -> remaining) limits the blast radius: if the canary fails, you stop, and only 1 server is affected. Between batches, you validate the change using health checks, error rates, and latency metrics before continuing.
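The batching logic described above can be sketched in a few lines of Python. This is a minimal illustration, not a deployment tool: `rolling_batches`, `roll_out`, `apply_change`, and `is_healthy` are hypothetical names, and the stage percentages (1% and 10%) are taken from the example in the answer.

```python
def rolling_batches(hosts, stage_percents=(1, 10)):
    """Split hosts into: 1 canary, one batch per percentage stage, then the rest."""
    batches = [hosts[:1]]  # the canary batch is a single host
    done = 1
    for pct in stage_percents:
        if done >= len(hosts):
            break
        # Round down, but always move at least one host per stage.
        size = max(1, len(hosts) * pct // 100)
        batches.append(hosts[done:done + size])
        done += size
    if done < len(hosts):
        batches.append(hosts[done:])  # everything that is left
    return [b for b in batches if b]


def roll_out(hosts, apply_change, is_healthy):
    """Apply a change batch by batch and stop at the first unhealthy batch.

    apply_change and is_healthy are caller-supplied callables (assumptions
    for this sketch); is_healthy stands in for health checks, error rates,
    and latency metrics.
    """
    updated = []
    for batch in rolling_batches(hosts):
        for host in batch:
            apply_change(host)
        updated.extend(batch)
        if not all(is_healthy(h) for h in batch):
            return False, updated  # abort: later batches are never touched
    return True, updated
```

With a fleet of 500 hosts this yields batches of 1, 5, 50, and 444 hosts. If the canary is unhealthy, `roll_out` returns after touching exactly one server.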

L2 (1 question)

1. How would you detect configuration drift across a fleet of 500 servers?

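Run commands in parallel across the fleet and compare the actual state. For package versions, 'ansible webservers -f 50 -m command -a "rpm -q nginx"' gathers the installed version from every host; drop the per-host status lines from the output (for example with grep '^nginx') and pipe the rest through sort | uniq -c to see the version distribution. For config files, collect checksums with 'ansible webservers -f 50 -m stat -a "path=/etc/nginx/nginx.conf"' and compare the checksum field across hosts. Any host whose version or checksum differs from the expected value has drifted. Automate this as a scheduled check that alerts on unexpected differences.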
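Once the per-host values are collected, the comparison itself is simple. The sketch below assumes the versions or checksums have already been parsed into a dict keyed by hostname; `find_drift` is a hypothetical helper, and falling back to the most common value as the baseline is an assumption of this example (in practice the expected value should come from your source of truth).

```python
from collections import Counter


def find_drift(observed, expected=None):
    """Return (baseline, drifted) for a mapping of host -> version or checksum.

    If expected is not given, the most common observed value is used as the
    baseline. That fallback is an assumption of this sketch, not a rule.
    """
    if not observed:
        return expected, {}
    if expected is None:
        expected = Counter(observed.values()).most_common(1)[0][0]
    drifted = {host: value for host, value in observed.items() if value != expected}
    return expected, drifted


# Hypothetical sample data: package versions reported by three hosts.
observed = {
    "web-001": "nginx-1.24.0",
    "web-002": "nginx-1.24.0",
    "web-003": "nginx-1.22.1",
}
baseline, drifted = find_drift(observed)
for host, value in sorted(drifted.items()):
    print(f"DRIFT {host}: {value} (expected {baseline})")
```

A scheduled job could run the collection step, feed the results into a check like this, and alert whenever `drifted` is non-empty.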

L3 (1 question)

1. Design the Ansible playbook strategy for a rolling update of 1,500 servers with automatic abort if too many fail.

Use serial: [1, '5%', '25%'] to start with a canary of 1 host, continue with a 5% batch, and then repeat 25% batches until the fleet is done. Set max_fail_percentage: 2 so the play aborts when more than 2% of the hosts in any batch fail. Add pre_tasks that drain each host from the load balancer and post_tasks that validate its health (using retries and a delay) before re-adding it. The combination of graduated batch sizes and a failure threshold provides both canary testing and automatic abort.
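To sanity-check what these settings mean for 1,500 hosts, the arithmetic can be modelled in Python. This is a rough model, not Ansible's own code: `serial_batches` and `should_abort` are hypothetical helpers, and the sketch assumes percentages are rounded down (with a minimum of one host), that the last `serial` entry repeats until no hosts remain, and that the abort threshold must be exceeded rather than equaled.

```python
def serial_batches(total_hosts, serial):
    """Expand a serial list such as [1, '5%', '25%'] into concrete batch sizes."""
    sizes = []
    remaining = total_hosts
    index = 0
    while remaining > 0:
        # The last entry is reused for every batch after the list runs out.
        spec = serial[min(index, len(serial) - 1)]
        if isinstance(spec, str) and spec.endswith("%"):
            # Assumption: round down, but never below one host.
            size = max(1, int(total_hosts * float(spec[:-1]) / 100))
        else:
            size = int(spec)
        if size <= 0:
            size = remaining  # guard for this sketch: treat 0 as "all remaining hosts"
        size = min(size, remaining)
        sizes.append(size)
        remaining -= size
        index += 1
    return sizes


def should_abort(failed, batch_size, max_fail_percentage):
    """True when the failure rate in a batch exceeds (not equals) the threshold."""
    return failed / batch_size * 100 > max_fail_percentage


print(serial_batches(1500, [1, "5%", "25%"]))
```

Under these assumptions the fleet is processed in batches of 1, 75, 375, 375, 375, and 299 hosts. In the 75-host batch a single failure (about 1.3%) is tolerated, while two failures (about 2.7%) stop the rollout; in the canary batch any failure is 100% and aborts immediately.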