Anti-Primer: Tar And Compression
Everything that can go wrong, will — and in this story, it does.
The Setup
A sysadmin is performing a critical tar and compression task on a production Linux server at 11 PM. The server hosts a customer-facing application with a strict SLA. The task was supposed to be routine, but conditions on the server are not what the runbook assumed.
The Timeline
Hour 0: Running as Root Unnecessarily
The engineer SSHs in directly as root instead of using sudo for specific commands. The deadline was looming, and this seemed like the fastest path forward. But the shortcut means a single typo in a command path damages system files that a regular user could not have touched.
Footgun #1: Running as Root Unnecessarily. Logging in directly as root instead of using sudo for specific commands means a typo in a command path can damage system files that a regular user could not have touched.
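As a minimal sketch, assuming the maintenance steps are wrapped in a Python script, a guard can refuse to run with effective UID 0 and elevate only the individual commands that genuinely need root. The helper names `require_unprivileged` and `privileged_cmd` are hypothetical, not part of any tool the primer describes.

```python
import os


def require_unprivileged(euid=None):
    """Raise PermissionError if running as root (effective UID 0).

    The optional euid argument exists only to make the check testable;
    normally it is read from os.geteuid() (Unix only).
    """
    if euid is None:
        euid = os.geteuid()
    if euid == 0:
        raise PermissionError(
            "refusing to run as root; use sudo for the specific commands that need it"
        )


def privileged_cmd(*argv):
    """Build an argv list that elevates exactly one command via sudo.

    "--" tells sudo to stop parsing its own options, so the wrapped
    command's arguments are passed through untouched.
    """
    if not argv:
        raise ValueError("no command given")
    return ["sudo", "--", *argv]
```

A script would call `require_unprivileged()` at startup and pass `privileged_cmd("systemctl", "reload", "nginx")` (a hypothetical service) to `subprocess.run(..., check=True)` for the one step that needs root. Every other step runs as the login user, so a mistyped path there fails with a permission error instead of touching system files.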
Nobody notices yet. The engineer moves on to the next task.
Hour 1: No Backup Before Change
The engineer modifies a configuration file in place without creating a backup copy first. Under time pressure, the team chose speed over caution. But the new configuration is wrong, the original content is lost, and it has to be reconstructed by hand.
Footgun #2: No Backup Before Change. Modifying a configuration file in place without first creating a backup copy means that when the new configuration is wrong, the original content is lost and must be reconstructed manually.
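A minimal Python sketch of the backup-first habit, assuming the edit is scripted: copy the file to a timestamped `.bak` next to it (a variation on the primer's `cp file file.bak`, chosen so repeated edits do not overwrite the previous backup), then write the new content. `backup_then_write` and `restore` are hypothetical helper names.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_then_write(path, new_text):
    """Back up path, then replace its contents with new_text.

    Returns the backup path so the caller can roll back.
    """
    src = Path(path)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup = src.with_name(f"{src.name}.{stamp}.bak")
    if backup.exists():
        # Never clobber an earlier backup taken in the same second.
        raise FileExistsError(f"backup {backup} already exists")
    # copy2 keeps permissions and timestamps along with the content.
    shutil.copy2(src, backup)
    src.write_text(new_text)
    return backup


def restore(path, backup):
    """Roll back by copying the backup over the edited file."""
    shutil.copy2(backup, path)
```

If the new configuration turns out to be wrong, `restore(path, backup)` puts the original back in one step instead of a manual reconstruction.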
The first mistake is still invisible, making the next shortcut feel justified.
Hour 2: Ignoring Disk Space
The engineer does not check available disk space before a large operation. Nobody pushed back because the shortcut looked harmless in the moment. But the operation fills the filesystem to 100%: logs stop writing, and the application crashes.
Footgun #3: Ignoring Disk Space. Skipping a free-space check before a large operation lets it fill the filesystem to 100%, at which point logs stop writing and the application crashes.
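The free-space check can be sketched in Python with `shutil.disk_usage`. As an illustrative assumption, the large operation here is writing a tar archive of a directory. The sketch treats the uncompressed size of the source tree plus a safety margin as a conservative bound on the archive size: compression usually shrinks it, but already-compressed data may not shrink, and tar adds per-file headers. `tree_size`, `check_space`, and the 10% default margin are hypothetical choices.

```python
import os
import shutil


def tree_size(root):
    """Sum the sizes of regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def check_space(source_dir, dest_dir, margin_pct=10):
    """Raise if dest_dir lacks room for source_dir's data plus a margin.

    Returns (required_bytes, free_bytes) when the operation looks safe.
    """
    needed = tree_size(source_dir)
    # Integer arithmetic avoids float rounding in the margin.
    required = needed + needed * margin_pct // 100
    free = shutil.disk_usage(dest_dir).free
    if required > free:
        raise OSError(
            f"need about {required} bytes on {dest_dir}, only {free} free"
        )
    return required, free
```

Calling `check_space("/srv/app/data", "/backup")` (hypothetical paths) before starting the archive turns a full filesystem into an up-front error rather than a crashed application.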
Pressure is mounting. The team is behind schedule and cutting more corners.
Hour 3: Wrong Target in Destructive Command
The engineer runs a destructive command (`rm`, `dd`, `mkfs`) on the wrong device or path because of a typo. The team had gotten away with similar shortcuts before, so nobody raised a flag. But production data is destroyed, and recovery requires restoring from backup (if one exists).
Footgun #4: Wrong Target in Destructive Command. A typo in the device or path passed to a destructive command (`rm`, `dd`, `mkfs`) destroys production data, and recovery requires restoring from backup (if one exists).
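One way to guard a destructive step, sketched here in Python, is to make it dry-run by default and to refuse any target that does not resolve to a path strictly inside a directory you actually meant to touch. `safe_delete` and its `allowed_root` argument are hypothetical. Resolving the path first means `..` components and symlinks pointing outside the allowed directory are rejected before anything is deleted.

```python
import shutil
from pathlib import Path


def safe_delete(target, allowed_root, dry_run=True):
    """Delete target only if it resolves strictly inside allowed_root.

    With dry_run=True (the default) nothing is removed; the function only
    reports what it would delete. Returns True if something was deleted.
    """
    root = Path(allowed_root).resolve()
    # resolve() collapses ".." and follows symlinks, so both the check
    # and the delete apply to the real location.
    path = Path(target).resolve()
    # A path is never its own parent, so allowed_root itself is refused too.
    if root not in path.parents:
        raise ValueError(f"refusing to delete {path}: not inside {root}")
    if not path.exists():
        raise FileNotFoundError(path)
    if dry_run:
        print(f"[dry-run] would delete {path}")
        return False
    if path.is_dir():
        shutil.rmtree(path)
    else:
        path.unlink()
    return True
```

A caller would run it once with the default `dry_run=True`, read the printed path, and only then repeat the call with `dry_run=False`.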
By hour 3, the compounding failures have reached critical mass. Pages fire. The war room fills up. The team scrambles to understand what went wrong while the system burns.
The Postmortem
Root Cause Chain
| # | Mistake | Consequence | Could Have Been Prevented By |
|---|---|---|---|
| 1 | Running as Root Unnecessarily | A typo in a command path causes unintended damage to system files that a regular user could not have touched | Primer: Use sudo for specific commands; disable direct root SSH login |
| 2 | No Backup Before Change | The new configuration is wrong; the original content is lost; manual reconstruction required | Primer: Always `cp file file.bak` before editing; use version control for config files |
| 3 | Ignoring Disk Space | Operation fills the filesystem to 100%; logs stop writing; the application crashes | Primer: Check `df -h` before any operation that writes significant data |
| 4 | Wrong Target in Destructive Command | Production data is destroyed; recovery requires restoring from backup (if one exists) | Primer: Double-check targets for destructive commands; use `--dry-run` where available |
Damage Report
- Downtime: 1-3 hours of server or service unavailability
- Data loss: Risk of filesystem corruption or configuration loss
- Customer impact: Service errors if the affected server hosts customer-facing workloads
- Engineering time to remediate: 6-12 engineer-hours for diagnosis, repair, and verification
- Reputation cost: Ops team confidence shaken; runbook updates required
What the Primer Teaches
- Footgun #1: Had the engineer read the primer's "Running as Root Unnecessarily" section, they would have learned to use sudo for specific commands and to disable direct root SSH login.
- Footgun #2: Had the engineer read the primer's "No Backup Before Change" section, they would have learned to always run `cp file file.bak` before editing and to keep config files under version control.
- Footgun #3: Had the engineer read the primer's "Ignoring Disk Space" section, they would have learned to check `df -h` before any operation that writes significant data.
- Footgun #4: Had the engineer read the primer's "Wrong Target in Destructive Command" section, they would have learned to double-check targets for destructive commands and to use `--dry-run` where available.
Cross-References
- Primer — The right way
- Footguns — The mistakes catalogued
- Street Ops — How to do it in practice