Anti-Primer: Packer
Everything that can go wrong, will — and in this story, it does.
The Setup
A team is managing infrastructure with Packer for the first time. They are migrating from manual provisioning and need the new setup production-ready by next month. Two engineers are working on different modules simultaneously.
The Timeline
Hour 0: No State Management Plan
The team stores state locally or in version control without locking. The deadline was looming, and this seemed like the fastest path forward. The result: two engineers apply simultaneously, the state is corrupted, and resources are orphaned.
Footgun #1: No State Management Plan. Storing state locally or in version control without locking lets two engineers apply simultaneously, which corrupts the state and orphans resources.
Nobody notices yet. The engineer moves on to the next task.
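To make the missing safeguard concrete, here is a minimal, tool-agnostic sketch of what a state lock does. It is not how any particular backend implements locking; the names `state_lock`, `StateLockedError`, and `demo` are hypothetical, and a local lock file stands in for a remote backend's lock.

```python
import os
import tempfile
from contextlib import contextmanager


class StateLockedError(RuntimeError):
    """Raised when another apply already holds the state lock."""


@contextmanager
def state_lock(lock_path):
    # O_CREAT | O_EXCL makes creation atomic: it fails if the file exists,
    # so only one caller can hold the lock at a time.
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise StateLockedError(f"state is locked: {lock_path}") from None
    try:
        # Record the holder's PID so a stuck lock can be traced.
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def demo():
    """Simulate a second apply starting while the first still holds the lock."""
    lock_path = os.path.join(tempfile.mkdtemp(), "state.lock")
    outcomes = []
    with state_lock(lock_path):
        outcomes.append("first apply acquired lock")
        try:
            with state_lock(lock_path):
                outcomes.append("second apply acquired lock")
        except StateLockedError:
            outcomes.append("second apply rejected")
    # Once the first apply releases the lock, a later apply can proceed.
    with state_lock(lock_path):
        outcomes.append("third apply acquired lock")
    return outcomes
```

With a lock in place, the second engineer's apply fails fast instead of silently overwriting state. A file on one machine only protects that machine, which is why the primer's advice is a remote backend with locking.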
Hour 1: Hard-Coded Values Everywhere
The team hard-codes environment-specific values instead of using variables and per-environment configuration. Under time pressure, they chose speed over caution. The result: a deploy to staging accidentally provisions resources with production-sized (and production-priced) settings.
Footgun #2: Hard-Coded Values Everywhere. Hard-coding environment-specific values instead of using variables means a deploy to staging provisions resources with production-sized (and production-priced) settings.
The first mistake is still invisible, making the next shortcut feel justified.
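The alternative to hard-coding can be sketched in a few lines. This is an illustration only: `EnvSettings`, `SETTINGS`, `settings_for`, and the sizes and counts are hypothetical, and in a real setup the values would live in separate variable files per environment rather than in one module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvSettings:
    instance_size: str
    instance_count: int


# Hypothetical per-environment values, kept apart from the code that uses them.
SETTINGS = {
    "staging": EnvSettings(instance_size="small", instance_count=1),
    "production": EnvSettings(instance_size="large", instance_count=6),
}


def settings_for(environment):
    """Return settings for an environment, refusing to guess a default."""
    try:
        return SETTINGS[environment]
    except KeyError:
        raise ValueError(f"unknown environment: {environment!r}") from None
```

Raising on an unknown environment is deliberate: a silent fallback to production values is exactly the failure described above.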
Hour 2: No Import of Existing Resources
An engineer starts writing code for resources that already exist without importing them first. Nobody pushed back because the shortcut looked harmless in the moment. The result: apply creates duplicate resources, and now there are two load balancers, two DNS records, and conflicting configs.
Footgun #3: No Import of Existing Resources. Writing code for resources that already exist without importing them makes apply create duplicates: two load balancers, two DNS records, and conflicting configs.
Pressure is mounting. The team is behind schedule and cutting more corners.
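A pre-apply check for this footgun can be sketched as a set comparison. The function name `find_unimported` and the resource names are hypothetical; the sketch assumes you can list what the code declares, what already exists in the environment, and what the state already tracks.

```python
def find_unimported(declared, existing, tracked):
    """Return declared resources that already exist but are not tracked in state.

    Applying with any of these unresolved would create a duplicate,
    so each one should be imported into state first.
    """
    return sorted((set(declared) & set(existing)) - set(tracked))


# Hypothetical inventory: the load balancer exists but was never imported.
declared = ["lb-main", "dns-www", "cache-new"]
existing = ["lb-main", "dns-www", "db-primary"]
tracked = ["dns-www"]

needs_import = find_unimported(declared, existing, tracked)
```

Here `needs_import` contains only `lb-main`: `dns-www` is already tracked, and `cache-new` does not exist yet, so creating it is the intended outcome.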
Hour 3: Destructive Change Not Detected
An engineer changes a parameter that forces resource replacement without realizing it. The team had gotten away with similar shortcuts before, so nobody raised a flag. The result: apply destroys and recreates a database, and all data is lost unless there is a recent backup.
Footgun #4: Destructive Change Not Detected. Changing a parameter that forces resource replacement, without noticing, makes apply destroy and recreate a database; all data is lost unless there is a recent backup.
By hour 3, the compounding failures have reached critical mass. Pages fire. The war room fills up. The team scrambles to understand what went wrong while the system burns.
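A gate that would have caught the replacement can be sketched as follows. The plan structure is an assumption made for illustration: a list of dicts with an `address` and a list of `actions`, where a replacement appears as a delete followed by a create. The names `destructive_changes` and `approve` are hypothetical.

```python
def destructive_changes(plan):
    """Return addresses of planned changes that delete or replace a resource."""
    return [
        change["address"]
        for change in plan
        if "delete" in change["actions"]
    ]


def approve(plan, manually_approved=frozenset()):
    """Allow an apply only if every destructive change was explicitly approved.

    Returns (ok, pending), where pending lists the unapproved addresses.
    """
    pending = [
        address
        for address in destructive_changes(plan)
        if address not in manually_approved
    ]
    return (not pending, pending)


# Hypothetical plan: the database is replaced, the rest is non-destructive.
plan = [
    {"address": "database.primary", "actions": ["delete", "create"]},
    {"address": "loadbalancer.web", "actions": ["update"]},
    {"address": "dns.www", "actions": ["create"]},
]

ok, pending = approve(plan)
```

With this plan the gate refuses the apply and names `database.primary` as pending, which forces someone to confirm a backup exists before passing it in `manually_approved`.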
The Postmortem
Root Cause Chain
| # | Mistake | Consequence | Could Have Been Prevented By |
|---|---|---|---|
| 1 | No State Management Plan | Two engineers apply simultaneously; state is corrupted; resources are orphaned | Primer: Remote state backend with locking from day one |
| 2 | Hard-Coded Values Everywhere | Deploying to staging accidentally provisions resources with production-sized (and priced) settings | Primer: Parameterize all environment-specific values; use separate variable files per environment |
| 3 | No Import of Existing Resources | Apply creates duplicate resources; now there are two load balancers, two DNS records, and conflicting configs | Primer: Import existing resources into state before writing new code for them |
| 4 | Destructive Change Not Detected | Apply destroys and recreates a database; all data is lost unless there is a recent backup | Primer: Always review the plan output; flag any destroy/replace actions for manual approval |
Damage Report
- Downtime: 2-6 hours of infrastructure instability or drift
- Data loss: Risk of data loss if stateful resources are replaced
- Customer impact: Dependent services may experience outages or degraded performance
- Engineering time to remediate: 12-24 engineer-hours for state recovery and drift remediation
- Reputation cost: Infrastructure team credibility damaged; manual intervention required for future changes
What the Primer Teaches
- Footgun #1: The primer's section on state management would have taught the engineer to use a remote state backend with locking from day one.
- Footgun #2: The primer's section on hard-coded values would have taught the engineer to parameterize all environment-specific values and use separate variable files per environment.
- Footgun #3: The primer's section on importing existing resources would have taught the engineer to import existing resources into state before writing new code for them.
- Footgun #4: The primer's section on destructive changes would have taught the engineer to always review the plan output and flag any destroy or replace action for manual approval.
Cross-References
- Primer — The right way
- Footguns — The mistakes catalogued
- Street Ops — How to do it in practice