Interview Gauntlet: Terraform Plan Shows 47 Resources to Destroy/Recreate
Category: Debugging · Difficulty: L2-L3 · Duration: 15-20 minutes · Domains: Terraform, State Management
Round 1: The Opening
Interviewer: "You run terraform plan and it shows 47 resources will be destroyed and recreated. You haven't changed any Terraform code. What's going on?"
Strong Answer:
"A plan showing mass destruction without code changes is almost always a state issue, not a code issue. The most likely causes: the state file is pointing to different resources than what exists in the cloud, a provider version upgrade changed the resource schema, or someone changed resources out-of-band (in the console) and Terraform wants to correct the drift. My first step: read the plan carefully. terraform plan shows why each resource is being replaced — there's a # forces replacement annotation next to the attribute that triggers recreation. I'd look at the first few resources and see what attribute is forcing the replacement. If it's the same attribute across many resources (like a name_prefix or a tags map), it's likely a systematic issue. I'd also check: terraform state list to see if the state file has the expected resources, terraform state show <resource> to compare state attributes against reality, and terraform version to check if the Terraform core or provider was recently upgraded."
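The triage described above can be sketched as a quick pipeline over saved plan output. The plan text below is a truncated, hypothetical sample (not output from this incident) so the pipeline can be shown end to end; in practice you would capture it with `terraform plan -no-color > plan.txt`.

```shell
# Hypothetical, truncated plan output standing in for:
#   terraform plan -no-color > plan.txt
cat > plan.txt <<'EOF'
      ~ ami       = "ami-0abc123" -> null # forces replacement
      ~ ami       = "ami-0abc123" -> null # forces replacement
      ~ user_data = "abc" -> "def" # forces replacement
EOF

# Count which attribute is forcing replacement across resources; a single
# attribute dominating the list points to a systematic cause rather than
# 47 independent problems.
grep 'forces replacement' plan.txt | awk '{print $2}' | sort | uniq -c | sort -rn
```

If one attribute accounts for nearly every replacement, the next step is to ask what could have changed that attribute globally: a provider upgrade, a shared variable, or state drift.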
Common Weak Answers:
- "Someone must have changed the code." — The premise says no code changes. The answer should investigate other causes.
- "Just run `terraform apply` and let it fix itself." — Applying a plan that destroys 47 resources without understanding why is extremely dangerous and could cause production outages.
- "Roll back to the previous state file." — You don't typically "roll back" state files in Terraform. The state should reflect reality, and manipulating it manually requires care.
Round 2: The Probe
Interviewer: "You look at the plan output and see that every resource shows ~ tags changed, and several EC2 instances show # forces replacement because the ami attribute changed from an AMI ID to null. But you didn't change the AMI. What's happening?"
What the interviewer is testing: Understanding of how Terraform evaluates data sources, provider behavior changes, and the difference between computed and configured attributes.
Strong Answer:
"Two separate issues. The tags change is likely benign — it could be a provider upgrade that changed how tags are serialized, or default tags (configured via the provider's default_tags block) that Terraform now sees as drift. The ami going from an actual ID to null is the dangerous one. This typically happens when the AMI is referenced via a data source — like data.aws_ami.latest — and the data source query now returns no results. If the AMI was fetched with a filter like name = 'ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*' and the AMI was deregistered or the filter no longer matches anything, the lookup comes back empty and the ami attribute evaluates to null, which Terraform interprets as 'replace the instance.' I'd check: terraform console to evaluate the data source directly and see what it returns, and terraform plan -refresh-only (the modern replacement for terraform refresh) to see whether the state needs updating. If it's a data source returning null, the fix is to update the filter to match current AMIs, pin to a specific AMI ID, or check whether the AMI was deregistered from the account."
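The failure mode above can be sketched with a hypothetical configuration; the owner ID, filter, and resource names here are illustrative, not taken from the incident.

```hcl
# Hypothetical config reproducing the failure mode: if no AMI matches the
# filter (e.g. the image was deregistered), the lookup no longer resolves
# to a real ID.
data "aws_ami" "latest" {
  owners      = ["099720109477"] # Canonical's account ID (assumed)
  most_recent = true

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"]
  }
}

resource "aws_instance" "web" {
  # Pinning a literal AMI ID here (instead of the data source reference)
  # stabilizes the plan while the filter is being fixed.
  ami           = data.aws_ami.latest.id
  instance_type = "t3.micro"
}
```

To inspect the lookup without applying, terraform console also accepts piped expressions, e.g. `echo 'data.aws_ami.latest.id' | terraform console`.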
Trap Alert:
If the candidate bluffs here: The interviewer will ask "How do you check what a data source returns without applying?" The answer is `terraform console` — it lets you interactively evaluate expressions, including data sources. Alternatively, `terraform plan -target=data.aws_ami.latest` plans just the data source. Candidates who say "just run terraform apply to see" are suggesting destructive action as a debugging strategy.
Round 3: The Constraint
Interviewer: "It turns out the state file is partially corrupted. A previous terraform apply was interrupted mid-run (someone hit Ctrl+C during an EC2 instance creation), and now the state has a resource entry with an empty ID. Terraform thinks the resource doesn't exist and wants to create it, but the resource actually exists in AWS. How do you fix this without any data loss?"
Strong Answer:
"State corruption from an interrupted apply is a known hazard. The empty-ID resource exists in AWS but Terraform doesn't know about it. The fix is to import the existing resource into the state. First, I'd identify the resource in AWS — look at the plan output to see which resource Terraform wants to create, find the actual resource in the AWS console or via CLI (aws ec2 describe-instances --filters 'Name=tag:Name,Values=the-expected-name'), and get its ID. Then: terraform import aws_instance.my_server i-0abc123def456. This writes the real resource into the state file, associating it with the Terraform resource address. After import, terraform plan should show no changes for that resource (or minor tag drift). For the other corrupted entries, I'd repeat the process. Before any of this, I'd make a backup of the state file: terraform state pull > state-backup.json. If the state is in S3 with versioning (which it should be), I'd also note the current version ID so I can recover if needed. For the 47 resources, I'd script the import if many are affected: extract the resource addresses from the plan, find the corresponding AWS resource IDs, and run terraform import for each."
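The recovery flow above reads as a short runbook. This is a command-sequence sketch, not runnable outside a real workspace; the resource address, tag value, and instance ID are hypothetical stand-ins for what the plan output and AWS CLI would give you.

```shell
# 1. Back up state before touching anything (works for any backend):
terraform state pull > state-backup.json

# 2. Locate the real resource out-of-band (hypothetical tag value):
aws ec2 describe-instances \
  --filters 'Name=tag:Name,Values=my-server' \
  --query 'Reservations[].Instances[].InstanceId' --output text

# 3. Import it into the address Terraform plans to create:
terraform import aws_instance.my_server i-0abc123def456

# 4. Verify: the plan should now show no changes for that resource
#    (or only minor tag drift).
terraform plan
```

The backup in step 1 is the part that must never be skipped; everything after it is reversible as long as the backup exists.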
The Senior Signal:
What separates a senior answer: Backing up the state before any manipulation. State is the single most critical Terraform artifact, and losing it means either manually reimporting every resource or, worse, Terraform losing track of infrastructure entirely. Also: knowing that `terraform state pull > backup.json` works for both local and remote backends, and that S3 versioning provides additional safety. Candidates who jump straight to `terraform import` without backing up are taking unnecessary risk.
Round 4: The Curveball
Interviewer: "You're importing resources and it's working, but you have a terraform module that creates 15 related resources (VPC, subnets, route tables, security groups, etc.). The module is instantiated 3 times for 3 environments. The state file is completely missing one module instance (45 resources). The resources exist in AWS. How do you import all 45 resources into the correct module path?"
Strong Answer:
"Importing 45 resources into a module requires careful address mapping. Each resource needs to be imported with the full module path: terraform import 'module.network["production"].aws_vpc.main' vpc-0abc123. The square-bracket syntax is needed for module instances created with for_each. I'd approach this systematically: first, get the list of resource addresses from the module by running terraform plan and looking at the resources it wants to create — these are the addresses I need to import into. Second, for each resource type, find the corresponding AWS resource IDs. For a VPC module, the dependency order matters conceptually but not for imports: I can import in any order, since import just updates state. But I'd start with the VPC, then subnets, then route tables, since that makes verification easier. For 45 resources, I'd write a script. Something like a shell loop over a mapping file:

```shell
# import-map.csv contains one "address,id" pair per line, e.g.:
#   module.network["production"].aws_vpc.main,vpc-0abc123
while IFS=',' read -r tf_addr aws_id; do
  terraform import "$tf_addr" "$aws_id"
done < import-map.csv
```

After all imports, I'd run terraform plan and verify it shows no changes (or only expected drift). Any remaining differences would be attributes configured differently than what Terraform expects, which I'd resolve in the code."
Trap Question Variant:
The right answer is "This is tedious but safe if done carefully." Candidates who say "I'd delete the module and let Terraform recreate it" are suggesting destroying production infrastructure. Candidates who say "I'd edit the state file JSON directly" are playing with fire — manual state editing is error-prone and should be a last resort. `terraform import` is the right tool even though it's slow for 45 resources. Some candidates might mention `terraform state mv` or the newer `import` block in Terraform 1.5+ (which allows declaring imports in configuration); both are valid alternative approaches.
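The import-block alternative mentioned above can be sketched like this (addresses and IDs are hypothetical). Its advantage over the CLI loop is that the imports become part of a reviewable plan instead of 45 one-off commands:

```hcl
# Terraform 1.5+ declarative import: previewed via `terraform plan`,
# applied like any other change, and repeatable in CI.
import {
  to = module.network["production"].aws_vpc.main
  id = "vpc-0abc123"
}

import {
  to = module.network["production"].aws_subnet.public["a"]
  id = "subnet-0def456"
}
```

`terraform plan` then shows the pending imports alongside any diffs; for resources that have no configuration written yet, `terraform plan -generate-config-out=generated.tf` can draft HCL to start from.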
Round 5: The Synthesis
Interviewer: "This whole incident was caused by an interrupted terraform apply. How do you prevent state corruption and make Terraform operations safe for a team of 10 engineers?"
Strong Answer:
"State protection is the most important Terraform operational concern. First, remote state with locking: use an S3 backend with DynamoDB locking (or Terraform Cloud). This prevents two engineers from running terraform apply simultaneously, which causes state corruption. The lock is acquired at the start of the operation and released at the end. Second, state file versioning: S3 bucket versioning must be enabled so you can recover any previous state version. Third, require plan-before-apply workflow: no one runs terraform apply directly. Instead: terraform plan -out=plan.tfplan, review the plan, then terraform apply plan.tfplan. This ensures the applied changes match what was reviewed. Fourth, CI/CD-driven applies: in a mature setup, engineers never run terraform apply locally. They commit code, CI runs the plan, a human reviews and approves, and CI applies. This is the Atlantis or Terraform Cloud workflow. Fifth, state backup automation: even with S3 versioning, I'd have a nightly snapshot of all state files to a separate account for disaster recovery. Sixth, for the interrupted-apply scenario specifically: Terraform has gotten better about this — it writes state after each resource operation, not just at the end. But for truly critical infrastructure, I'd recommend running applies from a reliable CI/CD runner (not a laptop on flaky WiFi) and having a documented recovery procedure for interrupted applies."
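A minimal backend sketch for the locking-and-versioning setup described above; the bucket, table, and key names are placeholders, and S3 bucket versioning is enabled separately on the bucket itself.

```hcl
terraform {
  backend "s3" {
    bucket         = "example-org-terraform-state" # versioning enabled on this bucket
    key            = "prod/network/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks" # table with a "LockID" string hash key
    encrypt        = true
  }
}
```

Newer Terraform versions (1.10+) also offer S3-native locking via `use_lockfile = true`, which removes the DynamoDB dependency; the DynamoDB table shown here is the long-established pattern.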
What This Sequence Tested:
| Round | Skill Tested |
|---|---|
| 1 | Terraform plan interpretation and state-vs-code reasoning |
| 2 | Data source evaluation and provider behavior debugging |
| 3 | State recovery via terraform import |
| 4 | Module-aware state manipulation at scale |
| 5 | Terraform operational safety and team workflow design |