Terraform Modules: Building Infrastructure LEGOs

Topics: Terraform modules, code reuse, versioning, composition, testing, governance, VPC design
Level: L2 (Operations)
Time: 60–75 minutes
Prerequisites: None required; basic Terraform familiarity helpful
The Mission¶
Your team's Terraform codebase has a problem. Three different engineers wrote three different VPC configurations for dev, staging, and production. They started as copies. Over six months, they drifted:
# environments/dev/vpc.tf — CIDR 10.0.0.0/16, 2 AZs, no NAT gateway
# environments/staging/vpc.tf — CIDR 10.1.0.0/16, 2 AZs, single NAT gateway
# environments/prod/vpc.tf — CIDR 10.2.0.0/16, 3 AZs, NAT per AZ, flow logs enabled
Dev is missing flow logs that compliance requires. Staging has a subnet CIDR overlap with prod because someone fat-fingered it. Prod has a security group rule that was hotfixed during an incident and never backported. Nobody knows which version is "right."
Your job: refactor these three snowflakes into a single VPC module that all environments share. Along the way, you'll learn why modules exist, how to build them, how to version them, and how to avoid the mistakes that make module upgrades terrifying.
Why Modules (The 3-Minute Case)¶
You could keep three separate VPC configs. Here's what happens:
| Without modules | With modules |
|---|---|
| Bug fix? Patch 3 files (and remember all 3) | Bug fix? Patch 1 module, all envs get it |
| New requirement? Add to 3 files (differently) | New requirement? Add once, parameterize |
| Security audit? Review 3 implementations | Security audit? Review 1 module |
| New engineer? "Which VPC file is canonical?" | New engineer? "Read the module" |
| Drift between envs? Guaranteed | Drift between envs? Only where inputs intentionally differ |
Mental Model: A Terraform module is a function. It takes inputs (variables), does work (creates resources), and returns outputs. Just like you wouldn't copy-paste a function body into three places in your code, you shouldn't copy-paste infrastructure definitions.
The three reasons modules exist, in order of importance:
- Consistency — every VPC looks the same because they come from the same code
- DRY — fix a bug once, not N times
- Governance — your platform team publishes the approved VPC module; app teams consume it
That third one is underappreciated. Modules aren't just about saving typing. They're how organizations enforce standards without writing policy documents nobody reads.
Module Anatomy: What's in the Box¶
A module is a directory of .tf files. That's it. No special syntax, no magic. Every
Terraform configuration you've ever written is already a module — the "root module."
Here's the standard layout:
modules/vpc/
├── main.tf # Resources — the actual infrastructure
├── variables.tf # Inputs — what the caller passes in
├── outputs.tf # Outputs — what the caller gets back
├── versions.tf # Provider and Terraform version constraints
├── locals.tf # Internal computed values
└── README.md # How to use this module
Each file has a job:
| File | Purpose | Analogy |
|---|---|---|
| `variables.tf` | Function parameters | `def create_vpc(cidr, azs, environment):` |
| `main.tf` | Function body | The actual resource creation logic |
| `outputs.tf` | Return values | `return {"vpc_id": vpc.id, "subnet_ids": [...]}` |
| `versions.tf` | Compatibility contract | "Works with Terraform >= 1.5 and AWS provider ~> 5.0" |
| `locals.tf` | Internal scratch space | Local variables you don't expose |
Trivia: The Terraform Registry enforces this layout. To publish a module, you need the standard structure, a GitHub repo with semantic version tags, and the naming convention `terraform-<PROVIDER>-<NAME>` (e.g., `terraform-aws-vpc`). The most downloaded module on the registry — `terraform-aws-modules/vpc/aws` — has been downloaded over 50 million times.
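The one file in that layout not shown later in this lesson is `versions.tf`. A minimal sketch, matching the compatibility contract in the table above:

```hcl
# modules/vpc/versions.tf — a minimal sketch matching the table above
terraform {
  required_version = ">= 1.5"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
```

Keeping this file in the module (rather than only in the root) means callers get a clear error at `terraform init` time when their Terraform or provider version is incompatible.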
Building the VPC Module (Hands On)¶
Let's build the module that replaces those three snowflake VPCs. We'll start simple and add complexity as the requirements demand it.
Step 1: Variables — The Module's API¶
# modules/vpc/variables.tf
variable "vpc_name" {
description = "Name prefix for all resources"
type = string
validation {
condition = length(var.vpc_name) > 0 && length(var.vpc_name) <= 32
error_message = "VPC name must be 1-32 characters."
}
}
variable "vpc_cidr" {
description = "CIDR block for the VPC (e.g., 10.0.0.0/16)"
type = string
validation {
condition = can(cidrnetmask(var.vpc_cidr))
error_message = "Must be a valid IPv4 CIDR block."
}
}
variable "availability_zones" {
description = "List of AZs to deploy into"
type = list(string)
validation {
condition = length(var.availability_zones) >= 2
error_message = "At least 2 AZs required for high availability."
}
}
variable "environment" {
description = "Environment name (dev, staging, prod)"
type = string
validation {
condition = contains(["dev", "staging", "prod"], var.environment)
error_message = "Environment must be dev, staging, or prod."
}
}
variable "enable_nat_gateway" {
type = bool
default = false
}
variable "single_nat_gateway" {
type = bool
default = true
}
variable "enable_flow_logs" {
type = bool
default = true # Secure default — teams must explicitly opt out
}
variable "common_tags" {
type = map(string)
default = {}
}
The validation blocks catch mistakes at terraform plan time. The CIDR validation uses
can(cidrnetmask(...)) — a Terraform built-in that returns false if the CIDR is malformed.
Gotcha: Validation blocks can only reference their own variable. For cross-variable validation ("if NAT is enabled, you need at least 2 AZs"), use `precondition` blocks on resources (Terraform 1.2+).
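A hedged sketch of such a cross-variable check (the resource it attaches to is illustrative, and its real arguments are omitted):

```hcl
# Illustrative only — precondition blocks live inside a resource's
# lifecycle block (Terraform 1.2+); real aws_nat_gateway arguments omitted
resource "aws_nat_gateway" "example" {
  lifecycle {
    precondition {
      condition     = !var.enable_nat_gateway || length(var.availability_zones) >= 2
      error_message = "enable_nat_gateway requires at least 2 availability zones."
    }
  }
}
```

Unlike a `validation` block, a `precondition` can reference any variable, local, or resource attribute, so it is the right tool whenever a rule spans more than one input.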
Step 2: Resources — The Module's Body¶
# modules/vpc/main.tf
locals {
nat_gateway_count = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : length(var.availability_zones)) : 0
tags = merge(var.common_tags, { Environment = var.environment, ManagedBy = "terraform" })
}
resource "aws_vpc" "this" {
cidr_block = var.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = merge(local.tags, { Name = "${var.vpc_name}-vpc" })
}
resource "aws_subnet" "public" {
for_each = toset(var.availability_zones)
vpc_id = aws_vpc.this.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, index(var.availability_zones, each.value))
availability_zone = each.value
map_public_ip_on_launch = true
tags = merge(local.tags, { Name = "${var.vpc_name}-public-${each.value}", Tier = "public" })
}
resource "aws_subnet" "private" {
for_each = toset(var.availability_zones)
vpc_id = aws_vpc.this.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, index(var.availability_zones, each.value) + 100)
availability_zone = each.value
tags = merge(local.tags, { Name = "${var.vpc_name}-private-${each.value}", Tier = "private" })
}
resource "aws_internet_gateway" "this" {
vpc_id = aws_vpc.this.id
tags = merge(local.tags, { Name = "${var.vpc_name}-igw" })
}
Two things to notice:
- `for_each` instead of `count`. Adding an AZ creates one subnet. Removing an AZ from the middle of a `count` list would shift indexes and destroy/recreate everything after it.
- `cidrsubnet` for automatic CIDR math. `cidrsubnet("10.0.0.0/16", 8, 0)` gives `10.0.0.0/24`; index 100 gives `10.0.100.0/24`. No more fat-fingered CIDRs.
Under the Hood: `cidrsubnet(prefix, newbits, netnum)` adds `newbits` to the prefix length (/16 + 8 = /24) and selects the `netnum`-th network of that size. Binary math on IP addresses, guaranteed correct.
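The `locals` block above computes `nat_gateway_count`, but the NAT resources themselves were elided. A sketch of what they might look like (one EIP per gateway, gateways placed in the public subnets; `count` is acceptable here because the addressing is purely numeric):

```hcl
# modules/vpc/main.tf (continued) — sketch of the NAT pieces that the
# module's nat_gateway_ids output refers to
resource "aws_eip" "nat" {
  count  = local.nat_gateway_count
  domain = "vpc"
  tags   = merge(local.tags, { Name = "${var.vpc_name}-nat-eip-${count.index}" })
}

resource "aws_nat_gateway" "this" {
  count         = local.nat_gateway_count
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = values(aws_subnet.public)[count.index].id
  tags          = merge(local.tags, { Name = "${var.vpc_name}-nat-${count.index}" })

  # The IGW must exist before a NAT gateway can route traffic out
  depends_on = [aws_internet_gateway.this]
}
```

With `enable_nat_gateway = false` the count is 0 and nothing is created; with `single_nat_gateway = false` you get one gateway per AZ, matching the prod configuration.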
Step 3: Outputs — The Module's Return Values¶
# modules/vpc/outputs.tf
output "vpc_id" {
description = "ID of the VPC"
value = aws_vpc.this.id
}
output "vpc_cidr" {
description = "CIDR block of the VPC"
value = aws_vpc.this.cidr_block
}
output "public_subnet_ids" {
description = "IDs of public subnets, keyed by AZ"
value = { for az, subnet in aws_subnet.public : az => subnet.id }
}
output "private_subnet_ids" {
description = "IDs of private subnets, keyed by AZ"
value = { for az, subnet in aws_subnet.private : az => subnet.id }
}
output "nat_gateway_ids" {
description = "IDs of NAT gateways (empty if NAT disabled)"
value = [for nat in aws_nat_gateway.this : nat.id]
}
Outputs are your module's API contract. Downstream callers depend on these names and types. Change an output name and you break every caller. This is why output stability matters — and why semantic versioning matters for modules.
Calling the Module: Three Environments, One Source¶
Now the payoff. Each environment is a thin wrapper:
# environments/dev/main.tf
module "vpc" {
  source             = "../../modules/vpc"
  vpc_name           = "dev"
  vpc_cidr           = "10.0.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b"]
  environment        = "dev"
  enable_nat_gateway = false
}

# environments/prod/main.tf
module "vpc" {
  source             = "../../modules/vpc"
  vpc_name           = "prod"
  vpc_cidr           = "10.2.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
  environment        = "prod"
  enable_nat_gateway = true
  single_nat_gateway = false
}
Same module, different inputs. Dev skips the NAT gateway (~$32/month savings). Prod gets one
per AZ. Both get flow logs because the module defaults to true.
Remember: Module defaults are your governance lever. Set `enable_flow_logs = true` by default, and teams must explicitly opt out. The PR review catches the opt-out — compare that to a policy doc nobody reads.
Flashcard Check #1¶
Cover the right column. Test yourself.
| Question | Answer |
|---|---|
| What three files does every module need at minimum? | `main.tf` (resources), `variables.tf` (inputs), `outputs.tf` (return values) |
| Why `for_each` instead of `count` for subnets? | `count` uses numeric indexes — removing an item shifts all subsequent indexes, causing destroy/recreate. `for_each` uses stable keys. |
| What does `can(cidrnetmask(var.vpc_cidr))` do in a validation block? | Returns true if the string is a valid CIDR block, false otherwise. Catches malformed CIDRs at plan time. |
| How do you access a module's output? | `module.<NAME>.<OUTPUT>` — e.g., `module.vpc.vpc_id` |
| What's the Terraform Registry naming convention? | `terraform-<PROVIDER>-<NAME>` — e.g., `terraform-aws-vpc` |
Local vs. Remote Modules: Where Does the Code Live?¶
So far we used a local path (source = "../../modules/vpc"). That works for a single repo.
But when multiple repos need the same module, or when you need versioning, local paths break
down.
# Local path — no versioning, tied to repo structure
module "vpc" {
source = "../../modules/vpc"
}
# Terraform Registry — versioned, discoverable, documented
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "5.5.0"
}
# GitHub — versioned via Git tags
module "vpc" {
source = "git::https://github.com/mycompany/terraform-modules.git//vpc?ref=v2.1.0"
}
# S3 — for air-gapped or private environments
module "vpc" {
source = "s3::https://s3-us-east-1.amazonaws.com/mycompany-modules/vpc/v2.1.0.zip"
}
| Source | Versioning | Best for |
|---|---|---|
| Local path | None (whatever's on disk) | Rapid iteration within one repo |
| Terraform Registry | Semantic version constraints | Public modules, shared across orgs |
| Git (GitHub/GitLab) | Tag or SHA ref | Private modules, org-wide sharing |
| S3/GCS | Directory per version | Air-gapped environments, artifact-based workflows |
Gotcha: Every time you change the `source` of a module, you must run `terraform init` (or `terraform init -upgrade`). Terraform caches modules in `.terraform/modules/` and won't notice a source change without re-initialization.
The Versioning Problem (Or: Why "Latest" Is a Four-Letter Word)¶
Here's a story that happens at every organization using Terraform at scale.
War Story: A platform team published v2.0.0 of their VPC module. It renamed an output from `private_subnets` to `private_subnet_ids` for consistency. Reasonable change, clearly a major version bump. But 12 application teams had `source` pointed at the module without version pinning. Monday morning, 50 CI pipelines broke simultaneously. Engineers across 4 time zones filed tickets against the platform team. The fix took 10 minutes per team — just pin the version — but coordinating it took two days. The postmortem action item: "All module references MUST include a version constraint."
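A rename like the one in that story can also be staged rather than forced: publish the new output name while keeping the old one as a deprecated alias for a release. This is a common convention, not something the story's team did:

```hcl
output "private_subnet_ids" {
  description = "IDs of private subnets, keyed by AZ"
  value       = { for az, subnet in aws_subnet.private : az => subnet.id }
}

# Deprecated alias; remove in the next major version
output "private_subnets" {
  description = "DEPRECATED: use private_subnet_ids instead"
  value       = { for az, subnet in aws_subnet.private : az => subnet.id }
}
```

Callers then migrate on their own schedule, and the alias is dropped in the next major version, where breakage is expected.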
The rules:
# BAD — pulls latest on every terraform init
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
}
# BAD — pins to a Git branch that can change under you
module "vpc" {
source = "git::https://github.com/mycompany/terraform-modules.git//vpc?ref=main"
}
# GOOD — exact version pin
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "5.5.0"
}
# GOOD — allows patch updates but not minor/major
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "~> 5.5.0" # >= 5.5.0, < 5.6.0
}
# GOOD — immutable Git SHA
module "vpc" {
source = "git::https://github.com/mycompany/terraform-modules.git//vpc?ref=abc123def"
}
Name Origin: The `~>` operator is borrowed from Ruby's Bundler, where it's called the "twiddle-wakka" or "pessimistic version constraint." `~> 5.5.0` means ">= 5.5.0, < 5.6.0" (patches only). `~> 5.0` means ">= 5.0, < 6.0" (minor updates too).
Module Composition: Modules Calling Modules¶
Real infrastructure is modules wired together. Outputs from one become inputs to the next:
# environments/prod/main.tf — root module composes everything
module "network" {
source = "../../modules/vpc"
vpc_name = "prod"
vpc_cidr = "10.2.0.0/16"
availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
environment = "prod"
enable_nat_gateway = true
single_nat_gateway = false
}
module "eks" {
source = "../../modules/eks"
cluster_name = "prod-cluster"
vpc_id = module.network.vpc_id # network → EKS
subnet_ids = values(module.network.private_subnet_ids)
}
module "rds" {
source = "../../modules/rds"
identifier = "prod-db"
vpc_id = module.network.vpc_id # network → RDS
subnet_ids = values(module.network.private_subnet_ids)
}
Terraform builds the dependency graph from these references: network first, then EKS + RDS
in parallel. No depends_on needed.
Mental Model: Module composition is LEGO. The VPC module is the baseplate. EKS and RDS snap onto it. Each piece has defined connection points (outputs/inputs). Swap the RDS module for Aurora without rebuilding the baseplate — as long as it provides the same outputs.
The Two-Level Rule¶
Keep nesting to two levels max. Deeper nesting makes state paths unreadable:
module.network.aws_vpc.this # Good — readable
module.platform.module.network.aws_vpc.this # Pain starts here
module.prod.module.platform.module.network.aws_vpc.this # Nobody can debug this
for_each and count at the Module Level¶
Since Terraform 0.13, you can use for_each on module blocks — create per-region
infrastructure from a single definition:
module "vpc" {
source = "../../modules/vpc"
for_each = {
"us-east-1" = { cidr = "10.0.0.0/16", azs = ["us-east-1a", "us-east-1b", "us-east-1c"] }
"eu-west-1" = { cidr = "10.1.0.0/16", azs = ["eu-west-1a", "eu-west-1b"] }
}
vpc_name = "prod-${each.key}"
vpc_cidr = each.value.cidr
availability_zones = each.value.azs
environment = "prod"
enable_nat_gateway = true
}
# Access: module.vpc["us-east-1"].vpc_id
Adding ap-southeast-1 later only creates new resources — existing regions are untouched.
Gotcha: `for_each` keys must be known at plan time. If keys are derived from resource attributes, you get: `The "for_each" map includes keys derived from resource attributes that cannot be determined until apply.` Fix: use static keys, not dynamic ones.
Testing Modules: Trust, But Verify¶
The HCL-Native Testing Framework (Terraform 1.6+)¶
Terraform has a built-in test framework. Test files use .tftest.hcl extension:
# tests/vpc.tftest.hcl
variables {
vpc_name = "test"
vpc_cidr = "10.99.0.0/16"
availability_zones = ["us-east-1a", "us-east-1b"]
environment = "dev"
enable_nat_gateway = false
enable_flow_logs = false
}
# Plan-only test — fast, free, no real resources
run "validates_inputs" {
command = plan
assert {
condition = aws_vpc.this.cidr_block == "10.99.0.0/16"
error_message = "VPC CIDR doesn't match input."
}
assert {
condition = length(aws_subnet.public) == 2
error_message = "Expected 2 public subnets."
}
}
# Test input validation catches bad CIDRs
run "rejects_invalid_cidr" {
command = plan
expect_failures = [var.vpc_cidr]
variables {
vpc_cidr = "not-a-cidr"
}
}
terraform test # Run all tests
terraform test -filter=tests/vpc.tftest.hcl # Specific file
terraform test -verbose # See each assertion
| Test type | `command = plan` | `command = apply` |
|---|---|---|
| Speed | Seconds | Minutes |
| Cost | Free | Creates real cloud resources |
| Catches | Config errors, validation, logic | API errors, permission issues, provider bugs |
| Use when | Fast feedback during development | CI pipeline before publishing a module version |
Under the Hood: `terraform test` creates an isolated state per test run. Apply tests create real infrastructure, run assertions, then destroy everything at the end. If a test crashes mid-run, resources are orphaned. Test environments need aggressive cost alerts.
Before the native framework, Terratest (a Go library) was the standard. It's still useful for cross-module integration tests and validating things the Terraform provider doesn't expose — like making HTTP calls to deployed services.
Flashcard Check #2¶
| Question | Answer |
|---|---|
| What does `version = "~> 5.5.0"` mean? | >= 5.5.0 and < 5.6.0 (patch updates only) |
| Why should you never point a module source at a Git branch like `main`? | Branches change — your next `terraform init` could pull breaking changes without warning |
| What command do you run after changing a module's source? | `terraform init` (or `terraform init -upgrade`) |
| What's the max recommended nesting depth for modules? | Two levels (root → child → grandchild) |
| `terraform test` with `command = plan` vs `command = apply` — which costs money? | `apply` creates real cloud resources; `plan` is free |
| What does `expect_failures` do in a test block? | Asserts that the specified variable or resource validation should fail — used to test that input validation catches bad inputs |
Anti-Patterns: Modules Gone Wrong¶
The God Module¶
# DON'T: One module that creates everything
module "platform" {
source = "../../modules/platform"
# 47 input variables covering VPC, EKS, RDS, ElastiCache,
# S3, CloudFront, Route53, ACM, WAF, and monitoring
vpc_cidr = "10.0.0.0/16"
cluster_name = "prod"
db_instance_class = "db.r6g.xlarge"
cache_node_type = "cache.r6g.large"
# ... 43 more variables ...
}
A god module has the blast radius of a monolith. Change anything, risk everything. It's also impossible to test — you can't test the VPC logic without also provisioning an EKS cluster and a database.
Fix: One module per concern. A VPC module, an EKS module, an RDS module. Compose them in the root module.
Hidden Provider Configuration¶
Modules should never contain `provider` blocks — they hardcode the region and account. The caller passes providers in via the `providers` argument.
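Concretely, the caller-supplied wiring might look like this (a sketch; the `eu` alias and `vpc_eu` name are illustrative):

```hcl
# Root module owns all provider configuration
provider "aws" {
  region = "us-east-1"
}

provider "aws" {
  alias  = "eu"
  region = "eu-west-1"
}

# The module receives its provider explicitly from the caller
module "vpc_eu" {
  source = "../../modules/vpc"

  providers = {
    aws = aws.eu
  }

  # ...other inputs...
}
```

The module itself stays region-agnostic, so the same code can be instantiated once per region or account without modification.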
Circular Dependencies¶
Module A outputs a security group. Module B uses it and outputs a subnet. Module A needs that subnet. Terraform can't resolve cycles. Fix: extract shared resources into a third module, or restructure so dependencies flow one direction.
Overly Generic Inputs¶
map(any) hides required structure. Use typed objects instead — they're self-documenting and
catch type errors at plan time:
# DON'T
variable "config" {
  type = map(any)
}

# DO
variable "config" {
  type = object({
    cidr = string
    azs  = list(string)
  })
}
Module Governance: Scaling Beyond One Team¶
At scale, you need guardrails: private registries (Terraform Cloud, Artifactory, or S3-backed) for hosting approved modules, and policy-as-code (Sentinel or OPA) to enforce rules on plans before they apply:
# OPA: require encryption on all S3 buckets
deny[msg] {
resource := input.resource_changes[_]
resource.type == "aws_s3_bucket"
resource.change.actions[_] == "create"
not resource.change.after.server_side_encryption_configuration
msg := sprintf("S3 bucket %s must have encryption enabled", [resource.name])
}
The pipeline: Engineer writes Terraform → CI runs terraform plan → Plan JSON evaluated
against policies → Violations block the apply.
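One common way to wire that pipeline is with conftest as the OPA runner (an assumption here; the source describes OPA generically, and the file names are illustrative):

```shell
terraform plan -out=plan.tfplan
terraform show -json plan.tfplan > plan.json
conftest test plan.json --policy policy/   # non-zero exit code blocks the apply
```

CI treats a non-zero exit as a failed check, so no human has to remember to run the policy step.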
Interview Bridge: "How would you enforce infrastructure standards across 50 teams?" Modules encode how to build things right. Sentinel/OPA enforce that teams must use them.
War Story: The Module Upgrade That Broke 50 Environments¶
War Story: An infrastructure team maintained a shared RDS module used by 50 service teams. Version 3.2.0 added a parameter group with `log_min_duration_statement = 1000` (log queries over 1 second). Sensible default. But the parameter group name was derived from the database identifier using a new naming scheme. When teams upgraded from 3.1.x to 3.2.0, Terraform detected the parameter group name change and planned a replacement — which on RDS means a database reboot. Fifty databases, fifty reboots. The teams that ran `terraform plan` caught it. The three teams that had `auto-approve` in CI did not. Three production databases rebooted during business hours. The fix: the module team released 3.2.1 within hours, using `lifecycle { create_before_destroy = true }` on the parameter group and preserving the old naming scheme with a deprecation notice. The postmortem action items: (1) Never change resource naming schemes in a minor version. (2) All module upgrades require `terraform plan` review in a PR before apply. (3) `auto-approve` in production CI is banned.
This story illustrates why module versioning isn't academic. A naming change in a module can cascade into infrastructure destruction across dozens of teams. Semantic versioning is a contract: patch versions fix bugs, minor versions add features without breaking existing behavior, major versions may break things.
Exercises¶
Exercise 1: Spot the Anti-Pattern (2 minutes)¶
What's wrong with this module call?
module "vpc" {
source = "git::https://github.com/company/tf-modules.git//vpc?ref=main"
config = {
cidr = "10.0.0.0/16"
azs = ["us-east-1a"]
}
}
Solution
Three problems:

1. **`ref=main`** — pointing at a branch, not a version tag. Any push to `main` changes what you get on `terraform init`.
2. **Single AZ** — no high availability. The module should validate that at least 2 AZs are provided.
3. **`config = {}`** — untyped map input. Should be individual typed variables for clarity and validation.

Fixed: pin `ref` to a version tag, pass at least two AZs, and replace the untyped `config` map with individual typed variables.

Exercise 2: Cross-Variable Validation (5 minutes)¶
Write a precondition that allows any instance_type in production but restricts
non-production to t3.micro, t3.small, and t3.medium.
Solution
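One possible solution (a sketch; `aws_instance.app`, `var.ami_id`, and the variable names are illustrative):

```hcl
variable "instance_type" {
  type = string
}

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = var.instance_type

  lifecycle {
    precondition {
      # Production may use anything; everything else is restricted
      condition = (
        var.environment == "prod" ||
        contains(["t3.micro", "t3.small", "t3.medium"], var.instance_type)
      )
      error_message = "Non-production environments may only use t3.micro, t3.small, or t3.medium."
    }
  }
}
```

The key move is the `||` short-circuit: when `environment` is `prod` the condition is true regardless of instance type, so the restriction only bites elsewhere.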
Exercise 3: Refactor Copy-Paste into a Module (15 minutes)¶
You have two nearly identical security group definitions — one in dev/ and one in prod/.
They differ only in allowed CIDR ranges and the VPC ID. Create a module at
modules/web-sg/ that takes vpc_id, allowed_cidrs, and environment as inputs, and
outputs the security group ID.
Solution
# modules/web-sg/variables.tf
variable "vpc_id" { type = string }
variable "allowed_cidrs" { type = list(string) }
variable "environment" { type = string }
# modules/web-sg/main.tf
resource "aws_security_group" "web" {
name_prefix = "${var.environment}-web-"
vpc_id = var.vpc_id
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = var.allowed_cidrs
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = { Name = "${var.environment}-web-sg", ManagedBy = "terraform" }
}
# modules/web-sg/outputs.tf
output "security_group_id" { value = aws_security_group.web.id }
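Each environment's call site then shrinks to a few lines (a sketch; `module.network` assumes the VPC module built earlier in this lesson, and the CIDR is illustrative):

```hcl
# environments/prod/main.tf
module "web_sg" {
  source        = "../../modules/web-sg"
  vpc_id        = module.network.vpc_id
  allowed_cidrs = ["10.0.0.0/8"]
  environment   = "prod"
}
```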
Cheat Sheet¶
Pin this to your wall.
| Task | Command / Syntax |
|---|---|
| Initialize modules | terraform init |
| Update module version | Change version, run terraform init -upgrade |
| List modules in state | terraform state list \| grep module |
| Move resource into module | terraform state mv aws_vpc.main module.network.aws_vpc.this |
| Module output reference | module.<NAME>.<OUTPUT> |
| Version pin (exact) | version = "5.5.0" |
| Version pin (patch range) | version = "~> 5.5.0" (>= 5.5.0, < 5.6.0) |
| Version pin (minor range) | version = "~> 5.0" (>= 5.0, < 6.0) |
| Run module tests | terraform test |
| Run specific test | terraform test -filter=tests/vpc.tftest.hcl |
| Validate config | terraform validate |
| Format module code | terraform fmt -recursive |
Module design rules of thumb:
| Rule | Why |
|---|---|
| One module per concern | Blast radius, testability |
| No `provider` blocks inside modules | Caller controls region/account |
| Typed variables, not `map(any)` | Self-documenting, validates at plan time |
| Minimal outputs | Fewer outputs = smaller API surface = fewer breaking changes |
| Default to secure | `enable_encryption = true`, `enable_flow_logs = true` |
| Max 2 levels of nesting | Deeper nesting = unreadable state paths |
| Pin versions in production | Unpinned modules are ticking time bombs |
Takeaways¶
- Modules are functions for infrastructure. Inputs, logic, outputs. If you're copy-pasting `.tf` files between directories, you need a module.
- Version pinning is not optional. An unpinned module source is a production incident waiting for someone to push a breaking change upstream.
- `for_each` over `count`, always. Index-based addressing (`count`) causes cascade destruction when you remove items. Key-based addressing (`for_each`) is surgical.
- Module defaults are governance. Set secure defaults (`encryption = true`, `flow_logs = true`) and make teams explicitly opt out. The PR review becomes the policy enforcement.
- Test modules before publishing. The native `terraform test` framework catches bugs at plan time for free. Apply tests catch the rest.
- Keep modules small. A module that creates a VPC is good. A module that creates a VPC, EKS cluster, RDS database, and monitoring stack is a liability.
Related Lessons¶
- The Terraform State Disaster — what happens when state goes wrong and how to recover
- Terraform vs Ansible vs Helm — when to use which tool for infrastructure and configuration
- GitOps: The Repo Is the Truth — how module versioning fits into a GitOps workflow
- The Cloud Bill Surprise — cost implications of module design decisions (NAT gateways, instance types)