
Terraform Modules: Building Infrastructure LEGOs


Topics: Terraform modules, code reuse, versioning, composition, testing, governance, VPC design
Level: L2 (Operations)
Time: 60–75 minutes
Prerequisites: None required; basic Terraform familiarity helpful


The Mission

Your team's Terraform codebase has a problem. Three different engineers wrote three different VPC configurations for dev, staging, and production. They started as copies. Over six months, they drifted:

# environments/dev/vpc.tf        — CIDR 10.0.0.0/16, 2 AZs, no NAT gateway
# environments/staging/vpc.tf    — CIDR 10.1.0.0/16, 2 AZs, single NAT gateway
# environments/prod/vpc.tf       — CIDR 10.2.0.0/16, 3 AZs, NAT per AZ, flow logs enabled

Dev is missing flow logs that compliance requires. Staging has a subnet CIDR overlap with prod because someone fat-fingered it. Prod has a security group rule that was hotfixed during an incident and never backported. Nobody knows which version is "right."

Your job: refactor these three snowflakes into a single VPC module that all environments share. Along the way, you'll learn why modules exist, how to build them, how to version them, and how to avoid the mistakes that make module upgrades terrifying.


Why Modules (The 3-Minute Case)

You could keep three separate VPC configs. Here's what happens:

| Without modules | With modules |
| --- | --- |
| Bug fix? Patch 3 files (and remember all 3) | Bug fix? Patch 1 module, all envs get it |
| New requirement? Add to 3 files (differently) | New requirement? Add once, parameterize |
| Security audit? Review 3 implementations | Security audit? Review 1 module |
| New engineer? "Which VPC file is canonical?" | New engineer? "Read the module" |
| Drift between envs? Guaranteed | Drift between envs? Limited to explicit input differences |

Mental Model: A Terraform module is a function. It takes inputs (variables), does work (creates resources), and returns outputs. Just like you wouldn't copy-paste a function body into three places in your code, you shouldn't copy-paste infrastructure definitions.

The three reasons modules exist, in order of importance:

  1. Consistency — every VPC looks the same because they come from the same code
  2. DRY — fix a bug once, not N times
  3. Governance — your platform team publishes the approved VPC module; app teams consume it

That third one is underappreciated. Modules aren't just about saving typing. They're how organizations enforce standards without writing policy documents nobody reads.


Module Anatomy: What's in the Box

A module is a directory of .tf files. That's it. No special syntax, no magic. Every Terraform configuration you've ever written is already a module — the "root module."

Here's the standard layout:

modules/vpc/
├── main.tf          # Resources — the actual infrastructure
├── variables.tf     # Inputs — what the caller passes in
├── outputs.tf       # Outputs — what the caller gets back
├── versions.tf      # Provider and Terraform version constraints
├── locals.tf        # Internal computed values
└── README.md        # How to use this module

Each file has a job:

| File | Purpose | Analogy |
| --- | --- | --- |
| variables.tf | Function parameters | def create_vpc(cidr, azs, environment): |
| main.tf | Function body | The actual resource creation logic |
| outputs.tf | Return values | return {"vpc_id": vpc.id, "subnet_ids": [...]} |
| versions.tf | Compatibility contract | "Works with Terraform >= 1.5 and AWS provider ~> 5.0" |
| locals.tf | Internal scratch space | Local variables you don't expose |

Trivia: The Terraform Registry enforces this layout. To publish a module, you need the standard structure, a GitHub repo with semantic version tags, and the naming convention terraform-<PROVIDER>-<NAME> (e.g., terraform-aws-vpc). The most downloaded module on the registry — terraform-aws-modules/vpc/aws — has been downloaded over 50 million times.
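The layout above lists versions.tf, but the walkthrough below never shows one. A minimal sketch matching the "compatibility contract" described in the table (the exact version numbers here are illustrative — pin to whatever you actually test against):

```hcl
# modules/vpc/versions.tf — illustrative version constraints
terraform {
  required_version = ">= 1.5"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # any 5.x release, never 6.x
    }
  }
}
```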


Building the VPC Module (Hands On)

Let's build the module that replaces those three snowflake VPCs. We'll start simple and add complexity as the requirements demand it.

Step 1: Variables — The Module's API

# modules/vpc/variables.tf

variable "vpc_name" {
  description = "Name prefix for all resources"
  type        = string
  validation {
    condition     = length(var.vpc_name) > 0 && length(var.vpc_name) <= 32
    error_message = "VPC name must be 1-32 characters."
  }
}

variable "vpc_cidr" {
  description = "CIDR block for the VPC (e.g., 10.0.0.0/16)"
  type        = string
  validation {
    condition     = can(cidrnetmask(var.vpc_cidr))
    error_message = "Must be a valid IPv4 CIDR block."
  }
}

variable "availability_zones" {
  description = "List of AZs to deploy into"
  type        = list(string)
  validation {
    condition     = length(var.availability_zones) >= 2
    error_message = "At least 2 AZs required for high availability."
  }
}

variable "environment" {
  description = "Environment name (dev, staging, prod)"
  type        = string
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}

variable "enable_nat_gateway" {
  type    = bool
  default = false
}

variable "single_nat_gateway" {
  type    = bool
  default = true
}

variable "enable_flow_logs" {
  type    = bool
  default = true     # Secure default — teams must explicitly opt out
}

variable "common_tags" {
  type    = map(string)
  default = {}
}

The validation blocks catch mistakes at terraform plan time. The CIDR validation uses can(cidrnetmask(...)) — a Terraform built-in that returns false if the CIDR is malformed.

Gotcha: Validation blocks can only reference their own variable. For cross-variable validation ("if NAT is enabled, you need at least 2 AZs"), use precondition blocks on resources (Terraform 1.2+).
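The gotcha's own example ("if NAT is enabled, you need at least 2 AZs") could look like this as a precondition — a sketch of the pattern, not part of the module built below:

```hcl
# Sketch: cross-variable check as a resource precondition (Terraform 1.2+)
resource "aws_vpc" "this" {
  cidr_block = var.vpc_cidr

  lifecycle {
    precondition {
      # Either NAT is off, or the caller provided enough AZs
      condition     = !var.enable_nat_gateway || length(var.availability_zones) >= 2
      error_message = "NAT gateways require at least 2 availability zones."
    }
  }
}
```

Unlike a variable validation block, a precondition can read any variable, local, or resource attribute in scope.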

Step 2: Resources — The Module's Body

# modules/vpc/main.tf

locals {
  nat_gateway_count = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : length(var.availability_zones)) : 0
  tags = merge(var.common_tags, { Environment = var.environment, ManagedBy = "terraform" })
}

resource "aws_vpc" "this" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true
  tags = merge(local.tags, { Name = "${var.vpc_name}-vpc" })
}

resource "aws_subnet" "public" {
  for_each = toset(var.availability_zones)

  vpc_id                  = aws_vpc.this.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, index(var.availability_zones, each.value))
  availability_zone       = each.value
  map_public_ip_on_launch = true
  tags = merge(local.tags, { Name = "${var.vpc_name}-public-${each.value}", Tier = "public" })
}

resource "aws_subnet" "private" {
  for_each = toset(var.availability_zones)

  vpc_id            = aws_vpc.this.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, index(var.availability_zones, each.value) + 100)
  availability_zone = each.value
  tags = merge(local.tags, { Name = "${var.vpc_name}-private-${each.value}", Tier = "private" })
}

resource "aws_internet_gateway" "this" {
  vpc_id = aws_vpc.this.id
  tags = merge(local.tags, { Name = "${var.vpc_name}-igw" })
}

Two things to notice:

  1. for_each instead of count. Adding an AZ creates one subnet. Removing an AZ from the middle of a count list would shift indexes and destroy/recreate everything after it.

  2. cidrsubnet for automatic CIDR math. cidrsubnet("10.0.0.0/16", 8, 0) gives 10.0.0.0/24, index 100 gives 10.0.100.0/24. No more fat-fingered CIDRs.

Under the Hood: cidrsubnet(prefix, newbits, netnum) adds newbits to the prefix length (/16 + 8 = /24) and selects the netnum-th network of that size. Binary math on IP addresses, guaranteed correct.
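A few worked values you can reproduce in terraform console (these are standard cidrsubnet results, not specific to this module):

```hcl
locals {
  # cidrsubnet(prefix, newbits, netnum)
  a = cidrsubnet("10.0.0.0/16", 8, 0)   # "10.0.0.0/24"   — first /24
  b = cidrsubnet("10.0.0.0/16", 8, 1)   # "10.0.1.0/24"   — second /24
  c = cidrsubnet("10.0.0.0/16", 8, 100) # "10.0.100.0/24" — the private-subnet offset
  d = cidrsubnet("10.0.0.0/16", 4, 1)   # "10.0.16.0/20"  — fewer, larger subnets
}
```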

Step 3: Outputs — The Module's Return Values

# modules/vpc/outputs.tf

output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.this.id
}

output "vpc_cidr" {
  description = "CIDR block of the VPC"
  value       = aws_vpc.this.cidr_block
}

output "public_subnet_ids" {
  description = "IDs of public subnets, keyed by AZ"
  value       = { for az, subnet in aws_subnet.public : az => subnet.id }
}

output "private_subnet_ids" {
  description = "IDs of private subnets, keyed by AZ"
  value       = { for az, subnet in aws_subnet.private : az => subnet.id }
}

output "nat_gateway_ids" {
  description = "IDs of NAT gateways (empty if NAT disabled)"
  value       = [for nat in aws_nat_gateway.this : nat.id]
}
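The nat_gateway_ids output above references aws_nat_gateway.this, which the Step 2 listing omits. A minimal sketch of those resources, wired to the nat_gateway_count local from Step 2 (naming and tagging here are assumptions following the module's conventions):

```hcl
# modules/vpc/main.tf (continued) — sketch of the NAT gateway resources
resource "aws_eip" "nat" {
  count  = local.nat_gateway_count
  domain = "vpc"
  tags   = merge(local.tags, { Name = "${var.vpc_name}-nat-eip-${count.index}" })
}

resource "aws_nat_gateway" "this" {
  count         = local.nat_gateway_count
  allocation_id = aws_eip.nat[count.index].id
  # Public subnets are keyed by AZ name, so index into the AZ list first
  subnet_id     = aws_subnet.public[var.availability_zones[count.index]].id
  tags          = merge(local.tags, { Name = "${var.vpc_name}-nat-${count.index}" })
  depends_on    = [aws_internet_gateway.this]
}
```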

Outputs are your module's API contract. Downstream callers depend on these names and types. Change an output name and you break every caller. This is why output stability matters — and why semantic versioning matters for modules.


Calling the Module: Three Environments, One Source

Now the payoff. Each environment is a thin wrapper:

# environments/dev/main.tf
module "vpc" {
  source = "../../modules/vpc"

  vpc_name           = "dev"
  vpc_cidr           = "10.0.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b"]
  environment        = "dev"
  enable_nat_gateway = false
}

# environments/prod/main.tf
module "vpc" {
  source = "../../modules/vpc"

  vpc_name           = "prod"
  vpc_cidr           = "10.2.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
  environment        = "prod"
  enable_nat_gateway = true
  single_nat_gateway = false
}

Same module, different inputs. Dev skips the NAT gateway (~$32/month savings). Prod gets one per AZ. Both get flow logs because the module defaults to true.

Remember: Module defaults are your governance lever. Set enable_flow_logs = true by default, and teams must explicitly opt out. The PR review catches the opt-out — compare that to a policy doc nobody reads.


Flashcard Check #1

Cover the right column. Test yourself.

| Question | Answer |
| --- | --- |
| What three files does every module need at minimum? | main.tf (resources), variables.tf (inputs), outputs.tf (return values) |
| Why for_each instead of count for subnets? | count uses numeric indexes — removing an item shifts all subsequent indexes, causing destroy/recreate. for_each uses stable keys. |
| What does can(cidrnetmask(var.vpc_cidr)) do in a validation block? | Returns true if the string is a valid CIDR block, false otherwise. Catches malformed CIDRs at plan time. |
| How do you access a module's output? | module.<NAME>.<OUTPUT> — e.g., module.vpc.vpc_id |
| What's the Terraform Registry naming convention? | terraform-<PROVIDER>-<NAME> — e.g., terraform-aws-vpc |

Local vs. Remote Modules: Where Does the Code Live?

So far we used a local path (source = "../../modules/vpc"). That works for a single repo. But when multiple repos need the same module, or when you need versioning, local paths break down.

# Local path — no versioning, tied to repo structure
module "vpc" {
  source = "../../modules/vpc"
}

# Terraform Registry — versioned, discoverable, documented
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.5.0"
}

# GitHub — versioned via Git tags
module "vpc" {
  source = "git::https://github.com/mycompany/terraform-modules.git//vpc?ref=v2.1.0"
}

# S3 — for air-gapped or private environments
module "vpc" {
  source = "s3::https://s3-us-east-1.amazonaws.com/mycompany-modules/vpc/v2.1.0.zip"
}

| Source | Versioning | Best for |
| --- | --- | --- |
| Local path | None (whatever's on disk) | Rapid iteration within one repo |
| Terraform Registry | Semantic version constraints | Public modules, shared across orgs |
| Git (GitHub/GitLab) | Tag or SHA ref | Private modules, org-wide sharing |
| S3/GCS | Directory per version | Air-gapped environments, artifact-based workflows |

Gotcha: Every time you change the source of a module, you must run terraform init (or terraform init -upgrade). Terraform caches modules in .terraform/modules/ and won't notice a source change without re-initialization.


The Versioning Problem (Or: Why "Latest" Is a Four-Letter Word)

Here's a story that happens at every organization using Terraform at scale.

War Story: A platform team published v2.0.0 of their VPC module. It renamed an output from private_subnets to private_subnet_ids for consistency. Reasonable change, clearly a major version bump. But 12 application teams had source pointed at the module without version pinning. Monday morning, 50 CI pipelines broke simultaneously. Engineers across 4 time zones filed tickets against the platform team. The fix took 10 minutes per team — just pin the version — but coordinating it took two days. The postmortem action item: "All module references MUST include a version constraint."

The rules:

# BAD — pulls latest on every terraform init
module "vpc" {
  source = "terraform-aws-modules/vpc/aws"
}

# BAD — pins to a Git branch that can change under you
module "vpc" {
  source = "git::https://github.com/mycompany/terraform-modules.git//vpc?ref=main"
}

# GOOD — exact version pin
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.5.0"
}

# GOOD — allows patch updates but not minor/major
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.5.0"    # >= 5.5.0, < 5.6.0
}

# GOOD — immutable Git SHA
module "vpc" {
  source = "git::https://github.com/mycompany/terraform-modules.git//vpc?ref=abc123def"
}

Name Origin: The ~> operator is borrowed from Ruby's Bundler, where it's called the "twiddle-wakka" or "pessimistic version constraint." ~> 5.5.0 means ">= 5.5.0, < 5.6.0" (patches only). ~> 5.0 means ">= 5.0, < 6.0" (minor updates too).


Module Composition: Modules Calling Modules

Real infrastructure is modules wired together. Outputs from one become inputs to the next:

# environments/prod/main.tf — root module composes everything

module "network" {
  source             = "../../modules/vpc"
  vpc_name           = "prod"
  vpc_cidr           = "10.2.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
  environment        = "prod"
  enable_nat_gateway = true
  single_nat_gateway = false
}

module "eks" {
  source       = "../../modules/eks"
  cluster_name = "prod-cluster"
  vpc_id       = module.network.vpc_id                  # network → EKS
  subnet_ids   = values(module.network.private_subnet_ids)
}

module "rds" {
  source     = "../../modules/rds"
  identifier = "prod-db"
  vpc_id     = module.network.vpc_id                    # network → RDS
  subnet_ids = values(module.network.private_subnet_ids)
}

Terraform builds the dependency graph from these references: network first, then EKS + RDS in parallel. No depends_on needed.

Mental Model: Module composition is LEGO. The VPC module is the baseplate. EKS and RDS snap onto it. Each piece has defined connection points (outputs/inputs). Swap the RDS module for Aurora without rebuilding the baseplate — as long as it provides the same outputs.

The Two-Level Rule

Keep nesting to two levels max. Deeper nesting makes state paths unreadable:

module.network.aws_vpc.this                              # Good — readable
module.platform.module.network.aws_vpc.this              # Pain starts here
module.prod.module.platform.module.network.aws_vpc.this  # Nobody can debug this

for_each and count at the Module Level

Since Terraform 0.13, you can use for_each on module blocks — create per-region infrastructure from a single definition:

module "vpc" {
  source   = "../../modules/vpc"
  for_each = {
    "us-east-1" = { cidr = "10.0.0.0/16", azs = ["us-east-1a", "us-east-1b", "us-east-1c"] }
    "eu-west-1" = { cidr = "10.1.0.0/16", azs = ["eu-west-1a", "eu-west-1b"] }
  }

  vpc_name           = "prod-${each.key}"
  vpc_cidr           = each.value.cidr
  availability_zones = each.value.azs
  environment        = "prod"
  enable_nat_gateway = true
}

# Access: module.vpc["us-east-1"].vpc_id

Adding ap-southeast-1 later only creates new resources — existing regions are untouched.

Gotcha: for_each keys must be known at plan time. If keys come from a data source, you get: The "for_each" map includes keys derived from resource attributes that cannot be determined until apply. Fix: use static keys, not dynamic ones.


Testing Modules: Trust, But Verify

The HCL-Native Testing Framework (Terraform 1.6+)

Terraform has a built-in test framework. Test files use .tftest.hcl extension:

# tests/vpc.tftest.hcl

variables {
  vpc_name           = "test"
  vpc_cidr           = "10.99.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b"]
  environment        = "dev"
  enable_nat_gateway = false
  enable_flow_logs   = false
}

# Plan-only test — fast, free, no real resources
run "validates_inputs" {
  command = plan

  assert {
    condition     = aws_vpc.this.cidr_block == "10.99.0.0/16"
    error_message = "VPC CIDR doesn't match input."
  }

  assert {
    condition     = length(aws_subnet.public) == 2
    error_message = "Expected 2 public subnets."
  }
}

# Test input validation catches bad CIDRs
run "rejects_invalid_cidr" {
  command = plan
  expect_failures = [var.vpc_cidr]

  variables {
    vpc_cidr = "not-a-cidr"
  }
}
terraform test                                      # Run all tests
terraform test -filter=tests/vpc.tftest.hcl         # Specific file
terraform test -verbose                             # See each assertion

| Test type | command = plan | command = apply |
| --- | --- | --- |
| Speed | Seconds | Minutes |
| Cost | Free | Creates real cloud resources |
| Catches | Config errors, validation, logic | API errors, permission issues, provider bugs |
| Use when | Fast feedback during development | CI pipeline before publishing a module version |

Under the Hood: terraform test creates an isolated state per test run. Apply tests create real infrastructure, run assertions, then destroy everything at the end. If a test crashes mid-run, resources are orphaned. Test environments need aggressive cost alerts.

Before the native framework, Terratest (a Go library) was the standard. It's still useful for cross-module integration tests and validating things the Terraform provider doesn't expose — like making HTTP calls to deployed services.


Flashcard Check #2

| Question | Answer |
| --- | --- |
| What does version = "~> 5.5.0" mean? | >= 5.5.0 and < 5.6.0 (patch updates only) |
| Why should you never point a module source at a Git branch like main? | Branches change — your next terraform init could pull breaking changes without warning |
| What command do you run after changing a module's source? | terraform init (or terraform init -upgrade) |
| What's the max recommended nesting depth for modules? | Two levels (root → child → grandchild) |
| terraform test with command = plan vs command = apply — which costs money? | apply creates real cloud resources; plan is free |
| What does expect_failures do in a test block? | Asserts that the specified variable or resource validation should fail — used to test that input validation catches bad inputs |

Anti-Patterns: Modules Gone Wrong

The God Module

# DON'T: One module that creates everything
module "platform" {
  source = "../../modules/platform"

  # 47 input variables covering VPC, EKS, RDS, ElastiCache,
  # S3, CloudFront, Route53, ACM, WAF, and monitoring
  vpc_cidr            = "10.0.0.0/16"
  cluster_name        = "prod"
  db_instance_class   = "db.r6g.xlarge"
  cache_node_type     = "cache.r6g.large"
  # ... 43 more variables ...
}

A god module has the blast radius of a monolith. Change anything, risk everything. It's also impossible to test — you can't test the VPC logic without also provisioning an EKS cluster and a database.

Fix: One module per concern. A VPC module, an EKS module, an RDS module. Compose them in the root module.

Hidden Provider Configuration

Modules should never contain provider blocks — it hardcodes the region/account. The caller passes providers in via the providers argument.
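A sketch of that pattern — the root module owns the provider configuration and hands it to the child (the alias name is illustrative):

```hcl
# Root module configures the provider; the child module never does
provider "aws" {
  alias  = "use1"
  region = "us-east-1"
}

module "vpc" {
  source = "../../modules/vpc"

  providers = {
    aws = aws.use1
  }
  # ... inputs ...
}
```

The same child module can then be instantiated twice with different provider aliases — for example, one VPC per region — without touching the module's code.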

Circular Dependencies

Module A outputs a security group. Module B uses it and outputs a subnet. Module A needs that subnet. Terraform can't resolve cycles. Fix: extract shared resources into a third module, or restructure so dependencies flow one direction.

Overly Generic Inputs

map(any) hides required structure. Use typed objects instead — they're self-documenting and catch type errors at plan time:

# DON'T
variable "config" {
  type = map(any)
}

# DO
variable "config" {
  type = object({
    cidr = string
    azs  = list(string)
  })
}

Module Governance: Scaling Beyond One Team

At scale, you need guardrails: private registries (Terraform Cloud, Artifactory, or S3-backed) for hosting approved modules, and policy-as-code (Sentinel or OPA) to enforce rules on plans before they apply:

# OPA: require encryption on all S3 buckets
deny[msg] {
    resource := input.resource_changes[_]
    resource.type == "aws_s3_bucket"
    resource.change.actions[_] == "create"
    not resource.change.after.server_side_encryption_configuration
    msg := sprintf("S3 bucket %s must have encryption enabled", [resource.name])
}

The pipeline: Engineer writes Terraform → CI runs terraform plan → Plan JSON evaluated against policies → Violations block the apply.

Interview Bridge: "How would you enforce infrastructure standards across 50 teams?" Modules encode how to build things right. Sentinel/OPA enforce that teams must use them.


War Story: The Module Upgrade That Broke 50 Environments

War Story: An infrastructure team maintained a shared RDS module used by 50 service teams. Version 3.2.0 added a parameter group with log_min_duration_statement = 1000 (log queries over 1 second). Sensible default. But the parameter group name was derived from the database identifier using a new naming scheme. When teams upgraded from 3.1.x to 3.2.0, Terraform detected the parameter group name change and planned a replacement — which on RDS means a database reboot. Fifty databases, fifty reboots. The teams that ran terraform plan caught it. The three teams that had auto-approve in CI did not. Three production databases rebooted during business hours. The fix: the module team released 3.2.1 within hours, using lifecycle { create_before_destroy = true } on the parameter group and preserving the old naming scheme with a deprecation notice. The postmortem action items: (1) Never change resource naming schemes in a minor version. (2) All module upgrades require terraform plan review in a PR before apply. (3) auto-approve in production CI is banned.

This story illustrates why module versioning isn't academic. A naming change in a module can cascade into infrastructure destruction across dozens of teams. Semantic versioning is a contract: patch versions fix bugs, minor versions add features without breaking existing behavior, major versions may break things.


Exercises

Exercise 1: Spot the Anti-Pattern (2 minutes)

What's wrong with this module call?

module "vpc" {
  source = "git::https://github.com/company/tf-modules.git//vpc?ref=main"

  config = {
    cidr = "10.0.0.0/16"
    azs  = ["us-east-1a"]
  }
}
Solution

Three problems:

  1. ref=main — pointing at a branch, not a version tag. Any push to main changes what you get on terraform init.
  2. Single AZ — no high availability. The module should validate that at least 2 AZs are provided.
  3. config = {} — untyped map input. Should be individual typed variables for clarity and validation.

Fixed:
module "vpc" {
  source = "git::https://github.com/company/tf-modules.git//vpc?ref=v2.1.0"

  vpc_cidr           = "10.0.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b"]
  environment        = "dev"
}

Exercise 2: Cross-Variable Validation (5 minutes)

Write a precondition that allows any instance_type in production but restricts non-production to t3.micro, t3.small, and t3.medium.

Solution
resource "aws_instance" "this" {
  ami           = var.ami_id
  instance_type = var.instance_type

  lifecycle {
    precondition {
      condition = (
        var.environment == "prod" ||
        contains(["t3.micro", "t3.small", "t3.medium"], var.instance_type)
      )
      error_message = "Non-prod is limited to t3.micro/small/medium."
    }
  }
}

Exercise 3: Refactor Copy-Paste into a Module (15 minutes)

You have two nearly identical security group definitions — one in dev/ and one in prod/. They differ only in allowed CIDR ranges and the VPC ID. Create a module at modules/web-sg/ that takes vpc_id, allowed_cidrs, and environment as inputs, and outputs the security group ID.

Solution
# modules/web-sg/variables.tf
variable "vpc_id"        { type = string }
variable "allowed_cidrs" { type = list(string) }
variable "environment"   { type = string }

# modules/web-sg/main.tf
resource "aws_security_group" "web" {
  name_prefix = "${var.environment}-web-"
  vpc_id      = var.vpc_id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = var.allowed_cidrs
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = { Name = "${var.environment}-web-sg", ManagedBy = "terraform" }
}

# modules/web-sg/outputs.tf
output "security_group_id" { value = aws_security_group.web.id }
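A caller wiring the solution module into an environment might look like this (the surrounding names — module.vpc, var.ami_id — are hypothetical):

```hcl
# environments/dev/main.tf — hypothetical caller
module "web_sg" {
  source        = "../../modules/web-sg"
  vpc_id        = module.vpc.vpc_id
  allowed_cidrs = ["203.0.113.0/24"]
  environment   = "dev"
}

resource "aws_instance" "web" {
  ami                    = var.ami_id
  instance_type          = "t3.micro"
  vpc_security_group_ids = [module.web_sg.security_group_id]
}
```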

Cheat Sheet

Pin this to your wall.

| Task | Command / Syntax |
| --- | --- |
| Initialize modules | terraform init |
| Update module version | Change version, run terraform init -upgrade |
| List modules in state | terraform state list \| grep module |
| Move resource into module | terraform state mv aws_vpc.main module.network.aws_vpc.this |
| Module output reference | module.<NAME>.<OUTPUT> |
| Version pin (exact) | version = "5.5.0" |
| Version pin (patch range) | version = "~> 5.5.0" (>= 5.5.0, < 5.6.0) |
| Version pin (minor range) | version = "~> 5.0" (>= 5.0, < 6.0) |
| Run module tests | terraform test |
| Run specific test | terraform test -filter=tests/vpc.tftest.hcl |
| Validate config | terraform validate |
| Format module code | terraform fmt -recursive |

Module design rules of thumb:

| Rule | Why |
| --- | --- |
| One module per concern | Blast radius, testability |
| No provider blocks inside modules | Caller controls region/account |
| Typed variables, not map(any) | Self-documenting, validates at plan time |
| Minimal outputs | Fewer outputs = smaller API surface = fewer breaking changes |
| Default to secure | enable_encryption = true, enable_flow_logs = true |
| Max 2 levels of nesting | Deeper nesting = unreadable state paths |
| Pin versions in production | Unpinned modules are ticking time bombs |

Takeaways

  • Modules are functions for infrastructure. Inputs, logic, outputs. If you're copy-pasting .tf files between directories, you need a module.

  • Version pinning is not optional. An unpinned module source is a production incident waiting for someone to push a breaking change upstream.

  • for_each over count, always. Index-based addressing (count) causes cascade destruction when you remove items. Key-based addressing (for_each) is surgical.

  • Module defaults are governance. Set secure defaults (encryption = true, flow_logs = true) and make teams explicitly opt out. The PR review becomes the policy enforcement.

  • Test modules before publishing. The native terraform test framework catches bugs at plan time for free. Apply tests catch the rest.

  • Keep modules small. A module that creates a VPC is good. A module that creates a VPC, EKS cluster, RDS database, and monitoring stack is a liability.