
How We Got Here: Infrastructure as Code

Arc: Infrastructure · Eras covered: 6 · Timeline: ~2010-2025 · Read time: ~12 min


The Original Problem

In 2010, provisioning cloud infrastructure meant clicking through the AWS Management Console. You'd click "Launch Instance," choose an AMI, pick a security group (or create one inline), attach an EBS volume, and hope you remembered all the settings when you needed to do it again. There was no record of what you did, no way to reproduce it, and no way to review it. "Infrastructure" lived in someone's head and in the console's current state. When that person left the company, the knowledge left with them.

Worse, the console was the audit trail. "Who created this S3 bucket with public access?" Nobody knew. CloudTrail existed but nobody read it proactively. The gap between "we should track infrastructure changes" and "we actually do" was enormous.


Era 1: ClickOps and Console Cowboying (~2006-2012)

The Solution

The AWS Console, Azure Portal, and GCP Console were the primary interfaces. For slightly more sophisticated teams, the AWS CLI and SDKs provided scriptable access. But most infrastructure was created and modified through web UIs.

What It Looked Like

# The "infrastructure as code" of 2010:
1. Log into AWS Console
2. Navigate to EC2 → Launch Instance
3. Click through 7 configuration screens
4. Forget to add the tag for cost allocation
5. Realize 3 days later you chose the wrong subnet
6. Create another instance in the right subnet
7. Forget to terminate the old one
8. Get a $400 bill surprise at month-end

Why It Was Better

  • Accessible to anyone who could use a web browser
  • Visual feedback — you could see what you were creating
  • No tooling to install or learn

Why It Wasn't Enough

  • Not reproducible — doing it again required remembering every click
  • Not reviewable — no pull request for infrastructure changes
  • Not auditable — who changed what, when, and why?
  • Not testable — can't unit test a click sequence
  • Environments diverged immediately (dev never matched prod)

Legacy You'll Still See

ClickOps is alive and well. Many teams still create "quick" resources through the console, intending to codify them later (they don't). The AWS Console is often the first place people go to debug — and sometimes the first place they go to "fix" things, bypassing their own IaC pipeline.


Era 2: Imperative Scripts and SDKs (~2010-2014)

The Solution

Teams wrote scripts using cloud SDKs (boto for Python/AWS, fog for Ruby, azure-sdk for .NET) that called cloud APIs directly. This was at least reproducible and version-controllable.

What It Looked Like

# boto script to create a VPC, subnet, and instance (~2012)
import boto.ec2
import boto.vpc

conn_vpc = boto.vpc.connect_to_region('us-east-1')
vpc = conn_vpc.create_vpc('10.0.0.0/16')
subnet = conn_vpc.create_subnet(vpc.id, '10.0.1.0/24')

conn_ec2 = boto.ec2.connect_to_region('us-east-1')
reservation = conn_ec2.run_instances(
    'ami-12345678',
    instance_type='t1.micro',  # period-accurate: the t2 family didn't launch until 2014
    subnet_id=subnet.id,
    key_name='deploy-key',
)

Why It Was Better

  • Scriptable and repeatable
  • Could be version controlled in Git
  • Could be parameterized (different values for dev/staging/prod)
  • Teams could build libraries of common patterns

Why It Wasn't Enough

  • Imperative: scripts described steps, not desired state
  • No built-in idempotency — running twice created duplicate resources
  • No dependency graph — order mattered and was error-prone
  • No state tracking — scripts didn't know what already existed
  • Deleting resources required separate teardown scripts
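The idempotency gap was the defining flaw: a raw SDK script calls create unconditionally, so running it twice makes two VPCs. Teams bolted the guard on by hand — describe first, create only if absent. A minimal sketch of that pattern with an in-memory stand-in for the cloud API (names are illustrative; no real AWS calls):

```python
# In-memory stand-in for the cloud: Name tag -> resource id.
# A real script would call DescribeVpcs / CreateVpc here instead.
cloud = {}

def ensure_vpc(name, cidr):
    """Hand-rolled idempotency: look the resource up, create only if missing."""
    if name in cloud:                     # the "describe" call
        return cloud[name]                # reuse instead of duplicating
    vpc_id = "vpc-%08d" % len(cloud)      # the "create" call
    cloud[name] = vpc_id
    return vpc_id

first = ensure_vpc("production", "10.0.0.0/16")
second = ensure_vpc("production", "10.0.0.0/16")  # rerun: no duplicate
assert first == second and len(cloud) == 1
```

Declarative tools (Eras 3 and later) internalize exactly this describe-compare-create loop, which is a large part of why they displaced raw scripts.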

Legacy You'll Still See

SDK-based provisioning scripts persist among data engineering and ML teams that build custom automation. Lambda functions that create resources on demand use this pattern. "Quick scripts" for one-off tasks often follow this model.


Era 3: CloudFormation and ARM Templates (~2011-2016)

The Solution

AWS CloudFormation (2011) introduced declarative infrastructure. You wrote a JSON (later YAML) template describing your desired resources, and CloudFormation created, updated, and deleted them to match. Azure Resource Manager (ARM) templates (2014) followed the same model.

What It Looked Like

# CloudFormation template
AWSTemplateFormatVersion: '2010-09-09'
Resources:
  MyVPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      Tags:
        - Key: Name
          Value: production-vpc

  WebSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref MyVPC
      CidrBlock: 10.0.1.0/24
      AvailabilityZone: us-east-1a

  WebServer:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t3.micro
      SubnetId: !Ref WebSubnet
      ImageId: ami-0abcdef1234567890

Why It Was Better

  • Declarative: describe what you want, not how to create it
  • Automatic dependency resolution
  • Stack-based lifecycle: create, update, delete as a unit
  • Drift detection (eventually)
  • Free — no additional cost beyond the resources themselves
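"Automatic dependency resolution" means the engine builds a graph from the template's !Ref edges and creates resources in dependency order — the ordering Era 2 scripts had to get right by hand. An illustrative sketch of that ordering using the three resources above (not CloudFormation's actual implementation):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# resource -> resources it !Ref's (the edges in the template above)
refs = {
    "MyVPC": set(),
    "WebSubnet": {"MyVPC"},
    "WebServer": {"WebSubnet"},
}

# static_order() yields every resource only after all of its dependencies
order = list(TopologicalSorter(refs).static_order())
assert order.index("MyVPC") < order.index("WebSubnet") < order.index("WebServer")
```

Deletion runs the same graph in reverse, which is why a stack tears down cleanly as a unit.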

Why It Wasn't Enough

  • Vendor-locked: CloudFormation is AWS-only, ARM is Azure-only
  • Verbose: simple infrastructure required hundreds of lines of YAML/JSON
  • Error messages were cryptic ("UPDATE_ROLLBACK_FAILED" was feared)
  • Rollback behavior was unpredictable and sometimes destructive
  • No real programming constructs (loops, conditionals were hacks)

Legacy You'll Still See

CloudFormation is still heavily used, especially in organizations where AWS is the only cloud. Many CDK and SAM projects compile down to CloudFormation. ARM templates persist in Azure-first enterprises. If you see a 2000-line YAML file with AWSTemplateFormatVersion at the top, you're in this era.


Era 4: Terraform (~2014-2022)

The Solution

HashiCorp released Terraform in 2014. It used a custom language (HCL) that was more readable than JSON/YAML, supported multiple cloud providers through a plugin system, and maintained a state file that tracked what it had created. For the first time, one tool could manage AWS, Azure, GCP, and dozens of other providers.

What It Looked Like

# main.tf
provider "aws" {
  region = "us-east-1"
}

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
  tags = {
    Name = "production-vpc"
  }
}

resource "aws_subnet" "web" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.1.0/24"
  availability_zone = "us-east-1a"
}

resource "aws_instance" "web" {
  ami           = "ami-0abcdef1234567890"
  instance_type = "t3.micro"
  subnet_id     = aws_subnet.web.id
}

terraform init    # download providers, set up the backend
terraform plan    # show what will change
terraform apply   # make it happen

Why It Was Better

  • Multi-cloud with a single tool and language
  • Plan before apply — see changes before they happen
  • State file tracks resource-to-code mapping
  • Module system for reusable components (Terraform Registry)
  • Massive community and provider ecosystem

Why It Wasn't Enough

  • State file management became its own discipline (locking, backends, drift)
  • HCL is not a real programming language — complex logic is awkward
  • The HashiCorp BSL license change (2023) shook community trust
  • Large state files became slow and fragile
  • Terraform Cloud/Enterprise pushed a commercial model that not everyone wanted
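The first bullet is worth making concrete: "state management as a discipline" in practice starts with a remote backend with locking, configured before any resources. A minimal sketch (the bucket and table names are illustrative and must be created beforehand):

```hcl
terraform {
  backend "s3" {
    bucket         = "example-tf-state"      # illustrative: pre-created S3 bucket
    key            = "prod/network.tfstate"  # one state file per environment/stack
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "example-tf-locks"      # illustrative: lock table against concurrent applies
  }
}
```

Skip the lock table and two concurrent terraform apply runs can corrupt state — exactly the operational sharp edge the bullet refers to.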

Legacy You'll Still See

Terraform is the most widely used IaC tool today. Most job postings that mention IaC mean Terraform. The OpenTofu fork (after the license change) is gaining traction but the ecosystem is still Terraform-centric. If you do IaC professionally, you will write HCL.


Era 5: Pulumi and CDK (~2018-2024)

The Solution

AWS CDK (2018) and Pulumi (2018) asked: why invent a new language when developers already know TypeScript, Python, Go, and Java? Both let you define infrastructure using real programming languages with real IDEs, real debuggers, real testing frameworks, and real abstractions like classes and functions.

What It Looked Like

// Pulumi — TypeScript
import * as aws from "@pulumi/aws";

const vpc = new aws.ec2.Vpc("main", {
  cidrBlock: "10.0.0.0/16",
  tags: { Name: "production-vpc" },
});

const subnet = new aws.ec2.Subnet("web", {
  vpcId: vpc.id,
  cidrBlock: "10.0.1.0/24",
  availabilityZone: "us-east-1a",
});

const server = new aws.ec2.Instance("web", {
  ami: "ami-0abcdef1234567890",
  instanceType: "t3.micro",
  subnetId: subnet.id,
});

# AWS CDK — Python
from aws_cdk import Stack, aws_ec2 as ec2
from constructs import Construct

class WebStack(Stack):
    def __init__(self, scope: Construct, id: str, **kwargs):
        super().__init__(scope, id, **kwargs)
        vpc = ec2.Vpc(self, "MainVpc", max_azs=2)
        ec2.Instance(self, "WebServer",
            vpc=vpc,
            instance_type=ec2.InstanceType("t3.micro"),
            machine_image=ec2.AmazonLinuxImage(),
        )

Why It Was Better

  • Real programming languages: loops, conditionals, type checking, IDE support
  • Testable with standard testing frameworks (Jest, pytest)
  • Reusable abstractions using classes, functions, packages
  • CDK compiles to CloudFormation — familiar deployment model
  • Pulumi manages its own state — no separate state file gymnastics
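The "real language" bullet is concrete, not cosmetic: carving a VPC into per-AZ subnets is one comprehension over the standard library, where HCL needs cidrsubnet() arithmetic inside for_each. A standalone sketch (plain Python, no cloud calls; the function name is illustrative):

```python
import ipaddress

def carve_subnets(vpc_cidr, azs, new_prefix=24):
    """Split a VPC CIDR into one subnet per availability zone."""
    nets = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    return {az: str(net) for az, net in zip(azs, nets)}

subnets = carve_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b"])
assert subnets["us-east-1a"] == "10.0.0.0/24"
assert subnets["us-east-1b"] == "10.0.1.0/24"
```

Because this is ordinary code, it unit-tests with pytest like any other function — the testability bullet above, in practice.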

Why It Wasn't Enough

  • CDK is AWS-only (CDKTF for Terraform exists but is a second-class citizen)
  • Generated CloudFormation templates are enormous and hard to debug
  • "Turing-complete IaC" can lead to over-engineering
  • Pulumi's managed state service is a commercial dependency
  • The community is smaller than Terraform's — fewer examples, fewer modules

Legacy You'll Still See

CDK is growing fast in AWS-heavy shops. Pulumi is popular with developer-centric teams. Both coexist with Terraform — often in the same organization. The "should IaC be a DSL or a real language?" debate is ongoing.


Era 6: Crossplane and Control Plane IaC (~2022-2025)

The Solution

Crossplane (2019, mainstream ~2022) brings IaC into Kubernetes. Instead of running terraform apply from a CI pipeline, you declare cloud resources as Kubernetes custom resources. The Crossplane controller reconciles them continuously, just like Kubernetes reconciles pods. Infrastructure becomes another Kubernetes workload.
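Continuous reconciliation is the core idea: a controller repeatedly compares desired state (the custom resource) against observed state (the cloud) and issues whatever calls close the gap. A minimal in-memory sketch of one level-triggered pass (illustrative, not Crossplane's implementation):

```python
def reconcile(desired, observed):
    """One pass: mutate observed to match desired, return the actions taken."""
    actions = []
    for name, spec in desired.items():
        if name not in observed:
            actions.append(("create", name))
            observed[name] = dict(spec)
        elif observed[name] != spec:
            actions.append(("update", name))   # drift detected and corrected
            observed[name] = dict(spec)
    for name in [n for n in observed if n not in desired]:
        actions.append(("delete", name))       # resource no longer declared
        del observed[name]
    return actions

desired = {"production-db": {"engine": "postgres", "class": "db.t3.micro"}}
observed = {"production-db": {"engine": "postgres", "class": "db.t3.large"}}  # drifted
assert reconcile(desired, observed) == [("update", "production-db")]
assert reconcile(desired, observed) == []  # converged: next pass is a no-op
```

The controller runs this loop forever, which is why manual console edits get reverted rather than silently persisting — the opposite of Terraform's apply-then-walk-away model.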

What It Looked Like

# Crossplane Composition — a managed Postgres
apiVersion: database.aws.crossplane.io/v1beta1
kind: RDSInstance
metadata:
  name: production-db
spec:
  forProvider:
    region: us-east-1
    dbInstanceClass: db.t3.micro
    engine: postgres
    engineVersion: "15"
    masterUsername: admin
  writeConnectionSecretToRef:
    name: db-credentials
    namespace: production

Why It Was Better

  • Continuous reconciliation — drift is automatically corrected
  • Uses Kubernetes RBAC, namespaces, and policies for access control
  • Compositions let platform teams build self-service abstractions
  • Single control plane for both workloads and infrastructure
  • GitOps-native — ArgoCD can manage infrastructure and applications

Why It Wasn't Enough

  • Requires Kubernetes — which is a significant prerequisite
  • Provider coverage lags behind Terraform
  • Debugging is harder (Kubernetes events + cloud API errors)
  • Community is smaller and documentation is thinner
  • The "everything through Kubernetes" philosophy is not universally embraced

Legacy You'll Still See

Crossplane is gaining adoption in platform engineering teams but is far from mainstream. Most organizations still use Terraform. The pattern of "infrastructure as Kubernetes resources" is the direction, but the tooling is still maturing.


Where We Are Now

Terraform dominates, with CDK and Pulumi growing in developer-centric teams. Crossplane is emerging for platform engineering use cases. CloudFormation persists in AWS-only shops. Most organizations use one primary tool with exceptions for edge cases. The state management problem (Terraform state, CloudFormation stacks, Pulumi state) remains one of the biggest operational challenges.

Where It's Going

The convergence of IaC and GitOps is the clearest trend — infrastructure declared in Git, reconciled continuously by controllers. AI-assisted IaC generation (describe what you want, get working code) is arriving but not yet reliable for production use. The most impactful near-term change may be the OpenTofu/Terraform split forcing organizations to choose sides.

The Pattern

Every generation of IaC tries to be the single abstraction layer between intent and infrastructure. The winning tool is always the one with the largest ecosystem of providers and modules, because infrastructure diversity is the fundamental challenge.

Key Takeaway for Practitioners

Learn Terraform deeply — it's the lingua franca. But understand that IaC is a practice, not a tool. The discipline of "all infrastructure changes go through code review" matters more than which tool generates the API calls.

Cross-References