How We Got Here: Configuration Management¶
Arc: Infrastructure · Eras covered: 6 · Timeline: ~2005-2025 · Read time: ~12 min
The Original Problem¶
In 2005, if you had 50 servers, you configured them by SSH'ing into each one. Maybe you had a wiki page with a checklist: "install Apache, edit /etc/httpd/conf/httpd.conf, set MaxClients to 256, restart." When the wiki got out of date — and it always got out of date — servers drifted. Server #23 had a different timezone. Server #41 was missing a security patch. Server #7 had a config file hand-edited three months ago and nobody remembered why. When something broke, you couldn't tell if the problem was the code, the config, or the drift.
Configuration drift was the silent killer of uptime. Not dramatic outages — just slow, grinding unreliability as identical servers became quietly unique.
Era 1: Shell Scripts and Golden Images (~2005-2008)¶
The Solution¶
Teams wrote bash scripts that automated their wiki checklists. More sophisticated shops built "golden images" — a fully configured VM snapshot that served as the template for new servers. The script was the source of truth, and you ran it on every new machine.
What It Looked Like¶
#!/bin/bash
# setup-webserver.sh — the "configuration management" of 2005
yum install -y httpd mod_ssl php
cp /nfs/configs/httpd.conf /etc/httpd/conf/httpd.conf
cp /nfs/configs/php.ini /etc/php.ini
chkconfig httpd on
service httpd start
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT
service iptables save
Why It Was Better¶
- Repeatable — run the same script on every server
- Documented — the script was the documentation
- Faster than manual SSH sessions
- Version controllable (if you put it in SVN)
Why It Wasn't Enough¶
- Scripts were imperative: they described steps, not desired state
- Running a script twice could break things (not idempotent)
- No way to detect or correct drift after initial setup
- Error handling was primitive (set -e and hope)
- Scripts didn't compose — 50 scripts for 50 concerns became unmaintainable
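The idempotency gap in particular has a well-known fix that later tools baked in: check current state before acting. A minimal sketch of the difference, using scratch files rather than real configs so it runs anywhere:

```shell
#!/bin/bash
# Why re-running a naive script breaks things, and the guard pattern
# that fixes it. Both targets are scratch files, not real configs.

NAIVE=$(mktemp)
GUARDED=$(mktemp)

# Naive step: blind append. Running the script twice duplicates the line.
echo "MaxClients 256" >> "$NAIVE"
echo "MaxClients 256" >> "$NAIVE"

# Guarded step: check current state first, act only if needed.
ensure_line() {
  grep -qxF "$1" "$2" || echo "$1" >> "$2"
}
ensure_line "MaxClients 256" "$GUARDED"
ensure_line "MaxClients 256" "$GUARDED"   # no-op: line already present

echo "naive: $(grep -c 'MaxClients' "$NAIVE") entries"     # naive: 2 entries
echo "guarded: $(grep -c 'MaxClients' "$GUARDED") entry"   # guarded: 1 entry
```

Era 2's tools generalized this check-then-act guard to every resource type and ran it on a timer.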
Legacy You'll Still See¶
Shell provisioning scripts are everywhere. Dockerfiles are essentially this pattern refined. Packer provisioners are often shell scripts. Many "cloud-init" user data scripts follow this exact model. When you see a setup.sh in a repo, you're looking at this era.
Era 2: CFEngine (~2005-2010)¶
The Solution¶
Mark Burgess created CFEngine in 1993, but it gained real traction in the mid-2000s as infrastructure scaled beyond what scripts could manage. CFEngine introduced the concept of convergence — you declared the desired state, and the agent running on each server continuously corrected drift to match.
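Stripped of CFEngine's machinery, the convergence loop can be sketched in a few lines of shell; the scratch file here stands in for any managed resource:

```shell
#!/bin/bash
# Convergence in miniature: declare desired state, detect drift, repair.
# A real agent does this for packages, files, and services on a timer.

STATE=$(mktemp)
DESIRED="MaxClients 256"

converge() {
  if [ "$(cat "$STATE")" != "$DESIRED" ]; then
    echo "$DESIRED" > "$STATE"    # repair drift toward desired state
    echo "repaired"
  else
    echo "converged"
  fi
}

converge                            # first run: repaired
echo "hand edit" > "$STATE"         # someone SSHes in and changes it
converge                            # next agent run: repaired
converge                            # steady state: converged
```

The key shift from Era 1: the check runs forever, not just at provision time, so drift gets corrected instead of accumulating.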
What It Looked Like¶
# cfengine promise: ensure Apache is installed and running
bundle agent webserver {
  packages:
    "httpd"
      package_policy => "add",
      package_method => yum;

  files:
    "/etc/httpd/conf/httpd.conf"
      copy_from => remote_cp("/srv/cfengine/configs/httpd.conf", "cfhub"),
      perms => mog("644", "root", "root");

  processes:
    "httpd"
      restart_class => "restart_httpd";

  commands:
    restart_httpd::
      "/sbin/service httpd restart";
}
Why It Was Better¶
- Declarative: describe what, not how
- Convergent: runs every 5 minutes, corrects drift automatically
- Scales to thousands of nodes (agent-based, no central bottleneck)
- Theoretical foundation (promise theory) was rigorous
Why It Wasn't Enough¶
- The language was idiosyncratic and hard to learn
- Community was small; documentation was academic
- Debugging was opaque — when convergence failed, figuring out why was painful
- No good ecosystem of reusable modules
- The competition arrived with friendlier interfaces
Legacy You'll Still See¶
CFEngine is still in production at some large enterprises and government agencies. Its ideas — convergence, desired state, promise theory — are the intellectual foundation of everything that followed. You may never write CFEngine, but every modern config management tool owes it a debt.
Era 3: Puppet and Chef (~2008-2015)¶
The Solution¶
Puppet (2005, but widespread ~2008) and Chef (2009) brought configuration management to the mainstream. Puppet used a declarative DSL; Chef used Ruby. Both had robust module/cookbook ecosystems, a central server for state tracking, and corporate backing with training and support.
What It Looked Like¶
# Puppet manifest
class profile::webserver {
  package { 'httpd':
    ensure => installed,
  }

  file { '/etc/httpd/conf/httpd.conf':
    ensure  => file,
    source  => 'puppet:///modules/profile/httpd.conf',
    require => Package['httpd'],
    notify  => Service['httpd'],
  }

  service { 'httpd':
    ensure => running,
    enable => true,
  }
}
# Chef recipe
package 'httpd' do
  action :install
end

cookbook_file '/etc/httpd/conf/httpd.conf' do
  source 'httpd.conf'
  owner 'root'
  group 'root'
  mode '0644'
  notifies :restart, 'service[httpd]'
end

service 'httpd' do
  action [:enable, :start]
end
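Both snippets lean on the same notify/handler idea: record whether a resource actually changed, and trigger dependent actions only when it did. A rough shell sketch of that mechanism (function names are illustrative):

```shell
#!/bin/bash
# The notify/handler pattern: a service restart fires only when a
# watched resource reports a change, not on every run.

CONF=$(mktemp)
CHANGED=0

deploy_config() {
  if [ "$(cat "$CONF")" != "$1" ]; then
    echo "$1" > "$CONF"
    CHANGED=1      # notify: this resource changed
  fi
}

restart_handler() {
  if [ "$CHANGED" -eq 1 ]; then
    echo "restarting httpd"
  else
    echo "no change, no restart"
  fi
  CHANGED=0
}

deploy_config "MaxClients 256"; restart_handler   # restarting httpd
deploy_config "MaxClients 256"; restart_handler   # no change, no restart
```

This is why a correctly written manifest is safe to apply every 30 minutes: unchanged resources produce no side effects.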
Why It Was Better¶
- Rich ecosystems: Puppet Forge, Chef Supermarket had thousands of modules
- Enterprise features: role-based access, audit trails, compliance reporting
- Testing frameworks: rspec-puppet, ChefSpec, Test Kitchen
- Large communities, conferences (PuppetConf, ChefConf), job market
Why It Wasn't Enough¶
- Agent-based: required a Puppet/Chef agent on every node
- Central server was a single point of failure and complexity
- Puppet's DSL had a learning curve; Chef required Ruby knowledge
- Catalog compilation could be slow at scale
- The rise of immutable infrastructure made "converge in place" less appealing
Legacy You'll Still See¶
Puppet is still widely used in enterprise environments, especially for on-prem infrastructure. Chef (now Progress Chef) persists in organizations that invested heavily in cookbooks. Many compliance frameworks (InSpec) originated in this ecosystem. If you're at a large enterprise, there's a good chance Puppet is managing something.
Era 4: Ansible (~2013-2020)¶
The Solution¶
Ansible (2012, acquired by Red Hat 2015) solved the two biggest complaints about Puppet and Chef: it was agentless (SSH-based) and used YAML instead of a custom DSL. You could start using it in an afternoon. No server to set up, no agent to deploy, no new language to learn.
What It Looked Like¶
# playbook.yml
---
- name: Configure web servers
  hosts: webservers
  become: true

  tasks:
    - name: Install Apache
      yum:
        name: httpd
        state: present

    - name: Deploy config
      copy:
        src: httpd.conf
        dest: /etc/httpd/conf/httpd.conf
        owner: root
        group: root
        mode: '0644'
      notify: restart apache

    - name: Ensure Apache is running
      service:
        name: httpd
        state: started
        enabled: true

  handlers:
    - name: restart apache
      service:
        name: httpd
        state: restarted
Why It Was Better¶
- Zero bootstrap: if you can SSH to it, you can manage it
- YAML is readable by anyone, not just developers
- Massive module library (3000+ modules)
- Ansible Galaxy for community roles
- Low barrier to entry — ops people adopted it immediately
Why It Wasn't Enough¶
- SSH-based execution is slow at scale (hundreds of hosts)
- YAML is readable but painful to debug (whitespace, quoting, Jinja2 templating)
- No built-in state tracking — "did this change?" requires careful task design
- Playbook sprawl: without discipline, you get 200 files of spaghetti YAML
- Push-based model means drift happens between runs
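The state-tracking gap is usually handled with guards inside the tasks themselves; Ansible's `creates:` argument, for example, skips a command when its output artifact already exists. The underlying pattern, sketched in shell (the marker path is illustrative):

```shell
#!/bin/bash
# The "creates" guard: run a command once, then skip it on replays
# because its marker artifact already exists.

MARKER=$(mktemp -u)   # a path only; the file does not exist yet

run_once() {
  if [ -e "$MARKER" ]; then
    echo "ok (skipped)"
  else
    "$@" && touch "$MARKER" && echo "changed"
  fi
}

run_once sleep 0    # stand-in for an expensive install step: changed
run_once sleep 0    # replay: ok (skipped)
```

Without guards like this, re-running a playbook re-runs every raw command, which is exactly the Era 1 problem wearing YAML clothes.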
Legacy You'll Still See¶
Ansible is the current dominant configuration management tool for most organizations. Ansible Tower/AWX provides a GUI and RBAC layer. Red Hat bundles it into everything. If you're managing anything that isn't Kubernetes — network devices, VMs, on-prem servers — Ansible is probably the tool.
Era 5: GitOps and Declarative Platforms (~2018-2023)¶
The Solution¶
The GitOps movement (coined by Weaveworks, 2017) inverted the model: instead of pushing configuration to servers, you declare desired state in Git and let a reconciliation controller pull it. Flux and ArgoCD continuously sync cluster state to match the Git repo. On the infrastructure side, Crossplane extended this to cloud resources.
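Stripped to its essentials, the reconcile loop diffs live state against the repo and re-applies the repo whenever they disagree. A toy shell version, where temp directories stand in for the Git checkout and the cluster:

```shell
#!/bin/bash
# GitOps reconciliation in miniature: desired state lives in a repo
# checkout; the controller syncs live state to match and self-heals drift.

REPO=$(mktemp -d)   # stands in for the cloned Git repo
LIVE=$(mktemp -d)   # stands in for cluster state

echo "replicas: 3" > "$REPO/web.yaml"

reconcile() {
  if diff -rq "$REPO" "$LIVE" > /dev/null 2>&1; then
    echo "in sync"
  else
    rm -rf "$LIVE" && cp -r "$REPO" "$LIVE"   # prune + sync
    echo "synced"
  fi
}

reconcile                               # initial sync
echo "replicas: 9" > "$LIVE/web.yaml"   # drift: a hand-applied edit
reconcile                               # self-heal puts it back
```

Note the inversion relative to Ansible: nothing is pushed, and a hand edit to live state survives only until the next reconcile pass.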
What It Looked Like¶
# ArgoCD Application — point at a Git repo, let it reconcile
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: web-frontend
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/k8s-manifests.git
    targetRevision: main
    path: apps/web-frontend
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
Why It Was Better¶
- Git is the single source of truth — full audit trail via git log
- Self-healing: drift is automatically corrected by the controller
- Pull-based: no SSH credentials to manage, no push infrastructure
- Declarative all the way down — what, not how
- Works naturally with Kubernetes RBAC and namespaces
Why It Wasn't Enough¶
- Only works well for Kubernetes workloads (not VMs, network gear, etc.)
- Secrets management in Git is an unsolved tension
- Debugging sync failures requires understanding both Git and K8s deeply
- "Everything in YAML" fatigue — the configuration management moved; it didn't disappear
- Requires Kubernetes — which is its own complexity commitment
Legacy You'll Still See¶
GitOps is the current best practice for Kubernetes configuration. ArgoCD and Flux are standard tools. But the non-Kubernetes world still uses Ansible, and many organizations run both. The "GitOps for everything" vision is aspirational, not realized.
Era 6: Platform Engineering and Self-Service (~2023-2025)¶
The Solution¶
Platform engineering teams build internal developer platforms (IDPs) that abstract away the configuration management layer entirely. Tools like Backstage (Spotify), Humanitec, and custom portals let developers request infrastructure through forms or APIs. The platform team handles the how; developers specify the what.
What It Looked Like¶
# score.yaml — Humanitec Score workload spec
apiVersion: score.dev/v1b1
metadata:
  name: my-service
containers:
  main:
    image: registry.example.com/my-service
    variables:
      DB_HOST: ${resources.db.host}
      DB_PORT: ${resources.db.port}
resources:
  db:
    type: postgres
  dns:
    type: dns
Why It Was Better¶
- Developers don't need to know Ansible, Terraform, or Kubernetes
- Golden paths enforce best practices without restricting choice
- Self-service reduces ticket-based bottlenecks
- Configuration management happens behind the platform API
- Compliance and security are built into the platform, not bolted on
Why It Wasn't Enough¶
- Building an IDP is a multi-year investment
- The "platform team" is a new organizational cost center
- Abstractions can be too opaque — debugging requires dropping down
- Still early — tooling is fragmented and immature
- Doesn't eliminate configuration management, just moves it to the platform team
Legacy You'll Still See¶
Platform engineering is the current hype cycle, but it is genuinely solving real problems. Most organizations are in the "thinking about it" or "early implementation" phase. The actual configuration management still happens — someone has to write the Terraform and Ansible behind the portal.
Where We Are Now¶
Configuration management has not gone away — it has been absorbed into higher-level abstractions. Ansible dominates the VM and bare-metal world. GitOps (ArgoCD/Flux) dominates Kubernetes. Platform engineering is emerging as the layer that hides both. The actual work of ensuring "this server looks like this specification" still happens; the question is who does it and how visible it is.
Where It's Going¶
The most likely near-term evolution is AI-assisted configuration generation — describe what you want in natural language, get a working Ansible playbook or Kubernetes manifest. Longer term, the configuration management layer may become an implementation detail of the platform, invisible to most engineers. But as long as software runs on computers, someone needs to configure those computers.
The Pattern¶
Every generation tries to make configuration a declaration rather than a procedure, then discovers that the real problem was never the syntax — it was the organizational discipline to keep declarations accurate and drift under control.
Key Takeaway for Practitioners¶
The tool matters less than the discipline. Pick one tool, use it for everything it's good at, and resist the temptation to mix three configuration management approaches in the same estate. Consistency beats optimization.
Cross-References¶
- Topic Packs: Ansible, Puppet, ArgoCD
- Tool Comparisons: Ansible vs Puppet vs Chef
- Evolution Guides: Infrastructure as Code, Developer Experience