How We Got Here: Configuration Management¶
Arc: Infrastructure · Eras covered: 6 · Timeline: ~2005-2025 · Read time: ~12 min
The Original Problem¶
In 2005, if you had 50 servers, you configured them by SSH'ing into each one. Maybe you had a wiki page with a checklist: "install Apache, edit /etc/httpd/conf/httpd.conf, set MaxClients to 256, restart." When the wiki got out of date — and it always got out of date — servers drifted. Server #23 had a different timezone. Server #41 was missing a security patch. Server #7 had a config file hand-edited three months ago and nobody remembered why. When something broke, you couldn't tell if the problem was the code, the config, or the drift.
Configuration drift was the silent killer of uptime. Not dramatic outages — just slow, grinding unreliability as identical servers became quietly unique.
Era 1: Shell Scripts and Golden Images (~2005-2008)¶
The Solution¶
Teams wrote bash scripts that automated their wiki checklists. More sophisticated shops built "golden images" — a fully configured VM snapshot that served as the template for new servers. The script was the source of truth, and you ran it on every new machine.
What It Looked Like¶
#!/bin/bash
# setup-webserver.sh — the "configuration management" of 2005
yum install -y httpd mod_ssl php
cp /nfs/configs/httpd.conf /etc/httpd/conf/httpd.conf
cp /nfs/configs/php.ini /etc/php.ini
chkconfig httpd on
service httpd start
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT
service iptables save
Why It Was Better¶
- Repeatable — run the same script on every server
- Documented — the script was the documentation
- Faster than manual SSH sessions
- Version controllable (if you put it in SVN)
Why It Wasn't Enough¶
- Scripts were imperative: they described steps, not desired state
- Running a script twice could break things (not idempotent)
- No way to detect or correct drift after initial setup
- Error handling was primitive (set -e and hope)
- Scripts didn't compose — 50 scripts for 50 concerns became unmaintainable
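The idempotency gap in particular has a well-known fix that later tools baked in: check current state before acting. A minimal sketch of the difference, using scratch files rather than real configs so it runs anywhere:

```shell
#!/bin/bash
# Why re-running a naive script breaks things, and the guard pattern
# that fixes it. Both targets are scratch files, not real configs.

NAIVE=$(mktemp)
GUARDED=$(mktemp)

# Naive step: blind append. Running the script twice duplicates the line.
echo "MaxClients 256" >> "$NAIVE"
echo "MaxClients 256" >> "$NAIVE"

# Guarded step: check current state first, act only if needed.
ensure_line() {
  grep -qxF "$1" "$2" || echo "$1" >> "$2"
}
ensure_line "MaxClients 256" "$GUARDED"
ensure_line "MaxClients 256" "$GUARDED"   # no-op: line already present

echo "naive: $(grep -c 'MaxClients' "$NAIVE") entries"     # naive: 2 entries
echo "guarded: $(grep -c 'MaxClients' "$GUARDED") entry"   # guarded: 1 entry
```

Era 2's tools generalized this check-then-act guard to every resource type and ran it on a timer.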
Legacy You'll Still See¶
Shell provisioning scripts are everywhere. Dockerfiles are essentially this pattern refined. Packer provisioners are often shell scripts. Many "cloud-init" user data scripts follow this exact model. When you see a setup.sh in a repo, you're looking at this era.
Era 2: CFEngine (~2005-2010)¶
The Solution¶
Mark Burgess created CFEngine in 1993, but it gained real traction in the mid-2000s as infrastructure scaled beyond what scripts could manage. CFEngine introduced the concept of convergence — you declared the desired state, and the agent running on each server continuously corrected drift to match.
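Stripped of CFEngine's machinery, the convergence loop can be sketched in a few lines of shell; the scratch file here stands in for any managed resource:

```shell
#!/bin/bash
# Convergence in miniature: declare desired state, detect drift, repair.
# A real agent does this for packages, files, and services on a timer.

STATE=$(mktemp)
DESIRED="MaxClients 256"

converge() {
  if [ "$(cat "$STATE")" != "$DESIRED" ]; then
    echo "$DESIRED" > "$STATE"    # repair drift toward desired state
    echo "repaired"
  else
    echo "converged"
  fi
}

converge                            # first run: repaired
echo "hand edit" > "$STATE"         # someone SSHes in and changes it
converge                            # next agent run: repaired
converge                            # steady state: converged
```

The key shift from Era 1: the check runs forever, not just at provision time, so drift gets corrected instead of accumulating.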
What It Looked Like¶
# cfengine promise: ensure Apache is installed and running
bundle agent webserver {
  packages:
    "httpd"
      package_policy => "add",
      package_method => yum;

  files:
    "/etc/httpd/conf/httpd.conf"
      copy_from => remote_cp("/srv/cfengine/configs/httpd.conf", "cfhub"),
      perms => mog("644", "root", "root");

  processes:
    "httpd"
      restart_class => "restart_httpd";

  commands:
    restart_httpd::
      "/sbin/service httpd restart";
}
Why It Was Better¶
- Declarative: describe what, not how
- Convergent: runs every 5 minutes, corrects drift automatically
- Scales to thousands of nodes (agent-based, no central bottleneck)
- Theoretical foundation (promise theory) was rigorous
Why It Wasn't Enough¶
- The language was idiosyncratic and hard to learn
- Community was small; documentation was academic
- Debugging was opaque — when convergence failed, figuring out why was painful
- No good ecosystem of reusable modules
- The competition arrived with friendlier interfaces
Legacy You'll Still See¶
CFEngine is still in production at some large enterprises and government agencies. Its ideas — convergence, desired state, promise theory — are the intellectual foundation of everything that followed. You may never write CFEngine, but every modern config management tool owes it a debt.
Era 3: Puppet and Chef (~2008-2015)¶
The Solution¶
Puppet (2005, but widespread ~2008) and Chef (2009) brought configuration management to the mainstream. Puppet used a declarative DSL; Chef used Ruby. Both had robust module/cookbook ecosystems, a central server for state tracking, and corporate backing with training and support.
What It Looked Like¶
# Puppet manifest
class profile::webserver {
  package { 'httpd':
    ensure => installed,
  }

  file { '/etc/httpd/conf/httpd.conf':
    ensure  => file,
    source  => 'puppet:///modules/profile/httpd.conf',
    require => Package['httpd'],
    notify  => Service['httpd'],
  }

  service { 'httpd':
    ensure => running,
    enable => true,
  }
}
# Chef recipe
package 'httpd' do
  action :install
end

cookbook_file '/etc/httpd/conf/httpd.conf' do
  source 'httpd.conf'
  owner 'root'
  group 'root'
  mode '0644'
  notifies :restart, 'service[httpd]'
end

service 'httpd' do
  action [:enable, :start]
end
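Both snippets lean on the same notify/handler idea: record whether a resource actually changed, and trigger dependent actions only when it did. A rough shell sketch of that mechanism (function names are illustrative):

```shell
#!/bin/bash
# The notify/handler pattern: a service restart fires only when a
# watched resource reports a change, not on every run.

CONF=$(mktemp)
CHANGED=0

deploy_config() {
  if [ "$(cat "$CONF")" != "$1" ]; then
    echo "$1" > "$CONF"
    CHANGED=1      # notify: this resource changed
  fi
}

restart_handler() {
  if [ "$CHANGED" -eq 1 ]; then
    echo "restarting httpd"
  else
    echo "no change, no restart"
  fi
  CHANGED=0
}

deploy_config "MaxClients 256"; restart_handler   # restarting httpd
deploy_config "MaxClients 256"; restart_handler   # no change, no restart
```

This is why a correctly written manifest is safe to apply every 30 minutes: unchanged resources produce no side effects.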
Why It Was Better¶
- Rich ecosystems: Puppet Forge, Chef Supermarket had thousands of modules
- Enterprise features: role-based access, audit trails, compliance reporting
- Testing frameworks: rspec-puppet, ChefSpec, Test Kitchen
- Large communities, conferences (PuppetConf, ChefConf), job market
Why It Wasn't Enough¶
- Agent-based: required a Puppet/Chef agent on every node
- Central server was a single point of failure and complexity
- Puppet's DSL had a learning curve; Chef required Ruby knowledge
- Catalog compilation could be slow at scale
- The rise of immutable infrastructure made "converge in place" less appealing
Legacy You'll Still See¶
Puppet is still widely used in enterprise environments, especially for on-prem infrastructure. Chef (now Progress Chef) persists in organizations that invested heavily in cookbooks. Many compliance frameworks (InSpec) originated in this ecosystem. If you're at a large enterprise, there's a good chance Puppet is managing something.
Era 4: Ansible (~2013-2020)¶
The Solution¶
Ansible (2012, acquired by Red Hat 2015) solved the two biggest complaints about Puppet and Chef: it was agentless (SSH-based) and used YAML instead of a custom DSL. You could start using it in an afternoon. No server to set up, no agent to deploy, no new language to learn.
What It Looked Like¶
# playbook.yml
---
- name: Configure web servers
  hosts: webservers
  become: true

  tasks:
    - name: Install Apache
      yum:
        name: httpd
        state: present

    - name: Deploy config
      copy:
        src: httpd.conf
        dest: /etc/httpd/conf/httpd.conf
        owner: root
        group: root
        mode: '0644'
      notify: restart apache

    - name: Ensure Apache is running
      service:
        name: httpd
        state: started
        enabled: true

  handlers:
    - name: restart apache
      service:
        name: httpd
        state: restarted
Why It Was Better¶
- Zero bootstrap: if you can SSH to it, you can manage it
- YAML is readable by anyone, not just developers
- Massive module library (3000+ modules)
- Ansible Galaxy for community roles
- Low barrier to entry — ops people adopted it immediately
Why It Wasn't Enough¶
- SSH-based execution is slow at scale (hundreds of hosts)
- YAML is readable but painful to debug (whitespace, quoting, Jinja2 templating)
- No built-in state tracking — "did this change?" requires careful task design
- Playbook sprawl: without discipline, you get 200 files of spaghetti YAML
- Push-based model means drift happens between runs
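The state-tracking gap is usually handled with guards inside the tasks themselves; Ansible's `creates:` argument, for example, skips a command when its output artifact already exists. The underlying pattern, sketched in shell (the marker path is illustrative):

```shell
#!/bin/bash
# The "creates" guard: run a command once, then skip it on replays
# because its marker artifact already exists.

MARKER=$(mktemp -u)   # a path only; the file does not exist yet

run_once() {
  if [ -e "$MARKER" ]; then
    echo "ok (skipped)"
  else
    "$@" && touch "$MARKER" && echo "changed"
  fi
}

run_once sleep 0    # stand-in for an expensive install step: changed
run_once sleep 0    # replay: ok (skipped)
```

Without guards like this, re-running a playbook re-runs every raw command, which is exactly the Era 1 problem wearing YAML clothes.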
Legacy You'll Still See¶
Ansible is the current dominant configuration management tool for most organizations. Ansible Tower/AWX provides a GUI and RBAC layer. Red Hat bundles it into everything. If you're managing anything that isn't Kubernetes — network devices, VMs, on-prem servers — Ansible is probably the tool.
Era 5: GitOps and Declarative Platforms (~2018-2023)¶
The Solution¶
The GitOps movement (coined by Weaveworks, 2017) inverted the model: instead of pushing configuration to servers, you declare desired state in Git and let a reconciliation controller pull it. Flux and ArgoCD continuously sync cluster state to match the Git repo. On the infrastructure side, Crossplane extended this to cloud resources.
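Stripped to its essentials, the reconcile loop diffs live state against the repo and re-applies the repo whenever they disagree. A toy shell version, where temp directories stand in for the Git checkout and the cluster:

```shell
#!/bin/bash
# GitOps reconciliation in miniature: desired state lives in a repo
# checkout; the controller syncs live state to match and self-heals drift.

REPO=$(mktemp -d)   # stands in for the cloned Git repo
LIVE=$(mktemp -d)   # stands in for cluster state

echo "replicas: 3" > "$REPO/web.yaml"

reconcile() {
  if diff -rq "$REPO" "$LIVE" > /dev/null 2>&1; then
    echo "in sync"
  else
    rm -rf "$LIVE" && cp -r "$REPO" "$LIVE"   # prune + sync
    echo "synced"
  fi
}

reconcile                               # initial sync
echo "replicas: 9" > "$LIVE/web.yaml"   # drift: a hand-applied edit
reconcile                               # self-heal puts it back
```

Note the inversion relative to Ansible: nothing is pushed, and a hand edit to live state survives only until the next reconcile pass.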
What It Looked Like¶
# ArgoCD Application — point at a Git repo, let it reconcile
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: web-frontend
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/k8s-manifests.git
    targetRevision: main
    path: apps/web-frontend
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
Why It Was Better¶
- Git is the single source of truth — full audit trail via git log
- Self-healing: drift is automatically corrected by the controller
- Pull-based: no SSH credentials to manage, no push infrastructure
- Declarative all the way down — what, not how
- Works naturally with Kubernetes RBAC and namespaces
Why It Wasn't Enough¶
- Only works well for Kubernetes workloads (not VMs, network gear, etc.)
- Secrets management in Git is an unsolved tension
- Debugging sync failures requires understanding both Git and K8s deeply
- "Everything in YAML" fatigue — the configuration management moved; it didn't disappear
- Requires Kubernetes — which is its own complexity commitment
Legacy You'll Still See¶
GitOps is the current best practice for Kubernetes configuration. ArgoCD and Flux are standard tools. But the non-Kubernetes world still uses Ansible, and many organizations run both. The "GitOps for everything" vision is aspirational, not realized.
Era 6: Platform Engineering and Self-Service (~2023-2025)¶
The Solution¶
Platform engineering teams build internal developer platforms (IDPs) that abstract away the configuration management layer entirely. Tools like Backstage (Spotify), Humanitec, and custom portals let developers request infrastructure through forms or APIs. The platform team handles the how; developers specify the what.
What It Looked Like¶
# score.yaml — Humanitec Score workload spec
apiVersion: score.dev/v1b1
metadata:
  name: my-service
containers:
  main:
    image: registry.example.com/my-service
    variables:
      DB_HOST: ${resources.db.host}
      DB_PORT: ${resources.db.port}
resources:
  db:
    type: postgres
  dns:
    type: dns
Why It Was Better¶
- Developers don't need to know Ansible, Terraform, or Kubernetes
- Golden paths enforce best practices without restricting choice
- Self-service reduces ticket-based bottlenecks
- Configuration management happens behind the platform API
- Compliance and security are built into the platform, not bolted on
Why It Wasn't Enough¶
- Building an IDP is a multi-year investment
- The "platform team" is a new organizational cost center
- Abstractions can be too opaque — debugging requires dropping down
- Still early — tooling is fragmented and immature
- Doesn't eliminate configuration management, just moves it to the platform team
Legacy You'll Still See¶
Platform engineering is the current hype cycle, but it is genuinely solving real problems. Most organizations are in the "thinking about it" or "early implementation" phase. The actual configuration management still happens — someone has to write the Terraform and Ansible behind the portal.
Where We Are Now¶
Configuration management has not gone away — it has been absorbed into higher-level abstractions. Ansible dominates the VM and bare-metal world. GitOps (ArgoCD/Flux) dominates Kubernetes. Platform engineering is emerging as the layer that hides both. The actual work of ensuring "this server looks like this specification" still happens; the question is who does it and how visible it is.
Where It's Going¶
The most likely near-term evolution is AI-assisted configuration generation — describe what you want in natural language, get a working Ansible playbook or Kubernetes manifest. Longer term, the configuration management layer may become an implementation detail of the platform, invisible to most engineers. But as long as software runs on computers, someone needs to configure those computers.
The Pattern¶
Every generation tries to make configuration a declaration rather than a procedure, then discovers that the real problem was never the syntax — it was the organizational discipline to keep declarations accurate and drift under control.
Key Takeaway for Practitioners¶
The tool matters less than the discipline. Pick one tool, use it for everything it's good at, and resist the temptation to mix three configuration management approaches in the same estate. Consistency beats optimization.
Cross-References¶
- Topic Packs: Ansible, Puppet, ArgoCD
- Tool Comparisons: Ansible vs Puppet vs Chef
- Evolution Guides: Infrastructure as Code, Developer Experience