Packer — Street Ops

Golden AMI pipeline (GitHub Actions + Packer + AWS)

Typical workflow:

# .github/workflows/build-ami.yml
on:
  push:
    paths: ["packer/**"]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-packer@main
      - name: Init
        run: packer init packer/
      - name: Validate
        run: packer validate packer/
      - name: Build
        run: packer build -var-file=packer/prod.pkrvars.hcl packer/
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}

Use OIDC federation instead of long-lived keys when possible. Tag AMIs with the git SHA so you can trace any running instance back to the commit that built it.
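With OIDC, the job requests short-lived credentials from AWS at runtime instead of reading static keys from secrets. A minimal sketch using the official aws-actions/configure-aws-credentials action (the role ARN is a placeholder for a role you'd create with a trust policy for GitHub's OIDC provider):

```yaml
# Grant the workflow permission to request an OIDC token
permissions:
  id-token: write
  contents: read

# ...then, before the Packer steps, replace the env-based keys with:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/packer-build  # placeholder ARN
          aws-region: us-east-1
```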

Remember: Every golden image should be tagged with: git SHA, build timestamp, base OS version, and builder identity. Mnemonic: STOB — SHA, Time, OS, Builder. When an instance misbehaves in production, these tags let you trace it back to the exact build.
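In an amazon-ebs source, the STOB tags might look like the sketch below (the variable names are illustrative, not from any template above; `timestamp()` is Packer's built-in UTC timestamp function):

```hcl
source "amazon-ebs" "app" {
  # ...
  tags = {
    GitSHA    = var.git_sha     # passed from CI, e.g. -var "git_sha=$GITHUB_SHA"
    BuildTime = timestamp()     # UTC build timestamp
    BaseOS    = "ubuntu-22.04"
    Builder   = "github-actions"
  }
}
```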

Gotcha: Packer builds that fail mid-provisioning leave orphaned cloud resources (EC2 instances, disks, security groups) that accumulate cost silently. Set -on-error=cleanup (the default) in CI, and run a weekly cleanup script to find resources tagged packer-builder older than 24 hours. In AWS: aws ec2 describe-instances --filters "Name=tag:Name,Values=Packer*" "Name=instance-state-name,Values=running".

Homelab: Proxmox VM templates with Packer + cloud-init

source "proxmox-iso" "ubuntu" {
  proxmox_url              = "https://pve.local:8006/api2/json"
  username                 = "root@pam"
  token                    = var.proxmox_token
  node                     = "pve"
  iso_file                 = "local:iso/ubuntu-22.04-live-server-amd64.iso"
  vm_id                    = 9000
  template_name            = "ubuntu-template"
  cores                    = 2
  memory                   = 2048
  cloud_init               = true
  cloud_init_storage_pool  = "local-lvm"
  ssh_username             = "ubuntu"
  ssh_timeout              = "20m"
  http_directory           = "http"  # serves autoinstall files
}

Put your autoinstall (user-data) in http/ so Packer's built-in HTTP server serves it. The resulting template gets cloud-init baked in — clone it and customize per-VM via cloud-init at boot.
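A minimal http/user-data for Ubuntu's autoinstall might look like this sketch (identity values and the password hash are placeholders); note that an empty meta-data file usually needs to sit beside it, or cloud-init's NoCloud datasource will not activate:

```yaml
#cloud-config
autoinstall:
  version: 1
  identity:
    hostname: ubuntu-template
    username: ubuntu
    password: "$6$rounds=4096$..."   # placeholder: generate with mkpasswd -m sha-512
  ssh:
    install-server: true
```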

Under the hood: Packer starts a temporary HTTP server on the build host and passes {{ .HTTPIP }}:{{ .HTTPPort }} to the VM's boot command. The VM fetches the autoinstall file over HTTP during initial boot. If the build hangs at "Waiting for SSH," the VM likely cannot reach Packer's HTTP server — check firewall rules between the hypervisor network and the build host.
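In the proxmox-iso source, that boot command looks roughly like the sketch below for the 22.04 live-server installer (exact kernel arguments and timing vary by ISO version, so treat this as a starting point):

```hcl
boot_command = [
  "c<wait>",                                                  # drop to the GRUB prompt
  "linux /casper/vmlinuz autoinstall ",
  "ds=nocloud-net\\;s=http://{{ .HTTPIP }}:{{ .HTTPPort }}/ ---<enter><wait>",
  "initrd /casper/initrd<enter><wait>",
  "boot<enter>"
]
```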

Docker base images with Packer

When a Dockerfile is not enough — multi-step provisioning with Ansible, complex shell logic, or when you need the same provisioning to produce both a Docker image and a VM image:

source "docker" "base" {
  image  = "ubuntu:22.04"
  commit = true
}

build {
  sources = ["source.docker.base"]

  provisioner "shell" {
    inline = ["apt-get update && apt-get install -y python3"]
  }

  provisioner "ansible" {
    playbook_file = "ansible/docker-base.yml"
  }

  post-processor "docker-tag" {
    repository = "registry.internal/base"
    tags       = ["latest", var.build_tag]
  }

  post-processor "docker-push" {
    login          = true
    login_server   = "registry.internal"
    login_username = var.registry_user
    login_password = var.registry_pass
  }
}

Multi-platform builds

One template, multiple outputs. Define multiple sources and reference them all:

source "amazon-ebs" "app"      { /* ... */ }
source "googlecompute" "app"   { /* ... */ }
source "vagrant" "app"         { /* ... */ }

build {
  sources = [
    "source.amazon-ebs.app",
    "source.googlecompute.app",
    "source.vagrant.app"
  ]

  provisioner "ansible" {
    playbook_file = "ansible/app.yml"
  }
}

Build all:  packer build .
Build one:  packer build -only='amazon-ebs.app' .

One-liner: Multi-platform builds ensure your provisioning logic is tested against every target. A script that works on AWS might fail on GCP due to different disk device names, network config, or init systems.

Testing images before promotion

Build the image, boot it, run tests, then promote only if tests pass.

With Goss (lightweight):

provisioner "goss" {
  tests = ["goss/goss.yaml"]
}

Or run Goss as a shell provisioner at the end of the build — it exits non-zero on failure, which fails the Packer build.
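A sketch of the shell-provisioner approach, assuming you upload the test file with a file provisioner and fetch the goss binary during the build (the release URL follows the goss-org/goss naming convention; pin a version in practice):

```hcl
provisioner "file" {
  source      = "goss/goss.yaml"
  destination = "/tmp/goss.yaml"
}

provisioner "shell" {
  inline = [
    "curl -fsSL -o /usr/local/bin/goss https://github.com/goss-org/goss/releases/latest/download/goss-linux-amd64",
    "chmod +x /usr/local/bin/goss",
    "goss -g /tmp/goss.yaml validate --format documentation"  # non-zero exit fails the build
  ]
}
```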

With InSpec (compliance-focused):

Boot the image in CI (EC2 instance, Docker container), run InSpec against it, terminate on success, alert on failure.

Pipeline pattern:

Packer build -> deploy test instance -> run tests -> terminate test instance
  -> if pass: copy AMI to prod account / tag as "approved"
  -> if fail: alert, do not promote
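To hand the freshly built AMI ID to the test stage, Packer's manifest post-processor writes build artifacts to a JSON file the pipeline can parse:

```hcl
post-processor "manifest" {
  output = "manifest.json"
}
```

The last build's artifact ID (formatted region:ami-...) can then be extracted with something like jq -r '.builds[-1].artifact_id' manifest.json | cut -d: -f2.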

Debugging failed builds

Technique                                               What it does
packer build -debug .                                   Pauses after each step. Lets you SSH in and inspect.
packer build -on-error=ask .                            On failure, asks whether to clean up, abort, or retry.
packer build -on-error=abort .                          Leaves the instance running so you can SSH in post-failure.
PACKER_LOG=1 packer build .                             Full debug logging to stderr.
PACKER_LOG=1 PACKER_LOG_PATH=build.log packer build .   Debug logging to a file.

Debug clue: When a Packer build fails with "Timeout waiting for SSH," the three most common causes are: (1) the VM's firewall blocks port 22, (2) the SSH user/key does not match what was configured in the autoinstall, (3) the VM did not get an IP address (DHCP failure or wrong network). Use -on-error=abort to keep the VM running so you can connect via the hypervisor console and check.

When a provisioner fails:

1. Check the build output — Packer shows stdout/stderr from the provisioner.
2. Use -on-error=abort, SSH into the instance, and inspect manually.
3. Fix the provisioner script and run packer build again.

Template organization for a multi-image repo

packer/
  plugins.pkr.hcl          # required_plugins block (shared)
  variables.pkr.hcl        # common variables
  images/
    base/
      base.pkr.hcl         # source + build for base image
      base.pkrvars.hcl     # default variable values
      scripts/             # shell provisioner scripts
      ansible/             # playbooks specific to this image
    app/
      app.pkr.hcl
      app.pkrvars.hcl
      scripts/
    hardened/
      hardened.pkr.hcl
      hardened.pkrvars.hcl

Build one image: packer build -var-file=images/base/base.pkrvars.hcl images/base/

Keep plugins.pkr.hcl at the top level so packer init works from the root.

Default trap: packer init downloads plugins to ~/.packer.d/plugins/ by default. In CI, this directory is ephemeral and plugins are re-downloaded every build. Either cache the plugin directory between CI runs or vendor plugins into the repo for reproducible builds.
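In GitHub Actions, caching that directory between runs might look like this sketch (the path matches the default noted above; adjust it if you set PACKER_PLUGIN_PATH):

```yaml
      - uses: actions/cache@v4
        with:
          path: ~/.packer.d/plugins
          key: packer-plugins-${{ hashFiles('packer/plugins.pkr.hcl') }}
```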

Integrating with Ansible

Passing variables:

provisioner "ansible" {
  playbook_file   = "ansible/site.yml"
  extra_arguments = [
    "--extra-vars", "app_version=${var.app_version} env=${var.env}",
    "-v"
  ]
  ansible_env_vars = [
    "ANSIBLE_HOST_KEY_CHECKING=False"
  ]
}

Using Ansible Vault:

provisioner "ansible" {
  playbook_file   = "ansible/site.yml"
  extra_arguments = [
    "--vault-password-file", "${var.vault_pass_file}"
  ]
}

Do not bake vault-encrypted secrets into the image. Use Vault for secrets that should exist only at runtime. Ansible Vault in Packer builds is fine for non-secret config that you just want to keep out of plain text in the repo.

War story: A team baked database credentials into their golden AMI for convenience. Months later, they rotated the password and every new instance launched from the old AMI silently connected with stale credentials — causing intermittent auth failures that took days to trace. Golden rule: images should contain software and configuration, never credentials. Inject secrets at boot via instance metadata, Vault, or Parameter Store.

Connection tuning: Packer generates a temporary SSH keypair and passes it to Ansible. If Ansible is slow to connect, check that ansible_env_vars includes ANSIBLE_HOST_KEY_CHECKING=False and that ssh_timeout on the source block is long enough.