EC2 Image Builder vs Packer

September 22, 2027 · 16 min read

CloudOps Engineer · SOA-C03 · part of The Exam Room

The situation

Acme runs about 600 EC2 instances across 40 services. Historically the golden AMI has been a manual affair: every few months, an engineer spins up an instance from an Amazon Linux 2023 base, runs a setup script they’ve tweaked over time, snapshots the result, and publishes the AMI ID into Parameter Store. Services then reference the parameter in their launch templates.

Three problems have accumulated.

  • Staleness. The last rebuild was seven months ago. Dozens of security patches have shipped in that time. The fleet is running an image that was current in October; today is May.
  • Reproducibility. The “setup script” is half in a private git repo, half in the head of the engineer who built the last image, and includes at least one step that only works when run by a specific person (their GPG key signs a config file). No one else can rebuild it cleanly.
  • Fan-out. When the AMI does get rebuilt, copying it to the three other production Regions is done by hand. Sometimes a Region is forgotten.

This morning’s CVE has made the problem urgent. The platform team wants a pipeline: codified recipe, scheduled rebuild, multi-Region publish, automated smoke tests, and a policy that deprecates old AMIs on a timeline.

Two tools are on the table. AWS EC2 Image Builder is a managed service for exactly this shape: recipes, pipelines, infrastructure, distribution, test invocation. HashiCorp Packer is an open-source tool that builds images across clouds via pluggable builders, including an amazon-ebs builder for AMIs.

What actually matters

Before reaching for a tool, it’s worth being honest about what’s actually wrong with the status quo and what shape of replacement closes those gaps without inventing a parallel control plane.

The first concern is where the recipe lives. A “setup script” that’s half in a private repo and half in someone’s head is not a recipe; it’s tribal knowledge. The replacement needs the build steps as a file under version control: reviewable, diff-able, and runnable by anyone with the right IAM permissions. Whichever tool the team picks, the unit that earns its place is “the file checked in”, not “the tool”.

The second is where the orchestration lives. Building an image is one step; running that build on a cadence, in the right Region, with the right inputs, and shipping the output to the right accounts and Regions, is the actual pipeline. That pipeline can be the tool’s own product, or it can be CI plus glue, or it can be a managed runner the team writes inside their own platform. Each choice trades convenience for control; the wrong shape buys neither.

The third is who owns the build infrastructure. The build itself launches an instance, runs steps on it, snapshots, and terminates. Either the tool launches the instance for the team (managed runners, IAM-and-VPC handled for you) or the team launches it (CI runner, ECS job, local workstation). The first removes work; the second is unavoidable when the recipe has to produce images for multiple targets, not just AWS.

The fourth is the test phase. A bake without a test before publish is a way to ship breakage to a fleet. The minimum is “launch from the new image, run a smoke test, gate distribution on the result”. A tool with that phase built in is doing real work; a tool without it needs the team to bolt on the harness themselves.

The fifth is multi-Region distribution. Once the image is built, it has to land in every Region the fleet runs in, and old images need to be deprecated on a timeline. Either the tool supports that natively (a distribution config the team writes once) or it’s a post-processor and a script and somebody’s calendar reminder.

And sixth, the deprecation story. Building new images is one half; retiring old ones is the other. Without an automated deprecation policy, “the AMI catalogue” becomes a graveyard of out-of-date images that launch templates can still reference. The mechanism for marking old images deprecated on a cadence belongs in the same toolchain that built them; otherwise the two halves drift.

What we’ll filter on

  1. Managed pipeline: does the tool own scheduling, orchestration, and distribution?
  2. Test-after-bake: does it invoke tests on a booted instance before publishing?
  3. Multi-Region distribution: native, or glued on with post-processors?
  4. AMI deprecation policy: can it mark old images deprecated on a cadence?
  5. Recipe portability: does the recipe work outside this tool, or is it vendor-native?
  6. Cross-cloud: does it produce images for more than AWS?

The image-build landscape

  1. Shell script on a long-lived EC2 instance (status quo). Artisanal. Works; doesn’t reproduce; doesn’t schedule; doesn’t distribute. Where every shop starts and most don’t leave until the CVE forces the move.

  2. AWS EC2 Image Builder. Managed pipeline. Components, recipes, pipelines, distribution, test integration. AWS owns the builder infrastructure; the team owns the YAML. Multi-Region and multi-account distribution first-class. Deprecation and retirement on a schedule. Native to AWS and AWS only.

  3. HashiCorp Packer (standalone). Template-based, open-source builder. Multi-cloud (AWS, Azure, GCP, VMware, Docker, QEMU). Multi-Region within AWS via ami_regions. No native pipeline: the team schedules runs themselves and wires up the build environment (Lambda, CodeBuild, a CI runner, or a human).

  4. Packer orchestrated by CodeBuild/CodePipeline. Packer running inside a CodeBuild project, triggered by CodePipeline on a git push or schedule. Adds the pipeline layer Packer lacks natively. More pieces, more flexibility, portable between clouds.

  5. Packer inside Image Builder’s Custom Components. You can invoke Packer from an Image Builder component, although in practice this rarely makes sense. If the recipe is in Image Builder, Packer is just extra scaffolding.

  6. Third-party tools (Nixpkgs, Bakery, BuildKit for images). Adjacent. Each solves a narrower slice; rarely the answer for a team that already has AMIs.

Side by side

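  • Shell script on EC2: managed pipeline ✗, test-after-bake ✗, multi-Region ✗, deprecation policy ✗.
  • EC2 Image Builder: managed pipeline ✓, test-after-bake ✓, multi-Region ✓, deprecation policy ✓; recipe AWS-native rather than portable; AWS only.
  • Packer standalone: managed pipeline ✗ (team’s problem), test-after-bake ✗ (custom), multi-Region ✓ (builder feature), deprecation policy ✗ (custom); recipe portable ✓; cross-cloud ✓.
  • Packer + CodeBuild/Pipeline: managed pipeline ✓ (team builds), test-after-bake ✓ (team builds), multi-Region ✓ (builder feature), deprecation policy ✓ (team builds); recipe portable ✓; cross-cloud ✓.
  • Packer inside Image Builder: pipeline features come from Image Builder; recipe portability and cross-cloud reach are limited.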

Reading by constraint:

  • AWS-only shop, wants AWS to own the orchestration. Image Builder. The pipeline resource, the distribution configuration, and the deprecation rules are all first-class.
  • Multi-cloud shop, portable recipes matter. Packer (standalone or in a CI pipeline). The same HCL template can build AMIs today and Azure images tomorrow.
  • Existing team with Packer expertise and CodePipeline pipelines. Packer in CodeBuild is a reasonable extension; Image Builder would be a second tool to learn.
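The three readings can be written down as a small decision function. This is a sketch of the reasoning above, not a scoring model; `Shop`, its fields, and `recommend` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shop:
    """Hypothetical summary of an organisation's image-build constraints."""
    aws_only: bool          # every image target is an AMI
    portable_recipes: bool  # recipes must be reusable outside one vendor's tooling
    packer_in_ci: bool      # Packer already runs inside CodePipeline, Jenkins, etc.


def recommend(shop: Shop) -> str:
    """Apply the three 'reading by constraint' rules in order."""
    # A second cloud, or a hard portability requirement, needs a cross-cloud recipe.
    if not shop.aws_only or shop.portable_recipes:
        return "Packer (standalone or in a CI pipeline)"
    # Existing Packer-in-CI investment: extending it beats adding a second tool.
    if shop.packer_in_ci:
        return "Packer in CodeBuild"
    # AWS-only with no Packer history: let AWS own the orchestration.
    return "EC2 Image Builder"


acme = Shop(aws_only=True, portable_recipes=False, packer_in_ci=False)
print(recommend(acme))  # EC2 Image Builder
```

The order of the checks is the design choice: portability is tested first, because an AWS-native recipe cannot follow the estate to a second cloud.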

Acme is an AWS-only shop with no existing Packer usage. Image Builder is the lower-friction answer. The correct question isn’t “which is objectively better” but “which matches the shape of the organisation.”

How Image Builder lays out

Figure: the acme-base pipeline in EC2 Image Builder.

  • Image Recipe (parent AMI + components). Parent: al2023-x86_64 (Managed). BUILD components: update-linux (Managed), acme-base-packages, cloudwatch-agent-install. TEST components: simple-boot-test (Managed).
  • Infrastructure Configuration (where the build instance runs). VPC: vpc-builder. Subnet: private. InstanceType: m6i.large. InstanceProfile: ImageBuilderRole. Security groups: sg-outbound-only. SNS topic: build-notifications. Log group: /aws/imagebuilder/acme-base. TerminateInstanceOnFailure: true.
  • Distribution Configuration (where the AMI ends up). Regions: eu-west-1, us-east-1, ap-southeast-2. AMI name: acme-base-{date}-{version}. Tags: { Owner=platform, Release=weekly }. Launch permissions: shared with prod and staging.
  • Lifecycle policy (a separate resource that acts on the images the pipeline produces). AMI deprecation: 45 days after publish. Retention: 10 AMIs per Region.
  • Image Pipeline: acme-base-weekly. Ties recipe, infrastructure, and distribution together and triggers builds. Schedule: cron(0 6 ? * SUN *). Build on parent update: true.
  • Build phase: 1. launch from parent AMI; 2. run BUILD components; 3. snapshot → AMI.
  • Test phase: 4. launch from new AMI; 5. run TEST components; 6. pass → distribute, fail → stop.
  • Result: acme-base-2027-06-06-v12 → eu-west-1 + us-east-1 + ap-southeast-2; the previous AMI is auto-marked deprecated 45 days after its publish.
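Four resources compose an Image Builder pipeline: the recipe (what to bake), the infrastructure configuration (where to bake it), the distribution configuration (where to ship it), and the pipeline itself (when to run). Recipes and the components inside them carry semantic versions; the infrastructure and distribution configurations are not versioned and are updated in place, so a change there applies from the next run.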
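A minimal Python model of how the four resources compose, and of the build → test → distribute ordering in the diagram. The class and field names are illustrative, not the Image Builder API (the real resources are created with `aws imagebuilder create-image-recipe`, `create-infrastructure-configuration`, `create-distribution-configuration`, and `create-image-pipeline`), and the build phase itself is not simulated.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Recipe:                       # what to bake
    parent_image: str
    build_components: tuple
    test_components: tuple


@dataclass(frozen=True)
class InfrastructureConfiguration:  # where to bake it
    instance_type: str
    subnet: str
    terminate_instance_on_failure: bool = True


@dataclass(frozen=True)
class DistributionConfiguration:    # where to ship it
    regions: tuple


@dataclass(frozen=True)
class ImagePipeline:                # when to run, and which three pieces to combine
    name: str
    schedule: str
    recipe: Recipe
    infrastructure: InfrastructureConfiguration
    distribution: DistributionConfiguration


def run_once(pipeline: ImagePipeline, ami_name: str,
             test_passes: Callable[[str], bool]) -> dict:
    """Simulate one execution and return {region: ami_name} for what got published.

    The build phase is not modelled; ami_name stands in for its output.
    """
    # Test phase: every TEST component must pass on an instance booted from the new AMI.
    if not all(test_passes(c) for c in pipeline.recipe.test_components):
        return {}  # fail -> stop: no Region receives the image
    # Distribution: the same image lands in every configured Region.
    return {region: ami_name for region in pipeline.distribution.regions}


weekly = ImagePipeline(
    name="acme-base-weekly",
    schedule="cron(0 6 ? * SUN *)",
    recipe=Recipe("al2023-x86_64",
                  ("update-linux", "acme-base-packages", "cloudwatch-agent-install"),
                  ("simple-boot-test",)),
    infrastructure=InfrastructureConfiguration("m6i.large", "private"),
    distribution=DistributionConfiguration(("eu-west-1", "us-east-1", "ap-southeast-2")),
)

print(run_once(weekly, "acme-base-v13", lambda component: True))   # all three Regions
print(run_once(weekly, "acme-base-v13", lambda component: False))  # {}
```

Keeping the pipeline as a reference to three other objects mirrors the AWS shape: swapping the distribution configuration changes where images go without touching the recipe.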

The recipe in depth

A component is a YAML document. Its key structure is a list of phases (build, validate, test), each containing steps:

name: acme-base-packages
description: Install Acme's base package set on Amazon Linux 2023.
schemaVersion: 1.0
phases:
  - name: build
    steps:
      - name: UpdateSystem
        action: ExecuteBash
        inputs:
          commands:
            - sudo dnf upgrade -y

      - name: InstallPackages
        action: ExecuteBash
        inputs:
          commands:
            - sudo dnf install -y jq awscli-2 rsync fail2ban
            - sudo systemctl enable fail2ban

      - name: InstallCustomRepoGpgKey
        action: ExecuteBash
        inputs:
          commands:
            - sudo rpm --import https://repo.acme.internal/RPM-GPG-KEY-acme

  - name: validate
    steps:
      - name: VerifyPackages
        action: ExecuteBash
        inputs:
          commands:
            - rpm -q jq awscli-2 rsync fail2ban
            - systemctl is-enabled fail2ban

  - name: test
    steps:
      - name: BootAndReachable
        action: ExecuteBash
        inputs:
          commands:
            - systemctl is-system-running --wait
            - curl -fsS https://repo.acme.internal/healthz

Three design points matter. Each step names an action module from a fixed set (ExecuteBash, ExecuteDocument, S3Download, SetRegistry, etc.) instead of being one free-form script; ExecuteBash still runs arbitrary shell, but every step is named, typed, and logged on its own, which keeps a component auditable. The validate phase runs on the build instance before the AMI is created; a non-zero exit fails the build and no AMI is produced. The test phase runs on a fresh instance launched from the built AMI and verifies that the image boots cleanly and still meets expectations; only if it passes does the pipeline move on to distribution.
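The phase and action rules can be checked before a component is ever uploaded. A minimal sketch, assuming the YAML has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`, left out so the sketch stays stdlib-only); `KNOWN_ACTIONS` is a partial, illustrative subset of AWSTOE's action modules, and `check_component` is a hypothetical helper.

```python
# Illustrative subset of AWSTOE action modules; the real list is longer.
KNOWN_ACTIONS = {"ExecuteBash", "ExecuteBinary", "ExecuteDocument",
                 "S3Download", "SetRegistry", "UpdateOS", "Reboot"}
# The phases Image Builder runs, in order.
PHASES = ("build", "validate", "test")


def check_component(doc: dict) -> list:
    """Return a list of problems with a component document; empty means it looks sane."""
    problems = []
    if "schemaVersion" not in doc:
        problems.append("missing schemaVersion")
    for phase in doc.get("phases", []):
        phase_name = phase.get("name")
        if phase_name not in PHASES:
            problems.append(f"unexpected phase {phase_name!r}")
        for step in phase.get("steps", []):
            label = f"{phase_name}/{step.get('name', '?')}"
            action = step.get("action")
            if action not in KNOWN_ACTIONS:
                problems.append(f"{label}: unknown action {action!r}")
            elif action == "ExecuteBash" and not step.get("inputs", {}).get("commands"):
                problems.append(f"{label}: ExecuteBash with no commands")
    return problems


# A trimmed acme-base-packages, as it would look after loading the YAML.
component = {
    "name": "acme-base-packages",
    "schemaVersion": 1.0,
    "phases": [
        {"name": "build", "steps": [
            {"name": "UpdateSystem", "action": "ExecuteBash",
             "inputs": {"commands": ["sudo dnf upgrade -y"]}}]},
        {"name": "validate", "steps": [
            {"name": "VerifyPackages", "action": "ExecuteBash",
             "inputs": {"commands": ["systemctl is-enabled fail2ban"]}}]},
    ],
}
print(check_component(component))  # []
```

A check like this fits naturally in the same CI job that reviews the component file, so a typo in an action name fails in review rather than on a build instance.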

A recipe references components by versioned ARN:

arn:aws:imagebuilder:eu-west-1:111122223333:component/acme-base-packages/1.0.3
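The version is the last segment of that ARN. A recipe can pin it exactly, as above, or reference the component through a semantic-version filter such as 1.0.x so that newer component versions are picked up without a new recipe version. A sketch of parsing the ARN and resolving a filter, assuming the highest matching version wins and every version has exactly three numeric parts; `parse_component_arn`, `matches`, and `resolve` are hypothetical helpers, not Image Builder APIs.

```python
def parse_component_arn(arn: str) -> dict:
    """Split arn:<partition>:imagebuilder:<region>:<account>:component/<name>/<version>."""
    prefix, resource = arn.split(":component/", 1)
    _, partition, service, region, account = prefix.split(":")
    name, version = resource.split("/", 1)
    return {"partition": partition, "service": service, "region": region,
            "account": account, "name": name, "version": version}


def matches(version: str, version_filter: str) -> bool:
    """True if a three-part version like '1.0.4' satisfies '1.0.x' or 'x.x.x'."""
    return all(f == "x" or f == v
               for v, f in zip(version.split("."), version_filter.split(".")))


def resolve(version_filter: str, available: list) -> str:
    """Highest available version satisfying the filter (assumed rule).

    Raises ValueError when nothing matches.
    """
    candidates = [v for v in available if matches(v, version_filter)]
    return max(candidates, key=lambda v: tuple(int(p) for p in v.split(".")))


arn = "arn:aws:imagebuilder:eu-west-1:111122223333:component/acme-base-packages/1.0.3"
print(parse_component_arn(arn)["version"])  # 1.0.3
published = ["1.0.3", "1.0.4", "1.1.0"]
print(resolve("1.0.x", published))          # 1.0.4
print(resolve("x.x.x", published))          # 1.1.0
```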

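Bumping a component’s version and triggering the pipeline rebuilds the image with the new recipe, provided the recipe picks the new version up: either publish a new recipe version that references it, or reference the component through a semantic-version filter such as 1.0.x. The pipeline’s Schedule block includes a PipelineExecutionStartCondition of EXPRESSION_MATCH_AND_DEPENDENCY_UPDATES_AVAILABLE, which means: when the cron expression matches, build only if a newer version of a filtered component, or of a parent image referenced by version filter rather than a fixed AMI ID, is available; otherwise skip the run. The condition is evaluated only at scheduled times and never starts a build in between. That still makes it the CVE-response lever: on a frequent schedule, the pipeline rebakes soon after Amazon publishes a patched Amazon Linux AMI and skips every run where nothing has changed, with response time bounded by how often the schedule fires.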
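Per the Image Builder API reference, the start condition only decides what happens when the cron expression matches; neither value starts a build between scheduled times. A minimal model of that rule, plus a comparison of how long a patched parent published at 11:40 waits under the weekly Sunday 06:00 schedule versus a hypothetical four-hourly schedule offset to :30. The four-hourly schedule and the helper names are illustrative assumptions, not Acme's configuration.

```python
from datetime import datetime, timedelta
from typing import List, Optional

EXPRESSION_MATCH_ONLY = "EXPRESSION_MATCH_ONLY"
EXPRESSION_MATCH_AND_DEPENDENCY_UPDATES_AVAILABLE = "EXPRESSION_MATCH_AND_DEPENDENCY_UPDATES_AVAILABLE"


def should_build(condition: str, cron_matches: bool, updates_available: bool) -> bool:
    """Decide whether a pipeline starts a build at a given moment."""
    if not cron_matches:
        return False              # neither condition starts a build between scheduled times
    if condition == EXPRESSION_MATCH_ONLY:
        return True               # every scheduled time builds, changed or not
    return updates_available      # scheduled time, and something newer to build from


def ticks(first: datetime, every: timedelta, count: int) -> List[datetime]:
    """Scheduled times of a simple fixed-interval schedule."""
    return [first + i * every for i in range(count)]


def first_build_after(published: datetime, schedule: List[datetime]) -> Optional[datetime]:
    """First scheduled time at which the dependency-updates condition starts a build.

    Assumes the new parent image is the only dependency change in the window.
    """
    for tick in sorted(schedule):
        if should_build(EXPRESSION_MATCH_AND_DEPENDENCY_UPDATES_AVAILABLE,
                        cron_matches=True, updates_available=tick >= published):
            return tick
    return None


parent_published = datetime(2027, 6, 6, 11, 40)
weekly = ticks(datetime(2027, 6, 6, 6, 0), timedelta(days=7), 3)        # Sundays 06:00
four_hourly = ticks(datetime(2027, 6, 6, 3, 30), timedelta(hours=4), 6)  # hypothetical
print(first_build_after(parent_published, weekly))       # 2027-06-13 06:00:00
print(first_build_after(parent_published, four_hourly))  # 2027-06-06 15:30:00
```

The trade-off is visible in the two results: a frequent schedule costs nothing on quiet days, because the condition turns those runs into no-ops, and cuts the wait from almost a week to a few hours.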

Distribution in depth

The distribution configuration is where the multi-Region and deprecation policy live:

aws imagebuilder create-distribution-configuration \
  --name acme-base-dist \
  --distributions '[
    {
      "region": "eu-west-1",
      "amiDistributionConfiguration": {
        "name": "acme-base-{{ imagebuilder:buildDate }}-v{{ imagebuilder:buildVersion }}",
        "launchPermission": { "userIds": ["444455556666", "777788889999"] },
        "targetAccountIds": ["444455556666", "777788889999"],
        "amiTags": { "Owner": "platform", "Family": "acme-base" }
      }
    },
    { "region": "us-east-1", "amiDistributionConfiguration": { ... } },
    { "region": "ap-southeast-2", "amiDistributionConfiguration": { ... } }
  ]'

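The AMI is built in the pipeline’s home Region and copied to each distribution Region (Image Builder does the CopyImage internally). In each Region it is tagged, shared with the accounts in launchPermission, and, for the accounts in targetAccountIds, copied into the target account so that each owns its own AMI.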
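The per-Region blocks differ only in the Region name if every Region uses the same settings, so the `--distributions` payload can be generated rather than hand-copied. A sketch under exactly that assumption (the elided us-east-1 and ap-southeast-2 blocks may well differ in practice); `region_block` and the script name `make_dist.py` in the comment are hypothetical.

```python
import json

REGIONS = ("eu-west-1", "us-east-1", "ap-southeast-2")
ACCOUNTS = ["444455556666", "777788889999"]  # the two accounts from the example above


def region_block(region: str) -> dict:
    """One entry of --distributions; assumes every Region uses identical settings."""
    return {
        "region": region,
        "amiDistributionConfiguration": {
            # Image Builder substitutes these placeholders at build time.
            "name": "acme-base-{{ imagebuilder:buildDate }}-v{{ imagebuilder:buildVersion }}",
            "launchPermission": {"userIds": ACCOUNTS},  # share the AMI with these accounts
            "targetAccountIds": ACCOUNTS,                # and copy it into them
            "amiTags": {"Owner": "platform", "Family": "acme-base"},
        },
    }


distributions = json.dumps([region_block(r) for r in REGIONS], indent=2)
print(distributions)
# Then, from a shell:
#   aws imagebuilder create-distribution-configuration \
#     --name acme-base-dist --distributions "$(python3 make_dist.py)"
```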

Deprecation is a separate concept. EC2’s EnableImageDeprecation API marks an AMI as deprecated from a given date; the AMI remains launchable, but accounts it is shared with no longer see it in default DescribeImages results (the owner still does), and it is flagged as deprecated. Image Builder sets that date automatically through a lifecycle policy: a separate resource that selects images by recipe or tag and deprecates, disables, or deletes them by age or count. A policy that deprecates at 45 days and retires (deletes) at 90 days gives teams enough time to catch up without letting ancient AMIs linger.
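A 45/90-day rule, plus the diagram’s keep-ten retention, is date arithmetic over one Region’s images. A sketch of that arithmetic, not of Image Builder’s lifecycle engine: how the real engine combines age and count rules is not modelled, the precedence below is an assumption, and `plan` and `deprecation_date` are hypothetical helpers.

```python
from datetime import date, timedelta

DEPRECATE_AFTER = timedelta(days=45)
DELETE_AFTER = timedelta(days=90)
KEEP_NEWEST = 10  # the diagram's "10 AMIs per Region"


def deprecation_date(created: date) -> date:
    return created + DEPRECATE_AFTER


def plan(images: dict, today: date) -> dict:
    """Map each AMI name (in one Region) to 'active', 'deprecated', or 'delete'.

    Assumed precedence: exceeding either the age limit or the count limit deletes;
    otherwise age alone decides deprecation.
    """
    newest_first = sorted(images, key=images.get, reverse=True)
    actions = {}
    for rank, name in enumerate(newest_first):
        age = today - images[name]
        if age >= DELETE_AFTER or rank >= KEEP_NEWEST:
            actions[name] = "delete"
        elif age >= DEPRECATE_AFTER:
            actions[name] = "deprecated"  # still launchable, just flagged
        else:
            actions[name] = "active"
    return actions


# Twelve weekly builds, v1 (oldest) to v12 (built 2027-06-06).
builds = {f"v{i}": date(2027, 6, 6) - timedelta(weeks=12 - i) for i in range(1, 13)}
result = plan(builds, today=date(2027, 6, 6))
print(result["v12"], result["v6"], result["v5"], result["v1"])  # active active deprecated delete
print(deprecation_date(builds["v12"]))  # 2027-07-21
```

With weekly builds the count rule bites before the 90-day rule does: the two oldest images are deleted at 70 and 77 days old because they fall outside the newest ten.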

A worked CVE response

09:00. CVE announced. Amazon publishes a new Amazon Linux 2023 parent AMI with the patched kernel at 11:40.

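The acme-base-weekly pipeline has pipelineExecutionStartCondition: EXPRESSION_MATCH_AND_DEPENDENCY_UPDATES_AVAILABLE enabled. Image Builder evaluates that condition only when the cron fires, so the weekly Sunday 06:00 schedule on its own would leave the patch waiting until the following Sunday. This walkthrough assumes the schedule has been tightened to every four hours for CVE response, with the condition turning the runs that find nothing new into no-ops. The first check after the 11:40 publish lands at 15:30, sees the new parent, and starts a build.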

  • 15:30. Pipeline execution starts; a new build instance launches in the builder VPC.
  • 15:31. UpdateSystem step runs dnf upgrade on top of the new parent, which already ships the patched kernel.
  • 15:34. Other BUILD components run (package install, config, CloudWatch agent).
  • 15:38. validate phase: every assertion passes.
  • 15:40. AMI acme-base-2027-06-06-v13 created in eu-west-1.
  • 15:42. Test instance launched from v13. test phase components run; all pass.
  • 15:45. Distribution starts: CopyImage to us-east-1 and ap-southeast-2, launch permissions granted to prod and staging accounts.
  • 15:52. All three Regions have v13. SNS event fires. The platform team’s alerting picks up the successful-publish event.

15:53. Services’ launch templates reference the /acme/ami/base Parameter Store parameter. A small Lambda subscribed to the Image Builder SNS topic updates the parameter to the new AMI ID per Region. New instances launching after 15:53 come up on v13. Running instances continue on v12 until they’re next replaced (ASG rolling replacement, deploy, or scale event).
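A minimal sketch of that Lambda’s logic. It assumes the SNS message body is the JSON image document Image Builder publishes, with the new AMIs under `outputResources.amis` (each carrying `region`, `image`, and `accountId`) and the build status under `state.status`; check a real notification before relying on those field names. The Parameter Store write is injected as a function so the logic runs without AWS; in Lambda it would wrap boto3’s `put_parameter` on an SSM client for each Region. `handle` is a hypothetical name, not the Lambda entry point itself.

```python
import json
from typing import Callable, Optional

PARAMETER = "/acme/ami/base"


def handle(event: dict, put_parameter: Callable[[str, str, str], None],
           account_id: Optional[str] = None) -> dict:
    """Write each Region's new AMI ID to PARAMETER and return {region: ami_id}.

    put_parameter(region, name, value) is injected. In Lambda it could be:
      lambda r, n, v: boto3.client("ssm", region_name=r).put_parameter(
          Name=n, Value=v, Type="String", Overwrite=True)
    """
    written = {}
    for record in event.get("Records", []):
        image = json.loads(record["Sns"]["Message"])  # assumed: the image document
        if image.get("state", {}).get("status") != "AVAILABLE":
            continue  # ignore failed or cancelled builds
        for ami in image.get("outputResources", {}).get("amis", []):
            if account_id is not None and ami.get("accountId") != account_id:
                continue  # copies owned by other accounts are left to those accounts
            put_parameter(ami["region"], PARAMETER, ami["image"])
            written[ami["region"]] = ami["image"]
    return written


# Local run with a fabricated notification (AMI IDs are placeholders).
calls = []
fake_event = {"Records": [{"Sns": {"Message": json.dumps({
    "state": {"status": "AVAILABLE"},
    "outputResources": {"amis": [
        {"region": "eu-west-1", "image": "ami-0aaa", "accountId": "111122223333"},
        {"region": "us-east-1", "image": "ami-0bbb", "accountId": "111122223333"},
        {"region": "us-east-1", "image": "ami-0ccc", "accountId": "444455556666"},
    ]},
})}}]}
print(handle(fake_event, lambda region, name, value: calls.append((region, name, value)),
             account_id="111122223333"))
```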

As for v12: the lifecycle policy deprecates each image 45 days after it was created, so v12, built earlier the same day, is deprecated on 2027-07-21. Even then it remains launchable for anything still pinned to its AMI ID; it is simply flagged as deprecated. Launch templates that resolve /acme/ami/base already point at v13.

End-to-end: under 25 minutes from the pipeline starting to v13 being live in all three Regions with Parameter Store updated, and a little over four hours from AWS publishing the patched parent, with no human in the loop. The same response on the status-quo shell-script pipeline would have taken a day or two and probably skipped the smoke tests.

When Packer is still the correct answer

A few shapes where Packer wins even inside AWS.

Multi-cloud commitment. Even if 95% of the estate is AWS, a second cloud with its own image-bake requirement changes the calculus. The same HCL template and the same provisioner scripts can target amazon-ebs and azure-arm builders.

Complex provisioning with Ansible / Chef / Puppet. Image Builder’s action set (ExecuteBash, ExecuteDocument, ExecuteBinary and friends) is fine for many cases, but Packer’s first-class ansible provisioner integrates more cleanly with an existing Ansible playbook library.

Tight coupling to an existing CI pipeline with rich triggers. If CodePipeline or Jenkins already drives image builds with custom approval gates and artefact handoff, Packer inside that pipeline is a smaller step than introducing Image Builder as a parallel orchestration.

For Acme, none of these hold. The team has no Ansible, no second cloud, and no CI-driven image pipeline today. Image Builder is the lower-complexity answer.

What’s worth remembering

  1. Image Builder is the pipeline; Packer is the recipe. Image Builder owns scheduling, orchestration, testing, and multi-Region distribution as AWS resources. Packer builds the image but delegates the pipeline shape to the team.
  2. Four pipeline resources. Image Recipe (built from versioned Components), Infrastructure Configuration, and Distribution Configuration, composed by an Image Pipeline. Recipes and components are versioned; the two configurations are updated in place.
  3. Components are declarative YAML with managed actions. ExecuteBash, S3Download, SetRegistry, and friends, with each step named and logged on its own rather than buried in one long script. Easier to audit than a bare shell script.
  4. test phase runs on a fresh instance from the new AMI. Gate before distribution. Failures stop the pipeline; no untested AMI ever reaches target Regions.
  5. EXPRESSION_MATCH_AND_DEPENDENCY_UPDATES_AVAILABLE is the CVE lever. At each scheduled time, the pipeline rebakes only if a version-filtered component or the parent image has a newer version. It never fires between scheduled times, so pair it with a frequent schedule when response time matters.
  6. Distribution Configuration handles multi-Region, multi-account, tagging, and launch permissions; a lifecycle policy handles deprecation and retention. One resource for where images go, one for how long they stay.
  7. Packer wins when portability or existing investment matters. Multi-cloud shops, teams with deep Packer or Ansible tooling, and pipelines already in CodeBuild/Jenkins have legitimate reasons to prefer it.
  8. Parameter Store + SNS closes the loop. Pipeline publishes SNS on success; a small Lambda updates Parameter Store with the new AMI ID; launch templates resolve it on next instance launch. Fleet moves without a deploy.

The choice for Acme is Image Builder: less surface area, AWS-owned orchestration, and a pipeline that rebakes on its own when upstream patches ship. The shell script on the bastion retires. The CVE response goes from “a day of scrambling” to “a 25-minute automated build the platform team notices on Monday morning because of the SNS email.” The recipe lives in git; the pipeline lives in AWS resources; the AMI IDs live in Parameter Store. Three places, clear ownership, fresh AMIs.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.