Choosing CodeBuild Compute: On-Demand, Reserved Fleets, or Lambda

August 16, 2028 · 14 min read

The situation

A platform team supports CI for ~40 projects across three languages (Node, Python, Go), one Region (eu-west-1):

Total build volume: ~2,000 runs/day, ~14,000 runs/week.
Peak load: Monday 09:00–09:15 UTC when accumulated weekend PRs merge; queue depth spikes from 0 to 80 in minutes.
Average build duration: 4 minutes; p99: 12 minutes (monorepo integration builds).
Current setup: every project on the smallest shared on-demand compute tier; cache configured per project via S3 and a custom local-cache mode.

Pain points:

Monday-morning queue depth means a PR that merges at 09:05 doesn’t start building until 09:20, because the on-demand pool is cold and AWS is provisioning new instances. The 15-minute queue wait dwarfs the 4-minute build.
Low-frequency projects (15 of the 40) build 3-5 times a day; their local cache is almost always cold because on-demand builds are not guaranteed to land on the same host twice.
A handful of projects are short, simple scripts that take 30 seconds to run but spend 60 seconds of provisioning. The provisioning overhead is larger than the work.

The asks:

Reduce queue time on Monday spikes without doubling the bill.
Improve cache hit rate for low-frequency projects.
Reduce provisioning overhead for projects where the provisioning itself is the dominant cost.
Keep cost proportional to usage. Reserved capacity that sits idle all weekend isn’t what the finance partner wants.

What actually matters

Build compute comes in three billing shapes, each with different pre-warming, queue, and cold-start properties.

The first is pay-per-build-minute on shared capacity. The provider hands you a fresh host when a build starts and bills only for the minutes a build runs. There is no standing charge, so cost scales from zero. Queue time during spikes is real because each host has to be provisioned, and caches are best-effort because the next build of a project may land on a different host. Honest billing for moderate, predictable workloads.

The second is pay-per-fleet-minute on dedicated capacity. You provision N hosts of a chosen size; the fleet is always running; billing tracks fleet existence, not per-build. Builds land immediately and local caches persist across runs on the same host. Pay for idle; save on queue time and cache warmth. Honest for high-frequency, cache-sensitive workloads.

The third is pay-per-millisecond on event-driven compute. Provisioning is near-zero because the substrate pre-warms runtimes; billing is per millisecond of execution. The trade is feature support (short execution caps, no privileged mode, no Docker-in-Docker), and that limits the set of projects it fits. Honest for short, frequent, low-ceremony builds.

The decision axes:

Build frequency. Frequent builds on dedicated capacity amortise the fleet’s idle cost across many runs; infrequent builds pay for idle time they don’t consume.
Build duration. Very short builds benefit most from per-millisecond billing because the per-minute shape over-charges via rounding. Long builds may exceed the per-millisecond runtime’s execution cap entirely.
Cache sensitivity. Projects whose builds depend heavily on warm caches benefit from host persistence. Projects that don’t cache much see less benefit.
Peak tolerance. Dedicated capacity eliminates queue time at peaks up to fleet size. Shared capacity handles any load but with variable queue time.
Privileged mode. Docker-in-Docker builds need privileged mode; event-driven compute doesn’t support it. Dedicated and shared capacity both do.
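The build-duration axis is easiest to see as arithmetic. The sketch below compares per-minute billing (rounded up to whole minutes) with fine-grained billing for one build. The function names are illustrative, and any rates you pass in are assumptions rather than quoted prices.

```python
import math


def per_minute_cost(duration_s: float, rate_per_min: float) -> float:
    """Cost when each build is rounded up to whole billed minutes."""
    return math.ceil(duration_s / 60) * rate_per_min


def fine_grained_cost(duration_s: float, rate_per_s: float) -> float:
    """Cost when billing tracks actual execution time."""
    return duration_s * rate_per_s


def rounding_waste(duration_s: float) -> float:
    """Fraction of billed per-minute time the build did not actually use."""
    billed_s = math.ceil(duration_s / 60) * 60
    return (billed_s - duration_s) / billed_s


if __name__ == "__main__":
    for seconds in (30, 90, 240):
        print(f"{seconds}s build: {rounding_waste(seconds):.0%} of billed time unused")
```

A 30-second script pays for a full minute, so half of what it is billed for is rounding; a 4-minute build pays for nothing it did not use. That gap is why the shortest builds gain the most from fine-grained billing.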

What we’ll filter on

Filtering:

Queue time at peak: how fast does a build start when the queue is deep?
Idle cost: how much does the fleet cost when nothing is building?
Cache warmth: how often does LOCAL cache actually help?
Cost efficiency for the build’s duration: over- or under-paying vs actual work?
Feature support: privileged mode, custom images, image size, memory tiers.

The CodeBuild fleet landscape

1. All on-demand, one size. Status quo. Scales from zero, pays for build time only. Fails on queue-time at peak and on cache warmth for low-frequency projects.

2. All reserved, sized for peak. Ten-instance reserved fleet sized for Monday 09:00. Zero queue time on peak, full cache warmth; pays for idle through 99% of the week. Finance won’t sign this.

3. Mixed: on-demand baseline + reserved for the monorepo + Lambda for small scripts. Each project type lands on the fleet that fits it. The monorepo’s eight-minute integration build uses reserved capacity for cache warmth; the 30-second linter uses Lambda compute; everything else stays on-demand. Adds configuration per project; matches cost to shape.

4. Reserved fleet sized below peak, with on-demand overflow. A smaller reserved fleet (say 4-6 instances) absorbs the baseline and the bottom of the Monday spike; any builds that can’t fit fall back to on-demand. Balances idle cost against queue time. The fleet’s size is the dial.

5. Lambda compute for all short builds. Projects with durations under 10 minutes and Lambda-compatible toolchains move to Lambda compute. Dramatic cost drop for quick builds; doesn’t help the monorepo.

6. Cross-account fleet sharing via AWS Resource Access Manager. A reserved fleet in one account shared with other accounts, so multiple teams’ builds amortise the fleet cost. Useful for organisations with many small teams; not helpful when one team owns all the fleet.

Side by side

Option	Queue time (peak)	Idle cost	Cache warmth	Cost efficiency	Feature support
All on-demand	High	Zero	Low	Per-minute	All
All reserved, peak-sized	Zero	High	High	Poor (idle-heavy)	All
Mixed per project	Low where it matters	Matches usage	High where configured	Matches shape	All
Reserved below peak + overflow	Low (mixed)	Moderate	High on fleet	Moderate	All
Lambda compute for short builds	Near-zero	Zero	N/A	Excellent (short)	No privileged, 15-min cap
Shared reserved fleet	Low	Split across users	High	Good at scale	All

The durable pick is the mixed model: per-project assignment to the fleet type that matches build shape. The secondary pick (a reserved fleet sized below peak, with on-demand overflow) is valuable mainly for cache warmth and for taking the first wave of Monday’s spike.

Three fleet shapes, three billing shapes

On-demand bills for each build; reserved bills for the fleet regardless of load; Lambda bills per millisecond. Match the shape to the project.
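A minimal sketch of the three billing shapes as monthly cost functions. All rates and volumes in the usage lines are hypothetical placeholders, and the 30-day month is a simplification; substitute figures from your own Region's price list.

```python
MINUTES_PER_MONTH = 60 * 24 * 30  # assumes a 30-day month


def on_demand_monthly(builds_per_day, minutes_per_build, rate_per_min, days=30):
    """Pay only for the minutes builds actually run."""
    return builds_per_day * minutes_per_build * rate_per_min * days


def reserved_monthly(instances, rate_per_min):
    """Pay for every minute the fleet exists, busy or idle."""
    return instances * rate_per_min * MINUTES_PER_MONTH


def lambda_monthly(builds_per_day, seconds_per_build, rate_per_s, days=30):
    """Pay for execution time at fine granularity."""
    return builds_per_day * seconds_per_build * rate_per_s * days


def reserved_utilisation(builds_per_day, minutes_per_build, instances):
    """Share of fleet minutes that builds actually occupy."""
    return (builds_per_day * minutes_per_build) / (instances * 60 * 24)


if __name__ == "__main__":
    # Hypothetical workload: 300 four-minute builds a day on a 4-instance fleet.
    print(round(on_demand_monthly(300, 4, 0.01), 2))
    print(round(reserved_monthly(4, 0.01), 2))
    print(f"{reserved_utilisation(300, 4, 4):.0%} of fleet minutes used")
```

If the reserved and on-demand per-minute rates are the same, reserved only matches on-demand on the bill at 100% utilisation. Below that, the fleet has to earn its keep through shorter builds and shorter queues rather than a lower invoice.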

The picks in depth

Mixed-model assignment. Projects map to fleets based on three questions: how often do they build, how long do they take, and how warm does the cache need to be?

Frequent + long + cache-heavy (the monorepo) → reserved fleet. The monorepo’s integration build runs ~200 times a day, takes 8-12 minutes, and benefits enormously from a warm ~/.npm cache (drops from 5 minutes install to 40 seconds). A dedicated reserved fleet of 6 × BUILD_GENERAL1_LARGE absorbs the monorepo’s Monday-morning spike and keeps node_modules warm across runs. Cost: 6 × $0.02/minute × 60 × 24 × 30 ≈ $5,200/month. On-demand equivalent: 200 builds/day × 10 min × $0.02 × 30 ≈ $1,200/month. Reserved costs roughly four times as much in absolute terms; it justifies itself through reduced build time (8 min → 3 min average), which feeds back into engineer productivity and shortens the merge-queue bottleneck.
Infrequent + short + minimal cache (the 15 small projects) → Lambda compute. A project that builds 3 times a day for 90 seconds pays about $0.003 per build on Lambda compute vs $0.04 on-demand (minute rounding + provisioning overhead). Aggregate monthly saving across 15 projects: 15 × 3 builds/day × 30 days × ~$0.037 ≈ $50, which is modest in absolute terms at this volume; the larger win is that provisioning stops dominating each run. Limitations: no privileged mode (these projects don’t need it), no Docker-in-Docker builds, 15-minute execution cap (not a concern for 90-second builds).
Everything else → on-demand. The remaining ~24 projects with moderate build durations (2-6 minutes) and moderate frequencies (10-50 runs/day) stay on-demand. Per-minute billing matches their shape; cache warmth via S3 is acceptable; queue time on the Monday peak is inconvenient but not critical for these projects.
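The three questions above can be sketched as a small assignment function. The Project fields and every threshold here are illustrative assumptions, not values CodeBuild defines; tune them against your own build data.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    builds_per_day: int
    avg_minutes: float
    cache_heavy: bool
    needs_privileged: bool


def assign_fleet(p: Project) -> str:
    """Map a project to a fleet type using assumed, tunable thresholds."""
    # Lambda compute: short, low-ceremony builds with no privileged mode.
    if not p.needs_privileged and not p.cache_heavy and p.avg_minutes <= 2:
        return "lambda"
    # Reserved: frequent, long, cache-sensitive builds that amortise idle cost.
    if p.cache_heavy and p.builds_per_day >= 100 and p.avg_minutes >= 8:
        return "reserved"
    # Everything else stays on per-minute on-demand capacity.
    return "on-demand"


if __name__ == "__main__":
    projects = [
        Project("monorepo", 200, 10, cache_heavy=True, needs_privileged=True),
        Project("lint-script", 3, 1.5, cache_heavy=False, needs_privileged=False),
        Project("payments-api", 30, 4, cache_heavy=False, needs_privileged=False),
    ]
    for p in projects:
        print(p.name, "->", assign_fleet(p))
```

Checking privileged mode first matters: a short build that needs Docker-in-Docker cannot move to Lambda compute however cheap it would be.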

Reserved fleet for peak absorption (alternative). A simpler model: one reserved fleet sized for typical non-peak load (say 4 × MEDIUM), with on-demand overflow. The first 4 concurrent builds land on reserved (zero queue time, warm caches); the 5th onwards falls back to on-demand. Monday 09:05 with 80 queued builds: 4 start instantly on reserved, 76 queue on-demand and start as instances provision. At this size the fleet takes only 5% of the spike directly, so most Monday builds still wait on provisioning; a larger fleet absorbs more of the spike at a higher idle cost. What the 4-instance fleet does buy is warm caches and instant starts for the baseline load across the rest of the week.
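A rough simulation of the Monday spike under the two overflow behaviours. It assumes every build arrives at once, every build takes the same time, and overflow builds all see the same provisioning delay (15 minutes, borrowed from the 09:05 to 09:20 wait described earlier). Real queues are messier, so treat this as a sizing aid only.

```python
import heapq


def start_times_queue(n_builds, fleet_size, build_minutes):
    """QUEUE behaviour: every build waits for a free fleet slot."""
    free_at = [0.0] * fleet_size  # time each fleet slot next becomes free
    heapq.heapify(free_at)
    starts = []
    for _ in range(n_builds):
        t = heapq.heappop(free_at)  # earliest available slot
        starts.append(t)
        heapq.heappush(free_at, t + build_minutes)
    return starts


def start_times_overflow(n_builds, fleet_size, provisioning_minutes):
    """ON_DEMAND behaviour: the fleet fills, the rest wait for provisioning."""
    on_fleet = min(n_builds, fleet_size)
    return [0.0] * on_fleet + [provisioning_minutes] * (n_builds - on_fleet)


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    for size in (4, 6, 20):
        q = start_times_queue(80, size, build_minutes=4)
        o = start_times_overflow(80, size, provisioning_minutes=15)
        print(f"fleet={size}: QUEUE mean wait {mean(q):.1f} min, "
              f"ON_DEMAND mean wait {mean(o):.1f} min")
```

Running it across a few fleet sizes shows the dial directly: the mean wait falls as the fleet grows, and the idle bill rises with it.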

CodeBuild’s fleet overflow is configured on the fleet itself via overflowBehavior, set to QUEUE (wait for a fleet slot) or ON_DEMAND (fall back to on-demand capacity); fleetServiceRole is a separate fleet setting that supplies the fleet’s permissions rather than its overflow policy. For Monday-morning responsiveness, ON_DEMAND is the right overflow: builds start even if some wait for on-demand provisioning.
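As a sketch, the fleet request could be assembled like this before handing it to boto3. The parameter names follow the CodeBuild CreateFleet API as I understand it; the fleet name and capacity are hypothetical. The AWS call lives in its own function so the request can be inspected or tested without credentials.

```python
VALID_OVERFLOW = ("QUEUE", "ON_DEMAND")


def build_fleet_request(name, base_capacity, compute_type, overflow="ON_DEMAND"):
    """Assemble CreateFleet parameters; makes no network call."""
    if overflow not in VALID_OVERFLOW:
        raise ValueError(f"overflow must be one of {VALID_OVERFLOW}")
    if base_capacity < 1:
        raise ValueError("base_capacity must be at least 1")
    return {
        "name": name,
        "baseCapacity": base_capacity,
        "environmentType": "LINUX_CONTAINER",  # assumption: Linux builds only
        "computeType": compute_type,
        "overflowBehavior": overflow,
    }


def create_fleet(request, region="eu-west-1"):
    """Send the request to CodeBuild; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("codebuild", region_name=region)
    return client.create_fleet(**request)


if __name__ == "__main__":
    # Hypothetical fleet name; prints the request instead of calling AWS.
    print(build_fleet_request("general-purpose", 4, "BUILD_GENERAL1_MEDIUM"))
```

Keeping the overflow value validated in one place makes it harder for a typo to reach the fleet definition unnoticed.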

Compute-type sizing. CodeBuild’s tiers:

BUILD_GENERAL1_SMALL: 3 GB, 2 vCPU. Fine for most single-service builds.
BUILD_GENERAL1_MEDIUM: 7 GB, 4 vCPU. The default for Node/Python projects with moderate parallelism.
BUILD_GENERAL1_LARGE: 15 GB, 8 vCPU. Monorepo builds, TypeScript compilation, heavy test suites.
BUILD_GENERAL1_2XLARGE: 145 GB, 72 vCPU. Specialised (ML training, large compilation, video rendering).
BUILD_LAMBDA_*: Lambda memory tiers 1024 MB, 2048 MB, 4096 MB, 8192 MB, 10240 MB. Vertical scaling only; no horizontal parallelism within a single build.

Over-sizing is a cost leak that’s easy to miss; under-sizing causes slow builds that nobody attributes correctly. The finance dashboard and the build-duration dashboard both matter. A build that takes 8 minutes on SMALL and 2 minutes on LARGE is cheaper on LARGE whenever LARGE’s per-minute rate is less than 4× SMALL’s (per-minute-of-LARGE × 2 < per-minute-of-SMALL × 8); at exactly 4× the two cost the same, and LARGE still returns 6 minutes of wall-clock time per build.
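The sizing comparison reduces to one inequality, sketched below. The rates in the usage lines are hypothetical; the break-even ratio depends only on the two measured durations.

```python
def build_cost(minutes: float, rate_per_min: float) -> float:
    """Cost of one build at a given per-minute rate."""
    return minutes * rate_per_min


def break_even_rate_ratio(small_minutes: float, large_minutes: float) -> float:
    """How many times pricier the larger tier can be before it costs more."""
    return small_minutes / large_minutes


def larger_is_cheaper(small_minutes, small_rate, large_minutes, large_rate) -> bool:
    """True when the faster build on the larger tier has the lower bill."""
    return build_cost(large_minutes, large_rate) < build_cost(small_minutes, small_rate)


if __name__ == "__main__":
    # 8 minutes on the small tier vs 2 minutes on the large tier.
    print(break_even_rate_ratio(8, 2))
    # Hypothetical rates: the large tier is 3x the small tier's price.
    print(larger_is_cheaper(8, 0.005, 2, 0.015))
```

Measure the durations on both tiers first; the ratio then tells you the most the larger tier can cost per minute before the swap stops paying for itself.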

Migration order

Move short projects to Lambda compute first. Lowest-risk change; immediate cost reduction; no effect on other projects. ~15 projects, ~1 week of migration work including testing.
Right-size compute types based on actual CPU/memory utilisation. CodeBuild’s CloudWatch utilisation metrics (CPUUtilizedPercent, MemoryUtilizedPercent) alongside build durations expose over- and under-sized builds. Adjust each project’s environment compute type and re-measure.
Stand up the reserved fleet for the monorepo. Provision fleet, update the monorepo’s project config to target the fleet via fleet.fleetArn. Validate cache behaviour and run some peak-time experiments.
Evaluate reserved-with-overflow for the rest. After two weeks, review whether the on-demand projects benefit from reserved overflow, and size the general-purpose fleet based on observed Monday queue depth.
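For the right-sizing step, a first-pass classifier over utilisation samples might look like the sketch below. It assumes you have already exported per-build CPU and memory utilisation percentages (for example from CodeBuild's CloudWatch utilisation metrics); the 30% and 85% thresholds are arbitrary starting points, not recommendations from AWS.

```python
def sizing_verdict(cpu_pct, mem_pct, low=30.0, high=85.0):
    """Suggest a sizing direction from utilisation samples (0-100 scale)."""
    if not cpu_pct or not mem_pct:
        raise ValueError("need at least one sample for each metric")
    peak = max(max(cpu_pct), max(mem_pct))
    if peak >= high:
        # Either resource is close to saturation at some point in the build.
        return "consider larger"
    if peak < low:
        # Neither resource ever gets near the tier's capacity.
        return "consider smaller"
    return "keep"


if __name__ == "__main__":
    # Hypothetical samples for three projects.
    print(sizing_verdict([20, 25], [10, 15]))
    print(sizing_verdict([90, 60], [40, 50]))
    print(sizing_verdict([50, 60], [40, 45]))
```

Using the peak rather than the average is deliberate: a build that saturates CPU for one phase is slowed by that phase, even if the rest of the run looks idle. Re-measure duration after any change, since the verdict is only a hint.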

What’s worth remembering

Three fleet types, three pricing shapes. On-demand is per-build-minute with cold caches; reserved is per-fleet-minute with warm caches; Lambda is per-millisecond with no privileged mode and a 15-minute cap.
Cache warmth is the reserved fleet’s real product. The provisioning-time saving is visible; the cache-hit rate is where the wall-clock time drops. Projects that don’t benefit from LOCAL caches don’t need reserved capacity.
Lambda compute dominates for short builds. Sub-2-minute builds on Lambda cost an order of magnitude less than on-demand because per-minute rounding stops applying. The catch is feature support: no privileged mode.
Reserved with on-demand overflow is the peak-absorption pattern. Size the reserved fleet for the steady-state baseline plus a chunk of the spike; let overflow handle the long tail. The fleet size is the dial between idle cost and queue time.
Compute-type sizing matters more than fleet type. A 2x faster build on a 2x more expensive compute is the same cost; if it’s more than 2x faster, larger is cheaper. Measure first, pick second.
Fleet service role sets permissions for fleet-wide access. Reserved fleets need a service role that allows ECR pull, S3 cache access, KMS decrypt. On-demand uses the project’s build role; reserved adds a fleet-level role on top.
Cross-account fleet sharing via RAM. Large organisations can share one reserved fleet across many accounts; amortisation improves, blast radius widens. Appropriate past ~5 accounts, not before.
Build cost is usually not the bottom line. Engineer-time saved by faster builds and shorter queue times usually dwarfs the fleet cost itself. The finance partner cares; the engineering partner cares more.

Three fleets for three build shapes is not over-engineering; it is matching billing to usage. The monorepo’s reserved fleet earns its idle cost in cache warmth; the short scripts on Lambda stop over-paying for minute rounding; on-demand absorbs the unpredictable middle. The merge-storm queue stops being a 15-minute tax every Monday, because the reserved fleet absorbs the first wave and on-demand carries the rest.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.