Lambda Cold Starts Under a P99 Budget

The situation

A SaaS team runs a JSON API behind Amazon API Gateway (REST) with a single Node.js 20 Lambda function as the sole integration. The function reads from DynamoDB, calls a third-party billing service, and returns a response in roughly 80 ms when warm. After any idle period of a few minutes, the next request takes 4.2 seconds to return, because the execution environment has been reaped and a new one has to initialise.

Traffic is bursty through the working day on weekdays (5-20 requests every two to four minutes between 09:00 and 18:00 UTC) and near-silent overnight and at weekends, broken only by a handful of scheduled health checks. The product team has written a 500 ms P99, measured at the API Gateway access log, into the product specification. The platform team is two people and a generous slice of on-call; they would rather not multiply the current Lambda bill by ten, and they would very much rather not add a second thing to page someone about.

What actually matters

Before reaching for a pricing page, it helps to look at what’s actually being traded. The 4.2-second first-request tax is entirely in the INIT phase: Lambda provisioning a microVM, downloading the deployment package, starting the language runtime, and running module-scope code before the handler ever sees the event. Every lever in this corner of the service either shortens INIT, hides it behind a pool of pre-warmed environments, or moves it out of the critical path entirely. The question is which of those trades the team can afford, in dollars, in operational weight, and in runtime constraints.

Ownership sits with the platform team, and the platform team is small. That pushes hard against any lever that needs a second thing watching a first thing: a fleet of warmer Lambdas on EventBridge, a bespoke scale-up state machine, a cache pre-warmer keyed on deploys. The least-ops answer is one that AWS manages and the team configures.

Blast radius is narrow by construction (one function, one alias, one integration), but cost blast radius is real. A Lambda bill near zero is easy to inflate to a few hundred dollars a month with a sufficiently aggressive solution. The team cares about the multiple, not the absolute number; tenfold is a no, threefold is a conversation, similar-order is fine.

Cost shape is the big one. On-demand Lambda is pay-per-use; provisioned concurrency is pay-for-capacity even when nobody’s calling. A static pool sized to business-hours peak costs the same overnight as at 15:00 on Wednesday. The workload is bursty on a predictable weekday timetable, which is the correct shape for scheduled capacity: expensive when you need it, cheap when you don’t.

Failure modes matter because any cold-start fix hides some cold starts, not all of them. Deployment rollovers, spillover beyond the warm pool, autoscale-up events, a fresh environment replacing one that misbehaved: all of those still run INIT. The defence-in-depth question is whether, when a cold start happens anyway, the result is 700 ms or 4,200 ms. A trim init path matters independently of whatever else is in place.

Coupling is the quiet constraint. Some of the cleanest cold-start features only apply to certain runtimes: Java, Python, or .NET. The team has a Node.js codebase and isn’t rewriting it. A lever that requires a language migration is a non-starter, not a good answer to a different question.

What we’ll filter on

Distilling those concerns into filters the candidates can be scored against:

Predictable P99 under 500 ms including the first request after quiet periods. Anything that only helps some cold starts is insufficient, because the burst pattern guarantees the cold request is the one a user is waiting on.
Cost-effective for sparse traffic. The function is mostly idle. Solutions that bill for 24/7 capacity at full warm-pool size will dominate the Lambda line item.
Low operational overhead. No custom warmer Lambdas on EventBridge schedules, no bespoke state machines, no second thing to page someone about.
Compatible with Node.js 20. Some of the best cold-start features only apply to Java, Python, or .NET. The team has a Node.js codebase and isn’t rewriting it.

The Lambda cold-start landscape

Eight levers, walked in order.

Runtime choice. Node.js and Python INIT typically lands in 150-400 ms before user code; Go is similar or a little faster; Java and .NET can take 800 ms to well over two seconds because the JVM or CLR starts, classes load, and frameworks resolve dependency graphs. Switching runtime is a rewrite, and the scenario already sits on Node.js, near the fast end of the field.
Architecture: arm64 (Graviton2) vs x86_64. arm64 Lambda runs on Graviton2, priced roughly 20% cheaper per GB-second with up to 34% better price-performance for typical workloads. The larger per-vCPU L2 cache shaves a little off init for module-heavy Node.js. A worthwhile tweak, 10-20% off cold-start duration and 20% off the bill, but on its own it turns a 4.2 s start into a 3.5 s start. Good support act, not the lead.
Memory size. Lambda allocates CPU proportionally to memory. 128 MB gets a fraction of a vCPU; 1,769 MB is approximately one full vCPU; 10,240 MB is roughly six. Init is largely single-threaded, so CPU share matters: bumping 128 MB to 1,024 MB often halves cold-start duration. Above ~1,769 MB, returns flatten for Node.js init. Higher memory costs more per millisecond but runs for fewer of them, so the bill effect is often neutral for short handlers.
Package size and init code. Module-scope code runs on every cold start. The v2 AWS SDK monolith at module scope loads 10+ MB of JavaScript; the v3 modular SDK loads only what’s used and is dramatically faster. The deployment package also has to be pulled from S3 and unpacked, so a 200 MB bundle is slower to provision than a 2 MB one. Trimming imports, tree-shaking with esbuild, and deferring heavy work (secrets fetches, JWKS downloads) out of module scope can take seconds off the worst cold starts.
VPC configuration. Pre-2019, attaching a Lambda to a VPC added 10+ seconds while a dedicated ENI was created. That was replaced by the Hyperplane ENI model: shared network interfaces that are created when the function is created or its VPC settings change, rather than at invocation time. The ongoing penalty is now tens of milliseconds. For a function exercised daily through API Gateway, VPC no longer dominates cold-start.
SnapStart. Lambda snapshots the initialised execution environment after a version is published and restores from it on invocation, skipping most of INIT. Typical reduction: seconds to sub-second. The caveat: SnapStart supports Java 11+, Python 3.12+, and .NET 8+ only. Node.js is not supported.
Provisioned concurrency. Pre-allocate N initialised environments that Lambda keeps warm and hands to requests first. Requests landing on PC skip INIT entirely. Pricing has a capacity charge billed per-GB-second whether anything’s calling or not, and an execution-duration charge at a lower per-GB-second rate than on-demand when requests actually run on PC environments. Scaled via Application Auto Scaling, either on a schedule or on target-tracking against LambdaProvisionedConcurrencyUtilization.
Scheduled warmer Lambdas. The folk remedy: an EventBridge rule every four minutes invoking the function with a no-op payload. Works for one concurrent execution; the second concurrent request arriving at the same instant cold-starts anyway. Extra Lambda, extra rule, extra failure mode. Superseded by provisioned concurrency.
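The arm64, memory, and package-trimming levers above are only worth keeping if they measurably shorten INIT. A minimal sketch for checking that, assuming the function’s REPORT log lines have been exported to a file or pipe: Lambda adds an Init Duration field to the REPORT line of a cold invocation and omits it on warm ones. The helper name init_durations and the sample log values are hypothetical.

```shell
# Print the Init Duration (in ms) from each Lambda REPORT log line on stdin.
# Warm invocations carry no "Init Duration" field and are skipped.
init_durations() {
  sed -n 's/.*Init Duration: \([0-9.]*\) ms.*/\1/p'
}

# Example with made-up log lines: one cold start, one warm invocation.
printf 'REPORT RequestId: a1\tDuration: 81.20 ms\tInit Duration: 412.55 ms\nREPORT RequestId: b2\tDuration: 79.90 ms\n' | init_durations
# prints 412.55
```

Running it over the logs before and after a memory or bundle change gives a direct before-and-after view of what each lever actually bought.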

Side by side

Lever	P99 under 500 ms	Cheap when idle	Low ops	Node.js 20
Runtime swap	n/a	n/a	n/a	✗
arm64 / Graviton2	✗	✓	✓	✓
Memory tuning	✗	✓	✓	✓
Package / init trimming	✗	✓	✓	✓
VPC reshuffle	n/a	✓	✓	✓
SnapStart	✓	✓	✓	✗
Provisioned concurrency (scheduled)	✓	✓	✓	✓
Warmer Lambdas on EventBridge	✗	✓	✗	✓

Matching levers to the workload

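Only scheduled provisioned concurrency passes all four filters, so it survives as the lead. Runtime swap, VPC reshuffling, SnapStart, and warmer Lambdas drop out; arm64, a sensible memory tier, and a disciplined init path cannot hit the P99 target alone, but they work alongside it as supporting cast.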

Provisioned concurrency, in depth

Three things are worth nailing down: how PC attaches, how Application Auto Scaling drives it, and how the request router chooses between warm-PC and on-demand.

Attachment. PC is configured on a specific function version or alias, never $LATEST. The normal pattern is to publish a version, point the prod alias at it, and configure PC on the alias. API Gateway integrates with the alias, so every request benefits from the warm pool. When a new version ships, PC is configured on the new version first, then the alias is flipped, and finally the old version’s PC is torn down. Staged that way, the deployment itself causes no cold starts.
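The basic attach pattern (publish a version, point the alias at it, configure PC on the alias) might be sketched as below. The helper name attach_pc is hypothetical, the function name api and alias prod are taken from the commands later in this section, and the staged rollout across old and new versions is deliberately not shown.

```shell
# Hypothetical helper: publish a version, repoint an alias, warm the alias.
# Args: <function name> <alias name> <provisioned environments>
attach_pc() {
  fn="$1"; alias_name="$2"; count="$3"
  # Publish an immutable version from the current $LATEST code and config.
  version=$(aws lambda publish-version --function-name "$fn" \
    --query 'Version' --output text) || return 1
  # Point the alias at it (the alias must already exist; create-alias does that once).
  aws lambda update-alias --function-name "$fn" --name "$alias_name" \
    --function-version "$version" || return 1
  # Keep $count initialised environments warm behind the alias.
  aws lambda put-provisioned-concurrency-config --function-name "$fn" \
    --qualifier "$alias_name" --provisioned-concurrent-executions "$count"
}

# Example (not run here): attach_pc api prod 10
```

Allocation is not instant, so a real rollout would likely check the status reported by aws lambda get-provisioned-concurrency-config before sending traffic.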

Application Auto Scaling. Register the alias as a scalable target and attach either a scheduled action or a target-tracking policy. For time-of-day patterns exactly like this scenario, scheduled scaling is usually simpler:

# Register the prod alias as a scalable target (overall bounds: 1 to 20 environments).
aws application-autoscaling register-scalable-target \
  --service-namespace lambda \
  --resource-id function:api:prod \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --min-capacity 1 \
  --max-capacity 20

# Weekdays at 08:45 UTC: raise the floor to 10 just before business hours.
aws application-autoscaling put-scheduled-action \
  --service-namespace lambda \
  --resource-id function:api:prod \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --scheduled-action-name business-hours-up \
  --schedule "cron(45 8 ? * MON-FRI *)" \
  --scalable-target-action '{"MinCapacity": 10, "MaxCapacity": 20}'

# Weekdays at 18:15 UTC: drop the floor to 1 for the night (and, after Friday, the weekend).
aws application-autoscaling put-scheduled-action \
  --service-namespace lambda \
  --resource-id function:api:prod \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --scheduled-action-name business-hours-down \
  --schedule "cron(15 18 ? * MON-FRI *)" \
  --scalable-target-action '{"MinCapacity": 1, "MaxCapacity": 5}'

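A floor of 1 overnight means the single request that wakes the function at 04:00 UTC still doesn’t cold-start. A floor of 10 during business hours absorbs most of the typical burst of 5-20 requests, since at roughly 80 ms each they rarely all overlap; a target-tracking policy on top adds headroom for sustained spikes. One caveat: a scheduled action only pulls current capacity inside the new range, so at 18:15 the pool falls from 10 to the overnight maximum of 5, and it reaches the floor of 1 only once a scaling policy scales in (or if the overnight MaxCapacity is also set to 1).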
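The scheduled actions set only the minimum and maximum; something still has to move capacity between them. A hedged sketch of the target-tracking half, assuming the alias is already registered as a scalable target. The helper name attach_target_tracking, the policy name pc-utilization, and the 0.7 utilisation target are illustrative choices, not values from the scenario.

```shell
# Hypothetical helper: attach a target-tracking policy to an alias that is
# already registered as a scalable target.
# Args: <resource-id, e.g. function:api:prod> <target utilisation, e.g. 0.7>
attach_target_tracking() {
  aws application-autoscaling put-scaling-policy \
    --service-namespace lambda \
    --resource-id "$1" \
    --scalable-dimension lambda:function:ProvisionedConcurrency \
    --policy-name pc-utilization \
    --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration \
      "{\"TargetValue\": $2, \"PredefinedMetricSpecification\": {\"PredefinedMetricType\": \"LambdaProvisionedConcurrencyUtilization\"}}"
}

# Example (not run here): attach_target_tracking function:api:prod 0.7
```

Target tracking reacts over minutes rather than seconds, so it is better treated as headroom for sustained load than as cover for the first request of a burst; the scheduled floor does that job.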

Request routing. A request arriving at the alias is routed to a warm PC environment if one is available. If all PC environments are busy, Lambda spills over to on-demand, and those requests cold-start. PC capacity therefore has to cover the P99 concurrent-requests number, not the mean, if spillover cold starts are unacceptable. Metrics to watch: ProvisionedConcurrencyUtilization, ProvisionedConcurrencySpilloverInvocations, and ProvisionedConcurrencyInvocations vs Invocations.
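One way to check the spillover metric named above is to sum it per hour from CloudWatch. This is a sketch under assumptions: the helper name pc_invocation_sum is hypothetical, the one-hour period is arbitrary, and it only suits count-style metrics (utilisation is better read as a maximum than a sum).

```shell
# Hypothetical helper: hourly Sum of a count-style Lambda metric for one alias.
# Args: <metric name> <function name> <alias> <start time> <end time>
pc_invocation_sum() {
  aws cloudwatch get-metric-statistics \
    --namespace AWS/Lambda \
    --metric-name "$1" \
    --dimensions "Name=FunctionName,Value=$2" "Name=Resource,Value=$2:$3" \
    --start-time "$4" --end-time "$5" \
    --period 3600 --statistics Sum \
    --query 'Datapoints[].Sum' --output text
}

# Example (not run here; the timestamps are placeholders):
#   pc_invocation_sum ProvisionedConcurrencySpilloverInvocations api prod \
#     2025-01-06T09:00:00Z 2025-01-06T18:00:00Z
```

Any non-zero hour in the output is an hour in which at least one request fell through to on-demand and paid for INIT.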

A worked example: one month of bill shape

Baseline: a 1,024 MB Node.js 20 function on arm64, running for an average of 80 ms warm. Business hours burn roughly 20,000 invocations per weekday; overnight and weekends burn roughly 500 per day. Monthly invocations land near 455,000; monthly GB-seconds of actual execution at 1.0 GB × 0.080 s come out near 36,400.

Option A: no provisioned concurrency. Request charges $0.09; duration at arm64 on-demand around $0.48. Total Lambda bill under a dollar. Every first-of-a-burst request runs the 4.2-second cold start. The first filter, P99 under 500 ms, fails.

Option B: provisioned concurrency, scheduled. Floor of 10 for nine hours × 22 weekdays (198 hours per month) and floor of 1 for the remaining 546 hours. Provisioned capacity is about 9.09 million GB-seconds at ~$0.00000333 per GB-second, around $30.30. PC execution duration on the 36,400 GB-seconds that actually run on PC is about $0.28. Request charges unchanged at $0.09. Total Lambda bill around $30.70. P99 under 100 ms for any request landing on the warm pool.

Option C: provisioned concurrency, 24/7 flat at 10. Capacity is about 26.78 million GB-seconds at ~$0.00000333, roughly $89.20. Same duration and request charges. Total near $89.60, for no user-visible improvement over Option B.

Option B is roughly 3× cheaper than Option C for the same user-facing P99, because about 73% of the month’s hours (546 of 744) are overnight or weekend and the floor drops to 1. Scheduled scaling pays for itself within hours of being configured.
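The capacity figures in Options B and C are simple enough to recompute when the floor or the schedule changes. A small sketch, using the rounded ~$0.00000333 rate quoted above, so the results land within a few cents of the figures in the text; the helper name pc_capacity_cost is hypothetical.

```shell
# Monthly provisioned-concurrency capacity charge in dollars.
# Args: <environment-hours> <memory in GB> <rate per GB-second>
pc_capacity_cost() {
  LC_ALL=C awk -v eh="$1" -v gb="$2" -v rate="$3" \
    'BEGIN { printf "%.2f\n", eh * 3600 * gb * rate }'
}

# Option B: 10 environments for 198 hours plus 1 environment for 546 hours.
pc_capacity_cost $((10 * 198 + 1 * 546)) 1.0 0.00000333
# prints 30.28

# Option C: 10 environments for all 744 hours.
pc_capacity_cost $((10 * 744)) 1.0 0.00000333
# prints 89.19
```

Expressing the schedule as environment-hours makes it easy to try alternatives, such as a business-hours floor of 5, before touching the scaling configuration.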

What’s worth remembering

INIT runs module-scope code, runtime startup, and handler class loading on every new environment. Warm invocations skip it; cold ones don’t. Cold starts are visible at the 99th percentile for sparse workloads.
Provisioned concurrency keeps N environments pre-initialised on a specific function version or alias, never on $LATEST, and spillover beyond N falls through to on-demand, which cold-starts.
SnapStart supports Java 11+, Python 3.12+, and .NET 8+ only. Node.js, Ruby, Go, and container images are out. Java incurs no extra SnapStart charge; Python and .NET pay a snapshot caching fee per GB-second plus a restore fee per GB restored.
Application Auto Scaling for Lambda PC supports scheduled (cron) and target-tracking scaling on LambdaProvisionedConcurrencyUtilization. Scheduled fits time-of-day patterns; target tracking fits reactive load. They combine.
arm64 is roughly 20% cheaper per GB-second than x86_64 with comparable or better performance for most workloads, and the discount applies to both on-demand and provisioned concurrency pricing.
Memory controls CPU. More memory means more CPU share, which makes single-threaded INIT finish faster. Diminishing returns above ~1,769 MB for Node.js.
VPC cold-start cost is near zero now thanks to Hyperplane ENIs. Reaching for “take it out of the VPC” as a first-line cold-start fix is usually out of date.
Trim init regardless. Even with PC, spillover, deployments, and scale-up events still run INIT. Cutting INIT from 4.2 s to 700 ms makes every cold start that does happen survivable.
ProvisionedConcurrencySpilloverInvocations and ProvisionedConcurrencyUtilization tell you whether the pool is the correct size. Spillover above zero means some users are paying the cold-start tax the pool was supposed to absorb.
The pricing model matters more than the cleverness. A scheduled floor that drops overnight is roughly a third of the bill of a flat 24/7 pool at the same peak, for the same user-visible behaviour.