Choosing a Container Runtime for Mixed Workload Shapes on AWS

August 09, 2028 · 13 min read

The situation

A team runs ten services, all packaged as container images pushed to ECR. The shapes vary:

api-gateway-edge. HTTP API fronting the public product. ~2,000 RPS sustained, occasional spikes to 5,000 RPS. Needs response times <100ms p99. Always has traffic; no cold-start tolerance.
worker-background. A long-running worker that polls SQS, processes messages, writes to Postgres. Average 4 messages/sec, bursts to 40. Doesn’t serve HTTP. Runs 24/7.
webhook-receiver. Receives webhooks from a third-party SaaS. ~800 invocations a day, bursty (half of them arrive in the same 15 minutes after a batch job runs somewhere else). Must respond within 30 seconds.
image-thumbnailer. Invoked by S3 event on object-put. ~2,000 events/day; each resizes an image, writes back. Tolerates 5 seconds of cold start.
internal-admin-ui. A React app with a Node backend, used by 20 engineers, low traffic, intermittent. Needs access to private RDS.
Five other internal services of similar low-traffic profile.

The team currently runs all ten on EC2 behind an ALB. Right-sizing has become a quarterly chore; the webhook receiver runs on a t3.medium that sits at 0.1% CPU most of the time. The finance partner wants the obvious wins without asking every team to learn a new deployment model.

What actually matters

Container runtimes on AWS share a surface (give me a container image, I’ll run it) and diverge sharply below the waterline. Three properties decide which runtime a given application belongs on.

The first is billing shape. An always-on per-second billing model is honest for a steady workload and punitive for an idle one; a per-millisecond billing model is honest for bursty workloads and punitive for sustained ones; a hybrid that charges a floor plus active-request time is honest for low-traffic HTTP and indifferent for everything else. The shape of the bill should match the shape of the load. A mismatch between the two is what right-sizing fatigue feels like.

The second is what the runtime owns. A runtime that runs containers indefinitely owns the long-running case but makes you pay for idle. A runtime that scales to zero owns the bursty case but introduces a cold-start penalty when traffic resumes. A runtime that caps execution time can’t host a worker process at all. The first cut between options is workload duration: sub-15-minute event handlers are one shape, indefinite services are another, and 24/7 polling workers are a third.

The third is what the runtime makes easy. Some let you bring any port, any protocol, any sidecar; others insist on a single HTTP endpoint per service in exchange for a near-zero control plane. Networking shape matters too: reaching a private database from a runtime that owns its ENIs natively is one configuration; reaching it from a runtime that bolts on connectivity is another.

The decisions worth making per-application:

Request rate and idle time. A 24/7 service has no idle time worth reclaiming; per-millisecond billing’s advantage disappears. An 800-requests-a-day webhook is idle well over 99% of the time; the per-millisecond shape is cheapest by orders of magnitude.
Cold-start tolerance. A user-facing API cannot tolerate a multi-second cold start. A webhook with a 30-second SLA can.
Execution duration. Event-driven runtimes cap execution time. A long-running worker process is not that shape; it belongs on an always-on runtime.
Networking requirements. “Access to a private database” needs VPC integration; the runtimes vary in how heavyweight that integration is, and whether it costs concurrency or cold-start time.
Protocol. Event-driven runtimes front HTTP via an API gateway but don’t handle arbitrary TCP. Always-on container runtimes handle anything. HTTP-only runtimes cover web services and nothing else.
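The idle-time argument can be made concrete with a few lines of arithmetic. This is a minimal sketch, assuming requests never overlap; the 0.2 s of busy time per request is borrowed from the webhook pricing example later in the post and is only an assumed figure for the worker.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def idle_fraction(requests_per_day: float, busy_seconds_per_request: float) -> float:
    """Fraction of the day with no request in flight, assuming no overlap."""
    busy = min(requests_per_day * busy_seconds_per_request, SECONDS_PER_DAY)
    return 1.0 - busy / SECONDS_PER_DAY


# webhook-receiver: ~800 invocations a day at ~0.2 s each.
webhook_idle = idle_fraction(800, 0.2)

# worker-background: 4 messages/sec around the clock; 0.2 s per message is an assumption.
worker_idle = idle_fraction(4 * SECONDS_PER_DAY, 0.2)

print(f"webhook-receiver idle: {webhook_idle:.2%}")
print(f"worker-background idle: {worker_idle:.2%}")
```

Under these assumptions the webhook is busy for under three minutes a day, while the worker is busy most of the time, which is the gap the billing models are being matched against.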

What we’ll filter on

Filtering against:

Request pattern fit: does the billing model match the application’s rhythm?
Cold-start behaviour: does the application tolerate the startup penalty?
Duration ceiling: can tasks run indefinitely or is there a hard cap?
VPC integration: can the runtime sit in private subnets and reach RDS and internal services?
Operational surface: how much control plane does the team manage?

The container-runtime landscape

1. App Runner. HTTP service container runtime. Pros: near-zero ops, auto-scales from a floor (default min 1) to a ceiling (default max 25, configurable to hundreds), handles TLS, handles deployments. Cons: HTTP-only, one container per service, pricing floor not zero, opinionated around request-response patterns.

2. ECS on Fargate. General-purpose container runtime. Pros: arbitrary containers, any port, any protocol, Fargate-native VPC ENIs, service and cluster abstractions. Cons: task definitions + services + load balancer target groups + IAM task roles + execution roles = meaningful control-plane to understand; always-on pricing when tasks are running.

3. Lambda (container image). Event-driven compute, up to 15-minute execution. Pros: pay only when invoked, integrations with most AWS event sources (S3, EventBridge, SQS, Kinesis, API Gateway, DynamoDB Streams), fast scale. Cons: 15-minute cap, cold starts, 10-GB image size limit, VPC cold-start cost, and the image must implement Lambda’s Runtime API rather than run as an arbitrary long-lived process.

4. EKS on Fargate. Kubernetes on top of Fargate. Pros: Kubernetes API, existing tooling, per-pod Fargate isolation. Cons: EKS control plane charge ($0.10/hr, ~$73/month), Kubernetes operational overhead. Out of scope here because the team doesn’t need Kubernetes for ten services.

5. ECS on EC2. Containers on self-managed EC2 cluster capacity. Rejected because the team already has “EC2 right-sizing fatigue.”

6. Lambda Function URLs + zip. HTTPS endpoint without API Gateway. Simplest public endpoint; fine for webhooks. Still subject to Lambda’s cold starts and 15-minute cap.

Side by side

Option	Request pattern	Cold start	Duration ceiling	VPC integration	Ops surface
App Runner	Steady HTTP	Low (warm-pool)	None	VPC Connector	Minimal
ECS Fargate	Any	None (always running)	None	Native ENI	Moderate
Lambda (image)	Bursty / event-driven	100ms–several s	15 min	ENI pool (cold-start cost)	Minimal
EKS Fargate	Any	None	None	Native ENI	High (K8s)
Lambda (zip)	Event-driven	Low (small runtime)	15 min	ENI pool	Minimal

No single runtime wins. The pick is per-application, driven by the five attributes.

Mapping applications to runtimes

Each application picks its runtime by answering three questions: is it long-running, is it HTTP, is it OK with a cold start.
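The three questions can be written down as a small decision function. This is a sketch of one reading of them, not a rule the post states verbatim: "long-running" is interpreted here as "busy around the clock", and the `Workload` type and `pick_runtime` name are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    sustained: bool      # long-running: busy around the clock
    http: bool           # serves HTTP requests
    cold_start_ok: bool  # tolerates a startup penalty


def pick_runtime(w: Workload) -> str:
    if w.sustained:
        return "ECS on Fargate"   # no idle time to reclaim
    if w.cold_start_ok:
        return "Lambda"           # idle-heavy and startup-tolerant
    if w.http:
        return "App Runner"       # low-traffic HTTP that must stay warm
    return "ECS on Fargate"       # non-HTTP and cold-start-sensitive


WORKLOADS = [
    Workload("api-gateway-edge", sustained=True, http=True, cold_start_ok=False),
    Workload("worker-background", sustained=True, http=False, cold_start_ok=False),
    Workload("webhook-receiver", sustained=False, http=True, cold_start_ok=True),
    Workload("image-thumbnailer", sustained=False, http=False, cold_start_ok=True),
    Workload("internal-admin-ui", sustained=False, http=True, cold_start_ok=False),
]

PICKS = {w.name: pick_runtime(w) for w in WORKLOADS}
for name, runtime in PICKS.items():
    print(f"{name}: {runtime}")
```

The function is deliberately coarse; networking and protocol needs still have to be checked per application.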

The picks in depth

api-gateway-edge → ECS on Fargate. 2,000 RPS sustained, spikes to 5,000, <100ms p99: this is the textbook always-on workload. Lambda’s pricing advantage depends on idle time; there isn’t any. Fargate runs the container behind an ALB target group, autoscaling on CPU or request count. A service with desired=10, min=8, max=40 gives headroom for spikes. App Runner could do it (its upper limit is hundreds of instances per service now), but ECS’s ALB routing, connection-draining, blue/green deployment, and operational tooling are the known quantities for a high-traffic public API.
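A quick sanity check on the desired=10, max=40 figures, as a sketch. It assumes the ALB spreads load evenly and that each task can keep serving, during a spike, the per-task rate it carries at steady state; neither assumption is measured here.

```python
import math


def tasks_needed(peak_rps: float, rps_per_task: float) -> int:
    """Smallest task count that keeps per-task load at or below rps_per_task."""
    return math.ceil(peak_rps / rps_per_task)


steady_rps, spike_rps = 2000, 5000
desired, max_tasks = 10, 40

rps_per_task = steady_rps / desired                  # 200 RPS per task at steady state
spike_tasks = tasks_needed(spike_rps, rps_per_task)  # tasks needed to absorb a spike
headroom = max_tasks - spike_tasks                   # spare tasks below the ceiling

print(f"tasks at spike: {spike_tasks}, spare below max: {headroom}")
```

On these numbers a 5,000 RPS spike needs 25 tasks, leaving 15 below the ceiling; a load test would be needed to confirm the per-task figure.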

worker-background → ECS on Fargate. The shape is long-running, non-HTTP, polling SQS. Lambda could process each message, but 24/7 operation at 4 msg/sec is 345,600 invocations a day. Lambda’s pricing advantage flips once there’s no idle time. A Fargate service with desired=2 running the worker container (no load balancer, no ingress) polls SQS continuously, scales on queue depth via a CloudWatch alarm driving ECS Service Auto Scaling. The worker’s code doesn’t change between EC2 and Fargate; the deployment does.
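The worker's polling loop might look roughly like the following. This is a sketch, not the team's code: `drain_once`, `run_forever`, and the `handle` callback are hypothetical names, and the `sqs` argument is assumed to be a boto3 SQS client (or anything exposing the same `receive_message` and `delete_message` methods).

```python
from typing import Any, Callable


def drain_once(sqs: Any, queue_url: str, handle: Callable[[str], None]) -> int:
    """Long-poll the queue once; delete each message only after it is handled."""
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS maximum per receive call
        WaitTimeSeconds=20,      # long polling, the maximum wait
    )
    processed = 0
    for message in response.get("Messages", []):
        try:
            handle(message["Body"])
        except Exception:
            # Leave the message in place; it becomes visible again after
            # the visibility timeout and is retried.
            continue
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
        processed += 1
    return processed


def run_forever(sqs: Any, queue_url: str, handle: Callable[[str], None]) -> None:
    """The container's main loop: this is why the worker needs an always-on runtime."""
    while True:
        drain_once(sqs, queue_url, handle)
```

In the container entry point this would be started with something like `run_forever(boto3.client("sqs"), queue_url, process_message)`, where `queue_url` and `process_message` come from the service's own configuration and code.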

webhook-receiver → Lambda. 800/day, bursty, 30-second SLA on response. This is Lambda’s sweet spot. A Function URL gives an HTTPS endpoint without API Gateway; the cold start of ~500ms on Node.js is well inside the SLA. Idle cost: zero. If the bursts occasionally cause throttling (default account concurrency is 1,000 in most Regions), reserved concurrency of 50 per function keeps noisy neighbours at bay. VPC integration only if the webhook handler needs private-network reach, which it doesn’t here.
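A Function URL handler for this shape can be very small. The sketch below is in Python to match the other examples here, although the post's cold-start figure refers to Node.js; the `id` field and the response body are hypothetical, and the event fields used (`body`, `isBase64Encoded`) follow the Function URL request payload format.

```python
import base64
import json


def handler(event, context):
    """Acknowledge a webhook delivered through a Lambda Function URL."""
    raw = event.get("body") or ""
    if event.get("isBase64Encoded"):
        raw = base64.b64decode(raw).decode("utf-8")

    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        payload = None
    if not isinstance(payload, dict):
        return {"statusCode": 400, "body": json.dumps({"error": "invalid JSON object"})}

    # Hypothetical: real processing (or a hand-off to a queue) would go here.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"received": payload.get("id")}),
    }
```

Responding quickly and deferring heavy work keeps the function well inside the 30-second response requirement even on a cold start.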

image-thumbnailer → Lambda (container image). S3 event on object-put invokes the function; the container includes ImageMagick or similar, which exceeds the 250 MB zip-deployment limit, so container image deployment is the ergonomic path (10 GB limit). Function memory set to 1024 MB gives more CPU (Lambda allocates CPU proportional to memory) and faster processing; execution time per image under a second keeps costs negligible.
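The event-handling half of the thumbnailer can be sketched without any image library. The `thumbnails/` prefix and the function name are assumptions for illustration. Object keys in S3 event notifications arrive URL-encoded, and because the function writes back to S3, skipping its own output prefix is one way to avoid re-triggering itself if the output lands in the same bucket.

```python
from urllib.parse import unquote_plus

THUMBNAIL_PREFIX = "thumbnails/"  # hypothetical output prefix


def objects_to_resize(event: dict) -> list[tuple[str, str, str]]:
    """Return (bucket, source_key, thumbnail_key) for each new object in the event."""
    work = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys in S3 notifications are URL-encoded ("+" for space, %XX escapes).
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(THUMBNAIL_PREFIX):
            continue  # our own output; resizing it again would loop
        work.append((bucket, key, THUMBNAIL_PREFIX + key))
    return work
```

The handler would then download each source key, resize it with whatever library the image ships (ImageMagick in the post's example), and upload to the thumbnail key.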

internal-admin-ui → App Runner. Low-traffic HTTP service that needs to reach private RDS. App Runner with a VPC Connector attaches the service to a VPC so outbound traffic can reach RDS. Min instances set to 1 to avoid cold starts for engineers; scale ceiling low (max 3) because 20 engineers don’t need 10 instances. TLS and load balancing are handled by App Runner; there is no service configuration beyond “image URI, CPU/memory, min/max, VPC connector.”

internal-service-X (the other five) → App Runner. Same pattern; each a small HTTP service with minimal dependencies. App Runner’s reduced control-plane is the operational win: no ALB to configure, no target group health check syntax to get correct, no task definition revisions to manage. For teams that want “give me an HTTP endpoint for this container,” App Runner is the fewest moving parts.

Pricing trade-off worked per application

A quick calculation for three of the apps at steady-state monthly rates (Linux x86, eu-west-1 approximate):

api-gateway-edge (Fargate, 10 tasks × 1 vCPU × 2 GB):

10 × 730 h × ($0.04048/vCPU-h + 2 × $0.004445/GB-h)
= 10 × 730 × $0.04937
≈ $360/month

webhook-receiver (Lambda, 800 invocations/day × 200ms × 512 MB):

24,000 invocations/month × 0.2 s × 0.5 GB × $0.0000166667/GB-s
+ 24,000 × $0.0000002
≈ $0.04/month

Lambda’s monthly free tier (400,000 GB-s and 1M requests) covers this completely in any case. Against an EC2 t3.medium dedicated to the webhook (~$30/month), even the undiscounted figure is a saving of several hundredfold.

internal-admin-ui (App Runner, min 1, 1 vCPU × 2 GB, avg 0.2 concurrency):

provisioned memory: 730 h × 2 GB × $0.007/GB-h ≈ $10
active vCPU: 730 h × 0.2 × $0.064/vCPU-h × 1 vCPU ≈ $9
≈ $20/month

The EC2 t3.medium it replaces: ~$30/month. The savings come from App Runner’s concurrency-based billing: you pay for active requests plus a small floor, not the full instance-hour.
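The Fargate and Lambda figures above can be reproduced with a short script. It is a sketch using the approximate rates quoted in this section as defaults; it ignores the Lambda free tier, data transfer, and the load balancer, and the function names are illustrative.

```python
HOURS_PER_MONTH = 730


def fargate_monthly(tasks: int, vcpu: float, memory_gb: float,
                    vcpu_rate: float = 0.04048, gb_rate: float = 0.004445) -> float:
    """Always-on cost: every task is billed for every hour of the month."""
    return tasks * HOURS_PER_MONTH * (vcpu * vcpu_rate + memory_gb * gb_rate)


def lambda_monthly(invocations: int, seconds: float, memory_gb: float,
                   gb_second_rate: float = 0.0000166667,
                   request_rate: float = 0.0000002) -> float:
    """Per-invocation cost: compute time in GB-seconds plus a per-request fee."""
    compute = invocations * seconds * memory_gb * gb_second_rate
    return compute + invocations * request_rate


api_edge = fargate_monthly(tasks=10, vcpu=1, memory_gb=2)
webhook = lambda_monthly(invocations=24_000, seconds=0.2, memory_gb=0.5)

print(f"api-gateway-edge on Fargate: ${api_edge:,.2f}/month")
print(f"webhook-receiver on Lambda:  ${webhook:.4f}/month")
```

Changing the invocation count or task count shows how quickly each bill moves, which is a useful check before committing a service to either runtime.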

What’s worth remembering

No runtime is “the correct answer” for all containers. Pick per-application based on request pattern, duration, cold-start tolerance, and networking.
Lambda wins when idle time is the majority. Webhook receivers, event-driven workloads, S3 triggers. It loses when a service runs 24/7 at steady load, because the per-invocation pricing adds up.
Fargate wins for long-running, high-throughput, or non-HTTP. ALB-fronted APIs with sustained traffic, SQS pollers, batch workers, anything over 15 minutes.
App Runner wins for low-traffic HTTP with minimal ops. The VPC Connector makes private-subnet reach simple; the concurrency-based billing cleans up idle-heavy internal services.
Container image support on Lambda is up to 10 GB. Enough for most image-processing, ML inference, and heavy-dependency workloads that exceed the 250 MB zip limit.
VPC integration has different shapes. Lambda uses Hyperplane ENIs with cold-start implications; App Runner uses a VPC Connector; Fargate attaches an ENI to the task directly.
App Runner’s min-instance floor is never zero. The minimum size of a running service is 1, so it always costs something (the provisioned-memory charge); the only way to stop paying is to pause the service. Lambda scales to zero and pays for it in cold starts; App Runner pays a small floor and stays warm.
The answer is usually “all three.” A ten-service estate typically ends up with a mix, and that mix is the cost-optimisation outcome, not a loss of consistency.

The same ECR image can run on all three services; the bill and the operational surface are different depending on which one it lands on. Matching the application’s rhythm (always-on, bursty, event-driven) to the runtime that bills that rhythm favourably is the work. Forcing ten applications onto one runtime is a second, harder job that nobody asked for.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.