The situation
A payments team runs three workloads and the platform lead has asked the question every platform lead eventually asks: what’s the correct way to define our Lambda infrastructure so new services don’t each invent their own?
- A receipts service: four Lambdas, an API Gateway, a DynamoDB table, an SQS dead-letter queue. A small team ships changes weekly. Deployment is a shell script one engineer wrote in 2024 that still mostly works.
- A platform fleet: the team is about to onboard twelve more services this quarter and twenty-five next quarter. Every service needs the same logging, the same IAM baseline, the same X-Ray configuration, the same tag policy. Nobody wants to cut-and-paste that twenty-five times.
- A market-data experiment: a small team is prototyping a service that might run on AWS, might run on Google Cloud, and might end up as a hybrid. They want to be able to redeploy it somewhere else without rewriting the whole stack.
Everything today is hand-rolled YAML and shell scripts. The question isn’t whether that’s wrong (it works); it’s whether it scales across three different ownership shapes.
What actually matters
Before reaching for a docs page, it’s worth asking what we’re actually trading.
The core tension in Lambda-era infrastructure is authoring model vs deployment engine. Most AWS-native tools deploy via CloudFormation in the end: it’s the durable state store for the account’s infrastructure, and its drift detection and rollback on failure come built in. The authoring model is where the tools diverge: do we write templates by hand, compose them from high-level constructs in a general-purpose language, or describe the application at a level above the resources?
The first thing to ask is: how much of the template do we actually want to write? A typical Lambda-plus-API-Gateway stack is two hundred lines of CloudFormation where maybe ten lines are specific to the application. Tools that expand a small declaration into that boilerplate buy back real time; tools that make us write the two hundred lines buy back real control.
The second thing to ask is: who is the audience for the code? If the team is Node developers, a tool that lets them write TypeScript and get Lambda-plus-IAM is a shorter learning curve than a tool that requires a new YAML dialect. If the team is a mix of developers and operators, a declarative template that reads the same to both is usually easier to reason about during an incident at 3am than a program that has to be executed to understand.
The third is local iteration. Lambda code has a development feedback loop that’s slower than in-process code by default: package, upload, deploy, invoke. Tools that ship a local runner for the function (or the whole application) close that loop; tools that don’t, don’t.
The fourth is portability. Some tools assume AWS is the target forever; others abstract the provider enough that the same application definition deploys to Azure Functions or Google Cloud Functions with a provider change. That abstraction has a price (you lose some of AWS’s depth), but for a team actually deploying to two clouds, it pays back.
The fifth is ecosystem. A tool with a plugin catalogue and a sharing-library culture compounds: the first team writes the logging construct, the next ten reuse it. A tool without that compounds in the other direction: every team writes the logging construct themselves, slightly differently.
And finally, a softer one: the blast radius of the deployment engine. All three options end up calling CloudFormation. When a deployment fails, the error message the engineer sees depends on how much abstraction sits between their code and the CloudFormation stack event that actually failed.
What we’ll filter on
Distilling that exploration into filters we can score each tool against:
- Authoring model: declarative template, imperative program, or application-level shorthand?
- Lambda-specific ergonomics: does it shortcut the most common Lambda-plus-Gateway-plus-IAM boilerplate?
- Local invoke: can we run the function locally without deploying?
- Reusable constructs: does the ecosystem ship sharable high-level pieces?
- Portability beyond AWS: does the same authoring model reach other clouds?
- Deployment engine: what actually applies the change, and how legible are its errors?
The tool landscape
- AWS CloudFormation (raw). The substrate. Declarative YAML or JSON templates; the AWS service that tracks stack state, applies change sets, and rolls back on failure. Every other tool on this page except Terraform compiles down to CloudFormation. Writing it by hand gives the most control and is the most verbose: a Lambda-plus-API Gateway stack is dozens of resources with nested IAM policies, deployment, stage, permission, and log-group declarations. Fine for teams who want the primitive; exhausting for teams who ship Lambdas weekly.
- AWS SAM (Serverless Application Model). A CloudFormation transform: templates start with Transform: AWS::Serverless-2016-10-31 and gain a handful of high-level resource types (AWS::Serverless::Function, AWS::Serverless::Api, AWS::Serverless::HttpApi, AWS::Serverless::StateMachine, AWS::Serverless::SimpleTable). A Function resource with an Events: block expands, at deploy time, into the Lambda function, its execution role, the invoke permission, and the API Gateway REST API with its deployment and stage: roughly half a dozen underlying resources from five lines of SAM. The CLI (sam build, sam deploy, sam local invoke, sam sync) packages dependencies, deploys via CloudFormation change sets, and runs the function locally in a Docker container that mimics the Lambda runtime. AWS-specific, declarative, Lambda-shaped.
- AWS CDK (Cloud Development Kit). An imperative library (TypeScript, Python, Java, C#, Go) that generates CloudFormation. Instead of writing YAML, the engineer writes code that instantiates constructs: L1 constructs mirror CloudFormation resources one-to-one; L2 constructs are hand-curated wrappers with sensible defaults (Function, RestApi, Table); L3 constructs (often called patterns) compose multiple L2s into application-level shapes (ApplicationLoadBalancedFargateService). cdk deploy generates a CloudFormation template and hands it to CloudFormation. The ecosystem (Construct Hub, internal construct libraries) turns “our logging standard” into a published package that every service imports. AWS-specific, imperative, reusable.
- Serverless Framework (v3 / v4). A third-party tool with an application-level YAML dialect (serverless.yml) that describes functions, events, and resources at a level above CloudFormation, then compiles to CloudFormation at deploy time on AWS. A rich plugin ecosystem extends it in every direction: serverless-offline runs functions locally, serverless-webpack bundles, serverless-iam-roles-per-function splits IAM per handler. It also targets Azure Functions, Google Cloud Functions, and a handful of others through provider plugins: a similar serverless.yml shape with a different provider, although event syntax differs between providers and support for non-AWS providers varies by major version, so check the release you pin. Multi-cloud, application-level, plugin-heavy. (Licensing changed materially between v3 and v4; the licensed editions matter for larger installations.)
- Terraform + Lambda. Not in the batch today, but worth naming because teams ask. HCL authoring, own state file, own deployment engine (no CloudFormation); excellent for infrastructure that spans many AWS services and other providers, less ergonomic for the packaged-zip-plus-handler Lambda workflow. Appears here only to be acknowledged and set aside.
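To make “a transform expands a small declaration into boilerplate” concrete, here is a toy model in TypeScript of what one function with Api events fans out into. It is a sketch, not the SAM transform: the SamFunction and CfnResource types, the expand function, and every generated logical ID are hypothetical names chosen for illustration. Only the CloudFormation resource type strings are real, and the real expansion varies with the event type and template options.

```typescript
// Toy model of a SAM-style expansion. Hypothetical names throughout;
// only the CloudFormation resource type strings are real.
interface ApiEvent {
  name: string;
  path: string;
  method: string;
}

interface SamFunction {
  logicalId: string;
  handler: string;
  runtime: string;
  events: ApiEvent[];
}

interface CfnResource {
  logicalId: string; // illustrative, not the IDs SAM actually generates
  type: string;
}

function expand(fn: SamFunction): CfnResource[] {
  // Every function gets the Lambda itself plus an execution role.
  const resources: CfnResource[] = [
    { logicalId: fn.logicalId, type: "AWS::Lambda::Function" },
    { logicalId: `${fn.logicalId}Role`, type: "AWS::IAM::Role" },
  ];

  if (fn.events.length > 0) {
    // Api events share one implicit REST API with a deployment and a stage.
    resources.push(
      { logicalId: "ImplicitApi", type: "AWS::ApiGateway::RestApi" },
      { logicalId: "ImplicitApiDeployment", type: "AWS::ApiGateway::Deployment" },
      { logicalId: "ImplicitApiStage", type: "AWS::ApiGateway::Stage" },
    );
  }

  // Each event needs its own permission so API Gateway may invoke the function.
  fn.events.forEach((event) => {
    resources.push({
      logicalId: `${fn.logicalId}${event.name}Permission`,
      type: "AWS::Lambda::Permission",
    });
  });

  return resources;
}

// One short declaration in...
const receipt: SamFunction = {
  logicalId: "ReceiptFunction",
  handler: "app.handler",
  runtime: "python3.12",
  events: [{ name: "GetReceipt", path: "/receipts/{id}", method: "get" }],
};

// ...several underlying resources out.
const expanded = expand(receipt);
```

Writing raw CloudFormation amounts to writing the contents of expanded by hand; the higher-level tools differ mainly in how they let you write receipt.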
Side by side
| Option | Authoring | Lambda ergonomics | Local invoke | Reusable constructs | Portability | Deployment engine |
|---|---|---|---|---|---|---|
| CloudFormation (raw) | Declarative YAML | ✗ | ✗ | Nested stacks | AWS only | CloudFormation |
| SAM | Declarative YAML | ✓ | ✓ (sam local) | Limited | AWS only | CloudFormation |
| CDK | Imperative (TS/Py/Java) | ✓ (L2/L3) | Partial | ✓ (Construct Hub) | AWS only | CloudFormation |
| Serverless Framework | Application YAML | ✓ | ✓ (serverless-offline) | ✓ (plugins) | ✓ (multi-cloud) | CloudFormation on AWS |
| Terraform | HCL | — | ✗ | Modules | ✓ | Terraform state |
Reading the table by workload rather than by tool:
- Receipts service: small team, ships weekly, wants the shortest possible Lambda-plus-Gateway declaration and a local invoke loop. SAM is the direct hit: five lines of AWS::Serverless::Function with an Events: Api block give the whole stack, sam local invoke runs the handler in seconds, and there is no new language to learn.
- Platform fleet: many services, shared standards, and the standards must be reusable rather than cut-and-paste. CDK wins because constructs are first-class: write PaymentsService once as an L3 construct, publish it to an internal npm registry, and every new service is three lines and inherits the logging, IAM, X-Ray, and tag baselines.
- Market-data experiment: might move clouds, can’t commit to AWS-only tooling. Serverless Framework is the direct hit: same serverless.yml shape with a provider switch, and a plugin ecosystem that fills gaps the core doesn’t cover.
Matching ownership shape to tool
The picks in depth
Receipts service → SAM. A template.yaml for the whole service is under fifty lines: Transform: AWS::Serverless-2016-10-31, an AWS::Serverless::Function resource for each handler with Runtime: python3.12, a CodeUri: pointing at the handler folder, and an Events: block with Type: Api and a path and method. SAM expands that into the Lambda function, the execution role with a sensible managed policy, the permission that lets API Gateway invoke the function, the AWS::ApiGateway::RestApi that SAM generates implicitly, and its deployment and AWS::ApiGateway::Stage. The log group is not part of the expansion by default: Lambda creates it, named after the function, on first invocation. sam build packages the code; sam deploy --guided creates the stack the first time and updates it via change sets after. sam local invoke ReceiptFunction --event event.json runs the handler in a Docker container against a synthetic event, so a feedback loop that took three minutes under the old shell script drops to a few seconds. sam sync --watch takes it one step further: it pushes Lambda code changes directly through the UpdateFunctionCode API, skipping CloudFormation on the inner loop.
The trade-off is expressiveness. SAM’s high-level resources are great for the common case and thin for the uncommon one: when the service needs a resource SAM doesn’t wrap, the template mixes SAM and raw CloudFormation, which is usually fine but occasionally jarring. For four Lambdas and a table, it’s not an issue.
Platform fleet → CDK. The case for CDK stops being “the code is shorter” and starts being “the code is composable.” A CDK stack in TypeScript for a single service is roughly the same size as the SAM equivalent. The payoff arrives on service number two and compounds from there.
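import { Construct } from 'constructs';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as logs from 'aws-cdk-lib/aws-logs';
// tags and observability are the platform team's internal construct
// libraries, not part of aws-cdk-lib; PaymentsServiceProps carries at
// least codePath and ownership.

export class PaymentsService extends Construct {
  constructor(scope: Construct, id: string, props: PaymentsServiceProps) {
    super(scope, id);
    // Custom runtime on Amazon Linux 2023; the handler is the bootstrap binary.
    const fn = new lambda.Function(this, 'Handler', {
      runtime: lambda.Runtime.PROVIDED_AL2023,
      code: lambda.Code.fromAsset(props.codePath),
      handler: 'bootstrap',
      tracing: lambda.Tracing.ACTIVE,              // X-Ray active tracing
      logRetention: logs.RetentionDays.ONE_MONTH,  // fleet-wide retention standard
    });
    new tags.TagPolicy(this, 'Tags', props.ownership);
    new observability.StandardDashboard(this, 'Dash', { fn });
  }
}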
The L3 construct PaymentsService encapsulates every decision the platform team wants to keep consistent: runtime choice, X-Ray active tracing, log retention, tagging policy, dashboards. Services one through twenty-five instantiate it with three lines of their own. When the standard changes (next quarter’s requirement: every function emits an EMF metric on cold-start), the construct changes and every service picks it up on its next deploy. That’s the compounding behaviour no amount of copy-paste YAML buys.
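The compounding effect is easier to see with CDK stripped away. The following plain-TypeScript analogy (no CDK imports; Baseline, ServiceConfig, defineService, the field names, and the service names are all hypothetical) treats the platform standard as a single value that every service definition is derived from, so changing the standard once changes every service the next time it is built.

```typescript
// Plain-TypeScript analogy for a shared construct. Nothing here is CDK API.
interface Baseline {
  runtime: string;
  tracing: "Active" | "PassThrough";
  logRetentionDays: number;
  emitColdStartMetric: boolean;
}

interface ServiceConfig extends Baseline {
  tags: Record<string, string>;
}

// The platform team owns this value; services never copy it.
const baselineThisQuarter: Baseline = {
  runtime: "provided.al2023",
  tracing: "Active",
  logRetentionDays: 30,
  emitColdStartMetric: false,
};

// What a service writes: its name, its owner, and nothing about the standard.
function defineService(name: string, owner: string, baseline: Baseline): ServiceConfig {
  return { ...baseline, tags: { service: name, owner } };
}

// Hypothetical service names standing in for the fleet.
const serviceNames = ["receipts", "refunds", "ledger"];

const before = serviceNames.map((n) => defineService(n, "payments", baselineThisQuarter));

// Next quarter's requirement lands in exactly one place...
const baselineNextQuarter: Baseline = { ...baselineThisQuarter, emitColdStartMetric: true };

// ...and every service picks it up without editing its own definition.
const after = serviceNames.map((n) => defineService(n, "payments", baselineNextQuarter));
```

With copy-pasted YAML the same change would be one edit per service, each a chance to drift; here the per-service code never mentions the standard at all.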
The deployment engine is still CloudFormation: cdk deploy synthesizes a template first (the same output cdk synth produces), then hands that template to CloudFormation. The experience of failures is slightly noisier than SAM because there’s more abstraction in the way: a failed stack event names a logical ID the CDK generated, which you translate back to the construct tree path to find the offending code. The CDK CLI and the cdk.out/tree.json make that translation survivable, but it’s real.
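One way to shorten that translation, sketched in TypeScript: unless path metadata has been turned off, each resource in a synthesized template carries an aws:cdk:path metadata entry, so a logical ID from a failed stack event can be looked up directly. Note that this reads the synthesized template rather than tree.json. The constructPathFor function, the CfnTemplate type, and the sample template (including its stack name and logical IDs) are made up for illustration.

```typescript
// Minimal shape of a synthesized template: just enough to read path metadata.
interface CfnTemplate {
  Resources: Record<string, { Type: string; Metadata?: Record<string, unknown> }>;
}

// Map a CloudFormation logical ID back to the construct path that produced it.
function constructPathFor(template: CfnTemplate, logicalId: string): string | undefined {
  const path = template.Resources[logicalId]?.Metadata?.["aws:cdk:path"];
  return typeof path === "string" ? path : undefined;
}

// Made-up sample; real logical IDs end in a hash derived from the construct path.
const sampleTemplate: CfnTemplate = {
  Resources: {
    PaymentsHandlerAAAA1111: {
      Type: "AWS::Lambda::Function",
      Metadata: { "aws:cdk:path": "ReceiptsStack/Payments/Handler/Resource" },
    },
    PaymentsHandlerServiceRoleBBBB2222: {
      Type: "AWS::IAM::Role",
      Metadata: { "aws:cdk:path": "ReceiptsStack/Payments/Handler/ServiceRole/Resource" },
    },
  },
};

// The ID a failed stack event would name, and one that does not exist.
const failedResourcePath = constructPathFor(sampleTemplate, "PaymentsHandlerServiceRoleBBBB2222");
const unknownResourcePath = constructPathFor(sampleTemplate, "NoSuchResource");
```

In a real project the template object would come from JSON.parse over the stack’s template file in cdk.out; the path that comes back names the construct to open in the editor.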
Market-data experiment → Serverless Framework. A serverless.yml describing functions, events, and resources deploys to AWS today; moving it to Google Cloud Functions means changing provider.name, adding that provider’s plugin, and rewriting whatever event and resource blocks are provider-specific. Plugins fill the gaps: serverless-offline runs the service on localhost:3000, serverless-iam-roles-per-function splits the execution roles, serverless-prune-plugin trims stale Lambda versions. For a prototype team that isn’t yet sure which cloud it’s on, and that values a single rich ecosystem rather than three separate vendor toolchains, the portability pays for itself the first time a stakeholder asks “how much work is GCP?” and the answer is closer to an afternoon than a rewrite.
The trade-off is depth. Serverless Framework’s AWS provider compiles to CloudFormation and inherits CloudFormation’s ceiling; it doesn’t have CDK’s library of L3 application patterns, and it doesn’t have SAM’s accelerated inner loop (sam sync). For multi-cloud, that’s the price.
A worked example: the same service, three ways
A function triggered by an SQS queue, writing to DynamoDB.
SAM:
Transform: AWS::Serverless-2016-10-31
Resources:
  Processor:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: ./src
      Handler: app.handler
      Runtime: python3.12
      Policies:
        - DynamoDBCrudPolicy: { TableName: !Ref Table }
      Events:
        Queue:
          Type: SQS
          Properties: { Queue: !GetAtt Queue.Arn, BatchSize: 10 }
  Queue: { Type: AWS::SQS::Queue }
  Table:
    Type: AWS::Serverless::SimpleTable
    Properties: { PrimaryKey: { Name: id, Type: String } }
CDK (TypeScript):
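import * as sqs from 'aws-cdk-lib/aws-sqs';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import { SqsEventSource } from 'aws-cdk-lib/aws-lambda-event-sources';

// Inside a Stack (or Construct) constructor, where 'this' is the enclosing scope.
const queue = new sqs.Queue(this, 'Queue');
const table = new dynamodb.Table(this, 'Table', {
  partitionKey: { name: 'id', type: dynamodb.AttributeType.STRING },
  billingMode: dynamodb.BillingMode.PAY_PER_REQUEST, // on-demand, as in the other two versions
});
const fn = new lambda.Function(this, 'Processor', {
  runtime: lambda.Runtime.PYTHON_3_12,
  code: lambda.Code.fromAsset('./src'),
  handler: 'app.handler',
});
// Creates the event source mapping and lets the function read from the queue.
fn.addEventSource(new SqsEventSource(queue, { batchSize: 10 }));
// Grants DynamoDB read/write on the function's execution role.
table.grantReadWriteData(fn);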
Serverless Framework:
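service: processor
provider:
  name: aws
  runtime: python3.12
  iam:
    role:
      statements:
        # Table access is not inferred from the resources block; grant it explicitly.
        - Effect: Allow
          Action:
            - dynamodb:GetItem
            - dynamodb:PutItem
            - dynamodb:UpdateItem
            - dynamodb:DeleteItem
          Resource: !GetAtt Table.Arn
functions:
  processor:
    handler: app.handler
    events:
      - sqs: { arn: !GetAtt Queue.Arn, batchSize: 10 }
resources:
  Resources:
    Queue: { Type: AWS::SQS::Queue }
    Table:
      Type: AWS::DynamoDB::Table
      Properties:
        AttributeDefinitions: [{ AttributeName: id, AttributeType: S }]
        KeySchema: [{ AttributeName: id, KeyType: HASH }]
        BillingMode: PAY_PER_REQUEST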
All three produce a Lambda, an SQS queue, a DynamoDB table, an event source mapping, and the correct IAM. The differences are in what they hide, how they express the relationship, and where the reuse story lives.
What’s worth remembering
- All three sit on CloudFormation. SAM is a transform, CDK generates a template, Serverless Framework compiles one on AWS. Understanding CloudFormation errors is worth more than the features any of them add.
- SAM is the shortest path for a single Lambda-heavy service. High-level resources expand to all the boilerplate; sam local invoke and sam sync give the feedback loop; the template is legible to operators.
- CDK is the reuse story. Construct libraries turn “our platform standard” into an import, and the compounding effect across a fleet is larger than any per-service ergonomic.
- Serverless Framework is the multi-cloud story. If the application might live somewhere other than AWS, same YAML shape with a provider change is cheaper than rewriting.
- L1, L2, L3 constructs are the CDK mental model. L1 mirrors CloudFormation; L2 adds defaults; L3 composes application shapes. Most services live at L2 with a sprinkling of L3.
- sam local, serverless-offline, and CDK’s testing libraries are not the same thing. Local invoke is the feedback loop. CDK ships strong unit testing but the “run the function locally” story is thinner; the CDK answer is usually SAM side-by-side for the inner loop.
- Deployment-engine failures are where tools differ. SAM errors are CloudFormation errors with SAM terminology on top; CDK errors are CloudFormation errors with generated logical IDs; Serverless Framework errors are CloudFormation errors (on AWS) wrapped in the plugin stack. Know the underlying layer.
- The tool isn’t the standard. Whichever tool a team picks, the logging, tagging, IAM, and observability baselines are what actually matter. The tool is the vehicle for those decisions, not the decisions themselves.
SAM is shortest for one service, CDK compounds across many, Serverless Framework crosses clouds. Three ownership shapes, three tools, three matches. The work isn’t picking a favourite; it’s pairing each service’s ownership shape with the tool that fits its future.