The situation
We’re wiring a new pipeline. The orders DynamoDB table has a stream enabled; every INSERT of a new order should end up on the commerce event bus as an OrderPlaced event, with a customer_tier field added from the customers table so downstream rules can fan out by tier. Out of every five writes to the table, only one is an INSERT we care about, the rest are updates to order state that belong to other pipelines.
Today this would be a Lambda. The Lambda polls the stream, checks the event type, calls GetItem on the customers table, builds the EventBridge entry, calls PutEvents. The Lambda is ninety percent glue: record-format conversion, error handling, throttling back-pressure, batching. The two lines that actually belong to our domain are the filter predicate and the GetItem enrichment.
We’re not the only team shipping this shape. The platform has ten pipelines just like it, each with its own Lambda, each with its own flavour of error handling and logging, each a small operational surface that patches and versions and alerts independently. The Lambdas are fine. The Lambdas are also a tax.
What actually matters
Before reaching for a service, it’s worth asking what “source-to-target” actually needs.
A pipe is a four-stage pipeline. Source is where events come from, a stream, a queue, a change feed. Filter is a predicate that drops events we don’t care about before we pay to process them. Enrichment is a lookup that adds context to the event. Target is where enriched events end up, another queue, another bus, a function, a state machine.
Each stage can be free-form code or constrained configuration. Free-form code is maximally flexible but means we own operations: cold starts, timeouts, retries, logging, metrics, IAM, versions, the works. Constrained configuration is less flexible but the service owns operations, and the gap between “what’s possible” and “what’s configurable” is where the interesting trade-offs live.
The question is: for each stage of this pipeline, what do we actually need?
Source. The source is a DynamoDB stream. That’s not a choice, it’s a given. Whatever service we pick has to speak DynamoDB streams out of the box, or we’re back to writing a poller.
Filter. We want to drop 80% of events. The filter is a JSON pattern match: eventName = INSERT. That’s a fixed, declarative predicate. No lookup, no code.
Enrichment. We need customer_tier from a customers table. That’s a single-record lookup keyed by a field that’s already on the event. Declarative if the service supports it; code if it doesn’t.
Target. The target is an EventBridge bus. That’s another given.
The shape of this pipeline is filter-then-enrich-then-fan-out. Every stage is expressible as configuration if the service supports the correct primitives. The question becomes: which service has the correct primitives?
What we’ll filter on
Distilling the needs into filters we can score each option against:
- Native source polling, does the service read the source without us writing poll-and-retry logic?
- Declarative filtering, can the filter predicate be expressed as JSON, or does it need code?
- Enrichment without code, can the enrichment be a direct service call, or must it route through a function?
- Batching and ordering, does the service batch events to the target and preserve partition-key order where it matters?
- Operational surface, how much of the pipeline do we own and have to alert on?
The integration landscape
-
Lambda as a stream consumer. The Lambda runtime has an event-source mapping that polls DynamoDB streams, hands batches to a function, and retries on failure. The function is user code: filter, enrichment, target call, all hand-written. Event-source mappings support declarative filter patterns (so at least the 80%-drop can be done without code), but enrichment and target publishing are code we own. Operational surface: one Lambda per pipeline, plus its IAM role, its alarms, its log group.
-
EventBridge Pipes. A managed source-to-target pipeline with four explicit stages. Source is an event-source mapping (DynamoDB streams, Kinesis, SQS, MSK, Amazon MQ, self-managed Kafka). Filter is a JSON pattern identical to the EventBridge rule pattern language. Enrichment is an optional stage that calls Lambda, Step Functions Express, API Gateway, or API destinations. Target is one of fifteen-plus services including EventBridge buses, SQS, SNS, Kinesis, Firehose, Step Functions, Lambda. Between stages, Pipes handles batching, retry, dead-letter, and back-pressure without user code.
-
Kinesis Data Streams + Firehose. If the source were already Kinesis and the target a data-lake sink, Firehose’s built-in transform Lambda and delivery semantics would be the natural path. Not the shape of our pipeline; mentioning for completeness because it’s easy to reach for when “stream processing” is in the brief.
-
Step Functions with an EventBridge scheduler. Useful when the pipeline needs branching, parallelism, or human-in-the-loop approvals. Overkill for source-filter-enrich-target where everything is inline.
-
Glue streaming ETL. Spark Streaming on Glue can read Kinesis, transform in Python or Scala, and write to most sinks. Right answer for large-scale streaming analytics; wrong shape for a one-in-five-events filter feeding an event bus.
Side by side
| Option | Native source polling | Declarative filtering | Enrichment without code | Batching / ordering | Operational surface |
|---|---|---|---|---|---|
| Lambda stream consumer | ✓ | ✓ (patterns on ESM) | ✗ (enrichment is code) | ✓ | Lambda + IAM + logs + alarms |
| EventBridge Pipes | ✓ | ✓ (same pattern syntax) | ✓ (or Lambda if needed) | ✓ | Pipe + IAM |
| Kinesis + Firehose | ✓ | Partial (transform Lambda) | ✗ (transform is code) | ✓ | Firehose + Lambda |
| Step Functions | Via ESM or schedule | Via Choice state | ✓ | Manual | State machine + tasks |
| Glue streaming ETL | ✓ | Code | Code | ✓ | Glue job + IAM + logs |
Reading the table by pipeline shape:
- One source, one filter, one enrichment, one target, no branching. EventBridge Pipes. Precisely the shape it was designed for.
- Complex branching, parallel fan-out, approvals. Step Functions, with Pipes as the front door if the source is a stream.
- Filter or enrichment requires arbitrary code (e.g. parsing a nested protobuf). Lambda, invoked either directly as the target or as the enrichment stage of a Pipe.
- Large-scale analytical streaming. Glue streaming or Kinesis Data Analytics.
The pipe, stage by stage
The pipe in depth
A pipe is a single AWS resource (AWS::Pipes::Pipe) whose configuration is four blocks: Source, SourceParameters, Enrichment, EnrichmentParameters, Target, TargetParameters, plus a top-level RoleArn that authorises the whole chain.
Source parameters for DynamoDB streams. The source is the stream’s ARN. StartingPosition is LATEST (pick up new events on create) or TRIM_HORIZON (read everything retained in the stream, up to 24 hours). BatchSize caps how many records are passed to filter/enrichment/target in one invocation (1 to 10,000, default 100 for streams; lower means more invocations and more parallelism, higher means bigger batches and fewer invocations). MaximumBatchingWindowInSeconds waits up to N seconds for a batch to fill before dispatching (0 to 300). ParallelizationFactor (1 to 10) multiplies the number of concurrent consumers per shard; keep at 1 when the downstream needs partition-key ordering, raise it when throughput matters more than order. OnPartialBatchItemFailure: AUTOMATIC_BISECT splits a failing batch in half and retries each half, isolating poison-pill records without failing the whole batch, almost always the correct setting.
Filtering. FilterCriteria.Filters is a list of JSON event patterns; an event matches the pipe if it matches any pattern (OR semantics). The pattern language is the EventBridge rule language: exact match, prefix match, anything-but, numeric ranges, suffix match, exists checks. For DynamoDB streams the envelope is the stream record shape, so the pattern has to reach into dynamodb.NewImage to match on attribute values: {"eventName": ["INSERT"], "dynamodb": {"NewImage": {"status": {"S": ["confirmed"]}}}} matches inserts whose status attribute is the string "confirmed". Filtered-out events don’t count toward the pipe’s invocation or byte charges, the filter runs before billing, which is exactly why you do the 80% drop in the filter rather than in a downstream function.
Enrichment. Four target types: Lambda (the general-purpose answer), Step Functions Express workflows (for multi-step enrichment with branching), API Gateway (REST or HTTP), and API destinations (a connection to an external HTTP endpoint, with AWS-managed retries and connection limits). The enrichment receives the event and returns a transformed event; whatever it returns is what flows to the target. If the enrichment is a simple GetItem against a DynamoDB table and nothing else, there’s no direct-DynamoDB enrichment, you still go through Lambda. That’s the one awkward gap: DynamoDB is a target, not an enrichment.
Input transformers can shape the event at each stage boundary. The InputTemplate is a JSON template with <$.path.to.field> substitutions from the incoming event; the output becomes the input to the next stage. Useful for re-shaping a DynamoDB stream’s verbose dynamodb.NewImage.customer_id.S into a flat {"customer_id": "c-123"} before the enrichment receives it.
Targets. The long list includes EventBridge buses, SQS, SNS, Kinesis, Firehose, Step Functions (Standard or Express), Lambda, API Gateway, API destinations, CloudWatch Logs, ECS tasks, Batch jobs, Redshift data API, SageMaker pipelines. The target’s own TargetParameters shape the delivery: for an EventBridge bus, EventBridgeEventBusParameters lets you set Source, DetailType, Resources, and Time; for SQS, SqsQueueParameters sets MessageGroupId (for FIFO) and MessageDeduplicationId.
Failure handling. MaximumRetryAttempts (0 to 185) on the source caps how many times a record is retried before being sent to the dead-letter queue. DeadLetterConfig.Arn points at an SQS queue (or SNS topic, or another destination) that receives the failed records. MaximumRecordAgeInSeconds (60 to 604800) discards records older than N seconds, which matters for streams that catch up after an outage, you usually don’t want to replay yesterday’s backlog as if it were live.
A worked pipe: the YAML
OrdersPipe:
Type: AWS::Pipes::Pipe
Properties:
Name: orders-stream-to-commerce-bus
RoleArn: !GetAtt PipeRole.Arn
Source: !GetAtt OrdersTable.StreamArn
SourceParameters:
DynamoDBStreamParameters:
StartingPosition: LATEST
BatchSize: 10
MaximumBatchingWindowInSeconds: 5
OnPartialBatchItemFailure: AUTOMATIC_BISECT
FilterCriteria:
Filters:
- Pattern: '{"eventName":["INSERT"]}'
Enrichment: !GetAtt LookupTierFunction.Arn
EnrichmentParameters:
InputTemplate: |
{
"order_id": "<$.dynamodb.NewImage.order_id.S>",
"customer_id": "<$.dynamodb.NewImage.customer_id.S>",
"total_cents": <$.dynamodb.NewImage.total_cents.N>
}
Target: !GetAtt CommerceBus.Arn
TargetParameters:
EventBridgeEventBusParameters:
Source: com.acme.orders
DetailType: OrderPlaced
InputTemplate: |
{
"order_id": "<$.order_id>",
"customer_id": "<$.customer_id>",
"customer_tier": "<$.customer_tier>",
"total_cents": <$.total_cents>,
"placed_at": "<aws.pipes.event.ingestion-time>"
}
The Lambda behind LookupTierFunction is twelve lines: take the flat input, call GetItem on the customers table, merge the returned tier attribute back into the event, return. No polling code, no filter code, no PutEvents code. The pipe does the rest.
The IAM role PipeRole needs four things: dynamodb:DescribeStream, dynamodb:GetRecords, dynamodb:GetShardIterator, dynamodb:ListStreams on the source stream; lambda:InvokeFunction on the enrichment Lambda; events:PutEvents on the target bus; and sqs:SendMessage on the dead-letter queue.
When to reach past Pipes
Pipes is optimised for the source-filter-enrich-target shape. Two situations push past it:
Arbitrary branching or parallel fan-out. If the logic is “if order_total > 1000, run the fraud check in parallel with the inventory reservation and the payment intent, and join before publishing”, that’s a state machine, not a pipe. Use Step Functions as the target and let the pipe feed the state machine. The pipe still does the pre-filter and pre-enrichment work, keeping the state machine focused on orchestration.
Enrichment needs more than one call. If the enrichment needs to look up the customer, then the SKU, then the warehouse, with branches based on intermediate results, a Lambda starts to do real work and the “just configuration” argument weakens. Either write that Lambda and let it do its thing, or promote the enrichment to a Step Functions Express workflow. Pipes accepts Express workflows as enrichment targets specifically because that gap exists.
Very low latency requirements. Pipes adds a small but measurable overhead at each stage (low double-digit milliseconds). For a pipeline where every millisecond matters, a single Lambda consuming the stream and doing everything in-process can be faster, at the cost of all the glue code we were trying to avoid.
What’s worth remembering
- Pipes is source-filter-enrichment-target as configuration. Four stages, declarative between them, one IAM role, one CloudFormation resource. The Lambda in the middle is still allowed; the pipe just handles the boring parts around it.
- Filter runs before billing. The filter stage drops events without charging for them. That’s the economic reason to move the filter out of the Lambda into the pipe: a one-in-five filter means five times less enrichment invocation cost.
- Sources are streams and queues, not everything. DynamoDB streams, Kinesis, SQS (standard and FIFO), MSK, Amazon MQ, self-managed Kafka. Not S3 events, not EventBridge buses themselves, not CloudWatch Events. The source list matters; check it before committing.
- Enrichment is Lambda, Step Functions Express, API Gateway, or API destinations. No direct-DynamoDB enrichment; a single GetItem is still a Lambda. Step Functions Express is the escape hatch when the enrichment is multi-step but fits in five minutes.
- Targets are a long list. Fifteen-plus services. Pick the target that matches the downstream shape; don’t wrap another Lambda around it just because that’s what you’d have written before.
- Input transformers reshape events at stage boundaries. Flatten the verbose DynamoDB stream envelope before enrichment; compose the final event shape before publishing. Saves the Lambda from being a transformation layer as well as a lookup layer.
- OnPartialBatchItemFailure: AUTOMATIC_BISECT. Default to this. Poison-pill records isolate themselves to the DLQ without dragging down the whole batch.
- Ordering versus parallelism is the one trade-off to think about.
ParallelizationFactor > 1on a stream means events from the same partition key can land out of order. Keep at 1 when downstream assumes order; raise when it doesn’t and throughput is the pressure.
A pipe isn’t magic. It’s the patterns we’d have written into a Lambda, poll, filter, enrich, publish, retry, DLQ, lifted out of user code into service configuration. When the pipeline’s shape fits the four-stage model, Pipes removes the Lambda that would otherwise be ninety percent glue. When the shape doesn’t fit, the Lambda is still the correct answer; the pipe knows how to feed it.