How to Fan Out CloudWatch Logs to Real-Time Consumers

September 20, 2027 · 15 min read

The situation

Acme runs 40 services across three accounts. Every service writes structured JSON logs to CloudWatch Logs (one log group per service per Region), retention set between 7 and 30 days depending on compliance class. The logs are useful: engineers tail them with Logs Insights, on-call engineers grep them during incidents, and the security team queries them monthly for audit.

Three new consumers need log data routed as it’s written:

SIEM integration. Every log line that matches "errorCode": "AccessDenied" or "errorCode": "UnauthorizedOperation" should land in Splunk within two minutes of being written, regardless of which service produced it.
Product analytics stream. Every log line that matches "event": "order.created" in the order-service log group should land in a Kinesis Data Stream within seconds, feeding a real-time analytics dashboard.
Incident Slack bot. Every log line with "level": "ERROR" and a stacktrace field present should invoke a Lambda that formats the event and posts it to a Slack channel.

All three need real-time, filtered, at-source routing. None of them want “give me the whole log group and I’ll filter on my side”; the volume is too high and the latency too loose.

What actually matters

Before reaching for a mechanism, it’s worth naming the shape of the problem, because “get logs out of CloudWatch” hides several genuinely different questions whose answers want different tooling.

The first is streaming vs extraction. Streaming routes each log event through a filter at ingestion time and delivers it (or a transformed version) to a destination within seconds. Extraction pulls whole log groups into a downstream store and queries them retroactively. The three jobs in the brief are all streaming jobs; an extraction-shaped mechanism is the wrong answer regardless of how elegant the query language is.

The second is where the filter runs. A filter at the source means only matching lines leave the log group, and the bill is paid in proportion to interesting events, not total volume. A filter at the destination means everything is shipped and the consumer drops the noise, which works but pays bandwidth and ingest cost for every line nobody wanted. For high-volume sources with low-volume signals (AccessDenied, ERRORs with stack traces), the difference is an order of magnitude on the bill.

The third is how many consumers want the same source. One source, one consumer is a single pipe. One source, three consumers is fan-out: either the source supports multiple downstream attachments natively, or one attachment carries everything to a relay where many consumers can pick up. Hitting an attachment cap in the middle of a rollout is how a streaming design gets stuck.

The fourth is cross-account vs same-account. Same-account routing is local IAM. Cross-account routing needs the destination to expose a policy the source’s IAM can assume into; the mechanism either has a first-class way to express that or the integration becomes a Lambda relay.

The fifth is what fits at the consumer end. A Slack bot wants per-event invocation with a small batch and the right format. A SIEM wants buffered, durable delivery with retries and an error bucket. A real-time analytics dashboard wants an ordered, replayable stream. Each consumer wants a different shape of pipe, and the routing mechanism has to be honest about which shapes it serves cleanly.

And sixth, what the team doesn’t want to operate. A pipeline with a buffer, retries, format conversion, and a dead-letter queue is real work to run. If a managed delivery service slots in between the filter and the SIEM, the team buys back that work in exchange for some configuration. Whether that’s the right trade depends on the volume, the destinations, and the team’s appetite for owning yet another resilient pipe.

What we’ll filter on

Real-time vs batch: seconds to destination, or minutes/hours?
Pattern filtering at source: does the filter run before data leaves the log group?
Fan-out to multiple consumers: how many destinations per source?
Cross-account support: can logs in account A route to account B’s destination?
Ingestion-format flexibility: JSON, plain text, CloudTrail, VPC Flow Logs?
Operational cost shape: do we manage the pipe or does AWS?

The log-routing landscape

Subscription filter -> Lambda. Log events that match the pattern are delivered (up to 6 MB per invocation) to a Lambda function. The Lambda sees a batch of events with the log group, log stream, and event JSON. Suits per-event logic with modest throughput, like the incident Slack bot.
Subscription filter -> Kinesis Data Stream. Matched events land as Kinesis records on a stream. Multiple consumers can read the same stream, and enhanced fan-out gives each consumer dedicated throughput. Suits high-volume streaming, like the product analytics case.
Subscription filter -> Kinesis Data Firehose. Firehose buffers matched events and delivers to S3, Redshift, OpenSearch, Splunk, or a generic HTTPS endpoint with optional format conversion (JSON -> Parquet) and failure-record handling. Suits SIEM and archival use cases, like the Splunk integration.
Account-level subscription filter. A single policy that applies to every log group in the account (with optional include/exclude by log group name). One filter, hundreds of log groups. Avoids the two-filter-per-log-group constraint when the pattern is the same everywhere.
CloudWatch Logs -> S3 export. CreateExportTask exports a log group’s contents over a time window to S3. It is a point-in-time batch job, and log data can take up to 12 hours to become available for export. Archival, not streaming.
OpenSearch / Firehose -> OpenSearch. CloudWatch Logs can stream into an OpenSearch Service domain. Useful when the destination is an existing OpenSearch cluster the team queries directly.

Side by side

Option	Real-time	Source-side filtering	Fan-out	Cross-account	Format flex	Ops cost
Sub. filter -> Lambda	✓	✓	via Lambda code	Indirect (Destinations wrap Kinesis or Firehose, not Lambda)	✓	Medium (Lambda)
Sub. filter -> Kinesis Data Stream	✓	✓	✓ (many consumers)	✓	✓	Medium (stream shards)
Sub. filter -> Firehose	✓ (buffered)	✓	✗ (one destination)	✓	✓ (w/ transform)	Low
Account-level subscription filter	✓	✓	One destination	✓	✓	Low
S3 export	✗	✗	At the bucket layer	✓	Raw logs	Low
Direct to OpenSearch	✓	✓	Via OS indices	✓	✓	High (cluster)

Reading by scenario:

SIEM (Splunk, every AccessDenied line). Account-level subscription filter with pattern { ($.errorCode = "AccessDenied") || ($.errorCode = "UnauthorizedOperation") }, destination Firehose with an HTTP endpoint pointed at Splunk’s HEC. Firehose’s buffering and retry behaviour is exactly correct for a SIEM integration.
Product analytics (order events to Kinesis). Subscription filter on the order-service log group with pattern { $.event = "order.created" }, destination Kinesis Data Stream. The analytics Lambda and a Firehose archive consumer both read from the stream.
Incident bot (stack traces to Slack). Subscription filter on the relevant service log groups with pattern { ($.level = "ERROR") && ($.stacktrace = *) }, destination Lambda. The Lambda formats the event and posts to Slack.

The three destinations, side by side

One subscription filter matches events at the source and delivers to a destination of the shape that fits the consumer. Lambda for per-event work, Kinesis for many-consumer fan-out, Firehose for buffered delivery to a sink.

Filter patterns in depth

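CloudWatch Logs has several filter-pattern dialects; two matter here. For unstructured text, the pattern is a space-separated list of terms, with quotes around any term that contains spaces or punctuation: ERROR "database connection failed" matches events containing both terms. (Square brackets belong to a separate dialect for space-delimited events, where each bracketed entry names a positional field.) For JSON events, the pattern uses a dollar-sign selector language: { $.errorCode = "AccessDenied" } matches events where the top-level errorCode field equals AccessDenied. Operators: =, !=, >, <, >=, <=, IS NULL, NOT EXISTS, plus && and ||.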

Useful shapes:

{ $.level = "ERROR" }                        // level field exactly ERROR
{ $.level = "ERROR" && $.stacktrace = * }    // ERROR with a stacktrace field present
{ $.event = "order.*" }                      // event starting with "order."
{ ($.errorCode = "AccessDenied") ||          // multiple conditions
  ($.errorCode = "UnauthorizedOperation") }
{ $.requestParameters.instanceType = "p4d.*" }  // nested field, wildcard
{ $.latency > 5000 }                         // numeric comparison

Wildcards (*) match inside string values. Nested fields use dot notation. Array elements are addressed by zero-based index, as in $.items[0].

Filters are worth testing before deploying. The command below returns which of the sample events matched, so iteration is cheap:

aws logs test-metric-filter \
  --filter-pattern '{ $.errorCode = "AccessDenied" }' \
  --log-event-messages '{"errorCode":"AccessDenied"}' '{"errorCode":"Allowed"}'
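For unit tests that should not call AWS, the equality, wildcard, and nested-field rules described above can be approximated locally. The sketch below is a stand-in under stated assumptions, not CloudWatch’s matcher: get_path and matches_equals are hypothetical helpers that handle only dot-notation selectors and = against a string containing * wildcards. The test-metric-filter command remains the authority on what actually matches.

```python
import json
import re


def get_path(event, selector):
    """Resolve a dot-notation selector such as '$.requestParameters.instanceType'.

    Returns None when any segment is missing. Array indexes are not handled.
    """
    path = selector[2:] if selector.startswith("$.") else selector
    node = event
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def matches_equals(message, selector, pattern):
    """Approximate { selector = "pattern" } for one raw log message."""
    try:
        event = json.loads(message)
    except json.JSONDecodeError:
        return False  # a non-JSON line cannot match a JSON pattern
    value = get_path(event, selector)
    if not isinstance(value, str):
        return False
    # Treat '*' as "any run of characters" and everything else as a literal.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


if __name__ == "__main__":
    samples = ['{"event":"order.created"}', '{"event":"cart.updated"}']
    for sample in samples:
        print(sample, matches_equals(sample, "$.event", "order.*"))
```

Keeping the approximation this narrow is deliberate: anything beyond simple equality (numeric comparisons, IS NULL, compound expressions) is better checked against the real service.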

Creating the three filters

Scenario 1, account-level filter to Firehose -> Splunk:

aws logs put-account-policy \
  --policy-name splunk-security-events \
  --policy-type SUBSCRIPTION_FILTER_POLICY \
  --policy-document '{
    "DestinationArn": "arn:aws:firehose:eu-west-1:111122223333:deliverystream/splunk-security",
    "RoleArn": "arn:aws:iam::111122223333:role/CWLToFirehoseRole",
    "FilterPattern": "{ ($.errorCode = \"AccessDenied\") || ($.errorCode = \"UnauthorizedOperation\") }",
    "Distribution": "Random"
  }' \
  --scope ALL

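The policy applies to every log group in the account. Distribution only takes effect when the destination is a Kinesis Data Stream: Random spreads records evenly across shards, while ByLogStream (the default) keeps each log stream’s events together so order within a stream is preserved. Firehose has no shards, so the setting does not apply in this scenario.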
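The escaped quotes inside --policy-document are easy to get wrong by hand. One way around that, sketched here on the assumption that boto3 and credentials are available wherever the policy is applied, is to build the document as a Python dict and let json.dumps handle the escaping. The ARNs and policy name are copied from the command above; build_policy_document and apply_policy are hypothetical helper names.

```python
import json
from typing import Iterable


def build_policy_document(destination_arn: str, role_arn: str,
                          error_codes: Iterable[str]) -> str:
    """Return the JSON string to pass as the account policy document."""
    clauses = [f'($.errorCode = "{code}")' for code in error_codes]
    document = {
        "DestinationArn": destination_arn,
        "RoleArn": role_arn,
        "FilterPattern": "{ " + " || ".join(clauses) + " }",
        "Distribution": "Random",  # mirrors the CLI example above
    }
    return json.dumps(document)


def apply_policy(policy_document: str) -> None:
    """Apply the policy account-wide. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    boto3.client("logs").put_account_policy(
        policyName="splunk-security-events",
        policyType="SUBSCRIPTION_FILTER_POLICY",
        policyDocument=policy_document,
        scope="ALL",
    )


doc = build_policy_document(
    "arn:aws:firehose:eu-west-1:111122223333:deliverystream/splunk-security",
    "arn:aws:iam::111122223333:role/CWLToFirehoseRole",
    ["AccessDenied", "UnauthorizedOperation"],
)
print(doc)
```

Calling apply_policy(doc) would then replace the shell command; it is left uncalled here so the builder can be exercised without touching an account.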

Scenario 2, service-specific filter to Kinesis:

aws logs put-subscription-filter \
  --log-group-name /aws/lambda/order-service \
  --filter-name order-created-to-kds \
  --filter-pattern '{ $.event = "order.created" }' \
  --destination-arn arn:aws:kinesis:eu-west-1:111122223333:stream/order-events \
  --role-arn arn:aws:iam::111122223333:role/CWLToKinesisRole \
  --distribution Random
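The command assumes CWLToKinesisRole already exists. As a rough sketch of what such a role plausibly contains: a trust policy that lets the CloudWatch Logs service principal assume it, and a permissions policy that allows kinesis:PutRecord on the target stream. The aws:SourceArn condition and the function names are assumptions for illustration; adapt them to your own account layout.

```python
import json


def trust_policy(region: str, account_id: str) -> dict:
    """Trust policy: CloudWatch Logs in this account and Region may assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "logs.amazonaws.com"},
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringLike": {
                    "aws:SourceArn": f"arn:aws:logs:{region}:{account_id}:*"
                }
            },
        }],
    }


def permissions_policy(stream_arn: str) -> dict:
    """Permissions policy: the role may write records to one stream only."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "kinesis:PutRecord",
            "Resource": stream_arn,
        }],
    }


trust = trust_policy("eu-west-1", "111122223333")
perms = permissions_policy(
    "arn:aws:kinesis:eu-west-1:111122223333:stream/order-events"
)
print(json.dumps(trust, indent=2))
print(json.dumps(perms, indent=2))
```

Scoping the permissions policy to a single stream ARN keeps the role from being reusable as a general-purpose writer if another filter is later pointed at it.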

Scenario 3, service-specific filter to Lambda (no role needed; Lambda uses resource-based permission):

aws lambda add-permission \
  --function-name incident-slack-bot \
  --statement-id cwl-invoke \
  --action lambda:InvokeFunction \
  --principal logs.amazonaws.com \
  --source-arn 'arn:aws:logs:eu-west-1:111122223333:log-group:/aws/ecs/payments-service:*'

aws logs put-subscription-filter \
  --log-group-name /aws/ecs/payments-service \
  --filter-name stacktrace-to-slack \
  --filter-pattern '{ $.level = "ERROR" && $.stacktrace = * }' \
  --destination-arn arn:aws:lambda:eu-west-1:111122223333:function:incident-slack-bot

Lambda-as-destination is the exception: no RoleArn, but the function needs a resource-based policy that allows the CloudWatch Logs service principal, scoped to the log-group ARN, to invoke it. Missing that permission is the single most common reason a Lambda subscription silently stops delivering.

What the Lambda receives

A Lambda subscribed to a log group gets a compressed payload:

{
  "awslogs": {
    "data": "H4sIA...base64-gzipped-json..."
  }
}

Decoded, the inner JSON is:

{
  "messageType": "DATA_MESSAGE",
  "owner": "111122223333",
  "logGroup": "/aws/ecs/payments-service",
  "logStream": "payments-01",
  "subscriptionFilters": ["stacktrace-to-slack"],
  "logEvents": [
    {
      "id": "37954305091211983",
      "timestamp": 1717001234567,
      "message": "{\"level\":\"ERROR\",\"stacktrace\":\"...\"}"
    }
  ]
}

The Lambda handler code has to base64-decode, gunzip, and JSON-parse the payload. Up to 6 MB of gzipped events per invocation, which is usually a few thousand log events. The Lambda is invoked asynchronously; delivery failures are retried by Lambda’s own retry machinery and eventually land in the function’s on-failure destination if configured.
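A minimal sketch of that decode step in Python, using only the standard library. The handler name and the shape of the returned records are illustrative assumptions; the Slack formatting and posting are deliberately left out.

```python
import base64
import gzip
import json


def decode_awslogs(event: dict) -> dict:
    """Base64-decode, gunzip, and JSON-parse a CloudWatch Logs subscription event."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def handler(event, context):
    payload = decode_awslogs(event)
    # Defensive guard: only DATA_MESSAGE payloads carry log lines.
    if payload.get("messageType") != "DATA_MESSAGE":
        return []

    parsed = []
    for log_event in payload["logEvents"]:
        try:
            body = json.loads(log_event["message"])
        except json.JSONDecodeError:
            body = None
        if not isinstance(body, dict):
            # Keep lines that are not JSON objects instead of dropping them.
            body = {"raw": log_event["message"]}
        parsed.append({
            "logGroup": payload["logGroup"],
            "logStream": payload["logStream"],
            "timestamp": log_event["timestamp"],
            **body,
        })
    return parsed  # a real bot would format these and post to Slack here
```

Because the filter already guarantees level and stacktrace are present, the handler can stay small; the fallback for non-JSON messages is there for the case where a filter pattern is later loosened.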

The two-filter limit and the account-level escape

The two-subscription-filters-per-log-group limit is the constraint that shapes most designs. Options when more than two consumers want the same log data:

Fan out downstream. One subscription filter to a Kinesis stream; many consumers read from the stream. This is the pattern when consumers need different views of the same data and latency matters.
Account-level filter. One policy with a single filter pattern and destination, applied to every log group in the account or to a selected subset. Counts against a separate limit (one account-level subscription filter policy per account per Region, as of writing). Right for “this one pattern, this one destination, everywhere.”
Different patterns, one log group. If the two filters have genuinely different patterns (say, AccessDenied to the SIEM and ERROR-with-stacktrace to the Slack bot), two filters can coexist on the same log group and both are independent.
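For the fan-out-downstream option, each consumer of the stream has to unpack the same gzipped envelope that a Lambda destination receives. The sketch below assumes the consumer is a Lambda attached to the stream through an event source mapping, in which case each record’s data arrives base64-encoded; iter_log_events is a hypothetical helper name, and the handler only counts events where a real consumer would process them.

```python
import base64
import gzip
import json


def iter_log_events(kinesis_event: dict):
    """Yield (log_group, log_event) pairs from a Lambda Kinesis batch."""
    for record in kinesis_event.get("Records", []):
        raw = base64.b64decode(record["kinesis"]["data"])
        payload = json.loads(gzip.decompress(raw))
        if payload.get("messageType") != "DATA_MESSAGE":
            continue  # skip CONTROL_MESSAGE records, which carry no log lines
        for log_event in payload["logEvents"]:
            yield payload["logGroup"], log_event


def handler(event, context):
    orders = [json.loads(log_event["message"])
              for _, log_event in iter_log_events(event)]
    return {"orders_seen": len(orders)}
```

Every consumer on the stream repeats this unpacking independently, which is what lets the analytics reader and an archive reader share one subscription filter.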

What’s worth remembering

Subscription filters are the streaming exit. Pattern-filtered at the log group or account level, delivered to Lambda, Kinesis Data Streams, or Firehose in near-real time.
Three destination shapes, three use cases. Lambda for per-event logic, Kinesis for many-consumer fan-out, Firehose for buffered delivery to a sink (S3, OpenSearch, Splunk, Redshift).
Two filters per log group is the hard limit. Past that, fan out via Kinesis or use an account-level subscription filter policy.
JSON filter pattern language. { $.field = "value" }, with &&, ||, wildcards, nested fields, and numeric comparisons. Test filters with aws logs test-metric-filter before deploying.
Lambda destinations need lambda:InvokeFunction resource-based permission, not an IAM role. Kinesis and Firehose destinations need a role that CloudWatch Logs assumes. Easy thing to get wrong on the first deploy.
Account-level subscription filters are the “one pattern everywhere” answer. PutAccountPolicy with SUBSCRIPTION_FILTER_POLICY type. Include/exclude by log group name.
Cross-account routing via Destinations. A CloudWatch Logs Destination in account B wraps a Kinesis stream and carries an access policy naming account A. Source log groups in A subscribe to the Destination ARN.
Subscription filters are streaming; S3 export is not. CreateExportTask pulls a time window to S3 as a batch job: correct for archival or analytics, wrong for two-minute SLAs.

The three consumers in the scenario end up with three subscription filters: one account-level filter to Firehose -> Splunk, one log-group filter to Kinesis for the analytics pipeline, one log-group filter to Lambda for the Slack bot. Filter at source, route by destination shape, accept the two-filter-per-log-group limit or work around it with fan-out. CloudWatch Logs goes from “the place we keep logs” to “the routing plane for log-derived signals”: same data, different consumers, correct tool at each exit.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.