The situation
Acme has an event backbone built around SNS. The main topic, platform-events, receives messages of many different shapes: order.created, order.cancelled, inventory.low, payment.succeeded, payment.failed, customer.registered, and a long tail of internal audit events. Twelve services publish to it; eighteen services subscribe.
Today, all eighteen subscribers receive every message. Each subscriber is a Lambda function (or an SQS queue feeding one) that parses the incoming message, checks the event type, and either processes it or returns. The subscribers that only care about order.* still get inventory.low and customer.registered; they parse, compare, discard. The cost is real: the finance-reporting Lambda alone is invoked 40 million times a month, of which only 300,000 are actually relevant.
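As a rough illustration of the parse-compare-discard pattern described above, here is a minimal sketch of such a subscriber. The handler body, the `RELEVANT_PREFIX` constant, and the assumption that the event type lives in an `eventType` field of the body are all hypothetical; only the `Records[].Sns.Message` envelope is the standard shape SNS hands to Lambda.

```python
import json

# Hypothetical: this subscriber only cares about order.* events.
RELEVANT_PREFIX = "order."


def handler(event, context=None):
    """Subscriber-side filtering: the function runs for every message."""
    processed = []
    for record in event["Records"]:
        # SNS delivers the published message as a string under Sns.Message.
        message = json.loads(record["Sns"]["Message"])
        # Assumption: the publisher puts the event type in the body.
        event_type = message.get("eventType", "")
        if not event_type.startswith(RELEVANT_PREFIX):
            continue  # parsed, compared, discarded: the invocation is already paid for
        processed.append(event_type)
    return processed


def _sns_event(body):
    """Build a minimal SNS-to-Lambda event for local experimentation."""
    return {"Records": [{"Sns": {"Message": json.dumps(body)}}]}


dropped = handler(_sns_event({"eventType": "inventory.low"}))
kept = handler(_sns_event({"eventType": "order.created"}))
```

Both calls cost an invocation; only the second does any useful work.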
Two related but distinct problems.
Problem 1: access control. Publishing to the topic currently works because the topic policy allows sns:Publish from any principal in the organisation. That’s a broad blanket: a misconfigured service in a dev account can publish to the production topic. There’s no fine-grained control over which services can publish which event types.
Problem 2: message routing. Every subscriber receives every message. The filtering logic is scattered across 18 Lambda codebases. Adding a new event type requires no code changes (the new event publishes fine), but every subscriber that doesn’t care about it still pays to receive and drop it.
Two policies can solve these, and they are genuinely different things. SNS topic access policies control sns:Publish (and sns:Subscribe) at the topic level: the IAM-like “who can do what.” SNS subscription filter policies filter messages at delivery time: the per-subscription “which messages does this subscriber actually want.”
The team is conflating the two. The interesting work is seeing where each belongs and where they overlap.
What actually matters
Before reaching for specific policy shapes, it’s worth being precise about the two questions on the table and why they’re genuinely different.
The first thing worth thinking about is the difference between authorisation and routing. “Who is allowed to put a message here?” is an access-control question: it has a yes/no answer, it’s evaluated against an identity, and a wrong answer is a security incident. “Which subscribers want this particular message?” is a routing question: the answer is a set of destinations, it’s evaluated against the message’s content, and a wrong answer is wasted Lambda invocations or missed deliveries. The same JSON document can’t reasonably answer both well, and a system that tries usually ends up doing one of them badly. Treating them as separate surfaces, with separate policies, is what lets each be precise about its own job.
The second is where filtering should live. Subscribers filtering in their own code “works” (the unwanted message gets dropped), but every drop costs an invocation, and the filter logic is scattered across as many codebases as there are subscribers. The cheaper place to filter is at the broker, before a delivery is even attempted. The broker has the message and the subscriber’s intent in the same hand; the subscriber’s code never runs for a message it doesn’t want. The economic and operational gap between “filter at the broker” and “filter at the subscriber” is the single biggest lever this scenario has.
The third is the granularity of access control. Topic-level authorisation (“you can publish to this topic”) is too coarse the moment one topic carries multiple event families. A payment-service that’s authorised to publish anything to the platform topic can accidentally (or maliciously) publish order.cancelled events and disrupt the ordering pipeline. The right granularity is per-event-type: each publisher can only emit the event families it’s meant to emit, and the topic itself enforces that. Whatever the mechanism, it has to reach into the message’s metadata at publish time, not just check the identity.
The fourth is the source of routing truth. If filtering needs to inspect message content, the content has to expose the routing-relevant facts in a stable place. Message attributes are a good home for routing keys; they’re cheap to read, the policy language is designed around them, and they’re independent of the body’s evolving schema. Bodies are a fallback for routing on data the publisher didn’t think of as routing-relevant at design time. The choice between attribute-led and body-led routing shapes how publishers serialise their messages, and that’s a decision that propagates outward to every producer.
The fifth is failure modes. An over-tight access policy rejects at publish with an explicit error: loud, traceable, easy to debug. A wrong filter policy silently drops messages at the broker: quiet, hard to spot, easy to ship. The two policies have asymmetric failure modes, which means they need different review and observability treatment. Filter policies in particular benefit from delivery metrics per subscription, so a change that filters away too much isn’t invisible.
The sixth is who owns each policy. The topic access policy is a platform-level artefact; the platform team or the topic owner decides who’s allowed to use the topic and for what. The filter policies are subscription-level artefacts; the subscriber owns “what do I want?” Conflating them puts the platform team in the routing business and the subscriber team in the authorisation business, and neither team is well-placed for the other’s job. Keeping them separate also keeps ownership clean.
What we’ll filter on
- Controls publish authorisation: can only the correct principals put messages on the topic?
- Fine-grained by event type: can authorisation key on the specific message, not just the topic?
- Controls subscribe authorisation: who can subscribe, and to what?
- Delivers only matching messages: does the subscription receive only what it cares about?
- No subscriber-side filtering code: does it eliminate the if-event_type logic in Lambdas?
- Attribute vs body filtering: does the filter have to key on attributes or can it reach into the body?
The SNS policy landscape
- Topic access policy only. Controls publish and subscribe at the topic level. Doesn’t filter messages; every subscriber still gets every message published to the topic. Necessary, not sufficient.
- Subscription filter policy only. Filters messages to subscribers. Doesn’t control who can publish. Necessary, not sufficient on its own for multi-tenant topics.
- Both, combined. Topic policy scopes publishing (who, with what attributes); subscription policies scope delivery (what each subscriber cares about). The standard answer.
- Topic per event type. Instead of filtering, split into many small topics: order-events, payment-events, etc. Access control per topic; each subscriber picks which topics to subscribe to. Works; explodes the topic inventory; coordination cost grows with every new event type.
- Topic per tenant with cross-tenant subscriptions. Multi-tenant shape where each tenant has a topic and a global subscriber subscribes across all of them. Orthogonal; useful for SaaS shapes.
- EventBridge. Different service, different model. Event buses with pattern-matching rules, targets decoupled from publishers. More expressive filtering, cross-account first-class, schema registry. Legitimate alternative when the filtering story grows beyond attribute matching. Worth naming; not the subject of this post.
Side by side
| Option | Controls publish | Fine-grained by event | Controls subscribe | Delivery filtering | No subscriber code | Body-level filter |
|---|---|---|---|---|---|---|
| Topic access policy | ✓ | With conditions | ✓ | ✗ | ✗ | n/a |
| Subscription filter policy | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ (opt-in) |
| Both, combined | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ (opt-in) |
| Topic-per-event-type | ✓ (per topic) | Implicit | ✓ (per topic) | Implicit | ✓ | n/a |
| EventBridge bus | ✓ (bus policy) | ✓ (event pattern) | ✓ | ✓ | ✓ | ✓ |
Acme’s answer is Row 3: keep the one topic, use the topic policy for publish authorisation, use subscription filter policies for delivery routing. The topic-per-event-type approach is the refactor to consider only if the attribute filtering isn’t sufficient; for now it is.
The two policies, side by side
The topic access policy in depth
A topic access policy is JSON attached to the topic. The interesting knobs are the Action set (sns:Publish, sns:Subscribe, sns:GetTopicAttributes, sns:ListSubscriptionsByTopic), the Principal (IAM role, account, or *), and the Condition block.
Scoping by service and event type:
{
  "Version": "2012-10-17",
  "Id": "platform-events-access",
  "Statement": [
    {
      "Sid": "AllowOrderServiceToPublishOrderEvents",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:role/order-service" },
      "Action": "sns:Publish",
      "Resource": "arn:aws:sns:eu-west-1:111122223333:platform-events",
      "Condition": {
        "StringLike": {
          "sns:MessageAttributes.eventType.StringValue": "order.*"
        }
      }
    },
    {
      "Sid": "AllowPaymentServiceToPublishPaymentEvents",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:role/payment-service" },
      "Action": "sns:Publish",
      "Resource": "arn:aws:sns:eu-west-1:111122223333:platform-events",
      "Condition": {
        "StringLike": {
          "sns:MessageAttributes.eventType.StringValue": "payment.*"
        }
      }
    },
    {
      "Sid": "AllowOrgServicesToSubscribeFromSameOrg",
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["sns:Subscribe", "sns:ListSubscriptionsByTopic"],
      "Resource": "arn:aws:sns:eu-west-1:111122223333:platform-events",
      "Condition": {
        "StringEquals": { "aws:PrincipalOrgID": "o-abc123def4" }
      }
    }
  ]
}
Three patterns matter. sns:MessageAttributes.<attr>.StringValue is the condition key that lets you scope publishing by attribute value: the enforcement that makes “the payment service cannot publish order.created events” actually true. Because the publish-side design rests entirely on that key, confirm in the SNS condition-key reference that it is evaluated for sns:Publish before depending on it. aws:PrincipalOrgID is the condition key for restricting subscribe to members of the AWS Organization; most production topics should gate subscribe this way. And Effect: Allow with an explicit principal list means the implicit Deny takes care of everything else, so there’s no need for per-service Denies.
The access policy does not look at the message body; it looks at message attributes. If the publisher is sending plain JSON bodies without attributes, the condition clauses can’t enforce event-type scoping at publish time. The rule: publishers should always send the event type (and any other routing-relevant metadata) as a message attribute, not just in the body.
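To make the rule concrete, here is a minimal sketch of how a publisher might assemble its publish call so the event type travels as a message attribute as well as in the body. The helper name `build_publish_kwargs` and the `orderId` payload are hypothetical; the `TopicArn`, `Message`, and `MessageAttributes` keys follow the shape of the boto3 `publish` call.

```python
import json


def build_publish_kwargs(topic_arn, event_type, body):
    """Return keyword arguments for an SNS publish call.

    The event type is duplicated into a String message attribute so that
    policies can read it without parsing the body.
    """
    return {
        "TopicArn": topic_arn,
        "Message": json.dumps(body),
        "MessageAttributes": {
            "eventType": {"DataType": "String", "StringValue": event_type},
        },
    }


kwargs = build_publish_kwargs(
    "arn:aws:sns:eu-west-1:111122223333:platform-events",
    "order.created",
    {"orderId": "o-123", "eventType": "order.created"},  # hypothetical payload
)

# With boto3 installed and credentials configured, the call would be:
#   import boto3
#   boto3.client("sns").publish(**kwargs)
```

Keeping the assembly in one helper per publisher makes it harder for a new event type to ship without its attribute.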
Filter policies in depth
A filter policy is JSON attached to a subscription. The default scope (MessageAttributes) evaluates against the message’s attributes; opt-in scope (MessageBody) evaluates against fields inside the JSON payload.
aws sns set-subscription-attributes \
  --subscription-arn arn:aws:sns:eu-west-1:111122223333:platform-events:abc-sub-id \
  --attribute-name FilterPolicy \
  --attribute-value '{
    "eventType": [{"prefix": "order."}],
    "region": ["eu-west-1", "eu-central-1"],
    "priority": [{"anything-but": "low"}]
  }'
The operators:
- ["value1", "value2"]: exact match against either value.
- [{"prefix": "order."}]: prefix match.
- [{"suffix": ".created"}]: suffix match (newer addition).
- [{"anything-but": "low"}] or [{"anything-but": ["low", "trash"]}]: negative match.
- [{"numeric": [">", 100]}]: numeric comparisons.
- [{"exists": true}]: match if the attribute is present (any value).
- [{"exists": false}]: match if the attribute is absent.
- [null]: match if the attribute is JSON null.
All conditions on all attributes must match: it’s an implicit AND across attribute keys, an implicit OR within the values for a single key.
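The AND-across-keys, OR-within-values rule is easy to get wrong when reviewing a policy by eye, so a small local model can help. The sketch below is not how SNS implements filtering; it is a simplified stand-in that covers only the operators listed above and assumes attributes have been flattened into a plain name-to-value dict. The function names are hypothetical.

```python
import operator

_OPS = {"=": operator.eq, "<": operator.lt, "<=": operator.le,
        ">": operator.gt, ">=": operator.ge}


def _numeric_matches(spec, value):
    """spec looks like [">", 100] or [">", 0, "<=", 150]."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    pairs = zip(spec[0::2], spec[1::2])
    return all(_OPS[op](value, bound) for op, bound in pairs)


def _value_matches(cond, present, value):
    """Check one entry of a key's value list against one attribute."""
    if isinstance(cond, dict):
        if "exists" in cond:
            return present == cond["exists"]
        if not present:
            return False  # every other operator needs the attribute to be there
        if "prefix" in cond:
            return isinstance(value, str) and value.startswith(cond["prefix"])
        if "suffix" in cond:
            return isinstance(value, str) and value.endswith(cond["suffix"])
        if "anything-but" in cond:
            excluded = cond["anything-but"]
            if not isinstance(excluded, list):
                excluded = [excluded]
            return value not in excluded
        if "numeric" in cond:
            return _numeric_matches(cond["numeric"], value)
        return False  # operator not modelled in this sketch
    return present and value == cond  # exact match, including null


def matches(policy, attributes):
    """AND across policy keys, OR across the values listed for each key."""
    for key, conditions in policy.items():
        present = key in attributes
        value = attributes.get(key)
        if not any(_value_matches(c, present, value) for c in conditions):
            return False
    return True


policy = {
    "eventType": [{"prefix": "order."}],
    "region": ["eu-west-1", "eu-central-1"],
    "priority": [{"anything-but": "low"}],
}
delivered = matches(policy, {"eventType": "order.created", "region": "eu-west-1", "priority": "high"})
wrong_region = matches(policy, {"eventType": "order.created", "region": "us-east-1", "priority": "high"})
```

A model like this is mainly useful in unit tests that pin down what a subscription is expected to receive before the policy is deployed.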
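For body filtering, the subscription attribute FilterPolicyScope is set to MessageBody, and the filter policy mirrors the shape of the JSON body: top-level keys match top-level properties, and nested properties are matched by nesting objects in the policy: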
{
  "customerTier": ["gold", "platinum"],
  "orderValue": [{"numeric": [">", 1000]}]
}
Body filtering is the correct tool when publishers can’t (or won’t) replicate all routing-relevant data in message attributes. The trade-off: slightly higher evaluation cost and a tighter coupling to the publisher’s payload schema.
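Switching a subscription to body filtering involves two subscription attributes, the scope and the policy itself. A minimal sketch of assembling both updates follows; the helper name `body_filter_calls` is hypothetical, and the argument names follow the shape of the boto3 `set_subscription_attributes` call.

```python
import json


def body_filter_calls(subscription_arn, policy):
    """Return the two attribute updates that enable body-scoped filtering."""
    return [
        {
            "SubscriptionArn": subscription_arn,
            "AttributeName": "FilterPolicyScope",
            "AttributeValue": "MessageBody",
        },
        {
            "SubscriptionArn": subscription_arn,
            "AttributeName": "FilterPolicy",
            # The policy is passed as a JSON string, not as a dict.
            "AttributeValue": json.dumps(policy),
        },
    ]


calls = body_filter_calls(
    "arn:aws:sns:eu-west-1:111122223333:platform-events:abc-sub-id",
    {
        "customerTier": ["gold", "platinum"],
        "orderValue": [{"numeric": [">", 1000]}],
    },
)

# With boto3 installed and credentials configured, applying them would be:
#   import boto3
#   sns = boto3.client("sns")
#   for call in calls:
#       sns.set_subscription_attributes(**call)
```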
A worked refactor
Acme’s refactor target: the 18 subscribers stop filtering in their own code, and the topic policy stops allowing “anyone in the org.”
Step 1: every publisher is updated to send routing-relevant data as message attributes. eventType on every message. region, priority, customerTier where relevant. This is a few days of work across the 12 publisher services.
Step 2: the topic access policy is replaced with the scoped version above. Each publisher’s role is listed explicitly; sns:MessageAttributes.eventType.StringValue conditions constrain what each can publish. Subscribe is scoped to the AWS Organization.
Step 3: each subscription gets its own filter policy matching what the subscriber actually cares about. For the finance-reporting Lambda that only cares about payment.succeeded and payment.failed, the filter is:
{ "eventType": ["payment.succeeded", "payment.failed"] }
Step 4: the subscriber code is simplified; the if-event_type ladder is removed. The Lambda receives exactly what it asked for.
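For illustration, a simplified finance-reporting handler might end up looking something like the sketch below. The grouping logic is a hypothetical placeholder for the real reporting work; the `MessageAttributes` entries with `Type` and `Value` keys are the shape Lambda receives from SNS.

```python
import json
from collections import defaultdict


def handler(event, context=None):
    """No event-type ladder: the filter policy already did the routing."""
    by_type = defaultdict(list)
    for record in event["Records"]:
        sns = record["Sns"]
        # The attribute is still useful for dispatch, just not for discarding.
        event_type = sns["MessageAttributes"]["eventType"]["Value"]
        by_type[event_type].append(json.loads(sns["Message"]))
    return dict(by_type)


# Hypothetical local event for experimentation.
sample = {
    "Records": [
        {
            "Sns": {
                "Message": json.dumps({"paymentId": "p-1"}),
                "MessageAttributes": {
                    "eventType": {"Type": "String", "Value": "payment.succeeded"}
                },
            }
        }
    ]
}
result = handler(sample)
```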
Results after the refactor:
- The finance-reporting Lambda’s invocations drop from 40M/month to 300K/month. Cost saving: a few thousand dollars a year just on this one function.
- A misconfigured service in a dev account trying to publish to the production topic gets AuthorizationError at publish time rather than silently succeeding and confusing downstream systems.
- Adding a new event type no longer means every uninterested subscriber pays to receive and drop it. Subscribers that don’t want it don’t see it.
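The size of the saving depends on how long each discarded invocation used to run and how much memory the function is configured with, neither of which is stated here. The sketch below makes the arithmetic explicit; the two price defaults are illustrative assumptions, not quoted rates, and should be replaced with the current figures for the relevant region and architecture.

```python
def avoided_invocations(total_per_month, relevant_per_month):
    """Invocations that the broker-side filter no longer triggers."""
    return total_per_month - relevant_per_month


def monthly_saving(avoided, avg_duration_s, memory_gb,
                   price_per_million_requests=0.20,    # assumed price, USD
                   price_per_gb_second=0.0000166667):  # assumed price, USD
    """Request charge plus duration charge for the avoided invocations."""
    request_cost = avoided / 1_000_000 * price_per_million_requests
    duration_cost = avoided * avg_duration_s * memory_gb * price_per_gb_second
    return request_cost + duration_cost


avoided = avoided_invocations(40_000_000, 300_000)
relevant_share = 300_000 / 40_000_000  # fraction of invocations doing real work

# Hypothetical usage: plug in the function's measured duration and memory.
#   yearly = 12 * monthly_saving(avoided, avg_duration_s=..., memory_gb=...)
```

Under these assumptions the request charge alone is small; most of the saving comes from the duration term, so the measured duration and memory setting determine whether the total lands in the hundreds or the thousands per year.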
When attribute-only filtering isn’t enough
A subscriber that wants “orders over $1000” has a problem if the publisher only sends eventType as an attribute. Two options: ask the publisher to send orderValue as an attribute too (cleaner), or enable body filtering on the subscription (more flexible but more coupled).
Body filtering is the pragmatic choice when publishers are third-party or legacy and can’t easily be changed. It’s also the correct tool when the routing criterion is something the publisher doesn’t know is routing-relevant (say, a field added to the payload six months after the service was written).
Use attributes for the first-line filter; reach for body filtering when attribute filtering runs out.
What’s worth remembering
- Two policies, two jobs. Topic access policy = who can publish/subscribe. Subscription filter policy = which messages this subscription actually gets. Don’t conflate them.
- Access policies can condition on message attributes. sns:MessageAttributes.<attr>.StringValue scopes publishing by attribute value, which is how “the payment service cannot publish order events” gets enforced at publish time.
- Filter policies are attribute-based by default. FilterPolicyScope set to MessageBody enables body-level filtering. Attributes first; body when attributes aren’t enough.
- Filter operators. Exact, prefix, suffix, anything-but, numeric, exists, null. All-attribute AND, within-attribute OR. Default limits: up to 5 attribute names in an attribute-scoped policy, body keys nested up to 5 levels deep, and no more than 150 total value combinations per policy.
- Non-matching subscriptions are not invoked at all. Filtering happens at the SNS broker; the Lambda or SQS queue on the other side doesn’t pay. This is the cost win over code-side filtering.
- Publishers should send routing metadata as attributes. Access policies depend on it; filter policies default to it. Body-only messages leave both policy surfaces weaker.
- aws:PrincipalOrgID is the org-wide subscribe condition. A topic that only members of the AWS Organization should subscribe to should condition sns:Subscribe on aws:PrincipalOrgID.
- Topic-per-event-type is the refactor if filtering outgrows you. Attribute filtering is expressive; if policies are getting unwieldy, splitting the topic is cleaner than adding more operators.
A single SNS topic with 18 subscribers, scoped by topic policy and filter policies, goes from “every message reaches every Lambda” to “every Lambda receives only what it asked for,” and the publisher set goes from “anyone in the org” to “the named services, only for their own event families.” The same topic, the same eighteen subscriptions, two JSON documents with clearly separate responsibilities. The subscriber code shrinks; the bill shrinks; the access surface tightens; the team stops arguing about which policy “covers” a given requirement, because the answer is obviously the one that matches the question.