The situation
A platform team operates a customer-managed KMS key in a shared-services account (999999999999), used to wrap the envelope keys that protect inbound event payloads. Twelve spoke accounts host an event-processing Lambda fleet. Each Lambda is triggered by EventBridge or SQS, one message per invocation; it assumes an execution role (lambda.amazonaws.com with a per-account EventProcessorRole); it calls kms:Decrypt once to unwrap the payload envelope, processes the plaintext, and exits. It lives for under a second in the typical case and never longer than the function timeout.
Security’s constraints are specific to the shape of the workload: ephemeral permission per execution, audit evidence per grant, scope limited to Decrypt on exactly this envelope, and revocable without policy rewrites if a spoke account is compromised.
What actually matters
Before picking an instrument, it is worth naming the deeper properties an answer has to carry.
The first thing to notice is that the natural AWS instinct, “put it in a policy”, doesn’t fit this workload. A key policy statement allowing twelve spoke roles to call kms:Decrypt on the shared key would work. It would also mean that, at any moment, any principal in any of those twelve accounts that could assume the spoke role could decrypt any envelope ever wrapped with that key. That’s not what the workload is doing. The workload is decrypting one specific envelope, on one specific execution, for under a second, once. The policy instrument has an effective lifetime measured in months. The actual permission the workload needs has an effective lifetime measured in milliseconds. The mismatch is where blast radius expands.
Audit shape is the next thing to weigh. With a broad key-policy grant, every Decrypt call lands in CloudTrail as a single event: “this role decrypted something at this time.” To prove “permission issued to decrypt envelope evt-abc123 was granted, used, and ended,” we’d have to correlate that Decrypt to some other event that carries the context. The instrument we want issues its own creation record, uses the permission with a reference to that creation, and ends with its own termination record: three CloudTrail records per processed event, joinable on an id. That’s the difference between “the audit trail is the policy, frozen” and “the audit trail is the workload, traced.”
Encryption-context scoping is a property worth having even when it isn’t strictly asked for. KMS calls carry additional authenticated data, the encryption context, that can be matched as a precondition. A per-execution permission bound to {envelope-id=evt-abc123, tenant=acme} cannot be used to decrypt other envelopes. Someone who intercepts the permission’s credential in flight can’t replay it against a different envelope. That scoping is hard to express in a key policy alone.
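To make the scoping concrete, here is a minimal sketch of the two matching rules a context-bound permission can use. It is a simplified model for illustration, not KMS’s implementation: the function names are hypothetical, and it compares keys and values exactly, which may differ from how KMS treats case.

```python
def satisfies_subset(constraint: dict, request_context: dict) -> bool:
    """Subset-style rule: every constrained pair must appear in the request.

    The request may carry extra pairs beyond the constrained ones.
    """
    return all(request_context.get(key) == value for key, value in constraint.items())


def satisfies_equals(constraint: dict, request_context: dict) -> bool:
    """Equals-style rule: the request context must match the constraint exactly."""
    return constraint == request_context


# The per-execution binding used as the running example in this article.
grant_constraint = {"envelope-id": "evt-abc123", "tenant": "acme"}
```

Under the subset rule, a request for evt-abc123 that carries an extra pair still matches, while a request for any other envelope-id does not; under the equals rule the extra pair is enough to fail the match.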
Revocation deserves its own attention. If the issuer of per-execution permissions is compromised, the response is to revoke all outstanding permissions and rotate the issuer’s credentials. That’s an administrative action in one place with a bounded blast radius: the live permissions only. The key policy stays untouched, the spoke roles’ IAM policies stay untouched, production change management doesn’t get dragged in. Compare to the key-policy model, where revoking a compromised spoke means an authoritative policy edit with all the ceremony that entails.
Finally, there’s eventual consistency. KMS replicates state across its own regional endpoints; an authorisation created at t=0 may not be visible at every endpoint until t+a-few-seconds. For a sub-second Lambda execution, that’s a race we’d lose every time. The shape of an answer has to include some mechanism that lets a handler use a freshly-minted permission immediately, without waiting on replication.
What we’ll filter on
- Per-execution lifetime. Permission exists for one handler run and then ceases.
- Per-execution audit. Each issuance leaves a distinct CloudTrail record.
- Operation scope = Decrypt. No Encrypt, no GenerateDataKey, no ReEncrypt, just the one operation.
- Envelope-context scoping. The permission works for one envelope, not any envelope.
- Surgical revocation. Remove permission without editing the key policy or redeploying anything.
The KMS authorisation landscape
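Every kms:Decrypt call is evaluated against up to five instruments. Any single Deny kills the request, and every constraining layer in scope (endpoint policy, SCP) must let the call through. Beyond that, the request needs an Allow, either from the key policy (together with IAM where IAM is in scope) or from a grant. Walking them in evaluation order: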
1. The key policy. Every KMS key has exactly one, and it is mandatory. The key policy is the primary gate: no principal, not even the key’s creator or the account root, has any permission on the key unless a key policy statement explicitly allows it. Unlike an S3 bucket policy, the KMS default is no access. For cross-account use, the key policy must name the cross-account principal (or authorise the kms:CreateGrant call that will name them later).
2. IAM identity policies in the caller’s account. Once the key policy has enabled a caller, the caller’s IAM policy must also allow the specific action on the specific key ARN. For same-account KMS, the default key-policy statement that “enables IAM policies” delegates evaluation to IAM; for cross-account, IAM in the caller’s account is always in scope.
3. Grants. A separate, allow-only instrument attached to one key, naming one grantee principal, one operations list, and optional encryption-context constraints. Created via CreateGrant. Grants supplement the key policy, they cannot permit anything the key policy doesn’t already allow the creator to delegate.
4. VPC endpoint policies. If the Lambda reaches KMS through an Interface VPC endpoint, the endpoint policy narrows which keys and principals may traverse the path. Never grants, only constrains.
5. Service Control Policies. If any account sits under an Organizations OU, SCPs form an outer ceiling. Deny or allow-list only; they never grant.
Side by side
| Instrument | Per-execution lifetime | Per-execution audit | Scope = Decrypt only | Context-bound | Surgical revocation |
|---|---|---|---|---|---|
| Key policy | ✗ | ✗ | ✗ | n/a | ✗ |
| IAM identity policy | ✗ | ✗ | ✗ | n/a | ✗ |
| Grants | ✓ | ✓ | ✓ | ✓ | ✓ |
| VPC endpoint policy | ✗ | n/a | ✗ | ✗ | ✗ |
| Service Control Policy | ✗ | n/a | ✗ | ✗ | ✗ |
Key policies and IAM policies are persistent JSON: editing them per execution is absurd, they emit no per-execution audit event, and revocation means a policy edit. VPC endpoint policies and SCPs are not issuance mechanisms at all; they narrow and they deny. Grants are the only instrument designed to be created and destroyed on the timescale of an operation, with dedicated audit events per creation and termination, scoped to a single grantee, a single key, and a fixed operations list. The key policy is still needed to authorise the CreateGrant call itself, but the per-execution permission lives in the grant.
Grant lifecycle
Grants in depth
A grant is a first-class KMS resource identified by a GrantId (1-128 characters) and, when first created, a GrantToken (1-8192 characters, base64, nonsecret). A single key can hold many grants.
Shape of a grant. Every grant carries KeyId (the key it applies to: one grant, one key), GranteePrincipal (the identity that gets the permissions: an IAM user, role, federated principal, assumed-role session, or AWS account), Operations (the KMS operations the grant authorises, drawn from the grant-operations allow-list: Decrypt, Encrypt, GenerateDataKey*, ReEncrypt*, Sign, Verify, CreateGrant, RetireGrant, and others), Constraints (optional: EncryptionContextEquals or EncryptionContextSubset), RetiringPrincipal (optional: the principal permitted to call RetireGrant), and Name (optional: an idempotency token).
Allow-only. Grants cannot deny. Everything is additive; all deny logic belongs in key policies, SCPs, or endpoint policies.
The key policy still authorises the creation. CreateGrant is itself a KMS operation. The caller must have kms:CreateGrant on the key. For a cross-account call, the key policy must permit the cross-account principal to call kms:CreateGrant, and IAM in the caller’s account must allow it too. Once minted, the grantee no longer needs the key policy to name them for the granted operations.
Condition keys for CreateGrant. KMS exposes several that narrow grant creation: kms:GranteePrincipal (which principals may be named as grantee), kms:RetiringPrincipal (which may be named as retiring principal), kms:GrantOperations (which operations the grant may permit), kms:GrantConstraintType (requires grants to carry an encryption-context constraint), kms:GrantIsForAWSResource, kms:ViaService, and kms:CallerAccount. For this workload, the orchestrator’s IAM carries kms:CreateGrant constrained by kms:GrantOperations to Decrypt only and by kms:GranteePrincipal to the known spoke roles, so the orchestrator literally cannot mint a grant broader than decrypt for one of the approved event-processor roles.
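As a sketch of what that constrained permission could look like, the snippet below builds the orchestrator’s IAM policy document as a Python dict. The function name, the Sid, and the placeholder key ARN in the usage note are assumptions for illustration; the condition operators are one reasonable choice rather than the only valid one.

```python
import json


def build_issuer_policy(key_arn: str, grantee_role_arns: list) -> dict:
    """Return an IAM policy document that only allows minting narrow grants.

    Hypothetical helper: the grant must permit Decrypt and nothing else, must
    name one of the approved spoke roles as grantee, and must carry an
    EncryptionContextSubset constraint.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "MintDecryptOnlyGrants",  # illustrative name
                "Effect": "Allow",
                "Action": "kms:CreateGrant",
                "Resource": key_arn,
                "Condition": {
                    # Every operation requested in the grant must be Decrypt.
                    "ForAllValues:StringEquals": {"kms:GrantOperations": ["Decrypt"]},
                    "StringEquals": {
                        "kms:GranteePrincipal": grantee_role_arns,
                        "kms:GrantConstraintType": "EncryptionContextSubset",
                    },
                },
            }
        ],
    }


def render_policy(policy: dict) -> str:
    """Serialise the policy document for attaching to the issuer role."""
    return json.dumps(policy, indent=2)
```

Calling build_issuer_policy with the shared key’s ARN and the list of per-account EventProcessorRole ARNs yields a document that can be rendered and attached to the issuer role. The key policy in the shared-services account still has to allow kms:CreateGrant for that role.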
Eventual consistency. Create, retire, and revoke all propagate through KMS’s internal replication. Typically visible within seconds; documented worst case several minutes. The GrantToken returned by CreateGrant is the workaround: pass it explicitly on the cryptographic call (--grant-tokens on the CLI, GrantTokens in the SDK) and KMS honours it immediately without waiting for propagation.
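On the handler side, passing the token might look like the sketch below. The function name is hypothetical, and kms is assumed to be a boto3 KMS client (for example boto3.client("kms")) supplied by the caller, so the snippet itself has no AWS dependency.

```python
def decrypt_envelope(kms, key_arn: str, wrapped_key: bytes,
                     encryption_context: dict, grant_token: str) -> bytes:
    """Unwrap one envelope key, presenting the grant token on the first call.

    Supplying GrantTokens lets KMS honour a freshly created grant without
    waiting for it to propagate. The encryption context must satisfy the
    grant's constraint or the call is denied.
    """
    response = kms.decrypt(
        KeyId=key_arn,
        CiphertextBlob=wrapped_key,
        EncryptionContext=encryption_context,
        GrantTokens=[grant_token],
    )
    return response["Plaintext"]
```

Taking the client as a parameter keeps the handler easy to test with a stub and leaves region and credential configuration to the caller.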
Ending a grant: retire vs revoke
RetireGrant is the polite end, normal cleanup. Callable by the retiring principal, the AWS account that created the grant (any principal in that account with kms:RetireGrant), or the grantee, if the grant’s operations list includes RetireGrant. Takes either the original GrantToken or the KeyId/GrantId pair.
RevokeGrant is the administrative end, initiated by the key owner to actively deny the permissions. Requires kms:RevokeGrant on the key and takes KeyId and GrantId. Used when you’re pulling permission, not when the grantee is handing it back.
Both delete the grant and propagate the deletion. The CloudTrail distinction is the story: a RetireGrant with the grantee as caller says “worker finished its job”; a RevokeGrant with a key admin as caller says “we pulled permission.”
A grant your own code creates does not retire when the creator exits. A Lambda that grants to itself and crashes leaves the grant lingering until RetireGrant, RevokeGrant, or key deletion reaps it. That’s why the orchestrator model fits: the shared-services side creates, hands the token to the Lambda, and retires on completion or after a bounded timeout.
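A bounded-timeout reaper for lingering grants could be sketched as follows. The function name and the max_age parameter are assumptions for illustration; kms is assumed to be a boto3 KMS client whose credentials are permitted to retire these grants (for example, the retiring principal).

```python
from datetime import datetime, timedelta, timezone


def reap_stale_grants(kms, key_arn: str, max_age: timedelta, now=None) -> list:
    """Retire every grant on the key that is older than max_age.

    Grant ids are collected across all pages first, then retired, so the
    listing is not disturbed while it is still being paged through.
    """
    now = now or datetime.now(timezone.utc)
    stale = []
    kwargs = {"KeyId": key_arn}
    while True:
        page = kms.list_grants(**kwargs)
        for grant in page["Grants"]:
            if now - grant["CreationDate"] > max_age:
                stale.append(grant["GrantId"])
        if not page.get("Truncated"):
            break
        kwargs["Marker"] = page["NextMarker"]

    for grant_id in stale:
        kms.retire_grant(KeyId=key_arn, GrantId=grant_id)
    return stale
```

Run on a schedule with a max_age slightly above the function timeout, this bounds how long a crashed handler’s grant can outlive its execution. Swapping retire_grant for revoke_grant would give the administrative variant, for callers holding kms:RevokeGrant on the key.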
A worked grant lifecycle trace
t = 0, the event arrives. EventBridge in spoke 222222222222 delivers a message to the payload-processor Lambda queue. The orchestrator in the shared-services account receives a companion control-plane event naming the target spoke role and the envelope metadata.
t + ~50ms. CreateGrant. The orchestrator calls:
aws kms create-grant \
    --key-id arn:aws:kms:eu-west-1:999999999999:key/abcd1234-... \
    --grantee-principal arn:aws:iam::222222222222:role/EventProcessorRole \
    --retiring-principal arn:aws:iam::999999999999:role/GrantIssuer \
    --operations Decrypt \
    --constraints 'EncryptionContextSubset={envelope-id=evt-abc123,tenant=acme}' \
    --name evt-abc123
KMS checks the key policy and the orchestrator’s IAM (does it allow kms:CreateGrant within the kms:GrantOperations constraint? yes), creates the grant, and returns GrantId and GrantToken. A CreateGrant event lands in the shared-services CloudTrail. The orchestrator ships the token with the message.
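The same issuance step through the SDK might be sketched like this. The function name and parameter names are hypothetical, and kms is assumed to be a boto3 KMS client running under the issuer role’s credentials.

```python
def mint_decrypt_grant(kms, key_arn: str, grantee_role_arn: str,
                       retiring_role_arn: str, envelope_id: str, tenant: str):
    """Create a Decrypt-only grant bound to one envelope.

    Returns (grant_id, grant_token): the id is kept for retirement and audit,
    the token travels with the message to the handler.
    """
    response = kms.create_grant(
        KeyId=key_arn,
        GranteePrincipal=grantee_role_arn,
        RetiringPrincipal=retiring_role_arn,
        Operations=["Decrypt"],
        Constraints={
            "EncryptionContextSubset": {"envelope-id": envelope_id, "tenant": tenant}
        },
        # Reusing the envelope id as the grant name mirrors the CLI call.
        Name=envelope_id,
    )
    return response["GrantId"], response["GrantToken"]
```

Returning both values keeps the two audiences separate: the orchestrator stores the grant id for the later RetireGrant call, and only the token is shipped to the spoke.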
t + ~150ms, the Lambda decrypts. The handler calls kms:Decrypt with grant_tokens=[token] and matching encryption context. The key policy doesn’t name the spoke role directly, but the token references an active grant. KMS checks the constraint (request context satisfies the subset? yes), honours the token immediately, and returns the plaintext data key. A Decrypt event lands in the shared-services CloudTrail; cross-account KMS calls record in both accounts’ trails.
t + ~800ms, the handler exits. The Lambda emits a completion signal. The orchestrator calls kms:RetireGrant with the KeyId/GrantId pair. A RetireGrant event lands. Within seconds the grant is gone; any reuse of the token returns AccessDeniedException.
Three CloudTrail entries for one event. The grant existed for under a second. No policy was edited. The spoke role never gained blanket decrypt permission.
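The join itself can be sketched in a few lines. The record layout here is a deliberately flattened, hypothetical one (event_name, grant_id, time_ms): real CloudTrail records nest these fields, and where the grant id appears in each event type should be checked against actual trail data before relying on it.

```python
from collections import defaultdict


def timeline_by_grant(records: list) -> dict:
    """Group flattened audit records by grant id, ordered by time."""
    timelines = defaultdict(list)
    for record in sorted(records, key=lambda r: r["time_ms"]):
        timelines[record["grant_id"]].append(record["event_name"])
    return dict(timelines)


def is_complete(timeline: list) -> bool:
    """True when a grant was created, used at least once, and then ended."""
    return (
        len(timeline) >= 3
        and timeline[0] == "CreateGrant"
        and "Decrypt" in timeline[1:-1]
        and timeline[-1] in ("RetireGrant", "RevokeGrant")
    )
```

A grant whose timeline is not complete is the interesting case for review: one created but never used, or one used but never ended.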
Failure modes worth naming. If the Lambda crashes before the orchestrator retires, the grant lingers until a timeout reaper calls RetireGrant. If the orchestrator is compromised, the key owner calls RevokeGrant on every outstanding grant and rotates credentials; blast radius is the in-flight grants only. If a GrantToken leaks, it’s non-secret but useless without the grantee principal. If the token is not passed on the first call, a sub-second race with propagation can return AccessDeniedException; always pass the token on the first call.
Why not just put decrypt in the key policy?
The obvious alternative: a key-policy statement naming all twelve spoke roles with kms:Decrypt. It fails every attribute.
Scope is the first problem: a policy-level Decrypt permission authorises any decrypt, any envelope, any payload, while a grant with an EncryptionContextSubset constraint authorises one envelope-id at a time. Revocation is the second: removing a spoke from the key-policy model is a PutKeyPolicy change with production change management on top, while the grant model stops minting for that spoke and revokes in-flight grants with no policy edit. Audit is the third: key-policy access emits one Decrypt event per call with no “permission issued” record, while the grant model emits three CloudTrail records per processed event, joinable on grant id.
The key policy still earns its keep: it authorises the orchestrator to call CreateGrant, sets the outer ceiling on what operations any grant may permit, and denies everything not explicitly allowed.
What’s worth remembering
- The key policy is the primary gate: no principal, not even the account root, has any KMS permission unless a key-policy statement explicitly allows it.
- Any single Deny wins, and the constraining layers (VPC endpoint policy and SCPs, when in scope) must each let the call through; the Allow itself comes from the key policy with IAM, or from a grant.
- Grants are the only instrument designed for per-execution, ephemeral permission: allow-only, one key, one grantee, explicit operations list, optional encryption-context constraints.
- CreateGrant itself is gated by the key policy; the grants a caller may create can be constrained by kms:GrantOperations, kms:GranteePrincipal, kms:RetiringPrincipal, and kms:GrantConstraintType.
- GrantToken is the eventual-consistency workaround: returned by CreateGrant, passed on the next cryptographic call, honoured immediately.
- RetireGrant and RevokeGrant are both terminations with different narratives: retire is the polite end, revoke is the administrative end.
- Every lifecycle step is a CloudTrail event joinable by grant id; cross-account calls record in both accounts’ trails.
- Grants cannot deny; all deny logic lives in key policies, SCPs, or endpoint policies.
- kms:ViaService binds a permission to a service endpoint; kms:CallerAccount narrows by the initiating account.
- Grants on AWS-managed aws/* keys cannot be created by customers; this pattern requires a customer-managed key.