How to Build Ransomware-Proof Backups Across an Organisation

January 11, 2027 · 16 min read

Solutions Architect Professional · SAP-C02 · part of The Exam Room

The situation

The platform team runs forty AWS accounts under one AWS Organization: twenty-eight production workload accounts (EC2 and EBS, RDS Postgres, Aurora, DynamoDB, EFS, S3), six data-platform accounts, three shared-services accounts, and three security and audit accounts, one of them a recovery account spun up for this mandate, isolated from everything and accessible only to the backup-ops team under a break-glass process.

The board memo, written after a peer firm was ransomed, says plainly: “Assume the production account has been fully compromised. An attacker has valid AdministratorAccess. What stops them deleting the backups?” The compliance team turned that into six requirements: immutability against deletion even with admin credentials, storage in a separate account with separate IAM control, a copy in a second region, seven-year retention, central management across forty accounts, and audit reporting naming the controls and listing resources that passed or failed each one.

What actually matters

“Backups” is the easy word. “Backups that survive a full compromise of the account that owns the primary data” is a different problem, and every property the mandate names is pushing the design toward the same answer.

The threat model is “admin credentials inside the production account”. If we can assume a well-behaved production account, ordinary backups are fine. The point of this mandate is that we cannot. Whatever holds the authoritative copy must be outside the blast radius of any credential in any production account. That is not policy-based isolation, because IAM policy in the production account can be mutated by the production admin. It is structural isolation: the storage lives in a different account whose IAM is owned by a different team and whose vaults cannot be reached by any assume-role path from production.

Immutability has to be cryptographic-grade, not policy-grade. An IAM policy that says “backup-ops can delete” is exactly as strong as an IAM policy that says “backup-ops cannot delete”: both are mutable by someone with iam:*. Immutability enforced by the service (Vault Lock in compliance mode, once the cooling-off window has passed) is a different kind of thing. Once the window closes, even the root of the recovery account cannot shorten retention or delete the recovery point. That’s what “immune to admin credentials” actually looks like.

Cross-region is about physical failure modes as much as logical ones. Ransomware is one threat; a regional disaster is another; a misconfiguration that destroys the regional vault is a third. A copy in a second region covers all three at roughly the same cost, and the mandate names it explicitly, so the design has to include it regardless.

Seven-year retention is a compliance number, but the architectural property is “nobody can shorten it”. A retention that’s enforced by a plan that someone could edit isn’t the same thing as a retention enforced by the vault’s own lock. The plan might say 2555 days today and 7 days next week under a different admin; the vault, once locked in compliance mode, doesn’t care what the plan says.

Central management over forty accounts is a scaling requirement, not a feature preference. Each production account could be configured individually, but doing that means per-account deploys, per-account config drift, and new accounts silently lacking coverage until somebody remembers. A policy that attaches at the Organizations layer and inherits down the tree is the structural answer; the alternative accumulates exceptions forever.

Auditable means “produces a report the regulator can read”, not “we have the data somewhere”. CSV and JSON with per-resource pass/fail against named controls is the artefact. A dashboard nobody exports from is not. Whatever mechanism we pick has to land evidence in a place we can point the regulator at quarterly without engineering effort per-quarter.

And one property that isn’t in the requirements doc but matters enormously: KMS is the silent failure mode. Cross-account and cross-region copies of encrypted recovery points touch KMS twice: the source key (decrypt) and the destination key (generate data key). A key policy gap on either side surfaces as “the copy didn’t happen” or “the copy happened but is garbage”. Every design must name the KMS setup explicitly and audit it as part of the framework.

What we’ll filter on

  1. Cryptographic-grade immutability enforced outside the production account’s IAM blast radius.
  2. Cross-account storage where the storage account’s IAM is owned by a different team.
  3. Cross-region replication of every recovery point.
  4. Seven-year retention set once and locked against shortening.
  5. Organisation-wide central management via a policy primitive, not a per-account loop.
  6. Native auditability producing per-control pass/fail reports.

The backup landscape on AWS

AWS Backup. A managed cross-service backup service that orchestrates recovery points for EBS (and EC2), RDS, Aurora, DynamoDB, EFS, FSx, Storage Gateway, Neptune, DocumentDB, Redshift, Timestream, S3, SAP HANA on EC2, and VMware. A backup plan describes schedule, lifecycle, destination backup vault, and optional copy actions. Backup policies sit at the Organizations layer and push plans into member accounts from a delegated administrator. Vault Lock enforces WORM semantics. Backup Audit Manager produces framework-based compliance reports.

Service-native snapshots (per-service). Every service exposes its own snapshot primitive. Fine for one service. Eleven control planes, eleven IAM surfaces, eleven retention configurations across the estate. No organisation-wide policy mechanism.

Manual scripts on service APIs. A fleet of Lambdas or a Step Functions state machine walking resources. Technically complete; operationally a problem child. Immutability has to be built by hand. Audit reports are a data-engineering side-quest.

Third-party backup vendors. Purpose-built data-protection platforms (Veeam, Clumio, Cohesity, Rubrik, Commvault). The correct answer when posture is multi-cloud or when the vendor’s anomaly-detection features matter. When the mandate is AWS-only and AWS-native services satisfy it, they add a vendor relationship and a subscription without closing any gap.

Side by side

Mechanism | Immutable | Cross-account | Cross-region | Org-wide | Auditable
AWS Backup (plan + vault lock + copy) | ✓ (vault lock, compliance mode) | ✓ (copy action) | ✓ (copy action) | ✓ (Orgs delegated admin) | ✓ (Backup Audit Manager)
Service-native snapshots | — (varies) | — (per-service) | — (per-service) | |
Manual scripts on service APIs | — (hand-built) | | | |
Third-party backup vendors | ✓ | ✓ | ✓ | ✓ | ✓

Two rows tick every column: AWS Backup inside the AWS-native world, and the third-party vendor row. The mandate is AWS-only; AWS Backup lands.

Matching the mandate to the primitives

[Figure: decision flow for three backup postures.
Single account (ordinary backups, one team; daily EBS snapshots, same account for copies, no regulator): Org-wide policy not needed; service-native snapshots (DLM, RDS automated backups, per-service config) are good enough.
Regulated enterprise (ransomware-resilient, AWS-only; 40 accounts in one Org, seven-year retention mandated, separate recovery account, regulator-readable reports): Org-wide policy needed; AWS Backup with an Organizations policy, compliance-mode Vault Lock, cross-account and cross-region copy actions, and Backup Audit Manager reports.
Specialised posture (multi-cloud, vendor features; AWS + Azure + on-prem, anomaly detection required, vendor-managed immutability): AWS-native is not enough; third-party vendor with multi-cloud coverage, anomaly detection and vendor-operated immutability.]
Three backup postures, three primitives. The ransomware-resilience mandate lives in the middle column.

The delegated administrator and the backup policy

Central management in AWS Backup is not a dashboard. It is a policy primitive at the Organizations layer that pushes plans down the tree, plus one account elected to see estate-wide backup state.

From the management account, enable trusted access for AWS Backup in AWS Organizations and register one member account in the security OU as delegated administrator: aws organizations register-delegated-administrator --account-id <sec-acct> --service-principal backup.amazonaws.com. That account authors policies, publishes them to OUs, and runs Backup Audit Manager frameworks.
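The setup steps above can be sketched as an ordered, dry-run command list. This is a minimal sketch that only builds the AWS CLI invocations and prints them; it executes nothing. The root ID and account ID are hypothetical placeholders, and the `enable-policy-type` step is an addition not spelled out in the text: Organizations needs the backup policy type enabled on the root before backup policies can be attached.

```python
import shlex

BACKUP_PRINCIPAL = "backup.amazonaws.com"


def delegated_admin_commands(root_id, admin_account_id):
    """Return the AWS CLI calls, in order, as argv lists. Nothing is executed."""
    return [
        # 1. Trusted access for AWS Backup in AWS Organizations.
        ["aws", "organizations", "enable-aws-service-access",
         "--service-principal", BACKUP_PRINCIPAL],
        # 2. Enable the backup policy type on the organisation root
        #    (assumed prerequisite, not named in the article).
        ["aws", "organizations", "enable-policy-type",
         "--root-id", root_id, "--policy-type", "BACKUP_POLICY"],
        # 3. Register the security-OU account as delegated administrator.
        ["aws", "organizations", "register-delegated-administrator",
         "--account-id", admin_account_id,
         "--service-principal", BACKUP_PRINCIPAL],
    ]


# Hypothetical identifiers, for illustration only.
commands = delegated_admin_commands("r-examplerootid", "111122223333")
for argv in commands:
    print(shlex.join(argv))
```

Keeping the commands as data makes the order reviewable before anyone runs them from the management account.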

A backup policy is JSON, shaped deliberately like an SCP: it attaches to OUs or accounts and is inherited down the tree. It embeds one or more backup plans, each with rules (schedule, lifecycle, copy actions) and resource selections.

{
  "plans": {
    "RansomwareResiliencePlan": {
      "regions": { "@@assign": ["eu-west-1"] },
      "rules": {
        "DailyBackup": {
          "schedule_expression": { "@@assign": "cron(0 2 * * ? *)" },
          "target_backup_vault_name": { "@@assign": "prod-local-vault" },
          "lifecycle": { "delete_after_days": { "@@assign": "2555" } },
          "copy_actions": {
            "arn:aws:backup:eu-west-1:REC-ACCT:backup-vault:recovery-eu-west-1": {
              "target_backup_vault_arn": {
                "@@assign": "arn:aws:backup:eu-west-1:REC-ACCT:backup-vault:recovery-eu-west-1"
              },
              "lifecycle": { "delete_after_days": { "@@assign": "2555" } }
            },
            "arn:aws:backup:eu-west-2:REC-ACCT:backup-vault:recovery-eu-west-2": {
              "target_backup_vault_arn": {
                "@@assign": "arn:aws:backup:eu-west-2:REC-ACCT:backup-vault:recovery-eu-west-2"
              },
              "lifecycle": { "delete_after_days": { "@@assign": "2555" } }
            }
          }
        }
      },
      "selections": {
        "tags": {
          "BackupTagSelection": {
            "iam_role_arn": { "@@assign": "arn:aws:iam::$account:role/service-role/AWSBackupDefaultServiceRole" },
            "tag_key": { "@@assign": "Backup" },
            "tag_value": { "@@assign": ["true"] }
          }
        }
      }
    }
  }
}

Retention is 2,555 days (seven years, 7 × 365) at every stop. The two copy actions name destination vault ARNs, both in the recovery account: one in the same region as the source and one in the second region. The $account placeholder is resolved by Organizations to each member’s own account ID.

Attach the policy to the production OU and every account in it inherits the plan. New accounts joining inherit automatically.
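Hand-editing a nested policy invites drift between a copy action's key and its target ARN, so here is a minimal sketch that generates a document of the same shape. The function names and the recovery account ID are hypothetical; the plan, rule, vault and tag names mirror the policy above. Vault ARNs use the `backup-vault:<name>` form.

```python
import json


def assign(value):
    """Wrap a value in the @@assign operator used by Organizations policies."""
    return {"@@assign": value}


def lifecycle(retention_days):
    """Lifecycle block; backup policies carry numeric values as strings."""
    return {"delete_after_days": assign(str(retention_days))}


def build_backup_policy(source_region, local_vault, copy_vault_arns,
                        retention_days=7 * 365):
    """Return a backup policy document as a plain dict."""
    copy_actions = {
        # Each copy action is keyed by the same ARN it targets.
        arn: {
            "target_backup_vault_arn": assign(arn),
            "lifecycle": lifecycle(retention_days),
        }
        for arn in copy_vault_arns
    }
    rule = {
        "schedule_expression": assign("cron(0 2 * * ? *)"),
        "target_backup_vault_name": assign(local_vault),
        "lifecycle": lifecycle(retention_days),
        "copy_actions": copy_actions,
    }
    selection = {
        "iam_role_arn": assign(
            "arn:aws:iam::$account:role/service-role/AWSBackupDefaultServiceRole"),
        "tag_key": assign("Backup"),
        "tag_value": assign(["true"]),
    }
    plan = {
        "regions": assign([source_region]),
        "rules": {"DailyBackup": rule},
        "selections": {"tags": {"BackupTagSelection": selection}},
    }
    return {"plans": {"RansomwareResiliencePlan": plan}}


RECOVERY_ACCOUNT = "999988887777"  # hypothetical recovery account ID
policy = build_backup_policy(
    "eu-west-1",
    "prod-local-vault",
    [f"arn:aws:backup:{region}:{RECOVERY_ACCOUNT}:backup-vault:recovery-{region}"
     for region in ("eu-west-1", "eu-west-2")],
)
print(json.dumps(policy, indent=2))
```

The printed JSON is what would be passed as the content of a BACKUP_POLICY-type policy in Organizations and then attached to the production OU.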

The recovery account, and why it’s the whole point

The recovery account is what makes the mandate real. It is the account whose compromise the production team cannot cause by any action in their own.

Owned by a different team, under a separate OU. The recovery account sits in security/recovery. No IAM role in the recovery account trusts production-account principals; no SSO permission set grants production engineers access.

Vaults in each region, each with a local KMS CMK. Each vault uses a customer-managed key in the recovery account. The production account’s KMS key never touches the recovery copy.

A vault access policy that trusts specific sources. The destination vault’s resource policy allows source accounts the backup:CopyIntoBackupVault action, scoped to the vault ARN. It does not name source-account principals for backup:DeleteRecoveryPoint or backup:DeleteBackupVault. The source can write; it cannot delete.
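The write-but-not-delete property is easy to check mechanically. Below is a minimal sketch that builds such a vault access policy and asserts that no delete action is granted. The account IDs are hypothetical, and naming source account roots as principals is one possible choice; an organisation-scoped condition is another.

```python
import json

DELETE_ACTIONS = {"backup:DeleteRecoveryPoint", "backup:DeleteBackupVault"}


def vault_access_policy(vault_arn, source_account_ids):
    """Resource policy for a destination vault: sources may copy in, nothing else."""
    principals = [f"arn:aws:iam::{acct}:root" for acct in sorted(source_account_ids)]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCopyIntoVaultFromSources",
                "Effect": "Allow",
                "Principal": {"AWS": principals},
                "Action": "backup:CopyIntoBackupVault",
                "Resource": vault_arn,
            }
        ],
    }


def allowed_actions(policy):
    """Collect every action granted by Allow statements in the policy."""
    actions = set()
    for statement in policy["Statement"]:
        if statement["Effect"] != "Allow":
            continue
        granted = statement["Action"]
        actions.update([granted] if isinstance(granted, str) else granted)
    return actions


vault_policy = vault_access_policy(
    "arn:aws:backup:eu-west-1:999988887777:backup-vault:recovery-eu-west-1",
    ["222233334444", "111122223333"],  # hypothetical production account IDs
)
# The source accounts can write into the vault; they are never granted delete.
assert allowed_actions(vault_policy).isdisjoint(DELETE_ACTIONS)
print(json.dumps(vault_policy, indent=2))
```

The check compares literal action names only, so a wildcard such as `backup:*` would need separate handling before a check like this could gate a deployment.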

Vault Lock in compliance mode with a cooling-off period. aws backup put-backup-vault-lock-configuration --backup-vault-name recovery-eu-west-1 --min-retention-days 2555 --max-retention-days 36500 --changeable-for-days 7. The parameters:

  • min-retention-days: recovery points cannot be deleted until at least 2,555 days old.
  • max-retention-days: a ceiling that prevents misconfigured plans creating points that live forever.
  • changeable-for-days: the cooling-off period (a minimum of 3 days). Within that window, backup-ops can modify or remove the lock if the configuration is wrong. Once it elapses, the lock is immutable. AWS support cannot unlock it. The root user of the recovery account cannot unlock it. Nobody can.

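Vault Lock has two modes: governance (soft: a role with backup:DeleteBackupVaultLockConfiguration can remove it) and compliance (the one the mandate needs). Supplying --changeable-for-days is what selects compliance mode; a lock created without it is a governance-mode lock.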
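Because a compliance-mode lock cannot be undone after the cooling-off window, it is worth validating the parameters before applying them. The sketch below is a local pre-flight check, not part of any AWS SDK: the class name is hypothetical, and the bounds it enforces (minimum retention of at least 1 day, maximum retention of at most 36,500 days, cooling-off of at least 3 days) are the documented limits as I understand them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VaultLockConfig:
    min_retention_days: int
    max_retention_days: int
    changeable_for_days: Optional[int] = None  # None means governance mode

    def __post_init__(self):
        if self.min_retention_days < 1:
            raise ValueError("min_retention_days must be at least 1")
        if self.max_retention_days > 36500:
            raise ValueError("max_retention_days must not exceed 36500")
        if self.min_retention_days > self.max_retention_days:
            raise ValueError("min_retention_days exceeds max_retention_days")
        if self.changeable_for_days is not None and self.changeable_for_days < 3:
            raise ValueError("changeable_for_days must be at least 3")

    @property
    def mode(self):
        """Compliance mode is selected by supplying a cooling-off period."""
        return "governance" if self.changeable_for_days is None else "compliance"

    def accepts_retention(self, retention_days):
        """Would a recovery point with this retention fit inside the lock?"""
        return self.min_retention_days <= retention_days <= self.max_retention_days


SEVEN_YEARS = 7 * 365  # 2555 days, matching the plan's delete_after_days
lock = VaultLockConfig(min_retention_days=SEVEN_YEARS,
                       max_retention_days=36500,
                       changeable_for_days=7)
print(lock.mode, lock.accepts_retention(SEVEN_YEARS), lock.accepts_retention(7))
```

The last call illustrates the point made earlier about plans versus vaults: a plan edited down to 7 days produces a retention the locked vault does not accept.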

Why this beats production-account admin credentials

An attacker has AdministratorAccess in a production account. They can delete recovery points in the production-account local vault, but that is the working copy, and losing it does not matter because the authoritative copy sits in the recovery account. They can delete the source-side backup plan; authoritative copies already in the recovery-account vault remain intact. They cannot reach into the recovery account: there is no IAM role, no trust policy, no path. The cross-account trust is one-way: CopyIntoBackupVault only. Even if a path were found, Vault Lock in compliance mode rejects backup:DeleteRecoveryPoint against any point younger than 2,555 days; once the cooling-off window is over, not even the recovery-account root can delete. The encryption key for the recovery copy is a CMK in the recovery account that the production admin has no access to.

Backup Audit Manager, concretely

Backup Audit Manager is AWS Backup’s framework-and-controls layer, built on AWS Config. A framework is a named collection of controls. Pre-built controls relevant here:

  • BACKUP_RESOURCES_PROTECTED_BY_BACKUP_PLAN
  • BACKUP_PLAN_MIN_FREQUENCY_AND_MIN_RETENTION_CHECK
  • BACKUP_RESOURCES_PROTECTED_BY_CROSS_REGION
  • BACKUP_RESOURCES_PROTECTED_BY_CROSS_ACCOUNT
  • BACKUP_RECOVERY_POINT_ENCRYPTED
  • BACKUP_RECOVERY_POINT_MINIMUM_RETENTION_CHECK
  • BACKUP_RESOURCES_PROTECTED_BY_BACKUP_VAULT_LOCK (with a compliance-mode variant)

Create the framework in the delegated admin account. AWS Config’s organisation-wide aggregator, enabled by the same admin, is what makes the framework a cross-account evaluator. Then create a report plan naming the framework and an S3 bucket. The report plan runs on schedule and publishes per-control pass/fail CSV plus JSON detail into the bucket.

The quarterly report is one S3 prefix per run: one file per control, an index, per-resource detail. AWS Backup generates it; the team forwards the bucket location to the regulator with a cross-account read grant.
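A framework and its report plan are both small request payloads. The following is a minimal sketch of them as plain dicts, shaped after the CreateFramework and CreateReportPlan request structures as I understand them. The helper names, bucket, prefix and framework ARN are hypothetical, and the `requiredRetentionDays` parameter name should be verified against the current control reference before use.

```python
def control(name, parameters=None):
    """One entry for a framework's FrameworkControls list."""
    entry = {"ControlName": name}
    if parameters:
        entry["ControlInputParameters"] = [
            {"ParameterName": key, "ParameterValue": str(value)}
            for key, value in parameters.items()
        ]
    return entry


def report_plan(name, framework_arn, bucket, prefix):
    """Request body for a report plan delivering CSV and JSON to S3."""
    return {
        "ReportPlanName": name,
        "ReportDeliveryChannel": {
            "S3BucketName": bucket,
            "S3KeyPrefix": prefix,
            "Formats": ["CSV", "JSON"],
        },
        "ReportSetting": {
            "ReportTemplate": "CONTROL_COMPLIANCE_REPORT",
            "FrameworkArns": [framework_arn],
        },
    }


framework_controls = [
    control("BACKUP_RESOURCES_PROTECTED_BY_BACKUP_PLAN"),
    control("BACKUP_RECOVERY_POINT_ENCRYPTED"),
    # Parameter name is an assumption; the value matches the 7-year mandate.
    control("BACKUP_RECOVERY_POINT_MINIMUM_RETENTION_CHECK",
            {"requiredRetentionDays": 7 * 365}),
]

quarterly_plan = report_plan(
    "quarterly_backup_compliance",                      # hypothetical name
    "arn:aws:backup:eu-west-1:111122223333:framework:example",  # hypothetical ARN
    "example-backup-audit-reports",                     # hypothetical bucket
    "regulator/",
)
```

With boto3 these dicts would be passed to the backup client's `create_framework` and `create_report_plan` calls from the delegated admin account; nothing here contacts AWS.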

The KMS piece, which is easy to miss

Cross-account and cross-region copies of encrypted recovery points touch KMS in two places, and misconfiguring either is how copy actions silently fail on an otherwise correct-looking vault.

Source-side: the original recovery point is encrypted with a CMK in the source account. The copy job needs kms:Decrypt (and kms:GenerateDataKey for supported resource types) with that key, granted to the AWS Backup service principal and to the destination account.

Destination-side: the destination vault’s CMK is in the recovery account, and its key policy must allow the source accounts (or aws:PrincipalOrgID scoped to the organisation) to kms:GenerateDataKey. Without both, the copy fails with a KMS permission error surfaced in the audit report.

The pragmatic pattern is one CMK per region in the recovery account, with a policy keyed on aws:PrincipalOrgID so a new production account joining the OU needs no key-policy edit.
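That pattern comes down to a single key-policy statement. Here is a minimal sketch of it, assuming a hypothetical organisation ID. The default action list contains only the kms:GenerateDataKey permission named above; real copy paths may need further actions, so treat the list as a starting point to verify rather than a complete policy.

```python
import re

# Organisation IDs look like "o-" followed by 10 to 32 lowercase letters or digits.
ORG_ID_PATTERN = re.compile(r"^o-[a-z0-9]{10,32}$")


def org_scoped_key_statement(org_id, actions=("kms:GenerateDataKey",)):
    """Key-policy statement that admits any principal in the organisation."""
    if not ORG_ID_PATTERN.match(org_id):
        raise ValueError(f"not an organisation ID: {org_id!r}")
    return {
        "Sid": "AllowOrgAccountsForBackupCopies",
        "Effect": "Allow",
        # The wildcard principal is narrowed by the condition below.
        "Principal": {"AWS": "*"},
        "Action": list(actions),
        "Resource": "*",  # in a key policy this refers to the key itself
        "Condition": {"StringEquals": {"aws:PrincipalOrgID": org_id}},
    }


key_statement = org_scoped_key_statement("o-exampleorgid1")  # hypothetical org ID
print(key_statement["Condition"])
```

Because the statement keys on the organisation rather than on account IDs, a new production account inherits access the moment it joins, which is the property the pattern is chosen for.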

What’s worth remembering

  1. AWS Backup is the AWS-native primitive for cross-service, cross-account, cross-region, centrally-managed backup.
  2. Backup policies attach to OUs via the Organizations delegated administrator. One policy inherits down the tree; new accounts joining inherit automatically.
  3. Backup plan copy actions are how cross-account and cross-region replication works. One plan, several copy actions to named destination vault ARNs, each with its own retention.
  4. Vault Lock has two modes. Compliance mode, once the cooling-off period elapses, is immutable: nobody, not AWS support and not root, can shorten retention. Governance mode is softer.
  5. The cooling-off period (a minimum of 3 days) is the only window in which a compliance-mode lock can be undone. Plan for catching misconfiguration during that window.
  6. Separate recovery account under a separate OU is the practical realisation of “separate IAM control”.
  7. Destination vault encryption uses a KMS CMK in the recovery account. Both source and destination CMK policies must allow the cross-account copy path; KMS is the silent-failure mode.
  8. Backup Audit Manager is a feature of AWS Backup, not the separate AWS Audit Manager service. Frameworks and report plans produce per-control pass/fail artefacts on a schedule.
  9. The controls that map to this mandate are BACKUP_RESOURCES_PROTECTED_BY_BACKUP_PLAN, BACKUP_RESOURCES_PROTECTED_BY_CROSS_REGION, BACKUP_RESOURCES_PROTECTED_BY_CROSS_ACCOUNT, BACKUP_RESOURCES_PROTECTED_BY_BACKUP_VAULT_LOCK (compliance variant), BACKUP_RECOVERY_POINT_MINIMUM_RETENTION_CHECK, and BACKUP_RECOVERY_POINT_ENCRYPTED.
  10. The working copy and the authoritative copy are different things. An attacker with production admin credentials can destroy the working copy all day; the authoritative copy sits somewhere they have no path to.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.