How to Prove Backup Compliance With AWS Backup Audit Manager

September 27, 2027 · 19 min read

CloudOps Engineer · SOA-C03 · part of The Exam Room

The situation

Acme is subject to SOC 2 and PCI DSS. Both audits care about backups: that they exist, that they run on the declared schedule, that they’re retained for the declared period, and that they’re recoverable. Every quarter the compliance team goes through a variant of the same ritual: ask engineering to prove it; engineering opens consoles, takes screenshots, assembles a PDF, and hands it over.

The backup landscape Acme has to account for:

  • 40 RDS instances across three production accounts. Automated backups enabled, 14-day retention. Some teams also take manual snapshots during migrations.
  • 280 EBS volumes on the EC2 fleet. Backed up via a mix of manual snapshots, Data Lifecycle Manager policies, and (historically) a custom Lambda that predates DLM.
  • 60 DynamoDB tables with point-in-time recovery enabled, plus on-demand backups before schema changes.
  • 12 EFS file systems with backups enabled through AWS Backup.
  • 8 Aurora clusters; the story here is similar to RDS but the API shape is different.
  • 3 FSx file systems for Windows workloads.

Today, the “backup policy” lives in a wiki page: a prose description of what should be backed up, how often, and for how long. There is no enforcement and no automated attestation. The only evidence an instance was backed up last night is either a console screenshot or a human promising they saw the console a few hours ago.

The compliance team wants three things:

  1. A single policy that says “this class of resource is backed up this way” and is applied uniformly.
  2. Automated attestation: at any moment, a report that shows every in-scope resource, its backup frequency, its retention, its most recent backup, and whether it’s currently compliant with the policy.
  3. Exportable to the auditor without a human touching a console.

What actually matters

Before reaching for a specific service, it’s worth pinning down what “prove the backup ran” actually requires and which properties of a solution end up doing the work.

The first thing worth thinking about is what counts as evidence. A screenshot is a claim, not evidence; it’s an artefact a human assembled, with no traceable link to the backup system’s own records. Real evidence has three properties: it comes from the system that actually performed the backup, it carries a timestamp the auditor can correlate, and it’s produced by a mechanism nobody had to remember to run. Anything that depends on a person opening a console is a screenshot with extra steps; the moment a person is in the loop, the loop is fragile.

The second is coverage across the resource zoo. The estate spans RDS, EBS, DynamoDB, EFS, FSx, Aurora: six different services with six different native backup mechanisms. Per-service tooling means six different consoles, six different APIs, and six different definitions of “ran successfully.” A unified plane that speaks all of them is worth more than the sum of its parts; the auditor doesn’t want six reports, and the platform team doesn’t want six integrations to maintain.

The third is the unit of policy. Backup intent is a small set of declarative statements (“this class of resource is backed up daily and retained 14 days, that class continuously with PITR”), not a long list of “did we remember to enable backups on this specific instance?”. The chosen tool has to let those statements live in a file (CloudFormation or similar), be reviewed in PRs, and apply uniformly. Per-resource clicks are how the wiki page and reality drift apart.

The fourth is enrolment as a side-effect of provisioning. New resources appear constantly. If joining the backup regime requires a manual step after provisioning, eventually somebody forgets, and the auditor finds it. The right model is enrolment by attribute (a tag, a resource type, a Region), so that everything matching the rule is in scope and the provisioning pipeline carries the responsibility of applying the attribute.

The fifth is immutability of recovery points. Retention isn’t just “we kept the snapshot.” It’s “we kept the snapshot, and nobody (not an engineer with a bad day, not a compromised credential, not the root user) could have deleted it before its time was up.” The chosen mechanism has to support a write-once posture on the recovery point itself; without that, “we have backups” is a polite fiction the moment a delete API exists.

The sixth is attestation versus operation. There’s the daily fact of backups running, and there’s the periodic fact of asking “did they run, and are they configured the way we said?”. These are different jobs. A solution that backs up but doesn’t attest leaves the auditor’s question unanswered; a solution that attests but doesn’t itself manage the backups can only report on what other tooling did, which adds a join the auditor has to trust. The cleanest answer is one plane that does both, so the policy and the evidence come from the same source.

The seventh is org-wide enforcement. The estate is multi-account. Defining backup policy in each account independently is how policies drift; defining it centrally and pushing it down is how they stay aligned. Whatever the mechanism, it has to let one definition land uniformly across accounts and survive accounts joining or leaving the org.

What we’ll filter on

  1. Cross-service: does it cover RDS, EBS, DynamoDB, EFS, FSx, Aurora in one plane?
  2. Policy as IaC: are plans declarative and version-controlled?
  3. Tag-based selection: do new resources auto-enrol by tagging?
  4. Automated attestation: does the tool generate the compliance report?
  5. Org-wide enforcement: can one plan be pushed to every account?
  6. Export-ready: is the report a file the auditor wants, or raw data for post-processing?

The backup-attestation landscape

  1. Per-service console screenshots (status quo). Works; doesn’t scale; evidence is a PDF of screenshots. The thing the compliance team is leaving behind.

  2. AWS Backup, plans only. Unified backup plane across supported services. Plans, vaults, selections, backup jobs. No native attestation beyond the job history in the console.

  3. AWS Backup + Audit Manager. Plans plus controls plus scheduled reports delivered to S3. The definitive answer for automated attestation, per account or across the organisation.

  4. DIY via Config rules. Config has managed rules like BACKUP_PLAN_MIN_FREQUENCY_AND_MIN_RETENTION_CHECK and BACKUP_RECOVERY_POINT_MANUAL_DELETION_DISABLED that evaluate backup configuration. Useful as supplementary checks; not a replacement for Backup Audit Manager’s framework-and-report model.

  5. Third-party (Veeam, Commvault, Rubrik). Full external backup platform. Legitimate for hybrid or multi-cloud; overkill for an AWS-only estate that already has native backups configured.

  6. Custom Lambda + Athena. Scrape the various service APIs, land data in S3, query with Athena, generate reports. Total control; meaningful engineering effort; hard to maintain when AWS ships new services. The last resort, not the starting point.

Side by side

Option Cross-service IaC Tag selection Attestation Org-wide Export-ready
Console screenshots Manual PDF
AWS Backup (plans only) Partial Via jobs API
AWS Backup + Audit Manager ✓ ✓ ✓ ✓ ✓ ✓
Config rules ✓ (subset) Partial Via Config
Third-party platform Varies Varies Varies Varies
Lambda + Athena Custom ✓ (built) Custom ✓ (built)

Backup + Audit Manager is the row that ticks every column while letting the policy live in CloudFormation or Terraform. Config rules are a good supplement; third-party is overkill for an AWS-only estate that needs attestation, not re-platforming.

From tag to audit report

Diagram: from tag to audit report.

  • Tagged resources: RDS orders-db-prod (Backup=daily-14d), EBS vol-0abc1234 (Backup=daily-14d), DDB customers (Backup=daily-14d), … 60+ more.
  • Backup plan daily-14d: rule 1 cron(0 2 * * ? *), retention 14 days, vault prod-backups.
  • Selection: tag:Backup == daily-14d; auto-enrols new matching resources.
  • Plan runs produce backup jobs: one job per resource per run; recovery point in vault on success.
  • Backup vault prod-backups: KMS-encrypted, access policy scoped; Vault Lock: Governance mode, 14-day min.
  • Recovery points: arn:aws:backup:eu-west-1:…recovery-point/abc, arn:aws:backup:eu-west-1:…recovery-point/def, arn:aws:backup:eu-west-1:…recovery-point/ghi, ... one per (resource, daily run); cross-Region copy to us-east-1 via the plan's CopyActions, if configured.
  • Backup Audit Manager framework, controls evaluated on each resource class: BACKUP_RESOURCES_PROTECTED_BY_BACKUP_PLAN, BACKUP_RECOVERY_POINT_MANUAL_DELETION_DISABLED, BACKUP_PLAN_MIN_FREQUENCY_AND_MIN_RETENTION_CHECK, BACKUP_RECOVERY_POINT_MINIMUM_RETENTION_CHECK.
  • Report plan compliance-monthly: cron(0 9 1 * ? *) · format: PDF + CSV; s3://acme-compliance-reports/backup/2027-06-01/; signed PDF handed to auditor; CSV for engineering.

One tag drives plan selection; plans run against vaults; Audit Manager evaluates a framework and a report plan delivers the evidence to S3 on a schedule. The auditor receives a file, not a meeting.

The plan in depth

A backup plan is declarative JSON, typically deployed as CloudFormation:

AcmeDaily14dPlan:
  Type: AWS::Backup::BackupPlan
  Properties:
    BackupPlan:
      BackupPlanName: acme-daily-14d
      BackupPlanRule:
        - RuleName: daily-02utc
          TargetBackupVault: !Ref ProdBackupVault
          ScheduleExpression: cron(0 2 * * ? *)
          StartWindowMinutes: 60
          CompletionWindowMinutes: 360
          Lifecycle:
            DeleteAfterDays: 14
          CopyActions:
            - DestinationBackupVaultArn: !GetAtt CrossRegionVault.BackupVaultArn
              Lifecycle: { DeleteAfterDays: 14 }
          EnableContinuousBackup: false

AcmeDaily14dSelection:
  Type: AWS::Backup::BackupSelection
  Properties:
    BackupPlanId: !Ref AcmeDaily14dPlan
    BackupSelection:
      SelectionName: daily-14d-tag-selected
      IamRoleArn: !GetAtt BackupRole.Arn
      ListOfTags:
        - ConditionType: STRINGEQUALS
          ConditionKey: Backup
          ConditionValue: daily-14d

Four bits earn their keep. StartWindowMinutes: 60 gives Backup an hour to start each job, useful when the scheduler is about to fire 40 jobs at 02:00 and some need to wait. CompletionWindowMinutes: 360 caps the job’s total time before it’s marked failed. CopyActions replicates the recovery point to a cross-Region vault automatically, which is the cross-Region disaster-recovery posture most auditors want. EnableContinuousBackup: true is the switch for PITR on supported resources (RDS and Aurora among them; DynamoDB’s PITR is enabled on the table itself); leaving it false here means the plan takes discrete snapshots rather than continuous backups.

The selection uses ListOfTags to match resources by tag; this is the join. A disciplined tag-to-plan mapping is what makes auto-enrolment work.
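The matching rule behind that join can be sketched in a few lines of Python. This is a minimal illustration rather than AWS Backup's implementation: the Resource class, the FLEET list, and the ARNs in it are hypothetical, and the sketch assumes the documented behaviour that ListOfTags conditions are case-sensitive string equality, with a resource selected when it matches any one condition.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical, simplified view of a taggable resource."""
    arn: str
    tags: dict = field(default_factory=dict)


def matches_selection(resource, list_of_tags):
    """True if the resource satisfies at least one STRINGEQUALS condition."""
    for cond in list_of_tags:
        if cond["ConditionType"] != "STRINGEQUALS":
            raise ValueError("this sketch only models STRINGEQUALS")
        # Case-sensitive comparison on both key and value.
        if resource.tags.get(cond["ConditionKey"]) == cond["ConditionValue"]:
            return True
    return False


def select(resources, list_of_tags):
    """Return the ARNs that the selection would enrol."""
    return [r.arn for r in resources if matches_selection(r, list_of_tags)]


# Mirrors the ListOfTags entry in the selection above.
SELECTION = [
    {"ConditionType": "STRINGEQUALS", "ConditionKey": "Backup", "ConditionValue": "daily-14d"},
]

# Illustrative fleet; the last two entries show how enrolment silently fails.
FLEET = [
    Resource("arn:aws:rds:eu-west-1:111122223333:db:orders-db-prod", {"Backup": "daily-14d"}),
    Resource("arn:aws:ec2:eu-west-1:111122223333:volume/vol-0abc1234", {"Backup": "daily-14d"}),
    Resource("arn:aws:dynamodb:eu-west-1:111122223333:table/scratch", {"Backup": "Daily-14d"}),
    Resource("arn:aws:elasticfilesystem:eu-west-1:111122223333:file-system/fs-untagged"),
]

if __name__ == "__main__":
    for arn in select(FLEET, SELECTION):
        print(arn)
```

The mis-cased tag value and the untagged file system are both skipped without any error, which is why the provisioning pipeline, not a human, should own the tag.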

The vault in depth

A backup vault is the storage destination. Vaults are KMS-encrypted; each has an access policy that controls who can restore or delete recovery points. For compliance, Vault Lock is the critical feature:

ProdBackupVault:
  Type: AWS::Backup::BackupVault
  Properties:
    BackupVaultName: prod-backups
    EncryptionKeyArn: !GetAtt BackupKmsKey.Arn
    AccessPolicy:
      Version: '2012-10-17'
      Statement:
        - Effect: Deny
          Principal: '*'
          Action: ['backup:DeleteRecoveryPoint', 'backup:UpdateRecoveryPointLifecycle']
          Resource: '*'
          Condition:
            StringNotEquals:
              'aws:PrincipalArn': arn:aws:iam::111122223333:role/BackupOperator
    LockConfiguration:
      MinRetentionDays: 14
      MaxRetentionDays: 365
      ChangeableForDays: 3

LockConfiguration applies Vault Lock. Because ChangeableForDays is set, the lock is created in Compliance mode with a three-day grace period during which it can still be changed or removed; once that window closes, the lock is immutable for the life of the vault. Leaving ChangeableForDays out creates a Governance-mode lock instead, which principals with sufficient IAM permissions can still remove. Auditors love Compliance mode: recovery points in a Compliance-mode locked vault cannot be deleted by any principal, including the root user, until their retention expires. It’s the backup equivalent of object lock on S3.
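The retention arithmetic a locked vault enforces can be sketched as below. This is a simplified model, not the service's logic: the function names are hypothetical, the bounds come from the MinRetentionDays and MaxRetentionDays values above, and the sketch assumes a job whose retention falls outside those bounds is rejected and that a recovery point becomes deletable only once its own retention has elapsed.

```python
from datetime import datetime, timedelta, timezone

MIN_RETENTION_DAYS = 14   # LockConfiguration.MinRetentionDays above
MAX_RETENTION_DAYS = 365  # LockConfiguration.MaxRetentionDays above


def retention_allowed(delete_after_days):
    """Would a job with this retention be accepted by the locked vault?"""
    return MIN_RETENTION_DAYS <= delete_after_days <= MAX_RETENTION_DAYS


def earliest_deletion(created_at, delete_after_days):
    """First moment at which the recovery point's retention has expired."""
    return created_at + timedelta(days=delete_after_days)


def can_delete(now, created_at, delete_after_days):
    """In this model nothing is deletable before its retention expires."""
    return now >= earliest_deletion(created_at, delete_after_days)


if __name__ == "__main__":
    created = datetime(2027, 6, 1, 2, 0, tzinfo=timezone.utc)
    print(retention_allowed(14))           # the plan's 14-day lifecycle fits
    print(retention_allowed(7))            # a 7-day lifecycle would not
    print(earliest_deletion(created, 14))  # two weeks after creation
```

A plan rule whose DeleteAfterDays drops below the vault's minimum is the kind of quiet misconfiguration this check is meant to surface before the first failed job does.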

The framework and the report

Backup Audit Manager’s unit of policy is a framework:

AcmeBackupFramework:
  Type: AWS::Backup::Framework
  Properties:
    FrameworkName: acme_core_backup  # letters, numbers, and underscores only
    FrameworkDescription: Core backup controls for Acme production
    FrameworkControls:
      - ControlName: BACKUP_RESOURCES_PROTECTED_BY_BACKUP_PLAN
        # This control takes no input parameters; frequency and retention
        # thresholds belong to the MIN_FREQUENCY_AND_MIN_RETENTION control below.
        ControlScope:
          ComplianceResourceTypes: [ RDS, EBS, DynamoDB, EFS, Aurora, FSx ]

      - ControlName: BACKUP_RECOVERY_POINT_MANUAL_DELETION_DISABLED
        ControlInputParameters:
          - ParameterName: principalArnList
            ParameterValue: "arn:aws:iam::111122223333:role/BackupOperator"

      - ControlName: BACKUP_PLAN_MIN_FREQUENCY_AND_MIN_RETENTION_CHECK
        ControlInputParameters:
          - ParameterName: requiredFrequencyValue
            ParameterValue: "1"
          - ParameterName: requiredFrequencyUnit
            ParameterValue: "days"
          - ParameterName: requiredRetentionDays
            ParameterValue: "14"

      - ControlName: BACKUP_RECOVERY_POINT_MINIMUM_RETENTION_CHECK
        ControlInputParameters:
          - ParameterName: requiredRetentionDays
            ParameterValue: "14"
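To make the frequency-and-retention control concrete, here is a rough Python sketch of the comparison it implies. It is an illustration under stated assumptions, not Backup Audit Manager's evaluation code: the rule dictionaries and their interval_hours field are hypothetical (the real plan expresses frequency as a cron schedule), and the sketch assumes a plan passes when at least one of its rules meets both thresholds.

```python
UNIT_HOURS = {"hours": 1, "days": 24}

# Same parameter names and string values as the framework control above.
CONTROL_PARAMS = {
    "requiredFrequencyValue": "1",
    "requiredFrequencyUnit": "days",
    "requiredRetentionDays": "14",
}


def rule_meets(rule, params):
    """Does one plan rule run often enough and retain long enough?"""
    max_interval_hours = int(params["requiredFrequencyValue"]) * UNIT_HOURS[params["requiredFrequencyUnit"]]
    frequent_enough = rule["interval_hours"] <= max_interval_hours
    retained_enough = rule["delete_after_days"] >= int(params["requiredRetentionDays"])
    return frequent_enough and retained_enough


def plan_compliant(rules, params):
    """Assumption: one qualifying rule is enough for the plan to pass."""
    return any(rule_meets(rule, params) for rule in rules)


# Hypothetical plans expressed as (interval, retention) pairs.
DAILY_14D = [{"interval_hours": 24, "delete_after_days": 14}]
WEEKLY_14D = [{"interval_hours": 168, "delete_after_days": 14}]
DAILY_7D = [{"interval_hours": 24, "delete_after_days": 7}]

if __name__ == "__main__":
    for name, rules in [("daily-14d", DAILY_14D), ("weekly-14d", WEEKLY_14D), ("daily-7d", DAILY_7D)]:
        print(name, plan_compliant(rules, CONTROL_PARAMS))
```

Keeping the thresholds in one parameter block mirrors the framework: the policy numbers live in a single reviewed place, and every plan is measured against them.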

Each control is evaluated continuously and the framework’s compliance status rolls up in real time. For the scheduled export, a report plan wraps the framework:

AcmeMonthlyReport:
  Type: AWS::Backup::ReportPlan
  Properties:
    ReportPlanName: acme_backup_compliance_monthly  # letters, numbers, and underscores only
    ReportDeliveryChannel:
      S3BucketName: acme-compliance-reports
      S3KeyPrefix: backup/
      Formats: [ CSV, JSON ]
    ReportSetting:
      ReportTemplate: CONTROL_COMPLIANCE_REPORT
      FrameworkArns: [ !GetAtt AcmeBackupFramework.FrameworkArn ]

The report runs (on-demand or scheduled via EventBridge targeting backup:StartReportJob), evaluates the framework, and writes the results to S3 as CSV, JSON, or both, which are the formats a report plan’s delivery channel accepts. The auditor’s copy is a signed PDF rendered from that output; the compliance team’s view is the CSV rolled up into a dashboard.

Report templates of note: BACKUP_JOB_REPORT (every backup job in a window, with status), CONTROL_COMPLIANCE_REPORT (every control’s evaluation), RESOURCE_COMPLIANCE_REPORT (every resource’s compliance against its plan), COPY_JOB_REPORT (cross-Region copy history), RESTORE_JOB_REPORT (every restore action). The quarterly audit at Acme uses the first three; the restore report is the artefact for “prove we’ve tested recovery.”
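Once a CSV report lands in S3, rolling it up into the headline numbers an auditor asks for is a small job. The sketch below shows one way to do it with the standard library. The column names (resource_arn, control_name, compliance_status) and the sample rows are assumptions made for illustration; the real report's columns should be checked against an actual delivered file before relying on this.

```python
import csv
import io

# Hypothetical sample; real reports will have different columns and many more rows.
SAMPLE_REPORT = """\
resource_arn,control_name,compliance_status
arn:aws:rds:eu-west-1:111122223333:db:orders-db-prod,BACKUP_RESOURCES_PROTECTED_BY_BACKUP_PLAN,COMPLIANT
arn:aws:rds:eu-west-1:111122223333:db:orders-db-prod,BACKUP_RECOVERY_POINT_MINIMUM_RETENTION_CHECK,COMPLIANT
arn:aws:rds:eu-west-1:111122223333:db:new-db,BACKUP_RESOURCES_PROTECTED_BY_BACKUP_PLAN,NON_COMPLIANT
arn:aws:ec2:eu-west-1:111122223333:volume/vol-0abc1234,BACKUP_RESOURCES_PROTECTED_BY_BACKUP_PLAN,COMPLIANT
"""


def summarise(report_csv):
    """Roll per-control rows up to one verdict per resource."""
    failures = {}  # resource ARN -> list of failed control names
    for row in csv.DictReader(io.StringIO(report_csv)):
        failed = failures.setdefault(row["resource_arn"], [])
        if row["compliance_status"] != "COMPLIANT":
            failed.append(row["control_name"])
    non_compliant = {arn: ctrls for arn, ctrls in failures.items() if ctrls}
    return {
        "evaluated": len(failures),
        "compliant": len(failures) - len(non_compliant),
        "non_compliant": non_compliant,
    }


if __name__ == "__main__":
    summary = summarise(SAMPLE_REPORT)
    print(f"{summary['compliant']}/{summary['evaluated']} resources compliant")
    for arn, controls in sorted(summary["non_compliant"].items()):
        print(f"  {arn}: failed {', '.join(controls)}")
```

A resource counts as compliant only if every control row for it passed, which matches the way the audit conversation tends to go: the failing resources and their specific controls are the worklist.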

Org-wide: one policy, one report

For the estate-wide view, Organizations has backup policies (BACKUP_POLICY type) that push a plan into every member account. Attached at the OU level, the policy has the same shape as an account-level plan; each member account ends up with a plan named after the org policy, pointing at the vault named in the rule. The policy does not create that vault, so it has to exist in each member account and Region before the first scheduled run.

{
  "plans": {
    "acme-daily-14d": {
      "regions": { "@@append": ["eu-west-1", "us-east-1"] },
      "rules": { "rule-daily": {
        "schedule_expression": { "@@assign": "cron(0 2 * * ? *)" },
        "start_backup_window_minutes": { "@@assign": "60" },
        "complete_backup_window_minutes": { "@@assign": "360" },
        "lifecycle": { "delete_after_days": { "@@assign": "14" } },
        "target_backup_vault_name": { "@@assign": "prod-backups" }
      }},
      "selections": {
        "tags": {
          "daily-14d-selection": {
            "iam_role_arn": { "@@assign": "arn:aws:iam::$account:role/AWSBackupOrgAdminRole" },
            "tag_key": { "@@assign": "Backup" },
            "tag_value": { "@@assign": ["daily-14d"] }
          }
        }
      }
    }
  }
}

The @@assign / @@append syntax is Organizations’ policy inheritance: a policy attached lower in the hierarchy (a child OU or an account) can override (@@assign) or extend (@@append) what it inherits from above. The selection’s IAM role is parameterised by $account so each member account uses its own local role.
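How those two operators combine across levels can be sketched with a small recursive merge. This is a simplified model of the inheritance rules, not Organizations' own algorithm: the function name and the two example policies are hypothetical, and the sketch deliberately leaves out @@remove and the child-control operators that let a parent restrict what children may change.

```python
def apply_policy(inherited, policy):
    """Apply one policy's operators on top of the value inherited from above."""
    if isinstance(policy, dict) and "@@assign" in policy:
        # @@assign replaces whatever was inherited.
        return policy["@@assign"]
    if isinstance(policy, dict) and "@@append" in policy:
        # @@append adds to the inherited list, skipping duplicates.
        base = list(inherited or [])
        return base + [v for v in policy["@@append"] if v not in base]
    if isinstance(policy, dict):
        # Plain nesting: merge key by key.
        base = inherited if isinstance(inherited, dict) else {}
        merged = dict(base)
        for key, value in policy.items():
            merged[key] = apply_policy(base.get(key), value)
        return merged
    return policy


# Hypothetical policy attached at the root or a parent OU.
PARENT = {"plans": {"acme-daily-14d": {
    "regions": {"@@append": ["eu-west-1"]},
    "rules": {"rule-daily": {"lifecycle": {"delete_after_days": {"@@assign": "14"}}}},
}}}

# Hypothetical policy attached to a child OU with stricter needs.
CHILD = {"plans": {"acme-daily-14d": {
    "regions": {"@@append": ["us-east-1"]},
    "rules": {"rule-daily": {"lifecycle": {"delete_after_days": {"@@assign": "35"}}}},
}}}

if __name__ == "__main__":
    effective = apply_policy(apply_policy(None, PARENT), CHILD)
    plan = effective["plans"]["acme-daily-14d"]
    print(plan["regions"])
    print(plan["rules"]["rule-daily"]["lifecycle"]["delete_after_days"])
```

Applying the parent first and the child second is the point of the sketch: the account's effective plan is the result of walking down the hierarchy, not any single policy document.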

Audit Manager’s org-level reporting requires the delegated admin role for Backup: a designated account that can query framework state across every member account and produce a consolidated report.

A worked audit

Auditor arrives on Tuesday. The compliance team hands over three artefacts:

  1. Policy document. The YAML CloudFormation for the plans, vaults, framework, and report plan, version-controlled, with git history showing every change in the last year.
  2. Latest compliance report. The PDF from the most recent CONTROL_COMPLIANCE_REPORT run, dated two days ago. Shows: 412 resources evaluated across 6 resource types; 409 compliant; 3 non-compliant (two RDS instances in a new account that haven’t been tagged yet, one EFS file system created last week). For each non-compliant resource, the specific control that failed and the time of evaluation.
  3. Restore job report. Last quarter’s RESTORE_JOB_REPORT showing the two quarterly recovery drills (one RDS instance, one DynamoDB table) completed successfully with the restored resource IDs and timings.

The auditor’s follow-ups are specific (why are those 3 resources non-compliant?) and the engineering team’s answers are specific (tags missed during provisioning; tagging patch deployed last night; next report will show 412/412 compliant). Total time for the backup section of the audit: about 20 minutes, most of it reading the PDF.

What’s worth remembering

  1. AWS Backup is the policy plane. Plans, selections, vaults, recovery points across RDS, EBS, DynamoDB, EFS, FSx, Aurora, and the rest. One set of concepts; one place to configure.
  2. Tag-based selection is the auto-enrolment mechanism. A single tag (Backup=daily-14d) applied by the provisioning pipeline lets new resources join the plan without any explicit update.
  3. Backup Audit Manager turns policy into evidence. Frameworks hold controls; controls evaluate continuously; report plans deliver findings to S3 on a schedule. Per-account or org-wide.
  4. Controls to know. BACKUP_RESOURCES_PROTECTED_BY_BACKUP_PLAN, BACKUP_PLAN_MIN_FREQUENCY_AND_MIN_RETENTION_CHECK, BACKUP_RECOVERY_POINT_MINIMUM_RETENTION_CHECK, BACKUP_RECOVERY_POINT_MANUAL_DELETION_DISABLED, BACKUP_RESOURCES_PROTECTED_BY_BACKUP_VAULT_LOCK. Together, they encode most of the backup policy’s intent.
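  5. Vault Lock is the “auditor’s dream” switch. Compliance mode makes recovery points immutable until retention expires; even root can’t delete. Its ChangeableForDays grace period is the last chance to adjust or remove the lock. Governance mode is the softer variant that suitably privileged principals can still remove.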
  6. Report templates differ by audience. CONTROL_COMPLIANCE_REPORT is the executive summary; RESOURCE_COMPLIANCE_REPORT is the engineering worklist; BACKUP_JOB_REPORT and RESTORE_JOB_REPORT are the operational history.
  7. Organizations Backup Policy pushes plans to every account. Attached at the OU, the policy is inherited by member accounts with a locally scoped IAM role for the selection. Combined with Audit Manager’s org-level report, one policy produces one consolidated compliance artefact.
  8. Cross-Region copy lives inside the plan. CopyActions declares the destination vault and its retention. The disaster-recovery posture is in the same file as the daily schedule.

Acme’s audit goes from a four-hour manual scramble to a file pulled out of an S3 bucket. The policy is a CloudFormation template reviewed in PRs. New resources auto-enrol by tag. Non-compliant findings are the interesting part of the audit meeting, not the proof of existence. The auditor leaves with a PDF; the compliance team’s next quarter is the same PDF with a different date; the engineering team spends its audit cycles on actual findings, not screenshots.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.