How to Aggregate Logs Across an Organisation

February 15, 2027 · 16 min read

Solutions Architect Pro · SAP-C02 · part of The Exam Room

The situation

Twenty AWS accounts; at least six distinct log sources per account:

  • CloudTrail: API calls, management and data events.
  • VPC Flow Logs: network accept/reject decisions at the ENI level.
  • CloudWatch Logs: application logs (ECS stdout, Lambda logs, etc.).
  • S3 server access logs: per-object operations on buckets with sensitive data.
  • ELB access logs: ALB and NLB request records.
  • GuardDuty findings: security detections.

Today each account owns its own logs. The auditor’s “who accessed payroll” query requires visiting every account, running Athena against its bucket, and collating the results by hand. The SOC’s alerting relies on Config rules that fire per-account, with no cross-account view.

The goal is one pane of glass, one cross-account query, one alerting pipeline, without doubling the storage bill or creating a second source of truth.

What actually matters

The core trade in log aggregation is cost-per-byte in exchange for query latency. Object storage is cheap but higher-latency to query; a managed log service is more expensive per ingest but supports sub-minute alerting; a dedicated search/observability platform is the most expensive but fastest to query with dashboards. Picking one for everything loses the plot; picking per log source is the adult answer.

The first thing to ask is: what’s each log for? Audit-style records: audit, compliance, incident investigation (monthly to yearly retention, ad-hoc queries). Network flow records: forensic network analysis (days to months retention, ad-hoc queries). Application logs: real-time debugging and alerting (hours to weeks in hot storage, months in cold). Each has a different access pattern and a different cost-per-query tolerance.

The second is: where do we centralise? A dedicated logging account in the organisation, owned by the security team, is the standard pattern. No workloads run there; only log destinations and query tooling. The logging account is the blast-radius boundary: a compromise in a workload account can’t delete central logs because the logging account’s IAM policies don’t allow it.

The third is: how do the logs get there? Each source has its own native delivery mechanism, and the costs differ. The choice is between native cross-account features (the source ships straight to the central bucket), subscription-based forwarders (a destination in the logging account, subscriptions in each source account), or roll-your-own. Native is cheaper to operate but only as flexible as the source service allows; subscriptions are flexible but mean more moving parts.

The fourth is: how do we query? Object-store query engines for audit-style ad-hoc queries across months of data. A managed log-search service for near-real-time queries. An indexed search cluster (or a third-party SIEM) for dashboards and long-retention full-text search. Usually two or three of these; each has a role.

The fifth is: how long do we keep each? Regulatory retention (7 years for financial data) wants a cold-archive tier. Operational retention (30-90 days hot, then cold) wants a hot store with class-based tiering. Forensic retention sits between the two: weeks of hot, months of cold, depending on incident-response policies.

What we’ll filter on

  1. Aggregation mechanism: native cross-account, subscription-based, or manual shipping?
  2. Query latency: seconds (hot store), minutes (Athena), or hours (restore from Glacier)?
  3. Cost per GB ingested: S3 Standard/IA/Glacier tiers vs CloudWatch Logs ingest vs OpenSearch ingest.
  4. Retention flexibility: how easily can I keep 90 days hot and 7 years cold?
  5. Alerting fit: is this a store I can alert from in sub-minute time?

The aggregation landscape

  1. CloudTrail Organization Trail. One trail, configured in the management or delegated admin account, writes every member account’s CloudTrail events to a central S3 bucket. The first copy of management events is free; data events are priced per event. The bucket lives in the logging account, and its bucket policy trusts the CloudTrail service principal. Querying is via Athena with a partitioned table; typical queries take seconds to tens of seconds over months of data.

  2. VPC Flow Logs to central S3. Each VPC’s flow log configuration points to the central bucket. Bucket policy must allow the delivery.logs service principal from each account. Custom log format to include all fields. Athena table partitioned by account, region, date. Volumes are large (a busy VPC generates many GB per day), so S3 Standard + IA lifecycle is the norm, with Athena for query.

  3. Central CloudWatch Logs subscription. CloudWatch Logs destination (LogDestination) in the logging account; each member account’s log group has a subscription filter forwarding to the destination. The destination typically forwards to Kinesis Data Streams or Firehose, which writes to S3 and/or OpenSearch. Near-real-time (seconds of latency); priced on ingestion bytes.

  4. AWS Security Hub / AWS Audit Manager. Not log stores, but aggregators of specific finding types. Security Hub pulls GuardDuty, Inspector, Macie, Config findings; Audit Manager pulls compliance evidence. Central view without moving the underlying logs; consumes its own service charges.

  5. Amazon OpenSearch Service (central cluster). Full-text indexed storage with Kibana/OpenSearch Dashboards. Fed from Firehose, Lambda, or direct from CloudWatch Logs. Best for investigators; priced on cluster size, typically the most expensive log destination.

  6. Third-party SIEM (Datadog, Splunk, Sumo Logic). Similar to OpenSearch but operated by the vendor. Better-polished dashboards; the bill comes quarterly.

Side by side

Option | Aggregation | Query latency | Cost per GB | Retention | Alerting
-------|-------------|---------------|-------------|-----------|---------
CloudTrail Org Trail -> S3 | Native org-wide | Seconds-minutes (Athena) | S3 tiers | 7+ years via Glacier | CloudWatch alarms on metric filter
Flow Logs -> S3 | Per-VPC | Seconds-minutes (Athena) | S3 tiers | Per policy | Limited
Central CWL subscription | Per-log-group filter | Seconds (Insights) | CWL ingest + destination | CWL class | ✓ (metric filters + alarms)
Security Hub | Native | Seconds (finding queries) | Per finding | 90 days (Hub’s own) | ✓ (EventBridge)
Central OpenSearch | Firehose / direct | Seconds (dashboards) | Cluster + storage | ISM per index | ✓ (alerting monitors)
Third-party SIEM | Vendor collectors | Seconds (dashboards) | Vendor | Vendor | Vendor

Reading by log source:

  • CloudTrail: Organization Trail -> central S3. Athena for audit queries, CloudWatch metric filter + alarm for critical events (root login, IAM policy changes).
  • VPC Flow Logs: per-VPC configuration to central S3. Athena for forensics; near-real-time not required.
  • Application logs: CloudWatch Logs in each account, subscription filter to central Firehose -> S3 (cold) + OpenSearch (hot 30 days).
  • GuardDuty findings: Security Hub aggregation to the logging account, EventBridge fan-out for alerting.
  • S3 access logs: S3’s native log destination to central bucket. Rarely queried; keep cheap.

The aggregation architecture

Source accounts (3 of 20 shown)

  acct: payments-prod
    CloudTrail (via org trail)
    VPC Flow Logs -> central bucket
    CWL subscription filter on /aws/* -> central Firehose destination
    GuardDuty detector
    S3 access logs -> central bucket

  acct: ledger-prod
    CloudTrail (via org trail)
    VPC Flow Logs -> central bucket
    CWL subscription filter
    GuardDuty detector

  acct: customer-prod
    same as above

  ... all 20 production accounts onboarded via StackSet

Central logging account

  S3: org-cloudtrail (central bucket)
    partitioned by account / region / date
    Glacier IR at 30d, Deep Archive at 365d, 7-year retention
    bucket policy restricts DELETE

  S3: org-flow-logs
    Parquet format via Firehose
    Athena workgroup: SecurityAnalysts

  Kinesis Data Firehose
    destination for CWL subs
    Lambda transform (JSON -> Parquet)
    -> S3 (cold) + OpenSearch (hot)
    buffer: 5MB / 60s

  OpenSearch cluster
    hot tier: 30 days
    UltraWarm + S3 cold
    Dashboards for SOC
    index per log type per day

  Security Hub (delegated admin)
    aggregates GuardDuty, Config, Inspector
    org-wide via designated admin
    EventBridge -> SNS -> PagerDuty

  Athena + Glue Catalog
    workgroups: AuditReadOnly, SecurityAnalysts
    partitioned tables for Trail, Flow Logs
    query cost: per-TB-scanned

  IAM: logging account policies deny delete from any source-account principal; SCP locks the bucket

Each log type has its own pipeline. CloudTrail and Flow Logs write direct to S3; CWL streams through Firehose to hot (OpenSearch) and cold (S3); GuardDuty flows through Security Hub. Athena queries the S3 tiers.

The picks in depth

CloudTrail Organization Trail. One trail in the management account (or delegated to the logging account) with IsOrganizationTrail: true. It writes to s3://org-cloudtrail/AWSLogs/<org-id>/<account-id>/CloudTrail/<region>/... (an organization trail places the organization ID ahead of each account ID). Management events are free; data events are configured only for the buckets and tables that matter (s3:GetObject on records-primary, DynamoDB reads on payroll_people, etc.).

Bucket policy allows the CloudTrail service principal to write; the bucket is in the logging account so no source account can delete its own trail records.

Lifecycle: S3 Standard for 30 days (hot for audit), S3 Glacier Instant Retrieval at 30 days, Glacier Deep Archive at 365 days, expiry at 2555 days (7 years) for compliance.
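The lifecycle above can be written down as a rule document. The following is a minimal sketch, assuming the dictionary shape that boto3's put_bucket_lifecycle_configuration accepts; the function name, the rule ID, and the AWSLogs/ prefix filter are illustrative assumptions, not taken from the original setup.

```python
import json

RETENTION_YEARS = 7  # regulatory retention stated in the text


def cloudtrail_lifecycle(ir_after_days=30, deep_archive_after_days=365,
                         retention_years=RETENTION_YEARS):
    """Build a lifecycle configuration for the central CloudTrail bucket."""
    expiry_days = retention_years * 365  # 7 * 365 = 2555, leap days ignored
    if not ir_after_days < deep_archive_after_days < expiry_days:
        raise ValueError("transitions must be ordered and precede expiry")
    return {
        "Rules": [{
            "ID": "cloudtrail-retention",      # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "AWSLogs/"},  # assumed prefix
            "Transitions": [
                {"Days": ir_after_days, "StorageClass": "GLACIER_IR"},
                {"Days": deep_archive_after_days, "StorageClass": "DEEP_ARCHIVE"},
            ],
            "Expiration": {"Days": expiry_days},
        }]
    }


config = cloudtrail_lifecycle()
print(json.dumps(config, indent=2))
```

Keeping the numbers in one function makes the 7-year figure auditable: the expiry is derived from the retention period rather than typed in as a bare 2555.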

Flow Logs per VPC to the central bucket. Configure each VPC’s Flow Log to write to s3://org-flow-logs/<account>/... in custom format that includes all fields (timestamps, TCP flags, packet/byte counts, interface ID, etc.). Partition the bucket by year/month/day to keep Athena queries cheap.

A Lambda fires on S3 PUT to update the Athena partition if new date prefixes appear (or use partition projection on the Athena table, which skips the Lambda entirely).
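For the partition-projection route, the table properties can be generated rather than hand-written. This is a sketch only: the property keys are Athena's partition projection settings, but the bucket layout (account, then a yyyy/MM/dd date prefix), the column names account and day, and the start date are assumptions made for illustration and would need to match the real prefix structure.

```python
def flow_log_projection(bucket="org-flow-logs",
                        account_ids=("111111111111", "222222222222"),
                        start_date="2027/01/01"):
    """Return Athena partition-projection table properties as a dict."""
    return {
        "projection.enabled": "true",
        # One enum value per source account (placeholder IDs here).
        "projection.account.type": "enum",
        "projection.account.values": ",".join(account_ids),
        # A single date column covering the year/month/day prefix.
        "projection.day.type": "date",
        "projection.day.format": "yyyy/MM/dd",
        "projection.day.range": f"{start_date},NOW",
        "projection.day.interval": "1",
        "projection.day.interval.unit": "DAYS",
        # Assumed layout: s3://<bucket>/<account>/<yyyy>/<MM>/<dd>/
        "storage.location.template": f"s3://{bucket}/${{account}}/${{day}}/",
    }


def render_tblproperties(props):
    """Render the dict as a TBLPROPERTIES clause for a CREATE TABLE statement."""
    body = ",\n  ".join(f"'{k}' = '{v}'" for k, v in props.items())
    return f"TBLPROPERTIES (\n  {body}\n)"


props = flow_log_projection()
print(render_tblproperties(props))
```

With projection enabled, Athena computes partition locations from the template at query time, so no Lambda or catalog update is needed when a new date prefix appears.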

CloudWatch Logs central subscription. A LogDestination in the logging account, targeted at a Kinesis Data Firehose delivery stream:

{
  "destinationName": "central-logs",
  "targetArn": "arn:aws:firehose:eu-west-1:LOGGING:deliverystream/central-log-ingest",
  "roleArn": "arn:aws:iam::LOGGING:role/CWLtoKinesisFirehoseRole",
  "accessPolicy": "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Principal\":{\"AWS\":[\"arn:aws:iam::SOURCE1:root\",\"arn:aws:iam::SOURCE2:root\"]},\"Action\":\"logs:PutSubscriptionFilter\",\"Resource\":\"arn:aws:logs:eu-west-1:LOGGING:destination:central-logs\"}]}"
}

Source accounts put subscription filters on their log groups targeting this destination. A CloudFormation StackSet rolls the subscription configuration out to every account.
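With twenty accounts, the escaped accessPolicy string is easier to generate than to edit by hand. A minimal sketch, assuming the same policy shape as the JSON above; the function name and the placeholder account IDs are hypothetical.

```python
import json
import re


def destination_access_policy(source_account_ids, region, logging_account_id,
                              destination_name="central-logs"):
    """Build the accessPolicy string for a CloudWatch Logs destination."""
    for account_id in [*source_account_ids, logging_account_id]:
        if not re.fullmatch(r"\d{12}", account_id):
            raise ValueError(f"not a 12-digit account ID: {account_id!r}")
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            # One root principal per source account allowed to subscribe.
            "Principal": {"AWS": [f"arn:aws:iam::{a}:root" for a in source_account_ids]},
            "Action": "logs:PutSubscriptionFilter",
            "Resource": (f"arn:aws:logs:{region}:{logging_account_id}"
                         f":destination:{destination_name}"),
        }],
    }
    # Compact separators keep the embedded string short.
    return json.dumps(policy, separators=(",", ":"))


policy_json = destination_access_policy(
    ["111111111111", "222222222222"], "eu-west-1", "999999999999")
print(policy_json)
```

The returned string is what goes into the accessPolicy field; json.dumps on the enclosing document takes care of the quote escaping.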

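Firehose feeds both stores: S3 (as Parquet, for cheap storage and Athena queries) and OpenSearch (for dashboards). A Lambda transform unpacks CloudWatch Logs’ compressed JSON envelope into flat records with a cleaner schema, which Firehose’s record format conversion can then write to S3 as Parquet.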
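The Lambda transform step can be sketched as follows. This covers only the unpacking of CloudWatch Logs batches into newline-delimited JSON, and assumes any Parquet writing happens downstream. The output field names (account, log_group, and so on) and the fake_record helper are illustrative assumptions; the event and response shapes follow the Firehose data-transformation contract (recordId, result, data).

```python
import base64
import gzip
import json


def handler(event, context=None):
    """Firehose transformation: unpack CloudWatch Logs batches into JSON lines."""
    output = []
    for record in event["records"]:
        # CloudWatch Logs delivers each record as base64-encoded, gzipped JSON.
        payload = json.loads(gzip.decompress(base64.b64decode(record["data"])))

        # CONTROL_MESSAGE records are connectivity checks, not log data.
        if payload.get("messageType") != "DATA_MESSAGE":
            output.append({"recordId": record["recordId"], "result": "Dropped"})
            continue

        lines = [
            json.dumps({
                "account": payload["owner"],
                "log_group": payload["logGroup"],
                "log_stream": payload["logStream"],
                "timestamp": e["timestamp"],
                "message": e["message"],
            })
            for e in payload["logEvents"]
        ]
        data = ("\n".join(lines) + "\n").encode("utf-8")
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(data).decode("ascii"),
        })
    return {"records": output}


def fake_record(record_id, payload):
    """Helper for local testing: encode a payload the way CloudWatch Logs would."""
    blob = gzip.compress(json.dumps(payload).encode("utf-8"))
    return {"recordId": record_id, "data": base64.b64encode(blob).decode("ascii")}
```

Carrying the owner account into every row is what makes the central tables queryable by account later; without it the source account is lost once batches from many accounts share one stream.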

OpenSearch for the hot tier. A single cluster in the logging account, one index pattern per log source. ISM policies roll indices to UltraWarm after 30 days (cheaper, slower) and S3-backed cold storage at 90 days. Dashboards in OpenSearch Dashboards provide the SOC’s view.

Alerting is configured per-index-pattern with OpenSearch alerting monitors; triggers fire on thresholds (e.g., more than 10 failed sudo attempts from one host in 5 minutes) and send to an SNS topic that fans out to PagerDuty.
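The threshold in that example is a sliding-window count per host. The sketch below illustrates the logic only; it is not an OpenSearch monitor definition, and the function name and the (timestamp, host) input shape are assumptions made for the example.

```python
from collections import defaultdict, deque


def hosts_over_threshold(events, threshold=10, window_seconds=300):
    """Return hosts with more than `threshold` events inside any window.

    `events` is an iterable of (epoch_seconds, host) pairs sorted by time,
    one pair per failed sudo attempt.
    """
    recent = defaultdict(deque)  # host -> timestamps still inside the window
    flagged = set()
    for ts, host in events:
        window = recent[host]
        window.append(ts)
        # Drop attempts that have aged out of the 5-minute window.
        while window and window[0] <= ts - window_seconds:
            window.popleft()
        if len(window) > threshold:
            flagged.add(host)
    return flagged


burst = [(t, "web-1") for t in range(0, 110, 10)]  # 11 attempts in 100 seconds
slow = [(t, "web-2") for t in range(0, 660, 60)]   # 11 attempts over 10 minutes
print(hosts_over_threshold(sorted(burst + slow)))
```

The second host makes the same number of attempts but never more than a handful inside one window, which is the distinction a per-window trigger is meant to draw.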

Security Hub as the aggregator for findings. The logging account is delegated as the Security Hub administrator. GuardDuty’s organisation-wide configuration sends all findings here. Same for Inspector, Macie, Config. Security Hub’s native UI shows a cross-account view; EventBridge rules on aws.securityhub fire on severity >= HIGH and page the on-call.

Athena workgroups. Two workgroups in the logging account: AuditReadOnly for auditors (read-only access to CloudTrail tables), SecurityAnalysts for the SOC (access to all log tables including Flow Logs). Workgroup settings pin the query result location to a specific prefix, cap query bytes scanned per query (a $-bounded setting to prevent runaway queries), and set a default engine version.
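The per-query cap is set in bytes, so a dollar ceiling has to be converted. A rough sketch, assuming a price of $5 per TB scanned (an assumption; check the current price for your region) and treating 1 TB as 10^12 bytes for simplicity. The keys mirror the Configuration structure of Athena's CreateWorkGroup API; the result bucket name and engine version string are placeholders.

```python
BYTES_PER_TB = 10**12           # simplification, see lead-in
ASSUMED_USD_PER_TB = 5.0        # assumption: verify against current pricing
MIN_CUTOFF_BYTES = 10_000_000   # Athena's documented minimum cutoff (10 MB)


def workgroup_configuration(max_usd_per_query, output_location,
                            usd_per_tb=ASSUMED_USD_PER_TB):
    """Translate a per-query dollar ceiling into workgroup settings."""
    cutoff = round(max_usd_per_query * BYTES_PER_TB / usd_per_tb)
    return {
        "ResultConfiguration": {"OutputLocation": output_location},
        # Stop clients from overriding the result location or the cap.
        "EnforceWorkGroupConfiguration": True,
        "BytesScannedCutoffPerQuery": max(cutoff, MIN_CUTOFF_BYTES),
        "EngineVersion": {"SelectedEngineVersion": "Athena engine version 3"},
    }


audit_config = workgroup_configuration(
    max_usd_per_query=1.0,
    output_location="s3://example-athena-results/audit/",  # hypothetical bucket
)
print(audit_config["BytesScannedCutoffPerQuery"])
```

Under these assumptions a one-dollar ceiling allows 200 GB per query: generous for a partitioned CloudTrail lookup, but well short of an accidental full-bucket scan.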

Partitioning is the single biggest cost lever: a partitioned table scanned by date + account + region reads MBs instead of TBs for a typical query.

Governance on the logging account. SCP on the organisation:

  • Deny s3:DeleteObject on arn:aws:s3:::org-cloudtrail/* from anywhere except the LogRetentionAdmin role in the logging account.
  • Deny s3:PutBucketPolicy on those buckets from anywhere except the platform-bootstrap role.
  • Deny cloudtrail:DeleteTrail and cloudtrail:StopLogging from any principal in any member account.

This closes the loophole where a compromised source-account principal could disable its own logging. The central store is the source of truth; the source account can’t silence it.
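The three rules can be sketched as one SCP document. Treat this as a starting point under stated assumptions: the Sid values are invented, the account holding the platform-bootstrap role is not specified in the text (so the sketch matches that role name in any account), and the exemptions use an aws:PrincipalArn condition.

```python
import json


def logging_scp(logging_account_id, bucket="org-cloudtrail",
                retention_role="LogRetentionAdmin",
                bootstrap_role="platform-bootstrap"):
    """Build an SCP that protects the central log bucket and the trail."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ProtectCentralLogObjects",  # hypothetical Sid
                "Effect": "Deny",
                "Action": "s3:DeleteObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {"ArnNotLike": {"aws:PrincipalArn":
                    f"arn:aws:iam::{logging_account_id}:role/{retention_role}"}},
            },
            {
                "Sid": "ProtectBucketPolicy",
                "Effect": "Deny",
                "Action": "s3:PutBucketPolicy",
                "Resource": f"arn:aws:s3:::{bucket}",
                # Assumption: the bootstrap role may live in any account.
                "Condition": {"ArnNotLike": {"aws:PrincipalArn":
                    f"arn:aws:iam::*:role/{bootstrap_role}"}},
            },
            {
                "Sid": "ProtectTrail",
                "Effect": "Deny",
                "Action": ["cloudtrail:DeleteTrail", "cloudtrail:StopLogging"],
                "Resource": "*",
            },
        ],
    }


scp = logging_scp("999999999999")
print(json.dumps(scp, indent=2))
```

SCPs constrain principals in member accounts only; they do not apply to the management account, so an organization trail created there still needs its own access controls.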

A worked audit query

Auditor question: “Who accessed secrets/payroll/production-db-password in the last 90 days?”

  1. Open Athena in the logging account, workgroup AuditReadOnly.
  2. Query:
    SELECT eventtime, useridentity.arn, eventsource, eventname, sourceipaddress
    FROM cloudtrail_logs
    WHERE eventname = 'GetSecretValue'
      AND requestparameters LIKE '%payroll/production-db-password%'
      AND year = '2028' AND month BETWEEN '01' AND '03'
    ORDER BY eventtime DESC;
    
  3. Athena scans only the partitions in that date range across all source accounts (partitioning by year/month/day + account is what makes this efficient).
  4. Results appear in under a minute. One table, cross-account, auditable.
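A fixed month range such as BETWEEN '01' AND '03' only works when the 90-day window sits inside one calendar year. The helper below is a sketch of deriving the partition predicate from a trailing window instead; the function names are hypothetical, and it assumes the same string-typed year and month partition columns used in the query above.

```python
from datetime import date, timedelta


def month_partitions(end, days=90):
    """List (year, month) string pairs touched by the trailing window."""
    start = end - timedelta(days=days)
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append((f"{year:04d}", f"{month:02d}"))
        # Advance one calendar month, rolling over at December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


def partition_predicate(end, days=90):
    """Render the pairs as a WHERE fragment that prunes partitions."""
    clauses = [f"(year = '{y}' AND month = '{m}')"
               for y, m in month_partitions(end, days)]
    return "(" + " OR ".join(clauses) + ")"


print(partition_predicate(date(2028, 2, 15)))
```

Month granularity over-scans slightly at both ends of the window; an additional filter on eventtime trims the rows to the exact 90 days without giving up partition pruning.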

Without the organisation trail and the central bucket, this would be 20 queries, 20 result sets, manual collation.

A worked SOC alert

GuardDuty in payments-prod fires a finding: UnauthorizedAccess:IAMUser/RootCredentialUsage.

  1. Finding syncs to Security Hub in the logging account within a minute.
  2. EventBridge rule on aws.securityhub, detail-type Security Hub Findings - Imported, severity HIGH or CRITICAL, matches.
  3. Rule target: SNS topic security-alerts.
  4. SNS fans out: PagerDuty integration for immediate paging; Slack webhook for channel notification; Lambda for automated response (disable the root credential if possible, revoke sessions).
  5. On-call sees the PagerDuty alert within two minutes of the finding.
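The rule in step 2 can be written as an event pattern. The pattern below uses the source, detail-type, and severity label described above; the matches function is a deliberately tiny stand-in for EventBridge's matcher (exact values, nested objects, and arrays only), included so the pattern can be exercised locally. It is an illustration, not EventBridge's actual algorithm.

```python
PATTERN = {
    "source": ["aws.securityhub"],
    "detail-type": ["Security Hub Findings - Imported"],
    "detail": {"findings": {"Severity": {"Label": ["HIGH", "CRITICAL"]}}},
}


def matches(pattern, event):
    """Simplified matcher: every pattern key must match the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        candidates = actual if isinstance(actual, list) else [actual]
        if isinstance(expected, dict):
            # Nested object: at least one candidate must match recursively.
            if not any(isinstance(c, dict) and matches(expected, c)
                       for c in candidates):
                return False
        elif not any(c in expected for c in candidates):
            # Leaf: the pattern lists the allowed exact values.
            return False
    return True


def finding_event(label):
    """Build a minimal Security Hub event carrying one finding (test helper)."""
    return {
        "source": "aws.securityhub",
        "detail-type": "Security Hub Findings - Imported",
        "detail": {"findings": [{"Severity": {"Label": label}}]},
    }


print(matches(PATTERN, finding_event("HIGH")))
```

Filtering on the severity label in the rule, rather than in a downstream Lambda, means low-severity findings never reach the SNS topic at all.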

Without central aggregation, each account’s GuardDuty would need its own EventBridge rule + SNS topic + IR response; the fan-out would be 20 identical stacks to maintain.

What’s worth remembering

  1. Pick aggregation per log source. CloudTrail has a native org trail; Flow Logs configure per-VPC; CWL uses subscription destinations; Security Hub aggregates findings. One tool for all logs forces compromises.
  2. A dedicated logging account is the blast-radius boundary. No workloads there; only log destinations, query tooling, and the SOC’s day-to-day. SCPs and bucket policies prevent source accounts from deleting their own logs.
  3. S3 + Athena for cold and large. Flow Logs, CloudTrail, S3 access logs: all large, all rarely queried. Partitioned S3 data queried through Athena is the cheapest path.
  4. CloudWatch Logs + OpenSearch for hot and fresh. Application logs, near-real-time alerting, investigator dashboards. Stream CWL through Firehose to both destinations; pay for the hot tier at the right retention.
  5. Security Hub for findings, not raw logs. It aggregates GuardDuty, Inspector, Macie, Config. Don’t try to make it your general log store; it isn’t one.
  6. Partitioning is the single biggest cost lever. Queries that scan partitions read MBs; queries that scan the whole bucket read TBs. Design tables with date + account + region partitions from day one.
  7. Lifecycle policies are mandatory. Without them, 40TB a month of flow logs grows unbounded. Standard -> IA -> Glacier -> Deep Archive, with expiry matching regulatory retention.
  8. SCPs protect the aggregation pipeline. Deny CloudTrail disabling, bucket deletion, log-group removal across the organisation. The pipeline must be tamper-evident.

Every log in one place, organised by what each log is for: audit queries against cold S3, SOC dashboards against hot OpenSearch, alerting against Security Hub. The auditor’s query works, the SOC’s pager works, and nothing is paid for twice.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.