How to Add Sidecars to an ECS Task

July 19, 2027 · 25 min read

Developer · DVA-C02 · part of The Exam Room

The situation

A payments team runs a Node.js service on ECS Fargate. Today the task definition has one container: the app. The application team wants to move three concerns out of the app image and into sidecars:

  1. A FireLens log router that forwards structured logs from stdout to Datadog without the app having to know anything about Datadog.
  2. An Envoy proxy (App Mesh sidecar) handling outbound calls, giving the mesh mTLS and observability without every service implementing its own.
  3. A parameters-and-secrets fetcher that runs at task startup, reads several Parameter Store values, and writes a .env file to a shared volume that the app reads on its own startup.

The team wants a single task definition with four containers: app, log router, Envoy, secrets-init. What’s in play is how to sequence them, how to share files and loopback between them, and how to make sure the task stops cleanly when the app stops.

What actually matters

Before reaching for the task-definition reference, it’s worth thinking about what a well-behaved multi-container task actually needs to guarantee.

The first thing is locality. Sidecars only earn their name if they share fate with the app: same network namespace, same node, same lifecycle. If the Envoy proxy lives on a different host from the app it's supposed to be proxying for, the mTLS story collapses into "another hop over the network." The question isn't "can these run together" but "what's the smallest unit of co-scheduling we can express, and does ECS give us a primitive for it?"

The second is start-order determinism. A bag of containers that all start simultaneously is fine for independent processes, but these three sidecars have real dependencies. The app shouldn’t start until secrets-init has dropped its .env on disk; it shouldn’t start issuing outbound requests until Envoy is actually listening, not just launched. The wiring has to distinguish between “started” and “ready”, because the second is what the app actually depends on.

The third is stop behaviour and blast radius. When the app exits, the whole task should stop, otherwise the mesh and logger sidecars become a drifting resource leak, holding task slots for an app that isn’t there. Conversely, an init container that exits successfully shouldn’t drag the task down with it; that’s its job. The primitive that controls which containers are task-defining and which are transient matters as much as the one that controls start order.

The fourth is filesystem and loopback sharing. Secrets-init writes; app reads. The app dials 127.0.0.1 and expects Envoy on the other end. Both work on ECS only if the primitives for shared volumes and shared network namespace exist and are trivial to declare. If either one required EFS or service discovery, the pattern wouldn’t feel like a sidecar pattern at all.

The fifth is observability ownership. The app's logs need to leave the app's container without the app caring where they go, so the log-router sidecar should take the stream. The router's own logs, meanwhile, need to go somewhere else: a sidecar can't be its own recipient, so the router needs a separate log driver of its own.

The sixth, quieter, is IAM scope. The task gets a role, not per-container identities. Sidecars add API calls (parameter reads, mesh control plane, tracing) that the app itself would never need; the task role becomes the union of every container’s AWS needs. Keeping that union honest is harder than writing one container’s IAM policy.

What we’ll filter on

Distilling that exploration into filters we can score each primitive against:

  1. Controls start order. Does the primitive decide when a container can start, not just whether it’s in the task?
  2. Controls stop behaviour. Does it decide whether a container’s exit stops the task, or is transient?
  3. Shares filesystem. Does it let two containers see the same bytes?
  4. Shares loopback/network. Does it put containers in the same network namespace?
  5. Routes logs. Does it decide where container stdout ends up?
  6. Declares IAM. Does it bind AWS permissions to containers?

The sidecar-primitive landscape

Six fields on the task definition hold the pattern together.

  1. containerDefinitions. A list of container configurations. Each container has an image, name, resource reservations, environment variables, port mappings, mount points, log configuration, and a handful of other fields. All containers in a task share the task's network namespace (in awsvpc mode, the only network mode Fargate supports) and can access shared volumes. This is the "same pod" primitive: locality, loopback, and shared lifecycle come from being listed under the same task definition.

  2. essential. Per container. When essential: true, the task stops if that container stops. The app is essential; most sidecars are also essential for observability reasons (a task running without its log router is a task running blind); the secrets-init is not essential because it’s meant to exit after its work. This is the primitive that separates init containers (run once, then go away) from long-running sidecars (must live as long as the task).

  3. dependsOn. An array of conditions per container: { containerName: "secrets-init", condition: "SUCCESS" } means "start this container only after secrets-init has exited with status 0." Supported conditions: START (started, not necessarily healthy), COMPLETE (exited with any status), SUCCESS (exited with 0), HEALTHY (passes its health check). COMPLETE and SUCCESS can only target non-essential containers, which is one more reason an init container is essential: false. Start order comes from this and nothing else.

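  4. healthCheck. Per container; defines a command, run inside that container, whose exit status determines the container's health. The image must therefore contain whatever tools the command uses (curl, in Envoy's case). When combined with dependsOn: HEALTHY, start ordering can wait for a sidecar to be ready (Envoy listening, for example), not just started. ECS evaluates the HEALTHY condition only at task startup; after that the check keeps running, but it no longer gates anything in dependsOn. Readiness, as distinct from liveness, lives here.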

  5. volumes + mountPoints. Volumes are declared at the task level; containers mount them via mountPoints. Two kinds matter for sidecar patterns. Volumes with an efsVolumeConfiguration provide EFS-backed shared storage. Plain bind-mount volumes with no host path (just a name) provide task-local, ephemeral storage shared between containers. This is the filesystem-sharing primitive: declare once at task level, mount from both producer and consumer.

  6. logConfiguration. Per container. With the awsfirelens log driver, the container's stdout/stderr is routed through the FireLens container. The router itself, and any container not routed through FireLens, typically uses awslogs, which ships logs to CloudWatch. The router and the routed need different drivers, because FireLens can't ship its own logs through itself.
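To make the start-order semantics of dependsOn concrete, here is a minimal sketch that derives one valid start order from a task definition's dependsOn entries using a topological sort (Kahn's algorithm). It models ordering only, not timing or condition evaluation, and ECS itself starts independent containers concurrently. The dict shape mirrors the JSON fields; resolve_start_order is a hypothetical helper, not part of any AWS SDK.

```python
from collections import deque


def resolve_start_order(containers: list[dict]) -> list[str]:
    """Return container names in an order that respects every dependsOn edge."""
    names = [c["name"] for c in containers]
    waiting_on = {n: 0 for n in names}                   # unmet dependencies per container
    unblocks: dict[str, list[str]] = {n: [] for n in names}  # dependency -> dependents
    for c in containers:
        for dep in c.get("dependsOn", []):
            target = dep["containerName"]
            if target not in waiting_on:
                raise ValueError(f"{c['name']} depends on unknown container {target}")
            waiting_on[c["name"]] += 1
            unblocks[target].append(c["name"])

    # Containers with no dependsOn can start immediately, in declaration order.
    ready = deque(n for n in names if waiting_on[n] == 0)
    order: list[str] = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for dependent in unblocks[current]:
            waiting_on[dependent] -= 1
            if waiting_on[dependent] == 0:
                ready.append(dependent)

    if len(order) != len(names):
        raise ValueError("dependsOn contains a cycle")
    return order


receipts_task = [
    {"name": "secrets-init"},
    {"name": "envoy"},
    {"name": "app", "dependsOn": [
        {"containerName": "secrets-init", "condition": "SUCCESS"},
        {"containerName": "envoy", "condition": "HEALTHY"},
    ]},
    {"name": "log-router"},
]
print(resolve_start_order(receipts_task))
# ['secrets-init', 'envoy', 'log-router', 'app']
```

The sort makes the same point as the list above: the app lands last because it is the only container with incoming edges. Everything else is free to start at t=0.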

Side by side

Primitive                         | Start order                 | Stop behaviour | Shares FS | Shares loopback | Routes logs | Declares IAM
containerDefinitions              |                             |                |           | ✓ (task-wide)   |             |
essential                         |                             | ✓              |           |                 |             |
dependsOn                         | ✓                           |                |           |                 |             |
healthCheck + dependsOn: HEALTHY  | ✓ (ready, not just started) |                |           |                 |             |
volumes + mountPoints             |                             |                | ✓         |                 |             |
logConfiguration (awsfirelens)    |                             |                |           |                 | ✓           |
taskRoleArn (task-level)          |                             |                |           |                 |             | ✓ (shared across containers)

Reading the table by concern rather than by primitive:

  • Start order is dependsOn with the right condition: SUCCESS for init containers, HEALTHY for long-running sidecars the app depends on.
  • Stop behaviour is essential, set deliberately per container. The init container is the one that gets false; everything else gets true.
  • Filesystem handoff is one task-level volume with no host path and two mountPoints. No EFS unless the data needs to outlive the task.
  • Network handoff is free: all containers in awsvpc share the task ENI, so 127.0.0.1 works across containers without configuration.
  • Log routing is awsfirelens on the source container and firelensConfiguration on the router; the router itself logs via awslogs.
  • IAM is one taskRoleArn that must be the union of every container's needs: the place where least-privilege discipline is most easily lost.
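The essential and dependsOn rules are easy to check mechanically before registering a task definition. Below is a minimal lint sketch. It is hypothetical, not an AWS tool. It encodes four rules that ECS documents:

  • an omitted essential field means true;
  • every task needs at least one essential container;
  • COMPLETE and SUCCESS can't target an essential container;
  • HEALTHY requires the target to declare a healthCheck.

```python
def lint_sidecar_task(containers: list[dict]) -> list[str]:
    """Return human-readable problems with essential/dependsOn/healthCheck wiring."""
    problems: list[str] = []
    by_name = {c["name"]: c for c in containers}
    # ECS treats a container as essential when the field is omitted.
    essential = {c["name"]: c.get("essential", True) for c in containers}

    if not any(essential.values()):
        problems.append("task has no essential container")

    for c in containers:
        for dep in c.get("dependsOn", []):
            target = dep["containerName"]
            if target not in by_name:
                problems.append(f"{c['name']}: depends on unknown container {target}")
                continue
            cond = dep["condition"]
            # COMPLETE and SUCCESS are only allowed on non-essential targets.
            if cond in ("COMPLETE", "SUCCESS") and essential[target]:
                problems.append(f"{target}: {cond} target must be essential: false")
            # HEALTHY needs a health check on the target to evaluate.
            if cond == "HEALTHY" and "healthCheck" not in by_name[target]:
                problems.append(f"{target}: HEALTHY target has no healthCheck")
    return problems


task = [
    {"name": "secrets-init", "essential": False},
    {"name": "envoy", "essential": True,
     "healthCheck": {"command": ["CMD-SHELL", "curl -s http://localhost:9901/ready | grep -q LIVE || exit 1"]}},
    {"name": "log-router", "essential": True},
    {"name": "app", "essential": True, "dependsOn": [
        {"containerName": "secrets-init", "condition": "SUCCESS"},
        {"containerName": "envoy", "condition": "HEALTHY"},
    ]},
]
print(lint_sidecar_task(task))  # []

# Forgetting essential: false and the health check trips both rules.
broken = [{"name": "secrets-init"}, {"name": "envoy"}, task[2], task[3]]
print(lint_sidecar_task(broken))
```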

The picks, wired up

[Figure: the four containers (secrets-init, envoy, log-router, app) side by side, showing each one's job, essential flag, dependsOn conditions, mounts, log driver, and when it becomes live.]
Four containers, each passing through the same three gates (essential, dependsOn, volume/log driver) and each ending up with a distinct job in the task.

The four-container task, in depth

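secrets-init (init container). Image: a small image with the AWS CLI and a POSIX shell. Purpose: read Parameter Store values at startup and write them to a shared volume as a .env file. essential: false because it's expected to exit. It has no dependsOn, so it starts as soon as the task does.

The script does three things that matter for correctness:

  • It captures the CLI output before piping it to awk. If the SSM call fails, the container therefore exits non-zero; otherwise awk would exit 0 on empty input and report a false SUCCESS.
  • awk splits on tabs, so values containing spaces survive intact.
  • awk skips blank lines.

{
  "name": "secrets-init",
  "image": "acme/secrets-init:1.0",
  "essential": false,
  "command": [
    "sh", "-c",
    "set -e; out=$(aws ssm get-parameters-by-path --path /payments/prod/ --with-decryption --query 'Parameters[*].[Name,Value]' --output text); printf '%s\\n' \"$out\" | awk -F '\\t' 'NF {n=$1; sub(/.*\\//, \"\", n); print n \"=\" $2}' > /shared/.env"
  ],
  "mountPoints": [{ "sourceVolume": "shared", "containerPath": "/shared" }]
}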
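The awk step is the fiddliest part of secrets-init. As a sanity check on what it should produce, here is the same transformation in Python. It assumes the CLI's text output is one tab-separated Name/Value pair per line, which is what --output text gives for a [Name,Value] query. Splitting on the tab rather than on any whitespace matters, because a value containing spaces would otherwise be truncated. ssm_text_to_env and the sample parameter names are hypothetical.

```python
def ssm_text_to_env(cli_output: str) -> str:
    """Turn `Name<TAB>Value` lines into KEY=value lines for a .env file."""
    lines = []
    for row in cli_output.splitlines():
        if not row.strip():
            continue  # skip blank lines, like awk's NF guard
        name, value = row.split("\t", 1)       # split on the first tab only
        key = name.rsplit("/", 1)[-1]          # /payments/prod/DB_HOST -> DB_HOST
        lines.append(f"{key}={value}")
    return "".join(f"{line}\n" for line in lines)


sample = "/payments/prod/DB_HOST\tdb.internal\n/payments/prod/GREETING\thello world\n"
print(ssm_text_to_env(sample), end="")
# DB_HOST=db.internal
# GREETING=hello world
```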

envoy (sidecar). App Mesh’s Envoy image. Essential. Has a health check on the admin port so dependsOn: HEALTHY on the app can wait for readiness, not just startup.

{
  "name": "envoy",
  "image": "public.ecr.aws/appmesh/aws-appmesh-envoy:v1.29.x",
  "essential": true,
  "healthCheck": {
    "command": ["CMD-SHELL", "curl -s http://localhost:9901/ready | grep -q LIVE || exit 1"],
    "interval": 5, "timeout": 2, "retries": 3, "startPeriod": 10
  },
  "portMappings": [{ "containerPort": 15000, "protocol": "tcp" }],
  "environment": [
    { "name": "APPMESH_VIRTUAL_NODE_NAME", "value": "mesh/payments/virtualNode/receipts-vn" },
    { "name": "ENVOY_LOG_LEVEL", "value": "info" }
  ]
}
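dependsOn: HEALTHY unblocks on the first "healthy" status, so it helps to see how interval, retries and startPeriod interact. The sketch below is a simplified model of Docker-style health-check transitions, fed with Envoy's settings (interval 5, retries 3, startPeriod 10). Its assumptions:

  • the first probe runs one interval after the container starts;
  • a timed-out probe is recorded as a failure;
  • boundary handling at the end of the start period is a simplification.

health_timeline is a hypothetical helper.

```python
def health_timeline(results: list[bool], interval: int, retries: int,
                    start_period: int) -> list[tuple[int, str]]:
    """Return (time, status) transitions for a simplified health-check model.

    results[i] is the outcome of the probe run at t = (i + 1) * interval
    seconds after the container starts; a timed-out probe counts as False.
    """
    status = "starting"
    consecutive_failures = 0
    passed_once = False
    transitions: list[tuple[int, str]] = []
    for i, ok in enumerate(results):
        t = (i + 1) * interval
        if ok:
            passed_once = True
            consecutive_failures = 0
            new_status = "healthy"
        else:
            # Failures inside the start period are ignored until a probe passes.
            if t <= start_period and not passed_once:
                continue
            consecutive_failures += 1
            new_status = "unhealthy" if consecutive_failures >= retries else status
        if new_status != status:
            status = new_status
            transitions.append((t, status))
    return transitions


# Envoy's settings: interval 5, retries 3, startPeriod 10.
print(health_timeline([False, True, True], interval=5, retries=3, start_period=10))
# [(10, 'healthy')]
print(health_timeline([False] * 6, interval=5, retries=3, start_period=10))
# [(25, 'unhealthy')]
```

In the first run the app would unblock at t=10. In the second, Envoy never becomes healthy, so the app never starts.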

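app (main container). The Node.js service. Starts after secrets-init has exited successfully and envoy is healthy. Reads /shared/.env on startup. Makes outbound calls to other mesh services as usual. With App Mesh, the task's proxyConfiguration redirects that traffic to Envoy's egress listener, so the app needs no proxy-specific code. In AWS's App Mesh examples the egress port is 15001 and 15000 is the ingress port.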

{
  "name": "app",
  "image": "acme/receipts:prod-2f1a8c",
  "essential": true,
  "dependsOn": [
    { "containerName": "secrets-init", "condition": "SUCCESS" },
    { "containerName": "envoy",        "condition": "HEALTHY" }
  ],
  "mountPoints": [{ "sourceVolume": "shared", "containerPath": "/shared", "readOnly": true }],
  "portMappings": [{ "containerPort": 3000, "protocol": "tcp" }],
  "logConfiguration": {
    "logDriver": "awsfirelens",
    "options": {
      "Name": "datadog",
      "Host": "http-intake.logs.datadoghq.com",
      "TLS": "on",
      "dd_service": "receipts",
      "dd_source": "nodejs"
    },
    "secretOptions": [
      { "name": "apikey", "valueFrom": "arn:aws:secretsmanager:...:secret:datadog/api-key-abcdef" }
    ]
  }
}
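FireLens turns the awsfirelens options into a Fluent Bit output section run by the log-router container. The sketch below renders the app's options in that shape to show the mapping from JSON keys to Fluent Bit output-plugin parameters. The Match pattern assumes FireLens tags this container's records as "<container name>-firelens-<task ID>"; treat it and the exact layout as assumptions rather than the generated file. The apikey from secretOptions is resolved by ECS at task start and is left out here. render_fluentbit_output is hypothetical.

```python
def render_fluentbit_output(container_name: str, options: dict[str, str]) -> str:
    """Illustrative rendering of awsfirelens options as a Fluent Bit [OUTPUT] section."""
    lines = ["[OUTPUT]", f"    Name {options['Name']}"]
    # Assumption: FireLens tags this container's records "<container>-firelens-<task id>".
    lines.append(f"    Match {container_name}-firelens*")
    for key, value in options.items():
        if key != "Name":
            lines.append(f"    {key} {value}")
    return "\n".join(lines)


app_options = {
    "Name": "datadog",
    "Host": "http-intake.logs.datadoghq.com",
    "TLS": "on",
    "dd_service": "receipts",
    "dd_source": "nodejs",
}
print(render_fluentbit_output("app", app_options))
```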

log-router (FireLens sidecar). The FireLens-enabled Fluent Bit image. Declared with firelensConfiguration: { type: "fluentbit" }. Receives the app’s stdout through the awsfirelens log driver wired to this container. Essential so the task stops cleanly if it dies.

{
  "name": "log-router",
  "image": "public.ecr.aws/aws-observability/aws-for-fluent-bit:stable",
  "essential": true,
  "firelensConfiguration": { "type": "fluentbit", "options": { "enable-ecs-log-metadata": "true" }},
  "logConfiguration": {
    "logDriver": "awslogs",
    "options": {
      "awslogs-group": "/aws/ecs/receipts/log-router",
      "awslogs-region": "eu-west-1",
      "awslogs-stream-prefix": "firelens"
    }
  }
}
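The FireLens wiring has two easy-to-miss invariants. Containers using awsfirelens need a router to send to, and the router must not route its own logs through itself. Here is a minimal check over the container dicts. It assumes a single router per task, which is the setup here; check_firelens_wiring is a hypothetical helper.

```python
def check_firelens_wiring(containers: list[dict]) -> list[str]:
    """Check that awsfirelens senders and the FireLens router are wired consistently."""
    problems: list[str] = []
    routers = [c["name"] for c in containers if "firelensConfiguration" in c]
    senders = [c["name"] for c in containers
               if c.get("logConfiguration", {}).get("logDriver") == "awsfirelens"]
    if senders and len(routers) != 1:
        problems.append(f"awsfirelens used by {senders} but {len(routers)} routers declared")
    for name in routers:
        if name in senders:
            problems.append(f"{name}: the router can't route its own logs; give it awslogs")
    return problems


wired = [
    {"name": "app", "logConfiguration": {"logDriver": "awsfirelens"}},
    {"name": "log-router", "firelensConfiguration": {"type": "fluentbit"},
     "logConfiguration": {"logDriver": "awslogs"}},
]
print(check_firelens_wiring(wired))  # []
```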

Volumes. One task-level volume shared between secrets-init and app:

"volumes": [
  { "name": "shared" }
]

No host parameter means it’s a task-local volume; it exists for the life of the task and is cleaned up when the task stops.
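The handoff only works if both mountPoints name a declared volume, and the read/write split is what makes secrets-init the producer and the app the consumer. A minimal sketch that maps each declared volume to its writers and readers follows. It relies on readOnly defaulting to false, so an unmarked mount counts as a writer. check_shared_volumes is a hypothetical helper.

```python
def check_shared_volumes(volumes: list[dict],
                         containers: list[dict]) -> dict[str, dict[str, list[str]]]:
    """Map each declared volume to the containers that write and read it."""
    usage = {v["name"]: {"writers": [], "readers": []} for v in volumes}
    for c in containers:
        for mp in c.get("mountPoints", []):
            vol = mp["sourceVolume"]
            if vol not in usage:
                raise ValueError(f"{c['name']} mounts undeclared volume {vol!r}")
            # readOnly defaults to false, so an unmarked mount is a writer.
            role = "readers" if mp.get("readOnly", False) else "writers"
            usage[vol][role].append(c["name"])
    return usage


containers = [
    {"name": "secrets-init", "mountPoints": [{"sourceVolume": "shared", "containerPath": "/shared"}]},
    {"name": "app", "mountPoints": [{"sourceVolume": "shared", "containerPath": "/shared", "readOnly": True}]},
    {"name": "envoy"},
]
print(check_shared_volumes([{"name": "shared"}], containers))
# {'shared': {'writers': ['secrets-init'], 'readers': ['app']}}
```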

Why essential: false on the init container matters. With essential: true, a container exiting, even with status 0, stops the task. An essential: false container can exit with any status without stopping the task by itself, which is exactly the behaviour a setup-and-exit init container needs.

A failed init still fails the task, but indirectly. If secrets-init exits non-zero, the app's dependsOn: SUCCESS can never be satisfied, the essential app container never starts, and the task stops; the service scheduler then launches a replacement.

Every long-running sidecar, in contrast, should be essential: true. If Envoy or the log router dies mid-task, the task should stop so the service scheduler can replace it. A running app with a dead log router is a task running blind, worse than no task at all.
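The exit rules can be summarised as a small decision function. This is a rough model, not the ECS agent's logic. It assumes the containers that depend on the exited one have not started yet, which holds for an init container in this pattern, and it does not model HEALTHY. outcome_after_exit is hypothetical.

```python
def outcome_after_exit(task: dict[str, dict], exited: str, exit_code: int) -> str:
    """Rough model of what one container exiting does to the task."""
    if task[exited].get("essential", True):
        return "task stops"  # any essential container exiting stops the task
    for name, spec in task.items():
        for dep in spec.get("dependsOn", []):
            if dep["containerName"] != exited:
                continue
            # SUCCESS needs exit code 0; START and COMPLETE are met once it has exited.
            unmet = dep["condition"] == "SUCCESS" and exit_code != 0
            if unmet and spec.get("essential", True):
                return f"task stops ({name} can never start)"
    return "task keeps running"


receipts = {
    "secrets-init": {"essential": False},
    "envoy": {"essential": True},
    "log-router": {"essential": True},
    "app": {"essential": True,
            "dependsOn": [{"containerName": "secrets-init", "condition": "SUCCESS"}]},
}
print(outcome_after_exit(receipts, "secrets-init", 0))  # task keeps running
print(outcome_after_exit(receipts, "secrets-init", 1))  # task stops (app can never start)
print(outcome_after_exit(receipts, "log-router", 0))    # task stops
```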

IAM per task, not per container. ECS doesn't give each container a separate IAM identity; all containers in the task share the task role. The definition has two roles:

  • executionRoleArn is the role ECS itself uses to pull images, fetch secrets referenced by secrets and secretOptions, and ship logs via awslogs. It is small and platform-owned.
  • taskRoleArn is the role containers in the task assume for their own AWS API calls. It holds secrets-init's ssm:GetParametersByPath permission, Envoy's App Mesh permissions, and any app-level grants.

Because all containers share the task role, its policy must be the union of what every container needs. Least privilege across a task means trimming that union honestly, not granting broad privileges because the role is shared anyway.
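One way to keep the union honest is to record each container's needs separately and derive the task role from them. That way every action can be traced back to the container that brought it in. The sketch below does that with plain sets. The action names are real IAM actions, but the per-container mapping is an assumption for illustration: kms:Decrypt, for instance, only matters if the SecureString parameters use a customer-managed key. CONTAINER_NEEDS, task_role_actions and who_needs are hypothetical.

```python
# Hypothetical per-container needs; verify the real set for your containers.
CONTAINER_NEEDS: dict[str, set[str]] = {
    "secrets-init": {"ssm:GetParametersByPath", "kms:Decrypt"},  # kms only for a customer-managed key
    "envoy": {"appmesh:StreamAggregatedResources"},
    "log-router": set(),  # ships to Datadog over HTTPS, no AWS API calls
    "app": set(),
}


def task_role_actions(needs: dict[str, set[str]]) -> list[str]:
    """The task role has to grant the union of every container's actions."""
    union: set[str] = set()
    for actions in needs.values():
        union |= actions
    return sorted(union)


def who_needs(needs: dict[str, set[str]], action: str) -> list[str]:
    """Trace an action back to the containers that brought it into the union."""
    return sorted(name for name, actions in needs.items() if action in actions)


print(task_role_actions(CONTAINER_NEEDS))
# ['appmesh:StreamAggregatedResources', 'kms:Decrypt', 'ssm:GetParametersByPath']
print(who_needs(CONTAINER_NEEDS, "kms:Decrypt"))
# ['secrets-init']
```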

Corner cases worth naming.

  • Container dependencies (dependsOn) on Fargate need Linux platform version 1.3.0 or later; LATEST is the right default.
  • The awsfirelens log driver likewise needs Fargate platform version 1.3.0 or later, or FireLens support on EC2; older platform versions don't support it.
  • App Mesh needs proxyConfiguration on the task definition, naming the Envoy container and the ports whose traffic gets intercepted. Without it, traffic bypasses Envoy and the mesh sees nothing.
  • Task-local volumes live on Fargate's ephemeral storage: 20 GiB by default, configurable from 21 up to 200 GiB on platform version 1.4.0 and later. That is plenty for .env files; size the task up for larger caches.
  • For plain secret fetching that doesn't need composition, prefer the secrets/secretOptions fields over a shell-scripted init container. ECS resolves them at task start using the execution role.
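Since proxyConfiguration is the piece most often left out, here is a sketch that builds it as a dict ready to drop into a task definition. The property names (IgnoredUID, ProxyIngressPort, ProxyEgressPort, AppPorts, EgressIgnoredIPs) follow the APPMESH proxy configuration type. The default UID 1337 and ports 15000/15001 are the values used in AWS's App Mesh examples, where the Envoy container also runs as user 1337. Treat them as defaults to check against your own mesh setup. appmesh_proxy_configuration is hypothetical.

```python
import json


def appmesh_proxy_configuration(app_ports: list[int], container_name: str = "envoy",
                                envoy_uid: int = 1337, ingress_port: int = 15000,
                                egress_port: int = 15001) -> dict:
    """Build the proxyConfiguration block for an App Mesh task definition."""
    properties = {
        # Traffic from processes running as this UID (Envoy itself) is not redirected.
        "IgnoredUID": str(envoy_uid),
        "ProxyIngressPort": str(ingress_port),
        "ProxyEgressPort": str(egress_port),
        # Inbound traffic to these app ports is sent through Envoy first.
        "AppPorts": ",".join(str(p) for p in app_ports),
        # Keep the ECS task-metadata and instance-metadata endpoints off the proxy path.
        "EgressIgnoredIPs": "169.254.170.2,169.254.169.254",
    }
    return {
        "type": "APPMESH",
        "containerName": container_name,
        "properties": [{"name": k, "value": v} for k, v in properties.items()],
    }


print(json.dumps(appmesh_proxy_configuration([3000]), indent=2))
```

All property values are strings, even the ports, because that is how the task definition expects them.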

A worked example: the lifecycle over time

[Figure: task lifecycle timeline from t=0s to task stop for secrets-init, envoy, log-router and app, marking the dependsOn SUCCESS and HEALTHY gates and the shared volume (written by secrets-init, read by app).]
Four containers, one task. Init exits; Envoy and log-router run for the task's life; the app starts once its dependencies are satisfied.
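The app's start time in that lifecycle is simply the latest of its dependency conditions. If any condition never holds, the app never starts. A minimal sketch follows; the times are illustrative and earliest_start is hypothetical.

```python
from typing import Optional


def earliest_start(deps: list[tuple[str, str]],
                   met_at: dict[tuple[str, str], Optional[float]]) -> Optional[float]:
    """A container starts once every (container, condition) pair it depends on is met.

    met_at maps each pair to the time it was satisfied, or None if it never is
    (for example SUCCESS on a container that exited non-zero).
    """
    if not deps:
        return 0.0  # no dependsOn: starts with the task
    times = [met_at.get(d) for d in deps]
    if any(t is None for t in times):
        return None  # at least one condition never holds, so it never starts
    return max(times)


app_deps = [("secrets-init", "SUCCESS"), ("envoy", "HEALTHY")]
print(earliest_start(app_deps, {("secrets-init", "SUCCESS"): 3.0, ("envoy", "HEALTHY"): 10.0}))  # 10.0
print(earliest_start(app_deps, {("secrets-init", "SUCCESS"): None, ("envoy", "HEALTHY"): 10.0}))  # None
```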

What’s worth remembering

  1. A task can have many containers. They share the network namespace, the task IAM role, and any task-local volumes they mount.
  2. essential is the stop signal. Essential true: container dying stops the task. Essential false: suited to init containers that exit after setup.
  3. dependsOn orders starts. SUCCESS for init containers; HEALTHY for long-running sidecars that need to be ready before the app.
  4. healthCheck plus dependsOn: HEALTHY gives real readiness gating. A started-but-not-ready container isn’t enough for the app to depend on.
  5. Shared volumes are task-local by default. Declare at task level, mount in multiple containers. Useful for init → app handoff.
  6. FireLens redirects stdout to a log-router sidecar. logDriver: "awsfirelens" on the source container; firelensConfiguration on the sidecar.
  7. Task role vs execution role. Execution role for ECS platform actions (image pull, secrets, log shipping); task role for application AWS calls.
  8. App Mesh requires proxyConfiguration. It declares which ports Envoy intercepts; without it, the mesh sidecar is a no-op.
  9. Secrets can be injected via secrets/secretOptions. Reference Secrets Manager or Parameter Store ARNs; ECS resolves at task start using the execution role.
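  10. Task replacement is per service, not per task. When the task stops (any essential container dying), the service scheduler launches a replacement based on the deployment config. ECS also offers an opt-in per-container restartPolicy that restarts a container in place, but that is separate from the scheduler replacing a stopped task.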

Init to prepare the filesystem, Envoy to handle the network, log router to ship the logs, app to do the work: four containers in one task, glued together with dependsOn, essential, shared volumes, and FireLens. The work isn't adding sidecars one by one; it's knowing which primitives they all hang on, so the next one is three lines of JSON.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.