The situation
The business operates five fulfilment centres. Each has:
- Robotics driven by a warehouse management system (WMS) that needs sub-10ms latency between the controller and the robot fleet.
- Intermittent WAN connectivity: rural fibre with monthly outages lasting hours.
- Regulatory requirements that sensitive inventory data for certain products stay physically in-country.
- A small site-ops team who patches Windows servers, juggles SANs, and wishes they didn’t.
Today each centre runs a local VMware cluster with around 20 VMs and a SAN. The site team is stretched and hiring more is hard; WAN outages still cause robot downtime whenever the WMS's database becomes unreachable.
Options evaluated:
- AWS Outposts rack (42U): AWS-managed hardware running AWS services (EC2, EBS, RDS on Outposts, S3 on Outposts, EKS, ECS) physically in the warehouse.
- AWS Outposts Servers (1U/2U): a smaller form factor on the same idea, for sites that don't need a full rack.
- AWS Local Zones: covered in a separate post, but worth mentioning because AWS's Region-adjacent infrastructure in a nearby city isn't the same as on-premises.
- Snow Family (Snowball Edge Compute Optimized): ruggedised for the edge, but not a long-term, sit-in-a-rack service.
- VMware Cloud on AWS with a stretched cluster: keeps the on-prem VMware and extends it to AWS, but still needs the on-prem hardware.
- Status quo: local VMware.
What actually matters
The core trade-off in edge computing is locality versus operational sameness. Running workloads at the edge buys low latency and independence from the WAN, but usually means a different operational model: different patching, different APIs, different deployment tools. The interesting options are the ones that put hardware on-site while keeping the cloud API, control plane, and deployment story.
The first thing to ask is: what actually needs to be local? The robotics controller: yes, the 10ms latency budget demands it. The WMS database: probably, because the controller talks to it constantly. The analytics warehouse: absolutely not; it runs in the cloud and pulls data asynchronously. Identifying what is latency-bound is the first job; everything else should stay in the cloud.
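A minimal sketch of that first question, assuming a hypothetical 25ms round trip from the warehouse to the parent Region: a workload goes to the edge if its latency budget is tighter than the WAN round trip, or if the floor stops when it is unreachable during an outage; everything else stays in the Region. The workload names and budgets are illustrative, not measured.

```python
from dataclasses import dataclass

# Hypothetical: round-trip time from the warehouse to the parent Region.
WAN_RTT_MS = 25.0


@dataclass
class Workload:
    name: str
    latency_budget_ms: float       # worst acceptable round trip from on-site clients
    must_survive_wan_outage: bool  # does the floor stop if this is unreachable?


def placement(w: Workload, wan_rtt_ms: float = WAN_RTT_MS) -> str:
    """Return 'edge' for latency-bound or outage-critical workloads, else 'region'."""
    if w.latency_budget_ms < wan_rtt_ms or w.must_survive_wan_outage:
        return "edge"
    return "region"


# Illustrative budgets for the three workloads discussed above.
workloads = [
    Workload("robotics-controller", 10.0, True),
    Workload("wms-database", 10.0, True),
    Workload("analytics-warehouse", 60_000.0, False),
]

for w in workloads:
    print(f"{w.name}: {placement(w)}")
```

The outage flag matters as much as the latency budget: a workload with a relaxed budget can still belong at the edge if the robots depend on it.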
The second is: what happens when the WAN is down? An edge platform earns its keep when the link to the cloud drops. Data-plane operations (running workloads, serving local clients) need to keep working without parent-Region connectivity; control-plane operations (launching new instances, changing config) may legitimately pause. The question is which operations sit on which side of that line, and whether the workload survives the outages the WAN actually has.
The third is: what services are available locally? No edge platform offers the full cloud catalogue. The pieces that aren’t supported locally either get hosted back in the parent Region (and break during WAN outages) or get replaced with something else. Knowing the gap before signing matters more than admiring the supported list.
The fourth is: the operational model. Customer-managed hardware is one shape: maximum flexibility, maximum effort. Vendor-managed hardware on-site is another: the vendor ships, installs, patches, and monitors; the customer provides rack space, power, cooling, and the uplink. Picking the right end of that spectrum is mostly about how stretched the site-ops team already is.
The fifth is: cost. Edge hardware sold as a subscription with a multi-year term is expensive in raw terms. The maths is: compare against the total cost of running the status quo (hardware, licences, maintenance, DC staff), adjust for operational simplicity (fewer people, faster changes), and see if it breaks even. For sites with a full-time IT role it usually does; for small sites it can go either way.
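The break-even comparison can be sketched as a per-site annual total. Every figure below is a hypothetical placeholder, not a real Outposts price or a measured cost; the point is the shape of the sum, including the staff and downtime lines that a hardware-only comparison misses.

```python
from dataclasses import dataclass


@dataclass
class SiteCosts:
    """Annual cost of running one site on a given platform (hypothetical inputs)."""
    platform: float     # hardware amortised over its refresh cycle, or the yearly share of a subscription
    licences: float     # hypervisor, OS and backup licences
    maintenance: float  # support contracts, SAN maintenance
    staff: float        # site-ops time spent on infrastructure rather than operations
    downtime: float     # expected yearly cost of robot downtime from WAN outages

    def total(self) -> float:
        return self.platform + self.licences + self.maintenance + self.staff + self.downtime


def annual_saving(status_quo: SiteCosts, candidate: SiteCosts) -> float:
    """Positive when the candidate is cheaper per year than the status quo."""
    return status_quo.total() - candidate.total()


# Placeholder figures only; replace with real quotes and the site's own accounts.
status_quo = SiteCosts(platform=60_000, licences=40_000, maintenance=25_000,
                       staff=90_000, downtime=50_000)
outposts = SiteCosts(platform=200_000, licences=10_000, maintenance=0,
                     staff=30_000, downtime=5_000)

print(annual_saving(status_quo, outposts))  # 20000
```

With these made-up inputs the subscription costs far more than the hardware it replaces, yet the site still comes out ahead because of the staff and downtime lines; at a smaller site with little dedicated IT time, the same sum can easily go negative.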
What we’ll filter on
- Local persistence when the WAN fails: does the workload keep running?
- Service breadth: which AWS services are available?
- Operational model: customer-managed, AWS-managed, or hybrid?
- Latency target: what latency can on-site clients achieve?
- Cost model: capex, opex, or subscription?
The edge landscape
- AWS Outposts Rack (42U). Full rack, AWS-managed hardware, a subset of AWS services, with the parent Region as the control plane. The data plane survives WAN outages; the control plane requires parent-Region connectivity. 3-year minimum term.
- AWS Outposts Servers (1U/2U). Smaller form factor: a single server instead of a rack. The same AWS-managed model with reduced capacity and a narrower service set (EC2 with local instance storage and ECS, but no EBS or RDS). Good for sites that don't need a rack's worth of compute.
- AWS Snowball Edge Compute Optimized. Ruggedised compute for field deployments (maritime, oil rigs, field trucks). Not a permanent-install product in the same sense as Outposts. Good for short-term or mobile edge.
- VMware Cloud on AWS with stretched cluster. Extends on-prem vSphere to AWS. Helps with DR and cloud bursting, but doesn't deliver "workload runs locally with the cloud API model" because the on-prem side is still VMware.
- Self-managed containers on local hardware with ECS Anywhere / EKS Anywhere. EKS Anywhere is a Kubernetes distribution that runs on your hardware and can be managed consistently with cloud EKS. ECS Anywhere runs ECS tasks on your own external hardware, orchestrated by the ECS control plane in the Region. You provide the hardware; AWS provides the software.
- Status quo (on-prem VMware or bare metal). Keep what you have. Accept that patching, SAN management, and resizing are manual and team-owned.
Side by side
| Option | Local survives WAN | Service breadth | Ops model | Latency | Cost |
|---|---|---|---|---|---|
| Outposts Rack | Data plane ✓ | EC2, EBS, S3, RDS, ECS/EKS, ALB | AWS-managed | ~1ms in-rack | Subscription 3y |
| Outposts Servers | Data plane ✓ | EC2 (instance storage), ECS; smaller capacity | AWS-managed | ~1ms on the LAN | Subscription 3y |
| Snowball Edge | ✓ | EC2, S3 subset | Customer-managed | Local | Per-job or per-year |
| VMware Cloud stretched | Local via vSphere | VMware services | VMware + AWS | LAN to on-prem | Both hardware + VMC |
| ECS/EKS Anywhere | ✓ (own hardware) | Containers | Customer hardware | Local | Licence + hardware |
| Status quo | ✓ | Whatever you run | Fully customer | Local | Hardware + staff |
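The filter the table applies can be written down directly. This sketch hand-encodes a simplified reading of three columns (survives WAN loss, operations model, and whether the option is meant as a permanent install); the encoding is an illustration, not an authoritative comparison.

```python
from typing import Dict, List

# Simplified, hand-encoded reading of the comparison table.
# "permanent" marks options designed for a long-term install rather than a job or a trip.
OPTIONS: Dict[str, Dict[str, object]] = {
    "Outposts Rack":          {"survives_wan": True, "ops": "aws",      "permanent": True},
    "Outposts Servers":       {"survives_wan": True, "ops": "aws",      "permanent": True},
    "Snowball Edge":          {"survives_wan": True, "ops": "customer", "permanent": False},
    "VMware Cloud stretched": {"survives_wan": True, "ops": "hybrid",   "permanent": True},
    "ECS/EKS Anywhere":       {"survives_wan": True, "ops": "customer", "permanent": True},
    "Status quo":             {"survives_wan": True, "ops": "customer", "permanent": True},
}


def shortlist(options: Dict[str, Dict[str, object]], require_ops: str = "aws") -> List[str]:
    """Keep options that survive WAN loss, are permanent installs, and match the ops model."""
    return [
        name for name, o in options.items()
        if o["survives_wan"] and o["permanent"] and o["ops"] == require_ops
    ]


print(shortlist(OPTIONS))  # ['Outposts Rack', 'Outposts Servers']
```

Every option keeps running locally when the WAN drops, so WAN survival alone doesn't discriminate; the operations model is what narrows the field, which matches the stretched-team constraint above.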
For permanent warehouse sites with sub-10ms latency needs and a desire to stop patching Windows servers, Outposts is the fit. The rest of the post goes deep on Outposts mechanics.
The Outposts architecture
The picks in depth
The Outposts rack. Ordered from AWS in a compute and storage configuration sized for the site's workloads. AWS ships it, installs it, and owns the hardware. The customer provides the rack space (a 42U footprint with adequate floor loading), power (redundant, typically 3-phase), cooling, and a network uplink. Delivery takes weeks; installation by AWS personnel takes about a day.
Post-install, the rack appears in the AWS console as an Outpost resource with an ARN. Subnets within the VPC can be created on the Outpost (specifying the Outpost ARN on the subnet); those subnets’ instances land on the local hardware.
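Placing a subnet on the Outpost is an ordinary EC2 CreateSubnet call with the OutpostArn parameter set; the subnet also has to be in the Availability Zone the Outpost is anchored to. A sketch built around boto3's create_subnet, where the VPC ID, CIDR, ARN, and AZ are hypothetical placeholders. The client is passed in, so the request-building part runs without boto3 or AWS credentials.

```python
from typing import Any, Dict


def outpost_subnet_request(vpc_id: str, cidr: str, outpost_arn: str, az: str) -> Dict[str, Any]:
    """Keyword arguments for EC2 CreateSubnet that place the subnet on an Outpost."""
    return {
        "VpcId": vpc_id,
        "CidrBlock": cidr,
        "AvailabilityZone": az,     # the AZ the Outpost is anchored to in the parent Region
        "OutpostArn": outpost_arn,  # without this, the subnet lands in the Region instead
    }


def create_outpost_subnet(ec2_client: Any, **kwargs: str) -> str:
    """Create the subnet with a boto3 EC2 client and return its subnet ID."""
    response = ec2_client.create_subnet(**outpost_subnet_request(**kwargs))
    return response["Subnet"]["SubnetId"]


# Hypothetical placeholder values for one site.
request = outpost_subnet_request(
    vpc_id="vpc-0example",
    cidr="10.50.0.0/24",
    outpost_arn="arn:aws:outposts:eu-west-2:111122223333:outpost/op-0example",
    az="eu-west-2a",
)
print(request["OutpostArn"])
```

With credentials configured, passing boto3.client("ec2", region_name="eu-west-2") as ec2_client makes the real call; instances launched into the returned subnet then land on the rack.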
EC2 on Outposts. Same API: aws ec2 run-instances --subnet-id subnet-outpost --instance-type m5.2xlarge. An Outpost has fixed capacity: if you ordered, say, 10 m5.2xlarge slots, you can run at most 10 of them concurrently, and once all 10 are in use a new run-instances call fails. Auto Scaling still works, but only within that capacity envelope.
EBS on Outposts. Storage on the rack; volumes attach only to instances on the same Outpost. By default, snapshots are stored in S3 in the parent Region, over the service link; EBS local snapshots can keep them on S3 on Outposts instead, for data that must stay on-site.
S3 on Outposts. Limited-capacity S3 on the rack. The API is mostly the same: buckets are created with an --outpost-id parameter (aws s3control create-bucket), and objects are reached through access points. For data that must not be lost with the rack, copy it to parent-Region S3 (for example with AWS DataSync) or replicate it to a second Outpost.
RDS on Outposts. MySQL, PostgreSQL, and SQL Server. Single-AZ only on a single rack (an Outpost is anchored to one AZ). Automated backups go to parent-Region S3 through the service link. For multi-AZ durability, run two Outposts racks in the same building (expensive) or accept single-AZ with cloud backups as the DR.
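The capacity envelope behaves like a fixed pool of slots per instance type. A toy model, not an AWS API: launch returns False where the real run-instances call would fail for lack of capacity on the rack. The 10-slot figure is the hypothetical order from the paragraph above.

```python
from collections import Counter
from typing import Dict


class OutpostCapacity:
    """Toy model of an Outpost's fixed instance slots (illustrative, not an AWS API)."""

    def __init__(self, slots: Dict[str, int]):
        self.slots = dict(slots)  # instance type -> slots ordered with the rack
        self.running = Counter()  # instance type -> instances currently running

    def launch(self, instance_type: str) -> bool:
        """True if a slot was free; False mirrors run-instances failing on capacity."""
        if self.running[instance_type] >= self.slots.get(instance_type, 0):
            return False
        self.running[instance_type] += 1
        return True

    def terminate(self, instance_type: str) -> None:
        if self.running[instance_type] > 0:
            self.running[instance_type] -= 1

    def headroom(self, instance_type: str) -> int:
        return self.slots.get(instance_type, 0) - self.running[instance_type]


site = OutpostCapacity({"m5.2xlarge": 10})
results = [site.launch("m5.2xlarge") for _ in range(11)]
print(results.count(True), results[-1])  # 10 False
```

An instance type that wasn't ordered has zero slots, so launches of it always fail; Auto Scaling policies on an Outpost need their maximums set at or below the ordered headroom.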
Service Link. An encrypted set of VPN connections that the Outpost initiates outbound to the parent Region. Typically carried over Direct Connect, or over the public internet for smaller deployments. It carries control-plane and management traffic (EC2 provisioning, configuration changes, CloudWatch metrics) as well as data-plane traffic between the Outpost and the Region (RDS backups, copies of data to Region S3). The service link is one of the rack's top operational concerns: monitor it, make sure it has enough bandwidth, and give the warehouse a fallback ISP.
Local Gateway (LGW). The networking piece that lets Outposts instances talk to the on-prem LAN directly, bypassing the service link. On-prem robotics controllers reach Outposts instances through the LGW over the warehouse LAN with LAN-grade latency (~1ms). The LGW peers with the warehouse's core router via BGP and announces the Outposts subnets' CIDRs. When the service link is down, LGW traffic keeps flowing, so the robots keep running.
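Sizing the service link for outage recovery is simple arithmetic. A sketch, assuming that traffic deferred during an outage (metrics, backups, copies to Region S3) accrues at a steady rate and is drained by whatever bandwidth the link has above its normal load once it returns; all rates are placeholders, not Outposts requirements.

```python
def drain_hours(outage_hours: float, deferred_mbps: float,
                steady_mbps: float, link_mbps: float) -> float:
    """Hours needed after the link returns to flush traffic deferred during an outage."""
    spare_mbps = link_mbps - steady_mbps
    if spare_mbps <= 0:
        raise ValueError("no headroom above steady load: the backlog never drains")
    backlog_megabits = deferred_mbps * outage_hours * 3600
    return backlog_megabits / spare_mbps / 3600


# Placeholder rates: a 4-hour fibre outage on a 200 Mbps service link.
print(round(drain_hours(4, deferred_mbps=30, steady_mbps=50, link_mbps=200), 2))  # 0.8
```

As steady load approaches the link rate, drain time grows without bound; that is the arithmetic behind budgeting headroom on the service link and keeping a fallback ISP.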
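Because the LGW announces the Outposts subnets into the warehouse's routing, those CIDRs must not collide with ranges already in use on the LAN. A pre-install check, sketched with Python's standard ipaddress module; the addressing plan is hypothetical.

```python
import ipaddress
from typing import Iterable, List, Tuple


def overlapping(outpost_cidrs: Iterable[str], lan_cidrs: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (outpost, lan) pairs that overlap; each would conflict with existing LAN routes."""
    outpost_nets = [ipaddress.ip_network(c) for c in outpost_cidrs]
    lan_nets = [ipaddress.ip_network(c) for c in lan_cidrs]
    return [(str(o), str(l)) for o in outpost_nets for l in lan_nets if o.overlaps(l)]


# Hypothetical plan: Outposts subnets in 10.50.0.0/20; the LAN already uses 10.50.8.0/24.
print(overlapping(["10.50.0.0/20"], ["10.40.0.0/16", "10.50.8.0/24"]))
# [('10.50.0.0/20', '10.50.8.0/24')]
```

Running a check like this during the Month 1 network plan is cheaper than discovering the clash when the BGP session comes up.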
What breaks if the WAN fails. The service link is down. Impact:
- Existing EC2 instances keep running.
- EBS volumes remain attached and readable/writable.
- LGW traffic to/from on-prem continues.
- Cannot launch new instances (control plane paused).
- Cannot change security groups, create new IAM resources, etc.
- CloudWatch metrics buffer locally and flush when link returns.
- RDS keeps serving reads and writes; backups queue locally and flush when link returns.
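The outage behaviour in the list above amounts to a classification: data-plane work continues, control-plane work pauses, and some outputs are buffered for later. A sketch that encodes it, with hypothetical operation names standing in for the list items, and checks the warehouse's actual requirement: the floor keeps running with the service link down.

```python
from enum import Enum


class Plane(Enum):
    DATA = "data"          # served on the rack, no Region round trip needed
    CONTROL = "control"    # needs the parent Region via the service link
    DEFERRED = "deferred"  # produced locally, shipped to the Region later


# Hypothetical labels for the items in the list above.
OPERATIONS = {
    "run_existing_ec2": Plane.DATA,
    "ebs_read_write": Plane.DATA,
    "lgw_lan_traffic": Plane.DATA,
    "rds_read_write": Plane.DATA,
    "launch_ec2": Plane.CONTROL,
    "modify_security_group": Plane.CONTROL,
    "cloudwatch_metrics": Plane.DEFERRED,
    "rds_backup_upload": Plane.DEFERRED,
}


def status(operation: str, service_link_up: bool) -> str:
    """'available', 'paused' (control plane) or 'buffered' (deferred) for one operation."""
    plane = OPERATIONS[operation]
    if service_link_up or plane is Plane.DATA:
        return "available"
    return "paused" if plane is Plane.CONTROL else "buffered"


outage = {op: status(op, service_link_up=False) for op in OPERATIONS}
floor_ok = all(outage[op] == "available"
               for op in ("run_existing_ec2", "lgw_lan_traffic", "rds_read_write"))
print(floor_ok)  # True
```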
For the warehouse, this is exactly the failure-mode goal: robots keep running during WAN outages; the control-plane pause isn’t customer-facing.
Deployment model. The same Infrastructure-as-Code as the cloud: CloudFormation and Terraform have Outposts-aware resources. A CI pipeline deploys WMS updates to ECS tasks in Outposts subnets exactly as it deploys to ECS tasks in Region subnets; the only differences are the subnet ID and the Auto Scaling capacity-provider configuration.
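One way to keep the pipeline from forking is to derive every site's deployment spec from a single base and override only the placement fields. A sketch with a hypothetical spec type; the field names, image tag, and IDs are illustrative, not ECS API names.

```python
from dataclasses import dataclass, fields, replace
from typing import Dict, Set


@dataclass(frozen=True)
class ServiceDeploy:
    """Hypothetical per-environment deploy spec for the WMS service."""
    image: str
    desired_count: int
    subnet_id: str
    capacity_provider: str


BASE = ServiceDeploy(image="wms:1.42.0", desired_count=2,
                     subnet_id="subnet-region-a", capacity_provider="region-asg")

# Only placement differs per site; everything else comes from BASE.
SITE_OVERRIDES: Dict[str, Dict[str, str]] = {
    "manchester-north": {"subnet_id": "subnet-outpost-mcr",
                         "capacity_provider": "outpost-mcr-asg"},
}


def site_deploys(base: ServiceDeploy,
                 overrides: Dict[str, Dict[str, str]]) -> Dict[str, ServiceDeploy]:
    """One spec per site, copied from base with only the given fields replaced."""
    return {site: replace(base, **o) for site, o in overrides.items()}


def differing_fields(a: ServiceDeploy, b: ServiceDeploy) -> Set[str]:
    return {f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)}


deploys = site_deploys(BASE, SITE_OVERRIDES)
print(sorted(differing_fields(BASE, deploys["manchester-north"])))
# ['capacity_provider', 'subnet_id']
```

A pipeline check that differing_fields never returns anything beyond the placement fields is a cheap guard against edge sites quietly drifting from the cloud configuration.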
A worked site rollout
Manchester North pilot site:
- Month 1: design. Capacity sizing (how many m5s, how much EBS, how much local S3), network plan (LGW peering, service link path via Direct Connect to London, backup ISP for service link), power/cooling review with the warehouse operations team.
- Month 2: order placed with AWS. 42U rack, specific instance-type slots, specific EBS TiB.
- Month 3-4: AWS ships, installs, commissions. Customer-side: pre-provision power, cooling, network drops.
- Month 5: Outpost online. Cut over the WMS workload from local VMware to Outposts-hosted EC2, then run both in parallel for two weeks.
- Month 6: decommission old VMware; reclaim rack space.
Two years later: four more centres follow the pattern. Site teams stop patching servers; AWS deals with hardware failures by shipping replacement components. The uniform pipeline from cloud to edge means new features ship to all five sites in one deploy.
What’s worth remembering
- Outposts is AWS-managed hardware on your floor. AWS ships, installs, patches, replaces. Customer provides rack space, power, cooling, network uplink. The operational-sameness story is the product.
- Latency is the reason, not cost. Outposts is expensive. Use it when sub-10ms latency or data-locality requires on-prem compute with the cloud API model.
- LGW keeps you alive when the WAN fails. Data-plane traffic via the LGW survives service-link outages. The control plane pauses; nothing already running stops.
- Subset of AWS services. EC2, EBS, S3 (limited), RDS, ECS, EKS, ALB. No Lambda on Outposts, no DynamoDB, and most other managed services are absent too. Design workloads around what's available.
- Same APIs, same IaC. aws ec2 run-instances --subnet-id subnet-outpost works as it does in the Region, and CloudFormation understands Outposts. The deployment pipeline doesn't fork for edge.
- Single rack = single AZ. No multi-AZ on a single Outpost. For HA, run multiple Outposts at the same site, or fail over to the parent Region for degraded service.
- Three-year subscription, big fixed cost. Capex-shaped commitment. Compare against the total cost of running on-prem alternatives including staff and downtime.
- Outposts Servers for smaller sites. 1U or 2U form factor for sites that don't need a rack. The same AWS-managed model at smaller scale with a smaller bill, but a narrower service set (no EBS or RDS).
AWS on the warehouse floor, without the warehouse team becoming AWS experts. The robots keep running when the fibre drops; the WMS patches itself overnight as part of routine cloud deploys; the site team focuses on operations, not infrastructure. The million-pound rack earns its keep by doing the one job the cloud can't: being in the same building as the machines it controls.