Archive

14 Mar 2029 · 13 min read

Exam Room · Networking

Running an IPv6-Only VPC in Practice (preview)

IPv4 space has become a cost line item. Public IPs bill by the hour. RFC1918 is exhausted in the M&A portfolio. The question of whether to run an IPv6-only VPC has moved from 'interesting experiment' to 'concrete ask.' The dataplane supports it. The services mostly support it. The edge cases are where the work lives: legacy clients that only speak IPv4, third-party services that haven't caught up, instance types that do or don't support IPv6. Understanding what an IPv6-only VPC looks like in practice is the difference between a clean launch and a post-migration support queue.

12 Mar 2029 · 12 min read

Exam Room · Networking

Choosing the Layer for Egress Filtering (preview)

A requirement to constrain outbound traffic to an allowlist of destinations. The interesting choice isn't which AWS service to buy, it's which OSI layer to stop the traffic at. The lower we inspect, the less we see and the less we can break; the higher we inspect, the more granular the policy and the more we have to terminate. Which layer matches the policy decides how much we see, how much we can control, and what we pay in complexity.

07 Mar 2029 · 15 min read

Exam Room · Networking

Designing Centralised Packet Inspection With GWLB (preview)

An organisation has decided every packet leaving or entering a VPC must pass through a vendor firewall for deep inspection. Running firewall appliances per VPC is a licensing nightmare; having each VPC route traffic through a central firewall VPC is the plan. AWS has a primitive for centralised inspection; what matters is what that primitive actually has to deliver for deep-packet inspection to work in practice: the appliance must see the real 5-tuple (no NAT in the way), both directions of a flow must land on the same appliance, the pool must scale horizontally, multiple VPCs must share one fleet, and return paths have to stay symmetric or stateful rules will silently drop legitimate traffic.

05 Mar 2029 · 12 min read

Exam Room · Networking

Picking the Right Tool to Debug VPC Reachability (preview)

A ticket comes in: 'the app in the prod VPC can't reach the RDS instance in the data VPC, worked yesterday, nothing changed.' Nothing ever changed. AWS ships more than one tool that answers 'can this thing reach that thing', and the tools answer subtly different questions. Knowing which one matches the question on the ticket is the difference between a one-minute answer and an afternoon of tcpdump.

28 Feb 2029 · 13 min read

Exam Room · Networking

How to Carry Market Data Multicast Into AWS (preview)

A market data pipeline that ingests multicast feeds from an exchange's colocation facility. On-prem, this is routine: IGMP on the switch, multicast routing in the core, consumers join groups and receive. In AWS, multicast has historically been absent from the VPC fabric. There are a handful of ways to get multicast working in the cloud, ranging from rewriting every consumer to keeping the protocol exactly as it is on-prem, and the right one depends on whether the network can carry multicast natively or whether we build the replicator ourselves.

26 Feb 2029 · 14 min read

Exam Room · Networking

Plugging SD-WAN Into a Transit Gateway (preview)

An SD-WAN vendor's appliance running in the VPC, carrying branch-office traffic. Today it's stitched to the Transit Gateway via IPsec over a VPN attachment, which means encrypting twice and accepting a throughput ceiling AWS imposes on VPN. The TGW has more than one way to ingest routes from a customer-owned appliance, and which attachment type fits depends on whether traffic between the appliance and the TGW is already inside AWS's trust boundary.

21 Feb 2029 · 14 min read

Exam Room · Networking

Picking CloudFront or Global Accelerator at the Edge (preview)

A global application whose users complain about latency from Sydney, intermittent connection failures from Mumbai, and a jittery experience everywhere that isn't Europe. The origin is in eu-west-1. AWS has more than one edge service that can put a user closer to the origin, and they do very different jobs. Picking between them is a question of protocol, cacheability, and whether the workload needs a static IP that customer firewalls can allowlist.

19 Feb 2029 · 13 min read

Exam Room · Networking

When to Use VPC Lattice for Service-to-Service Traffic (preview)

Dozens of microservices across half a dozen VPCs, each one exposed via its own NLB or ALB, each one reachable via PrivateLink endpoints in every consumer VPC, each one tied into a per-service IAM policy for who can call it. The network team is managing load balancers; the platform team is managing endpoints; nobody owns the overall shape. AWS has a primitive that bundles service discovery, IAM-based authorisation, and cross-VPC routing into one thing; whether that consolidation earns its place depends on how many services we're actually running.

14 Feb 2029 · 13 min read

Exam Room · Networking

Steering Traffic With BGP Attributes (preview)

Two Direct Connect links, one primary and one backup, and a business requirement that says 'backup is backup, outbound traffic goes out the primary, return traffic comes back the primary, and everything fails over cleanly.' BGP has five attributes that decide which route wins, and only two of them are levers we can pull with AWS. Understanding local preference, AS-path prepending, MED, and AWS's Direct Connect BGP community tags is what gets you there.

12 Feb 2029 · 14 min read

Exam Room · Networking

How to Allowlist Egress by Hostname (preview)

Compliance has a new requirement: egress from every production workload must be allowlisted by destination hostname, not IP address, and every denied connection must leave an audit trail. Security groups and NACLs are the wrong abstraction; they work on IPs and ports. The interesting question is which layer of the egress path actually sees a hostname, and which AWS service does the matching with the right blend of managed-ness and audit depth.

07 Feb 2029 · 15 min read

Exam Room · Networking

Setting Up Hybrid DNS Resolution Both Ways (preview)

A hybrid estate where Windows-based corporate services still live on-prem under corp.example.com, and new AWS workloads resolve cloud.example.com via Route 53 Private Hosted Zones. An EC2 instance needs to look up both. Today, it works for cloud names and silently fails for on-prem ones. Hybrid DNS is two directions of resolution, AWS to on-prem and on-prem to AWS, and the work is figuring out where the forwarders live, who owns the configuration, and how the split-horizon edge cases behave.

05 Feb 2029 · 14 min read

Exam Room · Networking

Expose One Service, Not the Whole VPC (preview)

A payments service in one VPC. A dozen consumer teams across half the company who need to call it. The first instinct is to attach everything to a Transit Gateway and let the route tables sort it out. The better question is whether the consumers need a whole network path into the producer's VPC, or just a TCP socket against the one service. Which abstraction is correct depends on whether we're running a service or sharing a subnet.

31 Jan 2029 · 13 min read

Exam Room · Networking

Choosing Between Cloud WAN and a Transit Gateway Mesh (preview)

A company with 30 AWS Regions, a handful of branch offices on every continent, and a Transit Gateway tangle that nobody can draw on a whiteboard anymore. Peering meshes across Regions, route tables copy-pasted between accounts, and the network team has finally said 'no more' to any new region request. The interesting question isn't whether to keep scaling the mesh, it's whether the operational debt is now large enough that a different abstraction earns its place, and what we give up to switch.

29 Jan 2029 · 15 min read

Exam Room · Networking

Designing Direct Connect to Survive a Fibre Cut (preview)

A single 10 Gbps Direct Connect link carrying every byte between the on-prem data centre and AWS. Finance is happy, the network team is nervous, and the business is about to depend on this path for a quarterly close that cannot slip. Four resilience shapes exist in the Direct Connect catalogue, each with a different blast radius, a different price tag, and a different SLA. Every shape is most resilient at something; the work is figuring out which failure modes we're actually buying insurance against.

24 Jan 2029 · 16 min read

Exam Room · Networking

How to Hit Four Nines on Direct Connect (preview)

Five gigabits of replication traffic over a single 10 Gbps Direct Connect, a compliance team asking for four nines, and five plausible topologies on the table. The one that lives up to the paperwork isn't the cheapest and isn't the most expensive, it's the one where a whole building can vanish and the traffic keeps moving. Picking it means knowing what each Direct Connect resilience model survives, and which tier of SLA each one actually signs you up for.

22 Jan 2029 · 16 min read

Exam Room · Networking

Cloud WAN for Multi-Region Transit Gateway Sprawl (preview)

Eight Transit Gateways, five regions, two hundred attachments, and three engineers spending half a release window arguing about route table entries. The way out isn't a cleverer Terraform module, it's a higher layer that takes declarative segment policy and materialises the routes itself. Finding the right layer means touring what AWS actually offers above raw TGW route tables.

03 Jan 2029 · 12 min read

Exam Room · Security

How to Externalise Authorisation With Verified Permissions (preview)

Cognito handles who is signed in. What they can do inside the app (edit this document, view that folder, approve that workflow) is a different question, usually answered by code scattered across services. Verified Permissions is the managed policy engine AWS built for exactly that: Cedar policies, hierarchical resources, batched evaluation, and a clean separation between authentication and authorisation.

20 Dec 2028 · 12 min read

Exam Room · Security

How to Scan EC2 and S3 With GuardDuty Malware Protection (preview)

GuardDuty sees the packets. Inspector sees the installed packages. Neither looks at what's actually on disk: whether a cryptominer has been written to /tmp, whether a reverse shell is sitting in /usr/local/bin. GuardDuty Malware Protection is the extra tier that takes a snapshot, spins up a dedicated scanner, and tells us what's in the filesystem. Knowing when it fires and what it can't see is the difference between a real defence and a tick-box.

18 Dec 2028 · 14 min read

Exam Room · Security

How to Encrypt a Card Number From the Edge to the Processor (preview)

TLS protects every field in a form in transit; it doesn't protect any of them at the origin. A credit-card number posted through the edge arrives at the ALB in plaintext, at the application in plaintext, in application logs and error traces and APM spans in plaintext. PCI says encrypt the sensitive fields; the harder work is deciding where in the request path the encryption should happen, which keys live where, and how to shrink the PCI scope without rewriting every service the request passes through.

13 Dec 2028 · 13 min read

Exam Room · Security

How to Rate-Limit GraphQL Operations Through WAF (preview)

Every GraphQL request is a POST to the same URL with the query inside the JSON body. WAF's pattern rules that trigger on URI path don't help; rate-based rules by endpoint don't distinguish a cheap query from an expensive one. The answer is JSON body inspection: WAF rule statements that parse the POST body, navigate into the query string, and pattern-match or rate-limit on what's actually happening at the GraphQL layer.

11 Dec 2028 · 12 min read

Exam Room · Security

Designing a Hierarchy in AWS Private CA (preview)

A root CA with a 20-year key, offline in an HSM, that signs two subordinate CAs: one for production workloads, one for developer environments. Each subordinate signs its own leaf certificates with short lifetimes. A managed private CA is a foregone conclusion; the alternative is the Windows server nobody can log into. The decisions that matter are how many tiers the hierarchy needs, what key types and templates each tier issues with, and what revocation model keeps a single compromise from costing us the whole PKI.

06 Dec 2028 · 13 min read

Exam Room · Security

Choosing Between KMS, CloudHSM, and KMS Custom Key Stores (preview)

KMS is the default answer for "we need a key." CloudHSM is the answer for "we need a key AND we need to be the only ones who can touch it, AND we need FIPS 140-3 Level 3, AND we need a PKCS#11 interface for that legacy Java client." The difference between them is less about capability and more about where the raw key material actually lives, who can get to it, and how much of the operational work is yours versus AWS's.

04 Dec 2028 · 15 min read

Exam Room · Security

How to Grant Cross-Account Access to Humans and Workloads (preview)

Forty accounts, two hundred engineers, and the perennial question: how do they get into the accounts they need and nothing else? IAM Identity Center permission sets and cross-account IAM roles both answer it, but in different shapes. One is a scalable, identity-centric pattern that works for humans; the other is the building block that's older, lower-level, and still necessary for programmatic access. Picking the correct one is less about which is better and more about what the caller is.

29 Nov 2028 · 15 min read

Exam Room · Security

Picking Between Network Firewall, WAF, Shield, and DNS Firewall (preview)

Network Firewall, WAF, and Shield sound like overlapping answers to the same question. They aren't. Each sits at a different layer of the network, each sees a different slice of traffic, each is good at a different kind of threat. Picking the correct one (or the correct combination) starts with what we're trying to protect and works outward to what the traffic looks like when it arrives.

27 Nov 2028 · 13 min read

Exam Room · Security

Choosing Between Config Conformance Packs and Audit Manager (preview)

An auditor wants evidence the org is meeting PCI DSS 4.0.1. One team reaches for AWS Config conformance packs; another reaches for Audit Manager. Both produce dashboards, both produce reports, and both cost money, but they're doing very different work. Config is the continuous compliance engine; Audit Manager is the evidence-collection and assessment workflow on top. Picking the wrong one leaves the other half of the job undone.

22 Nov 2028 · 13 min read

Exam Room · Security

Aggregating Findings With Security Hub (preview)

GuardDuty in twelve accounts. Inspector finding vulnerabilities across EC2, ECR, and Lambda. Macie flagging sensitive S3 objects. IAM Access Analyzer catching public exposure. Four different consoles, four different finding formats, one security team that doesn't have time to open four tabs. Security Hub is the aggregator that turns finding streams into a single normalised queue, and using it well is mostly about what you route, how you score, and who you route it to.

20 Nov 2028 · 13 min read

Exam Room · Security

How to Use Detective to Cut Incident MTTR (preview)

A GuardDuty finding on an EC2 instance that leads to questions no single log source answers: what else was that role used for, which accounts did the session touch, is this user's behaviour unusual for a Tuesday? Those are graph questions. Adding an investigation tool is the easy part; the harder work is shaping the graph so it can answer 'what else' under incident pressure, knowing what it still can't tell us, and walking from an entity profile to a containment decision in the time the incident gives us.

15 Nov 2028 · 14 min read

Exam Room · Security

Operationalising Inspector Findings (preview)

A Lambda function written three years ago, still in production, still using a library with a known RCE. A container image that passed review in 2024 and has quietly rotted since. An EC2 AMI that was hardened at launch and hasn't been looked at since. Three workloads, one vulnerability story. The org-wide scanner has been on for months and the findings are piling up; the work now is triaging what it found, what it can and can't tell us, and who owns remediation when a 2023 vulnerability lands in a 2026 inbox.

13 Nov 2028 · 15 min read

Exam Room · Security

CloudTrail Lake for Long-Term Audit Queries (preview)

An auditor asking for every AssumeRole call into the production account for the last seven years. A SIEM bill dominated by CloudTrail forwarding. An incident responder grepping gzipped JSON in an S3 bucket at 2am. Three jobs that look like they want the same thing (CloudTrail data), and three jobs that pull in opposite directions when the storage format is just gzipped JSON in S3. Lake is what happens when AWS stops making customers build that pipeline themselves.

08 Nov 2028 · 15 min read

Exam Room · Security

When Shield Advanced Earns Its Subscription Over Shield Standard (preview)

Shield Standard is on by default, free, and genuinely useful. Shield Advanced is $3,000 a month per organisation, a lot of paperwork, and, on the quarter when a volumetric attack lands on a revenue-bearing endpoint, probably the cheapest insurance AWS sells. Picking between them is less about the attack surface and more about the budget, the team, and whether the DDoS response team on speed-dial is worth more than the subscription.

06 Nov 2028 · 16 min read

Exam Room · Security

Choosing Between Managed, Rate-Based, and Custom WAF Rules (preview)

A login endpoint under a credential-stuffing run, a news site whose comment form is getting SQL-injected, and a tier-one PCI application that has to prove to an auditor that it blocks the OWASP Top Ten. Three problems, three kinds of WAF rule, three different ways the rule decides whether a request is hostile. Picking the wrong kind wastes effort at best and lets the attacker through at worst; picking the correct kind is almost the whole job.

01 Nov 2028 · 15 min read

Exam Room · Security

When to Reach for Flow Logs, GuardDuty, or Detective (preview)

VPC Flow Logs tell us a packet moved; GuardDuty tells us it looks suspicious; Detective tells us what else that actor has been doing. Same investigation, three stages, three services that work together so closely that people often assume they are one thing. They aren't, and the difference matters when a finding fires at 03:00 and the on-call has to decide which console to open first.

30 Oct 2028 · 17 min read

Exam Room · Security

How to Share KMS Key Material Across Two Regions (preview)

A SaaS company runs active-active in us-east-1 and eu-west-1. Every ciphertext is wrapped in us-east-1 and replicated to eu-west-1, where the decrypt fails because the customer-managed KMS key is regional. They refuse to stand up a decrypt proxy and refuse to route every read back across the Atlantic. This scenario asks for a solution that puts the same key material in both places: one key ID, two regional arms, symmetric decrypt local to whichever side the reader is on.

25 Oct 2028 · 19 min read

Exam Room · Security

Choosing Between IAM Access Analyzer's Three Analyzer Types (preview)

A security team at a fintech wants three things: every resource shared outside the AWS Organization in one list; a way to catch an accidental 'Principal: *' before a change merges; and a report of IAM roles and permissions nobody has used for ninety days. Three asks that sound like three products turn out to be three analyzer types of one service, enabled from one delegated administrator and reading across every account.

23 Oct 2028 · 18 min read

Exam Room · Security

Scanning S3 for PHI With Macie (preview)

A healthcare SaaS has 4,000 S3 buckets across 30 accounts and an auditor who wants proof that no PHI has turned up anywhere it shouldn't. The ask is continuous scanning for SSNs, medical record numbers, dates of birth, and card numbers; automated alerting when a match appears; and a dashboard that ranks buckets by sensitivity rather than by filename. One managed service covers all three, enabled once at the organisation level.

18 Oct 2028 · 16 min read

Exam Room · Security

Prioritising Inspector Findings by Reachability (preview)

Four hundred EC2 instances, eighteen hundred Lambda functions, two hundred ECR images, and no vulnerability scanner in the account. Compliance wants evidence of coverage and remediation timelines; the security team wants CVEs ranked by CVSS, by whether an exploit is known, and by whether the thing is even reachable from the internet. One managed service covers all three surfaces with a prioritisation signal built from the attributes that matter, enabled once across the whole organisation.

16 Oct 2028 · 17 min read

Exam Room · Security

How to Investigate One Incident With GuardDuty, Flow Logs, and Detective (preview)

An EC2 instance allegedly pushed 4 GB of data to an unusual external IP over three hours. The security team has VPC Flow Logs in CloudWatch, a GuardDuty finding that flagged the behaviour, and Detective switched on across the organisation. Each of the three services does a different job: Flow Logs hold the raw per-flow evidence, GuardDuty raised the alarm with a classification, Detective pivots across the evidence to build a timeline. Pick the wrong one and you either drown in records or stare at a single finding with no context.

11 Oct 2028 · 18 min read

Exam Room · Security

Choosing Between Parameter Store and Secrets Manager Across a Mixed Inventory (preview)

A platform team has 150 secrets scattered across Parameter Store, Secrets Manager, and env vars committed to S3. They want one canonical home with database rotation, hierarchical naming, cross-region replication, and a reasonable bill. Secrets aren't one kind of thing: rotating database credentials, vendor API keys, and signing keys behave differently, and the storage choice should follow the rotation lifecycle each one needs.

09 Oct 2028 · 16 min read

Exam Room · Security

KMS Grants for Per-Execution Decrypt (preview)

A Lambda fleet across twelve spoke accounts needs to call kms:Decrypt on a shared customer-managed key, but for one execution, on one event, with a scoped permission that vanishes when the handler returns. Key policies and blanket IAM grants are both too coarse. The interesting work is matching the per-execution, per-envelope, audit-per-event shape to the right KMS authorisation instrument.

04 Oct 2028 · 16 min read

Exam Room · Security

How to Federate Okta to AWS Across an Organization Without Long-Lived Keys (preview)

Thirty-five AWS accounts, four hundred engineers, one corporate Okta, and a pile of long-lived IAM access keys that nobody can fully account for. The brief is to collapse all of it into one Okta change per joiner and leaver, with MFA everywhere and no static credentials anywhere. Getting there means touring what federated access on AWS actually looks like, and what each service will and will not do.

02 Oct 2028 · 16 min read

Exam Room · Security

Cross-Account S3 With Customer-Managed KMS (preview)

A BI team in one account needs to read encrypted objects in a data platform account. One bucket. One customer-managed key. Two accounts. Three policies, and every one of them has to say yes. Miss any single lock and the request dies in a different place, with a different error, depending on which lock shut.

13 Sep 2028 · 14 min read

Exam Room · DevOps

Application Insights Without Dashboards (preview)

A .NET service on EC2 behind an ALB with a SQL Server database, a team that has drilled into CloudWatch Metrics three hundred times looking for 'why is this slow,' and an aggregate noise floor that makes a real regression hard to spot. Picking the right AWS box is the easy part; what an automated problem-detection layer actually has to do to be useful on this estate is correlate signals across components instead of firing one alarm per metric, know the stack well enough to monitor IIS workers and SQL queries without hand-rolled configuration, baseline the patterns that change by time-of-day, and feed the operational queue without being yet another thing the team maintains.

11 Sep 2028 · 13 min read

Exam Room · DevOps

Chaos Engineering With AWS Fault Injection Service (preview)

Resilience reviews sign off the architecture; game days sign off the application. A managed fault-injection service sits between them: declarative experiments that stop an EC2 instance, throttle a DynamoDB table, or drop network traffic to an AZ, with safety rails so a test that starts hurting production aborts itself. Understanding experiment templates, actions, targets, and the abort surface is the difference between a controlled test and an incident.

06 Sep 2028 · 14 min read

Exam Room · DevOps

Recovery Objectives Made Measurable (preview)

Every service the team runs claims a 4-hour RTO and 1-hour RPO in the design doc. Nobody has actually measured whether the numbers hold, and last year a Tier 2 app took eleven hours to recover. The interesting work isn't whether some AWS tool can produce a number (one can); it's what an honest, ongoing measurement of recovery posture has to cover: which disruption types, what the model can see in infrastructure-as-code, what it can't (application logic, runbook clock time), and how the resulting score becomes evidence the auditor will accept.

04 Sep 2028 · 14 min read

Exam Room · DevOps

Patching On-Premises Alongside Cloud (preview)

Four hundred EC2 instances, two hundred on-prem servers in a data centre connected via Direct Connect, and a single compliance baseline that says 'patch cadence is the same everywhere.' The interesting work isn't whether AWS can patch a server it doesn't own (it can); it's what an 'identical policy, different maintenance windows' arrangement looks like in practice, where the evidence lives, and what happens when a hybrid node loses connectivity mid-patch.

30 Aug 2028 · 14 min read

Exam Room · DevOps

Always-On Vulnerability Scanning (preview)

EC2 instances, Lambda functions, and container images sitting in ECR, each a potential surface for a known CVE, each with a different way to find out. Weekly third-party scans catch things after they land. Continuous CVE scanning is a given; the work is figuring out which workload types are covered, what quietly defers to other tooling, and how findings flow into a remediation pipeline that doesn't drown the security team.

28 Aug 2028 · 16 min read

Exam Room · DevOps

How to Automate Incident-Response Runbooks With Step Functions and SSM (preview)

A GuardDuty finding fires; someone on call looks at the runbook in Confluence; twelve steps later the EC2 instance is quarantined, a snapshot has been taken, the on-call has typed the AWS console password three times. Turning the twelve-step runbook into code, with deterministic execution, human-approval pauses, retries, compensation, and an audit trail, is what an automated incident response wants. The interesting choice is which orchestration model fits the shape.

23 Aug 2028 · 15 min read

Exam Room · DevOps

Service Catalog vs AppConfig (preview)

Twelve teams wanting to self-serve an internal platform (VPC, IAM roles, CI pipelines, logging baseline) without asking platform engineering every time. Feature flags and configuration that should update in seconds, without a redeploy. Two different jobs, two different timelines: provisioning happens once per request, configuration changes happen many times per running application. Treating them as the same problem is the mistake that takes weeks to recover from.

21 Aug 2028 · 15 min read

Exam Room · DevOps

How to Reconcile CloudFormation Stack Drift Without Losing Fixes (preview)

Forty CloudFormation stacks, an auditor's drift report showing eighty modified resources, and a team that can't remember which of those were incident fixes and which were accidents. Detection names the resources; the remediation is where the work lives. Reverting, re-declaring, importing, and previewing all produce different outcomes, and picking the correct one depends on what caused the drift in the first place.

16 Aug 2028 · 14 min read

Exam Room · DevOps

Choosing CodeBuild Compute: On-Demand, Reserved Fleets, or Lambda (preview)

Two thousand CodeBuild runs a day across forty services, queue depth that spikes on merge storms, nine-minute builds when build capacity is cold and ninety-second builds when it's warm. CI compute comes in several billing shapes (pay-per-build-minute, pay-for-the-fleet, pay-per-millisecond), each honest for a different build profile. The decision isn't about which is fastest but which matches the workload, and the right answer for a forty-project monorepo isn't one shape for all of it.

14 Aug 2028 · 14 min read

Exam Room · DevOps

Backup Compliance Reports From AWS Backup (preview)

Three accounts, eight resource types, a backup policy that already runs and a compliance ask that nobody wants to answer with screenshots. The activity is happening; the audit trail isn't. Turning a running backup programme into an auditor-ready artefact takes three separate things wired up, and which artefact answers which question depends on what the auditor asks.

09 Aug 2028 · 13 min read

Exam Room · DevOps

Choosing a Container Runtime for Mixed Workload Shapes on AWS (preview)

A small service mesh of ten applications, each a container image in ECR. One handles a sustained 2,000 requests per second; one is invoked 800 times a day by a webhook; one is the team's favourite long-running worker. AWS has more than one runtime that will accept the same container image, and the correct answer changes per application. The interesting work is mapping request shape, cold-start tolerance, and networking constraints to the runtime that fits.

07 Aug 2028 · 14 min read

Exam Room · DevOps

GitOps on EKS: Pull vs Push (preview)

Three EKS clusters, sixty teams deploying into them, a CodePipeline stage that runs kubectl apply against a kubeconfig file that nobody wants to rotate. Moving to GitOps means one decision about the controller and a dozen smaller ones about repository layout, RBAC, secrets, and how the cluster gets told what to run. The mental models the GitOps options offer differ enough (in multi-tenancy, in observability, in failure modes) to be worth thinking through before reaching for a tool.

02 Aug 2028 · 15 min read

Exam Room · DevOps

AWS Config vs Audit Manager (preview)

AWS Config already says every S3 bucket in the Organization has default encryption enabled. The auditor wants the same statement in a different shape: a signed PDF with a framework reference, an assessor sign-off, and evidence attached to each control. The live compliance state and the audit artefact are different jobs, and a working audit story needs both, because each one alone leaves a hole the other fills.

31 Jul 2028 · 16 min read

Exam Room · DevOps

CodeArtifact and Build Caching for CI (preview)

A monorepo of forty Node services, npm install eating eight minutes of every CodeBuild run, and a security finding that half the internal packages are installed straight from a public registry. The two problems are related: a private artefact registry that the builder doesn't cache against is the same wound as a cache that doesn't know about private packages. Fixing both is one story about where dependency traffic flows, and which layer caches it.

Read article 26 Jul 2028 · 14 min read

Exam Room · DevOps

How to Produce Auditable Patch Compliance Reports for EC2 Fleets (preview)

Eight hundred EC2 instances, five operating-system flavours, a compliance team that wants to know which hosts are behind on security patches and a change-management team that wants a record of exactly what landed where. AWS has the moving parts to do this without buying tooling, but wiring them into a report someone outside the team will read takes a specific set of decisions about what counts as 'patched', when patching happens, and where the evidence lives.

Read article 24 Jul 2028 · 14 min read

Exam Room · DevOps

How to Share a CodePipeline Artefact KMS Key Across Accounts (preview)

A CodePipeline in a tooling account, artefacts in an S3 bucket in the same account, deploy stages that run CloudFormation in staging and prod accounts. The pipeline is green in the console and red in every deploy account: 'AccessDenied' on the artefact download. The fix isn't another role; it's the customer-managed KMS key that encrypts the bucket contents, and who is allowed to decrypt with it.

Read article 19 Jul 2028 · 15 min read

Exam Room · DevOps

How to Set Up Continuous Audit Evidence Collection Across an AWS Organization (preview)

A compliance team is six weeks out from a SOC 2 Type II audit across sixty AWS accounts. Today their evidence lives in a shared drive full of screenshots and a spreadsheet tracking which control each screenshot is meant to prove. They need the evidence pulled automatically, mapped to the controls the auditor will actually ask about, packaged into a report a human can sign off, and monitored continuously so drift between audits is caught the day it happens.

Read article 17 Jul 2028 · 16 min read

Exam Room · DevOps

EventBridge Pipes vs Glue Lambdas (preview)

A platform team maintains roughly twenty tiny Lambda functions whose only job is to pull events from SQS, drop the ones that don't matter, reshape the rest, and hand them to SNS, DynamoDB, Step Functions, or Kinesis. Each one is forty lines of boilerplate around three lines of value, and each one bills for every invocation whether the event survived the filter or not. AWS has more than one answer for source-filter-target wiring, and the interesting choice is which layer the wiring belongs in.

Read article 12 Jul 2028 · 17 min read

Exam Room · DevOps

How to Build Multi-Account Multi-Region Pipelines With CDK Pipelines (preview)

A platform team wants one pipeline for a microservice that flows dev → staging → prod across us-east-1 and eu-west-1, knows about every target account, gates prod on a human, and updates itself when the pipeline definition changes. CodePipeline with hand-wired CloudFormation actions gets close but has to be rebuilt every time the shape changes. CDK Pipelines generates the whole thing from code, self-mutates on each run, and treats accounts and regions as first-class arguments.

Read article 10 Jul 2028 · 15 min read

Exam Room · DevOps

How to Set Up Blue/Green ECS Deploys With CodeDeploy (preview)

An ECS Fargate platform took eight minutes of elevated 5xx from a rolling deploy that left old and new tasks serving the same path with incompatible contracts. The fix is blue/green via CodeDeploy: two target groups, a test listener for smoke tests, an alarm gate for rollback, and a task definition revision the new task set is pinned to. The rolling controller can't give that shape; the CodeDeploy controller can.

Read article 05 Jul 2028 · 15 min read

Exam Room · DevOps

How to Route Operational Events Across AWS Accounts With EventBridge (preview)

Five AWS accounts emit operational events. Two accounts want to consume them: the observability account for dashboards and the security account for a SOAR pipeline. The events need versioned schemas, replay for the SOAR pipeline, and a dead-letter path. EventBridge custom buses with cross-account targets are the one answer that ticks every box without a control plane of their own.

Read article 03 Jul 2028 · 15 min read

Exam Room · DevOps

How to Deploy a Security Baseline Across an AWS Organization (preview)

A four-person platform team owns a security baseline that has to live identically in fifty AWS accounts, auto-apply to any new account the Org picks up, and shout when someone meddles with it. Logging into fifty consoles is not a plan. CloudFormation StackSets with service-managed permissions is the one answer that ticks every box without a control plane of its own.

Read article 28 Jun 2028 · 13 min read

Exam Room · DevOps

How to Choose a CodeDeploy Config for Lambda With Auto-Rollback (preview)

AWS ships nine predefined CodeDeploy configurations for Lambda. They fall into three families (canary, linear, and all-at-once), and most of the variation is just timing. The interesting part isn't the percentages; it's the alarm gate that turns any of them into a self-rolling-back deploy.

Read article 12 Jun 2028 · 17 min read

Exam Room · Machine Learning

How to Train a Reinforcement Learning Policy on SageMaker (preview)

An energy-storage operator wants to train a policy that decides when to charge and discharge a battery to maximise arbitrage profit across a volatile electricity market. The business problem is straightforward; the ML framing is anything but. Supervised learning needs labelled 'correct decisions' that don't exist; it's a sequential-decision problem under uncertainty, which is what reinforcement learning is for. SageMaker RL exists for exactly this shape of problem, and the interesting work is knowing what it provides, what it doesn't, and how the pieces fit together into something that trains and deploys.

Read article 07 Jun 2028 · 14 min read

Exam Room · Machine Learning

Shadow, Canary, or Linear: Rolling Out a New Model (preview)

A recommendation model has a new version ready to ship. The old one serves 200 RPS of production traffic; the new one has passed offline evaluation but nobody's willing to point the live firehose at it on faith alone. There are three deployment patterns built into SageMaker for exactly this moment (shadow, blue-green with canary, and linear traffic shifting), and they're not interchangeable. Which one fits depends on what kind of confidence you need to build, and what you're willing to spend to build it.

Read article 05 Jun 2028 · 14 min read

Exam Room · Machine Learning

How to Serve Thousands of Per-Tenant Models on One Endpoint (preview)

A personalisation team has trained 4,000 per-tenant churn models, one per customer account, and deploying each to its own endpoint would cost $24,000 a month in minimum-instance floors before a single inference is served. The interesting work isn't picking the lowest-cost row off a price sheet; it's understanding which traffic patterns suit a model-per-endpoint shape, which suit a many-models-shared-instance shape, and how to size whatever we land on so the working set fits in memory and nobody's tenant notices their model had to be paged in.

Read article 31 May 2028 · 16 min read

Exam Room · Machine Learning

How to Train a Classifier on Heavily Imbalanced Data (preview)

A fraud detection model trained on a year of transaction data reports 99.3% accuracy and catches almost no fraud. The training set is 0.4% fraudulent and 99.6% legitimate, and the model has learned the safest possible strategy: predict 'not fraud' almost every time. The correct answer isn't 'try harder'; it's knowing that imbalanced classification has a specific set of techniques, each with a different cost, and picking the one that matches both the data shape and the business cost of each error type.
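
A quick arithmetic sketch of why accuracy misleads on this class balance: with the stated 0.4% fraud rate, a classifier that always answers 'not fraud' already scores 99.6% accuracy while catching nothing. The dataset size of one million transactions is an assumption for illustration; only the 0.4% rate comes from the preview.

```python
# Hypothetical dataset size; only the 0.4% fraud rate comes from the text.
total = 1_000_000
fraud = total * 4 // 1000   # 0.4% fraudulent -> 4,000 transactions
legit = total - fraud       # 996,000 legitimate transactions

# A classifier that always answers "not fraud":
true_positives = 0          # it never flags a fraudulent transaction
true_negatives = legit      # every legitimate transaction is "correct"

accuracy = (true_positives + true_negatives) / total
recall = true_positives / fraud

print(f"accuracy = {accuracy:.1%}, recall = {recall:.0%}")
```

The headline accuracy is set almost entirely by the class balance, which is why recall and precision on the fraud class are the numbers worth watching.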

Read article 29 May 2028 · 16 min read

Exam Room · Machine Learning

How to Right-Size a SageMaker Endpoint With Inference Recommender (preview)

A team is about to ship a fine-tuned language model to production. They have a trained artifact, a sample payload, a latency target, and no idea which of the 25 inference instance types is the correct one. The cheapest fit is probably six times cheaper than the most-expensive-that-works. SageMaker Inference Recommender runs the benchmark so we don't have to, but the interesting work is knowing which question to ask it, reading the results it gives back, and knowing when its answer is wrong for our actual workload.

Read article 24 May 2028 · 13 min read

Exam Room · Machine Learning

When SageMaker Training Compiler Actually Helps (preview)

A Hugging Face transformer is trained for 6 hours on 8 GPUs, three times a week, and the data scientists would like it to be faster. They've already tuned the hyperparameters; the bottleneck is the model itself running through PyTorch. SageMaker Training Compiler promises a 30-50% speedup on exactly this shape of workload by rewriting the training graph before it runs, but the docs hide when it helps and when it doesn't, and a naive drop-in makes some jobs slower. The interesting work is knowing which models and which training shapes it actually accelerates.

Read article 22 May 2028 · 14 min read

Exam Room · Machine Learning

How to Use SageMaker Lineage Tracking to Answer Audits (preview)

A regulator asks a compliance team to prove which training dataset produced the model that scored a specific loan application six months ago. The answer lives across S3 buckets, Git commits, training-job logs, and somebody's Jupyter notebook. SageMaker has a service specifically for this question (Lineage Tracking), but it's not useful as a report after the fact. It's useful as a graph built automatically while the training pipeline runs, and the interesting work is knowing which pieces it captures for free, which we have to add ourselves, and how the graph gets queried when the regulator actually asks.

Read article 17 May 2028 · 15 min read

Exam Room · Machine Learning

How to Add a Custom Metric to SageMaker Model Monitor (preview)

A recommender-system endpoint is drifting in a way the built-in Model Monitor reports don't see. Feature distributions look fine, ground-truth labels arrive too late to catch the regression, and the data quality metrics are all green. The signal that matters is a domain-specific one (the top-10 hit rate against a held-out validation set replayed hourly), and the way to get Model Monitor to watch for it is to teach it your own metric. The interesting work is in what counts as a metric, how the built-in jobs evaluate it, and where the boundary between 'just report the number' and 'also alarm on it' actually sits.

Read article 15 May 2028 · 14 min read

Exam Room · Machine Learning

Choosing Between SageMaker Real-Time Endpoints and Batch Transform (preview)

A customer-churn model is re-scored nightly across 20 million accounts, and a lead-scoring model is called the moment someone submits a form. Same model family, same training data, same accuracy target, and two completely different correct answers about how to deploy it. Real-time endpoints and batch transform jobs have overlapping names and very different bills. Getting the match correct turns out to be less about latency and more about what the downstream consumer actually does with the prediction.

Read article 10 May 2028 · 14 min read

Exam Room · Machine Learning

When To Reach for Async SageMaker Inference (preview)

A contract-review model takes 45 seconds to score a 30-page document, the payload is 8 MB, and the front-end team's XHR times out at 30 seconds every time. Throwing a bigger instance at it makes each call slightly faster but leaves 90% of the expensive GPU idle between requests. SageMaker has a third endpoint type that fits exactly this shape (not synchronous, not a batch job), and its trade-offs explain themselves once the workload is laid out properly.

Read article 08 May 2028 · 16 min read

Exam Room · Machine Learning

XGBoost, LightGBM, or DeepAR: Picking the Right Built-In (preview)

Three forecasting problems on the same desk. Next-quarter fraud rate by product line. Daily SKU-level demand across 4,000 products. Ad-spend to conversion attribution where explanations matter as much as numbers. SageMaker's catalogue has a built-in algorithm for each; the names (XGBoost, LightGBM, DeepAR) are not interchangeable. Each one was built for a particular shape of problem, and the cost of picking the wrong one is what we're trying to avoid.

Read article 03 May 2028 · 17 min read

Exam Room · Machine Learning

How to Deploy a SageMaker Endpoint Across Multiple Regions (preview)

A fraud-scoring model that runs in a single eu-west-1 endpoint earns its first customer outside Europe in Sydney, and then Tokyo, and then São Paulo. Latency goes from 40 ms to 340 ms and the product manager wants to know what we're going to do about it. Multi-region deployment sounds like a single decision; it's actually four decisions stacked (where the model lives, where the endpoints run, how traffic finds the nearest one, and how the weights stay in sync). Each one has options, and the wrong pairing turns a latency win into a disaster-recovery problem.

Read article 01 May 2028 · 16 min read

Exam Room · Machine Learning

Choosing Bedrock or SageMaker JumpStart for a RAG Chatbot (preview)

A product team wants a customer-facing chatbot that answers questions over our own documentation, cites its sources, and doesn't cost a fortune per conversation. Two AWS services have foundation-model in their marketing copy: Bedrock and SageMaker JumpStart. They sound interchangeable; they are not. Neither is better in the abstract. They're built for different shapes of problem, and picking one means committing to what we'd be willing to operate.

Read article 26 Apr 2028 · 16 min read

Exam Room · Machine Learning

How to Build a Multilingual Invoice Pipeline With Textract, Translate, and Comprehend (preview)

A stack of scanned vendor invoices in six languages lands in an S3 bucket every hour. Somebody needs to pull out the supplier name, line-item totals, and the tax rate, translate the freeform notes into English, and route the whole thing to accounts payable. Three AWS services have their names on that job, and each does a slice of it well and the other slices badly. The interesting work is lining up which service answers which question, rather than asking any one of them to do the whole pipeline.

Read article 24 Apr 2028 · 17 min read

Exam Room · Machine Learning

How to Run SageMaker Training Jobs on Spot Capacity (preview)

A nightly training pipeline burns 8 × A100 for four hours per run, which at on-demand prices is enough to have finance asking pointed questions. The training is idempotent, checkpointed every 30 minutes, and has no deadline tighter than 'done by the morning standup.' SageMaker training jobs can run on Spot capacity for roughly a 70% discount, provided the pipeline tolerates interruption, the instance choice is broad enough, and the checkpoint-and-resume logic actually works when AWS reclaims the instance halfway through epoch 7.

Read article 19 Apr 2028 · 15 min read

Exam Room · Machine Learning

How to Roll Out a New SageMaker Endpoint With Auto-Rollback (preview)

A fraud-detection model retrained on six more months of data is ready for production. The last time the team deployed a new version by flipping the endpoint over, it took three days to notice the new model was regressing on a specific merchant segment. They want the next rollout to be observable, reversible, and automated to roll back on its own. SageMaker endpoint variants with shadow testing and traffic shifting are the tool for that, and the shape of the safety is in the deployment-config choices.

Read article 17 Apr 2028 · 13 min read

Exam Room · Machine Learning

When to Use SageMaker Autopilot for Tabular Classification (preview)

A data scientist is staring at a new tabular classification problem: 50 features, 150k rows, balanced classes. Could be XGBoost. Could be CatBoost. Could be a small neural net. Iterating through five model families with hyperparameter tuning by hand is several engineer-weeks. SageMaker Autopilot is the automated alternative: it trains and tunes a set of candidate pipelines, picks the best, and hands back a model plus a notebook showing everything it tried. It's worth understanding what it does automatically, what it doesn't, and when it's the correct first reach.

Read article 12 Apr 2028 · 14 min read

Exam Room · Machine Learning

Building a Churn Model in SageMaker Canvas Without Python (preview)

A business analyst with ten years of domain expertise and zero Python experience wants to build a customer-churn model. They have the data in Redshift, they have a reasonable understanding of which features matter, and they have no one on the data team with cycles to help for the next quarter. SageMaker Canvas is the no-code option that turns that into something they can actually ship; understanding what Canvas does (and what it hides) is the real story.

Read article 10 Apr 2028 · 14 min read

Exam Room · Machine Learning

Choosing a Bedrock Knowledge Base Vector Store (preview)

A team building a Bedrock Knowledge Base for 8,000 product documents is being asked the first infrastructure question of any RAG system: where do the embeddings live? Bedrock offers four backing stores: OpenSearch Serverless, Aurora PostgreSQL with pgvector, Pinecone, and Redis Enterprise Cloud, plus the Neptune-based graph store for graph-RAG. They look interchangeable on a diagram and diverge sharply once you ask about scale, cost shape, and operational model.

Read article 05 Apr 2028 · 18 min read

Exam Room · Machine Learning

Bedrock Agents vs Lambda Orchestration (preview)

A customer-facing assistant needs to look up order status, create a return, check inventory, and talk to the user in a coherent thread across all of it. There are two AWS options for the orchestration: Bedrock Agents, which gives the LLM the keys to call tools and plan multi-step flows; or a Lambda-based state machine where the LLM is one step among many. The trade isn't about capability (both can do the work) but about where the intelligence sits, where the bugs can hide, and who's on call when a user gets a strange answer.

Read article 03 Apr 2028 · 16 min read

Exam Room · Machine Learning

Choosing Between RAG, Fine-Tuning, and Prompting (preview)

An internal-support team wants a chatbot that answers questions using the company's own product documentation, changelogs, and a decade of support tickets. A general-purpose foundation model knows none of it. The team has three levers to make the model look like it knows the domain: retrieval-augmented generation, fine-tuning, and careful prompting. The levers trade against each other on latency, cost, freshness, and how much the model actually internalises; picking between them is more about the nature of the knowledge than the quality of the model.

Read article 29 Mar 2028 · 17 min read

Exam Room · Machine Learning

Choosing Between Glue and SageMaker Processing for Feature Pipelines (preview)

A feature pipeline reads a few terabytes of raw transaction events, joins to customer and merchant reference data, computes rolling windows, and writes a parquet dataset ready for training. AWS has at least two very different ways to run that pipeline: Glue (ETL-shaped, Spark-backed, Data Catalog aware) and SageMaker Processing (ML-shaped, script-backed, in the same job graph as the training). Same work, two tools; the pick is about where the pipeline lives in the team's mental model as much as what it does.

Read article 27 Mar 2028 · 17 min read

Exam Room · Machine Learning

Bringing Your Own Container to SageMaker (preview)

A research team has been running a JAX + Flax model on their own EC2 instances and wants to migrate onto SageMaker for training, hyperparameter tuning, and inference. SageMaker ships first-party containers for PyTorch, TensorFlow, MXNet, Hugging Face, and XGBoost, and nothing for JAX. Building a bring-your-own-container image turns out to be almost the entire migration, and the shape of that container is more constrained than it looks from outside.

Read article 22 Mar 2028 · 14 min read

Exam Room · Machine Learning

How to Profile a Slow SageMaker Training Job (preview)

A training run on a `p4d.24xlarge` is taking twice as long as it did a month ago. The throughput metric says ~50% of expected; nvidia-smi shows GPUs sitting at 40% utilisation instead of the 90%+ they used to hold. Something is starving them, and it could be anything: the data loader, the network, a CPU pre-processing step, IO latency, a single layer allocating too much. SageMaker Debugger's profiler is designed for this exact hunt, and it breaks the training step into parts that let you answer 'where did the time go' rather than 'is it slow.'

Read article 20 Mar 2028 · 16 min read

Exam Room · Machine Learning

Choosing a Distributed Training Strategy When One GPU Isn't Enough (preview)

A model that fit comfortably on one `p4d.24xlarge` six months ago won't fit on its eight A100s today. The dataset has grown, the parameter count has grown, and a single training run takes thirty-six hours if it completes at all. SageMaker supports at least four distributed-training strategies and they scale along different axes: one for when the data doesn't fit, one for when the model doesn't fit, one for when neither fits, and one for jobs that don't really need any of this but would like to be faster.

Read article 15 Mar 2028 · 15 min read

Exam Room · Machine Learning

From Notebook Instances to SageMaker Studio (preview)

A data science team that started with one `ml.t3.medium` SageMaker notebook instance now has fourteen of them, each bookmarked by a different person, each holding state nobody else can see. When a new hire joins, the onboarding doc says 'ask Priya which notebook to use.' SageMaker Studio is sold as the replacement, but the transition isn't notebook-for-notebook; it's a different model of who owns what, where the files live, and how the team shares kernels.

Read article 13 Mar 2028 · 16 min read

Exam Room · Machine Learning

Hosting Seventy Models Behind One SageMaker Endpoint (preview)

Seventy per-merchant fraud models, each trained independently, each needed on demand but almost never all at once. A separate endpoint per model would be an eye-watering monthly bill; squashing them into one model would waste months of per-merchant tuning. SageMaker offers at least three endpoint shapes for hosting many models together, and they look superficially similar until you ask which ones share containers, which ones share hardware, and what happens when a model isn't hot in memory.

Read article 08 Mar 2028 · 15 min read

Exam Room · Machine Learning

How to Tell Which Kind of Model Drift You Have (preview)

A fraud model that's been quiet for nine months starts throwing a different shape of alerts. Recall is down, precision is down, and the feature distributions look off in places that don't match any product change we shipped. 'The model is drifting' is a sentence that means four different things depending on which part of the pipeline moved. SageMaker Model Monitor names the four and gives each one a baseline, a schedule, and a metric, which is the part that matters when it's 3am and a dashboard is red.

Read article 06 Mar 2028 · 22 min read

Exam Room · Machine Learning

Running Edge ML Without SageMaker Edge Manager (preview)

Two hundred industrial cameras run defect detection on factory floors, steel-hulled ships, and remote mining sites, over patchy satellite uplinks. Inference has to happen locally and sync to the cloud when the network cooperates. The textbook answer for this shape used to be SageMaker Edge Manager. That service reached end-of-life on 26 April 2024. Working out what replaces it is the real scenario.

Read article 01 Mar 2028 · 21 min read

Exam Room · Machine Learning

Picking the Right Auto-Scaling Signal for GPU Endpoints (preview)

A SageMaker real-time endpoint serves an image classifier on a GPU instance. Auto-scaling is wired to CPU utilisation at a 70% target. When traffic spikes, latency spikes with it, and the scale-out fires several minutes after the queue has already blown out, because the GPU saturates long before the CPU does. The fix is choosing a signal that actually tracks demand.

Read article 28 Feb 2028 · 21 min read

Exam Room · Machine Learning

Governed Model Promotion on SageMaker (preview)

A regulated-industry ML team promotes models by copying S3 files and hoping. Legal wants a signed human approval, auditors want the holdout metrics on file, engineering wants to know which dataset trained the thing, and when production catches fire somebody wants a way back to whatever was running yesterday. Four requirements, one promotion gate, and a clear picture of which corner of AWS is built for this and which corners are almost-but-not-quite.

Read article 23 Feb 2028 · 19 min read

Exam Room · Machine Learning

Hosting a 70B LLM with Business-Hours Traffic (preview)

A fine-tuned Llama 3 70B, a traffic pattern that sits idle overnight and peaks at lunch, and three AWS hosting paths that look similar in the console. One is provisioned-only and wrong the second the day ends. One is full BYO and a quarter's worth of work. The middle path is the one that reads the traffic shape correctly.

Read article 21 Feb 2028 · 14 min read

Exam Room · Machine Learning

How to Label 500K Images with 100 Hours of Human Time (preview)

Half a million product images, a hundred hours of human annotation budget, and a requirement for bounding boxes around every product. Labelled by hand at roughly ten seconds per image, that's close to 1,400 hours of work, and the budget covers less than a tenth of it. The trick is to let a model label the easy ones and save the humans for the uncertain edges, then work out whether the numbers survive contact with reality.
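
The budget arithmetic can be sketched in a few lines. This reads the 'ten seconds per image' figure as the time one hand-drawn labelling pass takes, which is an assumption about the preview's wording; the image count and hour budget come from the text.

```python
images = 500_000
budget_hours = 100
seconds_per_image_by_hand = 10   # assumed reading of "ten seconds per image"

# Human time available per image if every image were labelled by hand.
budget_seconds_per_image = budget_hours * 3600 / images

# Total human time needed to label everything by hand.
hours_needed_by_hand = images * seconds_per_image_by_hand / 3600

# Share of the full hand-labelling job that the budget pays for.
fraction_covered = budget_hours / hours_needed_by_hand

print(f"budget per image: {budget_seconds_per_image:.2f} s")
print(f"hand-labelling everything: {hours_needed_by_hand:.0f} h")
print(f"budget covers {fraction_covered:.1%} of that")
```

Under these assumptions the humans can only touch a small slice of the images, so the split between model-labelled and human-labelled work is what decides whether the plan holds.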

Read article 16 Feb 2028 · 16 min read

Exam Room · Machine Learning

Orchestrating a Nightly Model Retrain (preview)

A nightly XGBoost retrain, a conditional push to the Model Registry if AUC clears a bar, a deploy Lambda downstream, and a hard requirement to know which dataset produced which model. Five orchestrators can run that DAG. Only one of them tracks the lineage for free.

Read article 14 Feb 2028 · 14 min read

Exam Room · Machine Learning

How to Avoid Train/Serve Skew with SageMaker Feature Store (preview)

A fraud model trains on historical data overnight and scores transactions in real time under a 50 ms budget. The features it needs are identical in both paths, and if the two paths compute them differently, the model gets train/serve skew and quietly loses accuracy. SageMaker Feature Store solves this by backing one feature definition with two stores: one for training, one for serving.

Read article 09 Feb 2028 · 15 min read

Exam Room · Machine Learning

Choosing a SageMaker Hyperparameter Tuning Strategy (preview)

An XGBoost fraud model, twenty hyperparameters, a two-hundred-job tuning budget, and twenty-five minutes per run. SageMaker Automatic Model Tuning ships four search strategies for exactly this shape of problem, and picking the correct one is worth roughly a 4x cut in wall-clock time for the same final AUC.

Read article 07 Feb 2028 · 13 min read

Exam Room · Machine Learning

Choosing a SageMaker Inference Mode (preview)

SageMaker offers four ways to serve a model. Pick by latency and you'll get the answer wrong. The interesting axes here are payload size, processing time, and arrival pattern, and three of the four options exclude themselves before you've measured anything.

Read article 19 Jan 2028 · 15 min read

Exam Room · Data

Choosing Between RA3, DC2, and Spectrum for Redshift Storage (preview)

A Redshift cluster that's been running DC2 nodes since 2019, a growing data lake in S3 the warehouse wants to reach into, and a new team that wants Redshift Serverless without any of the node-sizing conversation. RA3 with managed storage, DC2 with local storage, and Spectrum reading S3 directly are three different answers to 'how are compute and storage related in Redshift', and each makes a different trade.

Read article 17 Jan 2028 · 16 min read

Exam Room · Data

Tuning Redshift WLM for Mixed Workloads (preview)

A Redshift cluster where short dashboard queries wait behind five-hour ETL jobs every morning, a WLM queue configuration that's been default for eighteen months, and a DBA who wants to make the ETL stop starving the dashboards without making the ETL slower. WLM is Redshift's workload-management layer: queues, slots, memory, and the rules that route queries between them.

Read article 12 Jan 2028 · 14 min read

Exam Room · Data

Choosing the Right Glue Job Shape for the Workload (preview)

A Glue bill that's become the biggest line item in the data platform. Half the jobs are transforming hundred-gigabyte datasets; the other half are bouncing three hundred rows through a five-minute script. They're all the same job shape because that's what got written first. The work is figuring out which workloads match which shape on the Glue-job landscape, and where the wrong shape has been paying a cluster premium for work that doesn't need one.

Read article 10 Jan 2028 · 17 min read

Exam Room · Data

Step Functions or Glue Workflows for Nightly ETL (preview)

A nightly ETL with eleven Glue jobs, dependencies between them, three of which can run in parallel, one that has to retry on transient failures, and one that requires human approval before it writes to production. Step Functions and Glue Workflows both orchestrate Glue jobs; the shape of the work determines which is the correct fit and which tries to fit but won't.

Read article 05 Jan 2028 · 14 min read

Exam Room · Data

Choosing a Managed SaaS Connector for the Warehouse (preview)

A Salesforce tenant with thirty objects that need to land in the warehouse daily, a Zendesk instance adding another fifteen, and a team that's been writing a bespoke Python script for each new source. Bespoke scripts aren't sustainable; the harder call is which shape of managed ingestion matches what we actually need, what the trade-offs look like between configured connectors and code, and where each option breaks down when the source isn't quite as well-behaved as the brochure assumes.

Read article 03 Jan 2028 · 14 min read

Exam Room · Data

How to Diagnose a Slow DMS Replication Task (preview)

A DMS full-load task that's been running for thirty-six hours with sixteen hours to go, on tables that shouldn't take half that. The replication instance is showing CPU headroom and memory to spare, but the throughput graph is a flat line. DMS tuning is a set of knobs that affect different parts of the pipeline; the work is figuring out which part is the bottleneck and which knob moves it.

Read article 22 Dec 2027 · 16 min read

Exam Room · Data

How to Tier OpenSearch Shards Across Hot, Warm, and Cold (preview)

Two years of application logs in OpenSearch, most of them never queried, all of them costing the same per GB per month. Storage tiering is how that bill gets sensible without losing the ability to search yesterday's and last year's data when it matters. The work is matching the access pattern of each age of data to the tier that serves it.

Read article 20 Dec 2027 · 15 min read

Exam Room · Data

Choosing Between QuickSight SPICE and Direct Query (preview)

A QuickSight dashboard with three hundred daily viewers, backed by a Redshift cluster that groans under the concurrent load every Monday morning. SPICE is QuickSight's in-memory columnar engine that caches data so dashboards don't hit the warehouse for every view. Direct query bypasses SPICE and reads live. The decision isn't 'which is faster' (SPICE usually is) but which trade-offs each makes and which matches the dashboard's actual freshness and scale.

Read article 15 Dec 2027 · 19 min read

Exam Room · Data

Choosing a Change Data Capture Pipeline on AWS (preview)

A Postgres order-book that needs to feed the warehouse, with every insert, update, and delete captured in near-real-time. Two shapes of pipeline on AWS can carry change data forward: one point-to-point, one fan-out through a shared log. Picking between them comes down to how many consumers the CDC stream has to feed, and what happens when a new consumer wants in.

Read article 13 Dec 2027 · 17 min read

Exam Room · Data

How to Clean a Messy CSV Without Writing Code (preview)

A CSV of lead records from a trade show, a finance analyst who needs to normalise phone numbers and deduplicate contact emails before Monday, and a data engineering team that's three weeks deep in a warehouse migration. The work is figuring out who does the cleanup, what tool meets a non-programmer where she is, and whether the 'one-off' label hides a recurring job that justifies a repeatable recipe rather than a Spark script.

Read article 08 Dec 2027 · 17 min read

Exam Room · Data

How to Govern Glue Catalogs Across Four Accounts (preview)

Four Glue catalogs across four accounts, an Excel spreadsheet of dataset owners that nobody keeps current, and a finance analyst who's joined her company's customer table against a sales table she wasn't supposed to read. The catalog is everywhere and the governance is nowhere. The work is figuring out what a governance layer above the catalogs actually has to do (discovery, ownership, access workflow, project isolation) and which approach on the landscape fits the shape we have.

Read article 06 Dec 2027 · 17 min read

Exam Room · Data

How to Add Data Quality Gates to a Glue Pipeline (preview)

A Glue job that silently loaded a malformed CSV last Tuesday and nobody noticed until an analyst asked why Q3 revenue looked odd. The fix isn't a bigger dashboard; it's an upstream gate that refuses to promote data to the warehouse unless a stated set of expectations holds. Adding data quality checks is obviously the right call; the harder part is deciding what to assert, where in the pipeline the assertions should run, and what should happen when an assertion fails.

Read article 01 Dec 2027 · 16 min read

Exam Room · Data

EventBridge Pipes or a Lambda for Filter-and-Enrich (preview)

A DynamoDB stream that needs to feed an EventBridge bus, with a filter that drops 80% of the events and an enrichment that adds customer-segment data. The classic answer is a function full of boilerplate around two meaningful transformations. The work is figuring out where on the integration landscape filter-then-enrich-then-fan-out can be expressed as configuration rather than code, and what trade-offs come with each option.

Read article 29 Nov 2027 · 17 min read

Exam Room · Data

Choosing Between Iceberg, Delta Lake, and Hudi (preview)

A pile of Parquet in S3, an Athena catalog that knows about partitions but nothing about versions, and an analyst who keeps asking 'what did the table look like yesterday?'. Three open table formats have answers: Iceberg, Delta Lake, and Hudi. Each answers a subtly different question. The work is figuring out which question we're actually asking before picking.

Read article 24 Nov 2027 · 18 min read

Exam Room · Data

How to Enforce Schemas on a Kinesis Stream (preview)

Twelve producer services feed a Kinesis Data Stream read by eight downstream pipelines. Last week a producer quietly added a required field; three consumers crashed, one silently dropped the records, four carried on. The fix isn't a wiki page telling producers to be careful; it's a schema the producer has to register before it can serialise, a version ID embedded in every record, and a compatibility rule the registry enforces the moment the schema is published. Move the contract out of the heads of twelve teams and into a system that rejects a breaking change before a byte lands on the stream.

Read article 22 Nov 2027 · 15 min read

Exam Room · Data

How to Scale Kinesis Reads Across Many Consumers (preview)

A 50-shard Kinesis Data Stream sprouted an eighth downstream consumer and immediately broke the six already reading it. Every GetRecords started throwing ProvisionedThroughputExceededException. Every consumer thought it had the stream to itself; Kinesis knew otherwise. The fix means walking the read-side landscape, understanding what shard-level read throughput really caps, and weighing which consumers earn dedicated capacity and which don't.

Read article 17 Nov 2027 · 15 min read

Exam Room · Data

How to Enforce Column and Row Access on a Data Lake (preview)

A 12 TB S3 data lake, 40 Glue tables, and an IAM policy that says yes or no to whole prefixes. The data platform team wants analysts to read the sales tables, but not the commission column. Regional analysts in EMEA must see only EMEA rows. Every query must land in an audit log that the risk team can subpoena. Closing the gap means walking the data-lake access-control landscape and finding a policy surface that speaks the language of tables, columns, and rows, and that every engine consults before a single byte is returned.

Read article 15 Nov 2027 · 15 min read

Exam Room · Data

How to Cut Athena Scan Costs with Partitioning (preview)

A finance team's Athena queries are scanning eight terabytes of CSV every time and costing forty dollars a pop, fifty times a day. The data they actually want is one day's worth, about thirty gigabytes. Closing the gap means understanding the Athena cost model, the difference between a registered partition and a projected one, and what columnar encoding does on top of partition pruning.

Read article 10 Nov 2027 · 16 min read

Exam Room · Data

How to Ingest Telemetry from 50,000 IoT Sensors (preview)

A fleet of 50,000 IoT sensors pushing one-kilobyte events once a second needs a real-time Lambda consumer, a seven-day replay window, external Java consumers, and Parquet-in-S3 for analytics. No single AWS streaming service ticks all four boxes cleanly; picking well means walking the streaming-ingestion landscape and weighing which pieces buy back the ones that don't fit.

Read article 08 Nov 2027 · 15 min read

Exam Room · Data

Choosing Athena, Glue, or EMR for Parquet on S3 (preview)

A product analytics team with five terabytes of Parquet in S3, a handful of ad-hoc questions a week, and zero appetite for a Spark cluster. Three AWS services could answer them (Athena, Glue, EMR), and only one of them charges nothing between query bursts. Finding the correct fit means walking the whole analytics-on-S3 landscape and understanding where each service is cheap, where each is wasteful, and where each is flat-out wrong.

Read article 20 Oct 2027 · 18 min read

Exam Room · Operations

How to Inject Secrets and Scope Task Roles in ECS (preview)

A container image rebuilt from the Dockerfile in a git branch, shipped yesterday, with the database password in an environment variable. A task definition whose IAM role grants too much because nobody's audited it. A third-party API key passed via ECS environment variables in plain text. Three ways ECS workloads hold secrets, two of them wrong. Secrets management is a given; what's worth working through is how the task role and secrets injection fit together, which one carries which credentials, and what an audit should be able to see.

Read article 18 Oct 2027 · 17 min read

Exam Room · Operations

How to Migrate EC2 Fleets to the Unified CloudWatch Agent (preview)

An old EC2 fleet still running the legacy CloudWatch Logs agent. A newer fleet running the unified CloudWatch agent with metrics and logs. A Prometheus exporter sidecar on some hosts. Three agents, two configuration files per host, and documentation that contradicts itself on which is preferred. The unified agent is plainly better; the work is figuring out what each legacy piece was doing, what migrating feels like, and where the unified agent's knobs sit once it's installed.

Read article 13 Oct 2027 · 16 min read

Exam Room · Operations

Choosing Between CloudWatch, Grafana, and Datadog Dashboards (preview)

CloudWatch dashboards in production, tweaked from three tutorials; Grafana in staging, stood up by a team that wanted to plot alongside Prometheus data; a Datadog subscription for the parts nobody wants to move off. Three dashboarding surfaces, three bills, three user populations. They all draw graphs, so 'which tool is better' isn't a useful frame; what matters is which question each surface is best for answering, and where staying on two surfaces is cheaper than consolidating.

Read article 11 Oct 2027 · 14 min read

Exam Room · Operations

How to Search 45,000 Resources Across 22 AWS Accounts (preview)

'Find me every S3 bucket tagged Env=prod across our accounts and Regions.' Thirty tabs later, a half-finished spreadsheet, and a nagging feeling one Region got missed. An estate of 22 accounts and 15 Regions has made the Describe-and-diff approach untenable. There are five ways to search AWS; what's worth working out is which one gives an answer in seconds without running a scheduled crawler or building a search index.

Read article 06 Oct 2027 · 15 min read

Exam Room · Operations

How to Act on AWS Health Events Automatically (preview)

An EBS volume is degraded in one availability zone. A TLS certificate on an older API expires in two weeks. A planned maintenance window will reboot an RDS instance tomorrow night. AWS knows all three; the team finds out from the volume's first I/O error, the certificate warning in Slack, and a post-mortem. 'Get better monitoring' is too vague to act on. The specific work is turning AWS's own health signal into a first-class input that reaches the right people before the incident does.

Read article 04 Oct 2027 · 17 min read

Exam Room · Operations

How to Turn Structured Logs Into CloudWatch Metrics (preview)

A service that logs structured JSON but publishes no CloudWatch metrics. A dashboard of graphs that each pull from Logs Insights queries, taking 30 seconds to load. An on-call rotation that correlates 'error rate' with 'deploy time' by squinting. Adding metrics is the obvious move; the work is deciding where they should come from: a new instrumentation layer in the code, a CloudWatch agent, or the log lines that are already being written.

Read article 29 Sep 2027 · 18 min read

Exam Room · Operations

SNS Access Policy vs Filter Policy (preview)

An SNS topic that half the org has permission to publish to. Eighteen subscribers, each receiving every message, each filtering on their own side with hand-written if-statements. A service that joined last month and is getting flooded with events it doesn't care about. SNS is the correct tool here; what's worth pinning down is which policy answers 'who can publish' and which answers 'who actually receives this specific message.'

Read article 27 Sep 2027 · 19 min read

Exam Room · Operations

How to Prove Backup Compliance With AWS Backup Audit Manager (preview)

An auditor arrives with a simple question: show me that every RDS instance in production was backed up last night. The on-call engineer opens the console, picks the first instance, clicks Backups, takes a screenshot, and moves to the second. Four hours later they're still going. Whether the backups ran is barely in doubt; they almost certainly did. The work is getting the evidence from 'true but unprovable' to 'exportable in a minute' without an engineer ever touching a console.

Read article 22 Sep 2027 · 16 min read

Exam Room · Operations

EC2 Image Builder vs Packer (preview)

A golden AMI rebuilt by a shell script on a bastion host, last updated seven months ago. A CVE announcement this morning that requires a patched kernel across the fleet. A team that wants to adopt Packer 'because HashiCorp.' Image Builder versus Packer in the abstract isn't a useful frame; both build images. What matters is which one owns the pipeline, which owns the recipe, and what it costs when the choice is wrong.

Read article 20 Sep 2027 · 15 min read

Exam Room · Operations

How to Fan Out CloudWatch Logs to Real-Time Consumers (preview)

A security team wants every `AccessDenied` line copied into a SIEM within two minutes. A product team wants a near-real-time stream of structured business events into Kinesis. An ops team wants stack traces routed to an incident bot. Three consumers, one source: CloudWatch Logs, already holding the data. Getting logs out of CloudWatch isn't the problem (there are five ways). The work is figuring out which shape fits a line-at-a-time fan-out, which fits bulk archival, and where subscription filters earn their keep.

Read article 15 Sep 2027 · 15 min read

Exam Room · Operations

How to Monitor Service Quotas Before They Bind (preview)

A deploy fails at 03:14 because the account hit its `RunInstances` quota. A Lambda concurrency limit triggers throttles on a spike the team saw coming. An ops engineer filed a quota increase in the console and the ticket sat unassigned for three days. Quotas are soft limits the organisation should be steering; today they are hard walls discovered at 3 AM. Raising limits isn't the missing piece (AWS has an API for that). The work is choosing which limits to track, automating the request path, and pinning down where the responsibility for 'we're about to hit a ceiling' sits.

Read article 13 Sep 2027 · 17 min read

Exam Room · Operations

Picking a Scheduler on AWS (preview)

A nightly report that needs to fire at 02:00 local to each Region. A billing batch whose steps must run in sequence, with retries and branches. A per-customer reminder that should fire exactly 24 hours after signup. Three scheduling shapes, three very different tools, and a lot of code that reaches for cron on an EC2 instance because that's what the team knows. 'Which scheduler' isn't the right frame; the work is figuring out which scheduling shape each tool was actually built for, and what happens when you use the wrong one.

Read article 08 Sep 2027 · 17 min read

Exam Room · Operations

How to Catch Customer-Facing Outages With Synthetics Canaries (preview)

A checkout flow has gone down three times this year. Each time, the first page to notice has been Twitter. ELB health checks were green; CloudWatch metrics looked normal; the alarm that finally fired was disk-space on an unrelated host. More alarms aren't the missing piece; we have plenty. What's missing is a test that would have caught each failure before the customers did, and a way to run that test forever without paying someone to watch it.

Read article 06 Sep 2027 · 18 min read

Exam Room · Operations

Trusted Advisor vs Compute Optimizer (preview)

A new FinOps lead asks 'what's Trusted Advisor telling us?' Two hours later they ask the same question about Compute Optimizer. Both services sit in the console with 'save money' in their pitch, both read usage data, and both produce recommendations. They are not the same thing. Enabling both is almost certainly right; the work is knowing which class of advice each one gives, and how to stop the two services from contradicting each other on the same instance.

Read article 01 Sep 2027 · 17 min read

Exam Room · Operations

How to Roll Out Config Conformance Packs Across an Org (preview)

Twenty-two member accounts, forty compliance rules, and a security team that wants one report across all of it. Bolting Config rules on per-account is how the previous regime got to a spreadsheet that nobody trusted. Config is obviously the right tool; what's worth working through is what a conformance pack buys over the raw rules, where it lives in the org, and how remediation fits into something that was designed to only observe.

Read article 30 Aug 2027 · 17 min read

Exam Room · Operations

Run Command or Runbook (preview)

A one-shot patch across 400 instances. A four-step release pipeline with approvals. A nightly log-rotation job that needs to run everywhere. Three operational jobs, two Systems Manager features, and a lot of blog posts that treat them as interchangeable. Neither is 'better'; they solve different problems, so the work is figuring out which shape of job belongs in Run Command, which belongs in an Automation runbook, and what we give up when we pick the wrong one.

Read article 25 Aug 2027 · 23 min read

Exam Room · Operations

How to Warm Up and Drain Instances With Lifecycle Hooks (preview)

A fleet behind an ALB scales on CPU. When new instances launch they take traffic instantly, before their JVM has settled, before their caches are warm, before Consul has a chance to register them, and users see 2-3 minutes of elevated errors. When instances terminate they lose whatever is in-flight and drop whatever is queued. Two symptoms, one missing mechanism: a pause on the way into service and a pause on the way out, held open by Auto Scaling lifecycle hooks while the application does the work the schedule didn't allow for.

Read article 23 Aug 2027 · 18 min read

Exam Room · Operations

How to Quiet Noisy Pagers With Composite Alarms (preview)

CPU spikes at 02:00 every night because the nightly backup does what nightly backups do. Latency spikes when the downstream payments API has a wobble. Error rate occasionally prints a one-off 5xx for reasons nobody can ever reproduce. Each signal alone pages the on-call. The on-call stops answering. A real incident (CPU climbing, latency climbing, errors climbing) arrives and looks like yet another cry-wolf. The fix is two layers: adaptive per-metric alarms that learn the service's daily pattern, and a composite alarm that only pages when all three agree.

Read article 18 Aug 2027 · 16 min read

Exam Room · Operations

How to Replace SSH Bastions With Session Manager (preview)

Three bastion hosts, shared SSH keys rotated quarterly, CloudTrail that records a login but not a keystroke. Security wants all of it gone: no inbound SSH, no shared secrets, per-engineer shell audit with input and output captured. Session Manager is the AWS-native answer, once you know which IAM role sits on the instance, which on the engineer, which document shapes the session, and which SCP keeps the whole org honest.

Read article 16 Aug 2027 · 17 min read

Exam Room · Operations

How to Patch a Fleet With Systems Manager Patch Manager (preview)

800 EC2s across four regions, 150 on-prem VMs in two data centres, three operating systems, three engineers, and a SOC2 auditor. SSH-ing into 950 hosts isn't a plan. Systems Manager Patch Manager is; knowing which pieces do which job is the skill.

Read article 11 Aug 2027 · 16 min read

Exam Room · Operations

How to Triage a 5xx Spike With CloudWatch Logs Insights (preview)

The pager goes at 03:00 for elevated 5xx on a public ALB. Logs are already in CloudWatch. The question is which path, which target group, which clients, and whether the spike correlates with anything — answered in minutes, from the console, without standing anything new up. CloudWatch Logs Insights is the tool the scenario is pointing at; getting the query correct is the skill it's testing.

Read article 26 Jul 2027 · 20 min read

Exam Room · Development

Lambda URL or HTTP API (preview)

A single webhook receiver that needs an HTTPS URL, nothing more. A small service with three routes that could do without API Gateway's feature list. A pair of endpoints with different authorisers, rate limits, and Lambda targets. Three different service shapes, two very similar AWS surfaces, and enough overlap that the wrong choice costs a little money today and a lot of migration work next year. Both options are cheap; what decides it is which fits the shape, and when 'just a URL for a Lambda' is genuinely enough.

Read article 21 Jul 2027 · 21 min read

Exam Room · Development

Choosing an API Gateway Authoriser (preview)

A public API open to the internet for a consumer-facing mobile app. A partner API behind an API key with usage plans. An internal API reachable only by other AWS services in the same account. A machine-to-machine API where a bank's back-office calls ours with a signed JWT. Four authorisation shapes, a handful of API Gateway authoriser types, and surprisingly different trade-offs depending on which mix lands on the same gateway. Different authorisers are right for different callers; the work is matching authoriser to caller profile, and seeing why IAM-auth isn't quite what the phrase suggests.

Read article 19 Jul 2027 · 25 min read

Exam Room · Development

How to Add Sidecars to an ECS Task (preview)

An application container that needs structured logs shipped somewhere other than stdout. A service-mesh proxy that must be co-located with the app to handle mTLS. A secrets-injection helper that reads from Parameter Store and writes to a shared volume. Three common sidecars, one ECS task definition, and a handful of primitives (container definitions, dependsOn, essential, shared volumes, link modes) that decide whether the task behaves the way you expect. Sidecars work fine on ECS; the work is knowing which primitives hold the pattern together, and which corner cases bite.

Read article 14 Jul 2027 · 13 min read

Exam Room · Development

How to Filter SNS Messages per Subscription (preview)

A single SNS topic carrying every event the platform emits. A subscriber that cares only about high-priority customer events. Another subscriber that needs events for one specific region. A third that wants everything except test traffic. The naive answer is a topic per filter; the better answer is one topic with SNS message filter policies, and the details of attribute-based matching, message-body filtering, and the 100-subscription limit repay a careful read before you commit.

Read article 12 Jul 2027 · 15 min read

Exam Room · Development

How to Handle Errors in a Step Functions Workflow (preview)

A state machine that calls a flaky third-party API. A workflow that must roll back partial work if a later step fails. A long-running process whose failures should reach a human reviewer rather than vanish into logs. Three failure shapes, one state machine language, and a set of error-handling primitives worth walking through before relying on them. Step Functions can catch errors; the work is figuring out which primitive handles which shape, and what the workflow would have to look like to stop thinking about errors.

Read article 07 Jul 2027 · 13 min read

Exam Room · Development

CloudFront Signed URLs or Signed Cookies (preview)

An e-book download link that should work once, for one customer, for ten minutes. A premium video library where the whole archive should open up to paying subscribers but stay closed to everyone else. A firmware update a device fetches directly by serial number over HTTPS. Three access-control shapes on private content served through CloudFront, and two different mechanisms for authorising them. Both are secure; the work is matching the shape to the mechanism, and seeing what the consumer would have to look like in each case.

Read article 05 Jul 2027 · 19 min read

Exam Room · Development

How to Host Private Packages in CodeArtifact (preview)

An internal TypeScript library that four services already depend on. A Python package the data team wants to share. Every build pulling open-source dependencies straight from npmjs.com and PyPI, trusting whatever got published last. A compliance deadline says nothing goes through unvetted public registries. CodeArtifact is AWS's answer, and the clean pattern, upstream proxy plus internal-publish repository, covers more than most teams first realise.

Read article 30 Jun 2027 · 16 min read

Exam Room · Development

How to Extend CloudFormation with Custom Resources or Constructs (preview)

A CloudFormation stack that needs to provision a third-party SaaS resource on every deploy. A CDK team that wants 'our standard service' to become a single line in every new stack. A resource AWS doesn't model natively but we need to manage with the rest of our infrastructure. Three problems, two mechanisms, and a lot of confusion about which is which. CloudFormation has to be extended either way; the work is sorting out which tool is doing which job, because they solve genuinely different things.

Read article 28 Jun 2027 · 14 min read

Exam Room · Development

How to Pick Between Env Vars, Parameters, and Secrets (preview)

A Lambda that needs to know the name of its DynamoDB table. A Lambda that reads a feature-flag threshold that changes weekly. A Lambda that must authenticate to a third-party API with a key that rotates monthly. Three pieces of configuration, three different lifetimes, three different secrecy requirements. AWS gives us three places to put them and enough overlap that they get used interchangeably. Each option is correct for something; what matters is matching configuration to shape, and knowing what each choice costs.

Read article 23 Jun 2027 · 13 min read

Exam Room · Development

S3 Event Notifications or EventBridge (preview)

An image-processing pipeline that needs to kick off a thumbnail Lambda on every upload. A compliance service that cares about every object mutation (creates, deletes, replication events, lifecycle expirations) across fourteen buckets. S3 has two event delivery mechanisms that look almost identical at first and differ in ways that matter. Both are correct ways to wire up S3 events; what matters is which shape each fits, and what happens when you accidentally enable both.

Read article 22 Jun 2027 · 11 min read

The Greenbox Story · What Came Next

Coaching at Scale: Charlotte's Pattern (preview)

Charlotte has done this six times now. Help a team grow up, teach them the practices, watch them succeed, and move on. The longest engagement she's ever had is over. She opens her notebook to a fresh page.

Read article 21 Jun 2027 · 23 min read

Exam Room · Development

Rotating the Database Password (preview)

A Postgres master password that's been the same since the project started. A compliance deadline that says 'rotate every 30 days, no exceptions.' An application fleet of eighteen services that all read the credential at startup. Rotation isn't in doubt, it has to happen, so the work is rotating without taking the fleet down, and understanding why Secrets Manager's rotation Lambda works the way it does.

Read article 16 Jun 2027 · 17 min read

Exam Room · Development

AppSync or Apollo on Lambda (preview)

A mobile client team that wants one endpoint and a strong schema. A backend team whose data is spread across DynamoDB, Aurora, and three internal HTTP services. A requirement for real-time subscriptions when a record changes. The shape calls for GraphQL; the question is whether to lean on AppSync's managed resolvers or a Lambda running Apollo behind API Gateway. Both can serve the same query; they pay very different tolls.

Read article 15 Jun 2027 · 12 min read

The Greenbox Story · What Came Next

Knowledge Transfer: Three Generations (preview)

Dave Morrison's family has farmed the same land since 1962. His son Ben runs the Greenbox farm programme. His grandson helps stack crates on Saturday mornings. The 'local' promise that started with a handshake now has a corporate contract behind it.

Read article 14 Jun 2027 · 16 min read

Exam Room · Development

EventBridge, SQS, or SNS for Fan-Out (preview)

A payment-captured event that needs to update four internal services. An order-placed event that kicks off five independent downstream workflows. A deeply integrated SaaS that emits events we want to route by schema. Three fan-out shapes, three services with overlapping jobs, and enough similarities that it's easy to pick the wrong one. None of the three services is best on its own; each is best for something, so the work is matching each fan-out shape to a service, and seeing what the consumers would have to look like in each case.

Read article 09 Jun 2027 · 19 min read

Exam Room · Development

How to Trace Distributed Requests with X-Ray (preview)

A p99 latency that climbed from 200ms to 2 seconds last week, and nobody knows where. Logs in fourteen different CloudWatch log groups, each with its own request ID scheme. An API call that passes through a Lambda, a Step Functions state machine, two DynamoDB tables, and a third-party HTTP vendor before it returns. The problem is not a lack of data; the problem is that the data is scattered. Tracing is the obvious move; the work is in which pieces of the request chain carry a correlation ID forward natively, which ones drop it, and what the application code has to do to make the gaps close.

Read article 08 Jun 2027 · 10 min read

The Greenbox Story · What Came Next

Engineering Leadership: Tom in a Suit (preview)

Tom's title is now 'VP Engineering, Greenbox Division, Hartland Group.' His calendar has a recurring meeting called 'Change Advisory Board.' He misses the days when deploying meant SSH-ing into a server from his laptop.

Read article 07 Jun 2027 · 15 min read

Exam Room · Development

How to Shape API Gateway Traffic with Throttles, Quotas, and Usage Plans (preview)

A public API whose downstream can't survive a burst. A free tier that should top out at 1,000 calls per day without turning paying customers away. A partner integration whose contract says a thousand requests per second. Three traffic-shaping jobs, three different knobs in API Gateway, and enough overlap that it's easy to configure the wrong one. There isn't one 'the' throttle; there are at least four, so the work is figuring out which knob protects what, and what the traffic would have to look like to reach each one.

Read article 02 Jun 2027 · 16 min read

Exam Room · Development

Cache-Aside or Write-Through (preview)

A product-catalogue API returning the same items millions of times, an inventory counter the warehouse is constantly adjusting, and a leaderboard that must be strictly consistent on read. Three workloads, three read-write shapes, and four different caching patterns to pair them with. Each pattern is best for something; the harder call is which one matches which read-write shape, and what the application would have to tolerate to pick each one.

Read article 01 Jun 2027 · 7 min read

The Greenbox Story · What Came Next

Founder Transition: Maya's Next Thing (preview)

Six months after the partnership closed, Maya is sitting in a different cafe, with a different notebook, and a familiar itch. The produce box was never really about vegetables.

Read article 31 May 2027 · 19 min read

Exam Room · Development

SAM, CDK, or Serverless Framework (preview)

A Lambda-heavy microservice that needs to ship today, a platform team deploying a hundred services next quarter, and a multi-cloud greenfield product trying to stay portable. Three ownership shapes, three different definitions of 'infrastructure as code', and three tools that each solve a real problem. Each one is best for something; what matters is which tool matches which ownership shape, and what the team would have to believe about the future to pick each one.

Read article 29 May 2027 · 18 min read

The Greenbox Story · New Ground

CoPs Across the Merge (preview)

Hamish walks into Thursday in July, sits through forty minutes, and leaves feeling like he has trespassed. Priya figures out, sitting on a step outside the office, why the original Thursday is not a thing anyone can import, and what can be, instead.

Read article 26 May 2027 · 15 min read

Exam Room · Development

Step Functions Standard vs Express (preview)

A fintech runs two Step Functions workloads on the same flavour and wonders why the bill is what it is. One is an IoT data-processing flow firing about 800 times a second, short, idempotent, nobody looks at it after 24 hours. The other is an account-opening workflow that waits days for human approvals, survives weeks of KYC back-and-forth, and must be inspectable by auditors long after it finishes. One wants Express. One wants Standard. Picking the correct flavour per workload is a two-order-of-magnitude cost decision and a very different durability contract.

Read article 25 May 2027 · 19 min read

The Greenbox Story · New Ground

Post-Merger Integration: Where Acquisitions Go to Die (preview)

Greenbox bought Harvest Box for $800K and got 3,000 Sydney subscribers, a codebase nobody understood, and eight employees who weren't sure they wanted to stay. Six months later, half the team had left, the systems were still running in parallel, and the team was learning why 70% of acquisitions fail to deliver their expected value.

Read article 24 May 2027 · 14 min read

Exam Room · Development

DynamoDB Streams for Reliable Fan-Out (preview)

A deposit is recorded in DynamoDB. The application then writes an audit entry to S3, pushes a document to OpenSearch, and emits an event to EventBridge for the fraud team. Most of the time, all four places agree. Occasionally one of the three followers silently fails, the balance stands alone, and three days later a reconciliation job reports a ledger that says one thing and an audit log that says another. The fix is to stop doing the fan-out in application code and let DynamoDB itself be the source of the downstream stream.

Read article 22 May 2027 · 13 min read

The Greenbox Story · New Ground

Meeting Tax and Maker Time (preview)

Tom has five hours of focus time in a working week. Priya has four. Ifeoma has three. There is no meeting in which to discuss the meeting problem, which is, it turns out, the meeting problem.

Read article 19 May 2027 · 14 min read

Exam Room · Development

Cognito User Pools vs Identity Pools (preview)

A mobile-first social app needs email-and-password sign-up, Google and Apple login, direct-to-S3 photo uploads without a server in the middle, and different bucket prefixes for free-tier and paid-tier users. The team has heard of Cognito User Pools and Cognito Identity Pools and has been using the two names interchangeably. They aren't the same thing, they don't do the same job, and getting them confused is the difference between a working upload and a 403. One is about proving who you are. The other is about turning that proof into the AWS credentials that let you touch a bucket.

Read article 18 May 2027 · 23 min read

The Greenbox Story · New Ground

Two Maps, One Territory (preview)

The Greenbox–Hartland partnership is a week old and the first technical integration meeting is tomorrow. Greenbox engineering has a diagram. Hartland engineering has a diagram. Neither diagram knows the other one exists. Charlotte draws the thing she calls a System Landscape.

Read article 17 May 2027 · 14 min read

Exam Room · Development

How to Stop SQS Duplicate Processing with Visibility Timeout (preview)

A fulfilment Lambda reads an order off SQS, calls a slow partner API, and most of the time everything works. A few times a week the same order ships twice. The queue is on default settings, the function timeout is generous, and nothing in the logs looks unusual. The bug is the arithmetic between three numbers that are rarely looked at side by side: visibility timeout, function timeout, and how long processing actually takes.
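The arithmetic can be sketched in a few lines. This is a minimal illustration rather than the article's own code: the `QueueTimings` type, the `duplicate_risk` helper, and the 120-second and 45-second figures are hypothetical, while the 30-second visibility timeout is the SQS default.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class QueueTimings:
    visibility_timeout_s: int     # how long SQS hides a message after a receive
    function_timeout_s: int       # the Lambda function's configured timeout
    worst_case_processing_s: int  # slowest observed handling time (assumed measured)


def duplicate_risk(t: QueueTimings) -> list[str]:
    """Return warnings about redelivery risk; an empty list means none found."""
    warnings = []
    if t.worst_case_processing_s >= t.visibility_timeout_s:
        warnings.append(
            "processing can outlast the visibility timeout, so the message "
            "may be delivered again while the first attempt is still running"
        )
    if t.function_timeout_s > t.visibility_timeout_s:
        warnings.append(
            "the function is allowed to run longer than the message stays hidden"
        )
    return warnings


# Hypothetical numbers: default 30 s queue, generous 120 s function, 45 s slow call.
default_queue = QueueTimings(
    visibility_timeout_s=30, function_timeout_s=120, worst_case_processing_s=45
)
for warning in duplicate_risk(default_queue):
    print(warning)
```

With these assumed numbers both checks fire. Raising the visibility timeout well above the function timeout clears them; AWS's Lambda guidance for SQS event sources suggests at least six times the function timeout.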

Read article 12 May 2027 · 14 min read

Exam Room · Development

How to Fix DynamoDB Hot Partitions on a Leaderboard (preview)

A viral game sends every write for every player to one DynamoDB partition key. The table is provisioned for 30,000 WCU, but only a fraction of that actually reaches the hot partition, and the application sees ProvisionedThroughputExceededException on writes it is entitled to. The answer is not a bigger provisioned number. It is a partition key that is no longer a single point.
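One common way to make the key "no longer a single point" is write sharding: append a suffix so writes spread over several partition key values, then merge the shards on read. The sketch below is an assumption about the shape of that technique, not the article's implementation; the function names, the `#` separator, and the shard count are hypothetical, and the 1,000 WCU figure is DynamoDB's documented per-partition write limit.

```python
from __future__ import annotations

import math
import random

PER_PARTITION_WCU = 1_000  # DynamoDB's per-partition write ceiling


def min_shards(required_wcu: int) -> int:
    """Rough lower bound on shard count, assuming writes spread evenly."""
    return math.ceil(required_wcu / PER_PARTITION_WCU)


def sharded_key(leaderboard_id: str, shard_count: int) -> str:
    """Pick a random shard so consecutive writes land on different key values."""
    return f"{leaderboard_id}#{random.randrange(shard_count)}"


def all_shard_keys(leaderboard_id: str, shard_count: int) -> list[str]:
    """Every key a reader must query to see the whole leaderboard."""
    return [f"{leaderboard_id}#{n}" for n in range(shard_count)]


def merge_top_n(per_shard: list[list[tuple[str, int]]], n: int) -> list[tuple[str, int]]:
    """Combine (player, score) rows from each shard and keep the n highest."""
    merged = [row for shard in per_shard for row in shard]
    return sorted(merged, key=lambda row: row[1], reverse=True)[:n]


shards = min_shards(30_000)
print(shards, sharded_key("global", shards))
```

The trade is visible in the last two helpers: writes get cheaper to place, while every read has to fan out across all shards and merge the results.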

Read article 11 May 2027 · 13 min read

The Greenbox Story · The Big Table

Post-Acquisition: After the Handshake (preview)

The Greenbox story ends where it began — at the Margaret River farmers' market. Same words, different meaning. Same people, different scale.

Read article 10 May 2027 · 15 min read

Exam Room · Development

Lambda Cold Starts Under a P99 Budget (preview)

A customer-facing API responds in 80 milliseconds once it's warm, and four full seconds on the first request after a quiet spell. The second trace is the one every user sees first. Hitting a P99 under 500 ms means touring every cold-start lever Lambda exposes, pricing the ones that survive, and accepting that the cheapest answer isn't always the correct one.

Read article 04 May 2027 · 13 min read

The Greenbox Story · The Big Table

Term Sheets and Earnouts: The Offer (preview)

The term sheet arrives. Earnout, retention bonuses, strategic partnership. Charlotte has been through this before. Tom is excited. Priya is worried. Dave is suspicious.

Read article 27 Apr 2027 · 19 min read

The Greenbox Story · The Big Table

Exit Strategy: The Founder's Dilemma (preview)

A major food group approaches. Maya faces the question every founder dreads: sell and walk away, or keep building? Nadia asks the only question that matters.

Read article 21 Apr 2027 · 19 min read

Exam Room · Foundations

Picking An AWS Support Plan (preview)

Three teams, three different relationships with AWS support. A developer with a quota question on a Monday morning. A payments platform that cannot be down for more than fifteen minutes. A bank-backed acquisition where the architect wants a phone number, not a web form. Same vendor, three different support plans, and a pricing model that rewards picking the correct one. No plan is best in the abstract; what matters is what each team needs when the pager fires, and what they're quietly paying for the silence between fires.

Read article 20 Apr 2027 · 20 min read

The Greenbox Story · The Big Table

Due Diligence: Every Decision Under a Microscope (preview)

Whether Greenbox expands internationally or considers selling, every decision they've ever made is about to be examined. The ADRs save them. The handshake deals don't.

Read article 19 Apr 2027 · 20 min read

Exam Room · Foundations

How to Run a Well-Architected Review Across Six Pillars (preview)

An architecture review on a Friday afternoon. One service, six people around the whiteboard, and the quiet realisation that the question 'is this a good design?' has at least six separate answers. AWS's Well-Architected Framework gives those answers names (Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability) and turns a vibes-based review into a systematic one. 'Is the design good?' isn't quite the right question; 'good at which things, and what are we trading to get there?' is.

Read article 14 Apr 2027 · 22 min read

Exam Room · Foundations

How to Apply the AWS Shared Responsibility Model (preview)

A CVE drops in glibc. A misconfigured S3 bucket leaks customer data. A hypervisor in Dublin needs firmware. Three incidents, three very different pagers going off, and a shared responsibility model that draws a clean line between what AWS owns and what we own. Whether AWS is secure isn't really the question; which half of the contract we're holding up, and whether the service we picked moved the line, is.

Read article 13 Apr 2027 · 13 min read

The Greenbox Story · New Ground

Data Residency: Where the Bytes Live (preview)

When Greenbox acquires a New Zealand company, a problem appears that nobody had to think about when the business was only in Australia. Where is the data legally allowed to live? Who owns it? And what changes when subscribers cross a border?

Read article 12 Apr 2027 · 17 min read

Exam Room · Foundations

How to Match EC2 Pricing Models to Workload Rhythm (preview)

A 24/7 web tier whose load is boringly predictable. A nightly ETL that can survive being killed and restarted. A dev fleet nobody can forecast. Three workloads, three rhythms, and five different ways AWS will sell us a compute hour. Every option is cheapest for something; what we want to know is which rhythm pairs with which pricing model, and what we'd have to believe about the future to pick each one.

Read article 07 Apr 2027 · 17 min read

Exam Room · Foundations

How to Stay Inside the AWS Free Tier (preview)

A student wants to host a small web app on AWS for under $5/month and keeps reading stories of people getting surprise bills. 'AWS Free Tier' names three different programmes with three different expiry rules. Telling them apart is how you avoid the bill.

Read article 06 Apr 2027 · 25 min read

The Greenbox Story · New Ground

Platform Engineering: When Three Cities Break Everything (preview)

The technology Tom and Priya built for three cities hits its limits at six. Deploy queues, flaky tests, Conway's Law, and the first SRE hire.

Read article 30 Mar 2027 · 19 min read

The Greenbox Story · New Ground

One Diagram, Two Countries, Three Regions (preview)

The container diagram tells you what Greenbox is. It does not tell you where Greenbox lives. The day after agreeing to acquire Harvest Circle, Sam asks a question that the existing diagrams have no answer for — and the team discovers a C4 view they've never drawn.

Read article 23 Mar 2027 · 11 min read

The Greenbox Story · New Ground

Crossing the Ditch: Greenbox Goes to New Zealand (preview)

Greenbox is in three Australian cities. The Series B mandate was to expand, and the team has been debating whether the next step is a fourth Australian city or something bigger. A phone call from an Auckland founder changes the shape of the decision.

Read article 22 Mar 2027 · 14 min read

Exam Room · Advanced Architecture

How to Filter Egress at Organisation Scale (preview)

Forty thousand EC2 instances across two hundred VPCs in forty accounts. Today each workload speaks to the internet through local NAT; tomorrow a compromised container could exfiltrate data to anywhere on the web and nobody would notice until the invoice. Egress filtering at scale isn't about turning on a firewall, it's about building one organisation-wide policy, routing every packet through it, scaling it past the single-TGW limit, and convincing forty teams the blast radius is worth it.

Read article 17 Mar 2027 · 13 min read

Exam Room · Advanced Architecture

How to Build Change Feeds Across Regions (preview)

A DynamoDB table holds order state in eu-west-1. A warehouse-management service in us-east-1 and an analytics pipeline in ap-southeast-2 both need to react to every write. Global Tables solve 'the data exists in both Regions' but not 'Region B reacts to a change.' DynamoDB Streams plus Kinesis Data Streams plus cross-region delivery is the native answer, and knowing the shape lets us decide when Global Tables suffices and when we need the streaming overlay.

Read article 16 Mar 2027 · 22 min read

The Greenbox Story · New Ground

Scaling Culture: When Half the Team Has Never Met You (preview)

Fifty-five people across six cities. Values posters on walls. People who've never met Maya. How do you maintain culture when you can't fit everyone in one room?

Read article 15 Mar 2027 · 14 min read

Exam Room · Advanced Architecture

How to Meet Data Residency Without a Local Region (preview)

A financial regulator in a small country requires that all customer data physically reside inside the country's borders. AWS doesn't operate a Region there. Capex-scale on-prem hardware would work but the business doesn't want to run a data centre. Picking which AWS-branded box gets dropped into the country is the surface decision; underneath is what 'data residency' actually means in the regulator's eyes, which AWS extension models meet that bar, and what each one quietly gives up in exchange for putting compute close to the regulated jurisdiction.

Read article 10 Mar 2027 · 14 min read

Exam Room · Advanced Architecture

Choosing Wavelength, Local Zones, or Outposts at the Edge (preview)

A factory-floor AR application needs sub-10ms round-trip latency between a tablet and its inference backend. The parent Region is 50ms away; the factory's uplink is 5G. AWS Wavelength places compute inside the carrier's 5G network, one hop from the tablet's radio. Knowing when it earns its weight, and when a nearby AWS Local Zone or Outpost does the same job for less, is the difference between ultra-low-latency and ultra-pricey-but-not-faster.

Read article 09 Mar 2027 · 23 min read

The Greenbox Story · New Ground

Acquisition Strategy: When Building Takes Too Long (preview)

Greenbox buys a smaller produce box company in Sydney. Integration is harder than anyone expected. Half the Sydney team quits in three months.

Read article 08 Mar 2027 · 14 min read

Exam Room · Advanced Architecture

KMS or CloudHSM for FIPS 140-2 Level 3 (preview)

A payments regulator asks for the cryptographic keys that sign card transactions to live in hardware rated FIPS 140-2 Level 3. KMS is 'hardware-backed'; is that enough? CloudHSM is 'single-tenant hardware'; is that necessary? The answer sits in the gap between what each service lets you do, what FIPS certifies, and what a specific regulator will accept. Getting it right means knowing when KMS is sufficient and when CloudHSM earns its weight.

Read article 03 Mar 2027 · 14 min read

Exam Room · Advanced Architecture

Choosing a Service Mesh on EKS (preview)

Forty microservices in an EKS cluster. Three teams have added their own Istio sidecars; one team has installed Linkerd; the platform team has a half-working AWS App Mesh deployment from two years ago. The CTO wants one mesh or none. Choosing is less about 'which mesh is best' than about what problems each solves, which of them the team is willing to run, and whether the answer should be a service mesh at all or something thinner.

Read article 02 Mar 2027 · 19 min read

The Greenbox Story · Going National

Supply Chain Design: The Fifty-Kilometre Promise (preview)

Regional Australia doesn't have the farm density of Perth. The fifty-kilometre local promise can't work everywhere, and the compromise tests everything Maya believes in.

Read article 01 Mar 2027 · 14 min read

Exam Room · Advanced Architecture

When to Put AWS Outposts on the Warehouse Floor (preview)

A fulfilment centre in a town with bad internet has to run warehouse management software with sub-10ms latency to the robots on the floor. Cloud isn't an option: the fibre link drops four times a month. On-prem hardware works but the team doesn't want another fleet to patch. AWS Outposts is the third answer: AWS-managed hardware sitting in the warehouse, running the same services as the cloud, connected back to the parent Region. Knowing what Outposts can and cannot do is how we avoid a million-pound rack collecting dust.

Read article 27 Feb 2027 · 16 min read

The Greenbox Story · Going National

Three CoPs, Two Survived (preview)

Patricia asks a question at the May board. Maya answers it by seeding three communities on purpose. One dies quietly in three weeks. Maya sits down to write Patricia an update in a numbered list, re-reads the list, and deletes it.

Read article 24 Feb 2027 · 13 min read

Exam Room · Advanced Architecture

How to Build Hybrid DNS with Route 53 Resolver (preview)

A Fargate service in AWS needs to resolve hostnames from the on-prem AD domain; an on-prem application needs to resolve internal AWS service endpoints. Today each side has its own DNS and they don't talk. The clean answer is Route 53 Resolver with inbound and outbound endpoints plus forwarding rules: one piece gets AWS-side queries to on-prem, the other gets on-prem queries to AWS. Knowing which endpoint points which way is what the design hinges on.

Read article 23 Feb 2027 · 19 min read

The Greenbox Story · Going National

Hiring at Scale: Beyond the Referral Network (preview)

Thirty-five people to fifty-five in two years. The culture that worked when everyone knew everyone doesn't work when half the company has never met Maya.

Read article 22 Feb 2027 · 15 min read

Exam Room · Advanced Architecture

How to Set Up Distributed Tracing Across Accounts (preview)

One customer request touches seven services in five accounts. When it times out, three teams look at three dashboards and disagree about whose fault it is. Distributed tracing is the obvious move; the harder part is how the trace follows a request across account boundaries, how spans get aggregated without each team reinventing the wheel, and what has to be true of the SDK, the sampling rate, and the IAM model for this to stop being a lunchtime-every-day conversation.

Read article 20 Feb 2027 · 11 min read

The Greenbox Story · Bets That Learn

Who Owns This on a Tuesday at 3 a.m. (preview)

The pause-risk model has been in full production for two weeks. Dina's agreement rate has slipped four points. The pager has not gone off, which is the problem. The team is going to have to learn to page themselves.

Read article 17 Feb 2027 · 15 min read

Exam Room · Advanced Architecture

How to Plan On-Prem DR with Elastic Disaster Recovery (preview)

A data-centre full of VMware VMs, the on-prem half of the estate nobody has refactored yet, needs a disaster recovery plan that covers actual data-centre fires, not just cloud-Region outages. The traditional answer is 'replicate to another data-centre, hope the failover runbook works.' AWS Elastic Disaster Recovery is the cloud-native answer: block-level replication to AWS, minutes of RTO, tested by actually failing over. Understanding it means understanding the replication agent, the staging subnet, the launch template, and what separates DRS from Backup.

Read article 16 Feb 2027 · 20 min read

The Greenbox Story · Going National

Expansion Playbook: Scaling What Worked (preview)

Greenbox can't Event Storm their way into every new city from scratch. They need a repeatable expansion playbook, and Adelaide is the first test.

Read article 15 Feb 2027 · 16 min read

Exam Room · Advanced Architecture

How to Aggregate Logs Across an Organisation (preview)

The auditor wants a single place to query 'who accessed the payroll database' across twenty accounts. The SOC wants alerts on a failed login from any account in under a minute. The data team wants the same logs for anomaly detection without paying for them twice. Aggregating logs across an organisation is less about shipping bytes than about deciding which log goes through CloudTrail Organization Trail, which through central CloudWatch subscriptions, which through S3 Athena queries, and which through an observability platform sitting on top.

Read article 10 Feb 2027 · 14 min read

Exam Room · Advanced Architecture

Choosing Private Connectivity Between VPCs (preview)

Two teams in the same organisation want to call each other's services. Today they do it over the public internet, through NAT, TLS, and each other's ALBs. It works. It's also showing up in Flow Logs as 'traffic to a well-known cloud-provider IP range' and the security team has views. The options for keeping internal calls off the public internet (private endpoints, VPC peering, transit hubs, service meshes) each trade privacy against coupling differently. Once you understand who initiates, who sees whom, and who allow-lists whom, the picture clicks into place.

Read article 09 Feb 2027 · 20 min read

The Greenbox Story · Building the Company

Series B Fundraising: The Pitch (preview)

Maya presents to Series B investors. They ask questions she can answer, questions she can't, and one question that changes everything.

Read article 08 Feb 2027 · 15 min read

Exam Room · Advanced Architecture

How to Deploy One Template Across Many Accounts with StackSets (preview)

A security baseline stack (GuardDuty, Config, CloudTrail, a handful of IAM roles) has to exist in every account in the organisation. Twenty accounts today, one per week for the foreseeable future. Clicking through twenty CloudFormation deploys is how you get drift; CloudFormation StackSets promises to do it once and have the organisation maintain it. The promise is real; the details (service-managed vs self-managed, stack instance targets, drift detection, the dreaded failed-deployment rollback) are where it earns the Pro.

Read article 06 Feb 2027 · 16 min read

The Greenbox Story · Building the Company

Founder Commitment: Skin in the Game (preview)

Maya remortgaged her apartment. Tom deferred his student loan repayments. Sam took a 60% pay cut. Before the seed round, Greenbox ran on personal risk and a spreadsheet that Maya checked every morning at 5am.

Read article 05 Feb 2027 · 11 min read

Consulting and Craft · In Practice

Directional, Not Definitive (preview)

Most teams running A/B tests don't have the traffic to do them properly. The honest answer isn't 'don't test'. It's knowing what your test can and can't tell you, and picking techniques that match the evidence you can actually collect.

Read article 04 Feb 2027 · 18 min read

Under the Hood

How Multi-Factor Authentication Works: From Passwords to Time-Based One-Time Passwords (preview)

A warm, thorough walk through authentication, from single-factor passwords to TOTP codes, the maths behind them, why they can be phished, and what comes next.

Read article 03 Feb 2027 · 16 min read

Exam Room · Advanced Architecture

How to Build Self-Service Provisioning with AWS Service Catalog (preview)

Product teams want new RDS databases on demand. Security wants every database encrypted, tagged, in the right subnet, with the right backup plan. Platform wants the request to stop coming through Slack. A self-service portal with vetted building blocks sits between all three. AWS Service Catalog is the shape, once you know what's a Product, what's a Portfolio, what a launch constraint does, and why StackSets are the deployment engine underneath.

Read article 02 Feb 2027 · 9 min read

The Greenbox Story · Building the Company

Sustainability Reporting: What the Board Asked For (preview)

Three days before the Series B pitch, Diane raises a question Maya has been avoiding. The board needs a sustainability story — and the story has to be true. Greenbox has been measuring everything except the thing the whole business is supposed to be about.

Read article 01 Feb 2027 · 16 min read

Exam Room · Advanced Architecture

How to Roll Out AWS Backup Across an Organisation (preview)

Twenty member accounts. Each team owns its own data, its own retention policy, and its own idea of what 'backed up' means. The auditor wants one number for every resource: when was it last backed up, where does the backup live, and who can delete it. AWS Backup policies through Organizations are the answer, if you understand what they enforce, what they don't, and where the SCPs have to pick up the slack.

Read article 29 Jan 2027 · 9 min read

High Performance Teams

Two Deploys to Ship One Feature (preview)

The first time I noticed it, I thought I was looking at a coincidence. The pattern repeated for a quarter. By then it had a name.

Read article 28 Jan 2027 · 39 min read

Under the Hood

A Guide to DNS: How the Internet Finds Anything by Name (preview)

A thorough, narrative walk through the Domain Name System, from the single text file that used to hold every hostname on the internet, through the distributed hierarchy of root servers and resolvers, to the security nightmares of cache poisoning and the privacy promises of encrypted DNS. Everything you wanted to know about how your browser turns a name into an address.

Read article 27 Jan 2027 · 14 min read

Exam Room · Advanced Architecture

Choosing Between Security Groups, NACLs, and Network Firewall (preview)

A security auditor sits down with the VPC diagram and asks what stops an instance in the web tier from talking to the payroll database. Three answers are on the table: the security group attached to the database, the NACL on its subnet, and the Network Firewall in the egress VPC. They all say 'no'. They say it differently, they fail differently, and the real work is picking which one carries the real policy rather than being the belt-and-braces on something else.

Read article 26 Jan 2027 · 15 min read

The Greenbox Story · Building the Company

Governance Evolution: How the Board Grew Up (preview)

The first board meeting was Maya, Tom, and Angela at a cafe in Fremantle. The most recent one had an independent director, a Series B investor, a governance framework, and a 47-page board pack. The distance between those two meetings is the distance between a startup and a company.

Read article 25 Jan 2027 · 15 min read

Exam Room · Advanced Architecture

How to Set Up S3 Cross-Region Replication for Compliance (preview)

The regulator has told us customer records must be stored in two jurisdictions, the auditor has told us deletes must be irrecoverable for seven years, and finance has told us the Glacier bill is already too high. S3 Cross-Region Replication sounds like the answer to the first problem. It usually is, once you understand what it does and doesn't replicate, what happens to existing objects, and what it costs to undo a mistake.

Read article 22 Jan 2027 · 26 min read

The Workshop

The Workshop: The Planning Onion (preview)

A 90-minute pattern for making the layers of planning visible at once — strategy, portfolio, quarter, sprint, day — and finding the initiatives that live on one layer with nothing above or below to anchor them. With a vertical-slicing test that checks whether a sprint item can actually trace back to the strategy.

Read article 21 Jan 2027 · 22 min read

Under the Hood

How the Internet Routes Traffic (preview)

BGP, autonomous systems, peering, IXPs, the 2021 Facebook outage, how AWS routes your traffic, anycast, undersea cables, and traceroute: the infrastructure that decides how your packets get from here to there.

Read article 20 Jan 2027 · 14 min read

Exam Room · Advanced Architecture

Choosing Among the Four AWS DR Strategies (preview)

A regulator asks for a disaster-recovery plan. The CFO asks what it will cost. The CTO asks how often we'll test it. Three questions, one answer, and the answer depends on which of four DR strategies we pick. Backup and restore at one extreme; multi-site active-active at the other. Between them, pilot light and warm standby, each with a different daily cost, a different recovery time, and a different amount of infrastructure sitting idle.

Read article 19 Jan 2027 · 19 min read

The Greenbox Story · Building the Company

Succession Planning: Key Person Insurance (preview)

If Maya, Tom, or Dave disappeared tomorrow, what would break? The answer applies a 20% discount to the company's value.

Read article 18 Jan 2027 · 16 min read

Exam Room · Advanced Architecture

How to Centralise Egress for Many VPCs (preview)

Forty VPCs across twelve accounts, each with its own NAT gateway and its own outbound firewall rules. The bill for NAT alone is eye-watering, and every time a new SaaS domain needs allow-listing there are forty places to update it. Centralised egress promises one place to log, inspect, and pay, but it also promises a single choke point, a new hub account to run, and a Transit Gateway in the middle of every packet's life. The question is which boxes belong on the path and which don't.

Read article 15 Jan 2027 · 29 min read

The Workshop

The Workshop: Threat Modelling (preview)

A pattern for walking a system diagram with a deliberately hostile eye, using STRIDE as a checklist, so that the ways a system can be abused become visible before the people who'd like to abuse it find them.

Read article 14 Jan 2027 · 13 min read

Under the Hood

How WiFi Actually Works (preview)

WiFi is the most successful networking technology ever deployed, and most people — including most developers — have no idea how it works. It isn't just 'wireless ethernet.' It's an elaborate negotiation between devices sharing a medium they can't reserve, carried by radio waves on frequencies shared with your microwave.

Read article 13 Jan 2027 · 16 min read

Exam Room · Advanced Architecture

How to Design Multi-Region Failover Under a Minute (preview)

A payments API that has to keep taking traffic through a whole-Region outage, with a recovery-time objective of under a minute and zero planned data loss. That sentence sounds like a shopping list until you start pricing what each word costs. The real work isn't choosing active-active over active-passive; it's picking which of six data-replication shapes you can live with, and which latency, conflict, and cost you'll accept to get there.

Read article 12 Jan 2027 · 17 min read

The Greenbox Story · Building the Company

Board Governance: The Board Room (preview)

Informal governance got them to 12,000 subscribers. Series B investors want KPIs, board packs, and an independent director who asks uncomfortable questions.

Read article 11 Jan 2027 · 16 min read

Exam Room · Advanced Architecture

How to Build Ransomware-Proof Backups Across an Organisation (preview)

A financial services firm has a ransomware-resilience mandate with six absolutes: backups immutable against anyone with production credentials, stored in a separate account under separate IAM control, copied to a second region, retained for seven years, centrally managed across forty accounts, and auditable through a named reporting service. Each absolute, alone, has two or three plausible AWS answers. Taken together, only one arrangement survives contact with all six.

Read article 08 Jan 2027 · 26 min read

The Workshop

The Workshop: Cynefin Sensemaking (preview)

A 75-minute pattern for sorting a pile of problems into the kinds of actions they reward. Clear, complicated, complex, chaotic, confusion — and the realisation that disagreement about which domain a problem belongs in is often the problem itself.

Read article 07 Jan 2027 · 35 min read

Under the Hood

How TLS Works: From Trusted Networks to Trust-No-One (preview)

A thorough, approachable guide to TLS, from the first computer networks through to modern encryption. The history, the maths, the politics, and the practical reality of securing the web.

Read article 06 Jan 2027 · 13 min read

Exam Room · Advanced Architecture

Choosing PrivateLink for SaaS Vendor-to-Customer Connectivity (preview)

A SaaS vendor sells an observability product to enterprises who won't let production traffic touch the public internet, not even the vendor's HTTPS endpoint behind a WAF. The vendor runs the service from their own AWS account. Customers consume it from their own VPCs, in their own accounts, with overlapping RFC1918 space. Several AWS connectivity primitives could in principle solve a cross-account reach problem; the interesting work is filtering out the ones whose shape doesn't fit a vendor-to-customer relationship between separate companies.

Read article 05 Jan 2027 · 20 min read

The Greenbox Story · Building the Company

Go-to-Market: Selling Without Maya (preview)

Greenbox has never done outbound sales. The first sales hire speaks a language the engineering team doesn't understand.

Read article 04 Jan 2027 · 14 min read

Exam Room · Advanced Architecture

How to Share Hub-Account Resources Across Many Spokes (preview)

A platform with twenty-five accounts carves one out as shared-services and wants to put a Transit Gateway, Route 53 Resolver rules, customer-managed KMS keys, a Glue Data Catalog, and a tree of SSM parameters in it, all visible to the other twenty-four. Five resources, one central account, and no single sharing mechanism that works for all of them. Pick the wrong primitive per resource and you end up either duplicating everything per spoke, or writing custom replication glue, or federating via a pile of resource policies that nobody audits. Pick the correct primitive per resource and the hub works.

Read article 23 Dec 2026 · 15 min read

Exam Room · Advanced Architecture

Preventing Lost Writes in DynamoDB Global Tables (preview)

A SaaS platform runs DynamoDB Global Tables across three regions. Users in all three read and write concurrently. Then a subscriber's update made in eu-west-1 vanishes — another write landed on the same item in us-east-1 a few milliseconds later, and replication's tie-breaker picked the other one. Multi-region active replication in DynamoDB is last-writer-wins by item timestamp, and the lost write is silent. This scenario is really about knowing when that's acceptable, when it isn't, and what patterns keep both writes alive.
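The silent loss is easy to reproduce on paper. The sketch below is a toy model of item-level last-writer-wins, not DynamoDB's replication code: the `Write` type, the attribute names, and the timestamps are hypothetical, and each write carries the full item image as its own Region saw it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    region: str
    timestamp_ms: int
    item: dict  # full item image as written in that Region


def last_writer_wins(a: Write, b: Write) -> Write:
    """Keep whichever whole item has the later timestamp.

    On an exact tie this toy model keeps `a`; the real tie-breaker is not modelled.
    """
    return a if a.timestamp_ms >= b.timestamp_ms else b


# Both Regions start from {"plan": "free", "email": "old@example.com"}.
eu = Write("eu-west-1", 1_000, {"plan": "pro", "email": "old@example.com"})
us = Write("us-east-1", 1_003, {"plan": "free", "email": "new@example.com"})

winner = last_writer_wins(eu, us)
print(winner.region, winner.item)
```

The us-east-1 image wins because it is three milliseconds newer, and the plan upgrade made in eu-west-1 is gone without any error being raised, even though the two writes touched different attributes.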

Read article 22 Dec 2026 · 14 min read

The Greenbox Story · The Missing Layer

Cash Flow: The Bank Balance at 5am (preview)

Maya checks the bank balance every morning at 5am. She's done it since before the seed round. The number has grown from five figures to seven, but the anxiety hasn't scaled down with it.

Read article 21 Dec 2026 · 16 min read

Exam Room · Advanced Architecture

How IAM Policy Evaluation Actually Works (preview)

AdministratorAccess that can't create an access key, a Permissions Boundary that looks restrictive but isn't, and a cross-account Lambda write that dies despite both sides seeming to grant it. Three AccessDenied mysteries with one answer: AWS evaluates every request against up to six policy types, in a fixed order, starting from a default deny. Learn the order and the mysteries disappear.

Read article 19 Dec 2026 · 12 min read

The Greenbox Story · Bets That Learn

Shadow, Canary, Rollback (preview)

Six weeks of shadow mode. One canary week that stopped on the Wednesday. A rollback that took three minutes because it had been written in February by a woman who had decided that the next time, whatever it was, was not going to take them by surprise.

Read article 18 Dec 2026 · 7 min read

Consulting and Craft · In Practice

LLMs as Thinking Partners: How the Role Evolved (preview)

From code generator to thinking amplifier — how structured inputs transform LLM output, and why discovery techniques are the missing ingredient.

Read article 17 Dec 2026 · 17 min read

Under the Hood

How Randomness Works (preview)

Entropy, PRNGs, cryptographic randomness, the Mersenne Twister, /dev/urandom, and why the quality of your random numbers determines the security of everything from TLS keys to TOTP codes.

Read article 16 Dec 2026 · 14 min read

Exam Room · Advanced Architecture

Standing Up a Landing Zone for Eighty Accounts (preview)

A ten-year AWS customer with roughly eighty accounts has decided to formalise the multi-account foundation they should have built in 2018. Four engineers, six months, a prescriptive OU structure, centralised security tooling, guardrails, self-service provisioning, and SSO. Several foundations compete for the job; they differ on how much they ship out of the box, how much they expect the team to build, and how well they fit the regulatory shape of the work.

Read article 15 Dec 2026 · 17 min read

The Greenbox Story · The Missing Layer

Business Valuation: What Are We Worth? (preview)

The board wants a valuation. The answer depends on things Maya has never thought about — and one discount she doesn't want to hear.

Read article 14 Dec 2026 · 14 min read

Exam Room · Advanced Architecture

Choosing Aurora Global Database for Cross-Region DR (preview)

A 2 TB Aurora PostgreSQL fleet needs a cross-region copy with RPO under a minute and RTO under an hour, managed failover, and the secondary pulling low-priority reads during normal operations. AWS has several primitives that move database state across regions, with different RPO floors, different operational models, and different reasons to exist. The work is sorting which one matches each property the requirement actually demands.

Read article 13 Dec 2026 · 11 min read

The Greenbox Story · Bets That Learn

The Boring Middle (preview)

Three weeks into the data audit, Kai stops sleeping properly. The labels are a mess. The event log lies in two different ways. And the model that wins, when they finally run the bake-off, is the one Priya least wanted to ship.

Read article 12 Dec 2026 · 10 min read

The Greenbox Story · Bets That Learn

Framing Before the Model (preview)

Jas brings a bereavement case to the whiteboard and it ends the whole discussion about what goes on an at-risk list. The don't-call list gets built in pencil, in a notebook, in forty seconds, by a woman who has been writing about Thursdays since September.

Read article 11 Dec 2026 · 26 min read

The Workshop

The Workshop: Ensemble Programming (preview)

A pattern for putting the whole team on one problem at one keyboard, rotating roles on a short timer, so that knowledge spreads as the code is written and review happens at the moment of writing instead of the week after.

Read article 10 Dec 2026 · 19 min read

Under the Hood · Money

How Crypto Actually Works (preview)

Strip away the speculation, the scams, and the tribal wars, and cryptocurrency is a genuinely clever piece of computer science. Here's what it actually does, how it works, what it solves, and what it doesn't.

Read article 09 Dec 2026 · 14 min read

Exam Room · Advanced Architecture

VPC Peering or Transit Gateway (preview)

Two VPCs that need to talk to each other is one peering connection. Fifty VPCs that need to talk to each other is one thousand two hundred and twenty-five. The arithmetic is what forces a different connectivity shape at scale, and at multi-region scale, the shape that works for fifty VPCs is just one rung on a ladder that keeps going.

Read article 08 Dec 2026 · 18 min read

The Greenbox Story · The Missing Layer

Strategic Focus: Saying No (preview)

The board wants a loyalty programme. Marcus wants a corporate gifting portal. A key customer wants a custom API integration. Maya has said yes to all three, and the engineering team has capacity for one.

Read article 07 Dec 2026 · 14 min read

Exam Room · Architecture

DynamoDB On-Demand or Provisioned Capacity (preview)

DynamoDB can bill per-request or by provisioned capacity. Per-request scales to zero; provisioned locks in a discount in exchange for commitment. At the switching point, one month the team pays more on-demand than they would have on provisioned, and the question gets asked. Which is cheaper depends entirely on the traffic shape, and on what the knobs look like if you get the match wrong.

Read article 06 Dec 2026 · 7 min read

The Greenbox Story · The Missing Layer

Naming the Thursday Thing (preview)

Lee has a name for what Priya started in the kitchen five months ago. The name is older than Greenbox. Using it is more dangerous than not using it.

Read article 05 Dec 2026 · 7 min read

The Greenbox Story · Bets That Learn

Impact Mapping an ML Bet (preview)

Pause rate is drifting. Priya and Kai walk in with a three-week plan to build a model. Lee makes them sit with the goal for an hour. Two of the three things they meant to ship do not survive the hour.

Read article 04 Dec 2026 · 15 min read

The Workshop

The Workshop: Blameless Post-Mortems (preview)

A facilitator playbook for the meeting that turns an incident into a structural improvement. Five phases, three actions, no scapegoats. Pin the summary on the wall, keep the guide open on your tablet, and run the session.

Read article 03 Dec 2026 · 14 min read

Under the Hood

How Computer Networks Work (preview)

A warm, thorough walk through computer networking, from the first telegraph wire to cat GIFs, from the OSI model to tcpdump, from trust to firewalls.

Read article 02 Dec 2026 · 15 min read

Exam Room · Architecture

How to Diversify a Spot Fleet to Survive Capacity Events (preview)

Spot Fleet gives up to 90% off on-demand, but the capacity comes and goes on AWS's terms. A fleet that requests only c6i.2xlarge in us-east-1a is a fleet that goes offline together. The whole point of a Spot Fleet is diversification: enough instance types in enough pools that when one pool gets reclaimed, the others absorb the load. The discount is the easy headline; the engineering question is how wide the diversification has to be before the fleet genuinely survives capacity events.

Read article 01 Dec 2026 · 18 min read

The Greenbox Story · The Missing Layer

Delegation: The Management Gap (preview)

Maya is the bottleneck again — but this time at company level. Charlotte says: you've built a product, now you need to build a company.

Read article 30 Nov 2026 · 15 min read

Exam Room · Architecture

Routing Many Sites Through One ALB (preview)

Four micro-frontends, three domains, and one shared ALB. Routing traffic to the correct target group is either a host-based decision, a path-based decision, or both. Rule order matters; the wrong order sends 'admin' requests to the marketing team. Host versus path is the wrong framing; the real work is how the two rule types compose, where priority comes in, and how to keep the rule table readable as the service catalog grows.

Read article 29 Nov 2026 · 14 min read

The Greenbox Story · Bets That Learn

Voice in the Weights, Voice in the Fire (preview)

A small model, fine-tuned on four years of Maya's handwritten box notes, drafts the weekly note for every subscriber. Week three, it sends the wrong sentence to the wrong person. The GPU bill turns out to be the cheapest thing about the project.

Read article 28 Nov 2026 · 11 min read

The Greenbox Story · What Growth Takes

The Thursday Thing (preview)

Three months after the allergen incident, Priya sends an invite with no name on it. Four people come. Twelve weeks later, the most useful meeting at Greenbox still doesn't have a name, and Priya is going to fight to keep it that way.

Read article 27 Nov 2026 · 13 min read

The Workshop

The Workshop: Which Workshop When (preview)

Sixteen techniques for discovery, design, and delivery. A flowchart and a categorised index for choosing the one that fits the problem in front of you. Pick the smallest workshop that solves the actual problem, not the biggest one your team knows how to run.

Read article 26 Nov 2026 · 22 min read

Under the Hood

How CI/CD Pipelines Work (preview)

From make and cron to GitHub Actions: continuous integration, continuous delivery, continuous deployment, artefacts, environments, feature flags, blue-green deployments, rollback, and the evolution from a shell script to a proper pipeline.

Read article 25 Nov 2026 · 24 min read

The Greenbox Story · What Growth Takes

Technical Migration: The Strangler Fig in Go (preview)

Tom's second migration attempt uses the strangler fig pattern. This is what the feature flags, dual reads, dual writes, and discrepancy detection look like in Go: the code that makes a database migration boring.

Read article 24 Nov 2026 · 18 min read

The Greenbox Story · What Growth Takes

Technical Migration: The One That Went Sideways (preview)

The subscription database needs to be split. Tom's plan is elegant. The migration starts on a Thursday afternoon, and by Friday morning, two thousand subscribers have duplicate billing records.

Read article 23 Nov 2026 · 17 min read

Exam Room · Architecture

Replacing a Windows File Server With FSx (preview)

An on-prem Windows file server holds 40 TB of departmental shares used by 600 staff. The hardware is end-of-life, the office is moving, and finance wants the capital refresh gone. AWS has four file-storage services that might replace it, and they're not interchangeable. None of them is 'the cloud file server'; the work is figuring out which of FSx for Windows, FSx for NetApp ONTAP, EFS, or File Gateway matches the access pattern, the protocol expectations, and the Active Directory footprint this shared drive actually has.

Read article 20 Nov 2026 · 19 min read

Consulting and Craft · In Practice

Written First: Making Async Work (preview)

A daily standup that worked at five people in one room becomes a war crime at twenty-eight people across three timezones. Better video calls aren't the fix. Writing things down first is.

Read article 19 Nov 2026 · 19 min read

Under the Hood · Money

How Central Banks Work (preview)

Central banks set interest rates, print money, and occasionally rescue entire economies. But how, exactly? The machinery is less mysterious than it sounds, and more powerful than most people realise.

Read article 18 Nov 2026 · 15 min read

Exam Room · Architecture

Detecting Drift With AWS Config (preview)

Somebody opened a security group to 0.0.0.0/0 on port 22 at 02:14 on a Saturday. The infrastructure pipeline would have rejected that change. The console did not. The team needs a mechanism that's independent of the pipeline, catches drift whether it came through Terraform or a console click, and, for a short list of specific drift types, reverses it automatically. The work is in which mechanism shape catches which drift, and how quickly a bad change gets reversed.

Read article 17 Nov 2026 · 17 min read

The Greenbox Story · What Growth Takes

Key Person Risk: The Quiet Departure (preview)

Ravi gives his two weeks' notice on a Tuesday morning. He's not angry, not burned out, not making a statement. He just got a better offer. And suddenly the subscription system has no owner.

Read article 16 Nov 2026 · 15 min read

Exam Room · Architecture

Configuring CloudFront Origin Failover for Regional Outages (preview)

CloudFront caches the hot stuff, but cache misses still have to reach an origin. When the origin is down, CloudFront returns 5xx to viewers. Origin failover is the setting that says 'try the backup origin before giving up.' A backup origin is the easy decision; the work is in which failure conditions trigger failover, how quickly, and what 'the backup origin' should actually be.

Read article 13 Nov 2026 · 16 min read

High Performance Teams

Tools and Techniques for Improvement (preview)

You can assemble a brilliant team and measure all the correct things. If the team doesn't have a way to get better, the brilliance decays. Improvement is a practice, not an event.

Read article 12 Nov 2026 · 11 min read

Under the Hood

How Compression Works (preview)

Compression is the art of making data smaller without losing what matters. Every photograph you take, every video you stream, every web page you load is compressed. Understanding how it works — and the fundamental trade-offs behind it — is one of the quiet superpowers of computing.

Read article 11 Nov 2026 · 17 min read

Exam Room · Architecture

How to Migrate Three Petabytes Before the Lease Ends (preview)

Three petabytes of historical genomic data sit on a SAN in a leased datacentre. The lease expires in nine months. The pipe out of the building is 1 Gbps shared with everybody else. Snow Family, DataSync, and Storage Gateway each solve a different piece of this, but only when you're precise about what kind of movement you need. Picking a transfer service comes down to matching the shape of the data, the shape of the deadline, and the shape of the ongoing read pattern.

Read article 10 Nov 2026 · 7 min read

The Greenbox Story · What Growth Takes

The Planning Onion: From Weekly Habits to Yearly Vision (preview)

Three squads, three cities, great weekly habits — and no shared strategy connecting them. The planning layer that ties everything together.

Read article 09 Nov 2026 · 15 min read

Exam Room · Architecture

Instance Store or EBS for Scratch Disk (preview)

Instance Store is fast, local, and free with the instance, and it evaporates the moment the instance stops. EBS survives, snapshots, encrypts, and charges per GB-month. Picking between them for ephemeral data isn't about cost alone; it's about what 'ephemeral' actually means for the workload. Speed is the easy comparison; the harder one is which durability contract matches the data you're writing.

Read article 06 Nov 2026 · 5 min read

Consulting and Craft · In Practice

The Planning Onion: Every Layer in One Place (preview)

Vision, year, quarter, sprint, day: a planning framework with a different question, cadence, and toolkit at every time horizon.

Read article 05 Nov 2026 · 32 min read

Under the Hood

The Wonderful, Absurd Science of Colour (preview)

From ochre smeared on cave walls to pixels fired at your face by Apple Vision Pro: a tour of the science of seeing.

Read article 04 Nov 2026 · 15 min read

Exam Room · Architecture

ECS on Fargate or on EC2 (preview)

A team runs ECS today on EC2 instances they patch, size, and scale themselves. Fargate promises to take all of that away. But Fargate's per-hour pricing is higher, its networking model is different, and not every workload fits. The choice isn't Fargate or EC2 in the abstract, it's which launch type matches which workload, and where the crossover lines actually sit.

Read article 03 Nov 2026 · 15 min read

The Greenbox Story · What Growth Takes

Burnout: When the Backbone Cracks (preview)

Tom hasn't taken a day off in four months. He's still shipping code, still in every meeting, still the first to respond to a production alert. But the code he's writing isn't quite right any more, and he can't explain why.

Read article 02 Nov 2026 · 17 min read

Exam Room · Architecture

API Gateway or ALB in Front of Lambda (preview)

A Lambda function needs an HTTP front door. API Gateway is the textbook answer. An Application Load Balancer with a Lambda target is the less-textbook answer. They price differently, scale differently, authenticate differently, and they integrate with adjacent AWS services in different ways. Picking between them comes down to matching front-door behaviour to the shape of the workload.

Read article 30 Oct 2026 · 14 min read

High Performance Teams

Getting the Right Humans Together (preview)

The best team I ever worked on had a loudmouth, a worrier, a perfectionist, and someone who never stopped asking 'but why?' They drove each other mad. They shipped the best software of my career.

Read article 29 Oct 2026 · 15 min read

Under the Hood · Money

How Fraud Detection Works (preview)

Every card transaction is a bet: is this the real cardholder, or someone who stole their details? The system has about 100 milliseconds to decide, and getting it wrong in either direction costs money.

Read article 28 Oct 2026 · 16 min read

Exam Room · Architecture

Aligning ALB and Route 53 Health Checks for Regional Failover (preview)

Route 53 says the region is healthy and sends traffic there. The ALB in that region thinks every target is failing and returns 503. Users see errors. Both health-check systems were configured by different teams and nobody realised they disagree about what 'healthy' means. Both checks are correct on their own terms; the work is sorting out which layer protects you from which failure, and getting them to agree.

Read article 27 Oct 2026 · 14 min read

The Greenbox Story · Losing the Thread

Living Documentation: When the Wiki Starts Lying (preview)

The team wrote beautiful ADRs and meticulous decision tables. Six months later, a new developer follows one and breaks production.

Read article 26 Oct 2026 · 15 min read

Exam Room · Architecture

Designing Direct Connect Failover for Sub-Minute Recovery (preview)

A 10 Gbps Direct Connect into eu-west-1 carries the bulk of our on-prem-to-AWS traffic; a Site-to-Site VPN sits idle as a backup. The last outage proved the backup was aspirational. BGP didn't converge, half the traffic black-holed for four minutes, and the postmortem action item read 'make the failover actually work.' Having a backup isn't the debate; it's which of four hybrid-connectivity shapes gives us sub-minute failover without doubling the line-item cost.

Read article 23 Oct 2026 · 28 min read

The Workshop

The Workshop: Wardley Mapping (preview)

A pattern for laying the components of a value chain (the sequence of capabilities a user needs, top visible to the user, bottom foundational) against an evolution axis, so that build-buy-borrow arguments stop being about taste and start being about where each component actually sits.

Read article 22 Oct 2026 · 44 min read

Under the Hood

How Browsers Render a Page (preview)

You type a URL and a page appears. Between that keystroke and those pixels, your browser performs DNS resolution, TCP handshake, TLS negotiation, HTTP request, HTML parsing, CSS computation, layout calculation, paint, and compositing, and it has about 100 milliseconds to feel fast.

Read article 21 Oct 2026 · 16 min read

Exam Room · Architecture

How to Lock Down S3 for Seven-Year WORM Compliance (preview)

Compliance wants: write-once, read-many storage for seven years, across every environment, with immutability guaranteed against any credential, including the root account's, for the full retention. AWS has a small handful of mechanisms that claim some form of immutability; only some give the absolute guarantee the audit requirement actually demands. The work is sorting the absolute from the conditional, picking the retention shape, and deploying it so it stays compliant when someone accidentally rotates a bucket policy in year three.

Read article 20 Oct 2026 · 10 min read

The Greenbox Story · Losing the Thread

Continuous Discovery: Making It Stick (preview)

Discovery isn't a project phase — it's a weekly practice. How Greenbox embedded learning into the rhythm of delivery, and what Lee and Charlotte learned along the way.

Read article 19 Oct 2026 · 14 min read

Exam Room · Architecture

Picking an EBS Volume Type for Each Workload (preview)

A PostgreSQL database whose bottleneck is a steady 15,000 IOPS at 1 ms latency. A Cassandra cluster whose bottleneck is throughput, not IOPS. And a warehouse of CloudTrail logs that gets written once and read almost never. Three volumes, three workload profiles, and EBS has a family of volume types that each match one of them. 'Fastest' is the wrong axis; what matters is which dimension (IOPS, throughput, latency, or dollars per gigabyte) each volume type optimises for.

Read article 16 Oct 2026 · 14 min read

High Performance Teams

Defining and Measuring High Performance (preview)

Every team I've worked with wants to be high-performing. Almost none of them can tell me what that means.

Read article 15 Oct 2026 · 15 min read

Under the Hood · Money

How Payments Work (preview)

When you tap your card, a message crosses at least four organisations in under two seconds. None of them trust each other. Here's how the money actually moves.

Read article 14 Oct 2026 · 16 min read

Exam Room · Architecture

Backups That Survive the Region (preview)

Backups scattered across six services, each with its own schedule and retention. An audit question, 'prove you can recover any production resource, cross-Region, within 24 hours', that nobody can currently answer in fewer than three meetings. Which service has the best native backup isn't the problem; centralising backup policy across an organisation, copying to a second Region for durability, and ending up with one place to answer the audit question is.

Read article 13 Oct 2026 · 13 min read

The Greenbox Story · Losing the Thread

Outcome Over Output: The Feature Factory (preview)

The backlog has 340 items. The team shipped twelve features last quarter. Nobody can say whether any of them moved the needle.

Read article 12 Oct 2026 · 16 min read

Exam Room · Architecture

Composing Savings Plans Across a Portfolio (preview)

Fifteen workloads across four accounts and two Regions. A portfolio bill of three-quarters of a million dollars a year on compute. Finance has asked for a commitment plan. Cheapest-for-a-single-workload is the workload-level decision and it's already been made for the straightforward cases. The portfolio-level question is how to compose commitment shapes so the baseline is covered, the Spot-tolerant hours are cheap, and the flexibility budget hasn't been spent on something that can't bend.

Read article 09 Oct 2026 · 18 min read

Consulting and Craft · In Practice

Data-Informed, Not Data-Driven (preview)

'The data says we should do X' sounds rigorous. 'The data suggests X, what do we think?' is rigorous. The difference is the most important habit in modern decision-making.

Read article 08 Oct 2026 · 11 min read

Under the Hood

How HTTP Works (preview)

HTTP is the protocol the web is built on. It's a text-based request-response conversation between your browser and a server, and once you understand the conversation, everything else about the web starts making sense. Caching, cookies, APIs, REST, load balancers, CDNs: they're all ways of shaping or intercepting HTTP traffic.

Read article 07 Oct 2026 · 16 min read

Exam Room · Architecture

Choosing Between Aurora Replicas, RDS Read Replicas, and DMS (preview)

An Aurora writer pinned at 90% CPU because the reporting team's dashboards read live from production. A classic RDS MySQL with nightly batch jobs that read-lock the master. And a new multi-Region requirement for a PostgreSQL database that has to serve analytics in Sydney off a writer in Frankfurt. Three flavours of 'offload the reads' (Aurora replicas, RDS read replicas, and DMS), and they are not the same mechanism. Speed isn't what decides it; matching the replication technology to each workload's isolation and freshness requirement is.

Read article 06 Oct 2026 · 8 min read

The Greenbox Story · Losing the Thread

Subscribers Aren't Customers: Building the Greenbox Community (preview)

Greenbox has six thousand subscribers and a growing customer support backlog. Sam notices something odd in the data: the subscribers who churn the least are the ones who talk to each other. A conversation about community begins as a curiosity and ends as a strategy.

Read article 05 Oct 2026 · 14 min read

Exam Room · Architecture

SQS Standard or FIFO for Ordered Workloads (preview)

A click-tracking firehose that can drop duplicates at the app layer but must scale to 50,000 messages a second. A payment-state machine that must never process the same transaction twice and must apply updates in the order they were issued, per customer. Two workloads, one service called SQS, two flavours that are not interchangeable. 'Which is better' has no answer in the abstract; what does is what each flavour actually guarantees, what it costs, and what each workload can tolerate if the guarantee slips.

Read article 02 Oct 2026 · 30 min read

The Workshop

The Workshop: Retrospectives (preview)

A pattern for creating a structured, safe space to inspect how the team is working and decide one or two things to change, so that continuous improvement stops being a poster on the wall and starts being a thing that actually happens.

Read article 01 Oct 2026 · 49 min read

Under the Hood

How Maps Lie (preview)

Every map you've ever seen is wrong. It has to be: you can't flatten a sphere without distorting something. The question is what you choose to distort, and what that choice reveals about who made the map and why.

Read article 30 Sep 2026 · 15 min read

Exam Room · Architecture

ElastiCache: Redis or Memcached (preview)

Three workloads on the same platform each need a cache: a session store that has to survive a node reboot, a leaderboard that needs sorted sets, and a read-through cache for a recommendation API where any node can die without consequence. Redis and Memcached both show up as engine choices in ElastiCache. Microbenchmark speeds aren't the deciding factor here; what decides it is which data model, durability story, and replication shape each workload actually needs.

Read article 29 Sep 2026 · 8 min read

The Greenbox Story · The Right Tool

Threat Modelling: What the LLM Didn't Think About (preview)

Payment data, customer PII, farm contracts. The more code LLMs generate, the more systematic security thinking matters.

Read article 28 Sep 2026 · 16 min read

Exam Room · Architecture

Choosing Between NAT Gateway, NAT Instance, and Egress-Only IGW (preview)

A fleet of private instances that needs outbound internet for package updates. An IPv6-only subnet that wants egress without accepting inbound connections from the world. And an accountant asking why a NAT Gateway bill has become one of the top line items. Three problems, three AWS components (NAT Gateway, NAT Instance, and egress-only Internet Gateway), and they are not the same thing despite sounding like they should be.

Read article 25 Sep 2026 · 6 min read

Consulting and Craft · In Practice

Retrospectives at Every Scale (preview)

Sprint retros, cross-team retros, quarterly retros, yearly retros — the feedback loop that keeps every planning layer honest.

Read article 24 Sep 2026 · 15 min read

Under the Hood · Money

What Money Actually Is (preview)

Everything you learned about money (the barter story, gold coins, the idea that it's a 'thing') is probably wrong. Money is a ledger, a promise, and a collective hallucination that runs the world.

Read article 23 Sep 2026 · 17 min read

Exam Room · Architecture

When Aurora Should Scale Itself (preview)

A SaaS tenant database with long-flat daytime load and sharp weekend spikes. A reporting replica that's idle 22 hours a day and pinned to 100% for two. A dev cluster that's on for an hour and off for twelve. Three Aurora clusters, three very different load profiles, and two ways Aurora can be sized: capacity you set, or capacity Aurora sets for you. Cheaper-in-the-abstract is a meaningless answer; what counts is which load profile each mode is shaped for, and where the cost crossover actually lands.

Read article 22 Sep 2026 · 7 min read

The Greenbox Story · The Right Tool

Cynefin: Not Everything Needs a Workshop (preview)

Simple, complicated, complex, chaotic — the framework that tells you which approach to use and when to stop over-thinking.

Read article 21 Sep 2026 · 16 min read

Exam Room · Architecture

Choosing Between EFS and the FSx Family for Shared Storage (preview)

A Linux web farm that needs a shared filesystem for user uploads. A Windows application tier that expects SMB and Active Directory. A genomics pipeline that wants to read the same 80 TB dataset from thousands of Lambda invocations. And a small team on macOS with hand-edited NFS mounts. One problem, four protocols, and a catalogue of services that can each solve some but not all of it. 'EFS or FSx' is the wrong framing; what matters is which filesystem protocol each workload actually speaks, and which AWS service speaks it natively.

Read article 18 Sep 2026 · 9 min read

Consulting and Craft · In Practice

Scaling Knowledge: From One Person's Head to Twenty-Five People's Practice (preview)

The hardest scaling problem isn't technical, it's knowledge. How domain knowledge moves from implicit to explicit to embedded, and which techniques unlock each step.

Read article 17 Sep 2026 · 36 min read

Under the Hood

Open Data: The Case for Sharing (preview)

The best datasets in the world are free. Government budgets, transport timetables, weather observations, property boundaries — published openly, used by millions, and funded by your taxes. Here's why open data matters, how to publish it well, and when you might choose not to.

Read article 16 Sep 2026 · 17 min read

Exam Room · Architecture

Choosing Between ALB, NLB, and Gateway Load Balancer (preview)

A public HTTPS API that needs path routing and WAF. A TCP service that must preserve the client IP down to the instance. A fleet of third-party firewall appliances that every packet leaving the VPC has to pass through. Three problems, three load balancers, and they are not interchangeable. None of them is 'best' in the abstract; what matters is which OSI layer each sits at, and what that layer lets you do that the others can't.

Read article 15 Sep 2026 · 9 min read

The Greenbox Story · The Right Tool

Ensemble Programming: The Team Navigates, the LLM Types (preview)

When the LLM does the typing, the whole team can focus on thinking. Mob programming reimagined for an AI-augmented world.

Read article 14 Sep 2026 · 16 min read

Exam Room · Architecture

CloudFront Caching for Mixed Static and Personal Content (preview)

A single domain serves a marketing shell that barely changes, a product catalogue that changes hourly, and a logged-in account area that's different for every visitor. One CloudFront distribution in front of all of it. Caching isn't in doubt; the work is letting CloudFront cache aggressively where it can while telling it, precisely, where it must not.

Read article 11 Sep 2026 · 20 min read

The Greenbox Story · Faster Together

Sam, Jas, and the Invisible Work (preview)

The Greenbox story is told through Maya's decisions and Tom's code. But a startup runs on the people who handle the work that doesn't make it into architecture diagrams or board packs.

Read article 10 Sep 2026 · 16 min read

Under the Hood · Trust

How Reputation Systems Work (preview)

eBay stars, Amazon reviews, Uber ratings, PageRank: every platform that connects strangers needs a way to manufacture trust from thin air. How do they do it? And why do all of them eventually break?

Read article 09 Sep 2026 · 17 min read

Exam Room · Architecture

Right-Sizing S3 Storage Classes Without Profiling (preview)

Forty terabytes of S3 across hundreds of buckets. Some datasets are read every morning; some are read once a quarter; some were written once and nobody has touched them since. The ops team has never run a per-bucket access profiler and isn't about to start. They want the bill down from $950 a month without hand-engineering a lifecycle rule per prefix. One S3 feature is designed for exactly that, and two others combine with it in the cases where it would waste money.

Read article 08 Sep 2026 · 21 min read

The Greenbox Story · Faster Together

Strategic Alignment: The Session That Changed the Roadmap (preview)

Anika inherits a stalled logistics partnership that Maya set up months ago. Her instinct is to surface every risk in a spreadsheet. Lee helps her see that the same information, delivered differently, builds trust instead of breaking it.

Read article 07 Sep 2026 · 17 min read

Exam Room · Architecture

Designing RDS for AZ Failover and Cross-Region Recovery (preview)

A SaaS team runs RDS PostgreSQL in a single AZ and wants two things at once: sub-minute failover when an AZ dies, and a recovery option in another region with under five minutes of data loss if the whole region goes. Neither problem has one clean answer on its own. The combination that does is a three-AZ cluster plus a replica waiting on the other side of the continent.

Read article 04 Sep 2026 · 11 min read

Consulting and Craft · Through the Kitchen

Mise en Place (preview)

Everything in its place before the first flame. Professional kitchens prep for hours before service. The ones that don't prep are the ones that burn the food.

Read article 03 Sep 2026 · 33 min read

Under the Hood

How OAuth Works: Delegating Access Without Sharing Secrets (preview)

You click 'Sign in with Google' and somehow a recipe app gets access to your calendar without ever seeing your password. OAuth is the protocol that makes this possible — and its history starts with a blog platform and a frustrated engineer.

Read article 02 Sep 2026 · 18 min read

Exam Room · Architecture

Auto-Scaling for Planned and Surprise Spikes (preview)

An e-commerce site has two kinds of spike to survive: Black Friday, which you can see coming months in advance, and the viral moment at 14:07 on a Tuesday, which you cannot. No single auto-scaling feature solves both without overspending the rest of the year. The correct answer layers four of them, and the interesting part is how they hand off.

Read article 01 Sep 2026 · 15 min read

The Greenbox Story · Faster Together

On-Call and Incident Response: When the Pager Goes Off (preview)

The allergen incident was a wake-up call. The question isn't whether something will go wrong again — it's whether the team will be ready when it does.

Read article 28 Aug 2026 · 15 min read

Consulting and Craft · In Practice

Code Is Read More Than It Is Written (preview)

The Law of Demeter says don't chain through objects you don't own. The real reason isn't object-oriented purity; it's that order.Customer.Address.City is a sentence that forces the reader to hold four concepts in their head to understand one line of code.

Read article 27 Aug 2026 · 15 min read

Under the Hood · Trust

How Certificates Work (preview)

Every time your browser shows a padlock icon, a chain of trust is doing the work, from your bank's server, through intermediate authorities, up to a root certificate baked into your operating system. How that chain is built, why it sometimes breaks, and how Let's Encrypt changed everything.

Read article 25 Aug 2026 · 12 min read

The Greenbox Story · Faster Together

API Contracts: Two Squads, One Direction (preview)

Perth and Melbourne have their own backlogs, their own sprint rhythms, and their own priorities. They keep surprising each other with dependencies nobody saw coming.

Read article 21 Aug 2026 · 11 min read

Consulting and Craft · In Practice

The Language of Tests (preview)

Your test says 'should return 200.' RFC 2119 says 'should' means 'there may exist valid reasons to ignore this.' If it's actually a hard requirement, your test has a specification bug, and the LLM that wrote it doesn't know the difference.

Read article 20 Aug 2026 · 29 min read

Under the Hood

How Databases Actually Work (preview)

You write SQL and data appears. But between your query and the disk, there's a query planner, a buffer pool, a write-ahead log, and a concurrency control system that's been refined over forty years. PostgreSQL is the default for good reason — but knowing why helps you know when it isn't.

Read article 18 Aug 2026 · 10 min read

The Greenbox Story · Faster Together

Wardley Mapping: Build, Buy, or Borrow? (preview)

Should Greenbox build its own delivery tracking or use a third party? When building is cheap, the calculus changes — but not always in the direction you'd expect.

Read article 17 Aug 2026 · 15 min read

Exam Room · Advanced GenAI

How to Wire Function Calling Through Bedrock (preview)

An assistant that knows the answers but can't act on them is half a tool. Function calling (letting the model invoke tools we define, with arguments it chooses) turns understanding into action. Bedrock's Converse API has native tool-use support; so does Anthropic's Messages API via Bedrock; so do Bedrock Agents as a higher-level wrapper. Each exposes function calling through a different surface, and picking wrong makes the simple case hard.

Read article 14 Aug 2026 · 24 min read

The Workshop

The Workshop: Decision Tables (preview)

A pattern for extracting a tangled business rule from a domain expert's head and laying it out as a conditions-and-outcomes grid, so the combinations nobody had thought of become visible before they become bugs.

Read article 13 Aug 2026 · 17 min read

Under the Hood · Trust

How Encryption Works (preview)

Every time you send a message, buy something online, or check your bank balance, encryption is working in the background. But what is it actually doing? From Caesar's shifted alphabet to the elliptic curves protecting your bank account, the surprisingly beautiful mathematics of keeping secrets.

Read article 12 Aug 2026 · 15 min read

Exam Room · Advanced GenAI

Caching LLM Responses Without Stale Answers (preview)

Thirty percent of the support assistant's queries are paraphrases of each other ('how do I cancel?', 'can I cancel?', 'where's the cancel button?'), and every one pays full model price. Caching LLM responses isn't as simple as hashing a prompt: exact-match, semantic, and prefix caching answer different questions, and getting the boundary wrong serves yesterday's answer to today's question.

Read article 11 Aug 2026 · 15 min read

The Greenbox Story · Faster Together

Value Stream Mapping: Finding Where to Focus (preview)

Lead time has tripled and nobody knows why. Charlotte maps the actual flow of work from idea to production — and discovers the code takes two days. Everything else takes nineteen.

Read article 10 Aug 2026 · 12 min read

Exam Room · Advanced GenAI

SageMaker JumpStart or Bedrock for the Same Model (preview)

A team that wants Llama 3.3 70B in production can take it through SageMaker JumpStart, pushing a button to deploy onto an endpoint they own, or through Bedrock's model catalog with per-token pricing and no infrastructure. Same model, sometimes the same base weights, two quite different operational shapes. The correct choice depends on how much of the serving layer you actually want to touch.

Read article 07 Aug 2026 · 27 min read

The Workshop

The Workshop: DDD Modelling (preview)

Half a day with a clarified Event Storming wall, a whiteboard, and the correct people — to land on aggregates, bounded contexts, and a context map the team will actually re-read. Where Event Storming an Architecture produces the first cut of boundaries, this is the session that sharpens them.

Read article 06 Aug 2026 · 25 min read

Under the Hood

How Git Works (preview)

Git isn't a file tracker with version numbers. It's a content-addressable object store with a directed acyclic graph on top. Understanding this changes git from a tool you fear into a tool you trust.

Read article 05 Aug 2026 · 15 min read

Exam Room · Advanced GenAI

Keeping PII Out of LLM Prompts and Logs (preview)

A claims assistant that has to answer questions about a customer's claim while keeping the customer's name, address, policy number, and medical details out of training data, out of logs, and out of anything a subpoena could touch later. PII redaction isn't one knob, it's four or five, in different places, each covering a different leak path. Comprehend, Bedrock Guardrails, custom redaction, and the prompt itself each handle a different shape of the problem.

Read article 04 Aug 2026 · 17 min read

The Greenbox Story · Growing Pains

Developer Onboarding: Ramping Up Without Slowing Down (preview)

Three new developers start in three consecutive weeks. By week four, the existing team is spending more time explaining things than building them, and the new people still feel lost.

Read article 03 Aug 2026 · 12 min read

Exam Room · Advanced GenAI

Provisioned Throughput vs On-Demand Bedrock (preview)

On-demand Bedrock is priced per token, throttled per minute, and latency-variable in a way that product hates. Provisioned Throughput buys predictability with a monthly commitment. The break-even between the two isn't obvious, and the answer changes with the shape of the workload: a 24/7 assistant and a burst-heavy report generator land on opposite sides of the same line.

Read article 31 Jul 2026 · 49 min read

The Workshop

The Workshop: C4 Modelling (preview)

A simple set of nested diagrams — Context, Container, Component, Code — that turns a wall of sticky notes and tribal knowledge into an architecture artefact that survives outside the room.

Read article 30 Jul 2026 · 15 min read

Under the Hood · Trust

How Identity Works (preview)

You prove who you are dozens of times a day: passwords, fingerprints, face scans, little codes texted to your phone. But what does 'proving identity' actually mean? From passports to passkeys, the surprisingly deep question of how we know who's who.

Read article 29 Jul 2026 · 14 min read

Exam Room · Advanced GenAI

Picking an Embedding Model for Retrieval (preview)

An index with 20 million chunks, queries that need to work in English, Spanish, Portuguese, and Japanese, and a budget that won't bear re-embedding every six months when someone decides the new model is better. The embedding model quietly caps what a retrieval system can ever do. Titan, Cohere, and a self-hosted model all trade different things, and the comparison is messier than the marketing suggests.

Read article 28 Jul 2026 · 15 min read

The Greenbox Story · Growing Pains

Hiring Mistakes: The Hire Who Doesn't Work Out (preview)

Jordan Chen's CV was perfect. The technical interview was flawless. Six weeks in, the team is quieter, the code is cleverer, and nobody can explain why things feel worse.

Read article 27 Jul 2026 · 14 min read

Exam Room · Advanced GenAI

When a Document Won't Fit the Context Window (preview)

A 400-page contract, a 200-page policy manual, and a legal team asking 'what clauses govern refund disputes across both?' A 200,000-token context window sounds like enough until you realise what goes in with the documents. Chunking, map-reduce, hierarchical summarisation, and sliding context windows each answer a different question, and getting the boundaries correct is most of the battle.

Read article 24 Jul 2026 · 26 min read

The Workshop

The Workshop: Event Storming an Architecture (preview)

Event Storming at its most precise. Take a Process Level wall and a team of developers; leave with aggregates, bounded contexts, and a design the room actually believes in. The session that turns 'we understand the flow' into 'we know where the code goes'.

Read article 23 Jul 2026 · 23 min read

Under the Hood

How Email Works (preview)

SMTP, MX records, SPF, DKIM, DMARC, spam filters, and email headers: how the internet's oldest killer app actually delivers a message, and why it's still terrible and still indispensable.

Read article 22 Jul 2026 · 15 min read

Exam Room · Advanced GenAI

Streaming Responses to Cut First-Token Latency (preview)

A chat interface where users wait four seconds for any response on long generations, abandon rate creeping up, product asking why we can't do the typing-animation thing that every other assistant does. Streaming isn't just UX polish; it changes how the entire response path has to work, from the SDK call through API Gateway to the browser, and each hop has its own way of getting it wrong.

Read article 21 Jul 2026 · 14 min read

The Greenbox Story · Growing Pains

Service Design: Customer Support at Scale (preview)

At two hundred subscribers, Sam answered every email personally. At three thousand five hundred, the inbox is a wall of fire and Sam hasn't slept properly in weeks.

Read article 20 Jul 2026 · 15 min read

Exam Room · Advanced GenAI

Importing Custom Weights into Bedrock (preview)

A research team has fine-tuned an open-weights model for medical-notes summarisation on a private SageMaker cluster. The resulting weights live in S3; the production runtime wants Bedrock's ergonomics. Custom model import bridges that gap, but it only works for certain base architectures, comes with throughput minimums, and quietly changes the cost model compared to on-demand foundation models.

Read article 19 Jul 2026 · 7 min read

Cooking · Mains

Pork and Fennel Ravioli, Take Two (preview)

The second run at pork and fennel ravioli, fixing what the first batch taught me: 500g of pork for six or seven generous parcels each, the pasta rolled properly thin so it cooks through, and a brown butter lifted with lemon and toasted hazelnuts. Roasted cherry tomatoes on the side bring a sweet-sharp contrast. Serves four.

Read article 17 Jul 2026 · 8 min read

Consulting and Craft · Through the Kitchen

Resting the Meat (preview)

You take the steak off the heat and the hardest part begins: doing nothing. Most of what I've learned about shipping software, I learned from a piece of resting beef.

Read article 16 Jul 2026 · 13 min read

Under the Hood · Trust

Why Trust Is Hard (preview)

The fundamental problem of the digital age isn't speed or storage or bandwidth. It's trust. How do you verify identity, prevent impersonation, and establish confidence between strangers who will never meet? Humans have been working on this for thousands of years, and the internet broke most of what we'd figured out.

Read article 15 Jul 2026 · 15 min read

Exam Room · Advanced GenAI

How to Cut a Bedrock Bill Without Hurting Quality (preview)

A Bedrock bill that doubled in two months, a product roadmap that blames the retrieval service, a finance partner who'd like a straight answer. The cheapest token is the one you don't send; the next cheapest is the one you send to the correct model. Model routing, prompt compression, cached retrieval, and provisioned throughput each solve a different slice of the cost problem, and none of them is the silver bullet.

Read article 14 Jul 2026 · 15 min read

The Greenbox Story · Growing Pains

Capacity Planning: The Seasonal Crunch (preview)

Winter arrives and half of Dave's crops don't. The Greenbox team discovers that a subscription product built on fresh produce has a supply problem nobody modelled.

Read article 13 Jul 2026 · 15 min read

Exam Room · Advanced GenAI

How to Build a Multi-Modal Bedrock Assistant for Insurance Claims (preview)

A claims-processing assistant that reads a scanned invoice, listens to a voicemail, answers the customer's question in plain text, and, if asked, reads it back. Four modalities, one conversation. The model choice, the orchestration shape, and the ways different inputs fail each push the architecture in different directions, and the naive 'just use a multi-modal model' misses half of where the real work is.

Read article 10 Jul 2026 · 22 min read

Consulting and Craft · In Practice

Technical Debt Is a Loan, Not a Crime (preview)

Calling something 'tech debt' is usually an accusation. It shouldn't be. Used deliberately, debt is a tool: ship something imperfect, learn from the market, pay it back with what you learned. Used carelessly, it compounds until the interest swallows the principal.

Read article 09 Jul 2026 · 13 min read

Under the Hood

Past the Ten-Second Wall (preview)

Past the ten-second wall the user is gone unless you do something about it. There are two patterns for keeping them: narrate the work as it happens, or release them and come back. This post is about implementing both: the wire formats, the server shapes, the client shapes, and the trade-offs each one makes.

Read article 08 Jul 2026 · 14 min read

Exam Room · Advanced GenAI

Evaluating LLM Output With Bedrock Eval Jobs (preview)

Two thousand historical support tickets, a summarisation prompt, a new model candidate, and a product manager asking whether switching would hurt quality. Bedrock evaluation jobs offer automated scoring, human review through Ground Truth workflows, and model comparison side by side, but they answer different questions, and getting the right number out of the right job matters more than running more of them.

Read article 07 Jul 2026 · 14 min read

The Greenbox Story · Drawing the Lines

When a Container Earns Its Own Zoom (preview)

The substitution engine got built. It works. But it's complicated enough that Ravi can't explain it to a new joiner, and the C2 diagram just shows it as a single box called Supply Matching. Time for the level of C4 the team has been carefully not drawing until now.

Read article 06 Jul 2026 · 15 min read

Exam Room · Advanced GenAI

How to Manage Prompts Across Thirty Services on Bedrock (preview)

One prompt scattered across thirty services, no versioning, no tests, drift between the copy in the code and the copy in the docs, a silent regression when somebody changed 'concise' to 'brief' and retention on one response tanked. Prompt engineering at a hundred callers isn't prose discipline, it's configuration management. Bedrock Prompt Management, Git-backed templates, and parameterised prompts each solve a slice of the same problem.

Read article 03 Jul 2026 · 6 min read

Consulting and Craft · Through the Kitchen

The Probe in the Meat (preview)

I own four kitchen thermometers, each for a different kind of cooking, and the differences between them are the differences between the kinds of decisions they help me make. Most of what I've learned about observability, I could have learned from a brisket, a vat of hot oil, a batch of caramel, and a steak.

Read article 02 Jul 2026 · 3 min read

Cooking · Mains

Cod, Prawns & Fennel (preview)

Cod and king prawns poached in a pan of slow-sweated fennel and leek with white wine and crème fraîche. Twenty-five minutes, one pan, four servings.

Read article 02 Jul 2026 · 10 min read

Under the Hood

The Transformer Attention Budget (preview)

Three numbers govern every interactive interface ever built: 0.1, 1, and 10 seconds. They are fifty years old. They predate the personal computer. They have survived every revolution in how software gets to a human, because they describe the human, not the software. With LLMs, the numbers do not stop being true. They bend in ways the original numbers never anticipated.

Read article 01 Jul 2026 · 14 min read

Exam Room · Advanced GenAI

Picking a Vector Store for Bedrock RAG (preview)

Twelve million embedding vectors, a 50ms retrieval budget, hybrid queries that mix keyword and semantic, and a bill that should not double the Bedrock spend on its own. OpenSearch Serverless, Aurora with pgvector, and Pinecone Serverless all serve the same shape of query, but their pricing curves, operational shapes, and query surfaces diverge the moment the corpus grows beyond demo scale.

Read article 30 Jun 2026 · 9 min read

The Greenbox Story · Drawing the Lines

Architecture Decision Records: Why We Did It That Way (preview)

With LLMs writing more code and new people joining the team, nobody can explain why things are the way they are. Charlotte introduces ADRs — the institutional memory the codebase desperately needs.

Read article 29 Jun 2026 · 15 min read

Exam Room · Advanced GenAI

How to Wire an LLM to Side-Effecting Actions with Bedrock Agents (preview)

An assistant that has to look up a customer's subscription, pause it, refund a charge, and email confirmation. Not just answer, act. The glue between a language model and the rest of our systems is a solved problem three different ways: Bedrock Agents, a LangChain agent loop, or a hand-written tool router. Each of them handles tool definition, invocation, and error recovery, but they put the guardrails in very different places.

Read article 28 Jun 2026 · 3 min read

Cooking · Sweet Treats

Tarte Tatin (preview)

Apples caramelised in butter and sugar in a heavy ovenproof pan, lidded with a disc of rough puff, baked until the pastry is gold and the apples are deep amber. Flipped onto a serving plate while still warm. A pudding with a dramatic reveal and three ingredients of consequence (the apples, the caramel, the pastry), which is most of why it works.

Read article 27 Jun 2026 · 4 min read

Cooking · Sweet Treats

Lunchbox Popcorn (preview)

A bag of plain kernels is the cheapest snack in the cupboard and one of the best for a lunch box: pop a batch on the stove, flavour it three ways, and you have a week of crisp, nut-free fillers that don't go to mush by recess. The base is just kernels, a little oil and a lid. The flavours are a sweet honey-cinnamon, a cheesy savoury one, and a cocoa dusting the kids think is a treat. Make a big batch on Sunday, keep it airtight, and portion it as you pack.

Read article 27 Jun 2026 · 11 min read

Under the Hood · The AI Field Guide

When Not to Use an LLM (preview)

Nine posts in, the question this series has been circling: when is an LLM the wrong tool? A field-guide-style decision walk through the alternatives (encoder models, classical NLP, rules, search, logic, probability), with the symptoms that point to each, and the symptoms that point back to LLMs.

Read article 26 Jun 2026 · 23 min read

The Workshop

The Workshop: Prioritisation (preview)

A 60-to-90-minute pattern for picking the next order of work when the backlog has outgrown the room's memory. MoSCoW, RICE, Cost of Delay, value-vs-effort: not as rival religions, but as lenses chosen to fit the constraint. Silent scoring, reveal, argue the fringes, land the order.

Read article 25 Jun 2026 · 23 min read

Under the Hood

How Containers Work (preview)

Containers aren't tiny virtual machines. They're processes with boundaries — namespaces for isolation, cgroups for resource limits, and layers for efficient storage. Understanding what's actually happening changes how you use them.

Read article 24 Jun 2026 · 16 min read

The Greenbox Story · Drawing the Lines

Decision Tables: The Substitution Engine in Go (preview)

Maya's substitution rules are captured in decision tables. The LLM generated a thousand lines of Go and four hundred tests from them. This is what that code actually looks like.

Read article 23 Jun 2026 · 9 min read

The Greenbox Story · Drawing the Lines

Decision Tables: Making Maya's Brain Explicit (preview)

The substitution rules Maya carries in her head now need to be captured for the new team. Decision tables express complex multi-condition logic that LLMs can implement comprehensively.

Read article 22 Jun 2026 · 5 min read

Cooking · Bread

Sourdough Boule (preview)

A 900g sourdough country boule, mostly strong white flour with a handful of wholemeal for character, 75% hydration, three sets of stretch-and-folds, a long bulk on the counter, an overnight cold retard in the fridge, baked Sunday morning in a preheated dutch oven. Pairs excellently with cream of mushroom soup: bake the boule Sunday, ladle the soup Monday.

Read article 22 Jun 2026 · 5 min read

Cooking · Mains

Cream of Mushroom Soup (preview)

A proper mushroom soup: a kilo of mushrooms browned hard, a handful of dried porcini for depth, sweated onions and garlic, a measure of whisky in the deglaze, blended just enough to be silky without losing the bite. The base freezes in single-serve portions; the cream goes in at reheat so it never splits. Pairs excellently with a sourdough boule baked the day before.

Read article 22 Jun 2026 · 1 min read

Cooking · Breakfast

Breakfast Smoothie (preview)

A two-glass breakfast smoothie built on banana and frozen fruit, with oats for staying power and yoghurt for body. It carries a scoop of creatine each without a hint of chalk; the frozen fruit and banana hide it completely. Sixty seconds in the blender and you're out the door.

Read article 22 Jun 2026 · 16 min read

Exam Room · Advanced GenAI

Building RAG When the Source Documents Change Daily (preview)

A support assistant that has to answer from a product manual which the product team edits weekly, a pricing sheet that changes at month-end, and an operational runbook that mutates hourly. The base model doesn't know any of it, and fine-tuning won't keep up. Retrieval is the answer; the question is how much of the retrieval plumbing we want to own, and Bedrock Knowledge Bases, a LangChain stack, and a hand-rolled pipeline each put the lines in different places.

Read article 21 Jun 2026 · 3 min read

Cooking · Sides

Refrigerator Dill Pickles (preview)

A 1L jar of dill pickles built in fifteen minutes Wednesday evening, ready to eat by Saturday and at their best by the following weekend. Vinegar brine rather than lactofermentation: quicker, sharper, fridge-stable for a month, and pleasantly crunchy when sliced thin onto a burger. Lacto pickles are a different project.

Read article 21 Jun 2026 · 1 min read

Cooking · Sides

Burger Sauce (preview)

A proper home-made burger sauce: good mayonnaise, a smaller hit of ketchup than you'd expect, yellow mustard for tang, pickle brine for acid, and smoked paprika for the colour and a faint backnote that pairs with anything off a fire. Five minutes of whisking, twenty-four hours in the fridge to come together, and a fortnight of fridge life. Roughly 250ml from one bowl.

Read article 21 Jun 2026 · 3 min read

Cooking · Mains

Smoked Brisket-Chuck Patties (preview)

Six 170g patties from a 50:50 brisket-and-chuck blend, ground at home through a 4.5mm plate, twice. Smoked on the offset at 110°C over oak until the centres hit 47°C, then seared hard in cast iron with cheese under a lid for the last thirty seconds. Two passes of heat for two different jobs: the smoke does the flavour, the sear does the crust.

Read article 21 Jun 2026 · 3 min read

Cooking · Bread

Milk Rolls for Burgers (preview)

Eight soft milk rolls built on a tangzhong base (a small flour-and-milk paste cooked to a gel before it joins the dough), which lets the dough hold roughly 10% more liquid than it otherwise could. The extra hydration is the trick: the rolls stay pillowy for two days, and the crumb has enough structure to take a smoked patty, melted cheese, sauce, and pickles without going pasty.

Read article 21 Jun 2026 · 4 min read

Cooking

A Burger from Scratch (preview)

A burger built from the ground up over a week, with four recipe cards behind it: pillowy milk rolls with a tangzhong crumb, smashed patties ground from brisket and chuck and smoked on the offset, a proper home-made burger sauce, and refrigerator dill pickles that have had three days to come to themselves. The cheese is the only thing I bought.

Read article 20 Jun 2026 · 10 min read

Under the Hood · The AI Field Guide

Bayesian Reasoning (preview)

Probability is the third leg of classical AI. Bayesian networks, Kalman filters, particle filters, multi-armed bandits. The mathematics that fuses sensor noise into a position estimate, that decides whether to show a user this ad or that one, that diagnoses faults from intermittent symptoms.

Read article 19 Jun 2026 · 24 min read

Consulting and Craft · In Practice

The Price of Everything (preview)

Pricing isn't a finance decision; it's a product decision. The journey from a gut-feel launch number to validated unit economics is one of the hardest things a small team has to do, and one of the most consequential.

Read article 18 Jun 2026 · 15 min read

Under the Hood

How Estimation Works (And Why It Doesn't) (preview)

The cone of uncertainty, the planning fallacy, story points, throughput-based forecasting, and Monte Carlo simulation: why software estimation is so hard, and what actually works.

Read article 17 Jun 2026 · 15 min read

Exam Room · Advanced GenAI

Spreading Bedrock Load with Cross-Region Inference Profiles (preview)

A Bedrock-backed SaaS serving US, EU and APAC customers is hitting regional quota in us-east-1 during peak while the same model sits idle in eu-west-1. The team wants to spread load without fracturing the product into three regional deployments. Three letters on the front of the model ID do the job — if the model supports it and the geography fits the customer.

Read article 16 Jun 2026 · 19 min read

The Greenbox Story · Drawing the Lines

Drawing the System: From Event Storm to C4 (preview)

The wall told the team where the boundaries were. But the wall lives in one room, and the team doesn't. Charlotte introduces C4 — the diagrams that let anyone who wasn't in the room understand what Greenbox actually is.

Read article 15 Jun 2026 · 17 min read

Exam Room · Advanced GenAI

Configuring Bedrock Guardrails for PII, Topics, and Grounding (preview)

A consumer-facing chatbot on Bedrock has passed every red-team round on the obvious harms — no weapons, no hate, no CSAM — and is still shipping embarrassments: a card number pasted by one user echoing back in a reply, the bot cheerfully comparing the company's product with a named competitor, and a hallucinated policy line that nobody in the building wrote. Five different filter jobs wrap the same Bedrock invocation, and Guardrails is the one surface that does all five without five Lambdas.

Read article 14 Jun 2026 · 2 min read

Cooking · Sweet Treats

Mille-Feuille aux Framboises (preview)

Three layers of caramelised rough puff with vanilla pastry cream and fresh raspberries between. The classic mille-feuille (thousand-leaf pastry, cream, fruit) with the simplest fruit in the calendar. A weekend dessert: pastry and cream on Saturday, bake and assemble on Sunday.

Read article 14 Jun 2026 · 6 min read

Cooking · Pastry

Croissants and Pain au Chocolat (preview)

A yeasted laminated dough, three calendar days of slow work, and a freezer full of Monday-morning pastries. Two dozen from a single batch, twelve croissants and twelve pain au chocolat, shaped on Sunday and pulled from the freezer one at a time to thaw and proof overnight in a cool kitchen, then baked fresh whenever the coffee asks for one.

Read article 13 Jun 2026 · 11 min read

Under the Hood · The AI Field Guide

Knowledge, Logic, and Constraints (preview)

Reasoning from facts and rules, the way the AI textbooks of the 1980s thought we'd build minds. Propositional and first-order logic. Knowledge bases. Datalog. Modern descendants. SAT solvers, SMT solvers, and the production rule engines running insurance and banking. When if-this-then-that wins.

Read article 12 Jun 2026 · 4 min read

Cooking · Mains

Surf and Turf (preview)

The restaurant cliché made better at home: a thick sirloin brought to an even medium-rare in the water bath and then seared hard, king prawns and scallops caught quickly in the same pan, and a disc of tarragon butter, a faux béarnaise, melting over the lot. The butter does the work of a sauce without the fuss: tarragon, shallot and a little vinegar beaten into soft butter and chilled a day or two ahead. Serves four, with triple-cooked chips and a sharp salad to cut the richness.

Read article 12 Jun 2026 · 2 min read

Cooking · Sides

Triple-Cooked Chips (preview)

The chip worth the name: floury potatoes simmered until their edges crack, dried, then fried twice, once low to cook the inside, once hot to glass the outside. It sounds like a project, but most of it is make-ahead and hands-off, and the two fries are quick. A stockpot and a thermometer are all the kit you need. Enough for four alongside a steak.

Read article 12 Jun 2026 · 2 min read

Cooking · Mains

Mushroom Galette (preview)

A free-form rough-puff tart built for a Friday night: chestnut mushrooms cooked down hard with caramelised shallots, thyme and garlic, piled onto a ricotta-and-parmesan base and folded into a craggy galette. The pastry shatters, the mushrooms go deep and savoury, and it comes together while the kids' freezer food is in the oven. Serves two adults with a green salad.

Read article 12 Jun 2026 · 7 min read

Cooking · Mains

Pasties (preview)

Twenty-four pasties from a long weekend bench session: creamy chicken in béchamel, beef in rich brown gravy, and cheese and onion, eight of each. All folded as triangles, marked by one, two, or three steam slits so the fillings are tellable apart in the freezer and the lunchbox. Frozen raw and baked two or three at a time across the week, with cheese twists made from the puff offcuts as the day-one snack.

Read article 12 Jun 2026 · 3 min read

Cooking · Sides

Homemade Ricotta (preview)

Fresh ricotta from scratch in twenty minutes with three ingredients: whole milk, a splash of cream, and acid to set the curd. Strictly it's a fresh acid-set cheese rather than true whey ricotta, but it's soft, sweet and miles better than the tub, and you control how wet or firm it ends up by how long you drain it. Made for the mushroom galette, but just as good on toast with honey or folded through pasta.

Read article 12 Jun 2026 · 1 min read

Cooking · Breakfast

Bircher Muesli (preview)

The classic make-ahead breakfast: rolled oats soaked overnight in milk and apple juice with grated apple, yoghurt and a squeeze of lemon, soft and spoonable by morning. Mixed the night before, it carries a scoop of creatine without a hint of grit, because the powder dissolves into the liquid while you sleep. Makes two bowls and scales up for the week.

Read article 12 Jun 2026 · 15 min read

The Greenbox Story · Drawing the Lines

Domain-Driven Design: The Anti-Corruption Layer in Go (preview)

When one context needs to talk to another, the temptation is to import its types. The anti-corruption layer says no: translate at the boundary, and let each context own its own language.

Read article 11 Jun 2026 · 18 min read

The Greenbox Story · Drawing the Lines

Domain-Driven Design: Events Across Boundaries in Go (preview)

Bounded contexts don't help if they can't talk to each other. Domain events are how the Subscription context tells Billing that something happened, without knowing Billing exists.

Read article 10 Jun 2026 · 2 min read

Cooking · Mains

Fish Chowder (preview)

A proper bowl of comfort: smoked cod and firm white fish poached gently in a milk-and-cream broth thick with potato and sweetcorn, crisp pancetta scattered over the top. Everything comes off the Woolworths seafood cabinet, nothing fancy. Serves four with crusty bread, fifteen minutes of prep and not much longer on the stove.

Read article 10 Jun 2026 · 18 min read

The Greenbox Story · Drawing the Lines

Domain-Driven Design: Modelling the Subscription Context in Go (preview)

Charlotte said draw the boundaries. Tom said fine. Now someone has to write the code. This is what DDD looks like in Go: value objects, entities, aggregates, and the type system doing half the work.

Read article 09 Jun 2026 · 13 min read

The Greenbox Story · Drawing the Lines

Domain-Driven Design: Drawing the Boundaries

The codebase is becoming a monolith. Charlotte helps the team define bounded contexts so the architecture matches the business — and so LLMs generate code that fits.

Read article 08 Jun 2026 · 2 min read

Cooking · Mains

One-Pot Fry-Up

The whole fry-up in a single pan: chorizo rendered down, potatoes fried crisp in the spiced fat, mushrooms and green capsicum browned, two tins of baked beans warmed through, and eggs baked into wells in the top until the whites set and the yolks stay soft. A grown-ups' lunch that goes from pan to table without a sink full of dishes.

Read article 08 Jun 2026 · 16 min read

Exam Room · Advanced GenAI

Designing Short-Term and Long-Term Memory for a Bedrock Chat Assistant

A customer-support assistant where the average conversation runs fifteen turns before it resolves, and returning users pick up two weeks later expecting the bot to remember they've been waiting on a refund. Two memory problems in one product — what's live in the current conversation and what persists across visits — and four plausible ways to build it. Bedrock Agents' built-in memory handles one half cleanly; the other half is where teams reach for DynamoDB or a knowledge base and get it wrong.

Read article 07 Jun 2026 · 2 min read

Cooking · Sweet Treats

Super Thick Hot Chocolate

Spanish-style thick drinking chocolate, four small cups from 200g of 70% dark chocolate, 800ml of whole milk, and a cornflour slurry. Fifteen minutes.

Read article 07 Jun 2026 · 7 min read

Cooking · Mains

Pork and Fennel Ravioli

Fresh egg pasta filled with a pork-and-fennel sausage mince you grind yourself from shoulder, so the toasted fennel, garlic and lemon work right through the meat. About thirty-six ravioli from a three-egg dough and 350g of pork, nine apiece for four. Dressed with brown butter and crisp sage. An autumnal Sunday lunch, with any spare ravioli freezing raw on a tray for a Tuesday night that needs rescuing.

Read article 06 Jun 2026 · 4 min read

Cooking · Sweet Treats

Scottish Tablet

Scottish tablet is sugar, condensed milk, butter and milk boiled to the soft-ball stage and then beaten as it cools until it grains. That graining is the whole point: tablet isn't fudge, it's harder, drier, and breaks into a brittle, melt-on-the-tongue crumb rather than bending. A sugar thermometer and a sturdy wooden spoon are the only special kit. The boil is quick, the beating is the arm work, and the squares keep for weeks in a tin.

Read article 06 Jun 2026 · 3 min read

Cooking · Mains

Pizza Night

A thin-crust dough scaled for a crowd: six 12-inch bases from one batch, raised with a pinch of commercial yeast while the sourdough starter is still finding its feet. Stretch it thin, spread a good canned sauce, top with whatever the week left in the fridge, and bake as hot as the oven will go. The dough wants a slow cold rise, so mix it the day before.

Read article 06 Jun 2026 · 7 min read

Cooking · Sweet Treats

Cherry Pie

A double-crust pie built around a bag of frozen pitted cherries. The trick with frozen fruit is the water it sheds: thaw the cherries in a colander, catch the juice, boil it down to a glossy syrup with the sugar, and thicken that rather than the raw fruit. The cherries go back in barely cooked, the pie bakes on a screaming-hot tray so the base sets crisp, and then it cools all the way down so the filling slices instead of running. All-butter pastry made the day before, fruit prepped while the oven heats.

Read article 06 Jun 2026 · 4 min read

Cooking · Pastry

Pâte Brisée

Plain all-butter shortcrust, the workhorse pie and tart pastry, no sugar to speak of, made by hand or in the processor in ten minutes. Rubbed coarse so flakes of butter survive into the bake, which is what gives it both tenderness and a little flake. This card makes the full double-crust quantity for a 23cm pie; halve it for a single blind-baked shell. The pastry behind the cherry pie, quiches, custard tarts, and any pie that wants a sturdy base and a lid.

Read article 06 Jun 2026 · 10 min read

Under the Hood · The AI Field Guide

Search and Planning

Most of what people called 'AI' before deep learning was search. A* finding the route, alpha-beta playing the chess move, STRIPS sequencing the plan. The algorithms that run your map app, your build system, your warehouse robot, and your game opponent.

Read article 05 Jun 2026 · 27 min read

The Workshop

The Workshop: Business Model Canvas

A pattern for laying out how a business creates, delivers, and captures value on a single page of nine boxes, filled in customer-first order, so the team can see whether the whole thing actually holds together.

Read article 04 Jun 2026 · 13 min read

Under the Hood · Time

Time Is Wrong Everywhere All at Once

Your laptop's clock is wrong. So is the server's. So is every other computer on the network. The question isn't how to make them agree, it's what to do when they can't. From Lamport clocks to Google Spanner, the story of time in distributed systems.

Read article 03 Jun 2026 · 18 min read

Exam Room · Advanced GenAI

Combining RAG and Fine-Tuning for a Legal Contract Assistant

A legal-tech team wants a contract review assistant that understands two hundred thousand past matters, speaks in the firm's voice with clause-by-section citations, and refuses anything off-domain. A hundred thousand dollars, three months. RAG, fine-tuning, and continued pre-training each solve a different half of that sentence — and the interesting answer is which two to pick, not which one.

Read article 01 Jun 2026 · 1 min read

Cooking · Sweet Treats

Boterkoek

A Dutch butter cake: equal-ish weights of flour, butter, and sugar pressed into a tin and pulled out of the oven while the centre is still slightly underdone.

Read article 01 Jun 2026 · 17 min read

Exam Room · Advanced GenAI

How to Build a Citations-Required RAG Over 50K Internal Documents

Fifty thousand internal documents, five gigabytes of text, weekly churn, a three-second latency budget, per-user access control, and a citation in every single answer. The RAG landscape on Bedrock is bigger than one product and the interesting part of the design is what falls away once you name the five things that actually decide it.

Read article 31 May 2026 · 6 min read

Cooking · Sweet Treats

Giant Marmalade Choux Buns

Four large choux buns under a crackled orange craquelin lid, filled through the base with marmalade chantilly around a hidden core of marmalade. Left whole, so the inside is a surprise. Start to finish in about ninety minutes, no overnight rests; a showstopper you can decide on at five and serve at half six.

Read article 30 May 2026 · 9 min read

Under the Hood · The AI Field Guide

Rules, Grammars, and Regex

Sometimes the correct answer to 'what model should I use?' is no model at all. Hand-written rules, regular expressions, finite-state transducers. They're deterministic, auditable, free at inference, and frequently the correct tool for the job.

Read article 29 May 2026 · 5 min read

Consulting and Craft · Through the Kitchen

The Salt in the Dish

There are four kinds of salt in my kitchen drawer and they are not interchangeable. This post is about salt. Most of it is actually about dependencies, and about the discipline of knowing what you're putting into the dish before you put it in.

Read article 28 May 2026 · 10 min read

Under the Hood · Time

Why Does Thursday Last Forever?

A watched pot never boils. Holidays vanish. Childhood lasted decades and your thirties were a weekend. The clock doesn't care, so why does your brain?

Read article 27 May 2026 · 15 min read

Exam Room · Advanced GenAI

Picking a Bedrock Model for High-Volume RAG

A million LLM requests a day, peaking at thirty per second, split across US and EU customers, with a P99 first-token target under 1.5 seconds and real reasoning over retrieved context. Bedrock has seven model families and four ways to buy capacity. Most of the landscape falls away once you name what actually decides it, and the real trick is what you do *after* you've picked the model.

Read article 26 May 2026 · 12 min read

The Greenbox Story · Finding the Fit

Prioritisation: What Changes First

The team knows what's wrong — the value prop, the pricing, the unit economics. They can't fix everything at once. Now they need to decide what changes first.

Read article 25 May 2026 · 1 min read

Cooking · Sweet Treats

Crème Anglaise

A pouring custard tuned for puddings that already bring their own sweetness.

Read article 25 May 2026 · 2 min read

Cooking · Sweet Treats

Apple Crumble

Apples chopped chunky by hand, a crumble lid rubbed together with cold fingers a minute later, and both into the fridge in separate tubs until baking. Forty-five minutes in a hot oven the next evening and pudding's done while we eat. The topping never touches the wet fruit until the oven, so the crumb stays dry and the bake comes out crisp.

Read article 25 May 2026 · 21 min read

Exam Room · GenAI

Choosing Between SageMaker, Bedrock, and Purpose-Built AI APIs

A platform team has five AI-shaped requests landing in a single sprint: transcribe call centre audio, detect anomalies in sensor data, extract text from scanned forms, summarise customer emails, and detect faces in CCTV. Someone has already typed 'use SageMaker' into three design docs. Someone else insists Bedrock is the answer. A third voice mutters about purpose-built services. AWS has at least three answers to every AI problem, so there's no single platform that wins; what matters is how to tell which layer of the stack each request lands on, and what that choice costs in time, money, and flexibility.

Read article 23 May 2026 · 11 min read

Under the Hood · The AI Field Guide

The Boring Baseline That Wins

TF-IDF, logistic regression, naive Bayes, k-means, LDA. The fifty lines of scikit-learn that beat your fancy model on the small problem you actually have. Why these baselines still win, and why the correct starting point in 2026 is often the same as it was in 2006.

Read article 22 May 2026 · 41 min read

Under the Hood

A Gentle Guide to Typography: From Chisels to Character Sets

A warm, thorough walk through the world of typography, from hand-carved letters and Gutenberg's press to fonts, glyphs, kerning, serifs, and Unicode. Everything you wanted to know about how written language gets its shape.

Read article 21 May 2026 · 9 min read

Under the Hood · Time

The Clock Inside You

Your body has its own clock, and it doesn't care about UTC. It runs on light, adenosine, and a cluster of 20,000 neurons behind your eyes. Jet lag, shift work, larks and owls: the biology of time is a different machine from the physics, and it keeps its own hours.

Read article 20 May 2026 · 19 min read

Exam Room · GenAI

Choosing Between Chains, Retrieval, and Agents for a GenAI Assistant

A product manager wants a 'GenAI assistant' for internal operations, something that can answer questions, look up customer records, draft emails, and file Jira tickets. Three architectural patterns keep coming up: chains, retrieval, and agents. They sound similar, they all use foundation models, and teams routinely reach for the most elaborate one when a simpler pattern would do. There's no single 'best' here; what matters is which one fits each piece of the assistant's workload, and when elaboration costs more than it earns.

Read article 19 May 2026 · 18 min read

The Greenbox Story · Finding the Fit

Pricing Experiments: The Right Box at the Right Price

Greenbox has been charging $25 for the small box and $45 for the large box since day one. Nobody complained. Maya thought that meant the prices were right. A conversation with Lee at the kitchen table suggests otherwise.

Read article 18 May 2026 · 3 min read

Cooking · Mains

Honey Soy Chicken Wings

Thirty minutes from packet to plate. Wings seared hard in a heavy pan until the skin crackles, then rolled through a sticky soy-honey glaze with ginger and garlic, served on jasmine rice with a scatter of spring onion and sesame. One pan, one rice pot, two bowls.

Read article 18 May 2026 · 21 min read

Exam Room · GenAI

How to Make a Bedrock Chatbot Audit-Ready with Guardrails and Watermarks

A fintech ships a customer-facing chatbot on Bedrock. Legal asks: can it give financial advice? Risk asks: can it leak customer account numbers? Compliance asks: if an auditor requests proof a response came from our model, can we demonstrate it? Three questions, three different controls, all of them Bedrock-native. The controls exist; the work is matching the right one to each question and figuring out what the shape of a 'responsible AI' configuration actually looks like when the auditor arrives.

Read article 17 May 2026 · 5 min read

Cooking · Sweet Treats

Sourdough Discard Choc Chip Cookies

Brown-butter chocolate chunk cookies with a hundred grams of sourdough discard folded through the dough: chewy edges, a centre that stays gooey, and just enough tang from the starter to keep them from cloying. The whole batch is built to scoop, freeze raw, and bake one or two from frozen on the night you want them.

Read article 17 May 2026 · 3 min read

Cooking · Sweet Treats

Poires Pochées

Whole pears simmered in a vanilla-lemon syrup until a knife slides in like cutting butter, then cooled in the same syrup overnight so the flavour reaches the core. Beurre Bosc pears are the May default in Perth, firm enough to hold their shape through a long poach. Standalone or sliced into a mille-feuille.

Read article 17 May 2026 · 2 min read

Cooking · Pastry

Cheese Twists

Rough puff rolled thin, dressed with grated parmesan and a turn of black pepper, folded over and rolled again, sliced into strips and twisted. Baked until deep gold and shattery. The quickest way to turn a batch of pastry into an afternoon snack: twenty-five minutes from fridge to plate, eaten warm with a glass of something cold.

Read article 17 May 2026 · 3 min read

Cooking · Pastry

Crème Pâtissière

Vanilla pastry cream cooked on the stove in twelve minutes: milk, yolks, sugar, cornflour, vanilla. The engine of half of French pastry: the inside of an éclair, the filling under fruit tarts, the cream between the layers of a mille-feuille. Makes about 600g.

Read article 17 May 2026 · 3 min read

Cooking · Pastry

Rough Puff Pastry

Quick puff: four turns instead of six, cubes of butter rolled into the dough instead of a hand-shaped butter block. Most of the flake, half the work. The pastry behind mille-feuille, palmiers, sausage rolls, and anything that wants crisp golden layers without a Friday-night beurrage.

Read article 17 May 2026 · 2 min read

Cooking

A Weekend in the Kitchen

Three things from a busy weekend in the kitchen, each with its own recipe card: a tray of cheese twists from a fresh batch of rough puff, a saucepan of pears poached in vanilla-lemon syrup, and a freezer bag of cookie dough balls that bake one or two at a time when wanted.

Read article 16 May 2026 · 9 min read

Under the Hood · The AI Field Guide

Before the Transformer

n-grams. HMMs. CRFs. The language models and sequence taggers that ran the internet before deep learning, and that quietly still do, in autocomplete, spam filters, biomedical NER, speech recognition. What they are, why they still ship, and when they're the correct answer.

Read article 15 May 2026 · 25 min read

The Workshop

The Workshop: Assumption Mapping

A pattern for surfacing the beliefs hiding underneath a plan, plotting them on an evidence/impact grid, and deciding which ones to test before you commit. Facilitator's playbook, failure modes, and what to do with the grid afterwards.

Read article 14 May 2026 · 12 min read

Under the Hood · Time

Can You Turn Back Time?

General relativity permits time loops. Quantum mechanics hints that the future can influence the past. Hawking threw a party for time travellers and nobody came. The physics of time travel is stranger, and more serious, than science fiction suggests.

Read article 13 May 2026 · 2 min read

Cooking · Preserves

Sweet-Orange Marmalade

A fast one-day marmalade: whole oranges and lemons rough-chopped in the blender, simmered with jam sugar and plain sugar for about an hour. No overnight soak, no separate pectin bag. Just citrus, sugar, heat.

Read article 13 May 2026 · 18 min read

Exam Room · GenAI

Forecasting Without Writing Python

A category manager has 18 months of weekly sales data for 400 SKUs and a deadline to forecast next quarter. She doesn't code. The ML team is booked until Q3. The ask is a tool that lets her build a forecast herself, importable, reviewable, explainable, without waiting for engineering. Which AWS box she clicks matters less than what kind of problem this actually is, what features of the data can honestly feed into a model, and what the business user has to understand for the output to be defensible when finance asks 'why this number?'.

Read article 12 May 2026 · 12 min read

The Greenbox Story · Finding the Fit

Business Model Canvas: Does This Actually Work?

Maya needs to convince investors that Greenbox can reach 1,000 subscribers. But when the team maps the business model, they discover the unit economics don't add up — and Lee admits he's out of his depth.

Read article 11 May 2026 · 20 min read

Exam Room · GenAI

Grounding a Chatbot in Your Own PDFs

A facilities team has 600 PDFs (equipment manuals, safety procedures, maintenance schedules) sitting on a SharePoint drive. Engineers want a chatbot that answers 'how do I reset the chiller on floor 4?' in seconds instead of a ten-minute PDF hunt. Retrieval-augmented generation can do this; whether it does it well depends on what the corpus actually looks like, what kinds of questions the engineers really ask, and which configuration knobs decide whether the answers are any good once a managed service is on the table.

Read article 10 May 2026 · 3 min read

Cooking · Sweet Treats

Lemon Meringue Pie

A blender pâte sucrée scented with lemon zest, a smooth lemon curd, and an Italian meringue with peaks blowtorched gold. A weekend pie, two component recipes assembled across Saturday and Sunday.

Read article 10 May 2026 · 2 min read

Cooking · Mains

Fish Pie

Rick Stein's Cornwall fish pie, with Australian fish. White fish, smoked fish, prawns under a buttered mash, baked until the top is gold.

Read article 10 May 2026 · 2 min read

Cooking · Pastry

Pâte Sucrée

Sweet shortcrust pastry made in the food processor in five minutes, perfumed with lemon zest, ready for any tart shell that needs a crisp golden base. The pastry behind the lemon meringue and a dozen other things.

Read article 10 May 2026 · 1 min read

Cooking · Preserves

Lemon Curd

A bright, smooth lemon curd cooked over a pan of water, set with cold butter, and good for everything from a tart filling to a swirl through yoghurt. Makes about 600g.

Read article 09 May 2026 · 10 min read

Under the Hood · The AI Field Guide

After the Transformer

Transformers have ruled language modelling for nearly a decade. They have a known weakness, and several research lines are trying to replace them. Mamba, RWKV, RetNet, Hyena, diffusion-for-text: what they are, what they fix, and which ones are likely to matter.

Read article 08 May 2026 · 26 min read

The Workshop

The Workshop: Jobs to be Done

A half-day pattern for finding out what your customers actually hired your product to do. Switch interviews, the four forces, shaping the material into job statements, and what to do when the room wants to list features instead.

Read article 07 May 2026 · 16 min read

Under the Hood · Time

Does Time Even Exist?

The arrow of time isn't in the equations. "Now" isn't a location. The most fundamental theories of physics may contain no time variable at all. A tour of the foundations, from the block universe to the holographic principle.

Read article 06 May 2026 · 19 min read

Exam Room · GenAI

Choosing Between Prompting, RAG, and Fine-Tuning

A legal-ops team wants a model that answers questions about their 4,000 in-house contract templates. The first prototype, a plain Claude call with the question in the prompt, hallucinates clause numbers. Someone suggests fine-tuning; someone else suggests RAG. They solve different problems, so 'which is better' is the wrong frame; what matters is which problem the team actually has, and what each adaptation technique costs in time, data, and recurring spend.

Read article 05 May 2026 · 14 min read

The Greenbox Story · Finding the Fit

Assumption Mapping: Testing What You Believe

The team lists everything they believe but haven't validated, ranks by risk, and discovers that their most confident assumption is the one most likely to be wrong.

Read article 04 May 2026 · 19 min read

Exam Room · GenAI

How to Take a Foundation Model from Pick to Production Endpoint

A product team wants a chatbot that summarises support tickets. They have the tickets, a cloud account, and no ML background. Somebody says 'use a foundation model'. Between that sentence and a working endpoint sit roughly seven distinct stages, each with its own AWS service and its own decisions. Picking the model is the easy part; the real work is figuring out which stages this team can skip, which they absolutely cannot, and what AWS gives them at each step.

Read article 02 May 2026 · 10 min read

Under the Hood · The AI Field Guide

The Reranker You Didn't Know You Needed

RAG explanations stop at 'embed the query, look up the nearest documents, hand them to the LLM.' That's the demo. In production, there's a second pass between the lookup and the LLM, and it's the one that actually makes retrieval work.

Read article 01 May 2026 · 7 min read

Consulting and Craft · Through the Kitchen

The Knife in My Hand

A sharp knife is safer than a dull one. Pick the right tool for the job, then put in the practice. Speed comes from practice, not pressure. A few focussed, well-maintained, well-practised tools will more predictably take you further than the latest beautiful shiny offering.

Read article 30 Apr 2026 · 12 min read

Under the Hood · Time

Time Is Weirder Than You Think

Time doesn't flow at the same rate everywhere. It slows near massive objects, dilates at high speeds, and might not 'flow' at all. From GPS corrections to black holes, the physics that makes time strange.

Read article 29 Apr 2026 · 16 min read

Exam Room · GenAI

Picking the AWS AI Service Tier for Each Feature

A product manager with no ML background has been told to add AI to a SaaS product, and has heard of Bedrock, SageMaker, Comprehend, Translate, Textract, Rekognition. AWS has three different shapes of AI offering, and the shortest path depends entirely on whether a ready-made service already does the job.

Read article 28 Apr 2026 · 16 min read

The Greenbox Story · Finding the Fit

Jobs to Be Done: Why Subscribers Actually Stay

Greenbox discovers that subscribers don't stay for fresh vegetables. They stay because they don't have to think about dinner on Tuesday.

Read article 27 Apr 2026 · 15 min read

Exam Room · Architecture

Choosing an S3 Storage Class for Cold Archives

Some data exists for compliance, not for use. Tens of terabytes of records sitting untouched until an auditor wants them. S3 has eight storage classes; only one of them is built for that pattern, and getting it wrong can cost an order of magnitude in a year when you weren't paying attention to the bill.

Read article 25 Apr 2026 · 12 min read

Under the Hood · The AI Field Guide

The Other Transformers

BERT and T5 are transformers too, but they aren't trying to be ChatGPT. They're trying to be the boring layer underneath (classifiers, embeddings, structured transformations), and they're often a better answer than an LLM.

Read article 24 Apr 2026 · 27 min read

The Workshop

The Workshop: User Story Mapping

A pattern for laying out the whole user experience as a left-to-right narrative and then slicing it into releases, so the team can see both the shape of the thing they're building and the thinnest honest version they can ship first.

Read article 23 Apr 2026 · 17 min read

Under the Hood · Time

Ticks or Tocks?

A second used to be a fraction of the day. Now it's defined by the vibrations of a caesium atom, and even that might not be precise enough. From quartz watches to optical lattice clocks, the story of how we learned to count time.

Read article 22 Apr 2026 · 15 min read

Exam Room · Architecture

Routing to the Closest Healthy Region

A multi-region application needs to route requests to the closest healthy region, failing over automatically when the preferred one drops out, with no client-side retries and no extra health-check plumbing to maintain. Route 53 can do all of that in a single record set. Finding the correct combination means touring all seven routing policies and the attributes that separate them.

Read article 21 Apr 2026 · 8 min read

The Greenbox Story · Shipping What Matters

Accessibility: A Product Decision, Not a Compliance Tick

A subscriber sends Greenbox an email that changes how they think about the signup flow. Not a complaint, a question the team didn't know they needed to answer.

Read article 20 Apr 2026 · 13 min read

Under the Hood · Time

What Day Is It?

The date next to the time on your phone is its own kind of fragile. Calendars argue with the moon, the sun, and each other; whole days have been deleted by decree; and the year number on your screen depends on which monk's arithmetic your ancestors trusted.

Read article 19 Apr 2026 · 22 min read

The Workshop

The Workshop: Impact Mapping

A pattern for tracing a business goal through the people whose behaviour has to change, through the changes themselves, to the things you might build — so the team can tell the difference between work that moves the number and work that just feels productive.

Read article 18 Apr 2026 · 12 min read

The Greenbox Story · Shipping What Matters

User Story Mapping: Seeing the Whole

The Greenbox team has a growing backlog of stories but no sense of the whole. User Story Mapping lays out the full user journey, shows the gaps, and makes release planning obvious instead of political.

Read article 17 Apr 2026 · 21 min read

The Workshop

The Workshop: Sprint Planning

A pattern for turning a refined backlog into a sprint the team believes in: a goal, a set of stories, task breakdown, and an explicit commitment, in one focused session.

Read article 16 Apr 2026 · 23 min read

Under the Hood · Time

What Time Is It?

The hour on your phone is a fragile compromise between the sun and politics. Sundials, shipwrecks, railway time, DST, and the volunteer-maintained database that keeps the world's clocks roughly honest.

Read article 15 Apr 2026 · 18 min read

The Workshop

The Workshop: Example Mapping

A pattern for breaking a user story into rules and concrete examples in twenty-five minutes, so the team knows whether it's ready to build before the sprint starts. Facilitator's playbook, failure modes, and what to do with the cards afterwards.

Read article 14 Apr 2026 · 13 min read

The Greenbox Story · Shipping What Matters

Impact Mapping: Connecting Work to Goals

The Greenbox team is shipping features, but are they the correct ones? Impact Mapping connects engineering work to business goals, and reveals that the obvious next feature isn't always the important one.

Read article 13 Apr 2026 · 22 min read

The Workshop

The Workshop: Event Storming a Process

The default Event Storming session and the one you'll run most often. One process, one wall, everyone who touches it in the room, three hours. You leave with a precise shared model of the flow and a short list of the questions it raised. Where Big Picture looks for shape across a whole domain, Process Level looks for precision within one flow.

11 Apr 2026 · 25 min read

The Workshop

The Workshop: Event Storming a Domain

The widest zoom, and Brandolini's own entry point to Event Storming. Stand a whole business (or a whole product) in front of one wall with everyone who touches it. The output isn't a design or a plan; it's one picture that every department recognises, and a shortlist of the places worth digging into next.

09 Apr 2026 · 9 min read

The Greenbox Story · Shipping What Matters

Teaching Your LLM the Codebase: CLAUDE.md and AGENTS.md

The Greenbox team's actual CLAUDE.md and AGENTS.md files. What goes in them, how they're structured, and how they shape the code an LLM generates.

08 Apr 2026 · 9 min read

The Greenbox Story · Shipping What Matters

Teaching Your LLM the Codebase

Tom and Priya are both using LLMs to write code. They're getting different results. Not wrong, different. CLAUDE.md is how the team teaches the LLM to write code like them.

07 Apr 2026 · 18 min read

The Greenbox Story · Shipping What Matters

Behaviour-Driven Development: From Stories to Working Software

Example Maps become stories. Stories become tests. Tests drive code. And in a world where LLMs can write the code, the discovery work matters more than ever, because the bottleneck isn't implementation any more; it's knowing what to build.

05 Apr 2026 · 5 min read

Consulting and Craft · Through the Kitchen

The Quiet Jar in the Fridge

My last sourdough starter died through a quiet chain of postponed feeds. I'm starting a new one today. Most of what I'm learning as I begin again, I wish I'd known a decade earlier about codebases.

04 Apr 2026 · 20 min read

The Greenbox Story · From Chaos to Clarity

Sprint Planning: Turning Sticky Notes into Delivery

Walls of sticky notes, a prioritised backlog, concrete examples, and six weeks until the funding deadline. The team has done the discovery. Now they need a rhythm for delivery.

02 Apr 2026 · 26 min read

Under the Hood · The AI Field Guide

To LLMs… and Beyond!

LLMs are one corner of a much larger field. Diffusion models, reasoning models, multimodal systems, open-weight vs closed — what they are, how they differ, and how to choose.

31 Mar 2026 · 17 min read

The Greenbox Story · From Chaos to Clarity

Example Mapping: Making Stories Concrete

Four colours of card, twenty-five minutes, and a vague story becomes something you can actually build. Example Mapping is the bridge between understanding and implementation, and the technique you'll use more than any other.

26 Mar 2026 · 36 min read

Under the Hood · The AI Field Guide

How LLMs Actually Work

Tokens, transformers, attention, and the training pipeline: what large language models actually do when they 'predict the next token', why they hallucinate, and why they're so good at code.

24 Mar 2026 · 28 min read

The Greenbox Story · From Chaos to Clarity

Event Storming: Building Shared Understanding

The Greenbox team covers a wall in sticky notes and discovers they've been building on assumptions. A deep dive into Event Storming, the workshop that gets an entire domain out of one person's head and into shared understanding.

17 Mar 2026 · 19 min read

The Greenbox Story · From Chaos to Clarity

Retrospectives: Catching the Wrong Kind of Fast

A small team, a good idea, LLMs generating code at lightning speed, and four weeks of building the wrong thing. The first part of a series on getting from discovery to delivery without wasting everyone's time.

12 Mar 2026 · 3 min read

Consulting and Craft · In Practice

The Value Is in Ideas, Not Code

LLMs have made code implementation almost trivial. The bottleneck has shifted from writing code to knowing what to ask for. Your library of patterns, concepts, and hard-won experience is now your competitive advantage.

10 Mar 2026 · 15 min read

The Greenbox Story · From Chaos to Clarity

Minimum Viable Product: The First Box

Twenty-two people signed up from a flyer at the Margaret River farmers market. The first delivery was a disaster: wrong addresses, wilted spinach, and a box that arrived on Wednesday instead of Thursday. Maya cried in her car. Then she drove to the next address.

03 Mar 2026 · 12 min read

The Greenbox Story · From Chaos to Clarity

Customer Discovery: Before the First Line of Code

Maya grew up on a farm outside Margaret River. She left to study computer science, spent a decade in consulting, and came back to Perth with an idea that wouldn't let go: what if the best produce in Western Australia could reach people's doors every week?

02 Oct 2014 · 1 min read

All articles

Running an IPv6-Only VPC in Practice (preview)

Choosing the Layer for Egress Filtering (preview)

Designing Centralised Packet Inspection With GWLB (preview)

Picking the Right Tool to Debug VPC Reachability (preview)

How to Carry Market Data Multicast Into AWS (preview)

Plugging SD-WAN Into a Transit Gateway (preview)

Picking CloudFront or Global Accelerator at the Edge (preview)

When to Use VPC Lattice for Service-to-Service Traffic (preview)

Steering Traffic With BGP Attributes (preview)

How to Allowlist Egress by Hostname (preview)

Setting Up Hybrid DNS Resolution Both Ways (preview)

Expose One Service, Not the Whole VPC (preview)

Choosing Between Cloud WAN and a Transit Gateway Mesh (preview)

Designing Direct Connect to Survive a Fibre Cut (preview)

How to Hit Four Nines on Direct Connect (preview)

Cloud WAN for Multi-Region Transit Gateway Sprawl (preview)

How to Externalise Authorisation With Verified Permissions (preview)

How to Scan EC2 and S3 With GuardDuty Malware Protection (preview)

How to Encrypt a Card Number From the Edge to the Processor (preview)

How to Rate-Limit GraphQL Operations Through WAF (preview)

Designing a Hierarchy in AWS Private CA (preview)

Choosing Between KMS, CloudHSM, and KMS Custom Key Stores (preview)

How to Grant Cross-Account Access to Humans and Workloads (preview)

Picking Between Network Firewall, WAF, Shield, and DNS Firewall (preview)

Choosing Between Config Conformance Packs and Audit Manager (preview)

Aggregating Findings With Security Hub (preview)

How to Use Detective to Cut Incident MTTR (preview)

Operationalising Inspector Findings (preview)

CloudTrail Lake for Long-Term Audit Queries (preview)

When Shield Advanced Earns Its Subscription Over Shield Standard (preview)

Choosing Between Managed, Rate-Based, and Custom WAF Rules (preview)

When to Reach for Flow Logs, GuardDuty, or Detective (preview)

How to Share KMS Key Material Across Two Regions (preview)

Choosing Between IAM Access Analyzer's Three Analyzer Types (preview)

Scanning S3 for PHI With Macie (preview)

Prioritising Inspector Findings by Reachability (preview)

How to Investigate One Incident With GuardDuty, Flow Logs, and Detective (preview)

Choosing Between Parameter Store and Secrets Manager Across a Mixed Inventory (preview)

KMS Grants for Per-Execution Decrypt (preview)

How to Federate Okta to AWS Across an Organization Without Long-Lived Keys (preview)

Cross-Account S3 With Customer-Managed KMS (preview)

Application Insights Without Dashboards (preview)

Chaos Engineering With AWS Fault Injection Service (preview)

Recovery Objectives Made Measurable (preview)

Patching On-Premises Alongside Cloud (preview)

Always-On Vulnerability Scanning (preview)

How to Automate Incident-Response Runbooks With Step Functions and SSM (preview)

Service Catalog vs AppConfig (preview)

How to Reconcile CloudFormation Stack Drift Without Losing Fixes (preview)

Choosing CodeBuild Compute: On-Demand, Reserved Fleets, or Lambda (preview)

Backup Compliance Reports From AWS Backup (preview)

Choosing a Container Runtime for Mixed Workload Shapes on AWS (preview)

GitOps on EKS: Pull vs Push (preview)

AWS Config vs Audit Manager (preview)

CodeArtifact and Build Caching for CI (preview)

How to Produce Auditable Patch Compliance Reports for EC2 Fleets (preview)

How to Share a CodePipeline Artefact KMS Key Across Accounts (preview)

How to Set Up Continuous Audit Evidence Collection Across an AWS Organization (preview)

EventBridge Pipes vs Glue Lambdas (preview)

How to Build Multi-Account Multi-Region Pipelines With CDK Pipelines (preview)

How to Set Up Blue/Green ECS Deploys With CodeDeploy (preview)

How to Route Operational Events Across AWS Accounts With EventBridge (preview)

How to Deploy a Security Baseline Across an AWS Organization (preview)

How to Choose a CodeDeploy Config for Lambda With Auto-Rollback (preview)

How to Train a Reinforcement Learning Policy on SageMaker (preview)

Shadow, Canary, or Linear: Rolling Out a New Model (preview)

How to Serve Thousands of Per-Tenant Models on One Endpoint (preview)

How to Train a Classifier on Heavily Imbalanced Data (preview)

How to Right-Size a SageMaker Endpoint With Inference Recommender (preview)

When SageMaker Training Compiler Actually Helps (preview)

How to Use SageMaker Lineage Tracking to Answer Audits (preview)

How to Add a Custom Metric to SageMaker Model Monitor (preview)

Choosing Between SageMaker Real-Time Endpoints and Batch Transform (preview)

When To Reach for Async SageMaker Inference (preview)

XGBoost, LightGBM, or DeepAR: Picking the Right Built-In (preview)

How to Deploy a SageMaker Endpoint Across Multiple Regions (preview)

Choosing Bedrock or SageMaker JumpStart for a RAG Chatbot (preview)

How to Build a Multilingual Invoice Pipeline With Textract, Translate, and Comprehend (preview)

How to Run SageMaker Training Jobs on Spot Capacity (preview)