Archive
All articles
Exam Room · Networking
Running an IPv6-Only VPC in Practice (preview)
IPv4 space has become a cost line item. Public IPs bill by the hour. RFC 1918 space is exhausted across the M&A portfolio. The question of whether to run an IPv6-only VPC has moved from 'interesting experiment' to 'concrete ask.' The dataplane supports it. The services mostly support it. The edge cases are where the work lives: legacy clients that only speak IPv4, third-party services that haven't caught up, instance types that do or don't support IPv6. Understanding what an IPv6-only VPC looks like in practice is the difference between a clean launch and a post-migration support queue.
Exam Room · Networking
Choosing the Layer for Egress Filtering (preview)
A requirement to constrain outbound traffic to an allowlist of destinations. The interesting choice isn't which AWS service to buy, it's which OSI layer to stop the traffic at. The lower we inspect, the less we see and the less we can break; the higher we inspect, the more granular the policy and the more we have to terminate. Which layer matches the policy decides how much we see, how much we can control, and what we pay in complexity.
Exam Room · Networking
Designing Centralised Packet Inspection With GWLB (preview)
An organisation has decided every packet leaving or entering a VPC must pass through a vendor firewall for deep inspection. Running firewall appliances per VPC is a licensing nightmare; having each VPC route traffic through a central firewall VPC is the plan. AWS has a primitive for centralised inspection; what matters is what that primitive actually has to deliver for deep-packet inspection to work in practice: the appliance must see the real 5-tuple (no NAT in the way), both directions of a flow must land on the same appliance, the pool must scale horizontally, multiple VPCs must share one fleet, and return paths have to stay symmetric or stateful rules will silently drop legitimate traffic.
Exam Room · Networking
Picking the Right Tool to Debug VPC Reachability (preview)
A ticket comes in: 'the app in the prod VPC can't reach the RDS instance in the data VPC, worked yesterday, nothing changed.' Nothing ever changed. AWS ships more than one tool that answers 'can this thing reach that thing', and the tools answer subtly different questions. Knowing which one matches the question on the ticket is the difference between a one-minute answer and an afternoon of tcpdump.
Exam Room · Networking
How to Carry Market Data Multicast Into AWS (preview)
A market data pipeline that ingests multicast feeds from an exchange's colocation facility. On-prem, this is routine: IGMP on the switch, multicast routing in the core, consumers join groups and receive. In AWS, multicast has historically been absent from the VPC fabric. There are a handful of ways to get multicast working in the cloud, ranging from rewriting every consumer to keeping the protocol exactly as it is on-prem, and the right one depends on whether the network can carry multicast natively or whether we build the replicator ourselves.
Exam Room · Networking
Plugging SD-WAN Into a Transit Gateway (preview)
An SD-WAN vendor's appliance running in the VPC, carrying branch-office traffic. Today it's stitched to the Transit Gateway via IPsec over a VPN attachment, which means encrypting twice and accepting a throughput ceiling AWS imposes on VPN. The TGW has more than one way to ingest routes from a customer-owned appliance, and which attachment type fits depends on whether traffic between the appliance and the TGW is already inside AWS's trust boundary.
Exam Room · Networking
Picking CloudFront or Global Accelerator at the Edge (preview)
A global application whose users complain about latency from Sydney, intermittent connection failures from Mumbai, and a jittery experience everywhere that isn't Europe. The origin is in eu-west-1. AWS has more than one edge service that can put a user closer to the origin, and they do very different jobs. Picking between them is a question of protocol, cacheability, and whether the workload needs a static IP that customer firewalls can allowlist.
Exam Room · Networking
When to Use VPC Lattice for Service-to-Service Traffic (preview)
Dozens of microservices across half a dozen VPCs, each one exposed via its own NLB or ALB, each one reachable via PrivateLink endpoints in every consumer VPC, each one tied into a per-service IAM policy for who can call it. The network team is managing load balancers; the platform team is managing endpoints; nobody owns the overall shape. AWS has a primitive that bundles service discovery, IAM-based authorisation, and cross-VPC routing into one thing; whether that consolidation earns its place depends on how many services we're actually running.
Exam Room · Networking
Steering Traffic With BGP Attributes (preview)
Two Direct Connect links, one primary and one backup, and a business requirement that says 'backup is backup, outbound traffic goes out the primary, return traffic comes back the primary, and everything fails over cleanly.' BGP has five attributes that decide which route wins, and only two of them are levers we can pull with AWS. Understanding local preference, AS-path prepending, MED, and AWS's Direct Connect BGP community tags is what gets you there.
Exam Room · Networking
How to Allowlist Egress by Hostname (preview)
Compliance has a new requirement: egress from every production workload must be allowlisted by destination hostname, not IP address, and every denied connection must leave an audit trail. Security groups and NACLs are the wrong abstraction; they work on IPs and ports. The interesting question is which layer of the egress path actually sees a hostname, and which AWS service does the matching with the right blend of managed-ness and audit depth.
Exam Room · Networking
Setting Up Hybrid DNS Resolution Both Ways (preview)
A hybrid estate where Windows-based corporate services still live on-prem under corp.example.com, and new AWS workloads resolve cloud.example.com via Route 53 Private Hosted Zones. An EC2 instance needs to look up both. Today, it works for cloud names and silently fails for on-prem ones. Hybrid DNS is two directions of resolution: AWS to on-prem and on-prem to AWS. The work is figuring out where the forwarders live, who owns the configuration, and how the split-horizon edge cases behave.
Exam Room · Networking
Expose One Service, Not the Whole VPC (preview)
A payments service in one VPC. A dozen consumer teams across half the company who need to call it. The first instinct is to attach everything to a Transit Gateway and let the route tables sort it out. The better question is whether the consumers need a whole network path into the producer's VPC, or just a TCP socket against the one service. Which abstraction is correct depends on whether we're running a service or sharing a subnet.
Exam Room · Networking
Choosing Between Cloud WAN and a Transit Gateway Mesh (preview)
A company with 30 AWS Regions, a handful of branch offices on every continent, and a Transit Gateway tangle that nobody can draw on a whiteboard anymore. Peering meshes across Regions, route tables copy-pasted between accounts, and the network team has finally said 'no more' to any new region request. The interesting question isn't whether to keep scaling the mesh, it's whether the operational debt is now large enough that a different abstraction earns its place, and what we give up to switch.
Exam Room · Networking
Designing Direct Connect to Survive a Fibre Cut (preview)
A single 10 Gbps Direct Connect link carrying every byte between the on-prem data centre and AWS. Finance is happy, the network team is nervous, and the business is about to depend on this path for a quarterly close that cannot slip. Four resilience shapes exist in the Direct Connect catalogue, each with a different blast radius, a different price tag, and a different SLA. Every shape is most resilient at something; the work is figuring out which failure modes we're actually buying insurance against.
Exam Room · Networking
How to Hit Four Nines on Direct Connect (preview)
Five gigabits of replication traffic over a single 10 Gbps Direct Connect, a compliance team asking for four nines, and five plausible topologies on the table. The one that lives up to the paperwork isn't the cheapest and isn't the most expensive, it's the one where a whole building can vanish and the traffic keeps moving. Picking it means knowing what each Direct Connect resilience model survives, and which tier of SLA each one actually signs you up for.
Exam Room · Networking
Cloud WAN for Multi-Region Transit Gateway Sprawl (preview)
Eight Transit Gateways, five regions, two hundred attachments, and three engineers spending half a release window arguing about route table entries. The way out isn't a cleverer Terraform module, it's a higher layer that takes declarative segment policy and materialises the routes itself. Finding the right layer means touring what AWS actually offers above raw TGW route tables.
Exam Room · Security
How to Externalise Authorisation With Verified Permissions (preview)
Cognito handles who is signed in. What they can do inside the app (edit this document, view that folder, approve that workflow) is a different question, usually answered by code scattered across services. Verified Permissions is the managed policy engine AWS built for exactly that: Cedar policies, hierarchical resources, batched evaluation, and a clean separation between authentication and authorisation.
Exam Room · Security
How to Scan EC2 and S3 With GuardDuty Malware Protection (preview)
GuardDuty sees the packets. Inspector sees the installed packages. Neither looks at what's actually on disk: whether a cryptominer has been written to /tmp, whether a reverse shell is sitting in /usr/local/bin. GuardDuty Malware Protection is the extra tier that takes a snapshot, spins up a dedicated scanner, and tells us what's in the filesystem. Knowing when it fires and what it can't see is the difference between a real defence and a tick-box.
Exam Room · Security
How to Encrypt a Card Number From the Edge to the Processor (preview)
TLS protects every field in a form in transit; it doesn't protect any of them at the origin. A credit-card number posted through the edge arrives at the ALB in plaintext, at the application in plaintext, in application logs and error traces and APM spans in plaintext. PCI says encrypt the sensitive fields; the harder work is deciding where in the request path the encryption should happen, which keys live where, and how to shrink the PCI scope without rewriting every service the request passes through.
Exam Room · Security
How to Rate-Limit GraphQL Operations Through WAF (preview)
Every GraphQL request is a POST to the same URL with the query inside the JSON body. WAF's pattern rules that trigger on URI path don't help; rate-based rules by endpoint don't distinguish a cheap query from an expensive one. The answer is JSON body inspection: WAF rule statements that parse the POST body, navigate into the query string, and pattern-match or rate-limit on what's actually happening at the GraphQL layer.
Exam Room · Security
Designing a Hierarchy in AWS Private CA (preview)
A root CA with a 20-year key, offline in an HSM, that signs two subordinate CAs: one for production workloads, one for developer environments. Each subordinate signs its own leaf certificates with short lifetimes. A managed private CA is a foregone conclusion; the alternative is the Windows server nobody can log into. The decisions that matter are how many tiers the hierarchy needs, what key types and templates each tier issues with, and what revocation model keeps a single compromise from costing us the whole PKI.
Exam Room · Security
Choosing Between KMS, CloudHSM, and KMS Custom Key Stores (preview)
KMS is the default answer for "we need a key." CloudHSM is the answer for "we need a key AND we need to be the only ones who can touch it, AND we need FIPS 140-3 Level 3, AND we need a PKCS#11 interface for that legacy Java client." The difference between them is less about capability and more about where the raw key material actually lives, who can get to it, and how much of the operational work is yours versus AWS's.
Exam Room · Security
How to Grant Cross-Account Access to Humans and Workloads (preview)
Forty accounts, two hundred engineers, and the perennial question: how do they get into the accounts they need and nothing else? IAM Identity Center permission sets and cross-account IAM roles both answer it, but in different shapes. One is a scalable, identity-centric pattern that works for humans; the other is the building block that's older, lower-level, and still necessary for programmatic access. Picking the correct one is less about which is better and more about what the caller is.
Exam Room · Security
Picking Between Network Firewall, WAF, Shield, and DNS Firewall (preview)
Network Firewall, WAF, and Shield sound like overlapping answers to the same question. They aren't. Each sits at a different layer of the network, each sees a different slice of traffic, each is good at a different kind of threat. Picking the correct one (or the correct combination) starts with what we're trying to protect and works outward to what the traffic looks like when it arrives.
Exam Room · Security
Choosing Between Config Conformance Packs and Audit Manager (preview)
An auditor wants evidence the org is meeting PCI DSS 4.0.1. One team reaches for AWS Config conformance packs; another reaches for Audit Manager. Both produce dashboards, both produce reports, and both cost money, but they're doing very different work. Config is the continuous compliance engine; Audit Manager is the evidence-collection and assessment workflow on top. Picking the wrong one leaves the other half of the job undone.
Exam Room · Security
Aggregating Findings With Security Hub (preview)
GuardDuty in twelve accounts. Inspector finding vulnerabilities across EC2, ECR, and Lambda. Macie flagging sensitive S3 objects. IAM Access Analyzer catching public exposure. Five different consoles, five different finding formats, one security team that doesn't have time to open five tabs. Security Hub is the aggregator that turns finding streams into a single normalised queue, and using it well is mostly about what you route, how you score, and who you route it to.
Exam Room · Security
How to Use Detective to Cut Incident MTTR (preview)
A GuardDuty finding on an EC2 instance that leads to questions no single log source answers: what else was that role used for, which accounts did the session touch, is this user's behaviour unusual for a Tuesday? Those are graph questions. Adding an investigation tool is the easy part; the harder work is shaping the graph so it can answer 'what else' under incident pressure, knowing what it still can't tell us, and walking from an entity profile to a containment decision in the time the incident gives us.
Exam Room · Security
Operationalising Inspector Findings (preview)
A Lambda function written three years ago, still in production, still using a library with a known RCE. A container image that passed review in 2024 and has quietly rotted since. An EC2 AMI that was hardened at launch and hasn't been looked at since. Three workloads, one vulnerability story. The org-wide scanner has been on for months and the findings are piling up; the work now is triaging what it found, what it can and can't tell us, and who owns remediation when a 2023 vulnerability lands in a 2026 inbox.
Exam Room · Security
CloudTrail Lake for Long-Term Audit Queries (preview)
An auditor asking for every AssumeRole call into the production account for the last seven years. A SIEM bill dominated by CloudTrail forwarding. An incident responder grepping gzipped JSON in an S3 bucket at 2am. Three jobs that look like they want the same thing (CloudTrail data), and three jobs that pull in opposite directions when the storage format is just gzipped JSON in S3. Lake is what happens when AWS stops making customers build that pipeline themselves.
Exam Room · Security
When Shield Advanced Earns Its Subscription Over Shield Standard (preview)
Shield Standard is on by default, free, and genuinely useful. Shield Advanced is $3,000 a month per organisation, a lot of paperwork, and, on the quarter when a volumetric attack lands on a revenue-bearing endpoint, probably the cheapest insurance AWS sells. Picking between them is less about the attack surface and more about the budget, the team, and whether the DDoS response team on speed-dial is worth more than the subscription.
Exam Room · Security
Choosing Between Managed, Rate-Based, and Custom WAF Rules (preview)
A login endpoint under a credential-stuffing run, a news site whose comment form is getting SQL-injected, and a tier-one PCI application that has to prove to an auditor that it blocks the OWASP Top Ten. Three problems, three kinds of WAF rule, three different ways the rule decides whether a request is hostile. Picking the wrong kind wastes effort at best and lets the attacker through at worst; picking the correct kind is almost the whole job.
Exam Room · Security
When to Reach for Flow Logs, GuardDuty, or Detective (preview)
VPC Flow Logs tell us a packet moved; GuardDuty tells us it looks suspicious; Detective tells us what else that actor has been doing. Same investigation, three stages, three services that talk to each other so precisely that people often assume they are one thing. They aren't, and the difference matters when a finding fires at 03:00 and the on-call has to decide which console to open first.
Exam Room · Security
How to Share KMS Key Material Across Two Regions (preview)
A SaaS company runs active-active in us-east-1 and eu-west-1. Every ciphertext is wrapped in us-east-1 and replicated to eu-west-1, where the decrypt fails because the customer-managed KMS key is regional. They refuse to stand up a decrypt proxy and refuse to route every read back across the Atlantic. This scenario asks for a solution that puts the same key material in both places: one key ID, two regional ARNs, and symmetric decrypt local to whichever side the reader is on.
Exam Room · Security
Choosing Between IAM Access Analyzer's Three Analyzer Types (preview)
A security team at a fintech wants three things: every resource shared outside the AWS Organization in one list; a way to catch an accidental 'Principal: *' before a change merges; and a report of IAM roles and permissions nobody has used for ninety days. Three asks that sound like three products turn out to be three analyzer types of one service, enabled from one delegated administrator and reading across every account.
Exam Room · Security
Scanning S3 for PHI With Macie (preview)
A healthcare SaaS has 4,000 S3 buckets across 30 accounts and an auditor who wants proof that no PHI has turned up anywhere it shouldn't. The ask is continuous scanning for SSNs, medical record numbers, dates of birth, and card numbers; automated alerting when a match appears; and a dashboard that ranks buckets by sensitivity rather than by filename. One managed service covers all three, enabled once at the organisation level.
Exam Room · Security
Prioritising Inspector Findings by Reachability (preview)
Four hundred EC2 instances, eighteen hundred Lambda functions, two hundred ECR images, and no vulnerability scanner in the account. Compliance wants evidence of coverage and remediation timelines; the security team wants CVEs ranked by CVSS, by whether an exploit is known, and by whether the thing is even reachable from the internet. One managed service covers all three surfaces with a prioritisation signal built from the attributes that matter, enabled once across the whole organisation.
Exam Room · Security
How to Investigate One Incident With GuardDuty, Flow Logs, and Detective (preview)
An EC2 instance allegedly pushed 4 GB of data to an unusual external IP over three hours. The security team has VPC Flow Logs in CloudWatch, a GuardDuty finding that flagged the behaviour, and Detective switched on across the organisation. Each of the three services does a different job: Flow Logs hold the raw per-flow evidence, GuardDuty raised the alarm with a classification, Detective pivots across the evidence to build a timeline. Pick the wrong one and you either drown in records or stare at a single finding with no context.
Exam Room · Security
Choosing Between Parameter Store and Secrets Manager Across a Mixed Inventory (preview)
A platform team has 150 secrets scattered across Parameter Store, Secrets Manager, and env vars committed to S3. They want one canonical home with database rotation, hierarchical naming, cross-region replication, and a reasonable bill. Secrets aren't one kind of thing: rotating database credentials, vendor API keys, and signing keys behave differently, and the storage choice should follow the rotation lifecycle each one needs.
Exam Room · Security
KMS Grants for Per-Execution Decrypt (preview)
A Lambda fleet across twelve spoke accounts needs to call kms:Decrypt on a shared customer-managed key, but for one execution, on one event, with a scoped permission that vanishes when the handler returns. Key policies and blanket IAM policies are both too coarse. The interesting work is matching the per-execution, per-envelope, audit-per-event shape to the right KMS authorisation instrument.
Exam Room · Security
How to Federate Okta to AWS Across an Organization Without Long-Lived Keys (preview)
Thirty-five AWS accounts, four hundred engineers, one corporate Okta, and a pile of long-lived IAM access keys that nobody can fully account for. The brief is to collapse all of it into one Okta change per joiner and leaver, with MFA everywhere and no static credentials anywhere. Getting there means touring what federated access on AWS actually looks like, and what each service will and will not do.
Exam Room · Security
Cross-Account S3 With Customer-Managed KMS (preview)
A BI team in one account needs to read encrypted objects in a data platform account. One bucket. One customer-managed key. Two accounts. Three policies, and every one of them has to say yes. Miss any single lock and the request dies in a different place, with a different error, depending on which lock shut.
Exam Room · DevOps
Application Insights Without Dashboards (preview)
A .NET service on EC2 behind an ALB with a SQL Server database, a team that has drilled into CloudWatch Metrics three hundred times looking for 'why is this slow,' and an aggregate noise floor that makes a real regression hard to spot. Picking the right AWS box is the easy part; what an automated problem-detection layer actually has to do to be useful on this estate is correlate signals across components instead of firing one alarm per metric, know the stack well enough to monitor IIS workers and SQL queries without hand-rolled configuration, baseline the patterns that change by time-of-day, and feed the operational queue without being yet another thing the team maintains.
Exam Room · DevOps
Chaos Engineering With AWS Fault Injection Service (preview)
Resilience reviews sign off the architecture; game days sign off the application. A managed fault-injection service sits between them: declarative experiments that stop an EC2 instance, throttle a DynamoDB table, or drop network traffic to an AZ, with safety rails so a test that starts hurting production aborts itself. Understanding experiment templates, actions, targets, and the abort surface is the difference between a controlled test and an incident.
Exam Room · DevOps
Recovery Objectives Made Measurable (preview)
Every service the team runs claims a 4-hour RTO and 1-hour RPO in the design doc. Nobody has actually measured whether the numbers hold, and last year a Tier 2 app took eleven hours to recover. The interesting work isn't whether some AWS tool can produce a number (one can); it's what an honest, ongoing measurement of recovery posture has to cover: which disruption types, what the model can see in infrastructure-as-code, what it can't (application logic, runbook clock time), and how the resulting score becomes evidence the auditor will accept.
Exam Room · DevOps
Patching On-Premises Alongside Cloud (preview)
Four hundred EC2 instances, two hundred on-prem servers in a data centre connected via Direct Connect, and a single compliance baseline that says 'patch cadence is the same everywhere.' The interesting work isn't whether AWS can patch a server it doesn't own (it can); it's what an 'identical policy, different maintenance windows' arrangement looks like in practice, where the evidence lives, and what happens when a hybrid node loses connectivity mid-patch.
Exam Room · DevOps
Always-On Vulnerability Scanning (preview)
EC2 instances, Lambda functions, and container images sitting in ECR, each a potential surface for a known CVE, each with a different way to find out. Weekly third-party scans catch things after they land. Continuous CVE scanning is a given; the work is figuring out which workload types are covered, what quietly defers to other tooling, and how findings flow into a remediation pipeline that doesn't drown the security team.
Exam Room · DevOps
How to Automate Incident-Response Runbooks With Step Functions and SSM (preview)
A GuardDuty finding fires; someone on call looks at the runbook in Confluence; twelve steps later the EC2 instance is quarantined, a snapshot has been taken, and the on-call has typed the AWS console password three times. Turning the twelve-step runbook into code, with deterministic execution, human-approval pauses, retries, compensation, and an audit trail, is what automated incident response calls for. The interesting choice is which orchestration model fits the shape.
Exam Room · DevOps
Service Catalog vs AppConfig (preview)
Twelve teams wanting to self-serve an internal platform (VPC, IAM roles, CI pipelines, logging baseline) without asking platform engineering every time. Feature flags and configuration that should update in seconds, without a redeploy. Two different jobs, two different timelines: provisioning happens once per request, configuration changes happen many times per running application. Treating them as the same problem is the mistake that takes weeks to recover from.
Exam Room · DevOps
How to Reconcile CloudFormation Stack Drift Without Losing Fixes (preview)
Forty CloudFormation stacks, an auditor's drift report showing eighty modified resources, and a team that can't remember which of those were incident fixes and which were accidents. Detection names the resources; the remediation is where the work lives. Reverting, re-declaring, importing, and previewing all produce different outcomes, and picking the correct one depends on what caused the drift in the first place.
Exam Room · DevOps
Choosing CodeBuild Compute: On-Demand, Reserved Fleets, or Lambda (preview)
Two thousand CodeBuild runs a day across forty services, queue depth that spikes on merge storms, nine-minute builds when build capacity is cold and ninety-second builds when it's warm. CI compute comes in several billing shapes (pay-per-build-minute, pay-for-the-fleet, pay-per-millisecond), each honest for a different build profile. The decision isn't about which is fastest but which matches the workload, and the right answer for a forty-project monorepo isn't one shape for all of it.
Exam Room · DevOps
Backup Compliance Reports From AWS Backup (preview)
Three accounts, eight resource types, a backup policy that already runs and a compliance ask that nobody wants to answer with screenshots. The activity is happening; the audit trail isn't. Turning a running backup programme into an auditor-ready artefact takes three separate things wired up, and which artefact answers which question depends on what the auditor asks.
Exam Room · DevOps
Choosing a Container Runtime for Mixed Workload Shapes on AWS (preview)
A small service mesh of ten applications, each a container image in ECR. One handles a sustained 2,000 requests per second; one is invoked 800 times a day by a webhook; one is the team's favourite long-running worker. AWS has more than one runtime that will accept the same container image, and the correct answer changes per application. The interesting work is mapping request shape, cold-start tolerance, and networking constraints to the runtime that fits.
Exam Room · DevOps
GitOps on EKS: Pull vs Push (preview)
Three EKS clusters, sixty teams deploying into them, a CodePipeline stage that runs kubectl apply against a kubeconfig file that nobody wants to rotate. Moving to GitOps means one decision about the controller and a dozen smaller ones about repository layout, RBAC, secrets, and how the cluster gets told what to run. The mental models the GitOps options offer differ enough (in multi-tenancy, in observability, in failure modes) to be worth thinking through before reaching for a tool.
Exam Room · DevOps
AWS Config vs Audit Manager (preview)
AWS Config already says every S3 bucket in the Organization has default encryption enabled. The auditor wants the same statement in a different shape: a signed PDF with a framework reference, an assessor sign-off, and evidence attached to each control. The live compliance state and the audit artefact are different jobs, and a working audit story needs both, because each one alone leaves a hole the other fills.
Exam Room · DevOps
CodeArtifact and Build Caching for CI (preview)
A monorepo of forty Node services, npm install eating eight minutes of every CodeBuild run, and a security finding that half the internal packages are installed straight from a public registry. The two problems are related: a private artefact registry that the builder doesn't cache against is the same wound as a cache that doesn't know about private packages. Fixing both is one story about where dependency traffic flows, and which layer caches it.
Exam Room · DevOps
How to Produce Auditable Patch Compliance Reports for EC2 Fleets (preview)
Eight hundred EC2 instances, five operating-system flavours, a compliance team that wants to know which hosts are behind on security patches and a change-management team that wants a record of exactly what landed where. AWS has the moving parts to do this without buying tooling, but wiring them into a report someone outside the team will read takes a specific set of decisions about what counts as 'patched', when patching happens, and where the evidence lives.
Exam Room · DevOps
How to Share a CodePipeline Artefact KMS Key Across Accounts (preview)
A CodePipeline in a tooling account, artefacts in an S3 bucket in the tooling account, deploy stages that run CloudFormation in staging and prod accounts. The pipeline is green in the console and red in every deploy account: 'AccessDenied' on the artefact download. The fix isn't another role, it's the customer-managed KMS key that encrypts the bucket contents, and who is allowed to decrypt with it.
Exam Room · DevOps
How to Set Up Continuous Audit Evidence Collection Across an AWS Organization (preview)
A compliance team is six weeks out from a SOC 2 Type II audit across sixty AWS accounts. Today their evidence lives in a shared drive full of screenshots and a spreadsheet tracking which control each screenshot is meant to prove. They need the evidence pulled automatically, mapped to the controls the auditor will actually ask about, packaged into a report a human can sign off, and monitored continuously so drift between audits is caught the day it happens.
Exam Room · DevOps
EventBridge Pipes vs Glue Lambdas (preview)
A platform team maintains roughly twenty tiny Lambda functions whose only job is to pull events from SQS, drop the ones that don't matter, reshape the rest, and hand them to SNS, DynamoDB, Step Functions, or Kinesis. Each one is forty lines of boilerplate around three lines of value, and each one bills for every invocation whether the event survived the filter or not. AWS has more than one answer for source-filter-target wiring, and the interesting choice is which layer the wiring belongs in.
Exam Room · DevOps
How to Build Multi-Account Multi-Region Pipelines With CDK Pipelines (preview)
A platform team wants one pipeline for a microservice that flows dev → staging → prod across us-east-1 and eu-west-1, knows about every target account, gates prod on a human, and updates itself when the pipeline definition changes. CodePipeline with hand-wired CloudFormation actions gets close but has to be rebuilt every time the shape changes. CDK Pipelines generates the whole thing from code, self-mutates on each run, and treats accounts and regions as first-class arguments.
Exam Room · DevOps
How to Set Up Blue/Green ECS Deploys With CodeDeploy (preview)
An ECS Fargate platform took eight minutes of elevated 5xx from a rolling deploy that left old and new tasks serving the same path with incompatible contracts. The fix is blue/green via CodeDeploy: two target groups, a test listener for smoke tests, an alarm gate for rollback, and a task definition revision the new task set is pinned to. The rolling controller can't give that shape; the CodeDeploy controller can.
Exam Room · DevOps
How to Route Operational Events Across AWS Accounts With EventBridge (preview)
Five AWS accounts emit operational events. Two accounts want to consume them: the observability account for dashboards and the security account for a SOAR pipeline. The events need versioned schemas, replay for the SOAR pipeline, and a dead-letter path. EventBridge custom buses with cross-account targets are the one answer that ticks every box without a control plane of their own.
Exam Room · DevOps
How to Deploy a Security Baseline Across an AWS Organization (preview)
A four-person platform team owns a security baseline that has to live identically in fifty AWS accounts, auto-apply to any new account the Org picks up, and shout when someone meddles with it. Logging into fifty consoles is not a plan. CloudFormation StackSets with service-managed permissions is the one answer that ticks every box without a control plane of its own.
Exam Room · DevOps
How to Choose a CodeDeploy Config for Lambda With Auto-Rollback (preview)
AWS ships nine predefined CodeDeploy configurations for Lambda. They fall into three families (canary, linear, and all-at-once), and most of the variation is just timing. The interesting part isn't the percentages; it's the alarm gate that turns any of them into a self-rolling-back deploy.
Exam Room · Machine Learning
How to Train a Reinforcement Learning Policy on SageMaker (preview)
An energy-storage operator wants to train a policy that decides when to charge and discharge a battery to maximise arbitrage profit across a volatile electricity market. The business problem is straightforward; the ML framing is anything but. Supervised learning needs labelled 'correct decisions' that don't exist; it's a sequential-decision problem under uncertainty, which is what reinforcement learning is for. SageMaker RL exists for exactly this shape of problem, and the interesting work is knowing what it provides, what it doesn't, and how the pieces fit together into something that trains and deploys.
Exam Room · Machine Learning
Shadow, Canary, or Linear: Rolling Out a New Model (preview)
A recommendation model has a new version ready to ship. The old one serves 200 RPS of production traffic; the new one has passed offline evaluation but nobody's willing to point the live firehose at it on faith alone. There are three deployment patterns built into SageMaker for exactly this moment (shadow, blue-green with canary, and linear traffic shifting), and they're not interchangeable. Which one fits depends on what kind of confidence you need to build, and what you're willing to spend to build it.
Exam Room · Machine Learning
How to Serve Thousands of Per-Tenant Models on One Endpoint (preview)
A personalisation team has trained 4,000 per-tenant churn models, one per customer account, and deploying each to its own endpoint would cost $24,000 a month in minimum-instance floors before a single inference is served. The interesting work isn't picking the lowest-cost row off a price sheet; it's understanding which traffic patterns suit a model-per-endpoint shape, which suit a many-models-shared-instance shape, and how to size whatever we land on so the working set fits in memory and nobody's tenant notices their model had to be paged in.
Exam Room · Machine Learning
How to Train a Classifier on Heavily Imbalanced Data (preview)
A fraud detection model trained on a year of transaction data reports 99.3% accuracy and catches almost no fraud. The training set is 0.4% fraudulent and 99.6% legitimate, and the model has learned the safest possible strategy: predict 'not fraud' every time. The correct answer isn't 'try harder', it's knowing that imbalanced classification has a specific set of techniques, each with a different cost, and picking the one that matches both the data shape and the business cost of each error type.
Exam Room · Machine Learning
How to Right-Size a SageMaker Endpoint With Inference Recommender (preview)
A team is about to ship a fine-tuned language model to production. They have a trained artifact, a sample payload, a latency target, and no idea which of the 25 inference instance types is the correct one. The cheapest fit is probably six times cheaper than the most-expensive-that-works. SageMaker Inference Recommender runs the benchmark so we don't have to, but the interesting work is knowing which question to ask it, reading the results it gives back, and knowing when its answer is wrong for our actual workload.
Exam Room · Machine Learning
When SageMaker Training Compiler Actually Helps (preview)
A Hugging Face transformer is trained for 6 hours on 8 GPUs, three times a week, and the data scientists would like it to be faster. They've already tuned the hyperparameters; the bottleneck is the model itself running through PyTorch. SageMaker Training Compiler promises a 30-50% speedup on exactly this shape of workload by rewriting the training graph before it runs, but the docs hide when it helps and when it doesn't, and a naive drop-in makes some jobs slower. The interesting work is knowing which models and which training shapes it actually accelerates.
Exam Room · Machine Learning
How to Use SageMaker Lineage Tracking to Answer Audits (preview)
A regulator asks a compliance team to prove which training dataset produced the model that scored a specific loan application six months ago. The answer lives across S3 buckets, Git commits, training-job logs, and somebody's Jupyter notebook. SageMaker has a service specifically for this question (Lineage Tracking), but it's not useful as a report after the fact. It's useful as a graph built automatically while the training pipeline runs, and the interesting work is knowing which pieces it captures for free, which we have to add ourselves, and how the graph gets queried when the regulator actually asks.
Exam Room · Machine Learning
How to Add a Custom Metric to SageMaker Model Monitor (preview)
A recommender-system endpoint is drifting in a way the built-in Model Monitor reports don't see. Feature distributions look fine, ground-truth labels arrive too late to catch the regression, and the data quality metrics are all green. The signal that matters is a domain-specific one (the top-10 hit rate against a held-out validation set replayed hourly), and the way to get Model Monitor to watch for it is to teach it your own metric. The interesting work is in what counts as a metric, how the built-in jobs evaluate it, and where the boundary between 'just report the number' and 'also alarm on it' actually sits.
Exam Room · Machine Learning
Choosing Between SageMaker Real-Time Endpoints and Batch Transform (preview)
A customer-churn model is re-scored nightly across 20 million accounts, and a lead-scoring model is called the moment someone submits a form. Same model family, same training data, same accuracy target, and two completely different correct answers about how to deploy it. Real-time endpoints and batch transform jobs have overlapping names and very different bills. Getting the match correct turns out to be less about latency and more about what the downstream consumer actually does with the prediction.
Exam Room · Machine Learning
When To Reach for Async SageMaker Inference (preview)
A contract-review model takes 45 seconds to score a 30-page document, the payload is 8 MB, and the front-end team's XHR times out at 30 seconds every time. Throwing a bigger instance at it makes each call slightly faster but leaves 90% of the expensive GPU idle between requests. SageMaker has a third endpoint type that fits exactly this shape (not synchronous, not a batch job), and its trade-offs explain themselves once the workload is laid out properly.
Exam Room · Machine Learning
XGBoost, LightGBM, or DeepAR: Picking the Right Built-In (preview)
Three forecasting problems on the same desk. Next-quarter fraud rate by product line. Daily SKU-level demand across 4,000 products. Ad-spend to conversion attribution where explanations matter as much as numbers. SageMaker's catalogue has a built-in algorithm for each; the names (XGBoost, LightGBM, DeepAR) are not interchangeable. Each one was built for a particular shape of problem, and the cost of picking the wrong one is what we're trying to avoid.
Exam Room · Machine Learning
How to Deploy a SageMaker Endpoint Across Multiple Regions (preview)
A fraud-scoring model that runs in a single eu-west-1 endpoint earns its first non-European customer in Sydney, and then Tokyo, and then São Paulo. Latency goes from 40 ms to 340 ms and the product manager wants to know what we're going to do about it. Multi-region deployment sounds like a single decision; it's actually four decisions stacked (where the model lives, where the endpoints run, how traffic finds the nearest one, and how the weights stay in sync). Each one has options, and the wrong pairing turns a latency win into a disaster-recovery problem.
Exam Room · Machine Learning
Choosing Bedrock or SageMaker JumpStart for a RAG Chatbot (preview)
A product team wants a customer-facing chatbot that answers questions over our own documentation, cites its sources, and doesn't cost a fortune per conversation. Two AWS services have foundation-model in their marketing copy: Bedrock and SageMaker JumpStart. They sound interchangeable; they are not. Neither is better in the abstract. They're built for different shapes of problem, and picking one means committing to what we'd be willing to operate.
Exam Room · Machine Learning
How to Build a Multilingual Invoice Pipeline With Textract, Translate, and Comprehend (preview)
A stack of scanned vendor invoices in six languages lands in an S3 bucket every hour. Somebody needs to pull out the supplier name, line-item totals, and the tax rate, translate the freeform notes into English, and route the whole thing to accounts payable. Three AWS services have their names on that job, and each does a slice of it well and the other slices badly. The interesting work is lining up which service answers which question, rather than asking any one of them to do the whole pipeline.
Exam Room · Machine Learning
How to Run SageMaker Training Jobs on Spot Capacity (preview)
A nightly training pipeline burns 8 × A100 for four hours per run, which at on-demand prices is enough to have finance asking pointed questions. The training is idempotent, checkpointed every 30 minutes, and has no deadline tighter than 'done by the morning standup.' SageMaker training jobs can run on Spot capacity for roughly a 70% discount, provided the pipeline tolerates interruption, the instance choice is broad enough, and the checkpoint-and-resume logic actually works when AWS reclaims the instance halfway through epoch 7.
Exam Room · Machine Learning
How to Roll Out a New SageMaker Endpoint With Auto-Rollback (preview)
A fraud-detection model retrained on six more months of data is ready for production. The last time the team deployed a new version by flipping the endpoint over, it took three days to notice the new model was regressing on a specific merchant segment. They want the next rollout to be observable, reversible, and automated to roll back on its own. SageMaker endpoint variants with shadow testing and traffic shifting are the tool for that, and the shape of the safety is in the deployment-config choices.
Exam Room · Machine Learning
When to Use SageMaker Autopilot for Tabular Classification (preview)
A data scientist is staring at a new tabular classification problem: 50 features, 150k rows, balanced classes. Could be XGBoost. Could be CatBoost. Could be a small neural net. Iterating through five model families with hyperparameter tuning by hand is several engineer-weeks. SageMaker Autopilot is the automated alternative: it trains and tunes a set of candidate pipelines, picks the best, and hands back a model plus a notebook showing everything it tried. Worth understanding what it does automatically, what it doesn't, and when it's the correct first reach.
Exam Room · Machine Learning
Building a Churn Model in SageMaker Canvas Without Python (preview)
A business analyst with ten years of domain expertise and zero Python experience wants to build a customer-churn model. They have the data in Redshift, they have a reasonable understanding of which features matter, and they have no one on the data team with cycles to help for the next quarter. SageMaker Canvas is the no-code option that turns that into something they can actually ship; understanding what Canvas does (and what it hides) is the real story.
Exam Room · Machine Learning
Choosing a Bedrock Knowledge Base Vector Store (preview)
A team building a Bedrock Knowledge Base for 8,000 product documents is being asked the first infrastructure question of any RAG system: where do the embeddings live? Bedrock offers four backing stores: OpenSearch Serverless, Aurora PostgreSQL with pgvector, Pinecone, and Redis Enterprise Cloud, plus the Neptune-based graph store for graph-RAG. They look interchangeable on a diagram and diverge sharply once you ask about scale, cost shape, and operational model.
Exam Room · Machine Learning
Bedrock Agents vs Lambda Orchestration (preview)
A customer-facing assistant needs to look up order status, create a return, check inventory, and talk to the user in a coherent thread across all of it. Two AWS options for the orchestration: Bedrock Agents, which gives the LLM the keys to call tools and plan multi-step flows; or a Lambda-based state machine where the LLM is one step among many. The trade isn't about capability (both can do the work) but about where the intelligence sits, where the bugs can hide, and who's on call when a user gets a strange answer.
Exam Room · Machine Learning
Choosing Between RAG, Fine-Tuning, and Prompting (preview)
An internal-support team wants a chatbot that answers questions using the company's own product documentation, changelogs, and a decade of support tickets. A general-purpose foundation model knows none of it. The team has three levers to make the model look like it knows the domain: retrieval-augmented generation, fine-tuning, and careful prompting. The levers trade against each other on latency, cost, freshness, and how much the model actually internalises; picking between them is more about the nature of the knowledge than the quality of the model.
Exam Room · Machine Learning
Choosing Between Glue and SageMaker Processing for Feature Pipelines (preview)
A feature pipeline reads a few terabytes of raw transaction events, joins to customer and merchant reference data, computes rolling windows, and writes a parquet dataset ready for training. AWS has at least two very different ways to run that pipeline: Glue (ETL-shaped, Spark-backed, Data Catalog aware) and SageMaker Processing (ML-shaped, script-backed, in the same job graph as the training). Same work, two tools; the pick is about where the pipeline lives in the team's mental model as much as what it does.
Exam Room · Machine Learning
Bringing Your Own Container to SageMaker (preview)
A research team has been running a JAX + Flax model on their own EC2 instances and wants to migrate onto SageMaker for training, hyperparameter tuning, and inference. SageMaker ships first-party containers for PyTorch, TensorFlow, MXNet, HuggingFace, and XGBoost, and nothing for JAX. Building a bring-your-own-container image turns out to be almost the entire migration, and the shape of that container is more constrained than it looks from outside.
Exam Room · Machine Learning
How to Profile a Slow SageMaker Training Job (preview)
A training run on a `p4d.24xlarge` is taking twice as long as it did a month ago. The throughput metric says ~50% of expected; nvidia-smi shows GPUs sitting at 40% utilisation instead of the 90%+ they used to hold. Something is starving them, and it could be anything: the data loader, the network, a CPU pre-processing step, IO latency, or a single layer allocating too much. SageMaker Debugger's profiler is designed for this exact hunt, and it breaks the training step into parts that let you answer 'where did the time go' rather than 'is it slow.'
Exam Room · Machine Learning
Choosing a Distributed Training Strategy When One GPU Isn't Enough (preview)
A model that was comfortably fitting on one `p4d.24xlarge` six months ago won't fit on eight A100s today. The dataset has grown, the parameter count has grown, and a single training run takes thirty-six hours if it completes at all. SageMaker supports at least four distributed-training strategies and they scale along different axes: one for when the data doesn't fit, one for when the model doesn't fit, one for when neither fits, and one for jobs that don't really need any of this but would like to be faster.
Exam Room · Machine Learning
From Notebook Instances to SageMaker Studio (preview)
A data science team that started with one `ml.t3.medium` SageMaker notebook instance now has fourteen of them, each bookmarked by a different person, each holding state nobody else can see. When a new hire joins, the onboarding doc says 'ask Priya which notebook to use.' SageMaker Studio is sold as the replacement, but the transition isn't notebook-for-notebook; it's a different model of who owns what, where the files live, and how the team shares kernels.
Exam Room · Machine Learning
Hosting Seventy Models Behind One SageMaker Endpoint (preview)
Seventy per-merchant fraud models, each trained independently, each needed on demand but almost never all at once. A separate endpoint per model would be an eye-watering monthly bill; squashing them into one model would waste months of per-merchant tuning. SageMaker offers at least three endpoint shapes for hosting many models together, and they look superficially similar until you ask which ones share containers, which ones share hardware, and what happens when a model isn't hot in memory.
Exam Room · Machine Learning
How to Tell Which Kind of Model Drift You Have (preview)
A fraud model that's been quiet for nine months starts throwing a different shape of alerts. Recall is down, precision is down, and the feature distributions look off in places that don't match any product change we shipped. 'The model is drifting' is a sentence that means four different things depending on which part of the pipeline moved. SageMaker Model Monitor names the four and gives each one a baseline, a schedule, and a metric, which is the part that matters when it's 3am and a dashboard is red.
Exam Room · Machine Learning
Running Edge ML Without SageMaker Edge Manager (preview)
Two hundred industrial cameras run defect detection on factory floors, steel-hulled ships, and remote mining sites, all behind patchy satellite uplinks. Inference has to happen locally and sync to the cloud when the network cooperates. The textbook answer for this shape used to be SageMaker Edge Manager. That service reached end-of-life on 26 April 2024. Working out what replaces it is the real scenario.
Exam Room · Machine Learning
Picking the Right Auto-Scaling Signal for GPU Endpoints (preview)
A SageMaker real-time endpoint serves an image classifier on a GPU instance. Auto-scaling is wired to CPU utilisation at a 70% target. When traffic spikes, latency spikes with it, and the scale-out fires several minutes after the queue has already blown out, because the GPU saturates long before the CPU does. The fix is choosing a signal that actually tracks demand.
Exam Room · Machine Learning
Governed Model Promotion on SageMaker (preview)
A regulated-industry ML team promotes models by copying S3 files and hoping. Legal wants a signed human approval, auditors want the holdout metrics on file, engineering wants to know which dataset trained the thing, and when production catches fire somebody wants a way back to whatever was running yesterday. Four requirements, one promotion gate, and a clear picture of which corner of AWS is built for this and which corners are almost-but-not-quite.
Exam Room · Machine Learning
Hosting a 70B LLM with Business-Hours Traffic (preview)
A fine-tuned Llama 3 70B, a traffic pattern that sits idle overnight and peaks at lunch, and three AWS hosting paths that look similar in the console. One is provisioned-only and wrong the second the day ends. One is full BYO and a quarter's worth of work. The middle path is the one that reads the traffic shape correctly.
Exam Room · Machine Learning
How to Label 500K Images with 100 Hours of Human Time (preview)
Half a million product images, a hundred hours of human annotation budget, and a requirement for bounding boxes around every product. Labelled by hand at roughly ten seconds per image, that budget covers less than a third of what's needed. The trick is to let a model label the easy ones and save the humans for the uncertain edges, then work out whether the numbers survive contact with reality.
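The budget gap in this teaser can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above (500,000 images, 100 hours, roughly ten seconds per image); the function names are illustrative and not taken from the article.

```python
# Back-of-envelope check of the annotation budget described above.
# Image count, hour budget, and seconds-per-image come from the teaser;
# the function names are hypothetical helpers for illustration only.

def manual_hours_needed(images: int, seconds_per_image: float) -> float:
    """Hours required to label every image by hand."""
    return images * seconds_per_image / 3600


def images_covered(budget_hours: float, seconds_per_image: float) -> int:
    """How many images the human budget can label at the given pace."""
    return int(budget_hours * 3600 // seconds_per_image)


def auto_label_fraction_required(images: int, budget_hours: float,
                                 seconds_per_image: float) -> float:
    """Share of images a model must label for the budget to hold."""
    covered = images_covered(budget_hours, seconds_per_image)
    return max(0.0, 1 - covered / images)


if __name__ == "__main__":
    print(round(manual_hours_needed(500_000, 10), 1))
    print(images_covered(100, 10))
    print(round(auto_label_fraction_required(500_000, 100, 10), 3))
```

Under these figures the humans can reach 36,000 images, so the automated pass would have to carry the rest, assuming the ten-second pace holds for the hard cases too.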
Exam Room · Machine Learning
Orchestrating a Nightly Model Retrain (preview)
A nightly XGBoost retrain, a conditional push to the Model Registry if AUC clears a bar, a deploy Lambda downstream, and a hard requirement to know which dataset produced which model. Five orchestrators can run that DAG. Only one of them tracks the lineage for free.
Exam Room · Machine Learning
How to Avoid Train/Serve Skew with SageMaker Feature Store (preview)
A fraud model trains on historical data overnight and scores transactions in real time under a 50 ms budget. The features it needs are identical in both paths, and if the two paths compute them differently, the model gets train/serve skew and quietly loses accuracy. SageMaker Feature Store solves this by backing one feature definition with two stores: one for training, one for serving.
Exam Room · Machine Learning
Choosing a SageMaker Hyperparameter Tuning Strategy (preview)
An XGBoost fraud model, twenty hyperparameters, a two-hundred-job tuning budget, and twenty-five minutes per run. SageMaker Automatic Model Tuning ships four search strategies for exactly this shape of problem, and picking the correct one is worth roughly a 4x cut in wall-clock time for the same final AUC.
Exam Room · Machine Learning
Choosing a SageMaker Inference Mode (preview)
SageMaker offers four ways to serve a model. Pick by latency and you'll get the answer wrong. The interesting axes here are payload size, processing time, and arrival pattern, and three of the four options exclude themselves before you've measured anything.
Exam Room · Data
Choosing Between RA3, DC2, and Spectrum for Redshift Storage (preview)
A Redshift cluster that's been running DC2 nodes since 2019, a growing data lake in S3 the warehouse wants to reach into, and a new team that wants Redshift Serverless without any of the node-sizing conversation. RA3 with managed storage, DC2 with local storage, and Spectrum reading S3 directly are three different answers to 'how are compute and storage related in Redshift', and each makes a different trade.
Exam Room · Data
Tuning Redshift WLM for Mixed Workloads (preview)
A Redshift cluster where short dashboard queries wait behind five-hour ETL jobs every morning, a WLM queue configuration that's been default for eighteen months, and a DBA who wants to make the ETL stop starving the dashboards without making the ETL slower. WLM is Redshift's workload-management layer: queues, slots, memory, and the rules that route queries between them.
Exam Room · Data
Choosing the Right Glue Job Shape for the Workload (preview)
A Glue bill that's become the biggest line item in the data platform. Half the jobs are transforming hundred-gigabyte datasets; the other half are bouncing three hundred rows through a five-minute script. They're all the same job shape because that's what got written first. The work is figuring out which workloads match which shape on the Glue-job landscape, and where the wrong shape has been paying a cluster premium for work that doesn't need one.
Exam Room · Data
Step Functions or Glue Workflows for Nightly ETL (preview)
A nightly ETL with eleven Glue jobs, dependencies between them, three of which can run in parallel, one that has to retry on transient failures, and one that requires human approval before it writes to production. Step Functions and Glue Workflows both orchestrate Glue jobs; the shape of the work determines which is the correct fit and which tries to fit but won't.
Exam Room · Data
Choosing a Managed SaaS Connector for the Warehouse (preview)
A Salesforce tenant with thirty objects that need to land in the warehouse daily, a Zendesk instance adding another fifteen, and a team that's been writing a bespoke Python script per connector for each new source. Bespoke scripts aren't sustainable; the harder call is which shape of managed ingestion matches what we actually need, what the trade-offs look like between configured connectors and code, and where each option breaks down when the source isn't quite as well-behaved as the brochure assumes.
Exam Room · Data
How to Diagnose a Slow DMS Replication Task (preview)
A DMS full-load task that's been running for thirty-six hours with sixteen hours to go, on tables that shouldn't take half that. The replication instance is showing CPU headroom and memory to spare, but the throughput graph is a flat line. DMS tuning is a set of knobs that affect different parts of the pipeline; the work is figuring out which part is the bottleneck and which knob moves it.
Exam Room · Data
How to Tier OpenSearch Shards Across Hot, Warm, and Cold (preview)
Two years of application logs in OpenSearch, most of them never queried, all of them costing the same per GB per month. Storage tiering is how that bill gets sensible without losing the ability to search yesterday's and last year's data when it matters. The work is matching the access pattern of each age of data to the tier that serves it.
Exam Room · Data
Choosing Between QuickSight SPICE and Direct Query (preview)
A QuickSight dashboard with three hundred daily viewers, backed by a Redshift cluster that groans under the concurrent load every Monday morning. SPICE is QuickSight's in-memory columnar engine that caches data so dashboards don't hit the warehouse for every view. Direct query bypasses SPICE and reads live. The decision isn't 'which is faster' (SPICE usually is), but which trade-offs each makes and which matches the dashboard's actual freshness and scale.
Exam Room · Data
Choosing a Change Data Capture Pipeline on AWS (preview)
A Postgres order-book that needs to feed the warehouse, with every insert, update, and delete captured in near-real-time. Two shapes of pipeline on AWS can carry change data forward, one point-to-point and one fan-out through a shared log, and picking between them comes down to how many consumers the CDC stream has to feed, and what happens when a new consumer wants in.
Exam Room · Data
How to Clean a Messy CSV Without Writing Code (preview)
A CSV of lead records from a trade show, a finance analyst who needs to normalise phone numbers and deduplicate contact emails before Monday, and a data engineering team that's three weeks deep in a warehouse migration. The work is figuring out who does the cleanup, what tool meets a non-programmer where she is, and whether the 'one-off' label hides a recurring job that justifies a repeatable recipe rather than a Spark script.
Exam Room · Data
How to Govern Glue Catalogs Across Four Accounts (preview)
Four Glue catalogs across four accounts, an Excel spreadsheet of dataset owners that nobody keeps current, and a finance analyst who's joined her company's customer table against a sales table she wasn't supposed to read. The catalog is everywhere and the governance is nowhere. The work is figuring out what a governance layer above the catalogs actually has to do (discovery, ownership, access workflow, project isolation) and which approach on the landscape fits the shape we have.
Exam Room · Data
How to Add Data Quality Gates to a Glue Pipeline (preview)
A Glue job that silently loaded a malformed CSV last Tuesday and nobody noticed until an analyst asked why Q3 revenue looked odd. The fix isn't a bigger dashboard; it's an upstream gate that refuses to promote data to the warehouse unless a stated set of expectations holds. Adding data quality checks is obviously the right call; the harder part is deciding what to assert, where in the pipeline the assertions should run, and what should happen when an assertion fails.
Exam Room · Data
EventBridge Pipes or a Lambda for Filter-and-Enrich (preview)
A DynamoDB stream that needs to feed an EventBridge bus, with a filter that drops 80% of the events and an enrichment that adds customer-segment data. The classic answer is a function full of boilerplate around two meaningful transformations. The work is figuring out where on the integration landscape filter-then-enrich-then-fan-out can be expressed as configuration rather than code, and what trade-offs come with each option.
Exam Room · Data
Choosing Between Iceberg, Delta Lake, and Hudi (preview)
A pile of Parquet in S3, an Athena catalog that knows about partitions but nothing about versions, and an analyst who keeps asking 'what did the table look like yesterday?'. Three open table formats have answers (Iceberg, Delta Lake, and Hudi), and each answers a subtly different question. The work is figuring out which question we're actually asking before picking.
Exam Room · Data
How to Enforce Schemas on a Kinesis Stream (preview)
Twelve producer services feed a Kinesis Data Stream read by eight downstream pipelines. Last week a producer quietly added a required field; three consumers crashed, one silently dropped the records, four carried on. The fix isn't a wiki page telling producers to be careful, it's a schema the producer has to register before it can serialise, a version ID embedded in every record, and a compatibility rule the registry enforces the moment the schema is published. Move the contract out of the heads of twelve teams and into a system that rejects a breaking change before a byte lands on the stream.
Read articleExam Room · Data
How to Scale Kinesis Reads Across Many Consumers (preview)
A 50-shard Kinesis Data Stream sprouted an eighth downstream consumer and immediately broke the seven already reading it. Every GetRecords started throwing ProvisionedThroughputExceededException. Every consumer thought it had the stream to itself; Kinesis knew otherwise. The fix means walking the read-side landscape, understanding what shard-level read throughput really caps, and weighing which consumers earn dedicated capacity and which don't.
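For a sense of scale on the read side, here is a back-of-envelope sketch in Python. It assumes every consumer polls with GetRecords and that the consumers share each shard evenly, and it uses the per-shard limits for shared-throughput reads (five read transactions per second and 2 MB per second). The function names are illustrative and not part of any SDK.

```python
# Per-shard limits for shared-throughput (polling) consumers.
SHARD_READ_CALLS_PER_S = 5
SHARD_READ_MB_PER_S = 2.0


def per_consumer_share(consumers: int) -> tuple[float, float]:
    """Return (GetRecords calls/s, MB/s) each consumer gets per shard
    if the shard's read budget is split evenly."""
    return SHARD_READ_CALLS_PER_S / consumers, SHARD_READ_MB_PER_S / consumers


def min_poll_interval_s(consumers: int) -> float:
    """Slowest each consumer must poll a shard to stay inside the call limit."""
    return consumers / SHARD_READ_CALLS_PER_S


if __name__ == "__main__":
    calls, mb = per_consumer_share(8)
    print(f"8 consumers: {calls:.3f} calls/s and {mb:.2f} MB/s each, per shard")
    print(f"each must wait at least {min_poll_interval_s(8):.1f} s between polls")
```

Under those assumptions, eight polling consumers each get well under one call per second per shard, which is why a consumer that polls as if it were alone starts seeing throttling. Enhanced fan-out sidesteps the sharing by giving each registered consumer its own 2 MB per second per shard.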
Read articleExam Room · Data
How to Enforce Column and Row Access on a Data Lake (preview)
A 12 TB S3 data lake, 40 Glue tables, and an IAM policy that says yes or no to whole prefixes. The data platform team wants analysts to read the sales tables, but not the commission column. Regional analysts in EMEA must see only EMEA rows. Every query must land in an audit log that the risk team can subpoena. Closing the gap means walking the data-lake access-control landscape and finding a policy surface that speaks the language of tables, columns, and rows, and that every engine consults before a single byte is returned.
Read articleExam Room · Data
How to Cut Athena Scan Costs with Partitioning (preview)
A finance team's Athena queries are scanning eight terabytes of CSV every time and costing forty dollars a pop, fifty times a day. The data they actually want is one day's worth, about thirty gigabytes. Closing the gap means understanding the Athena cost model, the difference between a registered partition and a projected one, and what columnar encoding does on top of partition pruning.
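The gap is easy to put in numbers. The sketch below is plain arithmetic on the figures in the blurb: the per-terabyte rate is simply what forty dollars for eight terabytes implies, and it assumes decimal units (1 TB = 1,000 GB) and that partition pruning narrows each scan to exactly the one day of data wanted. The names are illustrative only.

```python
# Figures from the blurb above.
FULL_SCAN_TB = 8.0
FULL_SCAN_COST_USD = 40.0
QUERIES_PER_DAY = 50
WANTED_GB = 30.0

GB_PER_TB = 1000.0  # assumption: decimal units, to keep the numbers round


def scan_cost_usd(tb_scanned: float, usd_per_tb: float) -> float:
    """Cost of one query under a pay-per-data-scanned model."""
    return tb_scanned * usd_per_tb


# Rate implied by the blurb: $40 for 8 TB.
usd_per_tb = FULL_SCAN_COST_USD / FULL_SCAN_TB

daily_full = scan_cost_usd(FULL_SCAN_TB, usd_per_tb) * QUERIES_PER_DAY
daily_pruned = scan_cost_usd(WANTED_GB / GB_PER_TB, usd_per_tb) * QUERIES_PER_DAY

if __name__ == "__main__":
    print(f"full scans:   ${daily_full:,.2f} per day")
    print(f"pruned scans: ${daily_pruned:,.2f} per day")
```

On these assumptions the full scans come to about $2,000 a day, against about $7.50 once each query touches only the day it needs. Columnar encoding would cut the scanned bytes further, but by how much depends on the columns each query reads, so it is left out of the sketch.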
Read articleExam Room · Data
How to Ingest Telemetry from 50,000 IoT Sensors (preview)
A fleet of 50,000 IoT sensors pushing one-kilobyte events once a second needs a real-time Lambda consumer, a seven-day replay window, external Java consumers, and Parquet-in-S3 for analytics. No single AWS streaming service ticks all four boxes cleanly; picking well means walking the streaming-ingestion landscape and weighing which pieces buy back the ones that don't fit.
Read articleExam Room · Data
Choosing Athena, Glue, or EMR for Parquet on S3 (preview)
A product analytics team with five terabytes of Parquet in S3, a handful of ad-hoc questions a week, and zero appetite for a Spark cluster. Three AWS services could answer them (Athena, Glue, and EMR), and only one of them charges nothing between query bursts. Finding the correct fit means walking the whole analytics-on-S3 landscape and understanding where each service is cheap, where each is wasteful, and where each is flat-out wrong.
Read articleExam Room · Operations
How to Inject Secrets and Scope Task Roles in ECS (preview)
A container image rebuilt from the Dockerfile in a git branch, shipped yesterday, with the database password in an environment variable. A task definition whose IAM role grants too much because nobody's audited it. A third-party API key passed via ECS environment variables in plain text. Three ways ECS workloads hold secrets, two of them wrong. Secrets management is a given; what's worth working through is how the task role and secrets injection fit together, which one carries which credentials, and what an audit should be able to see.
Read articleExam Room · Operations
How to Migrate EC2 Fleets to the Unified CloudWatch Agent (preview)
An old EC2 fleet still running the legacy CloudWatch Logs agent. A newer fleet running the unified CloudWatch agent with metrics and logs. A Prometheus exporter sidecar on some hosts. Three agents, two configuration files per host, and documentation that contradicts itself on which is preferred. The unified agent is plainly better; the work is figuring out what each legacy piece was doing, what migrating feels like, and where the unified agent's knobs sit once it's installed.
Read articleExam Room · Operations
Choosing Between CloudWatch, Grafana, and Datadog Dashboards (preview)
CloudWatch dashboards in production, tweaked from three tutorials; Grafana in staging, stood up by a team that wanted to plot alongside Prometheus data; a Datadog subscription for the parts nobody wants to move off. Three dashboarding surfaces, three bills, three user populations. They all draw graphs, so 'which tool is better' isn't a useful frame; what matters is which question each surface is best for answering, and where staying on two surfaces is cheaper than consolidating.
Read articleExam Room · Operations
How to Search 45,000 Resources Across 22 AWS Accounts (preview)
'Find me every S3 bucket tagged Env=prod across our accounts and Regions.' Thirty tabs later, a half-finished spreadsheet, and a nagging feeling one Region got missed. An estate of 22 accounts and 15 Regions has made the Describe-and-diff approach untenable. There are five ways to search AWS; what's worth working out is which one gives an answer in seconds without running a scheduled crawler or building a search index.
Read articleExam Room · Operations
How to Act on AWS Health Events Automatically (preview)
An EBS volume is degraded in one availability zone. A TLS certificate on an older API expires in two weeks. A planned maintenance window will reboot an RDS instance tomorrow night. AWS knows all three; the team finds out from the volume's first I/O error, the certificate warning in Slack, and a post-mortem. 'Get better monitoring' is too vague to act on. The specific work is turning AWS's own health signal into a first-class input that reaches the right people before the incident does.
Read articleExam Room · Operations
How to Turn Structured Logs Into CloudWatch Metrics (preview)
A service that logs structured JSON but publishes no CloudWatch metrics. A dashboard of graphs that each pull from Logs Insights queries, taking 30 seconds to load. An on-call rotation that correlates 'error rate' with 'deploy time' by squinting. Adding metrics is the obvious move; the work is deciding where they should come from: a new instrumentation layer in the code, a CloudWatch agent, or the log lines that are already being written.
Read articleExam Room · Operations
SNS Access Policy vs Filter Policy (preview)
An SNS topic that half the org has permission to publish to. Eighteen subscribers, each receiving every message, each filtering on their own side with hand-written if-statements. A service that joined last month and is getting flooded with events it doesn't care about. SNS is the correct tool here; what's worth pinning down is which policy answers 'who can publish' and which answers 'who actually receives this specific message.'
Read articleExam Room · Operations
How to Prove Backup Compliance With AWS Backup Audit Manager (preview)
An auditor arrives with a simple question: show me that every RDS instance in production was backed up last night. The on-call engineer opens the console, picks the first instance, clicks Backups, takes a screenshot, and moves to the second. Four hours later they're still going. Whether the backups ran is barely in doubt; they almost certainly did. The work is getting the evidence from 'true but unprovable' to 'exportable in a minute' without an engineer ever touching a console.
Read articleExam Room · Operations
EC2 Image Builder vs Packer (preview)
A golden AMI rebuilt by a shell script on a bastion host, last updated seven months ago. A CVE announcement this morning that requires a patched kernel across the fleet. A team that wants to adopt Packer 'because HashiCorp.' Image Builder versus Packer in the abstract isn't a useful frame; both build images. What matters is which one owns the pipeline, which owns the recipe, and what it costs when the choice is wrong.
Read articleExam Room · Operations
How to Fan Out CloudWatch Logs to Real-Time Consumers (preview)
A security team wants every `AccessDenied` line copied into a SIEM within two minutes. A product team wants a near-real-time stream of structured business events into Kinesis. An ops team wants stack traces routed to an incident bot. Three consumers, one source: CloudWatch Logs, already holding the data. Getting logs out of CloudWatch isn't the problem (there are five ways). The work is figuring out which shape fits a line-at-a-time fan-out, which fits bulk archival, and where subscription filters earn their keep.
Read articleExam Room · Operations
How to Monitor Service Quotas Before They Bind (preview)
A deploy fails at 03:14 because the account hit its `RunInstances` quota. A Lambda concurrency limit triggers throttles on a spike the team saw coming. An ops engineer filed a quota increase in the console and the ticket sat unassigned for three days. Quotas are soft limits the organisation should be steering; today they are hard walls discovered at 3 AM. Raising limits isn't the missing piece (AWS has an API for that). The work is choosing which limits to track, automating the request path, and pinning down where the responsibility for 'we're about to hit a ceiling' sits.
Read articleExam Room · Operations
Picking a Scheduler on AWS (preview)
A nightly report that needs to fire at 02:00 local to each Region. A billing batch whose steps must run in sequence, with retries and branches. A per-customer reminder that should fire exactly 24 hours after signup. Three scheduling shapes, three very different tools, and a lot of code that reaches for cron on an EC2 instance because that's what the team knows. 'Which scheduler' isn't the right frame; the work is figuring out which scheduling shape each tool was actually built for, and what happens when you use the wrong one.
Read articleExam Room · Operations
How to Catch Customer-Facing Outages With Synthetics Canaries (preview)
A checkout flow has gone down three times this year. Each time, the first page to notice has been Twitter. ELB health checks were green; CloudWatch metrics looked normal; the alarm that finally fired was disk-space on an unrelated host. More alarms aren't the missing piece; we have plenty. What's missing is a test that would have caught each failure before the customers did, and a way to run that test forever without paying someone to watch it.
Read articleExam Room · Operations
Trusted Advisor vs Compute Optimizer (preview)
A new FinOps lead asks 'what's Trusted Advisor telling us?' Two hours later they ask the same question about Compute Optimizer. Both services sit in the console with 'save money' in their pitch, both read usage data, and both produce recommendations. They are not the same thing. Enabling both is almost certainly right; the work is knowing which class of advice each one gives, and how to stop the two services from contradicting each other on the same instance.
Read articleExam Room · Operations
How to Roll Out Config Conformance Packs Across an Org (preview)
Twenty-two member accounts, forty compliance rules, and a security team that wants one report across all of it. Bolting Config rules on per-account is how the previous regime got to a spreadsheet that nobody trusted. Config is obviously the right tool; what's worth working through is what a conformance pack buys over the raw rules, where it lives in the org, and how remediation fits into something that was designed to only observe.
Read articleExam Room · Operations
Run Command or Runbook (preview)
A one-shot patch across 400 instances. A four-step release pipeline with approvals. A nightly log-rotation job that needs to run everywhere. Three operational jobs, two Systems Manager features, and a lot of blog posts that treat them as interchangeable. Neither is 'better'; they solve different problems, so the work is figuring out which shape of job belongs in Run Command, which belongs in an Automation runbook, and what we give up when we pick the wrong one.
Read articleExam Room · Operations
How to Warm Up and Drain Instances With Lifecycle Hooks (preview)
A fleet behind an ALB scales on CPU. When new instances launch they take traffic instantly, before their JVM has settled, before their caches are warm, before Consul has a chance to register them, and users see 2-3 minutes of elevated errors. When instances terminate they lose whatever is in-flight and drop whatever is queued. Two symptoms, one missing mechanism: a pause on the way into service and a pause on the way out, held open by Auto Scaling lifecycle hooks while the application does the work the schedule didn't allow for.
Read articleExam Room · Operations
How to Quiet Noisy Pagers With Composite Alarms (preview)
CPU spikes at 02:00 every night because the nightly backup does what nightly backups do. Latency spikes when the downstream payments API has a wobble. Error rate occasionally prints a one-off 5xx for reasons nobody can ever reproduce. Each signal alone pages the on-call. The on-call stops answering. A real incident (CPU climbing, latency climbing, errors climbing) arrives and looks like yet another cry-wolf. The fix is two layers: adaptive per-metric alarms that learn the service's daily pattern, and a composite alarm that only pages when all three agree.
Read articleExam Room · Operations
How to Replace SSH Bastions With Session Manager (preview)
Three bastion hosts, shared SSH keys rotated quarterly, CloudTrail that records a login but not a keystroke. Security wants all of it gone: no inbound SSH, no shared secrets, per-engineer shell audit with input and output captured. Session Manager is the AWS-native answer, once you know which IAM role sits on the instance, which on the engineer, which document shapes the session, and which SCP keeps the whole org honest.
Read articleExam Room · Operations
How to Patch a Fleet With Systems Manager Patch Manager (preview)
800 EC2s across four regions, 150 on-prem VMs in two data centres, three operating systems, three engineers, and a SOC2 auditor. SSH-ing into 950 hosts isn't a plan. Systems Manager Patch Manager is; knowing which pieces do which job is the skill.
Read articleExam Room · Operations
How to Triage a 5xx Spike With CloudWatch Logs Insights (preview)
The pager goes at 03:00 for elevated 5xx on a public ALB. Logs are already in CloudWatch. The question is which path, which target group, which clients, and whether the spike correlates with anything — answered in minutes, from the console, without standing anything new up. CloudWatch Logs Insights is the tool the scenario is pointing at; getting the query correct is the skill it's testing.
Read articleExam Room · Development
Lambda URL or HTTP API (preview)
A single webhook receiver that needs an HTTPS URL, nothing more. A small service with three routes that could do without API Gateway's feature list. A pair of endpoints with different authorisers, rate limits, and Lambda targets. Three different service shapes, two very similar AWS surfaces, and enough overlap that the wrong choice costs a little money today and a lot of migration work next year. Both options are cheap; what decides it is which fits the shape, and when 'just a URL for a Lambda' is genuinely enough.
Read articleExam Room · Development
Choosing an API Gateway Authoriser (preview)
A public API open to the internet for a consumer-facing mobile app. A partner API behind an API key with usage plans. An internal API reachable only by other AWS services in the same account. A machine-to-machine API where a bank's back-office calls ours with a signed JWT. Four authorisation shapes, a handful of API Gateway authoriser types, and surprisingly different trade-offs depending on which mix lands on the same gateway. Different authorisers are right for different callers; the work is matching authoriser to caller profile, and seeing why IAM-auth isn't quite what the phrase suggests.
Read articleExam Room · Development
How to Add Sidecars to an ECS Task (preview)
An application container that needs structured logs shipped somewhere other than stdout. A service-mesh proxy that must be co-located with the app to handle mTLS. A secrets-injection helper that reads from Parameter Store and writes to a shared volume. Three common sidecars, one ECS task definition, and a handful of primitives (container definitions, dependsOn, essential, shared volumes, link modes) that decide whether the task behaves the way you expect. Sidecars work fine on ECS; the work is knowing which primitives hold the pattern together, and which corner cases bite.
Read articleExam Room · Development
How to Filter SNS Messages per Subscription (preview)
A single SNS topic carrying every event the platform emits. A subscriber that cares only about high-priority customer events. Another subscriber that needs events for one specific region. A third that wants everything except test traffic. The naive answer is a topic per filter; the better answer is one topic with SNS message filter policies, and the details of attribute-based matching, message-body filtering, and the 100-subscription limit repay a careful read before you commit.
Read articleExam Room · Development
How to Handle Errors in a Step Functions Workflow (preview)
A state machine that calls a flaky third-party API. A workflow that must roll back partial work if a later step fails. A long-running process whose failures should reach a human reviewer rather than vanish into logs. Three failure shapes, one state machine language, and a set of error-handling primitives worth walking through before relying on them. Step Functions can catch errors; the work is figuring out which primitive handles which shape, and what the workflow would have to look like to stop thinking about errors.
Read articleExam Room · Development
CloudFront Signed URLs or Signed Cookies (preview)
An e-book download link that should work once, for one customer, for ten minutes. A premium video library where the whole archive should open up to paying subscribers but stay closed to everyone else. A firmware update a device fetches directly by serial number over HTTPS. Three access-control shapes on private content served through CloudFront, and two different mechanisms for authorising them. Both are secure; the work is matching the shape to the mechanism, and seeing what the consumer would have to look like in each case.
Read articleExam Room · Development
How to Host Private Packages in CodeArtifact (preview)
An internal TypeScript library that four services already depend on. A Python package the data team wants to share. Every build pulling open-source dependencies straight from npmjs.com and PyPI, trusting whatever got published last. A compliance deadline says nothing goes through unvetted public registries. CodeArtifact is AWS's answer, and the clean pattern, upstream proxy plus internal-publish repository, covers more than most teams first realise.
Read articleExam Room · Development
How to Extend CloudFormation with Custom Resources or Constructs (preview)
A CloudFormation stack that needs to provision a third-party SaaS resource on every deploy. A CDK team that wants 'our standard service' to become a single line in every new stack. A resource AWS doesn't model natively but we need to manage with the rest of our infrastructure. Three problems, two mechanisms, and a lot of confusion about which is which. CloudFormation has to be extended either way; the work is sorting out which tool is doing which job, because they solve genuinely different things.
Read articleExam Room · Development
How to Pick Between Env Vars, Parameters, and Secrets (preview)
A Lambda that needs to know the name of its DynamoDB table. A Lambda that reads a feature-flag threshold that changes weekly. A Lambda that must authenticate to a third-party API with a key that rotates monthly. Three pieces of configuration, three different lifetimes, three different secrecy requirements. AWS gives us three places to put them and enough overlap that they get used interchangeably. Each option is correct for something; what matters is matching configuration to shape, and knowing what each choice costs.
Read articleExam Room · Development
S3 Event Notifications or EventBridge (preview)
An image-processing pipeline that needs to kick off a thumbnail Lambda on every upload. A compliance service that cares about every object mutation (creates, deletes, replication events, lifecycle expirations) across fourteen buckets. S3 has two event delivery mechanisms that look almost identical at first and differ in ways that matter. Both are correct ways to wire up S3 events; what matters is which shape each fits, and what happens when you accidentally enable both.
Read articleThe Greenbox Story · What Came Next
Coaching at Scale: Charlotte's Pattern (preview)
Charlotte has done this six times now. Help a team grow up, teach them the practices, watch them succeed, and move on. The longest engagement she's ever had is over. She opens her notebook to a fresh page.
Read articleExam Room · Development
Rotating the Database Password (preview)
A Postgres master password that's been the same since the project started. A compliance deadline that says 'rotate every 30 days, no exceptions.' An application fleet of eighteen services that all read the credential at startup. Rotation isn't in doubt, it has to happen, so the work is rotating without taking the fleet down, and understanding why Secrets Manager's rotation Lambda works the way it does.
Read articleExam Room · Development
AppSync or Apollo on Lambda (preview)
A mobile client team that wants one endpoint and a strong schema. A backend team whose data is spread across DynamoDB, Aurora, and three internal HTTP services. A requirement for real-time subscriptions when a record changes. The shape calls for GraphQL; the question is whether to lean on AppSync's managed resolvers or a Lambda running Apollo behind API Gateway. Both can serve the same query; they pay very different tolls.
Read articleThe Greenbox Story · What Came Next
Knowledge Transfer: Three Generations (preview)
Dave Morrison's family has farmed the same land since 1962. His son Ben runs the Greenbox farm programme. His grandson helps stack crates on Saturday mornings. The 'local' promise that started with a handshake now has a corporate contract behind it.
Read articleExam Room · Development
EventBridge, SQS, or SNS for Fan-Out (preview)
A payment-captured event that needs to update four internal services. An order-placed event that kicks off five independent downstream workflows. A deeply integrated SaaS that emits events we want to route by schema. Three fan-out shapes, three services with overlapping jobs, and enough similarities that it's easy to pick the wrong one. None of the three services is best on its own; each is best for something, so the work is matching each fan-out shape to a service, and seeing what the consumers would have to look like in each case.
Read articleExam Room · Development
How to Trace Distributed Requests with X-Ray (preview)
A p99 latency that climbed from 200ms to 2 seconds last week, nobody knows where. Logs in fourteen different CloudWatch log groups, each with its own request ID scheme. An API call that passes through a Lambda, a Step Functions state machine, two DynamoDB tables, and a third-party HTTP vendor before it returns. The problem is not a lack of data; the problem is that the data is scattered. Tracing is the obvious move; the work is in which pieces of the request chain carry a correlation ID forward natively, which ones drop it, and what the application code has to do to make the gaps close.
Read articleThe Greenbox Story · What Came Next
Engineering Leadership: Tom in a Suit (preview)
Tom's title is now 'VP Engineering, Greenbox Division, Hartland Group.' His calendar has a recurring meeting called 'Change Advisory Board.' He misses the days when deploying meant SSH-ing into a server from his laptop.
Read articleExam Room · Development
How to Shape API Gateway Traffic with Throttles, Quotas, and Usage Plans (preview)
A public API whose downstream can't survive a burst. A free tier that should top out at 1,000 calls per day without turning paying customers away. A partner integration whose contract says a thousand requests per second. Three traffic-shaping jobs, three different knobs in API Gateway, and enough overlap that it's easy to configure the wrong one. There isn't one 'the' throttle; there are at least four, so the work is figuring out which knob protects what, and what the traffic would have to look like to reach each one.
Read articleExam Room · Development
Cache-Aside or Write-Through (preview)
A product-catalogue API returning the same items millions of times, an inventory counter the warehouse is constantly adjusting, and a leaderboard that must be strictly consistent on read. Three workloads, three read-write shapes, and four different caching patterns to pair them with. Each pattern is best for something; the harder call is which one matches which read-write shape, and what the application would have to tolerate to pick each one.
Read articleThe Greenbox Story · What Came Next
Founder Transition: Maya's Next Thing (preview)
Six months after the partnership closed, Maya is sitting in a different cafe, with a different notebook, and a familiar itch. The produce box was never really about vegetables.
Read articleExam Room · Development
SAM, CDK, or Serverless Framework (preview)
A Lambda-heavy microservice that needs to ship today, a platform team deploying a hundred services next quarter, and a multi-cloud greenfield product trying to stay portable. Three ownership shapes, three different definitions of 'infrastructure as code', and three tools that each solve a real problem. Each one is best for something; what matters is which tool matches which ownership shape, and what the team would have to believe about the future to pick each one.
Read articleThe Greenbox Story · New Ground
CoPs Across the Merge (preview)
Hamish walks into Thursday in July, sits through forty minutes, and leaves feeling like he has trespassed. Priya figures out, sitting on a step outside the office, why the original Thursday is not a thing anyone can import, and what can be, instead.
Read articleExam Room · Development
Step Functions Standard vs Express (preview)
A fintech runs two Step Functions workloads on the same flavour and wonders why the bill is what it is. One is an IoT data-processing flow firing about 800 times a second, short, idempotent, nobody looks at it after 24 hours. The other is an account-opening workflow that waits days for human approvals, survives weeks of KYC back-and-forth, and must be inspectable by auditors long after it finishes. One wants Express. One wants Standard. Picking the correct flavour per workload is a two-order-of-magnitude cost decision and a very different durability contract.
Read articleThe Greenbox Story · New Ground
Post-Merger Integration: Where Acquisitions Go to Die (preview)
Greenbox bought Harvest Box for $800K and got 3,000 Sydney subscribers, a codebase nobody understood, and eight employees who weren't sure they wanted to stay. Six months later, half the team had left, the systems were still running in parallel, and the team was learning why 70% of acquisitions fail to deliver their expected value.
Read articleExam Room · Development
DynamoDB Streams for Reliable Fan-Out (preview)
A deposit is recorded in DynamoDB. The application then writes an audit entry to S3, pushes a document to OpenSearch, and emits an event to EventBridge for the fraud team. Most of the time, all four places agree. Occasionally one of the three followers silently fails, the balance stands alone, and three days later a reconciliation job reports a ledger that says one thing and an audit log that says another. The fix is to stop doing the fan-out in application code and let DynamoDB itself be the source of the downstream stream.
Read articleThe Greenbox Story · New Ground
Meeting Tax and Maker Time (preview)
Tom has five hours of focus time in a working week. Priya has four. Ifeoma has three. There is no meeting in which to discuss the meeting problem, which is, it turns out, the meeting problem.
Read articleExam Room · Development
Cognito User Pools vs Identity Pools (preview)
A mobile-first social app needs email-and-password sign-up, Google and Apple login, direct-to-S3 photo uploads without a server in the middle, and different bucket prefixes for free-tier and paid-tier users. The team has heard of Cognito User Pools and Cognito Identity Pools and has been using the two names interchangeably. They aren't the same thing, they don't do the same job, and getting them confused is the difference between a working upload and a 403. One is about proving who you are. The other is about turning that proof into the AWS credentials that let you touch a bucket.
Read articleThe Greenbox Story · New Ground
Two Maps, One Territory (preview)
The Greenbox–Hartland partnership is a week old and the first technical integration meeting is tomorrow. Greenbox engineering has a diagram. Hartland engineering has a diagram. Neither diagram knows the other one exists. Charlotte draws the thing she calls a System Landscape.
Read articleExam Room · Development
How to Stop SQS Duplicate Processing with Visibility Timeout (preview)
A fulfilment Lambda reads an order off SQS, calls a slow partner API, and most of the time everything works. A few times a week the same order ships twice. The queue is on default settings, the function timeout is generous, and nothing in the logs looks unusual. The bug is the arithmetic between three numbers that rarely get looked at together: visibility timeout, function timeout, and how long processing actually takes.
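The arithmetic can be sketched in a few lines of Python. The 30-second figure is the SQS default visibility timeout for a new queue; the function timeout and the slow-call duration below are hypothetical numbers chosen for illustration, and the multiplier of six follows AWS's guidance for queues read by Lambda, treated here as a rule of thumb, not a guarantee.

```python
SQS_DEFAULT_VISIBILITY_TIMEOUT_S = 30  # default for a newly created queue


def redelivery_possible(
    processing_s: float,
    visibility_timeout_s: float = SQS_DEFAULT_VISIBILITY_TIMEOUT_S,
) -> bool:
    """True if the message becomes visible again while the first
    receiver is still working on it, so a second receiver can pick it up."""
    return processing_s > visibility_timeout_s


def suggested_visibility_timeout_s(function_timeout_s: int, multiplier: int = 6) -> int:
    """Rule-of-thumb visibility timeout: a multiple of the function timeout."""
    return function_timeout_s * multiplier


if __name__ == "__main__":
    function_timeout_s = 120  # hypothetical "generous" function timeout
    slow_call_s = 45          # hypothetical slow partner API call

    print(redelivery_possible(slow_call_s))
    tuned = suggested_visibility_timeout_s(function_timeout_s)
    print(tuned, redelivery_possible(slow_call_s, tuned))
```

With these made-up numbers, the slow call outlasts the default timeout, so the order can be handed to a second invocation while the first is still running. Raising the visibility timeout closes that window, but SQS standard queues are at-least-once either way, so an idempotent handler is still the backstop.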
Read articleExam Room · Development
How to Fix DynamoDB Hot Partitions on a Leaderboard (preview)
A viral game sends every write for every player at one DynamoDB partition key. The table is provisioned for 30,000 WCU, only a fraction of that actually reaches the hot partition, and the application sees ProvisionedThroughputExceededException on writes it is entitled to. The answer is not a bigger provisioned number. It is a partition key that is no longer a single point.
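One common way to stop the key being a single point is write sharding: append a shard suffix so writes spread across several partition key values. The Python sketch below shows only the key arithmetic. The base key, the shard count of ten, and both function names are hypothetical, and hashing the player ID is one choice among several (a random suffix is another).

```python
import hashlib

SHARD_COUNT = 10  # hypothetical; size it to the write rate you need to spread


def sharded_partition_key(base_key: str, player_id: str, shard_count: int = SHARD_COUNT) -> str:
    """Derive a stable shard suffix from the player ID so the same player
    always writes to the same partition key value."""
    digest = hashlib.sha256(player_id.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % shard_count
    return f"{base_key}#{shard}"


def all_partition_keys(base_key: str, shard_count: int = SHARD_COUNT) -> list[str]:
    """Every key value a reader must query to reassemble the full leaderboard."""
    return [f"{base_key}#{shard}" for shard in range(shard_count)]


if __name__ == "__main__":
    used = {sharded_partition_key("leaderboard", f"player-{n}") for n in range(1000)}
    print(f"{len(used)} of {SHARD_COUNT} shard keys used by 1000 players")
```

The cost moves to the read side: building the top of the leaderboard now means querying every key from `all_partition_keys` and merging the results, so the shard count is a trade between write headroom and read fan-out.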
Read articleThe Greenbox Story · The Big Table
Post-Acquisition: After the Handshake (preview)
The Greenbox story ends where it began — at the Margaret River farmers' market. Same words, different meaning. Same people, different scale.
Read articleExam Room · Development
Lambda Cold Starts Under a P99 Budget (preview)
A customer-facing API responds in 80 milliseconds once it's warm, and four full seconds on the first request after a quiet spell. The second trace is the one every user sees first. Hitting a P99 under 500 ms means touring every cold-start lever Lambda exposes, pricing the ones that survive, and accepting that the cheapest answer isn't always the correct one.
Read articleThe Greenbox Story · The Big Table
Term Sheets and Earnouts: The Offer (preview)
The term sheet arrives. Earnout, retention bonuses, strategic partnership. Charlotte has been through this before. Tom is excited. Priya is worried. Dave is suspicious.
Read articleThe Greenbox Story · The Big Table
Exit Strategy: The Founder's Dilemma (preview)
A major food group approaches. Maya faces the question every founder dreads: sell and walk away, or keep building? Nadia asks the only question that matters.
Read articleExam Room · Foundations
Picking An AWS Support Plan (preview)
Three teams, three different relationships with AWS support. A developer with a quota question on a Monday morning. A payments platform that cannot be down for more than fifteen minutes. A bank-backed acquisition where the architect wants a phone number, not a web form. Same vendor, three different support plans, and a pricing model that rewards picking the correct one. No plan is best in the abstract; what matters is what each team needs when the pager fires, and what they're quietly paying for the silence between fires.
Read articleThe Greenbox Story · The Big Table
Due Diligence: Every Decision Under a Microscope (preview)
Whether Greenbox expands internationally or considers selling, every decision they've ever made is about to be examined. The ADRs save them. The handshake deals don't.
Read articleExam Room · Foundations
How to Run a Well-Architected Review Across Six Pillars (preview)
An architecture review on a Friday afternoon. One service, six people around the whiteboard, and the quiet realisation that the question 'is this a good design?' has at least six separate answers. AWS's Well-Architected Framework gives those answers names (Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability) and turns a vibes-based review into a systematic one. 'Is the design good?' isn't quite the right question; 'good at which things, and what are we trading to get there?' is.
Read articleExam Room · Foundations
How to Apply the AWS Shared Responsibility Model (preview)
A CVE drops in glibc. A misconfigured S3 bucket leaks customer data. A hypervisor in Dublin needs firmware. Three incidents, three very different pagers going off, and a shared responsibility model that draws a clean line between what AWS owns and what we own. Whether AWS is secure isn't really the question; which half of the contract we're holding up, and whether the service we picked moved the line, is.
Read articleThe Greenbox Story · New Ground
Data Residency: Where the Bytes Live (preview)
When Greenbox acquires a New Zealand company, a problem appears that nobody had to think about when the business was only in Australia. Where is the data legally allowed to live? Who owns it? And what changes when subscribers cross a border?
Read articleExam Room · Foundations
How to Match EC2 Pricing Models to Workload Rhythm (preview)
A 24/7 web tier whose load is boringly predictable. A nightly ETL that can survive being killed and restarted. A dev fleet nobody can forecast. Three workloads, three rhythms, and five different ways AWS will sell us a compute hour. Every option is cheapest for something; what we want to know is which rhythm pairs with which pricing model, and what we'd have to believe about the future to pick each one.
Read articleExam Room · Foundations
How to Stay Inside the AWS Free Tier (preview)
A student wants to host a small web app on AWS for under $5/month and keeps reading stories of people getting surprise bills. 'AWS Free Tier' names three different programmes with three different expiry rules. Telling them apart is how you avoid the bill.
Read articleThe Greenbox Story · New Ground
Platform Engineering: When Three Cities Break Everything (preview)
The technology Tom and Priya built for three cities hits its limits at six. Deploy queues, flaky tests, Conway's Law, and the first SRE hire.
Read articleThe Greenbox Story · New Ground
One Diagram, Two Countries, Three Regions (preview)
The container diagram tells you what Greenbox is. It does not tell you where Greenbox lives. The day after agreeing to acquire Harvest Circle, Sam asks a question that the existing diagrams have no answer for — and the team discovers a C4 view they've never drawn.
Read articleThe Greenbox Story · New Ground
Crossing the Ditch: Greenbox Goes to New Zealand (preview)
Greenbox is in three Australian cities. The Series B mandate was to expand, and the team has been debating whether the next step is a fourth Australian city or something bigger. A phone call from an Auckland founder changes the shape of the decision.
Read articleExam Room · Advanced Architecture
How to Filter Egress at Organisation Scale (preview)
Forty thousand EC2 instances across two hundred VPCs in forty accounts. Today each workload speaks to the internet through local NAT; tomorrow a compromised container could exfiltrate data to anywhere on the web and nobody would notice until the invoice. Egress filtering at scale isn't about turning on a firewall, it's about building one organisation-wide policy, routing every packet through it, scaling it past the single-TGW limit, and convincing forty teams the blast radius is worth it.
Read articleExam Room · Advanced Architecture
How to Build Change Feeds Across Regions (preview)
A DynamoDB table holds order state in eu-west-1. A warehouse-management service in us-east-1 and an analytics pipeline in ap-southeast-2 both need to react to every write. Global Tables solve 'the data exists in both Regions' but not 'Region B reacts to a change.' DynamoDB Streams plus Kinesis Data Streams plus cross-region delivery is the native answer, and knowing the shape lets us decide when Global Tables suffices and when we need the streaming overlay.
Read articleThe Greenbox Story · New Ground
Scaling Culture: When Half the Team Has Never Met You (preview)
Fifty-five people across six cities. Values posters on walls. People who've never met Maya. How do you maintain culture when you can't fit everyone in one room?
Read articleExam Room · Advanced Architecture
How to Meet Data Residency Without a Local Region (preview)
A financial regulator in a small country requires that all customer data physically reside inside the country's borders. AWS doesn't operate a Region there. Capex-scale on-prem hardware would work but the business doesn't want to run a data centre. Picking which AWS-branded box gets dropped into the country is the surface decision; underneath is what 'data residency' actually means in the regulator's eyes, which AWS extension models meet that bar, and what each one quietly gives up in exchange for putting compute close to the regulated jurisdiction.
Read articleExam Room · Advanced Architecture
Choosing Wavelength, Local Zones, or Outposts at the Edge (preview)
A factory-floor AR application needs sub-10ms round-trip latency between a tablet and its inference backend. The parent Region is 50ms away; the factory's uplink is 5G. AWS Wavelength places compute inside the carrier's 5G network, one hop from the tablet's radio. Knowing when it earns its weight, and when a nearby AWS Local Zone or Outpost does the same job for less, is the difference between ultra-low-latency and ultra-pricey-but-not-faster.
Read articleThe Greenbox Story · New Ground
Acquisition Strategy: When Building Takes Too Long (preview)
Greenbox buys a smaller produce box company in Sydney. Integration is harder than anyone expected. Half the Sydney team quits in three months.
Read articleExam Room · Advanced Architecture
KMS or CloudHSM for FIPS 140-2 Level 3 (preview)
A payments regulator asks for the cryptographic keys that sign card transactions to live in hardware rated FIPS 140-2 Level 3. KMS is 'hardware-backed'; is that enough? CloudHSM is 'single-tenant hardware'; is that necessary? The answer sits in the gap between what each service lets you do, what FIPS certifies, and what a specific regulator will accept. Getting it right means knowing when KMS is sufficient and when CloudHSM earns its weight.
Read articleExam Room · Advanced Architecture
Choosing a Service Mesh on EKS (preview)
Forty microservices in an EKS cluster. Three teams have added their own Istio sidecars; one team has installed Linkerd; the platform team has a half-working AWS App Mesh deployment from two years ago. The CTO wants one mesh or none. Choosing is less about 'which mesh is best' than about what problems each solves, which of them the team is willing to run, and whether the answer should be a service mesh at all or something thinner.
Read articleThe Greenbox Story · Going National
Supply Chain Design: The Fifty-Kilometre Promise (preview)
Regional Australia doesn't have the farm density of Perth. The fifty-kilometre local promise can't work everywhere, and the compromise tests everything Maya believes in.
Read articleExam Room · Advanced Architecture
When to Put AWS Outposts on the Warehouse Floor (preview)
A fulfilment centre in a town with bad internet has to run warehouse management software with sub-10ms latency to the robots on the floor. Cloud isn't an option: the fibre link drops four times a month. On-prem hardware works but the team doesn't want another fleet to patch. AWS Outposts is the third answer: AWS-managed hardware sitting in the warehouse, running the same services as the cloud, connected back to the parent Region. Knowing what Outposts can and cannot do is how we avoid a million-pound rack collecting dust.
Read articleThe Greenbox Story · Going National
Three CoPs, Two Survived (preview)
Patricia asks a question at the May board. Maya answers it by seeding three communities on purpose. One dies quietly in three weeks. Maya sits down to write Patricia an update in a numbered list, re-reads the list, and deletes it.
Read articleExam Room · Advanced Architecture
How to Build Hybrid DNS with Route 53 Resolver (preview)
A Fargate service in AWS needs to resolve hostnames from the on-prem AD domain; an on-prem application needs to resolve internal AWS service endpoints. Today each side has its own DNS and they don't talk. The clean answer is Route 53 Resolver with inbound and outbound endpoints plus forwarding rules: one piece gets AWS-side queries to on-prem, the other gets on-prem queries to AWS. Knowing which endpoint points which way is what the design hinges on.
Read articleThe Greenbox Story · Going National
Hiring at Scale: Beyond the Referral Network (preview)
Thirty-five people to fifty-five in two years. The culture that worked when everyone knew everyone doesn't work when half the company has never met Maya.
Read articleExam Room · Advanced Architecture
How to Set Up Distributed Tracing Across Accounts (preview)
One customer request touches seven services in five accounts. When it times out, three teams look at three dashboards and disagree about whose fault it is. Distributed tracing is the obvious move; the harder part is how the trace follows a request across account boundaries, how spans get aggregated without each team reinventing the wheel, and what has to be true of the SDK, the sampling rate, and the IAM model for this to stop being a lunchtime-every-day conversation.
Read articleThe Greenbox Story · Bets That Learn
Who Owns This on a Tuesday at 3 a.m. (preview)
The pause-risk model has been in full production for two weeks. Dina's agreement rate has slipped four points. The pager has not gone off, which is the problem. The team is going to have to learn to page themselves.
Read articleExam Room · Advanced Architecture
How to Plan On-Prem DR with Elastic Disaster Recovery (preview)
A data-centre full of VMware VMs, the on-prem half of the estate nobody has refactored yet, needs a disaster recovery plan that covers actual data-centre fires, not just cloud-Region outages. The traditional answer is 'replicate to another data-centre, hope the failover runbook works.' AWS Elastic Disaster Recovery is the cloud-native answer: block-level replication to AWS, minutes of RTO, tested by actually failing over. Understanding it means understanding the replication agent, the staging subnet, the launch template, and what separates DRS from Backup.
Read articleThe Greenbox Story · Going National
Expansion Playbook: Scaling What Worked (preview)
Greenbox can't Event Storm their way into every new city from scratch. They need a repeatable expansion playbook, and Adelaide is the first test.
Read articleExam Room · Advanced Architecture
How to Aggregate Logs Across an Organisation (preview)
The auditor wants a single place to query 'who accessed the payroll database' across twenty accounts. The SOC wants alerts on a failed login from any account in under a minute. The data team wants the same logs for anomaly detection without paying for them twice. Aggregating logs across an organisation is less about shipping bytes than about deciding which log goes through CloudTrail Organization Trail, which through central CloudWatch subscriptions, which through S3 Athena queries, and which through an observability platform sitting on top.
Read articleExam Room · Advanced Architecture
Choosing Private Connectivity Between VPCs (preview)
Two teams in the same organisation want to call each other's services. Today they do it over the public internet, through NAT, TLS, and each other's ALBs. It works. It's also logging into Flow Logs as 'traffic to a well-known cloud-provider IP range' and the security team has views. The options for keeping internal calls off the public internet (private endpoints, VPC peering, transit hubs, service meshes) each trade privacy against coupling differently. Once you understand who initiates, who sees whom, and who allow-lists whom, the picture clicks into place.
Read articleThe Greenbox Story · Building the Company
Series B Fundraising: The Pitch (preview)
Maya presents to Series B investors. They ask questions she can answer, questions she can't, and one question that changes everything.
Read articleExam Room · Advanced Architecture
How to Deploy One Template Across Many Accounts with StackSets (preview)
A security baseline stack (GuardDuty, Config, CloudTrail, a handful of IAM roles) has to exist in every account in the organisation. Twenty accounts today, one more per week for the foreseeable future. Clicking through twenty CloudFormation deploys is how you get drift; CloudFormation StackSets promises to do it once and have the organisation maintain it. The promise is real; the details (service-managed vs self-managed, stack instance targets, drift detection, the dreaded failed-deployment rollback) are where it earns the Pro.
Read articleThe Greenbox Story · Building the Company
Founder Commitment: Skin in the Game (preview)
Maya remortgaged her apartment. Tom deferred his student loan repayments. Sam took a 60% pay cut. Before the seed round, Greenbox ran on personal risk and a spreadsheet that Maya checked every morning at 5am.
Read articleConsulting and Craft · In Practice
Directional, Not Definitive (preview)
Most teams running A/B tests don't have the traffic to do them properly. The honest answer isn't 'don't test'. It's knowing what your test can and can't tell you, and picking techniques that match the evidence you can actually collect.
Read articleUnder the Hood
How Multi-Factor Authentication Works: From Passwords to Time-Based One-Time Passwords (preview)
A warm, thorough walk through authentication, from single-factor passwords to TOTP codes, the maths behind them, why they can be phished, and what comes next.
Read articleExam Room · Advanced Architecture
How to Build Self-Service Provisioning with AWS Service Catalog (preview)
Product teams want new RDS databases on demand. Security wants every database encrypted, tagged, in the right subnet, with the right backup plan. Platform wants the request to stop coming through Slack. A self-service portal with vetted building blocks sits between all three. AWS Service Catalog is the shape, once you know what's a Product, what's a Portfolio, what a launch constraint does, and why StackSets are the deployment engine underneath.
Read articleThe Greenbox Story · Building the Company
Sustainability Reporting: What the Board Asked For (preview)
Three days before the Series B pitch, Diane raises a question Maya has been avoiding. The board needs a sustainability story — and the story has to be true. Greenbox has been measuring everything except the thing the whole business is supposed to be about.
Read articleExam Room · Advanced Architecture
How to Roll Out AWS Backup Across an Organisation (preview)
Twenty member accounts. Each team owns its own data, its own retention policy, and its own idea of what 'backed up' means. The auditor wants one number for every resource: when was it last backed up, where does the backup live, and who can delete it. AWS Backup policies through Organizations are the answer, if you understand what they enforce, what they don't, and where the SCPs have to pick up the slack.
Read articleHigh Performance Teams
Two Deploys to Ship One Feature (preview)
The first time I noticed it, I thought I was looking at a coincidence. The pattern repeated for a quarter. By then it had a name.
Read articleUnder the Hood
A Guide to DNS: How the Internet Finds Anything by Name (preview)
A thorough, narrative walk through the Domain Name System, from the single text file that used to hold every hostname on the internet, through the distributed hierarchy of root servers and resolvers, to the security nightmares of cache poisoning and the privacy promises of encrypted DNS. Everything you wanted to know about how your browser turns a name into an address.
Read articleExam Room · Advanced Architecture
Choosing Between Security Groups, NACLs, and Network Firewall (preview)
A security auditor sits down with the VPC diagram and asks what stops an instance in the web tier from talking to the payroll database. Three answers are on the table: the security group attached to the database, the NACL on its subnet, and the Network Firewall in the egress VPC. They all say 'no'. They say it differently, they fail differently, and the real work is picking which one carries the real policy rather than being the belt-and-braces on something else.
Read articleThe Greenbox Story · Building the Company
Governance Evolution: How the Board Grew Up (preview)
The first board meeting was Maya, Tom, and Angela at a cafe in Fremantle. The most recent one had an independent director, a Series B investor, a governance framework, and a 47-page board pack. The distance between those two meetings is the distance between a startup and a company.
Read articleExam Room · Advanced Architecture
How to Set Up S3 Cross-Region Replication for Compliance (preview)
The regulator has told us customer records must be stored in two jurisdictions, the auditor has told us deletes must be irrecoverable for seven years, and finance has told us the Glacier bill is already too high. S3 Cross-Region Replication sounds like the answer to the first problem. It usually is, once you understand what it does and doesn't replicate, what happens to existing objects, and what it costs to undo a mistake.
Read articleThe Workshop
The Workshop: The Planning Onion (preview)
A 90-minute pattern for making the layers of planning visible at once — strategy, portfolio, quarter, sprint, day — and finding the initiatives that live on one layer with nothing above or below to anchor them. With a vertical-slicing test that checks whether a sprint item can actually trace back to the strategy.
Read articleUnder the Hood
How the Internet Routes Traffic (preview)
BGP, autonomous systems, peering, IXPs, the 2021 Facebook outage, how AWS routes your traffic, anycast, undersea cables, and traceroute: the infrastructure that decides how your packets get from here to there.
Read articleExam Room · Advanced Architecture
Choosing Among the Four AWS DR Strategies (preview)
A regulator asks for a disaster-recovery plan. The CFO asks what it will cost. The CTO asks how often we'll test it. Three questions, one answer, and the answer depends on which of four DR strategies we pick. Backup and restore at one extreme; multi-site active-active at the other. Between them, pilot light and warm standby, each with a different daily cost, a different recovery time, and a different amount of infrastructure sitting idle.
Read articleThe Greenbox Story · Building the Company
Succession Planning: Key Person Insurance (preview)
If Maya, Tom, or Dave disappeared tomorrow, what would break? The answer applies a 20% discount to the company's value.
Read articleExam Room · Advanced Architecture
How to Centralise Egress for Many VPCs (preview)
Forty VPCs across twelve accounts, each with its own NAT gateway and its own outbound firewall rules. The bill for NAT alone is eye-watering, and every time a new SaaS domain needs allow-listing there are forty places to update it. Centralised egress promises one place to log, inspect, and pay, but it also promises a single choke point, a new hub account to run, and a Transit Gateway in the middle of every packet's life. The question is which boxes belong on the path and which don't.
Read articleThe Workshop
The Workshop: Threat Modelling (preview)
A pattern for walking a system diagram with a deliberately hostile eye, using STRIDE as a checklist, so that the ways a system can be abused become visible before the people who'd like to abuse it find them.
Read articleUnder the Hood
How WiFi Actually Works (preview)
WiFi is the most successful networking technology ever deployed, and most people — including most developers — have no idea how it works. It isn't just 'wireless ethernet.' It's an elaborate negotiation between devices sharing a medium they can't reserve, carried by radio waves on frequencies shared with your microwave.
Read articleExam Room · Advanced Architecture
How to Design Multi-Region Failover Under a Minute (preview)
A payments API that has to keep taking traffic through a whole-Region outage, with a recovery-time objective of under a minute and zero planned data loss. That sentence sounds like a shopping list until you start pricing what each word costs. The real work isn't choosing active-active over active-passive; it's picking which of six data-replication shapes you can live with, and which latency, conflict, and cost you'll accept to get there.
Read articleThe Greenbox Story · Building the Company
Board Governance: The Board Room (preview)
Informal governance got them to 12,000 subscribers. Series B investors want KPIs, board packs, and an independent director who asks uncomfortable questions.
Read articleExam Room · Advanced Architecture
How to Build Ransomware-Proof Backups Across an Organisation (preview)
A financial services firm has a ransomware-resilience mandate with six absolutes: backups immutable against anyone with production credentials, stored in a separate account under separate IAM control, copied to a second region, retained for seven years, centrally managed across forty accounts, and auditable through a named reporting service. Each absolute, alone, has two or three plausible AWS answers. Taken together, only one arrangement survives contact with all six.
Read articleThe Workshop
The Workshop: Cynefin Sensemaking (preview)
A 75-minute pattern for sorting a pile of problems into the kinds of actions they reward. Clear, complicated, complex, chaotic, confusion — and the realisation that disagreement about which domain a problem belongs in is often the problem itself.
Read articleUnder the Hood
How TLS Works: From Trusted Networks to Trust-No-One (preview)
A thorough, approachable guide to TLS, from the first computer networks through to modern encryption. The history, the maths, the politics, and the practical reality of securing the web.
Read articleExam Room · Advanced Architecture
Choosing PrivateLink for SaaS Vendor-to-Customer Connectivity (preview)
A SaaS vendor sells an observability product to enterprises who won't let production traffic touch the public internet, not even the vendor's HTTPS endpoint behind a WAF. The vendor runs the service from their own AWS account. Customers consume it from their own VPCs, in their own accounts, with overlapping RFC1918 space. Several AWS connectivity primitives could in principle solve a cross-account reach problem; the interesting work is filtering out the ones whose shape doesn't fit a vendor-to-customer relationship between separate companies.
Read articleThe Greenbox Story · Building the Company
Go-to-Market: Selling Without Maya (preview)
Greenbox has never done outbound sales. The first sales hire speaks a language the engineering team doesn't understand.
Read articleExam Room · Advanced Architecture
How to Share Hub-Account Resources Across Many Spokes (preview)
A platform with twenty-five accounts carves one out as shared-services and wants to put a Transit Gateway, Route 53 Resolver rules, customer-managed KMS keys, a Glue Data Catalog, and a tree of SSM parameters in it, visible to the other twenty-four. Five resources, one central account, and no single sharing mechanism that works for all of them. Pick the wrong primitive per resource and you end up either duplicating everything per spoke, or writing custom replication glue, or federating via a pile of resource policies that nobody audits. Pick the correct primitive per resource and the hub works.
Read articleExam Room · Advanced Architecture
Preventing Lost Writes in DynamoDB Global Tables (preview)
A SaaS platform runs DynamoDB Global Tables across three regions. Users in all three read and write concurrently. Then a subscriber's update made in eu-west-1 vanishes — another write landed on the same item in us-east-1 a few milliseconds later, and replication's tie-breaker picked the other one. Multi-region active replication in DynamoDB is last-writer-wins by item timestamp, and the lost write is silent. This scenario is really about knowing when that's acceptable, when it isn't, and what patterns keep both writes alive.
Read articleThe Greenbox Story · The Missing Layer
Cash Flow: The Bank Balance at 5am (preview)
Maya checks the bank balance every morning at 5am. She's done it since before the seed round. The number has grown from five figures to seven, but the anxiety hasn't scaled down with it.
Read articleExam Room · Advanced Architecture
How IAM Policy Evaluation Actually Works (preview)
AdministratorAccess that can't create an access key, a Permissions Boundary that looks restrictive but isn't, and a cross-account Lambda write that dies despite both sides seeming to grant it. Three AccessDenied mysteries with one answer: AWS evaluates every request against up to six policy types, in a fixed order, starting from a default deny. Learn the order and the mysteries disappear.
Read articleThe Greenbox Story · Bets That Learn
Shadow, Canary, Rollback (preview)
Six weeks of shadow mode. One canary week that stopped on the Wednesday. A rollback that took three minutes because it had been written in February by a woman who had decided that the next time, whatever it was, was not going to take them by surprise.
Read articleConsulting and Craft · In Practice
LLMs as Thinking Partners: How the Role Evolved (preview)
From code generator to thinking amplifier — how structured inputs transform LLM output, and why discovery techniques are the missing ingredient.
Read articleUnder the Hood
How Randomness Works (preview)
Entropy, PRNGs, cryptographic randomness, the Mersenne Twister, /dev/urandom, and why the quality of your random numbers determines the security of everything from TLS keys to TOTP codes.
Read articleExam Room · Advanced Architecture
Standing Up a Landing Zone for Eighty Accounts (preview)
A ten-year AWS customer with roughly eighty accounts has decided to formalise the multi-account foundation they should have built in 2018. Four engineers, six months, a prescriptive OU structure, centralised security tooling, guardrails, self-service provisioning, and SSO. Several foundations compete for the job; they differ on how much they ship out of the box, how much they expect the team to build, and how well they fit the regulatory shape of the work.
Read articleThe Greenbox Story · The Missing Layer
Business Valuation: What Are We Worth? (preview)
The board wants a valuation. The answer depends on things Maya has never thought about — and one discount she doesn't want to hear.
Read articleExam Room · Advanced Architecture
Choosing Aurora Global Database for Cross-Region DR (preview)
A 2 TB Aurora PostgreSQL fleet needs a cross-region copy with RPO under a minute and RTO under an hour, managed failover, and the secondary pulling low-priority reads during normal operations. AWS has several primitives that move database state across regions, with different RPO floors, different operational models, and different reasons to exist. The work is sorting which one matches each property the requirement actually demands.
Read articleThe Greenbox Story · Bets That Learn
The Boring Middle (preview)
Three weeks into the data audit, Kai stops sleeping properly. The labels are a mess. The event log lies in two different ways. And the model that wins, when they finally run the bake-off, is the one Priya least wanted to ship.
Read articleThe Greenbox Story · Bets That Learn
Framing Before the Model (preview)
Jas brings a bereavement case to the whiteboard and it ends the whole discussion about what goes on an at-risk list. The don't-call list gets built in pencil, in a notebook, in forty seconds, by a woman who has been writing about Thursdays since September.
Read articleThe Workshop
The Workshop: Ensemble Programming (preview)
A pattern for putting the whole team on one problem at one keyboard, rotating roles on a short timer, so that knowledge spreads as the code is written and review happens at the moment of writing instead of the week after.
Read articleUnder the Hood · Money
How Crypto Actually Works (preview)
Strip away the speculation, the scams, and the tribal wars, and cryptocurrency is a genuinely clever piece of computer science. Here's what it actually does, how it works, and what it solves, and doesn't.
Read articleExam Room · Advanced Architecture
VPC Peering or Transit Gateway (preview)
Two VPCs that need to talk to each other is one peering connection. Fifty VPCs that need to talk to each other is one thousand two hundred and twenty-five. The arithmetic is what forces a different connectivity shape at scale, and at multi-region scale, the shape that works for fifty VPCs is just one rung on a ladder that keeps going.
Read articleThe Greenbox Story · The Missing Layer
Strategic Focus: Saying No (preview)
The board wants a loyalty programme. Marcus wants a corporate gifting portal. A key customer wants a custom API integration. Maya has said yes to all three, and the engineering team has capacity for one.
Read articleExam Room · Architecture
DynamoDB On-Demand or Provisioned Capacity (preview)
DynamoDB can bill per-request or by reserved capacity. Per-request scales to zero; provisioned locks in a discount in exchange for commitment. At the switching point, the team pays more on-demand one month than they would have on provisioned, and the question gets asked. Which is cheaper depends entirely on the traffic shape, and on what the knobs look like if you get the match wrong.
Read articleThe Greenbox Story · The Missing Layer
Naming the Thursday Thing (preview)
Lee has a name for what Priya started in the kitchen five months ago. The name is older than Greenbox. Using it is more dangerous than not using it.
Read articleThe Greenbox Story · Bets That Learn
Impact Mapping an ML Bet (preview)
Pause rate is drifting. Priya and Kai walk in with a three-week plan to build a model. Lee makes them sit with the goal for an hour. Two of the three things they meant to ship do not survive the hour.
Read articleThe Workshop
The Workshop: Blameless Post-Mortems (preview)
A facilitator playbook for the meeting that turns an incident into a structural improvement. Five phases, three actions, no scapegoats. Pin the summary on the wall, keep the guide open on your tablet, and run the session.
Read articleUnder the Hood
How Computer Networks Work (preview)
A warm, thorough walk through computer networking, from the first telegraph wire to cat gifs, from the OSI model to tcpdump, from trust to firewalls.
Read articleExam Room · Architecture
How to Diversify a Spot Fleet to Survive Capacity Events (preview)
Spot Fleet gives up to 90% off on-demand, but the capacity comes and goes on AWS's terms. A fleet that requests only c6i.2xlarge in us-east-1a is a fleet that goes offline together. The whole point of a Spot Fleet is diversification: enough instance types in enough pools that when one pool gets reclaimed, the others absorb the load. The discount is the easy headline; the engineering question is how wide the diversification has to be before the fleet genuinely survives capacity events.
Read articleThe Greenbox Story · The Missing Layer
Delegation: The Management Gap (preview)
Maya is the bottleneck again — but this time at company level. Charlotte says: you've built a product, now you need to build a company.
Read articleExam Room · Architecture
Routing Many Sites Through One ALB (preview)
Four micro-frontends, three domains, and one shared ALB. Routing traffic to the correct target group is either a host-based decision, a path-based decision, or both. The correct rule order matters; the wrong one sends 'admin' requests to the marketing team. Host versus path is the wrong framing; the real work is how the two rule types compose, where priority comes in, and how to keep the rule table readable as the service catalog grows.
The Greenbox Story · Bets That Learn
Voice in the Weights, Voice in the Fire (preview)
A small model, fine-tuned on four years of Maya's handwritten box notes, drafts the weekly note for every subscriber. Week three, it sends the wrong sentence to the wrong person. The GPU bill turns out to be the cheapest thing about the project.
The Greenbox Story · What Growth Takes
The Thursday Thing (preview)
Three months after the allergen incident, Priya sends an invite with no name on it. Four people come. Twelve weeks later, the most useful meeting at Greenbox still doesn't have a name, and Priya is going to fight to keep it that way.
The Workshop
The Workshop: Which Workshop When (preview)
Sixteen techniques for discovery, design, and delivery. A flowchart and a categorised index for choosing the one that fits the problem in front of you. Pick the smallest workshop that solves the actual problem, not the biggest one your team knows how to run.
Under the Hood
How CI/CD Pipelines Work (preview)
From make and cron to GitHub Actions: continuous integration, continuous delivery, continuous deployment, artefacts, environments, feature flags, blue-green deployments, rollback, and the evolution from a shell script to a proper pipeline.
The Greenbox Story · What Growth Takes
Technical Migration: The Strangler Fig in Go (preview)
Tom's second migration attempt uses the strangler fig pattern. This is what the feature flags, dual reads, dual writes, and discrepancy detection look like in Go: the code that makes a database migration boring.
The Greenbox Story · What Growth Takes
Technical Migration: The One That Went Sideways (preview)
The subscription database needs to be split. Tom's plan is elegant. The migration starts on a Thursday afternoon, and by Friday morning, two thousand subscribers have duplicate billing records.
Exam Room · Architecture
Replacing a Windows File Server With FSx (preview)
An on-prem Windows file server holds 40 TB of departmental shares used by 600 staff. The hardware is end-of-life, the office is moving, and finance wants the capital refresh gone. AWS has four file-storage services that might replace it, and they're not interchangeable. None of them is 'the cloud file server'; the work is figuring out which of FSx for Windows, FSx for NetApp ONTAP, EFS, or File Gateway matches the access pattern, the protocol expectations, and the Active Directory footprint this shared drive actually has.
Consulting and Craft · In Practice
Written First: Making Async Work (preview)
A daily standup that worked at five people in one room becomes a war crime at twenty-eight people across three timezones. Better video calls aren't the fix. Writing things down first is.
Under the Hood · Money
How Central Banks Work (preview)
Central banks set interest rates, print money, and occasionally rescue entire economies. But how, exactly? The machinery is less mysterious than it sounds, and more powerful than most people realise.
Exam Room · Architecture
Detecting Drift With AWS Config (preview)
Somebody opened a security group to 0.0.0.0/0 on port 22 at 02:14 on a Saturday. The infrastructure pipeline would have rejected that change. The console did not. The team needs a mechanism that's independent of the pipeline, catches drift whether it came through Terraform or a console click, and, for a short list of specific drift types, reverses it automatically. The work is in which mechanism shape catches which drift, and how quickly a bad change gets reversed.
The Greenbox Story · What Growth Takes
Key Person Risk: The Quiet Departure (preview)
Ravi gives his two weeks' notice on a Tuesday morning. He's not angry, not burned out, not making a statement. He just got a better offer. And suddenly the subscription system has no owner.
Exam Room · Architecture
Configuring CloudFront Origin Failover for Regional Outages (preview)
CloudFront caches the hot stuff, but cache misses still have to reach an origin. When the origin is down, CloudFront returns 5xx to viewers. Origin failover is the setting that says 'try the backup origin before returning the error.' A backup origin is the easy decision; the work is in which failure conditions trigger failover, how quickly, and what 'the backup origin' should actually be.
High Performance Teams
Tools and Techniques for Improvement (preview)
You can assemble a brilliant team and measure all the correct things. If the team doesn't have a way to get better, the brilliance decays. Improvement is a practice, not an event.
Under the Hood
How Compression Works (preview)
Compression is the art of making data smaller without losing what matters. Every photograph you take, every video you stream, every web page you load is compressed. Understanding how it works — and the fundamental trade-offs behind it — is one of the quiet superpowers of computing.
Exam Room · Architecture
How to Migrate Three Petabytes Before the Lease Ends (preview)
Three petabytes of historical genomic data sit on a SAN in a leased datacentre. The lease expires in nine months. The pipe out of the building is 1 Gbps shared with everybody else. Snow Family, DataSync, and Storage Gateway each solve a different piece of this, but only when you're precise about what kind of movement you need. Picking a transfer service comes down to matching the shape of the data, the shape of the deadline, and the shape of the ongoing read pattern.
The Greenbox Story · What Growth Takes
The Planning Onion: From Weekly Habits to Yearly Vision (preview)
Three squads, three cities, great weekly habits — and no shared strategy connecting them. The planning layer that ties everything together.
Exam Room · Architecture
Instance Store or EBS for Scratch Disk (preview)
Instance Store is fast, local, and free with the instance, and it evaporates the moment the instance stops. EBS survives, snapshots, encrypts, and charges per GB-month. Picking between them for ephemeral data isn't about cost alone; it's about what 'ephemeral' actually means for the workload. Speed is the easy comparison; the harder one is which durability contract matches the data you're writing.
Consulting and Craft · In Practice
The Planning Onion: Every Layer in One Place (preview)
Vision, year, quarter, sprint, day: a planning framework with a different question, cadence, and toolkit at every time horizon.
Under the Hood
The Wonderful, Absurd Science of Colour (preview)
From ochre smeared on cave walls to pixels fired at your face by Apple Vision Pro: a tour of the science of seeing.
Exam Room · Architecture
ECS on Fargate or on EC2 (preview)
A team runs ECS today on EC2 instances they patch, size, and scale themselves. Fargate promises to take all of that away. But Fargate's per-hour pricing is higher, its networking model is different, and not every workload fits. The choice isn't Fargate or EC2 in the abstract; it's which launch type matches which workload, and where the crossover lines actually sit.
The Greenbox Story · What Growth Takes
Burnout: When the Backbone Cracks (preview)
Tom hasn't taken a day off in four months. He's still shipping code, still in every meeting, still the first to respond to a production alert. But the code he's writing isn't quite right any more, and he can't explain why.
Exam Room · Architecture
API Gateway or ALB in Front of Lambda (preview)
A Lambda function needs an HTTP front door. API Gateway is the textbook answer. An Application Load Balancer with a Lambda target is the less-textbook answer. They price differently, scale differently, authenticate differently, and they integrate with adjacent AWS services in different ways. Picking between them comes down to matching front-door behaviour to the shape of the workload.
High Performance Teams
Getting the Right Humans Together (preview)
The best team I ever worked on had a loudmouth, a worrier, a perfectionist, and someone who never stopped asking 'but why?' They drove each other mad. They shipped the best software of my career.
Under the Hood · Money
How Fraud Detection Works (preview)
Every card transaction is a bet: is this the real cardholder, or someone who stole their details? The system has about 100 milliseconds to decide, and getting it wrong in either direction costs money.
Exam Room · Architecture
Aligning ALB and Route 53 Health Checks for Regional Failover (preview)
Route 53 says the region is healthy and sends traffic there. The ALB in that region thinks every target is failing and returns 503. Users see errors. Both health-check systems were configured by different teams and nobody realised they disagree about what 'healthy' means. Both checks are correct on their own terms; the work is sorting out which layer protects you from which failure, and getting them to agree.
The Greenbox Story · Losing the Thread
Living Documentation: When the Wiki Starts Lying (preview)
The team wrote beautiful ADRs and meticulous decision tables. Six months later, a new developer follows one and breaks production.
Exam Room · Architecture
Designing Direct Connect Failover for Sub-Minute Recovery (preview)
A 10 Gbps Direct Connect into eu-west-1 carries the bulk of our on-prem-to-AWS traffic; a Site-to-Site VPN sits idle as a backup. The last outage proved the backup was aspirational. BGP didn't converge, half the traffic black-holed for four minutes, and the postmortem action item read 'make the failover actually work.' Having a backup isn't the debate; it's which of four hybrid-connectivity shapes gives us sub-minute failover without doubling the line-item cost.
The Workshop
The Workshop: Wardley Mapping (preview)
A pattern for laying the components of a value chain (the sequence of capabilities a user needs, top visible to the user, bottom foundational) against an evolution axis, so that build-buy-borrow arguments stop being about taste and start being about where each component actually sits.
Under the Hood
How Browsers Render a Page (preview)
You type a URL and a page appears. Between that keystroke and those pixels, your browser performs DNS resolution, TCP handshake, TLS negotiation, HTTP request, HTML parsing, CSS computation, layout calculation, paint, and compositing, and it has about 100 milliseconds to feel fast.
Exam Room · Architecture
How to Lock Down S3 for Seven-Year WORM Compliance (preview)
Compliance wants: write-once, read-many storage for seven years, across every environment, with immutability guaranteed against any credential, including the root account's, for the full retention. AWS has a small handful of mechanisms that claim some form of immutability; only some give the absolute guarantee the audit requirement actually demands. The work is sorting the absolute from the conditional, picking the retention shape, and deploying it so it stays compliant when someone accidentally rotates a bucket policy in year three.
The Greenbox Story · Losing the Thread
Continuous Discovery: Making It Stick (preview)
Discovery isn't a project phase — it's a weekly practice. How Greenbox embedded learning into the rhythm of delivery, and what Lee and Charlotte learned along the way.
Exam Room · Architecture
Picking an EBS Volume Type for Each Workload (preview)
A PostgreSQL database whose bottleneck is a steady 15,000 IOPS at 1 ms latency. A Cassandra cluster whose bottleneck is throughput, not IOPS. And a warehouse of CloudTrail logs that gets written once and read almost never. Three volumes, three workload profiles, and EBS has a family of volume types that each match one of them. 'Fastest' is the wrong axis; what matters is which dimension (IOPS, throughput, latency, or dollars per gigabyte) each volume type optimises for.
High Performance Teams
Defining and Measuring High Performance (preview)
Every team I've worked with wants to be high-performing. Almost none of them can tell me what that means.
Under the Hood · Money
How Payments Work (preview)
When you tap your card, a message crosses at least four organisations in under two seconds. None of them trust each other. Here's how the money actually moves.
Exam Room · Architecture
Backups That Survive the Region (preview)
Backups scattered across six services, each with its own schedule and retention. An audit question, 'prove you can recover any production resource, cross-Region, within 24 hours', that nobody can currently answer in fewer than three meetings. Which service has the best native backup isn't the problem; centralising backup policy across an organisation, copying to a second Region for durability, and ending up with one place to answer the audit question is.
The Greenbox Story · Losing the Thread
Outcome Over Output: The Feature Factory (preview)
The backlog has 340 items. The team shipped twelve features last quarter. Nobody can say whether any of them moved the needle.
Exam Room · Architecture
Composing Savings Plans Across a Portfolio (preview)
Fifteen workloads across four accounts and two Regions. A portfolio bill of three-quarters of a million dollars a year on compute. Finance has asked for a commitment plan. Cheapest-for-a-single-workload is the workload-level decision and it's already been made for the straightforward cases. The portfolio-level question is how to compose commitment shapes so the baseline is covered, the Spot-tolerant hours are cheap, and the flexibility budget hasn't been spent on something that can't bend.
Consulting and Craft · In Practice
Data-Informed, Not Data-Driven (preview)
'The data says we should do X' sounds rigorous. 'The data suggests X, what do we think?' is rigorous. The difference is the most important habit in modern decision-making.
Under the Hood
How HTTP Works (preview)
HTTP is the protocol the web is built on. It's a text-based request-response conversation between your browser and a server, and once you understand the conversation, everything else about the web starts making sense. Caching, cookies, APIs, REST, load balancers, CDNs: they're all ways of shaping or intercepting HTTP traffic.
Exam Room · Architecture
Choosing Between Aurora Replicas, RDS Read Replicas, and DMS (preview)
An Aurora writer pinned at 90% CPU because the reporting team's dashboards read live from production. A classic RDS MySQL with nightly batch jobs that read-lock the master. And a new multi-Region requirement for a PostgreSQL database that has to serve analytics in Sydney off a writer in Frankfurt. Three flavours of 'offload the reads' (Aurora replicas, RDS read replicas, and DMS), and they are not the same mechanism. Speed isn't what decides it; matching the replication technology to each workload's isolation and freshness requirement is.
The Greenbox Story · Losing the Thread
Subscribers Aren't Customers: Building the Greenbox Community (preview)
Greenbox has six thousand subscribers and a growing customer support backlog. Sam notices something odd in the data: the subscribers who churn the least are the ones who talk to each other. A conversation about community begins as a curiosity and ends as a strategy.
Exam Room · Architecture
SQS Standard or FIFO for Ordered Workloads (preview)
A click-tracking firehose that can drop duplicates at the app layer but must scale to 50,000 messages a second. A payment-state machine that must never process the same transaction twice and must apply updates in the order they were issued, per customer. Two workloads, one service called SQS, two flavours that are not interchangeable. 'Which is better' has no answer in the abstract; what does is what each flavour actually guarantees, what it costs, and what each workload can tolerate if the guarantee slips.
The Workshop
The Workshop: Retrospectives (preview)
A pattern for creating a structured, safe space to inspect how the team is working and decide one or two things to change, so that continuous improvement stops being a poster on the wall and starts being a thing that actually happens.
Under the Hood
How Maps Lie (preview)
Every map you've ever seen is wrong. It has to be: you can't flatten a sphere without distorting something. The question is what you choose to distort, and what that choice reveals about who made the map and why.
Exam Room · Architecture
ElastiCache: Redis or Memcached (preview)
Three workloads on the same platform each need a cache: a session store that has to survive a node reboot, a leaderboard that needs sorted sets, and a read-through cache for a recommendation API where any node can die without consequence. Redis and Memcached both show up as engine choices in ElastiCache. Microbenchmark speeds aren't the deciding factor here; what decides it is which data model, durability story, and replication shape each workload actually needs.
The Greenbox Story · The Right Tool
Threat Modelling: What the LLM Didn't Think About (preview)
Payment data, customer PII, farm contracts. The more code LLMs generate, the more systematic security thinking matters.
Exam Room · Architecture
Choosing Between NAT Gateway, NAT Instance, and Egress-Only IGW (preview)
A fleet of private instances that needs outbound internet for package updates. An IPv6-only subnet that wants egress without exposing its instances to inbound connections from the internet. And an accountant asking why a NAT Gateway bill has become one of the top line items. Three problems, three AWS components (NAT Gateway, NAT Instance, and egress-only Internet Gateway), and they are not the same thing despite sounding like they should be.
Consulting and Craft · In Practice
Retrospectives at Every Scale (preview)
Sprint retros, cross-team retros, quarterly retros, yearly retros — the feedback loop that keeps every planning layer honest.
Under the Hood · Money
What Money Actually Is (preview)
Everything you learned about money (the barter story, gold coins, the idea that it's a 'thing') is probably wrong. Money is a ledger, a promise, and a collective hallucination that runs the world.
Exam Room · Architecture
When Aurora Should Scale Itself (preview)
A SaaS tenant database with long-flat daytime load and sharp weekend spikes. A reporting replica that's idle 22 hours a day and pinned to 100% for two. A dev cluster that's on for an hour and off for twelve. Three Aurora clusters, three very different load profiles, and two ways Aurora can be sized: capacity you set, or capacity Aurora sets for you. Cheaper-in-the-abstract is a meaningless answer; what counts is which load profile each mode is shaped for, and where the cost crossover actually lands.
The Greenbox Story · The Right Tool
Cynefin: Not Everything Needs a Workshop (preview)
Simple, complicated, complex, chaotic — the framework that tells you which approach to use and when to stop over-thinking.
Exam Room · Architecture
Choosing Between EFS and the FSx Family for Shared Storage (preview)
A Linux web farm that needs a shared filesystem for user uploads. A Windows application tier that expects SMB and Active Directory. A genomics pipeline that wants to read the same 80 TB dataset from thousands of Lambda invocations. And a small team on macOS with hand-edited NFS mounts. One problem, four protocols, and a catalogue of services that can each solve some but not all of it. 'EFS or FSx' is the wrong framing; what matters is which filesystem protocol each workload actually speaks, and which AWS service speaks it natively.
Consulting and Craft · In Practice
Scaling Knowledge: From One Person's Head to Twenty-Five People's Practice (preview)
The hardest scaling problem isn't technical; it's knowledge. How domain knowledge moves from implicit to explicit to embedded, and which techniques unlock each step.
Under the Hood
Open Data: The Case for Sharing (preview)
The best datasets in the world are free. Government budgets, transport timetables, weather observations, property boundaries — published openly, used by millions, and funded by your taxes. Here's why open data matters, how to publish it well, and when you might choose not to.
Exam Room · Architecture
Choosing Between ALB, NLB, and Gateway Load Balancer (preview)
A public HTTPS API that needs path routing and WAF. A TCP service that must preserve the client IP down to the instance. A fleet of third-party firewall appliances that every packet leaving the VPC has to pass through. Three problems, three load balancers, and they are not interchangeable. None of them is 'best' in the abstract; what matters is which OSI layer each sits at, and what that layer lets you do that the others can't.
The Greenbox Story · The Right Tool
Ensemble Programming: The Team Navigates, the LLM Types (preview)
When the LLM does the typing, the whole team can focus on thinking. Mob programming reimagined for an AI-augmented world.
Exam Room · Architecture
CloudFront Caching for Mixed Static and Personal Content (preview)
A single domain serves a marketing shell that barely changes, a product catalogue that changes hourly, and a logged-in account area that's different for every visitor. One CloudFront distribution in front of all of it. Caching isn't in doubt; the work is letting CloudFront cache aggressively where it can while telling it, precisely, where it must not.
The Greenbox Story · Faster Together
Sam, Jas, and the Invisible Work (preview)
The Greenbox story is told through Maya's decisions and Tom's code. But a startup runs on the people who handle the work that doesn't make it into architecture diagrams or board packs.
Under the Hood · Trust
How Reputation Systems Work (preview)
eBay stars, Amazon reviews, Uber ratings, PageRank: every platform that connects strangers needs a way to manufacture trust from thin air. How do they do it? And why do all of them eventually break?
Exam Room · Architecture
Right-Sizing S3 Storage Classes Without Profiling (preview)
Forty terabytes of S3 across hundreds of buckets. Some datasets are read every morning; some are read once a quarter; some were written once and nobody has touched them since. The ops team has never run a per-bucket access profiler and isn't about to start. They want the bill down from $950 a month without hand-engineering a lifecycle rule per prefix. One S3 feature is designed for exactly that, and two others combine with it in the cases where it would waste money.
The Greenbox Story · Faster Together
Strategic Alignment: The Session That Changed the Roadmap (preview)
Anika inherits a stalled logistics partnership that Maya set up months ago. Her instinct is to surface every risk in a spreadsheet. Lee helps her see that the same information, delivered differently, builds trust instead of breaking it.
Exam Room · Architecture
Designing RDS for AZ Failover and Cross-Region Recovery (preview)
A SaaS team runs RDS PostgreSQL in a single AZ and wants two things at once: sub-minute failover when an AZ dies, and a recovery option in another region with under five minutes of data loss if the whole region goes. Neither problem has one clean answer on its own. The combination that does is a three-AZ cluster plus a replica waiting on the other side of the continent.
Consulting and Craft · Through the Kitchen
Mise en Place (preview)
Everything in its place before the first flame. Professional kitchens prep for hours before service. The ones that don't prep are the ones that burn the food.
Under the Hood
How OAuth Works: Delegating Access Without Sharing Secrets (preview)
You click 'Sign in with Google' and somehow a recipe app gets access to your calendar without ever seeing your password. OAuth is the protocol that makes this possible — and its history starts with a blog platform and a frustrated engineer.
Exam Room · Architecture
Auto-Scaling for Planned and Surprise Spikes (preview)
An e-commerce site has two kinds of spike to survive: Black Friday, which you can see coming months in advance, and the viral moment at 14:07 on a Tuesday, which you cannot. No single auto-scaling feature solves both without overspending the rest of the year. The correct answer layers four of them, and the interesting part is how they hand off.
The Greenbox Story · Faster Together
On-Call and Incident Response: When the Pager Goes Off (preview)
The allergen incident was a wake-up call. The question isn't whether something will go wrong again — it's whether the team will be ready when it does.
Consulting and Craft · In Practice
Code Is Read More Than It Is Written (preview)
The Law of Demeter says don't chain through objects you don't own. The real reason isn't about object-oriented purity; it's that order.Customer.Address.City is a sentence that forces the reader to hold four concepts in their head to understand one line of code.
Under the Hood · Trust
How Certificates Work (preview)
Every time your browser shows a padlock icon, a chain of trust is doing the work, from your bank's server, through intermediate authorities, up to a root certificate baked into your operating system. How that chain is built, why it sometimes breaks, and how Let's Encrypt changed everything.
The Greenbox Story · Faster Together
API Contracts: Two Squads, One Direction (preview)
Perth and Melbourne have their own backlogs, their own sprint rhythms, and their own priorities. They keep surprising each other with dependencies nobody saw coming.
Consulting and Craft · In Practice
The Language of Tests (preview)
Your test says 'should return 200.' RFC 2119 says 'should' means 'there may exist valid reasons to ignore this.' If it's actually a hard requirement, your test has a specification bug, and the LLM that wrote it doesn't know the difference.
Under the Hood
How Databases Actually Work (preview)
You write SQL and data appears. But between your query and the disk, there's a query planner, a buffer pool, a write-ahead log, and a concurrency control system that's been refined over forty years. PostgreSQL is the default for good reason — but knowing why helps you know when it isn't.
The Greenbox Story · Faster Together
Wardley Mapping: Build, Buy, or Borrow? (preview)
Should Greenbox build its own delivery tracking or use a third party? When building is cheap, the calculus changes — but not always in the direction you'd expect.
Exam Room · Advanced GenAI
How to Wire Function Calling Through Bedrock (preview)
An assistant that knows the answers but can't act on them is half a tool. Function calling (letting the model invoke tools we define, with arguments it chooses) turns understanding into action. Bedrock's Converse API has native tool-use support; so does Anthropic's Messages API via Bedrock; so do Bedrock Agents as a higher-level wrapper. Each exposes function calling through a different surface, and picking wrong makes the simple case hard.
The Workshop
The Workshop: Decision Tables (preview)
A pattern for extracting a tangled business rule from a domain expert's head and laying it out as a conditions-and-outcomes grid, so the combinations nobody had thought of become visible before they become bugs.
Under the Hood · Trust
How Encryption Works (preview)
Every time you send a message, buy something online, or check your bank balance, encryption is working in the background. But what is it actually doing? From Caesar's shifted alphabet to the elliptic curves protecting your bank account, the surprisingly beautiful mathematics of keeping secrets.
Exam Room · Advanced GenAI
Caching LLM Responses Without Stale Answers (preview)
Thirty percent of the support assistant's queries are paraphrases of each other ('how do I cancel?', 'can I cancel?', 'where's the cancel button?'), and every one pays full model price. Caching LLM responses isn't as simple as hashing a prompt: exact-match, semantic, and prefix caching answer different questions, and getting the boundary wrong serves yesterday's answer to today's question.
The Greenbox Story · Faster Together
Value Stream Mapping: Finding Where to Focus (preview)
Lead time has tripled and nobody knows why. Charlotte maps the actual flow of work from idea to production — and discovers the code takes two days. Everything else takes nineteen.
Exam Room · Advanced GenAI
SageMaker JumpStart or Bedrock for the Same Model (preview)
A team that wants Llama 3.3 70B in production can take it through SageMaker JumpStart, pushing a button to deploy onto an endpoint they own, or through Bedrock's model catalog with per-token pricing and no infrastructure. Same model, sometimes the same base weights, two quite different operational shapes. The correct choice depends on how much of the serving layer you actually want to touch.
The Workshop
The Workshop: DDD Modelling (preview)
Half a day with a clarified Event Storming wall, a whiteboard, and the correct people — to land on aggregates, bounded contexts, and a context map the team will actually re-read. Where Event Storming an Architecture produces the first cut of boundaries, this is the session that sharpens them.
Under the Hood
How Git Works (preview)
Git isn't a file tracker with version numbers. It's a content-addressable object store with a directed acyclic graph on top. Understanding this changes git from a tool you fear into a tool you trust.
Exam Room · Advanced GenAI
Keeping PII Out of LLM Prompts and Logs (preview)
A claims assistant that has to answer questions about a customer's claim while keeping the customer's name, address, policy number, and medical details out of training data, out of logs, and out of anything a subpoena could touch later. PII redaction isn't one knob; it's four or five, in different places, each covering a different leak path. Comprehend, Bedrock Guardrails, custom redaction, and the prompt itself each handle a different shape of the problem.
The Greenbox Story · Growing Pains
Developer Onboarding: Ramping Up Without Slowing Down (preview)
Three new developers start in three consecutive weeks. By week four, the existing team is spending more time explaining things than building them, and the new people still feel lost.
Exam Room · Advanced GenAI
Provisioned Throughput vs On-Demand Bedrock (preview)
On-demand Bedrock is priced per token, throttled per minute, and latency-variable in a way that product hates. Provisioned Throughput buys predictability with a monthly commitment. The break-even between the two isn't obvious, and the answer changes with the shape of the workload: a 24/7 assistant and a burst-heavy report generator land on opposite sides of the same line.
The Workshop
The Workshop: C4 Modelling (preview)
A simple set of nested diagrams — Context, Container, Component, Code — that turns a wall of sticky notes and tribal knowledge into an architecture artefact that survives outside the room.
Under the Hood · Trust
How Identity Works (preview)
You prove who you are dozens of times a day: passwords, fingerprints, face scans, little codes texted to your phone. But what does 'proving identity' actually mean? From passports to passkeys, the surprisingly deep question of how we know who's who.
Exam Room · Advanced GenAI
Picking an Embedding Model for Retrieval (preview)
An index with 20 million chunks, queries that need to work in English, Spanish, Portuguese, and Japanese, and a budget that won't bear re-embedding every six months when someone decides the new model is better. The embedding model quietly caps what a retrieval system can ever do. Titan, Cohere, and a self-hosted model all trade different things, and the comparison is messier than the marketing suggests.
The Greenbox Story · Growing Pains
Hiring Mistakes: The Hire Who Doesn't Work Out (preview)
Jordan Chen's CV was perfect. The technical interview was flawless. Six weeks in, the team is quieter, the code is cleverer, and nobody can explain why things feel worse.
Exam Room · Advanced GenAI
When a Document Won't Fit the Context Window (preview)
A 400-page contract, a 200-page policy manual, and a legal team asking 'what clauses govern refund disputes across both?' A 200,000-token context window sounds like enough until you realise what goes in with the documents. Chunking, map-reduce, hierarchical summarisation, and sliding context windows each answer a different question, and getting the boundaries correct is most of the battle.
The Workshop
The Workshop: Event Storming an Architecture (preview)
Event Storming at its most precise. Take a Process Level wall and a team of developers; leave with aggregates, bounded contexts, and a design the room actually believes in. The session that turns 'we understand the flow' into 'we know where the code goes'.
Under the Hood
How Email Works (preview)
SMTP, MX records, SPF, DKIM, DMARC, spam filters, and email headers: how the internet's oldest killer app actually delivers a message, and why it's still terrible and still indispensable.
Exam Room · Advanced GenAI
Streaming Responses to Cut First-Token Latency (preview)
A chat interface where users wait four seconds for any response on long generations, abandon rate creeping up, product asking why we can't do the typing-animation thing that every other assistant does. Streaming isn't just a UX polish, it changes how the entire response path has to work, from the SDK call through API Gateway to the browser, and each hop has its own way of getting it wrong.
The Greenbox Story · Growing Pains
Service Design: Customer Support at Scale (preview)
At two hundred subscribers, Sam answered every email personally. At three thousand five hundred, the inbox is a wall of fire and Sam hasn't slept properly in weeks.
Exam Room · Advanced GenAI
Importing Custom Weights into Bedrock (preview)
A research team has fine-tuned an open-weights model for medical-notes summarisation on a private SageMaker cluster. The resulting weights live in S3; the production runtime wants Bedrock's ergonomics. Custom model import bridges that gap, but it only works for certain base architectures, comes with throughput minimums, and quietly changes the cost model compared to on-demand foundation models.
Cooking · Mains
Pork and Fennel Ravioli, Take Two (preview)
The second run at pork and fennel ravioli, fixing what the first batch taught me: 500g of pork for six or seven generous parcels each, the pasta rolled properly thin so it cooks through, and a brown butter lifted with lemon and toasted hazelnuts. Roasted cherry tomatoes on the side bring a sweet-sharp contrast. Serves four.
Consulting and Craft · Through the Kitchen
Resting the Meat (preview)
You take the steak off the heat and the hardest part begins: doing nothing. Most of what I've learned about shipping software, I learned from a piece of resting beef.
Under the Hood · Trust
Why Trust Is Hard (preview)
The fundamental problem of the digital age isn't speed or storage or bandwidth. It's trust. How do you verify identity, prevent impersonation, and establish confidence between strangers who will never meet? Humans have been working on this for thousands of years, and the internet broke most of what we'd figured out.
Exam Room · Advanced GenAI
How to Cut a Bedrock Bill Without Hurting Quality (preview)
A Bedrock bill that doubled in two months, a product roadmap that blames the retrieval service, a finance partner who'd like a straight answer. The cheapest token is the one you don't send; the next cheapest is the one you send to the correct model. Model routing, prompt compression, cached retrieval, and provisioned throughput each solve a different slice of the cost problem, and none of them is the silver bullet.
The Greenbox Story · Growing Pains
Capacity Planning: The Seasonal Crunch (preview)
Winter arrives and half of Dave's crops don't. The Greenbox team discovers that a subscription product built on fresh produce has a supply problem nobody modelled.
Exam Room · Advanced GenAI
How to Build a Multi-Modal Bedrock Assistant for Insurance Claims (preview)
A claims-processing assistant that reads a scanned invoice, listens to a voicemail, answers the customer's question in plain text, and, if asked, reads it back. Four modalities, one conversation. The model choice, the orchestration shape, and the ways different inputs fail each push the architecture in different directions, and the naive 'just use a multi-modal model' misses half of where the real work is.
Consulting and Craft · In Practice
Technical Debt Is a Loan, Not a Crime (preview)
Calling something 'tech debt' is usually an accusation. It shouldn't be. Used deliberately, debt is a tool: ship something imperfect, learn from the market, pay it back with what you learned. Used carelessly, it compounds until the interest swallows the principal.
Under the Hood
Past the Ten-Second Wall (preview)
Past the ten-second wall the user is gone unless you do something about it. There are two patterns for keeping them: narrate the work as it happens, or release them and come back. This post is about implementing both: the wire formats, the server shapes, the client shapes, and the trade-offs each one makes.
Exam Room · Advanced GenAI
Evaluating LLM Output With Bedrock Eval Jobs (preview)
Two thousand historical support tickets, a summarisation prompt, a new model candidate, and a product manager asking whether switching would hurt quality. Bedrock evaluation jobs offer automated scoring, human review through Ground Truth workflows, and model comparison side by side, but they answer different questions, and getting the right number out of the right job matters more than running more of them.
The Greenbox Story · Drawing the Lines
When a Container Earns Its Own Zoom (preview)
The substitution engine got built. It works. But it's complicated enough that Ravi can't explain it to a new joiner, and the C2 diagram just shows it as a single box called Supply Matching. Time for the level of C4 the team has been carefully not drawing until now.
Exam Room · Advanced GenAI
How to Manage Prompts Across Thirty Services on Bedrock (preview)
One prompt scattered across thirty services, no versioning, no tests, drift between the copy in the code and the copy in the docs, a silent regression when somebody changed 'concise' to 'brief' and retention on one response tanked. Prompt engineering at a hundred callers isn't prose discipline, it's configuration management. Bedrock Prompt Management, Git-backed templates, and parameterised prompts each solve a slice of the same problem.
Consulting and Craft · Through the Kitchen
The Probe in the Meat (preview)
I own four kitchen thermometers, each for a different kind of cooking, and the differences between them are the differences between the kinds of decisions they help me make. Most of what I've learned about observability, I could have learned from a brisket, a vat of hot oil, a batch of caramel, and a steak.
Cooking · Mains
Cod, Prawns & Fennel (preview)
Cod and king prawns poached in a pan of slow-sweated fennel and leek with white wine and crème fraîche. Twenty-five minutes, one pan, four servings.
Under the Hood
The Transformer Attention Budget (preview)
Three numbers govern every interactive interface ever built: 0.1, 1, and 10 seconds. They are fifty years old. They predate the personal computer. They have survived every revolution in how software gets to a human, because they describe the human, not the software. With LLMs, the numbers do not stop being true. They bend in ways the original numbers never anticipated.
Exam Room · Advanced GenAI
Picking a Vector Store for Bedrock RAG (preview)
Twelve million embedding vectors, a 50ms retrieval budget, hybrid queries that mix keyword and semantic, and a bill that should not double the Bedrock spend on its own. OpenSearch Serverless, Aurora with pgvector, and Pinecone Serverless all serve the same shape of query, but their pricing curves, operational shapes, and query surfaces diverge the moment the corpus grows beyond demo scale.
The Greenbox Story · Drawing the Lines
Architecture Decision Records: Why We Did It That Way (preview)
With LLMs writing more code and new people joining the team, nobody can explain why things are the way they are. Charlotte introduces ADRs — the institutional memory the codebase desperately needs.
Exam Room · Advanced GenAI
How to Wire an LLM to Side-Effecting Actions with Bedrock Agents (preview)
An assistant that has to look up a customer's subscription, pause it, refund a charge, and email confirmation. Not just answer, act. The glue between a language model and the rest of our systems is a solved problem three different ways: Bedrock Agents, a LangChain agent loop, or a hand-written tool router. Each of them handles tool definition, invocation, and error recovery, but they put the guardrails in very different places.
Cooking · Sweet Treats
Tarte Tatin (preview)
Apples caramelised in butter and sugar in a heavy ovenproof pan, lidded with a disc of rough puff, baked until the pastry is gold and the apples are deep amber. Flipped onto a serving plate while still warm. A pudding with a dramatic reveal and three ingredients of consequence (the apples, the caramel, the pastry), which is most of why it works.
Cooking · Sweet Treats
Lunchbox Popcorn (preview)
A bag of plain kernels is the cheapest snack in the cupboard and one of the best for a lunch box: pop a batch on the stove, flavour it three ways, and you have a week of crisp, nut-free fillers that don't go to mush by recess. The base is just kernels, a little oil and a lid. The flavours are a sweet honey-cinnamon, a cheesy savoury one, and a cocoa dusting the kids think is a treat. Make a big batch on Sunday, keep it airtight, and portion it as you pack.
Under the Hood · The AI Field Guide
When Not to Use an LLM (preview)
Nine posts in, the question this series has been circling: when is an LLM the wrong tool? A field-guide-style decision walk through the alternatives (encoder models, classical NLP, rules, search, logic, probability), with the symptoms that point to each, and the symptoms that point back to LLMs.
The Workshop
The Workshop: Prioritisation (preview)
A 60-to-90 minute pattern for picking the next order of work when the backlog has outgrown the room’s memory. MoSCoW, RICE, Cost of Delay, value-vs-effort — not as rival religions, but as lenses chosen to fit the constraint. Silent scoring, reveal, argue the fringes, land the order.
Under the Hood
How Containers Work (preview)
Containers aren't tiny virtual machines. They're processes with boundaries — namespaces for isolation, cgroups for resource limits, and layers for efficient storage. Understanding what's actually happening changes how you use them.
The Greenbox Story · Drawing the Lines
Decision Tables: The Substitution Engine in Go (preview)
Maya's substitution rules are captured in decision tables. The LLM generated a thousand lines of Go and four hundred tests from them. This is what that code actually looks like.
The Greenbox Story · Drawing the Lines
Decision Tables: Making Maya's Brain Explicit (preview)
The substitution rules Maya carries in her head now need to be captured for the new team. Decision tables express complex multi-condition logic that LLMs can implement comprehensively.
Cooking · Bread
Sourdough Boule (preview)
A 900g sourdough country boule, mostly strong white flour with a handful of wholemeal for character, 75% hydration, three sets of stretch-and-folds, a long bulk on the counter, an overnight cold retard in the fridge, baked Sunday morning in a preheated dutch oven. Pairs excellently with cream of mushroom soup: bake the boule Sunday, ladle the soup Monday.
Cooking · Mains
Cream of Mushroom Soup (preview)
A proper mushroom soup: a kilo of mushrooms browned hard, a handful of dried porcini for depth, sweated onions and garlic, a measure of whisky in the deglaze, blended just enough to be silky without losing the bite. The base freezes in single-serve portions; the cream goes in at reheat so it never splits. Pairs excellently with a sourdough boule baked the day before.
Cooking · Breakfast
Breakfast Smoothie (preview)
A two-glass breakfast smoothie built on banana and frozen fruit, with oats for staying power and yoghurt for body. Each glass carries a scoop of creatine without a hint of chalk; the frozen fruit and banana hide it completely. Sixty seconds in the blender and you're out the door.
Exam Room · Advanced GenAI
Building RAG When the Source Documents Change Daily (preview)
A support assistant that has to answer from a product manual which the product team edits weekly, a pricing sheet that changes at month-end, and an operational runbook that mutates hourly. The base model doesn't know any of it, and fine-tuning won't keep up. Retrieval is the answer; the question is how much of the retrieval plumbing we want to own, and Bedrock Knowledge Bases, a LangChain stack, and a hand-rolled pipeline each put the lines in different places.
Cooking · Sides
Refrigerator Dill Pickles (preview)
A 1L jar of dill pickles built in fifteen minutes Wednesday evening, ready to eat by Saturday and at their best by the following weekend. Vinegar brine rather than lactofermentation: quicker, sharper, fridge-stable for a month, and pleasantly crunchy when sliced thin onto a burger. Lacto pickles are a different project.
Cooking · Sides
Burger Sauce (preview)
A proper home-made burger sauce: good mayonnaise, a smaller hit of ketchup than you'd expect, yellow mustard for tang, pickle brine for acid, and smoked paprika for the colour and a faint backnote that pairs with anything off a fire. Five minutes of whisking, twenty-four hours in the fridge to come together, and a fortnight of fridge life. Roughly 250ml from one bowl.
Cooking · Mains
Smoked Brisket-Chuck Patties (preview)
Six 170g patties from a 50:50 brisket-and-chuck blend, ground at home through a 4.5mm plate, twice. Smoked on the offset at 110°C over oak until the centres hit 47°C, then seared hard in cast iron with cheese under a lid for the last thirty seconds. Two passes of heat for two different jobs: the smoke does the flavour, the sear does the crust.
Cooking · Bread
Milk Rolls for Burgers (preview)
Eight soft milk rolls built on a tangzhong base (a small flour-and-milk paste cooked to a gel before it joins the dough), which lets the dough hold roughly 10% more liquid than it otherwise could. The extra hydration is the trick: the rolls stay pillowy for two days, and the crumb has enough structure to take a smoked patty, melted cheese, sauce, and pickles without going pasty.
Cooking
A Burger from Scratch (preview)
A burger built from the ground up over a week, with four recipe cards behind it: pillowy milk rolls with a tangzhong crumb, smashed patties ground from brisket and chuck and smoked on the offset, a proper home-made burger sauce, and refrigerator dill pickles that have had three days to come to themselves. The cheese is the only thing I bought.
Under the Hood · The AI Field Guide
Bayesian Reasoning (preview)
Probability is the third leg of classical AI. Bayesian networks, Kalman filters, particle filters, multi-armed bandits. The mathematics that fuses sensor noise into a position estimate, that decides whether to show a user this ad or that one, that diagnoses faults from intermittent symptoms.
Consulting and Craft · In Practice
The Price of Everything (preview)
Pricing isn't a finance decision; it's a product decision. The journey from a gut-feel launch number to validated unit economics is one of the hardest things a small team has to do, and one of the most consequential.
Under the Hood
How Estimation Works (And Why It Doesn't) (preview)
The cone of uncertainty, the planning fallacy, story points, throughput-based forecasting, and Monte Carlo simulation: why software estimation is so hard, and what actually works.
Exam Room · Advanced GenAI
Spreading Bedrock Load with Cross-Region Inference Profiles (preview)
A Bedrock-backed SaaS serving US, EU and APAC customers is hitting regional quota in us-east-1 during peak while the same model sits idle in eu-west-1. The team wants to spread load without fracturing the product into three regional deployments. Three letters on the front of the model ID do the job — if the model supports it and the geography fits the customer.
The Greenbox Story · Drawing the Lines
Drawing the System: From Event Storm to C4 (preview)
The wall told the team where the boundaries were. But the wall lives in one room, and the team doesn't. Charlotte introduces C4 — the diagrams that let anyone who wasn't in the room understand what Greenbox actually is.
Exam Room · Advanced GenAI
Configuring Bedrock Guardrails for PII, Topics, and Grounding (preview)
A consumer-facing chatbot on Bedrock has passed every red-team round on the obvious harms — no weapons, no hate, no CSAM — and is still shipping embarrassments: a card number pasted by one user echoing back in a reply, the bot cheerfully comparing the company's product with a named competitor, and a hallucinated policy line that nobody in the building wrote. Five different filter jobs wrap the same Bedrock invocation, and Guardrails is the one surface that does all five without five Lambdas.
Cooking · Sweet Treats
Mille-Feuille aux Framboises (preview)
Three layers of caramelised rough puff with vanilla pastry cream and fresh raspberries between. The classic mille-feuille (thousand-leaf pastry, cream, fruit) with the simplest fruit in the calendar. A weekend dessert: pastry and cream on Saturday, bake and assemble on Sunday.
Cooking · Pastry
Croissants and Pain au Chocolat (preview)
A yeasted laminated dough, three calendar days of slow work, and a freezer full of Monday-morning pastries. Two dozen from a single batch, twelve croissants and twelve pain au chocolat, shaped on Sunday and pulled from the freezer one at a time to thaw and proof overnight in a cool kitchen, then baked fresh whenever the coffee asks for one.
Under the Hood · The AI Field Guide
Knowledge, Logic, and Constraints (preview)
Reasoning from facts and rules, the way the AI textbooks of the 1980s thought we'd build minds. Propositional and first-order logic. Knowledge bases. Datalog. Modern descendants. SAT solvers, SMT solvers, and the production rule engines running insurance and banking. When if-this-then-that wins.
Cooking · Mains
Surf and Turf (preview)
The restaurant cliché made better at home: a thick sirloin brought to an even medium-rare in the water bath and then seared hard, king prawns and scallops caught quickly in the same pan, and a disc of tarragon butter, a faux béarnaise, melting over the lot. The butter does the work of a sauce without the fuss: tarragon, shallot and a little vinegar beaten into soft butter and chilled a day or two ahead. Serves four, with triple-cooked chips and a sharp salad to cut the richness.
Cooking · Sides
Triple-Cooked Chips (preview)
The chip worth the name: floury potatoes simmered until their edges crack, dried, then fried twice: once low to cook the inside, once hot to glass the outside. It sounds like a project, but most of it is make-ahead and hands-off, and the two fries are quick. A stockpot and a thermometer are all the kit you need. Enough for four alongside a steak.
Cooking · Mains
Mushroom Galette (preview)
A free-form rough-puff tart built for a Friday night: chestnut mushrooms cooked down hard with caramelised shallots, thyme and garlic, piled onto a ricotta-and-parmesan base and folded into a craggy galette. The pastry shatters, the mushrooms go deep and savoury, and it comes together while the kids' freezer food is in the oven. Serves two adults with a green salad.
Cooking · Mains
Pasties (preview)
Twenty-four pasties from a long weekend bench session: creamy chicken in béchamel, beef in rich brown gravy, and cheese and onion, eight of each. All folded as triangles, marked by one, two, or three steam slits so the fillings are tellable apart in the freezer and the lunchbox. Frozen raw and baked two or three at a time across the week, with cheese twists made from the puff offcuts as the day-one snack.
Cooking · Sides
Homemade Ricotta (preview)
Fresh ricotta from scratch in twenty minutes with three ingredients: whole milk, a splash of cream, and acid to set the curd. Strictly it's a fresh acid-set cheese rather than true whey ricotta, but it's soft, sweet and miles better than the tub, and you control how wet or firm it ends up by how long you drain it. Made for the mushroom galette, but just as good on toast with honey or folded through pasta.
Cooking · Breakfast
Bircher Muesli (preview)
The classic make-ahead breakfast: rolled oats soaked overnight in milk and apple juice with grated apple, yoghurt and a squeeze of lemon, soft and spoonable by morning. Mixed the night before, it carries a scoop of creatine without a hint of grit: the powder dissolves into the liquid while you sleep. Makes two bowls and scales up for the week.
The Greenbox Story · Drawing the Lines
Domain-Driven Design: The Anti-Corruption Layer in Go (preview)
When one context needs to talk to another, the temptation is to import its types. The anti-corruption layer says no: translate at the boundary, and let each context own its own language.
The Greenbox Story · Drawing the Lines
Domain-Driven Design: Events Across Boundaries in Go (preview)
Bounded contexts don't help if they can't talk to each other. Domain events are how the Subscription context tells Billing that something happened, without knowing Billing exists.
Cooking · Mains
Fish Chowder (preview)
A proper bowl of comfort: smoked cod and firm white fish poached gently in a milk-and-cream broth thick with potato and sweetcorn, crisp pancetta scattered over the top. Everything comes off the Woolworths seafood cabinet, nothing fancy. Serves four with crusty bread, fifteen minutes of prep and not much longer on the stove.
The Greenbox Story · Drawing the Lines
Domain-Driven Design: Modelling the Subscription Context in Go (preview)
Charlotte said draw the boundaries. Tom said fine. Now someone has to write the code. This is what DDD looks like in Go: value objects, entities, aggregates, and the type system doing half the work.
The Greenbox Story · Drawing the Lines
Domain-Driven Design: Drawing the Boundaries
The codebase is becoming a monolith. Charlotte helps the team define bounded contexts so the architecture matches the business — and so LLMs generate code that fits.
Cooking · Mains
One-Pot Fry-Up
The whole fry-up in a single pan: chorizo rendered down, potatoes fried crisp in the spiced fat, mushrooms and green capsicum browned, two tins of baked beans warmed through, and eggs baked into wells in the top until the whites set and the yolks stay soft. A grown-ups' lunch that goes from pan to table without a sink full of dishes.
Exam Room · Advanced GenAI
Designing Short-Term and Long-Term Memory for a Bedrock Chat Assistant
A customer-support assistant where the average conversation runs fifteen turns before it resolves, and returning users pick up two weeks later expecting the bot to remember they've been waiting on a refund. Two memory problems in one product — what's live in the current conversation and what persists across visits — and four plausible ways to build it. Bedrock Agents' built-in memory handles one half cleanly; the other half is where teams reach for DynamoDB or a knowledge base and get it wrong.
Cooking · Sweet Treats
Super Thick Hot Chocolate
Spanish-style thick drinking chocolate, four small cups from 200g of 70% dark chocolate, 800ml of whole milk, and a cornflour slurry. Fifteen minutes.
Cooking · Mains
Pork and Fennel Ravioli
Fresh egg pasta filled with a pork-and-fennel sausage mince you grind yourself from shoulder, so the toasted fennel, garlic and lemon work right through the meat. About thirty-six ravioli from a three-egg dough and 350g of pork, nine apiece for four. Dressed with brown butter and crisp sage. An autumnal Sunday lunch, with any spare ravioli freezing raw on a tray for a Tuesday night that needs rescuing.
Cooking · Sweet Treats
Scottish Tablet
Scottish tablet is sugar, condensed milk, butter and milk boiled to the soft-ball stage and then beaten as it cools until it grains. That graining is the whole point: tablet isn't fudge; it's harder, drier, and breaks into a brittle, melt-on-the-tongue crumb rather than bending. A sugar thermometer and a sturdy wooden spoon are the only special kit. The boil is quick, the beating is the arm work, and the squares keep for weeks in a tin.
Cooking · Mains
Pizza Night
A thin-crust dough scaled for a crowd: six 12-inch bases from one batch, raised with a pinch of commercial yeast while the sourdough starter is still finding its feet. Stretch it thin, spread a good canned sauce, top with whatever the week left in the fridge, and bake as hot as the oven will go. The dough wants a slow cold rise, so mix it the day before.
Cooking · Sweet Treats
Cherry Pie
A double-crust pie built around a bag of frozen pitted cherries. The trick with frozen fruit is the water it sheds: thaw the cherries in a colander, catch the juice, boil it down to a glossy syrup with the sugar, and thicken that rather than the raw fruit. The cherries go back in barely cooked, the pie bakes on a screaming-hot tray so the base sets crisp, and then it cools all the way down so the filling slices instead of running. All-butter pastry made the day before, fruit prepped while the oven heats.
Cooking · Pastry
Pâte Brisée
Plain all-butter shortcrust, the workhorse pie and tart pastry, no sugar to speak of, made by hand or in the processor in ten minutes. Rubbed coarse so flakes of butter survive into the bake, which is what gives it both tenderness and a little flake. This card makes the full double-crust quantity for a 23cm pie; halve it for a single blind-baked shell. The pastry behind the cherry pie, quiches, custard tarts, and any pie that wants a sturdy base and a lid.
Under the Hood · The AI Field Guide
Search and Planning
Most of what people called 'AI' before deep learning was search. A* finding the route, alpha-beta playing the chess move, STRIPS sequencing the plan. The algorithms that run your map app, your build system, your warehouse robot, and your game opponent.
The Workshop
The Workshop: Business Model Canvas
A pattern for laying out how a business creates, delivers, and captures value on a single page of nine boxes, filled in customer-first order, so the team can see whether the whole thing actually holds together.
Under the Hood · Time
Time Is Wrong Everywhere All at Once
Your laptop's clock is wrong. So is the server's. So is every other computer on the network. The question isn't how to make them agree, it's what to do when they can't. From Lamport clocks to Google Spanner, the story of time in distributed systems.
Exam Room · Advanced GenAI
Combining RAG and Fine-Tuning for a Legal Contract Assistant
A legal-tech team wants a contract review assistant that understands two hundred thousand past matters, speaks in the firm's voice with clause-by-section citations, and refuses anything off-domain. A hundred thousand dollars, three months. RAG, fine-tuning, and continued pre-training each solve a different half of that sentence — and the interesting answer is which two to pick, not which one.
Cooking · Sweet Treats
Boterkoek
A Dutch butter cake: equal-ish weights of flour, butter, and sugar pressed into a tin and pulled out of the oven while the centre is still slightly underdone.
Exam Room · Advanced GenAI
How to Build a Citations-Required RAG Over 50K Internal Documents
Fifty thousand internal documents, five gigabytes of text, weekly churn, a three-second latency budget, per-user access control, and a citation in every single answer. The RAG landscape on Bedrock is bigger than one product and the interesting part of the design is what falls away once you name the five things that actually decide it.
Cooking · Sweet Treats
Giant Marmalade Choux Buns
Four large choux buns under a crackled orange craquelin lid, filled through the base with marmalade chantilly around a hidden core of marmalade. Left whole, so the inside is a surprise. Start to finish in about ninety minutes, no overnight rests; a showstopper you can decide on at five and serve at half six.
Under the Hood · The AI Field Guide
Rules, Grammars, and Regex
Sometimes the correct answer to 'what model should I use?' is no model at all. Hand-written rules, regular expressions, finite-state transducers. They're deterministic, auditable, free at inference, and frequently the correct tool for the job.
Consulting and Craft · Through the Kitchen
The Salt in the Dish
There are four kinds of salt in my kitchen drawer and they are not interchangeable. This post is about salt. Most of it is actually about dependencies, and about the discipline of knowing what you're putting into the dish before you put it in.
Under the Hood · Time
Why Does Thursday Last Forever?
A watched pot never boils. Holidays vanish. Childhood lasted decades and your thirties were a weekend. The clock doesn't care, so why does your brain?
Exam Room · Advanced GenAI
Picking a Bedrock Model for High-Volume RAG
A million LLM requests a day, peaking at thirty per second, split across US and EU customers, with a P99 first-token target under 1.5 seconds and real reasoning over retrieved context. Bedrock has seven model families and four ways to buy capacity. Most of the landscape falls away once you name what actually decides it, and the real trick is what you do *after* you've picked the model.
The Greenbox Story · Finding the Fit
Prioritisation: What Changes First
The team knows what's wrong — the value prop, the pricing, the unit economics. They can't fix everything at once. Now they need to decide what changes first.
Cooking · Sweet Treats
Crème Anglaise
A pouring custard tuned for puddings that already bring their own sweetness.
Cooking · Sweet Treats
Apple Crumble
Apples chopped chunky by hand, a crumble lid rubbed together with cold fingers a minute later, and both into the fridge in separate tubs until baking. Forty-five minutes in a hot oven the next evening and pudding's done while we eat. The topping never touches the wet fruit until the oven, so the crumb stays dry and the bake comes out crisp.
Read articleExam Room · GenAI
Choosing Between SageMaker, Bedrock, and Purpose-Built AI APIs
A platform team has five AI-shaped requests landing in a single sprint: transcribe call centre audio, detect anomalies in sensor data, extract text from scanned forms, summarise customer emails, and detect faces in CCTV. Someone has already typed 'use SageMaker' into three design docs. Someone else insists Bedrock is the answer. A third voice mutters about purpose-built services. AWS has at least three answers to every AI problem, so there's no single platform that wins; what matters is how to tell which layer of the stack each request lands on, and what that choice costs in time, money, and flexibility.
Read articleUnder the Hood · The AI Field Guide
The Boring Baseline That Wins
TF-IDF, logistic regression, naive Bayes, k-means, LDA. The fifty lines of scikit-learn that beat your fancy model on the small problem you actually have. Why these baselines still win, and why the correct starting point in 2026 is often the same as it was in 2006.
Read articleUnder the Hood
A Gentle Guide to Typography: From Chisels to Character Sets
A warm, thorough walk through the world of typography, from hand-carved letters and Gutenberg's press to fonts, glyphs, kerning, serifs, and Unicode. Everything you wanted to know about how written language gets its shape.
Read articleUnder the Hood · Time
The Clock Inside You
Your body has its own clock, and it doesn't care about UTC. It runs on light, adenosine, and a cluster of 20,000 neurons behind your eyes. Jet lag, shift work, larks and owls: the biology of time is a different machine from the physics, and it keeps its own hours.
Read articleExam Room · GenAI
Choosing Between Chains, Retrieval, and Agents for a GenAI Assistant
A product manager wants a 'GenAI assistant' for internal operations, something that can answer questions, look up customer records, draft emails, and file Jira tickets. Three architectural patterns keep coming up: chains, retrieval, and agents. They sound similar, they all use foundation models, and teams routinely reach for the most elaborate one when a simpler pattern would do. There's no single 'best' here; what matters is which one fits each piece of the assistant's workload, and when elaboration costs more than it earns.
Read articleThe Greenbox Story · Finding the Fit
Pricing Experiments: The Right Box at the Right Price
Greenbox has been charging $25 for the small box and $45 for the large box since day one. Nobody complained. Maya thought that meant the prices were right. A conversation with Lee at the kitchen table suggests otherwise.
Read articleCooking · Mains
Honey Soy Chicken Wings
Thirty minutes from packet to plate. Wings seared hard in a heavy pan until the skin crackles, then rolled through a sticky soy-honey glaze with ginger and garlic, served on jasmine rice with a scatter of spring onion and sesame. One pan, one rice pot, two bowls.
Read articleExam Room · GenAI
How to Make a Bedrock Chatbot Audit-Ready with Guardrails and Watermarks
A fintech ships a customer-facing chatbot on Bedrock. Legal asks: can it give financial advice? Risk asks: can it leak customer account numbers? Compliance asks: if an auditor requests proof a response came from our model, can we demonstrate it? Three questions, three different controls, all of them Bedrock-native. The controls exist; the work is matching the right one to each question and working out what a 'responsible AI' configuration actually looks like when the auditor arrives.
Read articleCooking · Sweet Treats
Sourdough Discard Choc Chip Cookies
Brown-butter chocolate chunk cookies with a hundred grams of sourdough discard folded through the dough: chewy edges, a centre that stays gooey, and just enough tang from the starter to keep them from cloying. The whole batch is built to scoop, freeze raw, and bake one or two from frozen on the night you want them.
Read articleCooking · Sweet Treats
Poires Pochées
Whole pears simmered in a vanilla-lemon syrup until a knife slides in like cutting butter, then cooled in the same syrup overnight so the flavour reaches the core. Beurre Bosc is the May default in Perth, firm enough to hold its shape through a long poach. Standalone or sliced into a mille-feuille.
Read articleCooking · Pastry
Cheese Twists
Rough puff rolled thin, dressed with grated parmesan and a turn of black pepper, folded over and rolled again, sliced into strips and twisted. Baked until deep gold and shattery. The quickest way to turn a batch of pastry into an afternoon snack: twenty-five minutes from fridge to plate, eaten warm with a glass of something cold.
Read articleCooking · Pastry
Crème Pâtissière
Vanilla pastry cream cooked on the stove in twelve minutes: milk, yolks, sugar, cornflour, vanilla. The engine of half of French pastry: the inside of an éclair, the filling under fruit tarts, the cream between the layers of a mille-feuille. Makes about 600g.
Read articleCooking · Pastry
Rough Puff Pastry
Quick puff: four turns instead of six, cubes of butter rolled into the dough instead of a hand-shaped butter block. Most of the flake, half the work. The pastry behind mille-feuille, palmiers, sausage rolls, and anything that wants crisp golden layers without a Friday-night beurrage.
Read articleCooking
A Weekend in the Kitchen
Three things from a busy weekend in the kitchen, each with its own recipe card: a tray of cheese twists from a fresh batch of rough puff, a saucepan of pears poached in vanilla-lemon syrup, and a freezer bag of cookie dough balls that bake one or two at a time when wanted.
Read articleUnder the Hood · The AI Field Guide
Before the Transformer
n-grams. HMMs. CRFs. The language models and sequence taggers that ran the internet before deep learning, and that quietly still do, in autocomplete, spam filters, biomedical NER, speech recognition. What they are, why they still ship, and when they're the correct answer.
Read articleThe Workshop
The Workshop: Assumption Mapping
A pattern for surfacing the beliefs hiding underneath a plan, plotting them on an evidence/impact grid, and deciding which ones to test before you commit. Facilitator's playbook, failure modes, and what to do with the grid afterwards.
Read articleUnder the Hood · Time
Can You Turn Back Time?
General relativity permits time loops. Quantum mechanics hints that the future can influence the past. Hawking threw a party for time travellers and nobody came. The physics of time travel is stranger, and more serious, than science fiction suggests.
Read articleCooking · Preserves
Sweet-Orange Marmalade
A fast one-day marmalade: whole oranges and lemons rough-chopped in the blender, simmered with jam sugar and plain sugar for about an hour. No overnight soak, no separate pectin bag. Just citrus, sugar, heat.
Read articleExam Room · GenAI
Forecasting Without Writing Python
A category manager has 18 months of weekly sales data for 400 SKUs and a deadline to forecast next quarter. She doesn't code. The ML team is booked until Q3. The ask is a tool that lets her build a forecast herself, importable, reviewable, explainable, without waiting for engineering. Which AWS box she clicks matters less than what kind of problem this actually is, what features of the data can honestly feed into a model, and what the business user has to understand for the output to be defensible when finance asks 'why this number?'.
Read articleThe Greenbox Story · Finding the Fit
Business Model Canvas: Does This Actually Work?
Maya needs to convince investors that Greenbox can reach 1,000 subscribers. But when the team maps the business model, they discover the unit economics don't add up — and Lee admits he's out of his depth.
Read articleExam Room · GenAI
Grounding a Chatbot in Your Own PDFs
A facilities team has 600 PDFs (equipment manuals, safety procedures, maintenance schedules) sitting on a SharePoint drive. Engineers want a chatbot that answers 'how do I reset the chiller on floor 4?' in seconds instead of a ten-minute PDF hunt. Retrieval-augmented generation can do this; whether it does it well depends on what the corpus actually looks like, what kinds of questions the engineers really ask, and which configuration knobs decide whether the answers are any good once a managed service is on the table.
Read articleCooking · Sweet Treats
Lemon Meringue Pie
A blender pâte sucrée scented with lemon zest, a smooth lemon curd, and an Italian meringue with peaks blowtorched gold. A weekend pie, two component recipes assembled across Saturday and Sunday.
Read articleCooking · Mains
Fish Pie
Rick Stein's Cornwall fish pie, with Australian fish. White fish, smoked fish, prawns under a buttered mash, baked until the top is gold.
Read articleCooking · Pastry
Pâte Sucrée
Sweet shortcrust pastry made in the food processor in five minutes, perfumed with lemon zest, ready for any tart shell that needs a crisp golden base. The pastry behind the lemon meringue and a dozen other things.
Read articleCooking · Preserves
Lemon Curd
A bright, smooth lemon curd cooked over a pan of water, set with cold butter, and good for everything from a tart filling to a swirl through yoghurt. Makes about 600g.
Read articleUnder the Hood · The AI Field Guide
After the Transformer
Transformers have ruled language modelling for nearly a decade. They have a known weakness, and several research lines are trying to replace them. Mamba, RWKV, RetNet, Hyena, diffusion-for-text: what they are, what they fix, and which ones are likely to matter.
Read articleThe Workshop
The Workshop: Jobs to be Done
A half-day pattern for finding out what your customers actually hired your product to do. Switch interviews, the four forces, shaping the material into job statements, and what to do when the room wants to list features instead.
Read articleUnder the Hood · Time
Does Time Even Exist?
The arrow of time isn't in the equations. "Now" isn't a location. The most fundamental theories of physics may contain no time variable at all. A tour of the foundations, from the block universe to the holographic principle.
Read articleExam Room · GenAI
Choosing Between Prompting, RAG, and Fine-Tuning
A legal-ops team wants a model that answers questions about their 4,000 in-house contract templates. The first prototype, a plain Claude call with the question in the prompt, hallucinates clause numbers. Someone suggests fine-tuning; someone else suggests RAG. They solve different problems, so 'which is better' is the wrong frame; what matters is which problem the team actually has, and what each adaptation technique costs in time, data, and recurring spend.
Read articleThe Greenbox Story · Finding the Fit
Assumption Mapping: Testing What You Believe
The team lists everything they believe but haven't validated, ranks by risk, and discovers that their most confident assumption is the one most likely to be wrong.
Read articleExam Room · GenAI
How to Take a Foundation Model from Pick to Production Endpoint
A product team wants a chatbot that summarises support tickets. They have the tickets, a cloud account, and no ML background. Somebody says 'use a foundation model'. Between that sentence and a working endpoint sit roughly seven distinct stages, each with its own AWS service and its own decisions. Picking the model is the easy part; the real work is figuring out which stages this team can skip, which they absolutely cannot, and what AWS gives them at each step.
Read articleUnder the Hood · The AI Field Guide
The Reranker You Didn't Know You Needed
RAG explanations stop at 'embed the query, look up the nearest documents, hand them to the LLM.' That's the demo. In production, there's a second pass between the lookup and the LLM, and it's the one that actually makes retrieval work.
Read articleConsulting and Craft · Through the Kitchen
The Knife in My Hand
A sharp knife is safer than a dull one. Pick the right tool for the job, then put in the practice. Speed comes from practice, not pressure. A few focussed, well-maintained, well-practised tools will more predictably take you further than the latest beautiful shiny offering.
Read articleUnder the Hood · Time
Time Is Weirder Than You Think
Time doesn't flow at the same rate everywhere. It slows near massive objects, dilates at high speeds, and might not 'flow' at all. From GPS corrections to black holes, the physics that makes time strange.
Read articleExam Room · GenAI
Picking the AWS AI Service Tier for Each Feature
A product manager with no ML background has been told to add AI to a SaaS product, and has heard of Bedrock, SageMaker, Comprehend, Translate, Textract, Rekognition. AWS has three different shapes of AI offering, and the shortest path depends entirely on whether a ready-made service already does the job.
Read articleThe Greenbox Story · Finding the Fit
Jobs to Be Done: Why Subscribers Actually Stay
Greenbox discovers that subscribers don't stay for fresh vegetables. They stay because they don't have to think about dinner on Tuesday.
Read articleExam Room · Architecture
Choosing an S3 Storage Class for Cold Archives
Some data exists for compliance, not for use. Tens of terabytes of records sitting untouched until an auditor wants them. S3 has eight storage classes; only one of them is built for that pattern, and getting it wrong can cost an order of magnitude in a year when you weren't paying attention to the bill.
Read articleUnder the Hood · The AI Field Guide
The Other Transformers
BERT and T5 are transformers too, but they aren't trying to be ChatGPT. They're trying to be the boring layer underneath (classifiers, embeddings, structured transformations), and they're often a better answer than an LLM.
Read articleThe Workshop
The Workshop: User Story Mapping
A pattern for laying out the whole user experience as a left-to-right narrative and then slicing it into releases, so the team can see both the shape of the thing they're building and the thinnest honest version they can ship first.
Read articleUnder the Hood · Time
Ticks or Tocks?
A second used to be a fraction of the day. Now it's defined by the vibrations of a caesium atom, and even that might not be precise enough. From quartz watches to optical lattice clocks, the story of how we learned to count time.
Read articleExam Room · Architecture
Routing to the Closest Healthy Region
A multi-region application needs to route requests to the closest healthy region, failing over automatically when the preferred one drops out, with no client-side retries and no extra health-check plumbing to maintain. Route 53 can do all of that in a single record set. Finding the correct combination means touring all seven routing policies and the attributes that separate them.
Read articleThe Greenbox Story · Shipping What Matters
Accessibility: A Product Decision, Not a Compliance Tick
A subscriber sends Greenbox an email that changes how they think about the signup flow. Not a complaint, a question the team didn't know they needed to answer.
Read articleUnder the Hood · Time
What Day Is It?
The date next to the time on your phone is its own kind of fragile. Calendars argue with the moon, the sun, and each other; whole days have been deleted by decree; and the year number on your screen depends on which monk's arithmetic your ancestors trusted.
Read articleThe Workshop
The Workshop: Impact Mapping
A pattern for tracing a business goal through the people whose behaviour has to change, through the changes themselves, to the things you might build — so the team can tell the difference between work that moves the number and work that just feels productive.
Read articleThe Greenbox Story · Shipping What Matters
User Story Mapping: Seeing the Whole
The Greenbox team has a growing backlog of stories but no sense of the whole. User Story Mapping lays out the full user journey, shows the gaps, and makes release planning obvious instead of political.
Read articleThe Workshop
The Workshop: Sprint Planning
A pattern for turning a refined backlog into a sprint the team believes in: a goal, a set of stories, task breakdown, and an explicit commitment, in one focused session.
Read articleUnder the Hood · Time
What Time Is It?
The hour on your phone is a fragile compromise between the sun and politics. Sundials, shipwrecks, railway time, DST, and the volunteer-maintained database that keeps the world's clocks roughly honest.
Read articleThe Workshop
The Workshop: Example Mapping
A pattern for breaking a user story into rules and concrete examples in twenty-five minutes, so the team knows whether it's ready to build before the sprint starts. Facilitator's playbook, failure modes, and what to do with the cards afterwards.
Read articleThe Greenbox Story · Shipping What Matters
Impact Mapping: Connecting Work to Goals
The Greenbox team is shipping features, but are they the correct ones? Impact Mapping connects engineering work to business goals, and reveals that the obvious next feature isn't always the important one.
Read articleThe Workshop
The Workshop: Event Storming a Process
The default Event Storming session and the one you'll run most often. One process, one wall, everyone who touches it in the room, three hours. You leave with a precise shared model of the flow and a short list of the questions it raised. Where Big Picture looks for shape across a whole domain, Process Level looks for precision within one flow.
Read articleThe Workshop
The Workshop: Event Storming a Domain
The widest zoom, and Brandolini's own entry point to Event Storming. Stand a whole business (or a whole product) in front of one wall with everyone who touches it. The output isn't a design or a plan; it's one picture that every department recognises, and a shortlist of the places worth digging into next.
Read articleThe Greenbox Story · Shipping What Matters
Teaching Your LLM the Codebase: CLAUDE.md and AGENTS.md
The Greenbox team's actual CLAUDE.md and AGENTS.md files. What goes in them, how they're structured, and how they shape the code an LLM generates.
Read articleThe Greenbox Story · Shipping What Matters
Teaching Your LLM the Codebase
Tom and Priya are both using LLMs to write code. They're getting different results. Not wrong, different. CLAUDE.md is how the team teaches the LLM to write code like them.
Read articleThe Greenbox Story · Shipping What Matters
Behaviour-Driven Development: From Stories to Working Software
Example Maps become stories. Stories become tests. Tests drive code. And in a world where LLMs can write the code, the discovery work matters more than ever, because the bottleneck isn't implementation any more; it's knowing what to build.
Read articleConsulting and Craft · Through the Kitchen
The Quiet Jar in the Fridge
My last sourdough starter died through a quiet chain of postponed feeds. I'm starting a new one today. Most of what I'm learning as I begin again, I wish I'd known a decade earlier about codebases.
Read articleThe Greenbox Story · From Chaos to Clarity
Sprint Planning: Turning Sticky Notes into Delivery
Walls of sticky notes, a prioritised backlog, concrete examples, and six weeks until the funding deadline. The team has done the discovery. Now they need a rhythm for delivery.
Read articleUnder the Hood · The AI Field Guide
To LLMs… and Beyond!
LLMs are one corner of a much larger field. Diffusion models, reasoning models, multimodal systems, open-weight vs closed — what they are, how they differ, and how to choose.
Read articleThe Greenbox Story · From Chaos to Clarity
Example Mapping: Making Stories Concrete
Four colours of card, twenty-five minutes, and a vague story becomes something you can actually build. Example Mapping is the bridge between understanding and implementation, and the technique you'll use more than any other.
Read articleUnder the Hood · The AI Field Guide
How LLMs Actually Work
Tokens, transformers, attention, and the training pipeline: what large language models actually do when they 'predict the next token', why they hallucinate, and why they're so good at code.
Read articleThe Greenbox Story · From Chaos to Clarity
Event Storming: Building Shared Understanding
The Greenbox team covers a wall in sticky notes and discovers they've been building on assumptions. A deep dive into Event Storming, the workshop that gets an entire domain out of one person's head and into shared understanding.
Read articleThe Greenbox Story · From Chaos to Clarity
Retrospectives: Catching the Wrong Kind of Fast
A small team, a good idea, LLMs generating code at lightning speed, and four weeks of building the wrong thing. The first part of a series on getting from discovery to delivery without wasting everyone's time.
Read articleConsulting and Craft · In Practice
The Value Is in Ideas, Not Code
LLMs have made code implementation almost trivial. The bottleneck has shifted from writing code to knowing what to ask for. Your library of patterns, concepts, and hard-won experience is now your competitive advantage.
Read articleThe Greenbox Story · From Chaos to Clarity
Minimum Viable Product: The First Box
Twenty-two people signed up from a flyer at the Margaret River farmers market. The first delivery was a disaster: wrong addresses, wilted spinach, and a box that arrived on Wednesday instead of Thursday. Maya cried in her car. Then she drove to the next address.
Read articleThe Greenbox Story · From Chaos to Clarity
Customer Discovery: Before the First Line of Code
Maya grew up on a farm outside Margaret River. She left to study computer science, spent a decade in consulting, and came back to Perth with an idea that wouldn't let go: what if the best produce in Western Australia could reach people's doors every week?
Read articleBash Pipes Execute in Subshells
...so they can't set environment variables in the parent process.
Read articlePooling ActiveMQ Connections for Camel
Pool connections between Camel and ActiveMQ to save time and resources.
Read article