The Platform

February 02, 2027 · 28 min read

Part 6 of Going National

Eight cities. Eighty people. The culture work is settling. The acquisition integration is holding. And the technology that Tom and Priya built for three cities is coming apart at the seams.

Tom knows it’s bad when the Adelaide squad lead messages him at 7:14am on a Monday: “Deploy queue has five PRs in it. We’ve been waiting since Thursday. When’s our slot?”

He knows it’s really bad when the Brisbane squad lead messages him at 7:22am: “Same question. Also, three tests failed randomly on our branch. We re-ran and they passed. Nobody trusts the build any more.”

He knows it’s a crisis when Sam messages him at 7:31am: “Adelaide’s Thursday boxes went out with last week’s substitutions. Nobody noticed until a subscriber called. We had the data – it didn’t reach the delivery system. Four hours of wrong boxes.”

Tom puts down his coffee. It’s Monday morning. The kids are at school. Sarah has left for work. The house is quiet. His three monitors show: the deploy queue (five PRs, estimated time to clear: 4 hours), the test dashboard (three flaky tests, red for the third time this week), and the production monitoring – such as it is – which doesn’t cover Adelaide at all.

He opens the document he started eight months ago, in the management gap week. The one he saved in a folder called “Engineering Lead.” The note reads: Deploy pipeline – three squads, one staging, review bottleneck. Fix before it breaks something.

It broke something. Four hours of wrong boxes in Adelaide. Subscribers getting last week’s substitutions – the wrong produce for the wrong preferences. Nobody noticed because the monitoring only covers Perth, Melbourne, and Brisbane. Adelaide was added to the platform three months ago and nobody extended the alerting.

Patricia’s question from the board room echoes in Tom’s head: “If a system is healthy, the system should tell you. Not a person checking – the system itself.”

The system didn’t tell them. A subscriber did. Four hours later.

The problems compound

Tom spends Monday cataloguing the failures. Not for blame – for understanding. The problems aren’t new. They’ve been accumulating for eighteen months, seeded in small decisions that made sense at the time and have compounded into a crisis at scale.

The deploy queue. GreenBox has eight environments – one per city, each with its own configuration for farm partners, delivery routes, courier APIs, and regional settings. A full deployment touches all eight environments sequentially. Each environment takes approximately forty-five minutes: build, test, deploy, smoke test, verify. A complete release takes six hours on a good day. On a bad day – when a test fails in environment three and the developer has to fix it and restart from environment three – it takes all day.

Developers wait in line. The Perth squad finished a feature on Wednesday. It’s Monday and the feature isn’t deployed because Brisbane and Sydney had priority releases on Thursday and Friday. The bottleneck isn’t writing code – it’s shipping it.

Tom recognises the pattern. It’s the Theory of Constraints, which Charlotte introduced during the Wardley Mapping work: the throughput of a system is limited by its single biggest bottleneck. In year one, the bottleneck was understanding – the team didn’t know what to build. They fixed that with Event Storming, JTBD, Example Mapping. In year two, the bottleneck was coordination – two squads building on the same codebase. They fixed that with bounded contexts and contract testing. In year three, the bottleneck has moved again. Now it’s delivery infrastructure. The constraint isn’t understanding or coordination. It’s the pipeline itself.

The flaky tests. The test suite runs for ninety minutes. At three cities, it ran for forty-five. The additional city-specific tests, integration tests, and end-to-end scenarios have doubled the runtime. Three tests fail randomly – not consistently, not predictably. One is a timing issue in the substitution engine’s async processing. One is database connection pool exhaustion under load. One is a mystery that nobody has had time to diagnose.

The team has developed a coping mechanism: re-run and hope. If the build fails, trigger it again. If it passes on the second run, ship it. Nobody trusts the green build any more. A green build means “probably fine” rather than “definitely working.” The psychological impact is corrosive – developers stop caring about test failures because test failures have stopped meaning anything.

Priya, from Melbourne, articulates the danger in an engineering channel message: “When the test suite becomes a lottery, developers stop writing tests. When developers stop writing tests, we stop catching bugs before production. When we stop catching bugs before production, we become the company that ships wrong boxes to Adelaide for four hours.”

The monitoring gap. GreenBox’s monitoring was built for Perth. When Melbourne launched, Tom extended it to cover Melbourne. When Brisbane launched, he extended it again. Each extension was a manual process – adding dashboards, configuring alerts, setting thresholds. It worked because Tom was personally responsible and Tom paid attention.

Adelaide was added to the platform three months ago. Nobody extended the monitoring. Not because of negligence – because the process was manual and the person who knew how to do it (Tom) was occupied with the Harvest Box acquisition integration. The Sydney monitoring is a patchwork: half Harvest Box’s Heroku metrics, half GreenBox’s custom dashboards, none of it connected.

Patricia Osei’s golden signals question – latency, traffic, errors, saturation – has been answered for Perth, Melbourne, and Brisbane. For Adelaide and Sydney, the answer is “we don’t know.” For Hobart, which launches next month, the answer will also be “we don’t know” unless something changes.

Conway’s Law. The architecture mirrors the three-city organisation it was built for. The core platform has code paths for Perth, Melbourne, and Brisbane baked into the logic – if city == "perth" conditionals that Tom wrote in the early days and never refactored. Adding Adelaide meant adding a fourth conditional. Adding Sydney (via the API boundary) meant adding a fifth, different pattern. Adding Hobart will mean a sixth.
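For readers who haven’t lived inside a codebase like this, the shape of the problem fits in a few lines. This is a sketch, not GreenBox’s actual code (the series only ever shows that one fragment, so every name and value below is illustrative):

```python
# Illustrative only: the anti-pattern Tom describes, then the config-driven
# shape that removes it. None of these names are from GreenBox's real code.

def delivery_cutoff(city: str) -> str:
    # The year-one pattern: every new city means editing core logic.
    if city == "perth":
        return "wed 18:00"
    elif city == "melbourne":
        return "wed 16:00"
    elif city == "brisbane":
        return "wed 17:00"
    raise ValueError(f"unknown city: {city}")  # Adelaide? Hobart? More elifs.

# The alternative: city behaviour lives in data. Launching a city is a new
# entry, not a new branch, and the core code never changes.
CITY_CONFIG = {
    "perth":     {"cutoff": "wed 18:00", "courier_api": "swiftpost"},
    "melbourne": {"cutoff": "wed 16:00", "courier_api": "swiftpost"},
    "brisbane":  {"cutoff": "wed 17:00", "courier_api": "qcourier"},
    "adelaide":  {"cutoff": "wed 18:00", "courier_api": "swiftpost"},
}

def delivery_cutoff_v2(city: str) -> str:
    return CITY_CONFIG[city]["cutoff"]
```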

The bounded contexts from the DDD work saved the architecture from being a complete monolith. The supply matching, customer profiles, delivery logistics, and payment systems are separate services with defined interfaces. But the delivery infrastructure – the pipeline, the monitoring, the deployment process – was designed when GreenBox had three cities, two squads, and Tom personally overseeing every release. The architecture of the business logic scales. The architecture of the delivery system doesn’t.

Jess

Tom hires GreenBox’s first Site Reliability Engineer in early February. Her name is Jess Okonjo, she’s thirty, and she arrives from Atlassian’s Sydney office with a systems engineering degree from UNSW and the particular calm of someone who has been on-call for production systems serving millions of users and finds a thirty-thousand-subscriber produce box company refreshingly manageable.

Jess is direct. Not Charlotte-direct – Charlotte is direct with purpose and precision. Jess is direct with data and indifference to hierarchy. She says what the numbers say, regardless of who’s in the room.

Tom interviews her on a video call. She’s sitting in her apartment in Newtown, Sydney, with a whiteboard behind her that has a half-finished system diagram and a sticky note that reads “ALL SYSTEMS FAIL.”

“Why GreenBox?” Tom asks.

“Because you’re about to break.” She says it matter-of-factly, without malice. “I read the job listing. You’re hiring your first SRE at eight cities. That means your infrastructure was built for fewer cities than you have, your deploy process is manual, your monitoring has gaps, and your test suite is either too slow or too unreliable. Probably both.”

Tom is quiet for a moment. “How do you know all that?”

“Because every company that hires their first SRE at this stage has exactly those problems. The specifics vary. The pattern doesn’t.”

Tom hires her. She starts two weeks later.

Jess spends her first week doing what Diane did when she first arrived at GreenBox: watching. She sits in on the Monday engineering standup. She reads the deploy logs. She traces the test suite failures. She maps the monitoring coverage. She talks to every squad’s lead developer. She reads the ADRs from the architecture decision records work.

On Friday afternoon, she presents her findings. Present: Tom, Priya (video), Charlotte (video), Maya (listening from the doorway because she wants to understand but doesn’t want to intimidate). The assessment is two pages. The summary is brutal.

“Your deploy pipeline is your biggest bottleneck. Your monitoring is a patchwork. Your test suite is a lottery. And your architecture assumes you’ll never have more than three cities.”

Tom doesn’t flinch. He’s been saying this for months. Hearing it from someone with fresh eyes and an Atlassian pedigree makes it real in a way that his own internal warnings didn’t.

“The good news,” Jess continues, “is that the foundation is solid. The bounded contexts are well-designed. The service interfaces are clean. The domain logic – the substitution engine, the customer matching, the farm availability system – is genuinely good. The problem isn’t the architecture. It’s the infrastructure around the architecture. You built a house with excellent rooms and no plumbing.”

Priya laughs. It’s the first time anyone has laughed in an engineering meeting in weeks.

Measuring the damage

Jess introduces a framework that Tom has heard of but never applied: DORA metrics. Four measures of software delivery performance, developed by the DevOps Research and Assessment team over a decade of industry research.

Deploy frequency. How often does the team deploy to production? Industry benchmark for high performers: on demand, multiple times per day. GreenBox: once a week, sometimes less. Each deployment is a planned event that requires coordination, a deployment slot, and Tom’s personal attention.

Lead time for changes. How long does it take from commit to code running in production? High performers: less than one day. GreenBox: five days. A developer finishes a feature on Monday. The PR is reviewed by Wednesday. It reaches the deploy queue on Thursday. The deployment completes on Friday. If the deploy queue is backed up, it slips to the following week.

Change failure rate. What percentage of deployments cause a failure in production? High performers: less than 5%. GreenBox: approximately 20%. One in five deployments causes something to break – a test environment mismatch, a configuration error, a city-specific conditional that wasn’t updated. The four hours of wrong Adelaide boxes was one such failure.

Time to restore service. When a failure occurs, how long does it take to recover? High performers: less than one hour. GreenBox: varies wildly. Perth failures are caught quickly because the monitoring is comprehensive. Adelaide failures take four hours because there’s no monitoring. Sydney failures are unpredictable because the Harvest Box integration adds complexity.
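One reason the DORA framework travels so well is how little machinery the four numbers need. Here is a minimal sketch, assuming a deployment record with commit, deploy, and recovery timestamps (the field names are assumptions, not GreenBox’s actual schema):

```python
# Illustrative only: computing the four DORA metrics from deployment records.
# Assumes at least one deploy in the period.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Deploy:
    committed_at: datetime        # first commit in the release
    deployed_at: datetime         # landed in production
    failed: bool                  # did it cause a production failure?
    restored_at: datetime | None  # when service was restored, if it failed

def dora_metrics(deploys: list[Deploy], period_days: int) -> dict:
    lead_times = sorted(d.deployed_at - d.committed_at for d in deploys)
    failures = [d for d in deploys if d.failed]
    restores = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deploy_frequency_per_day": len(deploys) / period_days,
        "median_lead_time": lead_times[len(lead_times) // 2],
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_restore": sum(restores, timedelta()) / len(restores)
                                if restores else None,
    }
```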

Jess puts the numbers on a whiteboard in the Perth office. She doesn’t annotate them with “good” or “bad.” She lets the numbers speak.

Tom stares at the whiteboard. He’s a developer who takes pride in his craft. These numbers say: the code is good, but the system for delivering the code is broken. It’s the engineering equivalent of a chef who makes excellent food but can’t get it to the table while it’s still hot.

“These are embarrassing,” Tom says.

“They’re honest,” Jess says. “And they’re fixable. The question is: what do you want them to be in six months?”

Tom thinks about the Cerulean Ventures pitch. He told the tech partner: “Six months to get deployment automation and golden signals in place. Twelve months to full platform maturity.” That was eight months ago. He’s behind. The acquisition consumed two months. The national expansion consumed three more. The platform work kept getting deferred because there was always something more urgent.

“Deploy frequency: daily. Lead time: less than one day. Change failure rate: under 5%. Time to restore: under one hour.”

“Good,” Jess says. “Now let me tell you what that requires.”

The golden signals

Before Jess can fix the deploy pipeline, she needs to see the systems. All of them. Properly.

She sets up monitoring dashboards using the golden signals framework that Patricia mentioned in the board room and Tom wrote down but never implemented. Four signals, measured for every service across every city.

Latency. How long do requests take? The substitution engine responds in 200ms for Perth and Melbourne (where the data is well-established) and 800ms for Adelaide and Sydney (where the data is thinner and the queries hit cold caches). Jess flags the disparity. “Your new cities are four times slower than your old cities. Subscribers in Adelaide are getting a noticeably worse experience.”

Traffic. How many requests per second? The patterns are predictable – spikes on Thursday (delivery day), Tuesday (customer interview insights feeding into the substitution engine), and Monday (weekly planning sync). But Sydney’s traffic pattern is different because the Harvest Box integration adds an extra API hop. Every subscriber request goes through GreenBox’s platform, then through the API boundary, then to the Harvest Box Rails app, then back. The round trip adds 300ms and generates twice the traffic on the API layer.

Errors. What percentage of requests fail? The overall error rate is 0.3% – acceptable. But Jess disaggregates by city and finds that Sydney’s error rate is 1.2%, four times the national average. The Harvest Box API boundary is intermittently failing – connection timeouts, mostly, when the Heroku dyno goes to sleep during low-traffic periods.

Saturation. How close are the systems to their resource limits? The database is at 60% capacity, which is fine. The message queue is at 80%, which is not fine – the substitution engine’s async processing is backing up during Thursday peak delivery, and the queue doesn’t drain until Friday morning. The deploy environments share a single build server that runs at 95% during deployments, which is why builds take 45 minutes instead of 15.
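The disaggregation step is the one that matters most here. A national error rate of 0.3% hides a Sydney at 1.2%; slicing the same signals per city makes the outlier visible. A sketch of that slicing, with illustrative field names:

```python
# Illustrative: the same golden signals, sliced per city, so an Adelaide
# problem can't hide inside a national average. Field names are assumptions.
from collections import defaultdict

def signals_by_city(requests):
    """requests: iterable of dicts like
       {"city": "sydney", "latency_ms": 812, "ok": False}"""
    buckets = defaultdict(lambda: {"count": 0, "errors": 0, "latency_sum": 0})
    for r in requests:
        b = buckets[r["city"]]
        b["count"] += 1
        b["errors"] += 0 if r["ok"] else 1
        b["latency_sum"] += r["latency_ms"]
    return {
        city: {
            "traffic": b["count"],
            "error_rate": b["errors"] / b["count"],
            "mean_latency_ms": b["latency_sum"] / b["count"],
        }
        for city, b in buckets.items()
    }
```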

Jess creates dashboards for all of this and projects them on a screen in the Perth office. For the first time, the engineering team sees their systems properly. Not through Tom’s personal knowledge (“I think Adelaide is slow”), not through subscriber complaints (“my substitutions were wrong”), but through data that updates in real time and tells a story.

Tom stands in front of the dashboards for a long time. He’s quiet – the Tom-quiet that Sarah recognises as processing, not anger.

“I’ve been running this on instinct,” he says. “Personal knowledge. Mental models. Knowing which systems are fragile because I built them and remember the shortcuts I took.”

Jess nods. “That works at three cities. It doesn’t work at eight. The monitoring replaces your memory with something that doesn’t go on holiday or get distracted by acquisitions.”

The platform team

The monitoring reveals the problems. Fixing them requires a structural change that Tom has been avoiding.

GreenBox’s engineering team is organised by city. Perth squad, Melbourne squad, Brisbane squad, and so on. Each squad builds features for its city – customer-facing improvements, farm portal updates, delivery route optimisations. This structure worked well when each city was relatively independent. But the platform – the deploy pipeline, the monitoring, the shared infrastructure, the API boundary – belongs to no squad. It belongs to Tom, personally, and to whoever he can borrow for a few days when something breaks.

Jess names the problem using language from Team Topologies, a book she gives Tom to read.

“You have stream-aligned teams,” Jess says. “Each squad is aligned to a stream of work – a city, a set of customers, a product area. That’s good. Stream-aligned teams are fast because they own their domain end to end.”

“But you have no platform team. The shared infrastructure – deploys, monitoring, CI/CD, the build server, the database, the API boundary – is nobody’s primary responsibility. It’s maintained by interrupt. Someone notices it’s broken, someone fixes it, everyone goes back to their feature work. The platform is maintained by goodwill, not by ownership.”

Tom has read about Team Topologies. Charlotte mentioned it during the Two Squads work. But at two squads, a dedicated platform team felt like overhead. At eight squads, it’s a necessity.

“I need to restructure the engineering team,” Tom says.

Charlotte, on the video call, leans forward. “Tell me what you’re thinking.”

Tom draws on the whiteboard. It’s the first architectural diagram he’s drawn in months that isn’t about the product – it’s about the organisation.

Stream-aligned teams (product squads): Three squads, each responsible for a cluster of cities. Perth/Adelaide/Hobart. Melbourne/Regional Victoria. Brisbane/Sydney. Each squad owns the customer-facing features, the local delivery logic, and the farm partnerships for their cities. They ship features. They talk to customers. They run the discovery cadence.

Platform team: Jess’s team. Four people initially – Jess, one developer pulled from the Perth squad, one from Melbourne, and a new hire (infrastructure specialist). They own the deploy pipeline, the monitoring, the CI/CD system, the shared databases, the build infrastructure, and the API boundary to the Harvest Box system. They don’t build features. They build the system that lets the feature teams ship.

Charlotte sees what Tom is doing. “You’re applying Conway’s Law deliberately.”

“Yes. Right now the architecture mirrors the old three-city org – because that’s the org that built it. The deploy pipeline is manual because one person (me) did the deploys. The monitoring covers three cities because three cities was the scope. I need the architecture to mirror the org I want, not the org I had.”

“And the org you want has a platform team.”

“The org I want has a platform team that makes the stream-aligned teams faster. Not a team that gates the stream-aligned teams. Not a team that controls deploys. A team that makes deploys so easy that any developer can ship to any city without waiting.”

Jess nods. “That’s the right framing. The platform team’s success metric isn’t uptime – it’s developer velocity. If the stream-aligned teams are shipping faster and more reliably, we’re doing our job.”

Building the pipeline

The platform team begins work in the second week of February. Jess runs it like a startup within a startup – small team, tight scope, weekly demos to the engineering team.

Week one: Automated deploys. Jess and the platform team replace the manual deployment process with an automated CI/CD pipeline. Every pull request triggers a build. Every merged PR triggers a deployment to a staging environment. Every approved staging deployment triggers a production rollout.

The key innovation: parallel deployments. Instead of deploying to eight cities sequentially (six hours), the pipeline deploys to all eight cities simultaneously (forty-five minutes). Each city’s deployment is independent – if Adelaide fails, Perth continues. The failure is caught, reported, and fixed without blocking the entire release.
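The fan-out itself needs surprisingly little code. A minimal sketch using Python’s standard library, where deploy_city stands in for the real build, deploy, and smoke-test of one environment, and the city list is illustrative (the series never names all eight):

```python
# Sketch of the parallel rollout. deploy_city() is a stand-in for the real
# build/deploy/smoke-test of one city's environment.
from concurrent.futures import ThreadPoolExecutor, as_completed

CITIES = ["perth", "melbourne", "brisbane", "adelaide",
          "sydney", "hobart"]  # illustrative, not the full real list

def deploy_city(city: str, release: str) -> None:
    """Stand-in: build, deploy, smoke-test one city; raise on failure."""

def deploy_all(release: str) -> dict[str, str]:
    results = {}
    with ThreadPoolExecutor(max_workers=len(CITIES)) as pool:
        futures = {pool.submit(deploy_city, c, release): c for c in CITIES}
        for fut in as_completed(futures):
            city = futures[fut]
            try:
                fut.result()
                results[city] = "deployed"
            except Exception as exc:
                # Adelaide failing doesn't block Perth: the failure is
                # reported and fixed without holding the whole release.
                results[city] = f"failed: {exc}"
    return results
```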

Tom watches the first automated deploy from his desk. The pipeline runs. Eight cities, simultaneously. Green across the board. Forty-three minutes.

He messages Sarah: Remember when I said I’d spend my Sunday doing the deploy? I don’t have to do that any more.

Sarah: Good. You were insufferable on deploy Sundays.

Week two: Test suite. Jess attacks the flaky tests with the pragmatism of someone who has fixed test suites at Atlassian scale. She quarantines the three flaky tests – moves them out of the main suite into a separate “known issues” suite that runs in parallel but doesn’t block the build. The three tests are tagged with owners and deadlines for fixing the root cause. The main suite drops from ninety minutes to fifty-five.

Then she parallelises the test suite. Instead of running all tests sequentially on one machine, the tests run across four machines simultaneously. Fifty-five minutes becomes fourteen.
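If the stack is pytest (a guess; the series never names it), quarantine is just a marker plus a filter, and the parallelism is pytest-xdist’s worker flag:

```python
# Sketch of quarantining with pytest markers. The marker name is an
# assumption; it would also need registering under `markers` in pytest.ini.
import pytest

@pytest.mark.quarantine  # known flaky: async timing in the substitution path
def test_substitution_async_ordering():
    ...

def test_substitution_respects_preferences():
    ...

# Main build:        pytest -m "not quarantine" -n 4   (pytest-xdist workers)
# Known-issues job:  pytest -m quarantine              (runs, never blocks)
```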

A green build now means something again. Developers start trusting the pipeline. When a test fails, it means something broke – not that the universe was feeling capricious.

Priya, from Melbourne, sends a message to the engineering channel: “I just deployed a feature at 2pm and it was in production at 2:20pm. Last month that would have taken until Friday. I could cry.”

Week three: Monitoring. The golden signals dashboards are live, but they’re passive – they show data, they don’t alert. Jess builds alerting for all eight cities. Latency above 500ms: alert. Error rate above 1%: alert. Queue saturation above 85%: alert. Database approaching capacity: alert.
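Those thresholds translate almost directly into declarative rules. In practice they would live in the monitoring system’s own rule format; this sketch, with illustrative names, just shows the shape:

```python
# Illustrative alert rules mirroring the thresholds in the text.
ALERT_RULES = [
    {"signal": "latency_ms",       "above": 500,  "message": "latency above 500ms"},
    {"signal": "error_rate",       "above": 0.01, "message": "error rate above 1%"},
    {"signal": "queue_saturation", "above": 0.85, "message": "queue above 85%"},
]

def evaluate(city: str, snapshot: dict) -> list[str]:
    """snapshot: current golden-signal values for one city."""
    return [
        f"[{city}] {rule['message']} (now {snapshot[rule['signal']]})"
        for rule in ALERT_RULES
        if snapshot.get(rule["signal"], 0) > rule["above"]
    ]
```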

The alerts go to a shared channel: #platform-alerts. The platform team responds during business hours. After hours, the on-call rotates between platform team members. No more Tom-as-single-point-of-failure.

The first week of alerting produces forty-seven alerts. Most are threshold calibration issues – the alerts are too sensitive and fire on normal traffic spikes. Jess tunes them. By week two, the alerts are down to three per day, each one meaningful.

The Adelaide substitution failure – four hours of wrong boxes – would now be caught in under five minutes. The monitoring would detect the stale data, alert the platform team, and the fix would be deployed through the automated pipeline. Five minutes instead of four hours. That’s the difference between “a system that works” and “a system that knows when it’s broken.”

Week four: Feature flags and canary deployments. Jess introduces two practices that change how GreenBox ships software.

Feature flags allow the team to deploy code to production without activating it for users. A new substitution algorithm can be in the production codebase, tested against real data, but invisible to subscribers until the flag is turned on. This decouples deployment from release – the code ships continuously, the features activate when the team is confident.
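The mechanism is almost embarrassingly simple, which is the point. A sketch, with the flag store and both engine versions as stand-ins:

```python
# Sketch of deploy/release decoupling. In production the flag store would be
# a service or config, queryable per city or cohort; a dict shows the idea.
FLAGS = {"new_substitution_engine": False}

def substitute_v1(box, preferences):
    """Stand-in: the current algorithm."""

def substitute_v2(box, preferences):
    """Stand-in: the new algorithm, deployed but dark until enabled."""

def substitute(box, preferences):
    if FLAGS["new_substitution_engine"]:
        return substitute_v2(box, preferences)
    return substitute_v1(box, preferences)  # flips when the team is confident
```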

Canary deployments roll out changes to a small percentage of users first. A new delivery route optimisation deploys to 5% of Perth subscribers. If the error rate stays below the threshold, it rolls out to 25%, then 50%, then 100%. If the error rate spikes, the deployment automatically rolls back.
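Under the hood, the staged rollout is a loop with a bake time and an exit condition. A sketch, where set_rollout_percentage and error_rate_for stand in for the real routing layer and the golden-signals monitoring:

```python
# Sketch of the canary loop described above. The two helpers are stand-ins.
import time

STAGES = [5, 25, 50, 100]   # percent of subscribers
ERROR_THRESHOLD = 0.01      # matches the 1% alerting threshold
BAKE_SECONDS = 15 * 60      # watch each stage before expanding

def set_rollout_percentage(feature: str, pct: int) -> None:
    """Stand-in: route pct% of traffic to the new code path."""

def error_rate_for(feature: str) -> float:
    """Stand-in: current error rate for requests on the new path."""
    return 0.0

def canary_rollout(feature: str) -> bool:
    for pct in STAGES:
        set_rollout_percentage(feature, pct)
        time.sleep(BAKE_SECONDS)
        if error_rate_for(feature) > ERROR_THRESHOLD:
            set_rollout_percentage(feature, 0)  # automatic rollback
            return False
    return True  # 100%, flat latency, nobody heroic
```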

Tom, who once spent a weekend manually rolling back a botched delivery tracking release, watches the first canary deployment with something close to awe. The system deploys to 5% of subscribers. The monitoring watches. The latency is flat. The error rate is zero. The deployment expands to 25%. Still flat. 50%. Still flat. 100%. Done.

“That’s what it’s supposed to feel like,” Jess says. “Boring. Reliable. Nobody heroic.”

Infrastructure as code

The final piece of Jess’s first-month sprint is the one that makes Charlotte smile: infrastructure as code.

GreenBox’s infrastructure – the servers, the databases, the message queues, the build machines, the monitoring systems – has been managed manually. Tom set up Perth’s infrastructure by hand. He set up Melbourne’s by copying Perth’s configuration and modifying it. Brisbane was a copy of Melbourne. Adelaide was a copy of Brisbane. Each copy introduced small differences – a different database version, a different queue configuration, a slightly different monitoring threshold.

These differences are why Adelaide had the monitoring gap. When Tom copied Brisbane’s infrastructure for Adelaide, he forgot to copy the alerting configuration. A single missed setting, invisible until it mattered.

Jess writes the entire infrastructure as code – Terraform and AWS CDK definitions that describe every server, every database, every queue, every monitoring dashboard, every alerting threshold. The infrastructure is versioned, reviewed, and deployed through the same pipeline as the application code.

Adding a new city no longer means Tom spending a weekend manually configuring servers. It means adding a city configuration file and running the pipeline. The infrastructure deploys itself. Hobart, which launches next month, will be the first city deployed entirely through infrastructure as code. Jess estimates setup time: two hours. The previous record (Adelaide) was three weeks.
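The text says Terraform and AWS CDK; without reproducing either, the idea fits in pseudo-Python. The property that matters is that alerting is part of the city definition, so an Adelaide-style gap can’t recur by omission. Everything below is illustrative:

```python
# Sketch of config-driven provisioning. The create_* helpers stand in for
# the real Terraform / CDK resources; all values are illustrative.
CITY_DEFAULTS = {
    "db_instance": "db.t3.medium",
    "queue_workers": 4,
    "alert_latency_ms": 500,   # alerting ships with every city by default,
    "alert_error_rate": 0.01,  # so a new city can't launch without it
}

HOBART = {**CITY_DEFAULTS, "name": "hobart"}

def create_database(name, size): ...
def create_queue(name, workers): ...
def create_alerts(name, latency_ms, error_rate): ...

def provision(city: dict) -> None:
    create_database(city["name"], city["db_instance"])
    create_queue(city["name"], city["queue_workers"])
    create_alerts(city["name"], city["alert_latency_ms"],
                  city["alert_error_rate"])
```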

Charlotte reviews the infrastructure code during a Friday engineering demo. She watches Jess deploy a complete staging environment for a hypothetical ninth city in four minutes.

“This is what I meant when I said GreenBox needed platform maturity,” Charlotte says. “Not more code. Not more features. The ability to deploy confidently, monitor continuously, and scale without manual heroics.”

Tom is in the room. He’s watching the demo with an expression Charlotte hasn’t seen before – not the resistant scepticism of year one, not the reluctant acceptance of year two. Something quieter. Relief, maybe. Or recognition.

Tom’s leadership moment

The platform work takes a month. By the end of February, the numbers tell the story.

| Metric | Before | After | Target |
| --- | --- | --- | --- |
| Deploy frequency | Weekly | Daily (often multiple) | Daily |
| Lead time | 5 days | 4 hours | < 1 day |
| Change failure rate | 20% | 6% | < 5% |
| Time to restore | 4 hours (Adelaide) | 12 minutes (average) | < 1 hour |
| Test suite duration | 90 minutes | 14 minutes | < 15 minutes |
| Deploy duration | 6 hours (sequential) | 43 minutes (parallel) | < 1 hour |

Tom presents these numbers at the quarterly engineering review. Charlotte, Diane, Maya, Patricia (video), and all eight squad leads are present.

He doesn’t present them as a triumph. He presents them as a lesson.

“Eight months ago, I told the Cerulean investors we’d have deployment automation and golden signals in six months. It took eight months. The acquisition delayed us. The national expansion delayed us. And – honestly – I delayed us, because I kept thinking the platform work could wait while we shipped features.”

He pauses. “The Adelaide incident taught me that the platform isn’t separate from the features. When the platform breaks, the features break. When the deploy pipeline is slow, the features are slow. When the monitoring has gaps, the failures are invisible. The platform is the foundation. Everything else sits on top of it.”

Charlotte, who has been waiting three years for Tom to say this, doesn’t say “I told you so.” She says: “What would you have done differently?”

Tom considers the question. He’s standing in front of a room of people he leads – not just manages, leads. He’s responsible for the technical decisions of a company with eighty people and thirty thousand subscribers. The weekend prototype builder from Series 5 – the one Charlotte called “week one vibes” – is still in there. But the person standing in this room is different.

“I would have hired Jess six months earlier. I would have started the platform team when we hit five cities, not eight. I would have measured the DORA metrics from the start, not after the crisis. And I would have listened to Priya, who told me two years ago that the architecture mirrors the org structure, and if we wanted a different architecture, we needed a different org structure.”

Priya, on the Melbourne video feed, doesn’t say anything. She doesn’t need to.

Charlotte asks the harder question. “What did you learn about your own role?”

Tom looks at the whiteboard. The DORA metrics. The golden signals. The platform team structure. The infrastructure as code. None of this is code he wrote. Jess designed the pipeline. Priya reviewed the architecture. The platform team implemented it. Tom’s contribution was: recognising the problem, hiring the right person, restructuring the team to address it, and getting out of the way.

“I learned that the engineering lead’s job isn’t to build the platform. It’s to make sure the platform gets built.” He pauses. “In year one, I was the builder. In year two, I was the architect. In year three, I’m the – I don’t know what the word is.”

“Leader,” Charlotte says.

Tom doesn’t argue.

That evening, he’s at home. Sarah is marking student papers at the kitchen table. Leo is building something elaborate with LEGO on the living room floor. Ava is reading, as always.

“How was work?” Sarah asks, not looking up from the papers.

“I presented to the leadership team. The platform numbers.”

“Did they go well?”

“Yeah. Charlotte asked what I learned about my role. I said something about not building things myself any more.”

Sarah puts her pen down. “Is that hard?”

“It used to be. Today it wasn’t.” He pauses. “Jess designed the deploy pipeline. Priya reviewed the architecture. The platform team built it. I just – held the space for it to happen.”

“That sounds like something Charlotte would say.”

“It’s something Priya said. Two years ago. I didn’t listen.”

Sarah picks up her pen. “You listened eventually. That counts.”

Tom opens his laptop. He has a message from Jess: The Hobart infrastructure deployed in 1 hour 47 minutes. Fully automated. Zero manual steps. First city launched entirely through the platform.

He types back: Show me the dashboard.

Jess sends a screenshot. Eight cities. Green across the board. Latency nominal. Error rates below threshold. Deploy pipeline clear. The system is healthy, and the system is telling them so.

Tom remembers the weekend delivery tracking prototype from year two. The one he built in two days, alone, in his home office after the kids were asleep. Charlotte said “week one vibes” – meaning: you’re still the solo builder, solving problems alone because it’s faster than working with the team.

Now he’s looking at a platform that eight squads depend on, built by a team he structured, designed by an engineer he hired, monitored by dashboards he didn’t build. The LLM still writes a lot of the code – Jess’s team uses it extensively for the Terraform modules and the alerting configurations. But the LLM didn’t decide to hire Jess. The LLM didn’t restructure the team into stream-aligned and platform. The LLM didn’t measure the DORA metrics and present them to the leadership team with an honest assessment of what went wrong.

The LLM wrote the code. But the team wrote the thinking.

Same principle as the ensemble programming sessions. Same principle as the Event Storm. Same principle as the very first Example Mapping session in a room above a cafe, when the whole company was five people and two hundred subscribers.

The tools don’t matter. The thinking does. And the thinking is always, always a team sport.

Charlotte and Diane and Tom

There’s a moment, late on the Friday after the quarterly review, when Charlotte, Diane, and Tom are the last three people in the Perth office. Maya has gone home. Sam left at five. Jas is in a video call with the Sydney designer. The office is quiet.

Charlotte is updating her spreadsheet. Row 47. GreenBox. She adds the DORA metrics to a column she’s labelled “Platform Maturity.” She gives GreenBox an 8 out of 10. Two months ago it was a 3.

Diane is on her phone, texting someone – probably the Sunridge buyer, with whom she maintains a complicated professional relationship. She puts the phone down and looks at Tom.

“The engineering needed structure,” Charlotte says, not looking up from her spreadsheet.

“The engineering needed to stop being the bottleneck to the business,” Diane says.

Tom looks at both of them. “You’re both right. And it took me too long to figure out how to do both at once.”

Charlotte closes her laptop. “You didn’t just do both. You built a system that delivers both. That’s the difference between a lead who makes decisions and a leader who builds the organisation that makes decisions.”

Diane nods. “When I scaled Sunridge, I was the bottleneck until the day I sold the company. I never built the platform team. I never hired my Jess. I just ran faster until I couldn’t run any more.” She pauses. “You didn’t do that. You hired someone better than you at the thing you couldn’t do, and you let them do it.”

Tom is uncomfortable with the compliment. He deflects with humour, the way he always does. “Jess would be insulted to hear you describe infrastructure as ‘the thing I couldn’t do.’ I could do it. I just couldn’t do it and also lead eight squads and also integrate an acquisition and also parent two children.”

“That’s the point,” Charlotte says. “Nobody can. That’s why you build teams.”

The office is quiet. The Event Storm photos on the wall. The planning onion, updated for the third year. The DORA metrics on the whiteboard, still there from Tom’s presentation. The golden signals dashboard on the wall-mounted screen, eight cities, all green.

Tom looks at the dashboard. “Eight cities. All green.”

Charlotte: “For now.”

Diane: “That’s all ‘for now’ ever means.”

Tom smiles. It’s the first genuine smile he’s had in the office in weeks. Not because the problems are solved – they’re not. The change failure rate is 6%, not the 5% target. The Harvest Box API boundary is still a maintenance burden. The test suite has two quarantined flaky tests that nobody has root-caused. Hobart launches next month and they’ll discover new problems.

But the system works. The pipeline flows. The monitoring watches. The teams ship code daily instead of weekly. The golden signals are golden. And when something breaks – which it will, because everything breaks – the system will tell them, and they’ll fix it, and they’ll ship the fix in minutes instead of days.

The hook

Eight cities. Eighty people. The platform works. The pipeline flows. The culture holds – mostly. And then Hartland Group calls.

Not about Freshly. Not about a competitive move. Not about the market.

About GreenBox.

The next series, The Big Table (coming February), follows GreenBox into the conversations that will define its future.

Questions or thoughts? Get in touch.