GreenBox is a produce-box company delivering to 1,200 subscribers across Perth and Melbourne, with a team of fifteen. The architecture is clean, the domain logic is captured in decision tables and ADRs – but now someone wants to build a delivery tracking system, and the question is whether they should.
Sam is drowning.
Her Tuesday starts at 7am, which is already later than she planned. She’d set her alarm for 6:30 to get ahead of the delivery confirmations, but she hit snooze twice. She showers, feeds herself standing at the kitchen counter, and opens the spreadsheet on the bus.
The spreadsheet has seventeen tabs. One for each courier. One for each city. One for the routes that don’t fit neatly into either. Sam has colour-coded it – green for confirmed, yellow for in transit, red for problems. By 8am, there are five red cells. By noon, there are twelve.
GreenBox has a thousand subscribers in Perth. Melbourne is live, with two hundred and growing fast. Fifteen people on the team. Three new farms in the Yarra Valley. And Sam – who has handled customer operations since the very beginning – is still managing delivery logistics with this spreadsheet and a phone.
At 1pm, the Melbourne courier calls. Three boxes didn’t make the morning run. Wrong addresses on the dispatch sheet – the new farm onboarding data had mismatched postcodes. Sam spends forty minutes on the phone sorting it out, while Perth emails stack up unanswered.
At 2:30, a Perth subscriber messages: “My box hasn’t arrived. It’s 38 degrees and I’ve got dairy in there.” Sam checks the spreadsheet. The courier shows “in transit.” She calls. Hold music. Four minutes of hold music. The courier confirms: the driver is running late. ETA two hours. Sam calls the customer back, apologises, and offers a replacement for anything that spoils. The customer is gracious. Sam adds a row to the “issue log” tab.
At 4pm, another Melbourne customer emails asking “where’s my box?” Then another. Then three more. Sam checks the tracking for each one individually – open the spreadsheet, find the customer, find the courier, call the courier, get an update, email the customer. Five customers. Five calls. Five individual emails. The Melbourne courier doesn’t have a self-service tracking page. Everything goes through Sam.
By 6pm, the red cells outnumber the green ones. She’s been on the phone for four straight hours. She missed two customer complaints that came in while she was dealing with the Melbourne courier. She finds them at 6:15 and her stomach drops – one is from a subscriber who’d already flagged a problem last week. Two problems in two weeks. The subscriber has cancelled by the time Sam replies.
Sam leaves the office at 6:30. She sits in her car in the car park. It’s dark – July in Perth, dark by 5:30. She puts her hands on the steering wheel and cries. Not dramatically. Quietly, with the engine running and the heater on and her forehead against the wheel. She cries because she’s been at this for eleven hours and she lost a subscriber anyway. She cries because she knows tomorrow will be the same.
Then she feels angry at herself for crying. She’s twenty-seven. She’s good at this job. She handles things. That’s who she is – Sam handles things.
She calls her mum.
“You’re not a machine, Samara,” her mum says. Her mum only uses her full name when she’s worried. “You can’t do everything yourself. Nobody can.”
“I know, Mum.”
“Do you? Because you sound exactly like you did during finals.”
Sam drives home. She eats leftover rice from the fridge. She goes to bed at 9pm and sets her alarm for 6:30 again.
On Monday, at standup, she says what she’s been needing to say for weeks.
“I need a system. Delivery tracking. Route optimisation. Real-time notifications so customers stop emailing me to ask where their box is.” She hesitates, then adds: “Customers are comparing us to Freshly. They’ve got real-time tracking in their app. Our customers email me and I call a courier. That’s the gap.”
Everyone agrees. The question is: how?
Tom’s instinct
Tom has been watching Sam struggle for weeks. He’s already been thinking about it. In fact, he’s been thinking about it since Sam’s Tuesday meltdowns started becoming visible – the dark circles, the clipped Slack messages, the way she stopped coming to Friday drinks because she was still answering emails at 7pm. Tom solves problems with code. That’s who he is. And this is a problem that code can solve.
“Give me a week,” he says. “I’ll build it. The LLM can generate a tracking system – courier API integrations, a customer-facing status page, automated notifications. I’ve already scoped it out in my head. Five days, maybe six.”
He’s probably right. Tom is fast with an LLM. He built the substitution engine from Charlotte’s decision tables in three days. A delivery tracking system is conceptually simpler – poll the courier API, update a status, send a notification. The code isn’t hard. With an LLM generating the boilerplate, Tom could have a working prototype by Friday. He’d already started sketching the data model in his notebook during last week’s standup.
Charlotte has been listening. She doesn’t disagree with Tom’s estimate. She disagrees with the question.
“Before we talk about how to build it,” she says, “let’s talk about whether we should build it.”
Tom’s face changes. It’s subtle – a tightening around the jaw, the slight lean back in his chair that Priya has learned to read as “I’m about to argue.” He’d already scoped this project. He’d already imagined the clean architecture, the courier adapter pattern, the notification pipeline. In his head, the delivery tracker was halfway built.
“Why wouldn’t we? We need it. I can build it. The LLM makes it cheap.”
“Cheap to build,” Charlotte says. “But is it cheap to own?”
The map
Charlotte introduces Wardley Mapping. The technique comes from Simon Wardley, and the core idea is disarmingly simple: plot everything in your value chain on two axes, and use the map to make strategic decisions about what to build, what to buy, and what to borrow.
The vertical axis is visibility to the user. Things at the top are directly visible – the customer experience, the product itself. Things at the bottom are invisible infrastructure – servers, payment processing, logistics.
The horizontal axis is evolution. Things on the left are novel, custom, poorly understood – genesis and custom-built. Things on the right are well-understood, standardised, commodity – products and utilities. Everything moves left to right over time as practices mature and markets form.
Charlotte draws the axes on the whiteboard and writes four labels along the bottom: Genesis, Custom-Built, Product, Commodity.
“Let’s map GreenBox,” she says.
Mapping the value chain
The team spends an hour placing GreenBox’s capabilities on the map. Charlotte facilitates.
Lee dials in from his home office near Margaret River. He’s been mostly absent during Series 3 – Charlotte has the scaling expertise he doesn’t, and Lee is honest enough to step back when someone else is better positioned to help. But this session is strategic, not structural, and Lee has opinions about strategy.
They start at the top. What does the customer see?
The box itself. The curated selection of seasonal produce, matched to preferences, sourced from local farms. This is what customers pay for. It goes at the top of the map, and it sits firmly on the left – custom, evolving, unique to GreenBox. No one else does it quite this way.
The substitution engine. The system that swaps produce when supply falls short, respecting allergens, preferences, and seasons. Also custom, also on the left. Built from Maya’s expertise and Charlotte’s decision tables. A competitive advantage.
The farm relationships. Maya’s network of local growers, the reliability data, the trust built over years. Custom. Far left. Can’t be bought.
Customer notifications. “Your box is on its way.” “Your delivery is scheduled for Thursday between 2 and 5.” Visible to the customer, but the notifications themselves are not unique. Every delivery service sends them. Middle of the map, trending right.
Delivery tracking. Knowing where a box is between the warehouse and the customer’s door. Important, but not unique. Couriers have APIs for this. Third-party tracking platforms exist. This sits on the right side of the map – it’s a commodity capability.
Route optimisation. Working out the most efficient delivery sequence. Well-solved problem. Dozens of providers. Commodity.
Payment processing. Already outsourced to Stripe (see ADR-013). Commodity. Far right.
Charlotte steps back and looks at the map.
“Look at where things cluster,” Charlotte says. “On the left, you’ve got the things that make GreenBox special. The curation, the substitution engine, the farm relationships. Nobody else has these. They’re your competitive advantage.”
She points to the right side. “Over here, you’ve got delivery tracking, route optimisation, payment processing. These are solved problems. Hundreds of companies do this. You’re not going to out-deliver Australia Post by building your own tracking system.”
The build-vs-buy calculus
Tom sees where this is going. “You’re saying don’t build the delivery tracking.”
“I’m saying let’s do the maths first,” Charlotte replies. “What does it actually cost to build it versus buying it?”
They work through the numbers together.
Option A: Build with LLM.
Tom estimates five days of development. At his loaded cost, that’s roughly $5,000. The LLM keeps the dev cost low – most of the courier API integration is boilerplate that generates cleanly.
But then what? The Perth courier uses one API. The Melbourne courier uses a different one. When they expand to Sydney, that’ll be a third. Each courier updates their API periodically. Each one has different error codes, different retry logic, different webhook formats.
Charlotte asks: “Who maintains the integrations when the courier changes their API?”
Tom: “Me, I guess. Or whoever’s available.”
“How often do courier APIs change?”
Priya checks. “The Perth courier pushed a breaking change three months ago. The Melbourne one changed their authentication in January.”
“So at least twice a year, someone drops what they’re doing to fix a courier integration. How long does that take?”
Tom: “Half a day, maybe a full day if it’s gnarly.”
Charlotte writes it on the whiteboard:
- Build cost: $5,000 (one-off)
- Ongoing maintenance: ~4 days/year at $1,000/day = $4,000/year
- Year 1 total: $9,000
- Year 2 total: $13,000
- Year 3 total: $17,000
And that’s assuming one integration per city. When they add Sydney, Adelaide, Brisbane – each new city adds a courier, and each courier adds maintenance burden.
Option B: Third-party tracking service.
Sam has already been looking at options. There’s a platform that integrates with every major Australian courier, provides real-time tracking, customer notifications, and a branded tracking page. $200 per month.
- Year 1 total: $2,400
- Year 2 total: $4,800
- Year 3 total: $7,200
No maintenance. No courier API changes to handle. No developer time. If a courier changes their API, the platform handles it. That’s their entire business.
Charlotte puts the two columns side by side.
“The LLM made Option A cheaper to build than it used to be. Five years ago, this integration would have cost $25,000 in developer time. The LLM brought it down to $5,000. That’s real. But the maintenance cost didn’t change. The LLM helps you build it fast. It doesn’t help you maintain it forever.”
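The two whiteboard columns can be reproduced with a few lines of arithmetic. This is a minimal sketch using only the figures quoted in the conversation (five build days and roughly four maintenance days a year at $1,000 a day, against a $200 monthly subscription); the `Option` class and its field names are illustrative, not part of GreenBox's codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    upfront: int  # one-off cost in dollars
    yearly: int   # recurring cost per year in dollars

    def cumulative(self, years: int) -> int:
        """Total spend after the given number of whole years."""
        return self.upfront + self.yearly * years


# Figures from the whiteboard: 5 days at $1,000/day to build,
# ~4 maintenance days a year, versus a $200/month subscription.
build = Option("Build with LLM", upfront=5 * 1_000, yearly=4 * 1_000)
buy = Option("Third-party service", upfront=0, yearly=12 * 200)

for year in (1, 2, 3):
    b, s = build.cumulative(year), buy.cumulative(year)
    print(f"Year {year}: build ${b:,} vs buy ${s:,} (gap ${b - s:,})")
```

Under these assumptions the gap in the last column grows by $1,600 a year, the difference between the yearly maintenance and the yearly subscription. Adding a courier per city would raise the `yearly` figure for the build option only.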
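Tom stares at the numbers. There is no crossover to wait for. The custom build costs more from day one: $5,000 up front against $200 for the first month. After year one it’s nearly four times the cost. After year two it’s still almost three times the cost, and the gap in dollars keeps widening. And it gets worse as they add cities.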
What the LLM changed, and what it didn’t
This is the insight Charlotte wants the team to internalise. LLMs have genuinely shifted the economics of software development. Things that used to be expensive to build are now cheap to build. That shift is real and significant.
But it’s only half the equation.
Software has two costs: building and owning. The building cost is the one-time investment to get something working. The owning cost is everything after – maintenance, bug fixes, API changes, security patches, onboarding new developers, handling edge cases you didn’t anticipate.
LLMs have driven down the building cost dramatically. A competent developer with an LLM can produce in a week what used to take a month. But the owning cost hasn’t changed much. The courier’s API still changes twice a year. The edge cases still emerge in production. The new developer still needs to understand the code to maintain it. The LLM helps you build the first version, but version 2, 3, 4 and the ongoing trickle of fixes – that’s where the real cost lives.
Charlotte draws it simply:
“The left side of this chart is where LLMs shine,” Charlotte says. “Month one, the build is cheaper than it has ever been. But zoom out. The subscription line stays underneath it. And they keep diverging.”
Lee, who’s been quiet on the call, adds: “There’s a hidden cost too. Every hour Tom spends maintaining a courier integration is an hour he’s not spending on the substitution engine or the curation algorithm. The opportunity cost of building commodity software is that you’re not building the stuff that actually differentiates you.”
He pauses, then says something that stays with the team. “Look at the map. Everything on the right side – delivery tracking, route optimisation, payment processing – Freshly has too. They probably have better versions. They’ve got twelve million dollars and a team of engineers in Sydney. You will never out-deliver Freshly on logistics. But look at the left side. The farm relationships. The curation. The substitution engine that knows Dave’s zucchini over-promises by twenty percent. That column, they don’t have. That column, they can’t buy.”
As if to underscore Lee’s point, Maya shares something that happened the previous week. Dave’s son Ben – thirty, broad-shouldered, more comfortable with a phone than his father – had started using the farm portal to submit weekly availability. Dave had watched him do it, standing behind Ben in the farmhouse kitchen, squinting at the screen.
“That’s more technology than I’ve used in fifty-eight years,” Dave had told Maya on the phone afterwards. A pause – Dave’s pauses could fill a room. “It’s quicker than calling you, though.”
The generational handoff was happening. Ben handled the portal. Dave handled the soil. The farm relationship wasn’t with a system. It was with a family.
The counter-example
Priya raises her hand. “So should we never build anything? Just buy everything?”
“No,” Charlotte says. “That’s the other trap. Let me show you.”
She points back at the Wardley Map. “The substitution engine. Where does it sit?”
Left side. Custom. Evolving. Core to the value proposition.
“Is there a third-party substitution engine you can buy?”
Maya laughs. “No. Nobody does what we do. The farm relationships, the seasonal logic, the preference matching – that’s us.”
“Exactly. The substitution engine is worth building because it’s your competitive advantage and no one else sells it. The decision tables, the generated code, the ongoing refinement – that’s investment in the thing that makes GreenBox GreenBox.”
Charlotte writes the rule on the whiteboard:
Build what differentiates you. Buy what’s commodity.
“The Wardley Map tells you which is which. Things on the left – custom, evolving, unique – build those. Things on the right – commodity, well-understood, many providers – buy those. The LLM shifted the build cost down, which means some things that were ‘buy’ become ‘build.’ But only if they’re close to the middle. Anything that’s firmly commodity – delivery tracking, route optimisation, payment processing – still makes more sense to buy.”
Tom builds it anyway
The team agrees with the analysis. They sign up for the third-party tracking service. Sam is relieved. The integration takes Priya two days using the platform’s API documentation and an LLM to generate the webhook handlers.
But Tom can’t let it go.
Over the weekend, he builds the delivery tracker anyway. He’s a developer. He sees a problem, he wants to solve it with code. The LLM generates the courier API integration for Perth in three hours. Tom adds the customer notification system on Saturday afternoon. By Sunday evening, he has a working prototype. It polls the Perth courier’s API, updates delivery statuses in real time, and sends SMS notifications when a box is thirty minutes away.
It’s beautiful. Clean code. Good test coverage. Works perfectly.
On Monday, he shows it to the team. “Look, I know we went with the third party. But I built this over the weekend and it’s better. The notifications are more granular. The tracking is real-time instead of the platform’s fifteen-minute delay.”
Charlotte looks at it. “It’s good work, Tom. Genuinely. Now show me the Melbourne integration.”
Tom pauses. “I haven’t done Melbourne yet.”
“Do it now. I’ll wait.”
Tom opens his LLM and starts prompting. The Melbourne courier uses a completely different API. Different authentication (OAuth2 instead of API keys). Different status codes (the Perth courier uses “in_transit” and “delivered”; the Melbourne courier uses “out_for_delivery”, “attempted”, and “completed”). Different webhook formats. Different time zone handling – the Perth courier returns timestamps in AWST; the Melbourne courier returns them in UTC.
Three hours later, Tom has a Melbourne integration. It works. But it’s ugly. The mapping between the two courier APIs is a mess of conditional logic. The time zone handling required a whole new module. The test suite doubled in size.
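The mapping Tom ends up with might look something like the sketch below. The courier status strings and the AWST/UTC split come from the description above; the internal `Status` names, the function names, and the assumption that both couriers send ISO 8601 timestamps without an offset are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class Status(Enum):
    """Hypothetical internal statuses that both couriers are mapped onto."""
    IN_TRANSIT = "in_transit"
    FAILED_ATTEMPT = "failed_attempt"
    DELIVERED = "delivered"


AWST = timezone(timedelta(hours=8), "AWST")  # Perth does not observe daylight saving

# Courier status strings are from the text; the targets are assumptions.
PERTH_STATUS = {"in_transit": Status.IN_TRANSIT, "delivered": Status.DELIVERED}
MELBOURNE_STATUS = {
    "out_for_delivery": Status.IN_TRANSIT,
    "attempted": Status.FAILED_ATTEMPT,
    "completed": Status.DELIVERED,
}


def normalise_perth(status: str, local_time: str) -> tuple[Status, datetime]:
    """Assumes Perth sends offset-free ISO 8601 timestamps in AWST."""
    stamp = datetime.fromisoformat(local_time).replace(tzinfo=AWST)
    return PERTH_STATUS[status], stamp.astimezone(timezone.utc)


def normalise_melbourne(status: str, utc_time: str) -> tuple[Status, datetime]:
    """Assumes Melbourne sends offset-free ISO 8601 timestamps in UTC."""
    stamp = datetime.fromisoformat(utc_time).replace(tzinfo=timezone.utc)
    return MELBOURNE_STATUS[status], stamp


print(normalise_perth("in_transit", "2024-07-02T14:30:00"))
print(normalise_melbourne("out_for_delivery", "2024-07-02T06:30:00"))
```

Even in this tidy form, each courier needs its own status table and its own timestamp rule, and an unknown status string raises a `KeyError`. A third courier would add a third pair, plus whatever authentication and webhook handling its API demands.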
“Now imagine doing this for Sydney,” Charlotte says gently. “And Adelaide. And Brisbane. Every city has a different courier. Every courier has a different API.”
Tom looks at his weekend project. He looks at the third-party platform, which already supports every Australian courier out of the box.
“Yeah,” he says. “Fair enough.”
The prototype goes in the bin. Tom’s weekend of work is gone. Charlotte doesn’t frame it as failure.
“You learned something real,” she tells him. “You confirmed the analysis with your hands instead of just your head. And now you’ll never wonder ‘what if we’d built it ourselves?’ You know the answer. The Perth integration was easy. The second one was hard. The fifth one would have been a nightmare.”
She glances at the Wardley Map still pinned to the whiteboard. “And every day you spent building and maintaining that tracker would have been a day Sam spent on the phone, because it wouldn’t have been live yet. The Wardley Map isn’t just about your time. It’s about Sam’s. Delivery tracking is a commodity. Sam’s judgement is not. We’d have been burning a scarce resource on a commodity task.”
Tom looks at Sam, who has been quiet through this conversation. Sam looks back. She doesn’t say anything, but her expression says enough.
The Wardley Map as a strategic tool
The delivery tracking decision becomes a template. Over the following weeks, the team faces three more build-vs-buy decisions:
Customer analytics dashboard. Where does it sit on the map? Right side – commodity. Dozens of analytics platforms exist. They go with a third party. $150/month.
Seasonal recipe suggestion engine. Where does it sit? Left side – custom. Nobody else matches recipes to the specific contents of a GreenBox produce box, accounting for seasonal availability and customer preferences. They build it. Tom and the LLM produce a first version in four days.
Email marketing automation. Right side. Commodity. They already use Mailchimp.
Each decision takes ten minutes of mapping instead of two hours of debate. Charlotte pins the Wardley Map to the wall next to the Event Storm photographs and the Context Map.
“When someone proposes a new feature,” Charlotte says, “the first question is: where does it sit on the map? If it’s on the left, we build it. If it’s on the right, we buy it. If it’s in the middle, we have a conversation. But at least we’re having the right conversation.”
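Charlotte's first question can be written down as a small lookup. The sketch below uses the four labels from the bottom of the whiteboard; treating Product as "the middle" and the exact stage assigned to each capability are assumptions made for illustration, since the team's map is a drawing rather than a dataset.

```python
from enum import IntEnum


class Evolution(IntEnum):
    """The four labels Charlotte writes along the bottom of the map."""
    GENESIS = 0
    CUSTOM_BUILT = 1
    PRODUCT = 2
    COMMODITY = 3


def first_answer(stage: Evolution) -> str:
    """Left of the map: build. Right: buy. Middle: have a conversation."""
    if stage <= Evolution.CUSTOM_BUILT:
        return "build"
    if stage == Evolution.COMMODITY:
        return "buy"
    return "discuss"  # assumption: PRODUCT stands in for "the middle"


# Placements approximate the team's map; the exact stages are assumptions.
greenbox_map = {
    "substitution engine": Evolution.CUSTOM_BUILT,
    "recipe suggestion engine": Evolution.CUSTOM_BUILT,
    "customer notifications": Evolution.PRODUCT,
    "delivery tracking": Evolution.COMMODITY,
    "customer analytics dashboard": Evolution.COMMODITY,
    "email marketing automation": Evolution.COMMODITY,
}

for capability, stage in greenbox_map.items():
    print(f"{capability}: {first_answer(stage)}")
```

The function only gives a starting answer. As with the map itself, a capability that lands on "discuss" still needs the conversation.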
LLMs and the shifting boundary
Lee raises a nuance in the next strategy session. “The boundary between build and buy isn’t fixed. LLMs are moving it.”
He’s right. Before LLMs, the build cost for a custom analytics dashboard might have been $50,000. At that price, buying a $150/month subscription was obvious. But with LLMs, the build cost might be $5,000. At that price, the decision is less clear-cut.
Charlotte agrees but adds a qualifier. “LLMs shift the boundary, but they don’t eliminate it. Some things that used to be ‘buy’ are now ‘build’ – things near the middle of the map where the custom version adds real value and the build cost was the main barrier.”
She gives an example. The recipe suggestion engine. Before LLMs, building a recommendation system was a $100,000 project requiring a data scientist and months of work. Now Tom and an LLM can build a first version in four days. That shifted it from “too expensive to build” to “worth building because it’s custom and differentiated.”
“But delivery tracking didn’t shift. The build cost went from $25,000 to $5,000, sure. But the maintenance cost stayed at $4,000 a year, and the third-party option is still $2,400 a year. The gap narrowed. The answer didn’t change.”
The rule of thumb Charlotte gives the team: LLMs reduce the build cost but not the own cost. If the own cost is what makes something expensive, the LLM didn’t change the calculus.
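The rule of thumb can be checked against the delivery tracking figures. This is a rough sketch over a three-year horizon using the numbers Charlotte quotes; the "build for free" scenario is a hypothetical added to show what happens when the build cost falls all the way to zero.

```python
def total_cost(build: int, own_per_year: int, years: int) -> int:
    """One-off build cost plus the recurring cost of owning it."""
    return build + own_per_year * years


YEARS = 3
MAINTENANCE_PER_YEAR = 4_000      # unchanged by the LLM
SUBSCRIPTION_PER_YEAR = 12 * 200  # $2,400

scenarios = {
    "build before LLMs": total_cost(25_000, MAINTENANCE_PER_YEAR, YEARS),
    "build with an LLM": total_cost(5_000, MAINTENANCE_PER_YEAR, YEARS),
    "build for free (hypothetical)": total_cost(0, MAINTENANCE_PER_YEAR, YEARS),
    "buy": total_cost(0, SUBSCRIPTION_PER_YEAR, YEARS),
}

for name, cost in scenarios.items():
    print(f"{name}: ${cost:,}")
```

Even the free build loses to the subscription here, because $4,000 a year of maintenance is more than $2,400 a year of fees. When the own cost alone exceeds the price of buying, no reduction in build cost changes the answer.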
The maintenance trap
Charlotte names the pattern she sees most often in teams using LLMs: the maintenance trap.
“It goes like this. The LLM makes building cheap. So you build everything. Custom delivery tracking. Custom analytics. Custom email templates. Custom everything. Each one takes a few days. Each one works beautifully on day one.”
“Six months later, you’ve got fifteen custom systems. Each one needs maintenance. Each one has edge cases. Each one has dependencies that change. The developer who built it has moved on to the next thing and barely remembers the code. You’re spending more time maintaining custom systems than building new features.”
“That’s the trap. The LLM makes it feel free to build. But owning isn’t free. Owning is never free.”
She draws the distinction:
| Capability | Build cost (with LLM) | Ongoing cost | Should you build? |
|---|---|---|---|
| Competitive advantage, no third party exists | Low | High, but worth it | Yes |
| Competitive advantage, third party exists but poor fit | Low | Medium | Probably yes |
| Commodity, third party exists | Low | High (relative to subscription) | No |
| Commodity, no third party exists | Low | High | Build, but plan to replace when a product emerges |
“The third row is the trap,” Charlotte says. “Commodity, with a third party already selling it. Build cost is low, so it feels like the right call. But the ongoing cost makes it wrong.”
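The four rows of the table can also be read as a lookup. The sketch below encodes only the combinations the table lists; the `ThirdParty` enum and the function name are illustrative, and any combination the table leaves out is returned as undecided rather than guessed.

```python
from enum import Enum


class ThirdParty(Enum):
    NONE = "no third party exists"
    POOR_FIT = "third party exists but poor fit"
    EXISTS = "third party exists"


# (is_competitive_advantage, third_party) -> Charlotte's answer, row by row.
SHOULD_YOU_BUILD = {
    (True, ThirdParty.NONE): "Yes",
    (True, ThirdParty.POOR_FIT): "Probably yes",
    (False, ThirdParty.EXISTS): "No",
    (False, ThirdParty.NONE): "Build, but plan to replace when a product emerges",
}


def should_you_build(is_competitive_advantage: bool, third_party: ThirdParty) -> str:
    """Look up a row; combinations the table does not list are left undecided."""
    key = (is_competitive_advantage, third_party)
    return SHOULD_YOU_BUILD.get(key, "not covered by the table: discuss")


print(should_you_build(False, ThirdParty.EXISTS))  # the delivery tracking case
```

The build cost does not appear as an input, because with an LLM it is low in every row. What separates the rows is whether the capability differentiates and whether someone else already sells it.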
When the map is wrong
The Wardley Map isn’t infallible. Charlotte is honest about this.
“Maps are models. Models are simplifications. Sometimes you’ll place something on the right side – commodity – and discover it’s actually more custom than you thought. That’s fine. The map isn’t a permanent decision. It’s a starting point for a conversation.”
She gives an example from a previous client. “A meal-kit company I worked with mapped ‘recipe development’ as commodity. They figured they’d license recipes from a food media company. Turns out their subscribers specifically loved the quirky, personalised recipes that the founder wrote. Recipe development was actually their competitive advantage. They moved it back to the left side of the map and brought it in-house.”
“The point isn’t to get the map perfectly right the first time. The point is to have a map at all. Without it, every build-vs-buy decision is a gut feeling. With it, you’ve at least got a framework.”
Lee adds from the call: “And update the map. Quarterly, at least. Things move. Delivery tracking was commodity when we started. If GreenBox decides that same-day farm-to-door delivery is a differentiator, tracking might shift left. The map is alive.”
When to use Wardley Mapping
Wardley Mapping is most useful when the team faces a strategic decision about capability investment. Build, buy, or borrow. In-house or outsource. Custom or off-the-shelf.
The signals that you need a Wardley Map:
- Someone says “we should build this” and someone else says “can’t we just buy it?” and neither person has a framework for resolving the disagreement
- The team is spending developer time on something that feels like it should be someone else’s problem
- A new capability is needed and the build cost is low enough that building feels obvious – but the long-term implications aren’t clear
- The company is growing into new markets and needs to decide which capabilities to invest in
The signals that you don’t need a Wardley Map:
- The decision is obviously commodity (payment processing, email sending, file storage) – just buy it
- The decision is obviously custom and core (your unique product logic, your domain expertise) – just build it
- The decision is small enough that the cost of mapping exceeds the cost of being wrong
Charlotte’s practical advice: “Don’t Wardley Map everything. Map the things where the answer isn’t obvious. If you’re debating for more than ten minutes, pull out the map.”
What the team learned
The Wardley Map gave GreenBox a language for strategic technology decisions. Instead of “Tom thinks we should build it” versus “Sam thinks we should buy it,” the conversation became “where does this sit on the map, and what does that tell us?”
The key insights:
LLMs shifted the build cost down. That’s real and significant. Some things that used to be “buy” are now “build” because the upfront investment dropped. The recipe suggestion engine is the clearest example – prohibitively expensive to build before LLMs, entirely feasible now.
But maintenance cost didn’t change. The courier API still changes twice a year. The edge cases still emerge. The new developer still needs to understand the code. The LLM helps you build version one. It doesn’t maintain versions two through twenty.
Build what differentiates you, buy what’s commodity. The Wardley Map tells you which is which. Plot it, discuss it, decide. Then revisit the map when circumstances change.
And the hardest lesson: the fact that you can build something doesn’t mean you should. Tom’s weekend prototype was technically excellent. It was also strategically wrong. The best developers are the ones who know which problems are worth solving with code and which problems are worth solving with a credit card.
The team has bounded contexts, decision tables, ADRs, and a Wardley Map. The architecture is solid. The strategic decisions have a framework. The technical foundations are in place.
But GreenBox is now two squads across two cities. Fifteen people. And the squads keep surprising each other. Perth changes the subscription API without telling Melbourne. Melbourne builds a notification system that duplicates Perth’s. The bounded contexts helped the code, but nothing yet has helped the squads pull in the same direction (coming 1 September).