API Contracts: Two Squads, One Direction

The Slack message arrives at 7:09am on a Wednesday.

Anika, Melbourne squad lead: “Our farm reconciliation is broken. The subscription API is returning a different payload than last week. Did something change?”

Tom, Perth squad lead, replies twenty minutes later: “Oh. Yeah. We shipped the pause feature yesterday. Had to change the subscription endpoint to support pause states. Sorry, was in our sprint plan but I didn’t think it’d affect you.”

Anika: “It affected us.”

Melbourne’s farm reconciliation polls the subscription API every morning to match active subscribers with available produce. Perth’s change added a new pause_state field and restructured the response. Melbourne’s service expected the old format. The reconciliation ran, got malformed data, and silently produced incorrect box allocations for 340 Melbourne subscribers.

Sam’s phone rings at 9:03am.

Mrs Patterson. Subscribed since the very first box, whose beetroot preference Maya added to the decision tables by hand, whose loyalty has been a quiet constant through every crisis. Mrs Patterson, who has just opened a box containing capsicum despite her nightshade allergy flag.

She isn’t angry, which somehow makes it worse. She’s quiet. “I’ve trusted you with this,” she tells Sam. “I just need to know I still can.”

“You can. This won’t happen again.” Sam takes the details, promises a replacement box by the end of the day, and rings the warehouse to pull one together by hand, every item checked against Mrs Patterson’s profile. Then she checks the morning’s reconciliation output and finds two more boxes with the same scrambled preference flags. The substitution engine worked perfectly; it just received the wrong inputs. Three subscribers received produce they’d explicitly flagged as allergens.

Sam calls the other two. The first is upset but stays. The second, a Melbourne subscriber with six weeks’ tenure, cancels on the phone. “I can’t risk it. I have a child with nut allergies. If this had been nuts instead of capsicum…”

She doesn’t finish the sentence.

Then Sam calls Maya, who goes quiet for long enough that Sam checks the line is still open.

Priya fixes the integration in two hours. Twelve lines of code.

She doesn’t say much while she’s coding. But when the fix is deployed and the tests pass, she does something nobody has seen before. She gets angry.

She messages Anika: “This should never have happened. A schema change to a shared API with no consumer notification, no contract test, no versioning. Three subscribers got allergens in their boxes.”

Anika: “I agree. Completely. What do you propose?”

“Contract testing. Every bounded context that publishes events or exposes an API writes a contract. Consumers write expectations against the contract. If a change breaks it, the build fails before the PR merges.”

Charlotte supports it publicly: “Priya’s right. This is the first time the architecture has hurt a subscriber.”

Priya adds the contract tests that afternoon. The mechanics are simple: each consumer checks in a set of expectations about the interfaces it depends on (the fields it reads, the types it assumes, the values it can handle), and the provider’s build replays those expectations against its real responses. If Perth’s pause-state change had run against Melbourne’s expectations, the build would have gone red on Tuesday afternoon, before the PR merged, instead of scrambling allergen flags on Wednesday morning. The tests add three minutes to the build. Charlotte: “Three minutes that prevent three hours of incident response.”
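What a consumer expectation and its replay might look like, as a minimal sketch rather than Priya’s actual tests: the field names (other than pause_state), the restructured response shape, and the check_contract helper are all assumptions made for illustration.

```python
# Hypothetical expectations Melbourne's reconciliation might check in.
# Field names are illustrative; the story does not list the real ones.
MELBOURNE_EXPECTATIONS = {
    "subscriber_id": {"type": str},
    "status": {"type": str, "allowed": {"active", "cancelled"}},
    "allergen_flags": {"type": list},
}


def check_contract(response: dict, expectations: dict) -> list[str]:
    """Return a list of violations; an empty list means the contract holds."""
    violations = []
    for field, rule in expectations.items():
        if field not in response:
            violations.append(f"missing field: {field}")
            continue
        value = response[field]
        if not isinstance(value, rule["type"]):
            violations.append(
                f"{field}: expected {rule['type'].__name__}, got {type(value).__name__}"
            )
        elif "allowed" in rule and value not in rule["allowed"]:
            violations.append(f"{field}: unexpected value {value!r}")
    return violations


# The shape the consumer was written against.
old_response = {
    "subscriber_id": "S-1",
    "status": "active",
    "allergen_flags": ["nightshade"],
}

# An assumed restructuring of the kind the pause feature might have introduced.
new_response = {
    "subscriber_id": "S-1",
    "state": {"status": "paused", "pause_state": "until_next_cycle"},
    "preferences": {"allergen_flags": ["nightshade"]},
}

assert check_contract(old_response, MELBOURNE_EXPECTATIONS) == []
print(check_contract(new_response, MELBOURNE_EXPECTATIONS))
# ['missing field: status', 'missing field: allergen_flags']
```

Run inside the provider’s build, a non-empty list would fail the pipeline. Extra fields in the response are ignored on purpose, so a purely additive change stays green while a restructure goes red.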

Maya sends Charlotte a message at midnight: “If this happens again with a serious allergy, we’re not just losing a subscriber. We’re in court.”

At half past midnight Tom is still in #incidents, reconstructing the timeline and answering every question in the thread. Sarah texts him a photo of his dinner, cling-wrapped on the kitchen bench; he sends back a thumbs up and keeps typing.

The post-mortem

Charlotte books the meeting room for Thursday morning. Both squads. She writes two words on the whiteboard before anyone arrives: Prime Directive.

“This post-mortem has one rule. Regardless of what we discover, we understand and truly believe that everyone did the best job they could, given what they knew at the time. This is about the system, not the people.”

Tom shifts in his chair.

“First: timeline.” Charlotte draws a horizontal line across the board.

Tom: “Perth API change shipped Tuesday, 4:47pm.” Anika: “Reconciliation ran Wednesday, 5:30am.” Sam: “Mrs Patterson’s phone call, 9:03am.” Priya: “Fix deployed 11:14am.”

Charlotte writes each timestamp. More than eighteen hours between the change shipping and the fix going live. Three and a half hours between bad data hitting production and someone noticing.

“Now: root cause. Why did three subscribers get allergens in their boxes?”

Anika: “Preference flags were scrambled.”

“Why?”

Priya: “Melbourne’s reconciliation got malformed data from the subscription API.”

“Why malformed?”

Tom, quietly: “We restructured the response for the pause feature.”

“Why didn’t Melbourne know?”

Silence. Then Anika: “No contract test. No process for cross-squad notification when a shared interface changes.”

Charlotte draws a box around the last answer. “Root cause: no mechanism for one squad to know what the other is changing.”

She turns to Tom. “You said you didn’t think the change would affect Melbourne. That’s honest. It’s also not the root cause. Nothing in our process would have caught this. If you hadn’t shipped the pause feature, someone else would have hit the same gap: different API, different squad, same result.”

Tom nods slowly.

“Contributing factors are different from root cause. No consumer notification, no API versioning, the reconciliation failing silently: those are contributing factors. They made it worse. But the root cause is structural: two squads changing shared interfaces with no way to detect breakage before it ships.”

She writes two columns: Actions and Owner.

Contract testing across bounded contexts. Priya. Already done, but Charlotte records it formally.

Cross-squad notification for API changes. Tom and Anika, by next Friday.

Reconciliation alerting on schema mismatches. Ravi, by end of sprint.

“Three actions. Not twelve. Three we’ll actually do.”

She photographs the whiteboard, emails it to both squads, and pins it in #incidents. It’s the first entry in what will become the incident log, the spreadsheet she’ll project at quarterly planning three months later.
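Ravi’s action, reconciliation alerting on schema mismatches, targets the silent failure: the job ran, read malformed data, and carried on. A hedged sketch of the alternative follows, in which the job refuses to guess and raises an alert instead. The field names, the SchemaMismatch exception, and the alert callback are assumptions for illustration, not Greenbox’s code.

```python
class SchemaMismatch(Exception):
    """Raised when a subscription record lacks the shape reconciliation relies on."""


# Fields the reconciliation is assumed to read (illustrative names).
REQUIRED_FIELDS = ("subscriber_id", "status", "allergen_flags")


def parse_subscription(record: dict) -> dict:
    """Extract only the fields reconciliation needs; never fall back to defaults."""
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise SchemaMismatch(f"subscription record missing {missing}")
    return {field: record[field] for field in REQUIRED_FIELDS}


def reconcile(records: list[dict], alert) -> list[dict]:
    """Parse every record; on the first mismatch, alert and allocate nothing."""
    parsed = []
    for record in records:
        try:
            parsed.append(parse_subscription(record))
        except SchemaMismatch as exc:
            alert(f"reconciliation halted: {exc}")
            return []
    return parsed


alerts = []
good = {"subscriber_id": "S-1", "status": "active", "allergen_flags": ["nightshade"]}
bad = {"subscriber_id": "S-2", "state": {"status": "paused"}}

result = reconcile([good, bad], alerts.append)
assert result == []
print(alerts[0])
# reconciliation halted: subscription record missing ['status', 'allergen_flags']
```

Halting the whole run rather than skipping the bad record is a policy choice made here for the sketch, on the assumption that a late box is cheaper than a box packed from scrambled allergen flags.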

The pattern

Charlotte tracks the incidents over the following month.

Cross-Squad Incidents: the first month

Week	Type	Description	Impact
1	Surprise	Perth API change breaks Melbourne reconciliation	340 wrong allocations, 1 cancellation
2	Duplication	Two notification systems built independently	~5 dev days wasted
3	Duplication	Both squads build subscriber preference export	~3 dev days wasted
4	Surprise	Melbourne adds produce categories Perth's decision tables don't cover	Substitution failures for new categories
4	Duplication	Both squads write farm reliability scoring	~4 dev days wasted

After the retro, everyone leaves. Charlotte stays.

She sits alone in the meeting room with the incident spreadsheet on the wall. Five incidents in four weeks. Three of them in her bounded contexts, the clean architecture she’d drawn, the events she’d mapped, the contracts she’d specified. The structure was right. The coordination was missing.

She’s seen this pattern before. The meal kit company she coached had the same progression. Clean architecture. Growing team. Two squads that stopped talking. Small failures, then a big one.

She picks up her phone and texts Lee: “Am I repeating myself?”

Lee replies at 11:38pm. “The patterns repeat. You’re not. The difference is you’re catching it this time.”

Charlotte reads it twice. She’s not sure it’s enough. But it’s something.

The gap

Charlotte runs a cross-team retro. Both squads in the same room. Not a sprint retro, a retro about the space between the squads.

The insights come fast. Tom: “We never look at each other’s sprint plans.” Anika: “We assume if something’s inside our bounded context, it’s ours to change. But the events that flow between contexts are shared contracts.” Priya: “The notification duplication happened because we solved the same subscriber complaint independently.”

Three actions: a weekly cross-squad sync, a shared view of what each squad is working on, and a quarterly planning session to align direction before sprints begin.

Quarterly planning

Charlotte proposes a half-day session. Both squads in the same room.

Tom pushes back. “We already have sprint planning and retros. Are we really adding another meeting?”

“Sprint planning tells each squad what they’re doing. It doesn’t tell them what the other squad is doing.”

“So we read each other’s sprint plans.”

“When was the last time you read Melbourne’s sprint plan?”

Tom pauses. “I don’t think I ever have.”

The first quarterly planning day has five blocks:

Review last quarter. Each squad presents what they delivered and what surprised them. Tom’s surprise: the pause feature took three sprints instead of one (billing coupling from ADR-001). Anika’s: “We learned about Perth’s API change from a broken build.”

Charlotte projects the incident spreadsheet. Nobody had seen the full picture before.

Update the Impact Map. The goal is now 10,000 subscribers by the end of next year. Melbourne growth matters more than Perth retention. Perth is plateauing at around 2,500 while Melbourne is growing fast.

Propose quarterly themes. Tom: “Reduce churn below 3% through delivery experience.” Anika: “Launch Melbourne farm onboarding and reach 500 Melbourne-local subscribers.”

Maya spots something: “If Perth is on delivery experience and Melbourne is on farm onboarding, who’s merging the notification systems?” They assign it to Melbourne, with Perth deprecating their version. A cross-squad dependency, identified before it became a mid-sprint surprise.

Map dependencies. Charlotte pulls up the bounded context map.

Bounded Context	Perth	Melbourne	Coordination?
Subscription	Churn analytics	Onboarding flow	Yes
Billing	--	--	No
Supply Matching	Substitution comms	Farm onboarding	Yes
Fulfilment	Delivery windows	Notification merge	Yes

Three of four contexts are shared. Three potential collision points. Before this session, those would have been three mid-sprint surprises.
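The rule behind the Coordination? column is mechanical enough to sketch: a context needs coordination when more than one squad has planned work in it. The data below is copied from the table above; the coordination_points function is a hypothetical helper, not something the squads are described as building.

```python
# Planned work per bounded context, from the table above. None means no planned work.
CONTEXT_MAP = {
    "Subscription": {"Perth": "Churn analytics", "Melbourne": "Onboarding flow"},
    "Billing": {"Perth": None, "Melbourne": None},
    "Supply Matching": {"Perth": "Substitution comms", "Melbourne": "Farm onboarding"},
    "Fulfilment": {"Perth": "Delivery windows", "Melbourne": "Notification merge"},
}


def coordination_points(context_map: dict) -> list[str]:
    """Return the contexts in which two or more squads have planned work."""
    return [
        context
        for context, work in context_map.items()
        if sum(1 for item in work.values() if item) >= 2
    ]


print(coordination_points(CONTEXT_MAP))
# ['Subscription', 'Supply Matching', 'Fulfilment']
```

Because the check counts squads rather than naming them, adding a third column to the map needs no change to the function.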

Commit. Two themes, three dependency points, three coordination owners. Tom owns the Subscription API contract. Anika owns the notification migration. Ravi owns the Supply Matching interface.

The weekly sync

Charlotte adds a fifteen-minute Monday sync between squad leads. Three things each: what we’re working on, what contexts we’re touching, anything that might affect the other squad.

Priya builds a lightweight script that feeds both squads’ sprint backlogs to an LLM and flags overlaps: any items from different squads touching the same bounded context. Not perfect, but it catches the obvious ones and gives Tom and Anika a starting point for the sync.

Tom is sceptical. The first sync surfaces a conflict that would have blown up mid-sprint. Perth is about to refactor the Subscription event format, which Melbourne’s onboarding consumes. Fifteen-minute conversation. They agree on backward-compatible changes.
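One common reading of “backward-compatible” is an additive change: every existing field keeps its name and type, and new information arrives only as an optional extra. The sketch below assumes that reading. The event fields, the v1/v2 function names, and the consumer are hypothetical, since the chapter does not show the real event format.

```python
from typing import Optional


def subscription_event_v1(subscriber_id: str, status: str) -> dict:
    """The event shape the existing consumer is assumed to read."""
    return {"subscriber_id": subscriber_id, "status": status}


def subscription_event_v2(
    subscriber_id: str, status: str, pause_state: Optional[str] = None
) -> dict:
    """Additive refactor: all v1 fields are unchanged; pause_state is optional."""
    event = subscription_event_v1(subscriber_id, status)
    if pause_state is not None:
        event["pause_state"] = pause_state
    return event


def onboarding_consumer(event: dict) -> str:
    """A tolerant reader: uses the fields it knows and ignores the rest."""
    return f"{event['subscriber_id']}:{event['status']}"


old = subscription_event_v1("S-1", "active")
new = subscription_event_v2("S-1", "active", pause_state="until_next_cycle")

# The consumer produces the same result for both shapes.
assert onboarding_consumer(old) == onboarding_consumer(new) == "S-1:active"
# Every v1 key survives in v2.
assert set(old) <= set(new)
```

The second assertion is the property worth guarding: a refactor that renames or nests an existing key would fail it, which is the kind of change that broke the reconciliation.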

“Fifteen minutes to prevent a two-hour outage,” Tom admits after the third week. “Fine. I’ll keep coming.”

Three months later

The second quarterly planning day. The difference is immediate.

Metric	First quarter (before)	Second quarter (after)
Cross-squad surprises	4	0
Duplicated work	3	0
Dev days lost	~12	~0

Tom, who resisted the quarterly planning three months ago, facilitates the dependency mapping block. He draws the bounded context grid without being asked. Anika fills in Melbourne’s column. They flag two coordination points and assign owners in ten minutes.

“I was wrong,” Tom tells Charlotte afterward. “I thought this was just another meeting. It’s the thing that makes the sprint meetings work across squads.”

The planning onion

Charlotte names the pattern at the quarterly retro: planning as nested layers.

The innermost layer is the daily standup. Next is sprint planning. Now they’ve added the quarterly layer: direction, dependencies, coordination owners.

“As you grow, you’ll need a yearly layer too. Strategic priorities. How each quarter contributes to the annual goals. The Wardley Map feeds into that layer.”

Each outer layer sets direction for the inner ones. The quarterly plan doesn’t dictate sprint content. It makes sure the sprints don’t collide.

Ravi asks: “What happens when we add a third squad?”

“The same structure scales. Three columns in the dependency map instead of two. Three leads in the weekly sync. But the discipline is the same: align on direction, map dependencies, assign coordination owners.”

What comes next

The squads are aligned. The dependencies are mapped. The quarterly planning gives direction without dictating sprint content. The weekly sync catches collisions before they become incidents. Zero cross-squad surprises in the second quarter, zero duplicated work, zero lost days.

Tom, who resisted every coordination practice Charlotte proposed, now facilitates the dependency mapping block without being asked. The architecture and the organisation match, not by accident, but by deliberate, sustained effort across two quarters.

It’s a good place to stop and breathe.

But Charlotte said something at the end of the Q4 retro that nobody quite caught. She was packing up her laptop, speaking quietly to Anika: “The challenges ahead won’t be technical or organisational. They’ll be human.”

She was right sooner than anyone expected. The next thing to break wasn’t a contract or a boundary or a squad drifting out of sync. It came at 3am, from a subscriber who opened her box and found something in it she’d told them she was allergic to, and from the fact that nobody at Greenbox was awake to see her say so. The coordination held. The people on the other end of the pager were the part nobody had designed for yet.

The next chapter, On-Call and Incident Response: When the Pager Goes Off, publishes around 28 July.