Ensemble Programming: The Team Navigates, the LLM Types

Greenbox is a produce-box company with 5,800 subscribers across Perth and Melbourne, now expanding to Brisbane. Three squads, twenty-five people, and a substitution engine that’s about to get its most complex upgrade yet, built by one developer working alone, with predictable results.

Greenbox has 5,800 subscribers. Three squads. Twenty-five people. Three cities. And a substitution engine that’s about to get a lot more complicated.

The Perth squad has picked up a major upgrade: seasonal rules, allergen combinations, and customer preference learning. It’s the most complex code in the system, and it touches every bounded context they’ve drawn.

Tom volunteers to build it.

The solo sprint

Tom is still one of the best developers in the organisation. He opens a session with Claude and starts prompting.

The first afternoon is electric. Tom hasn’t felt this way in months, maybe since the early weeks, before the workshops and the retros and the cadences. Just him and the machine, building. The LLM generates a seasonal availability model and Tom reshapes it, tightens the types, adds constraints. He works through Ava’s bedtime and Leo’s story and doesn’t hear Sarah come into his office at ten o’clock.

“You’ve got that look,” she says from the doorway.

“What look?”

“The one you had when you first started at Greenbox. When you told me it was the most productive week you’d ever had.”

Day two is even better. He builds the allergen cross-referencing module, the preference learning system, the feedback loop. The code is elegant. By five o’clock he has 2,000 lines of generated code that compiles, passes tests, handles seasonal availability, cross-references allergen profiles, and learns from customer feedback. He opens a pull request feeling proud.

Kai reviews it. He stares at the PR for forty minutes and sends Tom a message: “I can read each function but I can’t follow the logic.”

Ravi has a different concern: “This touches two bounded contexts. Did we check the contracts?”

Maya looks at the seasonal rules and spots something immediately: “In winter, never substitute sweet potato for pumpkin. They’re both in season, so if we’re short on one, we’re short on both.”

Three people, three categories of bug, all from the same root cause: Tom was the only person thinking when the code was written.

Charlotte looks at the PR comments and says, “We’ve been here before. Week one vibes.”

The words land on Tom like cold water. He closes his laptop and stares at the wall. The framed print of his first merged pull request hangs above his monitor.

That evening, Sarah asks how his day was. Tom loads the dishwasher while the kids argue in the next room. “Charlotte was right. She keeps being right and I keep needing to hear it twice.”

Sarah dries her hands on the tea towel. “At least you hear it. Your dad never did.”

Tom picks up his phone and texts Priya: I think I needed to learn this lesson twice.

Priya replies at eleven: The good news is you learned it.

The ensemble idea

Charlotte suggests ensemble programming: mob programming, but with the LLM as the driver.

“In traditional mob programming, the bottleneck was typing speed. One person at the keyboard, everyone else waiting. That’s why a lot of teams gave up on it. But if the LLM types, there’s no bottleneck at the keyboard. The only bottleneck is how fast the team can think.”

Six people. One laptop on a large screen, running Claude. Everyone sees the code as it’s generated. Navigator rotation every ten minutes. Anyone can call “stop” if they spot something wrong.

The first attempt

The first session is awkward.

Maya starts as navigator. She describes the seasonal substitution rules to the LLM. The code it generates is clean. The domain logic is correct.

Tom takes over. He instructs the LLM to integrate the seasonal model into the existing substitution pipeline.

Kai catches a boundary violation before his rotation. Charlotte holds up a hand: “Write it on a sticky note. You’re next.” When his turn comes, he explains the bounded context issue and the LLM restructures the code to communicate through the existing boundary.

This is the moment where the ensemble justifies itself. In Tom’s solo session, the boundary violation would have gone unnoticed until code review. Here, two days of asynchronous back-and-forth are compressed into thirty seconds of real-time conversation.

Priya’s turn. She’s been accumulating sticky notes. “What about a customer who’s allergic to nuts and the best seasonal substitute is a nut? Where does allergen filtering happen relative to seasonal filtering?”

Unavailable Items → Seasonal Filter → Allergen Filter → Preference Ranking → Ranked Substitutes

Maya looks at the pipeline: “The order is wrong. Allergen filtering should happen first. If we filter by season first, we might eliminate a safe substitute. Filter the dangerous stuff first, then rank what’s left.”

The LLM swaps the order. That’s a domain insight that wouldn’t have surfaced in Tom’s solo session. Tom doesn’t think about allergens the way Maya does, and Maya doesn’t think about filter ordering the way a developer does. It took both perspectives in the same room.
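A minimal sketch of the reordered pipeline might look like the following. The stage names come from the diagram; the `Candidate` shape, its field names, and the sample data are assumptions made for illustration, not Greenbox's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Hypothetical shape for a substitute; not taken from the real codebase.
    name: str
    allergens: frozenset = frozenset()
    in_season: bool = True
    preference_score: float = 0.0


def allergen_filter(candidates, customer_allergens):
    """Drop every candidate containing something the customer is allergic to."""
    return [c for c in candidates if not (c.allergens & customer_allergens)]


def seasonal_filter(candidates):
    """Keep only candidates that are currently in season."""
    return [c for c in candidates if c.in_season]


def preference_ranking(candidates):
    """Order the remaining candidates, highest preference score first."""
    return sorted(candidates, key=lambda c: c.preference_score, reverse=True)


def rank_substitutes(candidates, customer_allergens):
    # Safety first: the allergen stage runs before any other stage.
    safe = allergen_filter(candidates, customer_allergens)
    seasonal = seasonal_filter(safe)
    return preference_ranking(seasonal)


candidates = [
    Candidate("walnuts", frozenset({"nuts"}), in_season=True, preference_score=0.9),
    Candidate("pears", in_season=True, preference_score=0.6),
    Candidate("mangoes", in_season=False, preference_score=0.8),
    Candidate("apples", in_season=True, preference_score=0.7),
]

ranked = rank_substitutes(candidates, customer_allergens=frozenset({"nuts"}))
print([c.name for c in ranked])  # ['apples', 'pears']
```

With the allergen stage at the front, no later stage ever sees an unsafe candidate, however the seasonal or ranking logic changes over time.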

Finding the rhythm

The first session produces about 400 lines, less than Tom’s solo effort in raw volume. But every line has been seen by six pairs of eyes. The domain logic is correct because Maya was there. The architecture respects boundaries because Kai and Ravi were there. The edge cases are covered because Priya was there.

By the third session, the navigators have learned to scope their instructions to fit a ten-minute rotation. Maya stops saying “handle the case where two items are both in season” and starts saying “add a co-seasonality check: if the unavailable item and the candidate share the same growing season in the same region, exclude the candidate.” The more precise the instruction, the better the generated code.
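Maya's precise instruction maps almost directly onto code. The sketch below is one way it could come out, assuming a hypothetical `Produce` record with `growing_season` and `region` fields; the season and region values are placeholders, not real agronomic data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Produce:
    # Hypothetical record; season and region labels are illustrative only.
    name: str
    growing_season: str
    region: str


def is_co_seasonal(unavailable, candidate):
    """True when both items share a growing season in the same region."""
    return (
        unavailable.growing_season == candidate.growing_season
        and unavailable.region == candidate.region
    )


def exclude_co_seasonal(unavailable, candidates):
    """Remove candidates likely to be short whenever the unavailable item is."""
    return [c for c in candidates if not is_co_seasonal(unavailable, c)]


pumpkin = Produce("pumpkin", growing_season="winter", region="WA")
candidates = [
    Produce("sweet potato", growing_season="winter", region="WA"),
    Produce("zucchini", growing_season="summer", region="WA"),
]

print([c.name for c in exclude_co_seasonal(pumpkin, candidates)])  # ['zucchini']
```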

The team learns to describe behaviour rather than implementation. “Generate a function that takes candidate substitutes and a customer’s allergen profile, and returns only safe candidates.” That gives the LLM freedom to choose the implementation while constraining the outcome. Maya’s instructions are often the cleanest because they’re the most abstract: she can’t describe code, but she can describe what should happen. Simpler instructions produce simpler code.
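To illustrate what "constraining the outcome" can mean, the sketch below writes the quoted instruction as a behaviour check and runs it against two different implementations. Everything here is hypothetical: the `Substitute` shape, both function names, and the sample products are assumptions chosen for the example.

```python
from typing import NamedTuple


class Substitute(NamedTuple):
    # Hypothetical shape: a product name plus the allergens it contains.
    name: str
    allergens: frozenset


def safe_candidates_loop(candidates, allergen_profile):
    """One possible implementation: an explicit loop."""
    safe = []
    for candidate in candidates:
        if candidate.allergens.isdisjoint(allergen_profile):
            safe.append(candidate)
    return safe


def safe_candidates_comprehension(candidates, allergen_profile):
    """Another possible implementation: a list comprehension."""
    return [c for c in candidates if not (c.allergens & allergen_profile)]


def satisfies_instruction(implementation):
    """Check the described behaviour without caring how it is implemented."""
    candidates = [
        Substitute("almond butter", frozenset({"nuts"})),
        Substitute("hummus", frozenset({"sesame"})),
        Substitute("apple slices", frozenset()),
    ]
    profile = frozenset({"nuts"})
    result = implementation(candidates, profile)
    only_safe = all(c.allergens.isdisjoint(profile) for c in result)
    nothing_safe_dropped = [c.name for c in result] == ["hummus", "apple slices"]
    return only_safe and nothing_safe_dropped


print(satisfies_instruction(safe_candidates_loop))           # True
print(satisfies_instruction(safe_candidates_comprehension))  # True
```

Both versions pass the same check, which is the point of a behaviour-level instruction: the navigator pins down what must be true of the result and leaves the how to the driver.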

Solo + LLM	Ensemble + LLM
One developer thinks	Whole team thinks
LLM types	LLM types
Others review later	Everyone sees it live
Bugs found in review	Issues caught immediately

The PR from the ensemble session had zero review comments. Not because people were being polite, but because every concern had been raised during the session. The review happened live.

When it doesn’t work

Anika tried running an ensemble for a routine bug fix, a timezone conversion error. Four people, thirty minutes, a three-line fix.

“That was a waste of everyone’s time,” she said afterwards.

The team settles into a split: complex features crossing bounded contexts get ensemble sessions. Routine work gets solo development with standard review. Roughly 30/70.

Situation	Approach
New feature crossing bounded contexts	Ensemble
Complex domain logic (substitutions, pricing)	Ensemble
Onboarding to a complex area	Ensemble (new person observes)
Routine bug fix, known root cause	Solo + review
Experimental prototype	Solo

Tom’s conversion

Tom becomes one of the ensemble’s best navigators. His instructions to the LLM are surgically precise. But he also learns something about his blind spots.

“When I work alone, I optimise for elegance,” he tells Charlotte one afternoon. “In the ensemble, I can see that cleverness is a tax on everyone else’s understanding. Maya doesn’t care about clever code. She cares about code that does the right thing.”

He pauses. “Sarah told me something once. She said I love making things, but I hate letting anyone help me make them. She said I’m like my dad.” He looks at Charlotte. “My dad builds houses. He’s good at it. But every subcontractor he’s ever worked with has a story about Marco Russo standing over their shoulder.”

Charlotte waits.

“I don’t want to be that person. The ensemble is the opposite of that. It’s me trusting that the room is smarter than I am. Which it is.”

The result

The substitution engine ships two weeks after the first ensemble session. Seasonal rules, allergen combinations, preference learning. Every developer in Perth understands how it works. The Melbourne squad sat in on the final session so they will be ready when they implement Melbourne-specific rules.

Maya reviews the production output after the first week. Substitution quality is noticeably better. Fewer complaints. No repeats of the sweet-potato-for-pumpkin mistake.

“The LLM wrote the code,” she says. “But the team wrote the thinking.”

The ensemble sessions are working. So is every other discovery technique the team has learned. The problem is that they’re now using all of them for everything, including stories where everyone already knows the answer. Workshop fatigue is setting in. Anika sent Charlotte a long message last week, unusual for her, about a Melbourne Example Mapping session that spent twenty-five minutes on a story the team could have built in their sleep.

The teams are spending more energy going through the motions than doing the work that needs deep thinking. Charlotte introduces Cynefin, the framework that tells you which approach to use and when to stop over-thinking.

The next chapter, Cynefin: Not Everything Needs a Workshop, publishes around 22 September.