Value Stream Mapping: Finding Where to Focus

Charlotte notices it first. She’s been reviewing the team’s sprint data and the numbers don’t add up.

“A story takes three weeks to get from ‘ready to build’ to ‘in production,’” she tells Maya on Monday morning. “Three weeks. When you were five people, it took three days.”

Maya frowns. “We’re bigger. More coordination. That’s normal.”

“Five times slower isn’t normal. It’s a symptom.”

Charlotte has seen this before. The team hasn’t become worse at building software. The bounded contexts made PRs smaller. The decision tables reduced substitution bugs. The ADRs help new developers understand the system. Everything that used to be hard is easier.

And yet stories take three weeks to ship. Nobody can explain why.

What is Value Stream Mapping?

Value Stream Mapping comes from lean manufacturing. Toyota used it to trace the flow of materials through a factory, measuring how long each step took and where things got stuck. The core insight: for most of the time a product spends in a factory, it isn’t being worked on. It’s waiting. Sitting in a queue. Waiting for a handoff. Waiting for a decision.

A car takes 20 hours to assemble but spends 6 weeks in the factory. The assembly is the value. The 6 weeks minus 20 hours is waste.

Software delivery works the same way. Charlotte suggests the team map the flow of a single story from “ready to build” to “running in production.”

Mapping the engineering value stream

Charlotte facilitates. The whole team gathers around a whiteboard. She picks a recent story (the one Priya shipped last week, a small change to the delivery notification email). It took three weeks from start to finish. The code change itself was thirty-two lines.

“Walk me through what happened,” Charlotte says. “Every step. Every handoff. Every wait.”

Tom starts. Priya fills in details. Sam adds the operational steps. The timeline builds left to right across the whiteboard.

Example Map (25 min, work)
→ Wait for Sprint (4 days, wait)
→ Build (2 days, work)
→ Wait for Review (3 days, wait)
→ Review (1 hr, work)
→ Wait for Staging (4 days, wait)
→ Test on Staging (2 hrs, work)
→ Wait for Deploy (5 days, wait)
→ Deploy (20 min, work)
→ Wait for Maya (3 days, wait)
→ Verify Live (30 min, work)

Charlotte adds up the numbers in front of everyone.

Total work time: about 2.5 days. The Example Map (25 minutes), the build (2 days), the code review (1 hour), the staging test (2 hours), the deploy (20 minutes), the production verification (30 minutes).

Total wait time: about 19 days. Waiting for the sprint to start (4 days). Waiting for someone to review the PR (3 days). Waiting for the single staging environment to be free (4 days). Waiting for a deploy slot; they only deploy on Tuesdays and Thursdays, and the queue is three deploys deep (5 days). Waiting for Maya to verify the substitution rule changes work correctly with real data (3 days).

Charlotte writes two numbers on the whiteboard. She circles them in red.

Work: 2.5 days. Wait: 19 days.

The room goes quiet.
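Charlotte’s whiteboard arithmetic can be checked in a few lines. This is a minimal sketch, not a tool the team uses. The step names and durations come from the map above; the eight-hour working day used to convert minutes and hours into days is an assumption, and so is the flow-efficiency ratio (work time divided by total lead time), a common lean measure that the story itself doesn’t name.

```python
# Assumption: one "day" of work is eight hours. The story does not say.
HOURS_PER_DAY = 8


def hours(h):
    """Convert hours of work into working days."""
    return h / HOURS_PER_DAY


def minutes(m):
    """Convert minutes of work into working days."""
    return m / 60 / HOURS_PER_DAY


# Each step from the whiteboard: (name, kind, duration in days).
STEPS = [
    ("Example Map", "work", minutes(25)),
    ("Wait for Sprint", "wait", 4),
    ("Build", "work", 2),
    ("Wait for Review", "wait", 3),
    ("Review", "work", hours(1)),
    ("Wait for Staging", "wait", 4),
    ("Test on Staging", "work", hours(2)),
    ("Wait for Deploy", "wait", 5),
    ("Deploy", "work", minutes(20)),
    ("Wait for Maya", "wait", 3),
    ("Verify Live", "work", minutes(30)),
]


def totals(steps):
    """Return (total work days, total wait days) for a list of steps."""
    work = sum(days for _, kind, days in steps if kind == "work")
    wait = sum(days for _, kind, days in steps if kind == "wait")
    return work, wait


work_days, wait_days = totals(STEPS)
flow_efficiency = work_days / (work_days + wait_days)

print(f"Work: {work_days:.1f} days")               # Work: 2.5 days
print(f"Wait: {wait_days:.0f} days")               # Wait: 19 days
print(f"Flow efficiency: {flow_efficiency:.0%}")   # Flow efficiency: 12%
```

Under those assumptions, roughly one day in eight of the story’s lead time is spent on actual work. The exact ratio shifts with the length of the working day, but the shape of the result doesn’t.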

The revelation

“The code takes two days,” Charlotte says. “Everything else, waiting for review, waiting for staging, waiting for a deploy slot, waiting for Maya, takes nineteen. The bottleneck isn’t engineering. It’s the process around engineering.”

Tom stares at the board. “We spent three weeks improving the bounded contexts and the decision tables. We thought we were fixing the delivery problem. We were fixing the wrong thing.”

“No,” Charlotte says. “The architecture work was necessary. Without it, the two days of coding would have been four, and the review would have been impossible. You fixed the work. Now you need to fix the waiting.”

She breaks down each wait:

Wait for sprint (4 days). Stories are only picked up at the start of a sprint. If a story is ready on Tuesday, it sits until Monday. The sprint cadence that worked beautifully at five people has become a gate at eighteen.

Wait for review (3 days). Tom reviews most PRs. He’s also building features, attending meetings, and helping new developers. His review queue is consistently three to five PRs deep. Kai and Priya can review each other’s code, but Tom still reviews everything in the Supply Matching and Commercial contexts because he wrote the original code.

“That’s me being a bottleneck,” Tom says. He’s heard this before; Charlotte identified him as a bottleneck in the code review process the same way Lee identified Maya as a bottleneck in the matching process. Same pattern, different person.

Wait for staging (4 days). There’s one staging environment. When someone is using it, everyone else queues. Deploying to staging takes twenty minutes and involves reseeding the database. Melbourne and Perth teams compete for the same environment.

Wait for deploy (5 days). Deploys happen twice a week, Tuesday and Thursday. The queue is first-come, first-served. A story that’s ready on Wednesday waits until Thursday. If the queue is full, it waits until Tuesday. The deploy process itself takes twenty minutes and requires Tom’s involvement.

Wait for Maya (3 days). Changes to the substitution rules need Maya’s approval because the decision tables still require her judgement for edge cases. She’s managing farm relationships, running two cities’ operations, and preparing board presentations. A change sits in her approval queue until she finds time.

Priya speaks up. “Each of these waits seems reasonable on its own. Three days for review isn’t crazy. Four days for staging isn’t crazy. But they compound. Five reasonable waits stack into nineteen unreasonable days.”

“That’s the insight VSM gives you,” Charlotte says. “You can’t see the compounding from inside any single step. You can only see it when you map the whole flow.”

What the map tells the team to fix

Charlotte doesn’t prescribe solutions. She asks the team to look at each wait and ask: what could we change to reduce it?

Sprint gating. Priya suggests pulling stories into the sprint mid-cycle instead of only at planning. “If a developer finishes a story on Wednesday and the next priority is ready, why wait until Monday?” Charlotte agrees: the sprint cadence should be a planning rhythm, not a work-start gate.

Review bottleneck. Kai offers to take over reviews for the Fulfilment context. Tom agrees to review only the Commercial context. They document the review ownership in the team’s ADRs. The review queue should halve.

Single staging environment. Tom volunteers to create a second staging environment. “It’s a Saturday morning of work. The infrastructure is already templated.” Charlotte nods: “Cheap to build, expensive not to have.”

Deploy queue. The twice-a-week deploy schedule was set when deploys were risky and manual. With the CI pipeline, tests, and feature flags, deploys could happen daily. Tom proposes deploying every day the pipeline is green. Charlotte: “That’s a culture change, not just a process change. The team needs to trust the pipeline.”

Maya’s approval. This is the hardest one. Maya is a bottleneck for the same reason she was a bottleneck in the first year: she conflates caring with doing. The decision tables captured most of her substitution knowledge, but she still reviews every change personally. Charlotte is direct: “You built the decision tables so Anika could run Melbourne without you. But you’re still reviewing every table change. At what point do you trust the tables?”

Maya is quiet for a moment. “I’ll trust the tables when they’ve been right for a month without my correction.”

“Fair,” Charlotte says. “Then let’s track that. If the tables produce correct substitutions for four consecutive weeks without your intervention, the approval step goes away.”
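The rule Charlotte and Maya agree on is simple enough to write down as a check. The sketch below is one possible reading of it, not something the team is described as building. It assumes someone records, each week, how many times Maya had to correct a substitution the tables produced; the function name, the list of weekly counts, and the sample numbers are all hypothetical.

```python
def approval_step_can_go(weekly_corrections, required_weeks=4):
    """Return True once the most recent `required_weeks` weeks had no corrections.

    `weekly_corrections` is a list of correction counts, oldest week first.
    A single correction resets the streak, which matches "four consecutive weeks".
    """
    if len(weekly_corrections) < required_weeks:
        return False
    recent = weekly_corrections[-required_weeks:]
    return all(count == 0 for count in recent)


# Hypothetical tracking data: corrections Maya made per week.
print(approval_step_can_go([2, 0, 0, 1, 0, 0]))  # False: only two clean weeks since the last correction
print(approval_step_can_go([1, 0, 0, 0, 0]))     # True: four clean weeks in a row
```

The point of writing it down is the one Charlotte makes: the exit condition is agreed in advance, so the data decides when the approval step goes away.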

The numbers after

The team implements the changes over three weeks. Not all at once; Charlotte insists on staggering them week by week so they can measure the impact.

Week one: mid-sprint story pulling. Wait-for-sprint drops from 4 days to 1 day.

Week two: split reviews and second staging environment. Wait-for-review drops from 3 days to 1.5. Wait-for-staging drops from 4 days to 1.

Week three: daily deploys when the pipeline is green. Wait-for-deploy drops from 5 days to 1.

Or it should.

The Friday fear

Two weeks into the daily deploy experiment, Ravi ships a change on a Friday afternoon. It passes CI, passes the staging check, and goes live at 3:52 PM. By 4:30, three Melbourne subscribers are getting duplicate delivery notifications. Sam’s phone starts buzzing. The bug is small (a race condition in the event handler that only triggers under load) but it takes Ravi ninety minutes to diagnose and fix. By the time the hotfix ships, it’s past six. Sam has already emailed apologies. Ravi spends the weekend quietly furious with himself.

On Monday morning, Tom draws a line. “No deploys after 2 PM on Fridays.”

Nobody argues. It feels responsible. If something breaks on Friday afternoon, you’re firefighting into the weekend. Nobody wants to be the person who ruined Sam’s Saturday. The rule is informal but absolute. Within a fortnight, it’s hardened: no deploys on Fridays at all. Then someone suggests no deploys on Thursday afternoon either, “just to be safe.” The deploy window is quietly shrinking.

Charlotte sees the VSM numbers drift. Lead time, which dropped to 10 days, is creeping back up. She maps it again.

The problem is inventory. When the team can’t deploy on Friday, stories finished on Thursday sit in a queue until Monday. But Monday is sprint planning. So they don’t deploy until Tuesday. That’s four days of finished work sitting in a pile: tested, reviewed, approved, not shipped. And the pile creates its own problems: by Tuesday, three or four changes are queued up. Deploying them all at once is riskier than deploying each one individually. If something breaks, you have to figure out which of the four changes caused it. The “safety” rule has made deploys less safe, not more.

Meanwhile, developers subconsciously cram work into Thursday morning. The code is a bit more rushed. Reviews are a bit less thorough. “Let’s get this in before the Friday cutoff.” Thursday becomes the highest-risk deploy day of the week, the exact opposite of what the rule intended.
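The four days of finished work sitting in a pile fall straight out of the calendar. As a rough sketch under stated assumptions: a story is finished at the end of a given weekday, after that day’s last deploy, and ships on the next day deploys are allowed. The function and constant names are hypothetical, and the set of allowed days under the no-Friday rule (Tuesday to Thursday, with Monday lost to sprint planning) is read from the paragraphs above.

```python
MON, TUE, WED, THU, FRI, SAT, SUN = range(7)


def idle_days(finished_on, deploy_days):
    """Full days a finished story sits before the next allowed deploy day.

    Assumes the story is finished at the end of `finished_on`, so the
    earliest it can ship is the following day.
    """
    if not deploy_days:
        raise ValueError("at least one deploy day is required")
    offset = 1
    while (finished_on + offset) % 7 not in deploy_days:
        offset += 1
    return offset - 1


DAILY = {MON, TUE, WED, THU, FRI}   # deploy any weekday the pipeline is green
NO_FRIDAY = {TUE, WED, THU}         # Friday ruled out, Monday taken by sprint planning

print(idle_days(THU, DAILY))      # 0: ships on Friday
print(idle_days(THU, NO_FRIDAY))  # 4: sits through Friday, Saturday, Sunday and Monday
```

Every story finished during that gap lands in the same Tuesday deploy, which is where the batching risk comes from.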

Charlotte puts the updated VSM on the wall next to the original. “The no-Friday rule added 1.5 days of wait time back into the pipeline. Your lead time went from 10 days back to 11.5. And your Thursday deploys are now the riskiest of the week because everyone is rushing to beat the cutoff, while Tuesday’s deploys are batched.”

Tom pushes back. “Ravi’s Friday bug cost Sam her evening.”

“It did. And that’s a real problem. But the answer isn’t ‘deploy less.’ The answer is ‘make deploying safe enough that the day doesn’t matter.’”

She writes three questions on the whiteboard. Not solutions. Questions.

How would we know something was wrong before a subscriber told us?

How fast could we undo a deploy that went bad?

Whose job is it when something breaks at 4 PM on a Friday?

“Right now the answers are ‘we wouldn’t,’ ‘ninety minutes if Ravi happens to be at his desk,’ and ‘whoever feels guiltiest,’” Charlotte says. “That’s why Friday feels dangerous. It isn’t the day. You’re deploying without a way to see trouble coming, without a quick way back, and without a named person to catch it. Fix those and the calendar stops mattering.”

Tom looks at the three questions. “That’s a lot of building.”

“It is. It’s its own piece of work, and it deserves to be done properly, not squeezed into the corners of a sprint.” She caps the marker. “In the meantime: take the calendar rule down. The map says it’s costing more than it saves. Batching four changes into a Tuesday is riskier than shipping one on a Friday, and you all know which morning produces your most rushed code now.”

The no-Friday rule comes down that afternoon. Not because Friday deploys became safe; they didn’t. Because the team can finally see that the rule never made them safer, only later and bigger. Ravi ships a small change the following Friday at 11 AM, keeps half an eye on Sam’s inbox for the rest of the day, and goes home on time. Nothing breaks. Everyone understands that nothing breaking proves nothing.

Maya’s approval wait stays at 3 days. She’s tracking the decision tables but they haven’t hit the four-week mark yet. Charlotte lets it stand. “The data will decide, not an argument.”

Total lead time after the changes: work (2.5 days) + waiting (about 7.5) = roughly 10 days. Down from three weeks. The no-Friday rule briefly pushed it back towards 12 before the second map caught it, which is its own lesson: every rule that adds a queue has to keep earning its place, because queues compound quietly.
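The before-and-after totals can be laid side by side. This sketch uses only the figures quoted in this chapter: the five waits from the original map, the reduced waits from the three weeks of changes (with Maya’s approval unchanged at 3 days), and the 1.5 days the no-Friday rule added back. The dictionary keys and the function name are made up for the example.

```python
WORK_DAYS = 2.5  # work time is unchanged by the process fixes

waits_before = {"sprint": 4, "review": 3, "staging": 4, "deploy": 5, "maya": 3}
waits_after = {"sprint": 1, "review": 1.5, "staging": 1, "deploy": 1, "maya": 3}


def lead_time(waits, work=WORK_DAYS, extra_wait=0.0):
    """Lead time in days: work plus every wait, plus any extra queue a rule adds."""
    return work + sum(waits.values()) + extra_wait


print(lead_time(waits_before))                 # 21.5
print(lead_time(waits_after))                  # 10.0
print(lead_time(waits_after, extra_wait=1.5))  # 11.5, with the no-Friday rule in place
```

The work term never moves. Every change in the total comes from a queue being shortened or, in the case of the no-Friday rule, quietly added.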

Charlotte’s three questions stay on the whiteboard, unanswered. They’ll stay there until a subscriber’s 3 AM tweet answers the first one the hard way; that story is a few weeks away.

Tom adds a lead-time chart to the team’s wall display. It’s basic: average lead time from story start to production, updated weekly. Priya’s chart ritual from the early sprints now includes a second line: subscriber count AND lead time. Both visible. Both tracked.
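A chart like Tom’s needs very little behind it. The sketch below shows one way the weekly number could be computed, assuming each story is recorded as a pair of dates (when work started, when it reached production) and that stories are grouped by the ISO week in which they shipped. The function name and the sample dates are invented for illustration; the chapter doesn’t say how Tom actually builds the chart.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def weekly_average_lead_time(stories):
    """Average lead time in days, keyed by (ISO year, ISO week) of the ship date.

    `stories` is an iterable of (started, shipped) date pairs.
    """
    by_week = defaultdict(list)
    for started, shipped in stories:
        iso = shipped.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        by_week[(iso[0], iso[1])].append((shipped - started).days)
    return {week: mean(days) for week, days in sorted(by_week.items())}


# Hypothetical stories: (started, shipped).
stories = [
    (date(2024, 2, 26), date(2024, 3, 5)),   # 8 days, shipped in ISO week 10
    (date(2024, 2, 23), date(2024, 3, 7)),   # 13 days, shipped in ISO week 10
    (date(2024, 3, 4), date(2024, 3, 13)),   # 9 days, shipped in ISO week 11
]

for week, average in weekly_average_lead_time(stories).items():
    print(week, average)
# (2024, 10) 10.5
# (2024, 11) 9
```

An average is the simplest thing that works for a wall chart. It hides outliers, so a team that wants to see its slowest stories might track the maximum or a high percentile alongside it.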

Lee, who dials in occasionally for strategic sessions, mentions during this one that “at scale, you’ll need people who weren’t in this room to understand these flows.” It’s an echo of something he said during the first Event Storm, nearly two years before. The value stream map, like the laminated Event Storm photos, becomes a reference the team returns to.

When to use Value Stream Mapping

When lead time has increased and nobody knows why. VSM reveals whether the bottleneck is in the work or in the waiting. It’s almost always in the waiting.
When the team feels busy but nothing ships. High utilisation plus low throughput is a queue problem. VSM makes the queues visible.
When you’ve improved the work but delivery hasn’t improved. Greenbox improved their code quality with bounded contexts and decision tables. Delivery didn’t speed up because the bottleneck was in the process, not the code.

When not to use it

When the problem is obvious. If everyone knows deploys take three hours and that’s the bottleneck, fix the deploys. You don’t need a map to tell you what you already know.
When the team is too small. Five people with one value stream don’t need to map it. They can see the whole flow by looking left and right.

What comes next

The value stream map revealed one more thing. The staging environments, the deploy pipeline, the lead-time chart: these are all things Tom and Priya built by hand, one at a time, as the need arose. Sam’s delivery tracking is a spreadsheet with seventeen tabs and colour codes that only Sam understands. The courier coordination is a combination of email, phone calls, and hope.

Some of these things should be built. Some should be bought. Some should be automated with an LLM. The team is about to face the classic build-vs-buy question, and Tom has already scoped the delivery tracking system in his head.

Charlotte has a different question: “Before you decide how to build it, shouldn’t you decide whether to build it?” That’s Wardley Mapping, and the answer isn’t what Tom expects.

The next chapter, Wardley Mapping: Build, Buy, or Borrow?, publishes around 21 July.