Value Stream Mapping: Finding Where to Focus

August 11, 2026 · 15 min read

Greenbox has 5,200 subscribers across Perth and Melbourne. Eighteen people. Bounded contexts drawn, decision tables capturing Maya’s substitution expertise, ADRs preserving institutional memory. The architecture is cleaner than it’s ever been. But something has gone wrong with the delivery of software itself.

Charlotte notices it first. She’s been reviewing the team’s sprint data and the numbers don’t add up.

“A story takes three weeks to get from ‘ready to build’ to ‘in production,’” she tells Maya on Monday morning. “Three weeks. When you were five people, it took three days.”

Maya frowns. “We’re bigger. More coordination. That’s normal.”

“Five times slower isn’t normal. It’s a symptom.”

Charlotte has seen this before. The team hasn’t become worse at building software. The bounded contexts made PRs smaller. The decision tables reduced substitution bugs. The ADRs help new developers understand the system. Everything that used to be hard is easier.

And yet stories take three weeks to ship. Nobody can explain why.

What is Value Stream Mapping?

Value Stream Mapping comes from lean manufacturing. Toyota used it to trace the flow of materials through a factory, measuring how long each step took and where things got stuck. The core insight: most of the time a product spends in a factory, it isn’t being worked on. It’s waiting. Sitting in a queue. Waiting for a handoff. Waiting for a decision.

A car takes 20 hours to assemble but spends 6 weeks in the factory. The assembly is the value. The 6 weeks minus 20 hours is waste.

Software delivery works the same way. Charlotte suggests the team map the flow of a single story from “ready to build” to “running in production.”

Mapping the engineering value stream

Charlotte facilitates. The whole team gathers around a whiteboard. She picks a recent story: the one Priya shipped last week, a small change to the delivery notification email. It took about twenty-one days from start to finish. The code change itself was thirty-two lines.

“Walk me through what happened,” Charlotte says. “Every step. Every handoff. Every wait.”

Tom starts. Priya fills in details. Sam adds the operational steps. The timeline builds left to right across the whiteboard.

  1. Example Map: 25 min (work)
  2. Wait for Sprint: 4 days (wait)
  3. Build: 2 days (work)
  4. Wait for Review: 3 days (wait)
  5. Review: 1 hr (work)
  6. Wait for Staging: 4 days (wait)
  7. Test on Staging: 2 hrs (work)
  8. Wait for Deploy: 5 days (wait)
  9. Deploy: 20 min (work)
  10. Wait for Maya: 3 days (wait)
  11. Verify Live: 30 min (work)

Charlotte adds up the numbers in front of everyone.

Total work time: about 2.5 days. The Example Map (25 minutes), the build (2 days), the code review (1 hour), the staging test (2 hours), the deploy (20 minutes), the production verification (30 minutes).

Total wait time: about 19 days. Waiting for the sprint to start (4 days). Waiting for someone to review the PR (3 days). Waiting for the single staging environment to be free (4 days). Waiting for a deploy slot — they only deploy on Tuesdays and Thursdays, and the queue is three deploys deep (5 days). Waiting for Maya to verify the substitution rule changes work correctly with real data (3 days).
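The same tally can be reproduced in a few lines. This is a minimal sketch, not the team's tooling: it assumes an eight-hour working day when converting minutes and hours to days (the post doesn't say), and it adds a ratio the whiteboard doesn't show, usually called flow efficiency in lean writing: work time divided by total lead time. The `Step` type and the function names are made up for illustration.

```python
from dataclasses import dataclass

HOURS_PER_WORKING_DAY = 8  # assumption: the post does not define a working day
MINUTES_PER_DAY = HOURS_PER_WORKING_DAY * 60


@dataclass(frozen=True)
class Step:
    name: str
    minutes: float  # duration in working minutes
    kind: str       # "work" or "wait"


def days(n: float) -> float:
    """Convert a number of days into minutes."""
    return n * MINUTES_PER_DAY


# The eleven steps from the whiteboard, left to right.
STEPS = [
    Step("Example Map", 25, "work"),
    Step("Wait for Sprint", days(4), "wait"),
    Step("Build", days(2), "work"),
    Step("Wait for Review", days(3), "wait"),
    Step("Review", 60, "work"),
    Step("Wait for Staging", days(4), "wait"),
    Step("Test on Staging", 120, "work"),
    Step("Wait for Deploy", days(5), "wait"),
    Step("Deploy", 20, "work"),
    Step("Wait for Maya", days(3), "wait"),
    Step("Verify Live", 30, "work"),
]


def total_days(steps, kind):
    """Sum the durations of all steps of one kind, expressed in days."""
    return sum(s.minutes for s in steps if s.kind == kind) / MINUTES_PER_DAY


work = total_days(STEPS, "work")
wait = total_days(STEPS, "wait")
flow_efficiency = work / (work + wait)

print(f"work: {work:.1f} days, wait: {wait:.1f} days")
print(f"flow efficiency: {flow_efficiency:.0%}")
```

Under that eight-hour assumption, roughly one day in eight of the story's lead time was spent on actual work.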

Charlotte writes two numbers on the whiteboard. She circles them in red.

Work: 2 days. Wait: 19 days.

The room goes quiet.

The revelation

“The code takes two days,” Charlotte says. “Everything else — waiting for review, waiting for staging, waiting for a deploy slot, waiting for Maya — takes nineteen. The bottleneck isn’t engineering. It’s the process around engineering.”

Tom stares at the board. “We spent three weeks improving the bounded contexts and the decision tables. We thought we were fixing the delivery problem. We were fixing the wrong thing.”

“No,” Charlotte says. “The architecture work was necessary. Without it, the two days of coding would have been four, and the review would have been impossible. You fixed the work. Now you need to fix the waiting.”

She breaks down each wait:

Wait for sprint (4 days). Stories are only picked up at the start of a sprint. If a story is ready on Tuesday, it sits until Monday. The sprint cadence that worked beautifully at five people has become a gate at eighteen.

Wait for review (3 days). Tom reviews most PRs. He’s also building features, attending meetings, and helping new developers. His review queue is consistently three to five PRs deep. Kai and Priya can review each other’s code, but Tom still reviews everything in the Supply Matching and Commercial contexts because he wrote the original code.

“That’s me being a bottleneck,” Tom says. He’s heard this before — Charlotte identified him as a bottleneck in the code review process the same way Lee identified Maya as a bottleneck in the matching process. Same pattern, different person.

Wait for staging (4 days). There’s one staging environment. When someone is using it, everyone else queues. Deploying to staging takes twenty minutes and involves reseeding the database. Melbourne and Perth teams compete for the same environment.

Wait for deploy (5 days). Deploys happen twice a week — Tuesday and Thursday. The queue is first-come, first-served. A story that’s ready on Wednesday waits until Thursday. If the queue is full, it waits until Tuesday. The deploy process itself takes twenty minutes and requires Tom’s involvement.

Wait for Maya (3 days). Changes to the substitution rules need Maya’s approval because the decision tables still require her judgement for edge cases. She’s managing farm relationships, running two cities’ operations, and preparing board presentations. A change sits in her approval queue until she finds time.

Priya speaks up. “Each of these waits seems reasonable on its own. Three days for review isn’t crazy. Four days for staging isn’t crazy. But they compound. Five reasonable waits stack into nineteen unreasonable days.”

“That’s the insight VSM gives you,” Charlotte says. “You can’t see the compounding from inside any single step. You can only see it when you map the whole flow.”

What the map tells the team to fix

Charlotte doesn’t prescribe solutions. She asks the team to look at each wait and ask: what could we change to reduce it?

Sprint gating. Priya suggests pulling stories into the sprint mid-cycle instead of only at planning. “If a developer finishes a story on Wednesday and the next priority is ready, why wait until Monday?” Charlotte agrees — the sprint cadence should be a planning rhythm, not a work-start gate.

Review bottleneck. Kai offers to take over reviews for the Fulfilment context. Tom agrees to review only the Commercial context. They document the review ownership in the team’s ADRs. The review queue should halve.

Single staging environment. Tom volunteers to create a second staging environment. “It’s a Saturday morning of work. The infrastructure is already templated.” Charlotte nods: “Cheap to build, expensive not to have.”

Deploy queue. The twice-a-week deploy schedule was set when deploys were risky and manual. With the CI pipeline, tests, and feature flags, deploys could happen daily. Tom proposes deploying every day the pipeline is green. Charlotte: “That’s a culture change, not just a process change. The team needs to trust the pipeline.”

Maya’s approval. This is the hardest one. Maya is a bottleneck for the same reason she was a bottleneck in Series 1 — she conflates caring with doing. The decision tables captured most of her substitution knowledge, but she still reviews every change personally. Charlotte is direct: “You built the decision tables so Anika could run Melbourne without you. But you’re still reviewing every table change. At what point do you trust the tables?”

Maya is quiet for a moment. “I’ll trust the tables when they’ve been right for a month without my correction.”

“Fair,” Charlotte says. “Then let’s track that. If the tables produce correct substitutions for four consecutive weeks without your intervention, the approval step goes away.”
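Charlotte's exit criterion is simple enough to track mechanically. A minimal sketch, assuming the team records how many table outputs Maya corrected each week; the function names and the sample history are hypothetical.

```python
def clean_streak(weekly_corrections):
    """Count the most recent consecutive weeks with zero corrections."""
    streak = 0
    for corrections in reversed(weekly_corrections):
        if corrections != 0:
            break
        streak += 1
    return streak


def approval_step_can_go(weekly_corrections, required_weeks=4):
    """True once the tables have run `required_weeks` weeks without a correction."""
    return clean_streak(weekly_corrections) >= required_weeks


# Hypothetical history: corrections Maya made per week, oldest first.
history = [2, 0, 0, 1, 0, 0, 0]
print(clean_streak(history), approval_step_can_go(history))

history.append(0)  # one more clean week
print(clean_streak(history), approval_step_can_go(history))
```

A single correction resets the streak, which matches the "four consecutive weeks" wording: the step disappears only after an unbroken run.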

The numbers after

The team implements the changes over three weeks. Not all at once: Charlotte insists on one batch of changes per week so they can measure the impact.

Week one: mid-sprint story pulling. Wait-for-sprint drops from 4 days to 1 day.

Week two: split reviews and second staging environment. Wait-for-review drops from 3 days to 1.5. Wait-for-staging drops from 4 days to 1.

Week three: daily deploys when the pipeline is green. Wait-for-deploy drops from 5 days to 1.

Or it should.

The Friday fear

Two weeks into the daily deploy experiment, Ravi ships a change on a Friday afternoon. It passes CI, passes the staging check, and goes live at 3:47 PM. By 4:30, three hundred Melbourne subscribers are getting duplicate delivery notifications. Sam’s phone starts buzzing. The bug is small (a race condition in the event handler that only triggers under load) but it takes Ravi ninety minutes to diagnose and fix. By the time the hotfix ships, it’s past six. Sam has already emailed apologies. Ravi spends the weekend quietly furious with himself.

On Monday morning, Tom draws a line. “No deploys after 2 PM on Fridays.”

Nobody argues. It feels responsible. If something breaks on Friday afternoon, you’re firefighting into the weekend. Nobody wants to be the person who ruined Sam’s Saturday. The rule is informal but absolute. Within a fortnight, it’s hardened: no deploys on Fridays at all. Then someone suggests no deploys on Thursday afternoon either, “just to be safe.” The deploy window is quietly shrinking.

Charlotte sees the VSM numbers drift. Lead time, which dropped to 10 days, is creeping back up. She maps it again.

The problem is inventory. When the team can’t deploy on Friday, stories finished on Thursday sit in a queue until Monday. But Monday is sprint planning. So they don’t deploy until Tuesday. That’s four days of finished work sitting in a pile: tested, reviewed, approved, not shipped. And the pile creates its own problems: by Tuesday, three or four changes are queued up. Deploying them all at once is riskier than deploying each one individually. If something breaks, you have to figure out which of the four changes caused it. The “safety” rule has made deploys less safe, not more.

Meanwhile, developers subconsciously cram work into Thursday morning. The code is a bit more rushed. Reviews are a bit less thorough. “Let’s get this in before the Friday cutoff.” Thursday becomes the highest-risk deploy day of the week, the exact opposite of what the rule intended.

Charlotte puts the updated VSM on the wall next to the original. “The no-Friday rule added 1.5 days of wait time back into the pipeline. Your lead time went from 10 days back to 11.5. And your Thursday deploys are now the riskiest of the week because you’re batching.”

Tom pushes back. “Ravi’s Friday bug cost Sam her evening.”

“It did. And that’s a real problem. But the answer isn’t ‘deploy less.’ The answer is ‘make deploying safe enough that the day doesn’t matter.’”

She writes four things on the whiteboard:

Canary deploys. Don’t send a change to all subscribers at once. Route 5% of traffic to the new version, watch the error rates and latency for fifteen minutes, then promote to 100%. If the canary shows problems, roll back automatically. Ravi’s race condition would have affected fifteen subscribers instead of three hundred, and the rollback would have been automatic.

Observability, not just monitoring. The team has uptime checks and a response-time dashboard. That’s monitoring; it tells you whether things are working. They need observability: the ability to ask why something isn’t working, after the fact, without having to reproduce it. Structured logging. Distributed tracing. Error tracking with context. When Ravi’s bug hit, he spent ninety minutes diagnosing because the logs didn’t have enough context. With structured traces, he’d have seen the race condition in the first trace he opened.

Automated rollback. If the error rate spikes above a threshold within fifteen minutes of a deploy, roll back without waiting for a human to notice. The CI/CD pipeline should be able to undo what it just did. Tom’s rollback capability from the Impact Mapping post, the one-line change to his deploy script, was the seed. This is the grown-up version.

On-call rotation. If someone deploys on Friday and something breaks, it shouldn’t fall on whoever happens to be near their phone. A formal on-call rotation means there’s always a named person responsible, with clear escalation paths, and they’re compensated for it. The fear of Friday deploys is partly the fear of an unstructured, uncompensated obligation to fix things on your own time.
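The canary and automated-rollback items boil down to one decision: compare the canary's error rate with the baseline and either promote or roll back. The sketch below shows only that decision, not traffic routing or the rollback itself. The 5% fraction and the fifteen-minute window come from the whiteboard list; the tolerance value, the `Window` type and the function names are assumptions for illustration.

```python
from dataclasses import dataclass

CANARY_FRACTION = 0.05       # from the list above: 5% of traffic
OBSERVATION_MINUTES = 15     # from the list above: watch for fifteen minutes
ERROR_RATE_TOLERANCE = 0.01  # assumption: allowed absolute increase over baseline


@dataclass(frozen=True)
class Window:
    """Request and error counts collected over one observation window."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(baseline: Window, canary: Window,
                    tolerance: float = ERROR_RATE_TOLERANCE) -> str:
    """Return "rollback" if the canary errs noticeably more than baseline, else "promote"."""
    if canary.error_rate > baseline.error_rate + tolerance:
        return "rollback"
    return "promote"


def blast_radius(affected_at_full_rollout: int,
                 fraction: float = CANARY_FRACTION) -> int:
    """Estimate how many subscribers a bug reaches when only the canary sees it."""
    return round(affected_at_full_rollout * fraction)


# Hypothetical counts for one fifteen-minute window.
baseline = Window(requests=9500, errors=19)
healthy = Window(requests=500, errors=1)
broken = Window(requests=500, errors=40)

print(canary_decision(baseline, healthy))
print(canary_decision(baseline, broken))
print(blast_radius(300))
```

A real mechanism would also want a minimum request count before trusting the comparison, and would watch latency as well as errors; both are left out here to keep the decision rule visible.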

Tom looks at the list. “So we’re building an SRE practice.”

“You’re building the confidence to deploy whenever the code is ready,” Charlotte says. “Friday, Saturday, 3 AM: it shouldn’t matter. If your pipeline is good enough, the day of the week is irrelevant.”

It takes six weeks. Kai builds the canary deploy mechanism: route a percentage of traffic to the new version, compare error rates against the baseline, auto-promote or auto-rollback. Priya adds structured tracing with correlation IDs so any request can be followed from the API gateway to the database and back. Tom sets up the error-rate threshold with automated rollback. Charlotte establishes the on-call rotation: two people per week, one primary and one secondary, with a day off in lieu the following week.
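The correlation-ID idea can be shown with nothing but the standard library. This is a sketch of structured logging inside a single process, not distributed tracing: passing the ID across service boundaries is out of scope, and the logger name, the `handle_request` function and the field names are hypothetical rather than Priya's actual implementation.

```python
import contextvars
import io
import json
import logging
import uuid

# Holds the ID of the request currently being handled; "-" means "no request".
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object carrying the correlation ID."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        }
        # Extra fields passed via `extra={"context": {...}}` are merged in.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)


def handle_request(logger, subscriber_id):
    """Tag every log line emitted during one request with the same ID."""
    token = correlation_id.set(uuid.uuid4().hex)
    try:
        context = {"subscriber_id": subscriber_id}
        logger.info("notification requested", extra={"context": context})
        logger.info("notification sent", extra={"context": context})
    finally:
        correlation_id.reset(token)


# Log to an in-memory buffer so the example is self-contained.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("greenbox.notifications")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

handle_request(logger, subscriber_id=42)
print(stream.getvalue())
```

Both lines share one `correlation_id`, so a duplicate notification would show up as two "notification sent" entries that can be grouped and compared instead of reconstructed from free-text logs.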

The first Friday deploy under the new system goes live at 4:15 PM. The canary routes 5% of traffic. Error rates are flat. After fifteen minutes, it auto-promotes to 100%. Nobody’s phone buzzes. Sam’s Saturday is her own.

Tom, watching the dashboard: “That was anticlimactic.”

Charlotte: “Anticlimactic deploys are the goal.”

Maya’s approval wait stays at 3 days. She’s tracking the decision tables but they haven’t hit the four-week mark yet. Charlotte lets it stand. “The data will decide, not an argument.”

Total lead time after all changes: work (2 days) + waiting (6 days) = roughly 8 days. Down from 21. The no-Friday rule nearly pushed it back to 12, but the SRE work pulled it lower than the original daily-deploy target. Making deploys safe didn’t just remove the Friday fear. It made every deploy safer, every day of the week.
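For anyone who wants to check the arithmetic, here is a small sketch using only the figures quoted in this post. The per-step waits come from the original map and the week-by-week changes; the final six days of waiting is taken as stated, because the post gives no per-step breakdown for it. The variable names are illustrative.

```python
# Per-step waits in days, as quoted in the post.
WAITS_BEFORE = {"sprint": 4, "review": 3, "staging": 4, "deploy": 5, "maya": 3}
WAITS_AFTER_PROCESS_CHANGES = {
    "sprint": 1, "review": 1.5, "staging": 1, "deploy": 1, "maya": 3,
}

WORK_DAYS = 2        # the rounded work figure used in the summary
FINAL_WAIT_DAYS = 6  # stated in the summary, with no per-step breakdown

wait_before = sum(WAITS_BEFORE.values())
wait_after_process = sum(WAITS_AFTER_PROCESS_CHANGES.values())

lead_before = WORK_DAYS + wait_before
lead_final = WORK_DAYS + FINAL_WAIT_DAYS
reduction = 1 - lead_final / lead_before

print(f"wait: {wait_before} -> {wait_after_process} -> {FINAL_WAIT_DAYS} days")
print(f"lead time: {lead_before} -> {lead_final} days ({reduction:.0%} shorter)")
```

Adding roughly two and a half days of work to the 7.5 days of waiting gives the ten-day figure quoted before the no-Friday rule crept in.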

Tom adds a lead-time dashboard to the team’s monitoring. It’s basic: average lead time from story start to production, updated weekly. Priya’s chart ritual from the early sprints now includes a second line: subscriber count AND lead time. Both visible. Both tracked.
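A weekly lead-time average needs very little machinery. The sketch below assumes each story has a start date and a production date, measures lead time in calendar days, and groups stories by the ISO week in which they reached production. The sample dates are invented and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def weekly_average_lead_time(stories):
    """Average lead time in calendar days, keyed by (ISO year, ISO week) of release.

    `stories` is an iterable of (started, in_production) date pairs.
    """
    buckets = defaultdict(list)
    for started, shipped in stories:
        iso = shipped.isocalendar()
        buckets[(iso[0], iso[1])].append((shipped - started).days)
    return {week: mean(lead_times) for week, lead_times in sorted(buckets.items())}


# Hypothetical stories: (started, reached production).
stories = [
    (date(2026, 7, 27), date(2026, 8, 4)),
    (date(2026, 7, 30), date(2026, 8, 6)),
    (date(2026, 8, 3), date(2026, 8, 13)),
]

for (year, week), average in weekly_average_lead_time(stories).items():
    print(f"{year}-W{week:02d}: {average} days")
```

An average hides outliers, so a team in this position might plot the slowest story of each week alongside it; that choice is left open here.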

Lee, who dials in occasionally for strategic sessions, mentions during this one that “at scale, you’ll need people who weren’t in this room to understand these flows.” It’s an echo of something he said months ago during the first Event Storm. The value stream map, like the laminated Event Storm photos, becomes a reference the team returns to.

When to use Value Stream Mapping

  • When lead time has increased and nobody knows why. VSM reveals whether the bottleneck is in the work or in the waiting. It’s almost always in the waiting.
  • When the team feels busy but nothing ships. High utilisation plus low throughput is a queue problem. VSM makes the queues visible.
  • When you’ve improved the work but delivery hasn’t improved. Greenbox improved their code quality with bounded contexts and decision tables. Delivery didn’t speed up because the bottleneck was in the process, not the code.

When not to use it

  • When the problem is obvious. If everyone knows deploys take three hours and that’s the bottleneck, fix the deploys. You don’t need a map to tell you what you already know.
  • When the team is too small. Five people with one value stream don’t need to map it. They can see the whole flow by looking left and right.

What comes next

The value stream map revealed one more thing. The staging environment, the deploy pipeline, the monitoring — these are all things Tom and Priya built by hand, one at a time, as the need arose. Sam’s delivery tracking is a spreadsheet with seventeen tabs and colour codes that only Sam understands. The courier coordination is a combination of email, phone calls, and hope.

Some of these things should be built. Some should be bought. Some should be automated with an LLM. The team is about to face the classic build-vs-buy question — and Tom has already scoped the delivery tracking system in his head.

Charlotte has a different question: “Before you decide how to build it, shouldn’t you decide whether to build it?” That’s Wardley Mapping, and the answer isn’t what Tom expects.

The next chapter, Wardley Mapping: Build, Buy, or Borrow?, publishes around 18 August.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.