Outcome Over Output: The Feature Factory

Greenbox has scaled to 7,000 subscribers across three cities. The team is 28 people across multiple squads. The weekly discovery cadence that Charlotte built is still technically in place. But somewhere in the last quarter, the rhythm changed. The squads are shipping more than ever. Nobody can say whether any of it matters.

The quarterly review happens on a Thursday in late November. Three squads present. The format is simple: what did you ship, what’s the impact, what’s next.

Perth goes first. Tom runs through the slide. “We shipped the new box preview redesign, the subscription gifting feature, the improved farm dashboard, and the allergen warning system. Four features. All on time.”

Melbourne: Anika presents. “Meal kit expansion phase two, the corporate ordering portal, delivery window preferences, and the customer referral programme. Four features.”

Brisbane: “The new small-box pricing tier, farmer spotlight pages, a Brisbane-specific landing page, and push notification preferences. Four features.”

Twelve features in one quarter. The team applauds. Maya smiles. It feels like progress.

Charlotte doesn’t applaud. She’s writing something in her notebook.

After the presentations, she asks one question.

“Which of these twelve features is responsible for your subscriber growth this quarter?”

Silence.

Tom tries first. “The box preview probably helped conversion. We saw a bump in signups after it launched.”

“How big a bump?”

“I’d have to check the numbers.”

“Did you set a target before you built it?”

Tom pauses. “Not… specifically. We knew it would help.”

Charlotte turns to Anika. “The corporate portal. How many corporate orders have come through?”

Anika checks her laptop. “Seventeen.”

“What was the target?”

“We didn’t set one. The request came from three enterprise prospects.”

Charlotte looks at the Brisbane squad. “The farmer spotlight pages. What metric do they affect?”

“Engagement. People like reading about the farmers.”

“How do you know?”

“We’ve had positive feedback.”

“How many people read them?”

Another pause. “We haven’t added analytics to those pages yet.”

Charlotte closes her notebook. “You shipped twelve features. You can’t measure the impact of any of them. You’re a feature factory.”

The mirror

The term stings because it’s accurate.

A feature factory is a team that measures productivity by output (features shipped, stories completed, velocity maintained) without connecting that output to outcomes. The features get built. They get deployed. They get announced. Nobody checks whether they changed anything.

Melissa Perri coined the term “build trap” for this pattern: staying busy building the wrong things. Not wrong in the sense that they’re bad features. Wrong in the sense that they might not matter. The team is productive. The team is efficient. The team is potentially wasting its time.

Maya recognises the pattern because she’s part of the cause. Over the last three months, she’s said yes to almost everything. A customer asked for gifting. Maya added it to the backlog. An enterprise prospect wanted corporate ordering. Maya flagged it as a priority. A board member mentioned push notifications. Maya asked Brisbane to build it. Each request was reasonable. Each feature was well-built. But the backlog grew by accretion, not by design.

“I kept saying yes,” Maya tells Charlotte after the review. “Every request seemed important when it arrived.”

“They probably were important to the person asking. But important to one customer isn’t the same as important to the business. You stopped asking ‘why’ and started asking ‘when.’”

Where the discovery went

Charlotte pulls up the weekly cadence she’d designed three months earlier. Monday assumptions check. Tuesday customer interview. Wednesday Example Mapping. Thursday and Friday: build, ship, measure.

“When did the Tuesday interviews stop?”

Tom answers honestly. “About two months ago. We were heads down on the gifting feature. It had a hard deadline for the Christmas rush. The interviews felt like they could wait.”

“And the Monday assumptions check?”

“We still do it. But nobody’s raised a risky assumption in weeks. We’ve been doing features we understand.”

“Or features you think you understand.”

Charlotte isn’t angry. She’s seen this before, at three previous companies, in fact. The pattern is always the same. A team builds good discovery habits. The habits work. The team gets confident. Delivery pressure builds. The discovery activities feel like overhead because the team is building things they’ve already decided to build. One week the interview gets skipped. Then another. Then the Example Mapping sessions stop because all the stories are “Clear.” Within two months, the team is back to building from assumptions, the same assumptions that Assumption Mapping and Impact Mapping were designed to catch.

“Discovery didn’t stop because you decided to stop,” Charlotte says. “It stopped because delivery pressure created a gradient, and the team rolled downhill.”

The backlog audit

Charlotte suggests a backlog review. The kind nobody wants to do.

The Greenbox backlog has 340 items. They sit in a project management tool, loosely prioritised, spanning three squads and eighteen months of accumulated requests. Some were added by Maya. Some by Tom. Some by customers. Some by board members. A few were added by people who no longer work at Greenbox.

Charlotte asks each squad to go through their items and answer one question: what outcome does this item achieve?

Not “what does it do.” What outcome. What measurable change in a number the business cares about: subscriber growth, churn reduction, revenue per subscriber, delivery cost, NPS.

The Perth squad reviews their 120 items in ninety minutes. When they’re done, Tom reads the results.

“Forty-seven items have a clear outcome. Thirty-one have a vague outcome, something like ‘improve the customer experience.’ Forty-two have no outcome at all. They’re features for their own sake.”

Anika’s Melbourne squad: “Thirty-nine with outcomes. Twenty-eight vague. Fifty-three with no outcome. Three items that nobody can explain the purpose of.”

Brisbane: “Twenty-two with outcomes. Nineteen vague. Fifty-nine with no outcome.”

Charlotte writes the totals on the whiteboard.

Category	Items	Share of backlog
Clear outcome	108	32%
Vague outcome	78	23%
No outcome	154	45%

Forty-five percent of the backlog has no measurable outcome attached. Nearly half. The team has been feeding features into a machine without checking whether anything comes out the other end.

“This is what a feature factory looks like from the inside,” Charlotte says. “Nobody made a bad decision. Each item was added for a reason. But without an outcome, you can’t prioritise, you can’t measure, and you can’t learn.”
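The whiteboard arithmetic can be reproduced in a few lines. This is a minimal sketch, not a tool Greenbox uses: the function and variable names are illustrative assumptions, and the "no outcome" count is taken as whatever remains of the 340-item backlog once the clear and vague items have been counted.

```python
BACKLOG_TOTAL = 340  # total backlog size stated in the audit

# Counts read out by each squad: (clear outcome, vague outcome).
squad_counts = {
    "Perth": (47, 31),
    "Melbourne": (39, 28),
    "Brisbane": (22, 19),
}


def audit_summary(counts, total):
    """Sum the clear and vague counts; treat the rest of the backlog as 'no outcome'."""
    clear = sum(c for c, _ in counts.values())
    vague = sum(v for _, v in counts.values())
    return {"clear": clear, "vague": vague, "no outcome": total - clear - vague}


def shares(summary, total):
    """Express each category as a whole-number percentage of the backlog."""
    return {name: round(100 * count / total) for name, count in summary.items()}


summary = audit_summary(squad_counts, BACKLOG_TOTAL)
print(summary)                         # {'clear': 108, 'vague': 78, 'no outcome': 154}
print(shares(summary, BACKLOG_TOTAL))  # {'clear': 32, 'vague': 23, 'no outcome': 45}
```

The point of the exercise is the last line: the share of items with no outcome is a number the team can track from one audit to the next.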

The impact map test

Lee joins the conversation by video. He pulls up the impact map the team built months earlier. The goal was simple: reduce churn from 8% to 4%.

“Let’s check. Which of the twelve features you shipped this quarter are on this map?”

Tom scans the map. The box preview redesign connects to “improve first-box experience,” which connects to reducing churn. That’s on the map.

The allergen warning system, born from the allergen incident, connects to “prevent bad box experiences,” which also connects to reducing churn. That’s on the map.

The other ten features? Gifting, corporate ordering, delivery windows, referral programme, pricing tiers, farmer spotlights, landing pages, push notifications, farm dashboard, meal kit expansion. None of them appear on the impact map.

“Two out of twelve,” Lee says. “You spent roughly 80% of your development capacity on work that wasn’t connected to your stated goal.”

“Some of those are about growth, not churn,” Maya objects.

“Fair. What’s the growth goal?”

Maya hesitates. “We want to grow.”

“That’s an aspiration, not a goal. How much growth? By when? Through which channels? The impact map is specific. ‘Reduce churn from 8% to 4%.’ You can measure that. You can connect work to it. ‘We want to grow’ lets you justify building anything.”

Output vs outcomes

Charlotte draws two columns on the whiteboard.

Output	Outcome
Shipped gifting feature	Subscriber growth increased by X%
Shipped corporate portal	Revenue from corporate accounts reached $Y/month
Shipped referral programme	Z new subscribers joined via referral
Shipped farmer spotlights	NPS improved by W points

“The left column is what you celebrate. The right column is what matters. You can fill in every row on the left. You can’t fill in a single row on the right.”

“Because we didn’t set targets,” Priya says quietly. She’s been listening the whole time.

“Because you didn’t set targets before you built. If you’d said ‘the corporate portal needs to generate fifty orders in the first month or we’ll reconsider,’ you’d have known at seventeen that something was wrong. You’d have talked to those corporate prospects again. You’d have learned. Instead, seventeen feels like progress because you didn’t define what success looked like.”

Ravi, who’s been quiet, speaks up. “I spent three weeks on the referral programme. How many referrals have come through?”

Sam checks. “Eleven.”

Ravi does the maths in his head. Three weeks of a senior developer’s time to generate eleven referrals. At Greenbox’s average subscriber value, the referral programme has cost roughly twenty times more to build than it has generated.

“I could have spent those three weeks reducing churn,” he says. “One percent less churn would have saved more subscribers than eleven referrals.”

“Now you’re thinking in outcomes,” Charlotte says.
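Ravi's comparison is easy to check. The sketch below is a rough illustration under two assumptions that the story does not spell out: "one percent less churn" is read as one percentage point, and churn is measured against the full base of 7,000 subscribers per churn period. The function name is hypothetical.

```python
SUBSCRIBERS = 7_000    # subscriber base given at the start of the chapter
REFERRAL_SIGNUPS = 11  # referrals reported by Sam


def subscribers_retained(subscribers, churn_reduction_points):
    """Subscribers kept per churn period if churn falls by the given percentage points."""
    return subscribers * churn_reduction_points / 100


retained = subscribers_retained(SUBSCRIBERS, 1)
print(retained)                     # 70.0
print(retained > REFERRAL_SIGNUPS)  # True
```

Under those assumptions, a single point of churn is worth about seventy subscribers per period, several times the eleven the referral programme has brought in so far.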

The dual-track reset

Charlotte proposes a reset. Not a revolution, a recalibration.

Dual-track agile. Discovery and delivery run in parallel. Every squad does both, every week. The delivery track builds features. The discovery track validates that the features are worth building.

“You had this,” Charlotte reminds them. “The Tuesday interviews. The Monday assumptions check. You let it lapse because delivery felt more urgent. The fix isn’t a new process. It’s recommitting to the one you already had.”

One discovery activity per squad per week. Non-negotiable. It doesn’t have to be a customer interview every time. It could be a data review, a competitor analysis, a prototype test, or a five-minute conversation with Sam about what customers are complaining about. The format varies. The habit doesn’t.

Every backlog item gets an outcome statement before it enters a sprint. Not after. Before. “As a [user], I want [feature], so that [outcome]” is the minimum. Better: “We believe [feature] will cause [measurable change] and we’ll know within [timeframe].”

“What about items the board requests?” Maya asks. It’s the question she’s been dreading.

“Same rule. If a board member suggests a feature, you add an outcome. If they can’t articulate one, the item sits in the backlog until someone can. Board requests aren’t immune to the laws of prioritisation.”

Maya nods slowly. This is the part where founder instincts collide with process discipline. She’s been saying yes because saying yes feels like progress. Every “yes” is a commitment, a promise, a relationship maintained. Saying “yes, and what outcome are we targeting?” feels slower. It is slower. That’s the point.

Outcome-based roadmap. Instead of a roadmap that says “Q1: gifting, corporate portal, referral programme,” the new roadmap says “Q1: reduce churn from 6% to 4%.” The squads decide what to build to achieve that outcome. Maybe it’s a better onboarding flow. Maybe it’s fixing the three delivery complaints that drive 40% of cancellations. Maybe it’s something nobody has thought of yet, which is why the discovery track exists.
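The "outcome statement before it enters a sprint" rule can be sketched as a small data structure. Everything here is a hypothetical illustration rather than Greenbox's actual tooling: the class and function names are invented, the fifty-order target is the example Charlotte gives for the corporate portal, and "the first month" is approximated as thirty days.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutcomeStatement:
    """'We believe [feature] will cause [measurable change] within [timeframe].'"""
    feature: str
    metric: str
    target: float
    timeframe_days: int

    def review(self, observed: float) -> str:
        """Compare the observed number with the target that was set before building."""
        return "on track" if observed >= self.target else "reconsider"


@dataclass
class BacklogItem:
    title: str
    outcome: Optional[OutcomeStatement] = None


def ready_for_sprint(item: BacklogItem) -> bool:
    """An item may enter a sprint only once an outcome statement is attached."""
    return item.outcome is not None


# Charlotte's example: fifty corporate orders in the first month (assumed 30 days).
portal = OutcomeStatement("corporate ordering portal", "corporate orders", 50, 30)
print(portal.review(17))                                       # reconsider
print(ready_for_sprint(BacklogItem("push notifications")))     # False
print(ready_for_sprint(BacklogItem("corporate portal", portal)))  # True
```

With a target recorded up front, seventeen orders stops being ambiguous: it is a clear signal to go back to the prospects who asked for the portal.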

The reconnection

The following Tuesday, every squad does a customer interview. For Perth, it’s the first one in nine weeks.

Tom interviews a subscriber named David who’s been with Greenbox for fourteen months. Tom asks the standard questions: what’s working, what’s not, would you recommend it.

David’s answer surprises him. “I nearly cancelled last month. The box was fine. The produce was fine. I just felt like nobody was listening any more.”

“What do you mean?”

“I filled out the feedback form three times about getting too many root vegetables in winter. Nothing changed. I used to feel like someone was paying attention. Now it feels automated.”

Tom thanks him. Ends the call. Sits at his desk.

Three weeks ago, Tom shipped the box preview redesign. It was polished. The code was excellent. David didn’t mention it. What David mentioned was that nobody responded to his feedback. That’s not a feature problem. That’s a relationship problem. And it’s the kind of thing you only learn by talking to customers.

Tom posts the summary in the team channel. No commentary. Just the transcript excerpt.

Priya replies first: “This is what we miss when we stop listening.”

The three-week check

Three weeks into the reset, Charlotte checks in.

The Tuesday interviews are happening again. Each squad has conducted three. Two of the nine surfaced insights that changed sprint priorities: a delivery timing complaint in Melbourne that was more widespread than the data showed, and a Brisbane subscriber who explained why she’d downgraded her box size in a way that suggested the pricing tier wasn’t the problem.

The backlog has shrunk from 340 items to 280. Not because they deleted items, but because they archived the 60 that nobody could attach an outcome to. Those items aren’t gone. They’re in a “needs outcome” holding area. If someone can articulate why they matter, they come back.

The Monday assumptions check produced one genuinely risky assumption: “We assume corporate customers will reorder monthly.” Nobody had tested this. The Melbourne squad designed a three-email follow-up sequence and tracked reorder rates. The assumption was wrong: corporate customers reorder quarterly, not monthly. The revenue model for the corporate portal had overstated reorder revenue by a factor of three.

“That one insight,” Charlotte tells Maya, “is worth more than the last three features you shipped. Because it changes a decision. The features just added code.”

What a feature factory looks like from the inside

The uncomfortable truth about feature factories is that they feel productive. The team is busy. The sprints are full. Features ship. Demos are satisfying. The board sees progress. Customers get things they asked for.

The problem is invisible until you measure it. And measuring it requires asking a question that nobody in a feature factory wants to ask: Did any of this matter?

Most teams don’t ask because the answer might be no. And if the answer is no, then the last three months of hard work (the late nights, the sprint commitments, the careful code reviews) were pointed in the wrong direction. That’s not a comfortable thing to confront.

But the alternative is worse. The alternative is continuing to build, continuing to ship, continuing to celebrate output, and slowly drifting from the outcomes that actually determine whether the business survives.

Greenbox drifted for one quarter. They caught it because Charlotte asked a direct question and nobody could answer it. Some teams drift for years.

Maya drives home that evening through the Perth suburbs. It’s still light at 7pm, the long Western Australian twilight that stretches the day. She thinks about the twelve features. Twelve things her team built with care and skill. Two of them connected to a goal. Ten of them built because someone asked and she said yes.

She’s not going to stop saying yes. That’s who she is: responsive, connected, attuned to what people need. But she’s going to start saying “yes, and what changes when we build it?” Because the answer to that question is the difference between a feature factory and a product company.