Pricing Experiments: The Right Box at the Right Price

May 19, 2026 · 13 min read

Greenbox has just over two hundred subscribers. The Business Model Canvas workshop was earlier in the week, and something Lee said is still rattling around in Maya’s head: “You’ve validated almost everything on that canvas. The one box you haven’t tested is the revenue model.”

Lee is at Maya’s kitchen table with a coffee and the canvas printed on A3. He’s drawn a circle around the Revenue Streams box and written three words next to it: “untested, but working.”

“The prices are working,” Maya says. “People are paying them. Nobody’s complained.”

“Nobody who signed up has complained. You don’t know anything about the people who looked at the site, saw $25, and closed the tab. You don’t know whether you could charge $30 and still get the same conversion. You don’t know whether the small box is mispriced relative to the large box. You’ve got one data point – the current price – and you’ve decided it’s right because people are buying.”

Maya frowns. She’d been congratulating herself, a little, that the pricing felt settled. It was one less thing to worry about.

“So what are you suggesting?”

“Pricing experiments. Not once. As a habit. You should be testing your pricing the way you test your features.”

Why pricing feels different

Pricing is one of the few decisions in a startup that feels genuinely scary to get wrong. A bad feature can be rolled back. A bad copy tweak can be un-tweaked. A bad price, once published, anchors every subscriber’s expectation of what the product costs. Raise it and people feel cheated. Lower it and the people who paid the old price feel stupid.

This is why most founders pick a number that feels right, ship it, and never touch it again. It’s not laziness. It’s loss-aversion. The downside of changing pricing feels enormous, and the upside feels uncertain.

Lee has seen this pattern many times. His view is that it’s the wrong way to think about it.

“The risk isn’t changing your price. The risk is being wrong about your price and not knowing. If you’re charging $25 for a small box that people would happily pay $30 for, you’re not being nice. You’re giving away five dollars per subscriber per week. At two hundred and ten subscribers, that’s $1,050 per week you’re leaving on the table. That’s a farm partner you could pay. That’s Sam’s hours. That’s runway.”

Maya does the maths in her head. The number is uncomfortably large.

What to test

The team sits down on a Wednesday afternoon to work out what they actually want to learn. Lee uses the same approach they’ve used for product discovery: write down the assumptions and figure out which ones are worth testing.

The assumptions on the wall, in Maya’s handwriting:

  1. The small box at $25 and the large box at $45 are both priced correctly relative to cost.
  2. The $20 gap between small and large reflects the actual value difference to subscribers.
  3. Subscribers would not pay more than $25 for the small box.
  4. A third, larger box option would not expand the market.
  5. Weekly delivery is the only frequency that makes sense.
  6. Free delivery is a deal-breaker if removed.

Lee reads down the list and then sets the marker down. “Which of these would you be most embarrassed to find out you’d been wrong about for six months?”

Maya doesn’t hesitate. “Three. If people would happily pay $30 for the small box, I’d feel sick.”

“Good. Let’s test three first.”

Designing the experiment

This is where it gets interesting. You can’t just change the price on the website and see what happens, because if the new price is wrong you’ve damaged the brand. You also can’t ask people “would you pay $30?” in a survey, because people lie about pricing in surveys – not deliberately, but because stated preference and revealed preference are different things.

Lee proposes a simple framework:

Test the price before the commitment. New visitors who haven’t yet chosen a box see a landing page with the price. Half see $25. Half see $30. Everybody who clicks through can choose to continue at the price they saw. Measure two things: the click-through rate (do fewer people continue at $30?) and the conversion rate (of the people who continue, how many actually subscribe?).

Be honest about what you’re doing. If a subscriber asks why they saw $30 and their friend saw $25, don’t pretend it was a glitch. Explain that you were testing pricing and wanted to understand what the right number was. Offer them whichever price they would have preferred.

Lee pauses there. “That’s the design. Now the part I can’t do for you. I’ve watched plenty of these experiments and I know what they cost when they’re set up wrong – but the actual sample-size maths, how big and how long and how confident, isn’t my world. I can tell you a pricing test is worth running. I can’t tell you whether your traffic will let you read the result. You need that worked out before you agree on a duration.”

He looks across the table. Priya is already on it.

“Yeah. Give me thirty seconds.”

She works through a few more lines and looks up.

“I want to check what we can actually measure here. Our conversion rate is around seven percent. If we want to be confident we’ve detected a ten percent shift in that – seven dropping to six and a bit – the back-of-envelope number is about twenty thousand visitors per arm. We’re getting maybe a hundred and fifty a week. That’s two and a half years per arm.”

Tom holds up a hand. “Two and a half years? Hold on – before you tell me the rest, can you walk us through where that number actually comes from? I hear ‘twenty thousand per arm’ and I’ve got no idea what’s in the calculation.”

Priya nods. “Worth doing. Five things go into it, and they’re all things we have to commit to before we run the experiment, not after.”

She turns the envelope over.

“First, the conversion rate. The fraction of visitors who actually subscribe. We’re at about seven percent. That’s the baseline – the thing we’re trying to move.

“Second, the shift we want to detect. We have to say up front what size of change matters to us. ‘Conversion fell from seven percent to six and a bit’ is a ten percent relative shift. ‘From seven to five point six’ is a twenty percent shift. The smaller the change we want to spot, the bigger the sample we need – because small differences look a lot like noise.

“Third, the null hypothesis. The boring assumption. In our case, ‘the two prices convert identically.’ That’s what we’re trying to disprove. The whole game is asking: could the difference we observed have happened by chance, even if the boring assumption is true? If yes, we shrug. If no, we’ve learned something.

“Fourth, alpha. How often we’re willing to be fooled by chance. Standard number is five percent – one experiment in twenty where we cry ‘effect!’ when actually there’s nothing there. Picking five percent is us saying we’ll accept being wrong that way one time in twenty.

“Fifth, power. The flip side. If there really is an effect, how likely are we to catch it? Standard number is eighty percent. So even with a perfectly designed experiment, one time in five we’d miss a real difference and write it off as noise.”

Tom nods slowly. “So alpha and power are us deciding how cautious we want to be in each direction.”

“Exactly. Once you’ve fixed all four – baseline, target shift, alpha, power – the maths gives you a sample size. Two pieces of vocabulary in that formula are worth pinning down. The first is standard deviation. Think of it as the typical size of the wobble. If we flip a fair coin a hundred times we expect fifty heads, but we don’t get exactly fifty every time – sometimes forty-six, sometimes fifty-three. Standard deviation puts a number on that everyday wobble: the noise you get from a process even when nothing interesting is going on.

“The second is the z-score. That’s just our observed signal measured in those wobble-units: how many standard deviations away from boring-and-noisy our result sits. The bigger the z, the harder it gets to explain the result away as noise. But what we actually want out the other end of the formula is one number: how many visitors per arm to give ourselves a fair chance of seeing the shift we said mattered.”

She underlines her result. “For us, with a ten percent target shift: twenty thousand visitors per arm. We get a hundred and fifty a week. Two and a half years per arm. If we’ll settle for spotting a really big shift – twenty percent or so – we could call it in eight months. For a five percent shift, which is well inside what a small price tweak might actually move, we’re talking the better part of a decade.”

Maya looks at Lee. “So the experiment can’t really prove anything in any sane window.”

“Not at your volume. Which means the choice you have isn’t ‘run it until it’s statistically valid’. It’s ‘run it long enough to see the shape of the signal, then act on the direction’. You’ll be moving from one defensible price to another defensible price with a lean, not a proof. That’s the only kind of pricing decision you can make at this stage.”

Maya thinks about it. “If we’re wrong by five percent in either direction, we can adjust. We won’t be wrong by fifty percent.”

“That’s the right framing. And it tells you what the time limit is for.”

Set a time limit. Two weeks, then stop – not because two weeks will give you certainty (it won’t), but because the time box is what limits your exposure. A pricing experiment is a temporary act of price discrimination, and the longer it runs the more it corrodes trust. The time limit is containment, not measurement.

Know what you’ll do with each outcome. Before starting, write down what decision you’ll make if revenue per visitor goes up, goes down, or barely moves. “The data is inconclusive” needs to be one of the outcomes you’ve planned for – because at this volume it’s the most likely one. Decide in advance whether a directional lean is enough to act on. If it isn’t, don’t run the experiment yet; save it for when you’ve got the traffic.

Priya adds one more thing. “And we write a note in the wiki – ‘small box price, revisit when weekly visitors exceed five thousand’. Future us deserves to know we acted on a lean, not on a proof.”

Tom has a concern. “What if conversion drops by ten percent at $30? Does that mean $30 is the wrong price?”

Lee thinks about it. “Not necessarily. If conversion drops ten percent but revenue per converted subscriber goes up twenty percent, you’re still ahead. The question isn’t ‘does conversion drop’ – it’s ‘does total revenue go up or down.’ And you have to weight that against the long-term effects on word-of-mouth, retention, and brand.”

Running the experiment

They run it for two weeks.

The setup is deliberately simple. Priya writes a small piece of code that assigns each new visitor to one of two groups at random and shows them the appropriate landing page. The price on the page is the price they’d pay if they subscribed. Nothing else about the site changes.

Over two weeks, 312 visitors see the $25 page. 298 visitors see the $30 page. The split is even enough to compare – and, as the team already knows going in, well short of what statistical confidence would require.

The results:

Small box pricing experiment: two weeks
Variant Visitors Clicked through Subscribed Revenue/week
$25 (control) 312 184 (59%) 22 (7.1%) $550
$30 (test) 298 164 (55%) 19 (6.4%) $570

Click-through drops from 59% to 55%. Conversion drops from 7.1% to 6.4%. But revenue per visitor goes up, because the people who subscribe at $30 are paying more than the people who subscribe at $25.

The total revenue from the $30 cohort is slightly higher than the $25 cohort, despite slightly fewer subscribers.

Reading the numbers

The team gathers round Priya’s laptop.

“This is roughly the shape we expected,” Priya says. “Twenty-two subscribers versus nineteen, out of about three hundred visitors each. The gap is well inside the noise – if we’d run the experiment in a different fortnight, those numbers could easily have flipped. The data isn’t conclusive. We knew going in it wouldn’t be.”

“Then what is it telling us?” Maya asks.

“Direction. Click-through is slightly lower at $30. Conversion is slightly lower. Revenue per visitor is slightly higher. That’s the shape you’d expect if $30 is closer to the right price than $25, and roughly the opposite of what you’d see if $30 were too high. It’s not proof. It’s a lean.”

Lee picks it up. “And the team agreed before we ran it that a lean is what we’d act on. The alternative was waiting years for a confidence we don’t actually need to make a five-dollar decision.”

The decision

Maya looks at the numbers again. “So we raise the price.”

“On a directional signal,” Lee says. “Not because the data proved anything, but because we said we’d act on direction and the direction is up. If it had pointed the other way we’d be having a much shorter meeting. The five percent more revenue per visitor isn’t huge – but the people who said yes at $30 are telling you they value the box at $30. The people who said no were probably never going to be great subscribers anyway. They would have subscribed for a month and cancelled.”

“Before you change anything – how do you feel about the subscribers who paid $25?”

Maya thinks. “I don’t want to raise the price on them. They signed up at $25 and that was the deal.”

“Good instinct. Honour the original price for existing subscribers indefinitely. New subscribers sign up at $30. Your existing subscribers feel looked after. Your new subscribers feel fairly treated. The only people who lose are the ones who would have subscribed at $25 but won’t at $30 – and the experiment tells you that’s a small group, and a group that probably wouldn’t have stuck around.”

It’s a clean decision, but it’s only clean because they measured first – and only honest because they were clear, before measuring, about what kind of evidence they’d accept.

What gets tested next

Maya and Lee work through the rest of the list. The $20 gap between small and large is the obvious next target – assumption 2 from the wall. After that, the mixed-sourcing pilot that’s been on the whiteboard for weeks. Each one a separate test. Each one starting the same way: write down the question, design the test, decide what you’d do with each outcome before you run it, measure, decide.

Pricing experiments. Not once. As a habit.

The team doesn’t know it yet, but the question on the wall is about to change. The discipline they’ve just learned will get its first real test on a deadline they didn’t pick. That’s a story for another week.

Lee writes a single sentence at the top of the pricing page in the team wiki:

“Price is an assumption until you’ve tested it. Test the assumptions you’d be most embarrassed to be wrong about first.”

Maya reads it, nods, and goes back to the kitchen to think about what to test next.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.