Threat Modelling: What the LLM Didn't Think About

September 29, 2026 · 29 min read

GreenBox has 5,000 subscribers, three squads, and LLMs generating code at speed. The team knows how to build the right thing – but a near-miss with credit card data in a debug log reveals they haven’t been thinking systematically about what could go wrong.

Sam catches it on a Thursday afternoon.

She’s in the staging environment because Kai mentioned in the morning standup that the new payment debugging tool was ready for the support team to test. Sam isn’t a developer. She doesn’t read code. But she’s been handling payment support tickets for eighteen months, and when someone says “the debugging tool is ready,” Sam is the person who actually tries to debug a payment with it.

She opens a failed payment record – Mrs Patterson’s, as it happens, from a test transaction – and sees something that makes her stomach drop. The debug log shows the full credit card number. Not a masked version. Not the last four digits. The full sixteen-digit number, the expiry date, and the CVC. All of it, rendered in neat monospace font on the screen, as if it were just another data field.

Sam doesn’t panic. She doesn’t walk across the office to find Charlotte. She takes a screenshot, carefully, making sure the card number is visible in the capture. She opens Slack and sends it to Charlotte with five words: “This can’t go to production.”

Then she sits at her desk and waits. Her hands are steady. Inside, her heart is hammering.

Charlotte sees the screenshot and pulls the PR within three minutes. The code is clean, well-structured, passes all its tests. Kai had prompted the LLM to build a payment debugging tool that captures the full request and response cycle for Stripe API calls. The LLM did exactly what was asked. It logged the full request body, which includes the card details that get sent to Stripe during the initial charge.

The code review had three approvals. All three reviewers checked that the tool worked correctly – that it showed the right events in the right order, that the filtering by customer worked, that the date range selector functioned. Nobody checked what data was being logged. The reviews were about functionality, not security.

Kai is mortified. He’s sitting at his desk in Melbourne when Charlotte calls. She doesn’t raise her voice. She describes what Sam found. There’s a long silence on the line.

“I didn’t even think about it,” Kai says. His voice is quiet. “I told the LLM to log the payment request. It logged the payment request. The card data is in the request. I didn’t even think about what was in the request.”

That sentence – I didn’t even think about it – is the most dangerous sentence in the post. Not because Kai is careless. He’s a good developer. But the LLM generates code so fluently, so confidently, that the gap between “this works” and “this is safe” becomes invisible. The code looked professional. It passed tests. It had three approvals. And it would have written five thousand customers’ credit card numbers into a log file.

Charlotte makes an observation: “Sam caught this because she happened to be checking staging. Our monitoring tells us if the site is up. It doesn’t tell us what’s in the logs. We’re monitoring availability but not safety.” Tom says, “We need someone who thinks about systems, not features.” He writes it on the whiteboard. It stays there for six months until they hire Jess.

Charlotte doesn’t blame Kai. She doesn’t blame the reviewers either. She blames the process. “We have code review for functionality. We have Example Mapping for business rules. We have tests for correctness. We have nothing for security. No systematic way to think about what could go wrong from a security perspective. And the LLM made this worse, not better.”

Later that afternoon, Charlotte finds Sam at her desk. Sam is replying to a support ticket, something about a Brisbane delivery. Charlotte stands next to her for a moment.

“You might have saved the company today,” Charlotte says.

Sam looks up. “I was just checking the staging environment.”

“I know. That’s the point. You check things. You notice things. That’s not a small thing, Sam.”

Sam nods and turns back to her ticket. She doesn’t mention it again. But that evening, on the drive home, she thinks about the screenshot. The neat rows of numbers. Someone’s credit card, fully exposed, because nobody in a room full of developers thought to ask what data was being logged. She thinks about the vintage radio she’s been restoring on her workbench at home – how every wire has to be right, how one wrong connection means the whole thing smokes. Software is like that. Except when the radio smokes, nobody’s credit card gets stolen.

The gap LLMs create

Here’s the uncomfortable truth about LLM-generated code: it does what you ask, and most people don’t ask about security.

When you prompt an LLM to build a payment debugging tool, it optimises for the stated goal – debugging payments. It doesn’t spontaneously consider what happens if someone with access to the debug logs misuses the data. It doesn’t think about PCI compliance. It doesn’t consider that logging card numbers creates a liability that could end the business.

This isn’t a flaw in the LLM. It’s working as designed – it does what you ask. The problem is that security isn’t usually part of what people ask for. Developers think about making things work. They think about edge cases. They think about performance. Security thinking – “what could an adversary do with this?” – is a different mental muscle, and most developers don’t exercise it habitually.

In the pre-LLM world, the pace of development provided an accidental safety net. You wrote code slowly enough that you sometimes noticed the security implications while you were writing it. The mechanical act of typing log.info(request.body) might trigger a thought: “Wait, what’s in request.body?”

When the LLM generates 200 lines of code in thirty seconds, there’s no time for that accidental reflection. The code arrives complete, looking professional, passing tests. The security gaps are hidden inside code that looks like it was written by someone who knew what they were doing.

Charlotte calls a meeting with the squad leads. “We need a systematic way to think about security. Not code review – we’re already doing that. Something that happens before the code is written. Something that asks ‘what could go wrong?’ in a structured way.”

STRIDE

Charlotte introduces STRIDE. It’s a threat modelling framework created at Microsoft in the late 1990s. It’s survived for nearly three decades because it’s simple, systematic, and it works. The acronym stands for six categories of threat:

Spoofing – pretending to be someone or something you’re not. A user logging in as someone else. A system impersonating a trusted service.

Tampering – modifying data or code without authorisation. Changing the price of a subscription. Altering a farm’s availability submission after the deadline.

Repudiation – denying that an action occurred when it did. A customer claiming they didn’t place an order. A farm denying they committed supply.

Information Disclosure – exposing data to someone who shouldn’t see it. Kai’s credit card logging. A customer seeing another customer’s delivery details. An employee accessing data they don’t need.

Denial of Service – making a system unavailable. Flooding the signup page. Overwhelming the supply matching engine during the weekly run.

Elevation of Privilege – gaining access or capabilities beyond what’s authorised. A customer accessing admin functions. A farm seeing other farms’ pricing.

STRIDE Threat Categories – the question each category asks:

  • Spoofing – who are you?
  • Tampering – was this changed?
  • Repudiation – can they deny it?
  • Information Disclosure – who can see this?
  • Denial of Service – can this be blocked?
  • Elevation of Privilege – can they do more than allowed?

The power of STRIDE is that it gives the team a checklist for thinking. Instead of hoping someone spontaneously notices a security issue, the team systematically asks six questions about every boundary in the system. It turns security thinking from an art into a process.

The first threat modelling session

Charlotte runs the first session with the Perth squad. She starts with the subscription flow because it’s the most security-sensitive part of the system – it handles payment data, personal information, and money.

She pulls up the photographs from the original Event Storm – the sticky-note timeline that’s been on the wall since month one. “Every domain event on this wall is a potential attack surface,” she says. “Data enters the system. Data leaves the system. Someone acts on data. At every one of those points, something could go wrong.”

The team works through the subscription flow event by event, applying STRIDE to each one.

Customer Browses → Box Selected

  • Spoofing: could someone browse as a logged-in user who isn’t them? (Session hijacking)
  • Tampering: could someone modify the box prices displayed on the page? (Client-side manipulation)
  • Information Disclosure: could browsing behaviour be leaked to third parties? (Analytics tracking, GDPR)

Payment Submitted → Payment Confirmed

This is where the conversation gets serious.

  • Spoofing: could someone subscribe with a stolen credit card? Tom raises this. “We don’t verify card ownership. We charge it through Stripe and if it works, we accept it. But what if the real cardholder disputes it three weeks later?”
  • Tampering: could someone modify the payment amount between submission and confirmation? Ravi checks the code. “The amount is calculated server-side, so the client can’t change it. But…” He pauses. “The webhook from Stripe that confirms the payment – are we verifying the webhook signature?”
  • Information Disclosure: Kai’s credit card logging incident. Already caught, but the systematic review surfaces two more places where payment data is being handled with insufficient care.
  • Repudiation: “What if a customer claims they didn’t subscribe? Do we have an audit trail that proves they did?” Nobody has thought about this.

Supply Matched to Demand → Substitution Decided

  • Tampering: what if a farm submits fraudulent availability data – claiming they have produce they don’t have, to secure the contract, then delivering less?
  • Elevation of Privilege: what if a farm could see other farms’ availability or pricing? That would give them a competitive advantage. Priya checks the farm portal. “The API endpoint for availability doesn’t filter by farm ID on the query. If a farm guessed another farm’s ID, they could see their data.” Everyone goes quiet. That’s a real bug. Not hypothetical – actual.

Box Dispatched → Box Delivered

  • Information Disclosure: the delivery tracking page shows the customer’s address. What if someone with the tracking link isn’t the customer?
  • Spoofing: what if someone replays webhook calls from the courier service to mark deliveries as complete when they haven’t been dispatched?

The session runs for two hours. The team identifies 23 potential threats across the subscription flow. Some are theoretical. Some are already present in the code. The farm API bug is real and needs fixing immediately.

There’s a moment during the session that Charlotte later describes as the “oh no” moment. The team is working through the STRIDE categories for the farm availability submission when Dave – the farmer who attended the original Event Storm over a year ago and who now chairs the farm advisory group – joins via video call. Charlotte had invited him for domain perspective.

Dave listens to the tampering discussion and says, “You’re worried about farms lying about availability? That happens all the time. Not maliciously – optimistically. A farmer looks at their crop on Monday, estimates they’ll have enough for Wednesday, and then it rains on Tuesday and they’re short. The system needs to handle that gracefully, not treat it as fraud.”

That’s a domain insight that changes the threat analysis. The team was thinking about deliberate tampering – a farm gaming the system. Dave is describing something more subtle: well-intentioned inaccuracy that has the same downstream effect. The mitigation is different. For deliberate fraud, you need detection and enforcement. For optimistic estimation, you need buffers, deadlines, and communication.

Maya adds: “We built the supply shortfall handling for weather and crop failures. We never built it for estimation error. There’s no feedback loop that tells a farm ‘you’ve over-promised three weeks in a row.’” Another sticky note for the wall.

Priya writes the threats up on the whiteboard, grouped by severity:

| Severity | Threats | Examples |
| --- | --- | --- |
| Critical | 3 | Farm data leak via API, unverified Stripe webhooks, credit card data in logs |
| High | 7 | No audit trail for subscriptions, stolen card risk, delivery address exposure |
| Medium | 8 | Session management gaps, client-side price display, analytics GDPR risk |
| Low | 5 | Theoretical DoS vectors, repudiation edge cases |

LLMs help WITH threat modelling

Here’s the thing Charlotte discovers halfway through the session: LLMs are actually good at this.

Not good at doing the threat modelling for you. But good at the systematic enumeration part – the bit where you take a process and exhaustively list what could go wrong.

Charlotte pastes the subscription flow into an LLM session: “Here’s our subscription flow. For each step, identify threats using the STRIDE framework: Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege.”

The LLM produces a first pass that covers about 70% of what the team found manually. It catches the obvious things – unverified webhooks, session hijacking, data exposure. It misses the context-specific things – the farm API bug, the specific way GreenBox’s substitution engine handles data, the cultural expectation in regional Australia that farms are honest about their supply.

But 70% is a great starting point. The team takes the LLM’s output and adds the 30% that requires domain knowledge. The LLM is better at systematic enumeration – it doesn’t get tired, it doesn’t forget categories, it doesn’t get bored on the fifth STRIDE pass. The humans are better at context – they know which threats are realistic, which are theoretical, and which ones keep you up at night.

Charlotte establishes a new practice: before any feature that touches a system boundary (data entry, data exit, external API, user action), the developer feeds the design to an LLM with a STRIDE prompt. For GreenBox’s scale – a small team with a single trust domain – focusing on external boundaries is a pragmatic starting point. Organisations with stricter security requirements, particularly those adopting a Zero Trust posture, would treat internal boundaries the same way: every service-to-service call, every database access, every inter-context communication is a point where STRIDE applies. The DDD bounded contexts the team defined earlier map naturally to trust boundaries. The principle is the same – the scope expands with the threat model’s maturity.

The LLM produces a first-pass threat model. The team reviews it, adds context-specific threats, and decides on mitigations. It takes thirty minutes for a typical feature – substantially less than the two-hour deep-dive they did for the subscription flow, because the LLM does the systematic part.

Threat Modelling Process

  1. Identify boundaries – where data enters or leaves the system
  2. LLM first pass – STRIDE enumeration
  3. Team review – add domain context
  4. Prioritise threats – severity × likelihood
  5. Plan mitigations – smallest effective fix

The mitigations

The team works through the critical and high-severity threats over the following week. Most mitigations are surprisingly small.

Stripe webhook verification. Ravi adds signature verification to the webhook handler. It’s eight lines of code. The LLM generates it in seconds when asked “verify Stripe webhook signatures in Go using the Stripe library.” The hard part wasn’t the code – it was knowing the verification was needed.

Farm API access control. Kai fixes the API endpoint to filter availability data by the authenticated farm’s ID. Another small change. Priya writes a test that specifically tries to access another farm’s data and verifies it returns a 403. That test will catch any regression.

Audit logging. Tom adds an audit log for subscription events: created, paused, resumed, cancelled. Each entry records the user, the timestamp, and the action. It’s ten lines in a middleware. They use the LLM to generate the middleware and the database migration. The ensemble session for this takes twenty minutes.

Credit card data scrubbing. The team reviews every log statement in the payment bounded context and adds a scrubbing function that masks card numbers before logging. The LLM generates the regex. Priya writes tests with known card number formats to make sure the scrubbing catches everything.
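
A sketch of what the scrubbing function might look like. The regex and masking format here are assumptions, not GreenBox’s actual code – a production scrubber might also Luhn-check candidates to cut false positives, as Priya’s tests with known card formats would reveal:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// cardPattern matches runs of 13-19 digits, optionally separated by single
// spaces or dashes - the common ways a card number appears in a request
// body. (Assumption: a real scrubber might also Luhn-check matches.)
var cardPattern = regexp.MustCompile(`\b(?:\d[ -]?){12,18}\d\b`)

var digitPattern = regexp.MustCompile(`\d`)

// scrub masks anything that looks like a card number before it reaches a
// log line, keeping only the last four digits for support diagnostics.
func scrub(s string) string {
	return cardPattern.ReplaceAllStringFunc(s, func(m string) string {
		digits := digitPattern.FindAllString(m, -1)
		return "**** **** **** " + strings.Join(digits[len(digits)-4:], "")
	})
}

func main() {
	body := `{"card":"4242 4242 4242 4242","amount":4995}`
	fmt.Println(scrub(body)) // {"card":"**** **** **** 4242","amount":4995}
}
```

The key design point is where it sits: every log statement in the payment context routes through this one function, so a new log line added later is scrubbed by default instead of depending on each developer remembering.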

Webhook replay protection. Ravi adds idempotency checks to the webhook handlers. Each webhook has a unique event ID from Stripe. If the same event ID is processed twice, the second processing is a no-op. This prevents replay attacks and also fixes an occasional bug where Stripe’s retry mechanism caused double processing.
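
The idempotency check might look like this sketch, with an in-memory set standing in for what would realistically be a database table with a unique constraint on the event ID (so the check survives restarts and works across instances):

```go
package main

import (
	"fmt"
	"sync"
)

// processedEvents remembers Stripe event IDs we have already handled.
// In-memory here for illustration; production would persist this.
var (
	mu              sync.Mutex
	processedEvents = map[string]bool{}
)

// handleEvent applies a webhook's effect exactly once. A replayed or
// retried event with the same ID becomes a no-op and returns false.
func handleEvent(eventID string, apply func()) bool {
	mu.Lock()
	defer mu.Unlock()
	if processedEvents[eventID] {
		return false // already processed: no-op
	}
	processedEvents[eventID] = true
	apply()
	return true
}

func main() {
	charges := 0
	handleEvent("evt_123", func() { charges++ })
	handleEvent("evt_123", func() { charges++ }) // Stripe retry or replay
	fmt.Println(charges) // 1 - charged exactly once
}
```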

Most of these are individually small – an afternoon of work each. The biggest is the audit logging, which takes two days because it touches multiple bounded contexts. But none of them would have been built without the threat model, because none of them are features. They’re invisible to users. They don’t appear on Impact Maps. They don’t make the product better in any way a customer would notice. They just prevent disasters.

There’s a lesson here about the nature of security work. It’s defensive. It doesn’t move metrics. It doesn’t delight customers. It doesn’t show up in demos. It’s the work that prevents the worst-case scenario – the one where GreenBox’s name appears in a data breach notification email and five thousand subscribers lose trust overnight.

Charlotte frames it to the squad leads this way: “Think of security mitigations as insurance. You don’t buy insurance because it makes your day better. You buy it because the alternative is catastrophic. The webhook signature verification took Ravi eight lines of code. A replayed webhook that double-charged a thousand customers would take the support team two weeks to sort out, plus the reputational damage. Eight lines of code versus two weeks of crisis management. The maths isn’t close.”

The farm availability feedback loop

One mitigation deserves a deeper look because it shows how threat modelling can improve the product, not just protect it.

Dave’s observation about optimistic estimation led the team to build something they’d never have built otherwise: a farm accuracy dashboard. Each week, the system compares what a farm promised against what they actually delivered. Over time, a pattern emerges. Most farms are within 10% of their estimates. Two farms are consistently 30-40% over-promising.

Maya uses this data in her weekly farm calls. “Dave, you estimated 200kg of zucchini in each of the last three weeks and delivered 140, 130, and 150. Can we talk about what’s going on?” The conversations are friendly, not accusatory – Dave genuinely didn’t realise how far off his estimates were. He starts being more conservative, and the shortfall rate drops.

This isn’t security in the traditional sense. But it emerged from threat modelling. The STRIDE analysis asked “what happens if availability data is inaccurate?” and the answer turned out to be “build a feedback loop” rather than “build a fraud detection system.” The threat model surfaced a business problem disguised as a security concern.

The over-reaction

Tom, predictably, wants to threat-model everything.

After the subscription flow session, he schedules threat modelling sessions for the delivery flow, the farm portal, the admin interface, the seasonal availability engine, and the customer feedback system. He blocks out two hours for each.

Charlotte intervenes. “This is Cynefin again, Tom. Not everything needs the same level of analysis.”

She draws a line. “Threat-model the boundaries – where data enters and leaves the system. The API endpoints. The webhook handlers. The authentication flows. The payment processing. The places where one bounded context communicates with another.”

“Don’t threat-model internal pure functions. The seasonal availability calculator that takes a date and returns a list of in-season produce doesn’t need a STRIDE analysis. There’s no data entering or leaving. There’s no user action. There’s no attack surface.”

Tom’s instinct is understandable. The credit card logging incident scared everyone, and the natural response to a scare is to over-correct. But threat modelling has the same fatigue problem as Example Mapping – if you do it for everything, people stop taking it seriously, and you miss the sessions that actually matter.

Charlotte’s rule: threat-model every boundary, every external integration, and every feature that handles PII or financial data. Skip it for internal logic, pure functions, and UI-only changes that don’t affect data flow.

The Melbourne squad adopts a lightweight version: a “security check” question added to their Definition of Done. Before any PR is approved, the reviewer asks: “Does this touch a system boundary? If yes, has the boundary been threat-modelled?” If the answer is yes and no, the PR gets parked until someone runs a quick STRIDE pass.

It’s not a heavyweight process. It’s a question on a checklist. But it catches the gaps that code review alone misses.

What the team learns

The threat modelling work teaches the GreenBox team something that goes beyond security. It changes how they think about LLM-generated code.

Before the credit card incident, the team’s mental model of LLM-generated code was: it works, it passes tests, it’s probably fine. After threat modelling, the mental model shifts to: it works, it passes tests, but what hasn’t it considered?

The LLM doesn’t think adversarially. It doesn’t imagine a malicious user, a disgruntled employee, a misconfigured server, or a replay attack. It generates code that handles the happy path and the explicitly stated error paths. Everything else – the threats, the edge cases that nobody thought to specify, the security implications of logging decisions – is invisible to it.

This isn’t unique to LLMs. Human developers also miss security considerations. The difference is scale. When a human writes code slowly, there’s time for security concerns to surface organically – in code review, in casual conversation, in the shower. When an LLM generates hundreds of lines in seconds, the volume of unreviewed assumptions goes up proportionally.

Threat modelling is the team’s way of deliberately creating the reflection time that the LLM’s speed has eliminated. It’s a forced pause between “the code works” and “the code is safe.”

Maya makes the connection explicit at a Friday all-hands. “We learned early on that LLMs build the wrong thing if you don’t tell them what to build. Event Storming and Example Mapping fixed that. Now we’re learning that LLMs build insecure things if you don’t tell them what to protect against. Threat modelling fixes that. Same pattern, different dimension.”

She’s right. The discovery techniques the team learned in months one through three were about building the right thing. Threat modelling is about building the right thing safely. Both are gaps that human thinking has to fill, because the LLM won’t fill them on its own.

The Event Storm connection

There’s an elegant connection between the Event Storm wall and the threat model that Charlotte points out.

The Event Storm maps domain events – things that happen in the system. Each event implies data flowing, actors acting, and state changing. Each of those is a potential attack surface.

The threat model doesn’t need to start from scratch. It starts from the Event Storm. Every orange sticky note (domain event) is a place where data changes. Every blue sticky note (command) is an action that could be spoofed. Every yellow sticky note (actor) is a potential threat vector if their identity isn’t verified.

The Event Storm that the team ran in month one – the wall of sticky notes that mapped the entire GreenBox domain – is still photographed and pinned in the Perth squad’s area. It’s the same wall that identified the hotspots about substitution policy and delivery logistics. Now it’s pulling double duty as the starting point for security analysis.

Charlotte adds a new colour of sticky note to the Event Storm: red for security-sensitive events. Payment Submitted, Customer Data Updated, Farm Availability Changed, Box Contents Finalised. These are the events where STRIDE analysis is mandatory. The rest are reviewed on a case-by-case basis.

The discovery techniques compose. Event Storming maps the domain. Cynefin tells you which parts need deep analysis. STRIDE tells you what could go wrong at the boundaries. The team isn’t learning new techniques in isolation – they’re building a toolkit where each piece connects to the others.

And the composition goes further. The Decision Tables that encode Maya’s substitution rules? Each row is an assertion about safe behaviour. The threat model asks: “What happens if an input violates the assumptions behind this table?” What if a produce item is mislabelled and something containing tree nuts ends up in the “nut-free” substitution pool? The decision table handles the normal case. The threat model handles the adversarial case. Together, they cover the space.

The ADRs that record architectural decisions? Each one now includes a security section. “We chose to store allergen data in the customer profile context. STRIDE analysis: Information Disclosure risk if the profile API is accessed without proper authorisation. Mitigation: API endpoints require authentication and filter by the authenticated customer’s ID.” The architectural decision and its security implications are recorded together, in the same document, so that anyone reading the ADR understands not just what was decided but what risks were considered.

When the building is on fire

Three weeks after the first threat modelling session, the team gets a real test.

A Saturday morning. Sam gets an alert: the payment webhook handler is returning 500 errors. Stripe is retrying. The retry queue is building up. Some customers are seeing their subscriptions marked as “payment pending” when they’ve already been charged.

This is Chaotic – act first, sense later.

Ravi is on call. He diagnoses the problem in twenty minutes: a database migration from Friday afternoon added a new column to the audit log table but didn’t update the webhook handler’s insert query. Every webhook that tries to write an audit entry fails, and the error cascades to the webhook response.

The fix is two lines. Ravi ships it, the queue drains, and the pending subscriptions resolve within an hour.

After the dust settles, Charlotte asks: “What’s the runbook?” Blank stares. “If this happens at 2am on a Saturday, who gets called? In what order? What’s the first thing they do?” Nobody has answers. “You just had an incident,” Charlotte says. “You handled it because Ravi happened to be on call and I happened to be available. That’s luck, not process.” The team writes their first incident runbook that afternoon: who to call (Tom, then Charlotte), what to check (logs, then customer impact), how to communicate (Slack channel, then customer email if affected). It’s one page. It lives in the team wiki. It gets used six weeks later when a database migration fails at 6am.

But here’s the thing. The audit logging that caused the outage was itself a mitigation from the threat model. The threat model identified “no audit trail for subscriptions” as a high-severity gap. The team built the audit logging to close that gap. And a deployment mistake in the audit logging caused an incident.

The post-incident review is honest. “We built the right thing,” Charlotte says. “The audit logging is important and it should exist. We deployed it without adequate testing of the migration. The fix isn’t to remove the audit logging – it’s to improve the deployment process.”

Tom, who has a habit of over-reacting, starts to suggest threat-modelling the deployment pipeline. Charlotte redirects: “The deployment issue is Complicated – we know what went wrong and we know how to fix it. Better migration testing, a staging environment check for schema changes, and a rollback plan. We don’t need a STRIDE analysis for the deployment pipeline. We need a checklist.”

Cynefin meets threat modelling meets incident response. Each framework tells the team something different about the same situation. Threat modelling identified the missing audit trail. The deployment failure was a Complicated problem with a known fix. The incident response was Chaotic – act first, stabilise, then reflect. The tools compose. They don’t compete.

Six months later

Six months after the first threat modelling session, Sam – the same Sam who caught the credit card logging incident – reviews the security posture.

The webhook handlers are all signature-verified. The API endpoints all enforce access control by authenticated identity. Payment data is scrubbed from all logs. Every subscription action has an audit trail. The farm portal has been penetration tested by an external firm, who found two medium-severity issues that the team had already identified in their threat model but hadn’t yet fixed.

“The pen test basically confirmed our own threat model,” Sam says. “They found what we found. That’s either really good or it means we’re both missing the same things.”

Charlotte smiles. “It means the process works. But you’re right to be cautious. The things we’re all missing are the things we’ll discover the hard way. That’s why we keep doing this.”

The near-miss with the credit card data could have been a disaster. If it had reached production, if a customer had noticed, if a journalist had written about it – GreenBox’s reputation would have been shattered. Five thousand subscribers trust the company with their payment details, their addresses, their dietary information.

That trust is earned through code that’s written with security in mind, not just functionality. The LLM writes the code. The team writes the thinking – including the thinking about what could go wrong.

Kai, who pushed the original offending code, now runs threat modelling sessions for the Melbourne squad. He’s become the team’s most thorough security thinker, precisely because he experienced the near-miss personally. He keeps a sticky note on his monitor that says What’s in the request? – the question he didn’t ask on the day Sam caught the bug.

“I used to review code by asking ‘does it work?’” he says at a Melbourne retro, six months later. “Now I ask ‘does it work, and what happens if someone tries to make it work in a way we didn’t intend?’” He pauses. “Sam caught it. Not a developer. Not a security expert. The person who tests things because she cares about what customers experience. I think about that a lot. About who catches the things that the people closest to the code can’t see.”

The broader lesson

Threat modelling is often presented as a security technique. And it is. But at GreenBox, it’s become something broader: a practice of thinking about what you’re not thinking about.

Every discovery technique in this series addresses a specific blind spot. Event Storming reveals the domain model you don’t share. Example Mapping reveals the edge cases you haven’t considered. Impact Mapping reveals the features that don’t connect to goals. Cynefin reveals the approach that doesn’t match the problem. Threat modelling reveals the bad things that could happen that nobody has imagined.

The common thread is systematic thinking about invisible risks. In each case, the technique provides a structure that forces the team to look where they naturally wouldn’t. Left to their own devices, teams think about functionality because that’s what they’re building. They don’t naturally think about security, because security is the absence of bad things happening. You don’t notice it working. You only notice it failing.

LLMs amplify this blind spot. A human developer writing code slowly might pause and think, “Should I be logging this data? Is this safe?” The LLM generates code at a pace that leaves no room for that kind of reflection. The code arrives complete, clean, and professionally structured. It looks safe. It might not be.

Threat modelling is the team’s deliberate replacement for the accidental reflection that slow coding used to provide. It’s scheduled, structured, and systematic. It doesn’t depend on someone having a moment of inspiration during a code review. It depends on the team sitting down and asking six specific questions about every place where data enters or leaves the system.

The most secure systems aren’t built by teams that never make mistakes. They’re built by teams that catch mistakes systematically, before the mistakes reach production and before they reach customers. Threat modelling is how GreenBox does that – not perfectly, but deliberately.

The toolkit is now complete. Event Storming, Example Mapping, JTBD, Cynefin, ensemble programming, threat modelling – the team has techniques for every type of problem. But having the tools isn’t the same as using them consistently. The real challenge isn’t learning discovery techniques. It’s making them stick as a weekly practice that survives holidays, deadlines, and the constant temptation to just start building. That’s the final chapter (coming 20 October).

Questions or thoughts? Get in touch.