How Fraud Detection Works

In How Payments Work, we followed a card transaction from tap to settlement, the chain of messages that crosses four organisations in under two seconds. At step 4 of that chain, the issuing bank makes a decision: approve or decline. That decision looks simple, but it’s actually one of the hardest classification problems in computing. The bank has to distinguish legitimate purchases from fraud, in real time, with imperfect information, while keeping the false positive rate low enough that genuine customers aren’t constantly blocked. Get it wrong one way and you lose money to criminals. Get it wrong the other way and you lose customers to frustration.

The scale of the problem

Global card fraud losses hit $33.8 billion in 2023, according to the Nilson Report. That sounds enormous, and it is, but it’s also only about 6.5 basis points (0.065%) of total card transaction volume. The fraud rate on card payments is remarkably low, and the reason it’s low is that the detection systems are remarkably good. They have to be, because the volume is so high that even a tiny increase in the fraud rate translates to billions of dollars.

Visa alone processes roughly 300 million transactions per day. If fraud detection accuracy dropped by just one-tenth of a percentage point, from 99.9% to 99.8%, that would mean an additional 300,000 misclassified transactions per day (0.1% of 300 million). The maths is unforgiving.
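The arithmetic is short enough to check directly. This sketch uses the round numbers quoted above and counts every extra error as one wrongly classified transaction; the constant and function names are illustrative.

```python
from fractions import Fraction

DAILY_TRANSACTIONS = 300_000_000  # round figure quoted in the text


def extra_errors_per_day(accuracy_before: Fraction, accuracy_after: Fraction) -> int:
    """Additional wrongly classified transactions per day after an accuracy drop."""
    drop = accuracy_before - accuracy_after
    return int(DAILY_TRANSACTIONS * drop)


# Exact fractions avoid floating-point noise in the subtraction.
extra = extra_errors_per_day(Fraction("0.999"), Fraction("0.998"))
print(extra)  # 300000
```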

The fraud landscape has shifted dramatically over the past two decades. In the early 2000s, most card fraud was counterfeit: criminals skimmed magnetic stripe data from legitimate cards and created clones. The introduction of EMV chip cards (named after Europay, Mastercard, and Visa, the consortium that developed the standard) largely killed counterfeit fraud. The chip generates a unique cryptogram for each transaction, so copying the card data doesn’t let you create a working clone. Countries that adopted EMV early saw counterfeit fraud drop by 70-80%.

But fraud didn’t disappear. It migrated online. Card-not-present (CNP) fraud, where someone uses stolen card details on a website or over the phone, now accounts for roughly 73% of all card fraud globally (European Central Bank, 2023). You can’t verify a chip cryptogram when there’s no chip reader involved. The battlefield moved, and the defences had to follow.

Rules engines: the first generation

The earliest fraud detection systems were rules engines: lists of if-then conditions written by human analysts.

A simple rule might look like: If the transaction amount exceeds $5,000 and the card has not been used in more than 30 days, flag for review. Or: If a card-present transaction occurs in Australia less than 2 hours after a card-present transaction in the United Kingdom, decline, because no human can travel that fast. This second rule is a geographic velocity check (often called an impossible-travel check), and it’s one of the oldest and most effective fraud signals.
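The two example rules can be written down almost word for word. The following is a minimal sketch, not any issuer’s actual engine: the Transaction fields, the IMPOSSIBLE_WITHIN_2H table, and the return labels are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Country pairs treated as unreachable within two hours (illustrative only).
IMPOSSIBLE_WITHIN_2H = {frozenset({"AU", "GB"})}


@dataclass
class Transaction:
    amount: float          # in dollars
    country: str           # two-letter country code
    card_present: bool
    timestamp: datetime


def evaluate_rules(txn: Transaction, previous: Optional[Transaction]) -> str:
    """Apply the two example rules and return 'decline', 'review' or 'approve'."""
    if previous is None:
        return "approve"  # no history to compare against
    gap = txn.timestamp - previous.timestamp

    # Impossible travel: two card-present transactions, far apart, too close in time.
    if (txn.card_present and previous.card_present
            and frozenset({txn.country, previous.country}) in IMPOSSIBLE_WITHIN_2H
            and gap < timedelta(hours=2)):
        return "decline"

    # Large amount on a card that has been dormant for more than 30 days.
    if txn.amount > 5000 and gap > timedelta(days=30):
        return "review"

    return "approve"


london = Transaction(20.0, "GB", True, datetime(2024, 1, 1, 9, 0))
sydney = Transaction(35.0, "AU", True, datetime(2024, 1, 1, 10, 30))
print(evaluate_rules(sydney, london))  # decline
```

Every decision traces back to one named condition, which is exactly the explainability that made rules engines attractive.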

Rules engines are intuitive and explainable. When a transaction is declined, you can point to exactly which rule triggered it. Regulators like this. Customers can understand it (“your card was blocked because we saw a transaction in another country”). And the rules can be updated quickly: if a new fraud pattern emerges, an analyst writes a new rule and deploys it.

The problem is that rules don’t scale. As fraud patterns multiply, the rule sets grow to thousands of conditions, and they start conflicting with each other. Worse, rules are static: they capture known patterns, not emerging ones. Fraudsters are adaptive. They learn the rules (often by testing small transactions to probe the boundaries) and adjust their behaviour to slip through. A rule that blocks transactions over $5,000 just teaches fraudsters to make transactions of $4,999.

By the mid-2000s, the largest issuers and networks were running tens of thousands of rules, and the false positive rate was climbing. Legitimate cardholders were being declined at rates that damaged customer relationships and drove people to competitors. The industry needed something better.

Machine learning: pattern recognition at scale

The shift to machine learning models for fraud detection began in earnest in the late 2000s and early 2010s. The core idea is simple: instead of having humans write rules, you feed historical transaction data, labelled as fraudulent or legitimate, into a model, and the model learns the patterns itself.

The models commonly used include gradient-boosted trees (XGBoost, LightGBM), random forests, and neural networks. Each has trade-offs. Gradient-boosted trees are fast to train, handle tabular data well, and produce models that can be partially interpreted. Neural networks can capture more complex patterns but are harder to explain and require more data to train.

The features (inputs) these models use are extensive. A modern fraud model might consider hundreds of variables for each transaction:

Transaction features: amount, currency, merchant category code (MCC), whether the card is present, time of day, day of week.
Card history features: average transaction amount over the past 30/60/90 days, number of transactions today, time since last transaction, number of distinct merchants in the past week, number of distinct countries in the past month.
Behavioural features: does this cardholder typically buy coffee at 7.30am? Do they usually shop online on weekday evenings? Have they ever transacted at this merchant before? How far is this merchant from their home address?
Merchant features: is this merchant in a high-risk category (gambling, cryptocurrency, adult entertainment)? What’s the merchant’s historical chargeback rate? Is this a new merchant?
Device and network features (for online transactions): IP address geolocation, device fingerprint, browser characteristics, whether a VPN or proxy is detected.
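To make the card history features concrete, here is a small sketch that derives a handful of them from a list of past transactions. The PastTxn record and the feature names are hypothetical; a production system would compute hundreds of these, usually from pre-aggregated counters rather than raw history.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PastTxn:
    amount: float
    merchant_id: str
    timestamp: datetime


def card_history_features(history, merchant_id, now):
    """Derive a few card-history features for a transaction happening at `now`."""
    last_30d = [t for t in history if now - t.timestamp <= timedelta(days=30)]
    last_7d = [t for t in history if now - t.timestamp <= timedelta(days=7)]
    today = [t for t in history if t.timestamp.date() == now.date()]
    latest = max((t.timestamp for t in history), default=None)
    return {
        "avg_amount_30d": (sum(t.amount for t in last_30d) / len(last_30d)) if last_30d else 0.0,
        "txn_count_today": len(today),
        "seconds_since_last_txn": (now - latest).total_seconds() if latest else None,
        "distinct_merchants_7d": len({t.merchant_id for t in last_7d}),
        "merchant_seen_before": any(t.merchant_id == merchant_id for t in history),
    }


history = [
    PastTxn(100.0, "airline", datetime(2024, 1, 1, 10, 0)),
    PastTxn(60.0, "grocer", datetime(2024, 3, 8, 18, 0)),
    PastTxn(4.5, "cafe", datetime(2024, 3, 10, 7, 30)),
]
features = card_history_features(history, "cafe", datetime(2024, 3, 10, 12, 0))
print(features["avg_amount_30d"])  # 32.25
```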

The model produces a score, typically a number between 0 and 1, where higher numbers indicate higher fraud probability. The issuer sets a threshold: transactions scoring above the threshold are declined or sent for manual review; transactions below are approved.

Setting the threshold is the fundamental trade-off of fraud detection. Set it too low (aggressive) and you catch more fraud but decline more legitimate transactions. Set it too high (permissive) and you let more fraud through but improve the customer experience. There is no correct threshold, only a trade-off between false positives (legitimate transactions wrongly declined) and false negatives (fraudulent transactions wrongly approved).
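The trade-off is easy to see on a toy example. The scores and labels below are invented, and the sketch assumes a transaction scoring exactly at the threshold is approved.

```python
def errors_at(threshold, scored):
    """Count (false positives, false negatives) when scores above threshold are declined."""
    false_positives = sum(1 for score, is_fraud in scored if score > threshold and not is_fraud)
    false_negatives = sum(1 for score, is_fraud in scored if score <= threshold and is_fraud)
    return false_positives, false_negatives


# (model score, was it actually fraud?) for eight made-up transactions
scored = [
    (0.05, False), (0.10, False), (0.30, False), (0.45, True),
    (0.55, False), (0.70, True), (0.80, False), (0.95, True),
]

for threshold in (0.2, 0.5, 0.9):
    print(threshold, errors_at(threshold, scored))
# 0.2 (3, 0)
# 0.5 (2, 1)
# 0.9 (0, 2)
```

Lowering the threshold trades false negatives for false positives and raising it does the reverse; no setting drives both to zero on this data.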

The economics favour different thresholds for different scenarios. A $10,000 transaction on a card that’s never been used for online purchases deserves scrutiny. A $4.50 coffee at a shop the cardholder visits every morning barely needs a second look. Modern systems adjust the threshold dynamically based on the risk profile of each transaction.

The false positive problem

False positives are the silent cost of fraud detection. When your card is wrongly declined at a restaurant, the bank doesn’t lose money to fraud; you bear the cost, in embarrassment and inconvenience. Studies estimate that 30-40% of declined transactions are false positives (Javelin Strategy & Research, 2022). On those figures, for roughly every two fraudulent transactions caught, the system wrongly blocks a genuine purchase.

The cost is real. A 2019 study by Aite Group estimated that US retailers lost $443 billion in sales annually from false declines, more than twelve times the cost of actual fraud. Some of those declined customers go elsewhere and never come back. The fraud team’s success metric (fraud prevented) and the revenue team’s success metric (sales completed) are in direct tension.

This is why fraud detection is increasingly treated as an optimisation problem rather than a pure security problem. The goal isn’t to block all fraud; it’s to minimise total loss, where total loss includes both fraud losses and the revenue lost from false declines. A sophisticated issuer will accept a slightly higher fraud rate if it significantly reduces false declines, because the net result is better for the business and the customer.
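Framed as optimisation, choosing a threshold means picking the one with the lowest combined loss. The sketch below is a deliberately simple version of that idea: the data are invented, and the assumption that a false decline costs the full value of the sale (false_decline_cost_rate=1.0) is a placeholder that a real issuer would estimate.

```python
def total_loss(threshold, txns, false_decline_cost_rate=1.0):
    """Fraud approved plus revenue lost on genuine purchases that were declined."""
    fraud_loss = sum(amount for score, amount, is_fraud in txns
                     if score <= threshold and is_fraud)
    decline_loss = sum(amount * false_decline_cost_rate
                       for score, amount, is_fraud in txns
                       if score > threshold and not is_fraud)
    return fraud_loss + decline_loss


def best_threshold(txns, candidates):
    """Return the candidate threshold with the smallest total loss."""
    return min(candidates, key=lambda t: total_loss(t, txns))


# (model score, amount, was it actually fraud?)
txns = [
    (0.10, 40.0, False),
    (0.35, 120.0, False),
    (0.60, 900.0, False),   # unusual but genuine purchase
    (0.65, 50.0, True),
    (0.90, 700.0, True),
]
candidates = [0.3, 0.5, 0.7, 0.95]
print(best_threshold(txns, candidates))  # 0.7
```

On this data the best threshold lets the $50 fraud through, because blocking it would also block a genuine $900 purchase.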

Device fingerprinting

For online transactions, where there’s no physical card to authenticate, the device itself becomes a signal.

Device fingerprinting builds a profile of the device making the transaction. It collects dozens of attributes: screen resolution, operating system version, installed fonts, browser plugins, language settings, time zone, whether JavaScript is enabled, the specific GPU (inferred from WebGL rendering), the audio context fingerprint (subtle differences in how different hardware processes audio signals). Individually, none of these identify a device uniquely. Together, they create a fingerprint that is unique to perhaps one in several hundred thousand devices.

The Electronic Frontier Foundation’s Panopticlick project (now called Cover Your Tracks) demonstrated in 2010 that the combination of browser attributes was unique for 83.6% of browsers tested, and 94.2% of those with Flash or Java enabled. The technique has only grown more sophisticated since.

If a transaction comes from a device fingerprint that matches the cardholder’s known devices, that’s a strong legitimacy signal. If it comes from a device never seen before, associated with an IP address in a different country from the cardholder’s home, with a VPN detected, that’s a strong fraud signal.

Fraudsters know this, so they use anti-fingerprinting tools: specialised browsers like Multilogin or GoLogin that let them spoof device attributes, making their device appear to be a common configuration rather than a unique one. The detection systems respond by looking for the signatures of anti-fingerprinting tools themselves: specific JavaScript behaviours, inconsistencies between reported and actual attributes, and timing anomalies in how the browser responds to probes. It’s an arms race, and it’s ongoing.
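The core mechanism, combining many weak attributes into one identifier and checking it against a cardholder’s known devices, can be sketched in a few lines. The attribute names here are hypothetical, and exact-match hashing is a simplification: a single browser update would change the hash, so treat this as an illustration of the idea rather than how commercial products match devices.

```python
import hashlib
import json


def device_fingerprint(attributes: dict) -> str:
    """Hash a canonical (key-sorted) encoding of the collected attributes."""
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_known_device(attributes: dict, known_fingerprints: set) -> bool:
    """True if this exact attribute combination has been seen for the cardholder."""
    return device_fingerprint(attributes) in known_fingerprints


laptop = {"screen": "2560x1440", "os": "macOS 14.4", "timezone": "Europe/London",
          "language": "en-GB", "fonts": ["Arial", "Helvetica", "Menlo"]}
known = {device_fingerprint(laptop)}

# Same attributes collected in a different order: still the same device.
same_laptop = {"language": "en-GB", "os": "macOS 14.4", "screen": "2560x1440",
               "fonts": ["Arial", "Helvetica", "Menlo"], "timezone": "Europe/London"}
stranger = dict(laptop, timezone="Asia/Bangkok")

print(is_known_device(same_laptop, known))  # True
print(is_known_device(stranger, known))     # False
```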

Behavioural biometrics

The newest frontier in fraud detection is behavioural biometrics: the way you interact with your device, as opposed to what device you’re using.

Your typing pattern is distinctive. The time between releasing one key and pressing the next (called flight time, one of the measurements used in keystroke dynamics), the pressure you apply to a touchscreen, the angle at which you hold your phone, the way you move your mouse, the speed at which you scroll: all of these form a behavioural profile that is remarkably consistent for a given person and remarkably difficult for someone else to replicate.

Companies like BioCatch (founded 2011; Permira agreed to acquire a majority stake in 2024 at a $1.3 billion valuation) and BehavioSec (acquired by LexisNexis Risk Solutions in 2022) build profiles based on these patterns. If someone logs into your banking app but types differently, swipes differently, and holds the phone at a different angle, the system can flag the session as suspicious even though the username, password, and device fingerprint are all correct.
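As a rough illustration of comparing a session against a stored profile, the sketch below measures flight times (key release to next key press) and scores how far the session’s average sits from the profile’s. The z-score comparison, the millisecond timestamps, and the cut-off of 3 are assumptions chosen for simplicity; this is not a description of any vendor’s method, which would draw on far more signals.

```python
from statistics import mean, stdev


def flight_times(keystrokes):
    """Gaps between releasing one key and pressing the next.

    `keystrokes` is a list of (press_ms, release_ms) pairs in typing order.
    """
    return [nxt[0] - prev[1] for prev, nxt in zip(keystrokes, keystrokes[1:])]


def typing_anomaly_score(profile_flights, session_flights):
    """How many profile standard deviations the session mean is from the profile mean."""
    mu = mean(profile_flights)
    sigma = max(stdev(profile_flights), 1e-9)  # guard against a zero-variance profile
    return abs(mean(session_flights) - mu) / sigma


profile = [110, 120, 130, 115, 125]  # flight times (ms) learned from past sessions
owner = [(0, 80), (200, 270), (395, 470), (590, 660)]
impostor = [(0, 60), (300, 350), (610, 680), (930, 990)]

ANOMALY_CUTOFF = 3.0  # illustrative cut-off
print(typing_anomaly_score(profile, flight_times(owner)) > ANOMALY_CUTOFF)     # False
print(typing_anomaly_score(profile, flight_times(impostor)) > ANOMALY_CUTOFF)  # True
```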

A 2020 BioCatch case study described detecting a fraud ring that had obtained legitimate credentials through social engineering. The credentials were correct. The device was the victim’s own phone (the fraudsters had convinced the victim to install remote access software). Traditional authentication would have passed. But BioCatch flagged the sessions because the mouse movements were too smooth: the remote access software was translating touch input to mouse events, and the dynamics were subtly different from direct mouse use. The fraudsters had the keys and the car, but they were driving it wrong.

3D Secure: shifting the liability

3D Secure (3DS) is the protocol behind the pop-up window that sometimes appears when you’re buying something online, asking you to enter a code sent to your phone or confirm the purchase in your banking app. The name stands for “Three-Domain Secure”, the three domains being the issuer, the acquirer, and the interoperability domain (the card network). Visa’s implementation is called “Visa Secure” (formerly “Verified by Visa”); Mastercard’s is “Mastercard Identity Check” (formerly “Mastercard SecureCode”).

The original 3D Secure (version 1.0, launched in 2001) was widely hated. It redirected you to a separate webpage, often slow, poorly designed, and visually inconsistent with the merchant’s site, where you had to enter a static password that you’d set up and forgotten. It increased cart abandonment rates significantly. Merchants resented it because it drove away customers. Cardholders resented it because it was clunky.

3D Secure 2.0 (3DS2), rolled out from 2019, is dramatically better. Instead of requiring a password every time, 3DS2 performs a risk-based assessment behind the scenes. The issuer receives rich data about the transaction (device information, transaction history, shipping address, billing address) and decides whether the risk is low enough to approve without cardholder interaction (a frictionless flow) or whether to step up to a challenge (typically a one-time code sent via SMS or push notification to the banking app).

In practice, 80-90% of 3DS2 transactions go through frictionlessly: the cardholder never sees a challenge because the risk assessment determined the transaction was low-risk. The remaining 10-20% get a challenge, which is far less annoying than the old static password.

The real significance of 3DS is liability shift. When a merchant processes a transaction through 3DS and the issuer authenticates it, the liability for fraud shifts from the merchant to the issuer. If the transaction turns out to be fraudulent, the issuer eats the loss, not the merchant. This gives merchants a strong incentive to implement 3DS and gives issuers an incentive to make their risk assessments accurate.

SCA and PSD2: Europe regulates

In September 2019, the European Union’s Payment Services Directive 2 (PSD2) introduced a requirement called Strong Customer Authentication (SCA). The rule is simple in principle: electronic payments in the European Economic Area must be authenticated using at least two of three factors:

Something you know: a password or PIN.
Something you have: a phone or a hardware token.
Something you are: a fingerprint or face scan.

This is standard multi-factor authentication applied to payments. The requirement covers electronic payments initiated by the payer, with exemptions that include low-value remote transactions of up to €30. The rules do not name a specific technology, but in practice 3DS2 became the primary mechanism for online card payments.
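The two-of-three rule is a check on categories, not on the number of credentials: a password plus a PIN is still only one factor. A minimal sketch follows, using the categories’ usual names (knowledge, possession, inherence); the credential labels in the mapping are illustrative.

```python
# Maps each credential type to the SCA category it belongs to (illustrative labels).
FACTOR_CATEGORY = {
    "password": "knowledge",
    "pin": "knowledge",
    "phone": "possession",
    "hardware_token": "possession",
    "fingerprint": "inherence",
    "face_scan": "inherence",
}


def satisfies_sca(presented) -> bool:
    """True if the presented credentials span at least two distinct categories."""
    categories = {FACTOR_CATEGORY[f] for f in presented if f in FACTOR_CATEGORY}
    return len(categories) >= 2


print(satisfies_sca(["password", "pin"]))       # False: both are knowledge
print(satisfies_sca(["pin", "phone"]))          # True: knowledge + possession
print(satisfies_sca(["phone", "fingerprint"]))  # True: possession + inherence
```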

The rollout was messy. The original deadline of September 2019 was extended repeatedly because the industry wasn’t ready. The UK (pre-Brexit, then with its own FCA deadline) pushed its enforcement to March 2022. Different EU member states enforced the rules at different times. Merchants, issuers, and acquirers all pointed at each other. Transaction decline rates spiked in the early months as systems that hadn’t been tested properly went live.

But once the dust settled, the impact was significant. Card-not-present fraud in the UK dropped by 20% in the year following SCA enforcement, according to UK Finance’s 2023 fraud report. The additional authentication friction (the code sent to your phone, the biometric check in your banking app) added a second or two to the checkout process but made stolen card details substantially less useful to criminals. A stolen card number without access to the cardholder’s phone is, in a post-SCA world, nearly useless for online transactions in Europe.

Australia hasn’t mandated SCA-equivalent legislation, but the major issuers have implemented 3DS2 widely, and the RBA has signalled interest in stronger authentication requirements. The US remains further behind: 3DS2 adoption is lower, and there’s no federal equivalent of PSD2.

The cat and the mouse

The fundamental dynamic of fraud detection is adversarial. Unlike most machine learning problems, where the data distribution is relatively stable, fraud detection operates against an opponent who actively adapts.

When issuers deployed EMV chips and killed counterfeit fraud, criminals pivoted to card-not-present fraud. When ML models learned to catch common CNP patterns, criminals started using account takeover (ATO): gaining control of a legitimate cardholder’s online banking account and making transactions that look, to the model, like the real cardholder. When behavioural biometrics caught ATO by detecting unfamiliar interaction patterns, criminals started using authorised push payment (APP) fraud: social engineering the cardholder into making the payment themselves, so the biometrics are genuine because it genuinely is the cardholder doing it.

APP fraud is the current crisis. In the UK, APP fraud losses reached £459 million in 2023 (UK Finance). The criminal calls you, pretends to be your bank’s fraud department, tells you your account has been compromised, and convinces you to transfer your money to a “safe account” that’s actually controlled by the criminal. The irony is savage: the fraud is committed by using the fraud detection system’s own credibility against the victim.

Every defence creates a new attack surface. Every new attack creates a new defence. The fraud detection industry employs thousands of people and spends billions of dollars annually on a problem that can never be fully solved, only managed. The economics are the same as any security problem: the defenders need to be right every time; the attackers only need to be right once.

The human in the loop

Despite all the ML models and real-time scoring, many fraud decisions still involve humans. Large issuers maintain fraud operations centres staffed around the clock. When a model flags a transaction as suspicious but not clearly fraudulent (say, a score of 0.6 out of 1.0), it goes to a human analyst who reviews the transaction in context and makes a judgement call.

These analysts are typically looking at a screen showing the transaction details, the cardholder’s recent history, the model’s risk score, and which specific features drove the score. They might call the cardholder to verify. They might approve the transaction but add the card to a watch list. They might block the card entirely.
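The routing that sends mid-range scores to an analyst amounts to two cut-offs instead of one. In this sketch the values 0.5 and 0.9 are placeholders, not industry figures; each issuer would tune its own band.

```python
def route(score: float, review_floor: float = 0.5, decline_floor: float = 0.9) -> str:
    """Route a scored transaction: clear cases are automated, the middle band goes to a person."""
    if score >= decline_floor:
        return "decline"
    if score >= review_floor:
        return "manual_review"
    return "approve"


print(route(0.12))  # approve
print(route(0.60))  # manual_review
print(route(0.97))  # decline
```

Widening the review band improves decisions on ambiguous cases but raises analyst workload, so the band itself is another cost trade-off.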

The human analysts also provide the feedback loop that keeps the models accurate. When a flagged transaction turns out to be legitimate, or when an approved transaction is later reported as fraud, that outcome is fed back into the training data. The model learns. The next generation of the model is slightly better at distinguishing fraud from “person on holiday buying something unusual.”

This feedback loop has a timing problem. The label delay (the time between a transaction occurring and the fraud being reported) can be weeks or months. A cardholder might not notice a fraudulent transaction until their next statement. A chargeback can take 45-120 days to resolve. During that delay, the model is training on incomplete data. It knows about transactions, but it doesn’t yet know which of them were fraudulent. Various techniques exist to handle this (semi-supervised learning, delayed labelling strategies), but the fundamental problem, that the ground truth arrives late, makes fraud model training harder than most ML applications.
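One simple delayed-labelling strategy is to train only on transactions old enough that a fraud report would probably have arrived by now. The sketch below assumes a 120-day maturity window (the upper end of the chargeback range mentioned above) and a hypothetical record layout with date, features, and reported_fraud keys.

```python
from datetime import date, timedelta


def mature_training_rows(transactions, today, maturity_days=120):
    """Keep only transactions whose labels have had time to settle.

    Recent transactions are left out: an unreported one may still turn out to be fraud.
    """
    cutoff = today - timedelta(days=maturity_days)
    return [
        (t["features"], t["reported_fraud"])
        for t in transactions
        if t["date"] <= cutoff
    ]


transactions = [
    {"date": date(2024, 1, 10), "features": [0.9, 3], "reported_fraud": True},
    {"date": date(2024, 1, 20), "features": [0.1, 1], "reported_fraud": False},
    {"date": date(2024, 5, 25), "features": [0.7, 2], "reported_fraud": False},  # too fresh
]
rows = mature_training_rows(transactions, today=date(2024, 6, 1))
print(len(rows))  # 2
```

The price of waiting is staleness: the model never sees the most recent four months, which is precisely where new fraud patterns appear.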

What’s next

Fraud detection is converging on a model where every transaction is assessed across multiple layers simultaneously: device, behaviour, transaction pattern, network graph (who’s transacting with whom), and real-time threat intelligence shared across institutions. The models are getting better. But so are the criminals.

The next piece of the puzzle isn’t how to catch fraud, but how to manage the money supply itself: the interest rates, the printing, the policy levers that determine how much money exists and what it’s worth. That’s the domain of central banks, and it’s stranger than you might expect.

How Central Banks Work is next: the institutions that control the money supply, set interest rates, and occasionally save the world from financial collapse.