How Reputation Systems Work

August 03, 2026 · 36 min read

This is the final post in the Trust series. We’ve covered the fundamental problem, identity, encryption, and certificates, all of which are mechanisms for establishing trust between machines, or between a machine and a person. But what about trust between people who’ve never met? You can’t demand a cryptographic certificate from an eBay seller. You can’t verify the identity of an Airbnb host through a certificate authority. You need a different kind of trust, one built from observation, feedback, and the collective judgement of strangers.

The cold start problem

Every marketplace begins with the same chicken-and-egg dilemma: buyers won’t come without sellers, sellers won’t come without buyers, and neither will come without trust. This is the cold start problem, and every platform that connects strangers has to solve it before anything else.

eBay, founded in 1995 by Pierre Omidyar, is the canonical case. The story goes that the first item sold on eBay was a broken laser pointer, for $14.83. Omidyar contacted the buyer to make sure they understood the laser pointer was broken. The buyer replied that they were a collector of broken laser pointers. (The story may be apocryphal, since eBay’s early records are contested, but it illustrates the point: early marketplace transactions happen between people with unusual levels of trust or unusual interests.)

eBay’s initial trust mechanism was the Feedback Forum, launched in 1996. After a transaction, both buyer and seller could leave feedback: positive, negative, or neutral, plus a short text comment. Your feedback score, the total number of unique users who left positive feedback minus negative, was displayed next to your username. Stars indicated milestones: yellow star at 10, blue at 50, turquoise at 100, and so on up to a silver shooting star at 1,000,000.
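The scoring rule is simple enough to sketch in a few lines. The Python below is an illustration of the rule as described here, not eBay’s code: the function names, the input format, and the choice to let a user’s most recent rating win are all assumptions, and only the star milestones named above are included.

```python
# Illustrative sketch of the scoring rule described above; not eBay's actual code.
# The input format and function names are assumptions made for this example.

# Only the milestones named in the text; the real ladder has more steps.
STAR_MILESTONES = [(10, "yellow"), (50, "blue"), (100, "turquoise")]


def feedback_score(feedback):
    """feedback: iterable of (user_id, rating) with rating in
    {'positive', 'neutral', 'negative'}.

    Each unique user counts once. Here their most recent rating wins,
    which is an assumption of this sketch.
    """
    latest = {}
    for user_id, rating in feedback:
        latest[user_id] = rating
    positives = sum(1 for r in latest.values() if r == "positive")
    negatives = sum(1 for r in latest.values() if r == "negative")
    return positives - negatives


def star_for(score):
    """Return the highest star colour reached, or None below the first milestone."""
    star = None
    for threshold, colour in STAR_MILESTONES:
        if score >= threshold:
            star = colour
    return star


history = [
    ("alice", "positive"), ("bob", "positive"), ("alice", "positive"),
    ("carol", "negative"), ("dave", "neutral"),
]
print(feedback_score(history))  # 2 unique positives minus 1 unique negative
```

Counting unique users rather than raw feedback is the interesting design choice: one enthusiastic repeat buyer (or one accomplice) can’t inflate the number on their own.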

This system was simple, transparent, and revolutionary. It solved the trust problem by creating a public, persistent, portable reputation. Before eBay, your reputation existed only within your physical community. After eBay, your reputation was a number visible to every potential counterparty on the platform. The Maghribi traders of the 11th century (whom we met in Why Trust Is Hard) would have recognised it instantly: a gossip network, systematised and scaled.

How ratings actually work

The mathematics of reputation systems is deceptively complex. A number next to a username feels simple, but the design choices behind it determine whether the system builds trust or destroys it.

Binary vs granular feedback. eBay’s original system was trinary (positive/negative/neutral). Uber uses a 1-5 star rating. Amazon uses 1-5 stars. Airbnb uses 1-5 stars across multiple dimensions (cleanliness, accuracy, communication, location, check-in, value). Netflix once used 1-5 stars, then switched to thumbs up/down in 2017 because they found that people predicted their own enjoyment more accurately with a binary choice than a granular one.

The choice matters enormously. Rating inflation is the tendency for ratings to cluster at the top of any scale. On Airbnb, the average listing rating is approximately 4.7 out of 5. On Uber, a driver rating below 4.6 triggers warnings, and below 4.0 risks deactivation. The useful signal is compressed into a tiny range at the top of the scale. A 4.2 on Airbnb isn’t “good”; it’s a red flag. The five-star scale has effectively become a two-point scale: 5 (acceptable) and everything else (bad).

Why does inflation happen? Three reasons. First, social pressure: leaving a negative review for someone you’ve met face-to-face (an Uber driver, an Airbnb host) feels like a personal attack. Most people avoid confrontation. Second, reciprocity: on platforms where both parties rate each other, leaving a low rating invites retaliation. To mitigate this, eBay eventually stopped sellers from leaving negative feedback for buyers, and Airbnb now publishes reviews only after both parties have submitted theirs (or the review window has closed). Third, selection effects: unhappy customers are more likely to stop using the service than to leave a negative review. The people who stay and rate are disproportionately satisfied.

The economists Chris Nosko and Steven Tadelis studied this in the context of eBay and found that the feedback system significantly overstated seller quality: most sellers had near-perfect scores regardless of actual service quality. The signal was real but heavily compressed.

Bayesian approaches offer a more sophisticated alternative. Instead of a simple average, you can model each seller’s true quality as a probability distribution that updates with each new review. An average of 4.8 stars from 10 reviews is a much less certain estimate than the same average from 1,000 reviews, and the Bayesian approach captures this uncertainty naturally. The beta distribution is commonly used: two parameters (alpha and beta, loosely corresponding to positive and negative observations) define a probability distribution over the seller’s “true” quality. New observations update the parameters. A seller with no reviews has a flat distribution (maximum uncertainty); a seller with many reviews has a tight distribution (high confidence). Reddit’s comment ranking algorithm, described by Randall Munroe (xkcd) in a widely cited 2009 blog post, applies the same intuition with a frequentist tool: it ranks comments by the lower bound of a Wilson score confidence interval, which naturally penalises items with few votes even if those votes are all positive.
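To make the difference between “few reviews” and “many reviews” concrete, here is a minimal sketch of both ideas in Python. It treats each review as simply positive or negative (a simplification of star ratings), uses a flat Beta(1, 1) prior, and the function names are my own; it is not any platform’s production code.

```python
import math


def beta_posterior(positives, negatives, prior_alpha=1.0, prior_beta=1.0):
    """Update a Beta(prior_alpha, prior_beta) prior with observed reviews.

    Beta(1, 1) is the flat prior: every quality level is equally plausible.
    """
    return prior_alpha + positives, prior_beta + negatives


def beta_mean_and_sd(alpha, beta):
    """Mean and standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = (alpha * beta) / (total ** 2 * (total + 1))
    return mean, math.sqrt(variance)


def wilson_lower_bound(positives, total, z=1.96):
    """Lower bound of the Wilson score interval (z=1.96 is roughly 95%)."""
    if total == 0:
        return 0.0
    p_hat = positives / total
    denom = 1 + z * z / total
    centre = p_hat + z * z / (2 * total)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / total + z * z / (4 * total * total))
    return (centre - margin) / denom


# A new seller with a perfect record vs an established seller at 96% positive.
for positives, total in [(10, 10), (960, 1000)]:
    alpha, beta = beta_posterior(positives, total - positives)
    mean, sd = beta_mean_and_sd(alpha, beta)
    lower = wilson_lower_bound(positives, total)
    print(f"{positives}/{total}: posterior mean {mean:.3f}, sd {sd:.3f}, Wilson lower bound {lower:.3f}")
```

The seller with ten perfect reviews has the higher raw average, but both the wider posterior and the lower Wilson bound say the same thing: we don’t know much about them yet. Ranking by the lower bound puts the established seller first.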

Sybil attacks: the identity problem, again

The most fundamental attack on any reputation system is the Sybil attack, named after the 1973 book Sybil (about a woman with multiple personality disorder). The attack is simple: create multiple fake identities and use them to manipulate the system. Leave positive reviews for yourself. Leave negative reviews for competitors. Inflate your reputation from nothing.

The name was coined by Brian Zill at Microsoft Research, and the attack was formalised by John Douceur in a 2002 paper, “The Sybil Attack.” Douceur showed that in any system without a trusted central authority that verifies identity, Sybil attacks are always possible. If creating an identity is cheap, manufacturing fake reputation is cheap.

The defences are all imperfect:

Identity verification makes creating accounts expensive. Requiring a phone number, a credit card, or government ID raises the cost of a Sybil attack from zero to nonzero. But phone numbers can be bought in bulk (Google Voice, virtual number services), credit cards can be prepaid, and ID verification can be defeated with synthetic identities. Amazon has reported that counterfeit reviews are a “persistent problem” despite requiring accounts linked to real purchases.

Purchase verification ties reviews to actual transactions. Amazon marks reviews as “Verified Purchase” if the reviewer actually bought the product. This helps but doesn’t eliminate the problem: sellers can send free products to reviewers or pay for fake purchases through third-party services. The UK’s Competition and Markets Authority opened an investigation into Amazon and Google over fake reviews in 2021; Amazon itself has said it blocks hundreds of millions of suspected fake reviews a year, which suggests the scale of the problem is enormous.

Behavioural detection uses machine learning to identify suspicious patterns: accounts that only review one seller, clusters of reviews posted at the same time, reviews with suspiciously similar language, accounts that appear in geographic clusters inconsistent with organic behaviour. Yelp’s “recommended review” algorithm is one of the most aggressive: it hides roughly 25% of all submitted reviews that it considers unreliable, based on user history, review patterns, and other signals. The algorithm is a trade secret, so reviewers whose genuine reviews are suppressed have no way to find out why, a source of constant complaint.

Graph-based approaches analyse the social network of interactions. In a healthy marketplace, the graph of who-transacts-with-whom has organic structure: buyers interact with many sellers, sellers interact with many buyers, and the connections form a rich, tangled web. A Sybil attacker’s fake accounts tend to interact primarily with each other and with the attacker’s main account, creating a dense cluster weakly connected to the honest graph. The SybilGuard algorithm (Yu et al., 2006) and its successor SybilLimit (2008) exploit this structural signature to identify likely Sybil nodes.
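Two of the signals above are easy to sketch. The Python below is a toy illustration, not how any platform (or SybilGuard itself, which relies on random routes through the graph) actually works: the function names, thresholds, and data shapes are assumptions. The first function flags accounts that only ever review one seller; the second measures conductance, a standard way of quantifying how weakly a cluster is connected to the rest of a graph.

```python
from collections import defaultdict


def single_seller_reviewers(reviews, min_reviews=3):
    """reviews: iterable of (reviewer_id, seller_id) pairs.

    Returns reviewers with at least min_reviews reviews, all for one seller.
    The threshold is an arbitrary choice for this sketch.
    """
    sellers = defaultdict(set)
    counts = defaultdict(int)
    for reviewer, seller in reviews:
        sellers[reviewer].add(seller)
        counts[reviewer] += 1
    return {r for r in counts if counts[r] >= min_reviews and len(sellers[r]) == 1}


def conductance(edges, cluster):
    """edges: iterable of undirected (u, v) pairs; cluster: set of nodes.

    Conductance = edges leaving the cluster / the smaller side's total degree.
    A low value means a dense group with few links to everyone else.
    """
    cluster = set(cluster)
    cut = vol_in = vol_out = 0
    for u, v in edges:
        u_in, v_in = u in cluster, v in cluster
        if u_in != v_in:
            cut += 1
        vol_in += u_in + v_in            # booleans add as 0 or 1
        vol_out += (not u_in) + (not v_in)
    smaller = min(vol_in, vol_out)
    return cut / smaller if smaller else 0.0


# Four honest accounts, three Sybils that trade among themselves,
# and a single edge connecting the Sybil cluster to the honest graph.
edges = [
    ("h1", "h2"), ("h2", "h3"), ("h3", "h4"), ("h4", "h1"), ("h1", "h3"),
    ("s1", "s2"), ("s2", "s3"), ("s1", "s3"),
    ("s1", "h1"),
]
print(conductance(edges, {"s1", "s2", "s3"}))  # 1 cut edge / volume 7
print(conductance(edges, {"h1", "h2"}))        # 4 cut edges / volume 6
```

The Sybil cluster scores about 0.14 against 0.67 for an arbitrary pair of honest accounts. Real detectors have to find the suspicious cluster first, which is the hard part; this only shows what the structural signature looks like once you have a candidate.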

But no defence is complete. The fundamental asymmetry is that creating a fake identity is cheap, and detecting it with certainty is expensive. Reputation systems live in a permanent arms race between the platform and the fraudsters, with each side adapting to the other’s latest move.

PageRank: trust as an algorithm

Google’s PageRank algorithm, published by Larry Page and Sergey Brin in 1998, is arguably the most influential reputation system ever built, even though it’s usually described as a search ranking algorithm rather than a trust system.

The insight: a web page is important if important pages link to it. This is circular (importance is defined in terms of importance), but it can be resolved mathematically. Model the web as a directed graph (pages are nodes, links are edges). Assign each page an initial score. Then iteratively redistribute scores: each page divides its score equally among its outgoing links, and each page’s new score is the sum of the scores flowing in from pages that link to it. Repeat until the scores converge. The steady-state scores are the PageRank values.

Formally, PageRank is the stationary distribution of a random walk on the web graph. Imagine a person clicking links at random, starting from an arbitrary page. At each step, they either click a random link on the current page (with probability d, the “damping factor,” typically set to 0.85) or jump to a completely random page (with probability 1-d). As the number of steps grows, the fraction of time spent on each page converges to its PageRank. Pages that are linked to by many high-PageRank pages are visited more often, and thus ranked higher.
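The iteration is short enough to write out. This is a minimal sketch of the textbook power-iteration form, assuming a tiny web small enough to hold in a dictionary; the function name, tolerance, and the way pages with no outgoing links are handled (their score is spread evenly over all pages, a common convention) are my choices, not Google’s implementation.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """links: dict mapping each page to a list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal scores

    for _ in range(max_iter):
        # Pages with no outgoing links spread their score over every page.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Random-jump share plus the dangling share, identical for every page.
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        # Each page divides its score equally among its outgoing links.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

Page c comes out on top because three pages link to it, and a comes second because its only inbound link is from c. Page d, which nobody links to, is left with only the random-jump share of (1 - d) / n.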

This is a trust algorithm. A link from a reputable page to your page is a vote of confidence, the digital equivalent of a letter of introduction. A link from a spammy page is worth less, because the spammy page’s own PageRank is low. Trust flows through the graph, weighted by the trustworthiness of the source. It’s the web of trust from the first post, formalised in linear algebra.

PageRank was revolutionary, but it was immediately gamed. Link farms, networks of sites that existed solely to link to each other, boosting each other’s PageRank, appeared within months. Google has spent the subsequent decades in an arms race against search engine optimisation (SEO) manipulation, adding hundreds of signals beyond PageRank and penalising manipulative link patterns. The original algorithm is now just one of the many signals in Google’s ranking system. But the core idea, that trust can be computed from the structure of relationships, remains foundational.

The web of trust: decentralised reputation

The PGP web of trust is a reputation system that works without any central authority. PGP (Pretty Good Privacy), created by Phil Zimmermann in 1991, uses public-key cryptography for email encryption and signing. But how do you know that a public key belongs to the person it claims to? Without a certificate authority, PGP relies on a decentralised model: users sign each other’s keys, attesting “I have verified that this key belongs to this person.”

Key signing creates a directed graph. If Alice signs Bob’s key and Bob signs Carol’s key, Alice can tentatively trust Carol’s key. Alice trusts Bob’s verification, and Bob verified Carol. The chain can be extended arbitrarily: Alice might trust Carol because she trusts Bob who trusts Carol, or because she trusts Dave who trusts Eve who trusts Carol.
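Finding such a chain is an ordinary graph search. The sketch below is a simplified model, not how PGP implementations actually decide validity: they also weigh how much the user trusts each signer’s judgement, which is ignored here. The function name, the data layout, and the depth limit of 5 are assumptions for illustration.

```python
from collections import deque


def trust_path(signatures, truster, target, max_depth=5):
    """signatures: dict mapping a signer to the set of people whose keys they signed.

    Returns the shortest signature chain from truster to target as a list,
    or None if no chain of at most max_depth signatures exists.
    """
    queue = deque([[truster]])
    seen = {truster}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        if len(path) - 1 >= max_depth:
            continue  # chain is already as long as we are willing to follow
        for signed in sorted(signatures.get(path[-1], ())):
            if signed not in seen:
                seen.add(signed)
                queue.append(path + [signed])
    return None


signatures = {
    "alice": {"bob", "dave"},
    "bob": {"carol"},
    "dave": {"eve"},
    "eve": {"carol"},
}
print(trust_path(signatures, "alice", "carol"))  # shortest chain goes via bob
print(trust_path(signatures, "carol", "alice"))  # signatures are one-directional
```

Capping the depth matters because every extra hop is another person whose diligence you are relying on; a long chain exists on paper but carries little confidence.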

In practice, the web of trust never scaled beyond the cryptography community. The UX was terrible (key management is hard), the trust model was confusing (what does it mean to sign someone’s key?), and the network effects never reached a tipping point. To use PGP effectively, you needed a critical mass of people you communicate with to also use PGP, and that mass never materialised outside certain technical communities.

The web of trust is interesting as a contrast to the CA model. The CA model is hierarchical: trust flows from root authorities downward. The web of trust is peer-to-peer: trust flows along personal relationships. The CA model scales well but creates single points of failure (compromise a root CA and the system breaks). The web of trust avoids single points of failure but doesn’t scale (you can’t bootstrap trust with a stranger who has no path to you in the graph).

Keybase (2014) attempted to modernise the web of trust by linking cryptographic keys to publicly verifiable social media identities. Instead of key signing parties, you proved your identity by posting a signed message on Twitter, GitHub, Reddit, or your personal website. Anyone could verify the link between the key and the social identity. Zoom acquired Keybase in 2020, and active development has largely stalled since, but the idea of using publicly verifiable actions to bootstrap trust was elegant.

Zero-knowledge proofs: trust without disclosure

One of the most remarkable developments in modern cryptography is the zero-knowledge proof (ZKP): a way to prove you know something without revealing what you know.

The concept was formalised by Shafi Goldwasser, Silvio Micali, and Charles Rackoff in a 1985 paper, “The Knowledge Complexity of Interactive Proof Systems.” The classic illustration is the Ali Baba cave analogy (proposed by Jean-Jacques Quisquater and others): imagine a cave with a ring-shaped passage and a locked door in the middle. Alice claims she knows the door’s secret code. Bob stands at the entrance and can’t see past the fork. Alice enters and takes a random path (left or right). Bob then calls out which side he wants her to emerge from. If she knows the code, she can always comply, opening the door if necessary. If she doesn’t know the code, she can only comply 50% of the time (when Bob happens to call the side she entered from). After 20 rounds, the probability that Alice is bluffing and got lucky every time is less than one in a million. Bob is convinced Alice knows the code, but he never learned the code himself.
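The arithmetic behind “one in a million” is just repeated halving, and the protocol itself is easy to simulate. The Python below is a toy model of the cave story only (function names are mine); it involves no actual cryptography.

```python
import random


def bluff_success_probability(rounds):
    """Chance that a prover who does not know the code passes every round."""
    return 0.5 ** rounds


def run_protocol(knows_code, rounds, rng):
    """Simulate the cave protocol; return True if Bob ends up convinced."""
    for _ in range(rounds):
        entered = rng.choice(["left", "right"])    # Alice picks a path unseen
        requested = rng.choice(["left", "right"])  # Bob calls out a side
        if not knows_code and entered != requested:
            return False  # she cannot get through the door, so she is caught
    return True


print(f"1 in {round(1 / bluff_success_probability(20)):,}")  # 2**20 = 1,048,576

rng = random.Random(42)
print(run_protocol(knows_code=True, rounds=20, rng=rng))  # an honest Alice always passes
caught = sum(not run_protocol(False, 20, rng) for _ in range(1000))
print(f"{caught} of 1000 bluffers caught")
```

Each extra round halves a bluffer’s chances, so Bob can pick whatever level of confidence he wants simply by asking for more rounds. At no point does anything Alice does reveal the code itself.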

Zero-knowledge proofs have profound implications for trust and reputation:

Age verification without revealing age. A bar needs to know you’re over 18. With a ZKP, you can prove your age exceeds 18 without revealing your actual age, your name, your address, or anything else on your ID. The proof is mathematically convincing but reveals nothing beyond the single fact being proved.

Credential verification without revealing credentials. You could prove you hold a valid university degree without revealing which university, or prove you have a credit score above 700 without revealing the score itself. The EU’s European Digital Identity framework (the revised eIDAS regulation, adopted in 2024) provides for selective disclosure of attributes, and zero-knowledge techniques are one way of implementing it.

Anonymous reputation. A user could prove they have a positive transaction history on a platform without revealing their identity. This separates reputation from identity: you carry your trustworthiness without carrying your name. Proof systems like zk-SNARKs (Zero-Knowledge Succinct Non-Interactive Arguments of Knowledge) and zk-STARKs (the transparent variant, requiring no trusted setup) make this computationally feasible.

The practical deployment of ZKPs is still early-stage for reputation systems, but the theoretical implications are significant: they offer a path to trust that doesn’t require surrendering privacy. In a world where reputation systems accumulate vast amounts of personal data, the possibility of proving trustworthiness without disclosing identity is genuinely exciting.

Platform trust: who watches the marketplace?

Every reputation system exists within a platform, and the platform itself is a trust actor with enormous power.

Airbnb backs stays with AirCover. For hosts, it provides damage protection (formerly the Host Guarantee) and liability insurance; for guests, it covers problems such as a listing that isn’t as described. This isn’t peer-to-peer trust; it’s platform trust. You trust the stranger’s apartment partly because of their reviews, but mostly because Airbnb will compensate you if things go badly wrong. The reputation system is supplemented by an institutional backstop.

Amazon curates the Buy Box, the default “Add to Cart” button when multiple sellers offer the same product. Which seller gets the Buy Box depends on price, shipping speed, seller metrics, and Amazon’s own assessment of reliability. This is Amazon acting as a trust intermediary: it’s making a recommendation about which seller to trust, based on data the buyer can’t see. The power this gives Amazon over sellers is extraordinary, and has been the subject of antitrust scrutiny in the US and EU.

Uber and Lyft rate both drivers and passengers. A driver with a low rating gets fewer rides and may be deactivated. A passenger with a low rating may find it harder to get pickups. The rating system is bilateral, which creates interesting dynamics: both parties are incentivised to behave well, but both parties also know that a low rating from the other side is possible retaliation for a low rating from their side.

The common thread: platforms don’t just host reputation systems. They are the trust infrastructure. They set the rules, they adjudicate disputes, they decide who’s allowed to participate. When a platform fails at this role (when Airbnb doesn’t handle a safety incident well, when Amazon can’t stop fake reviews, when Uber doesn’t protect drivers from abusive passengers), the trust failure isn’t in the reputation system. It’s in the institution.

The deepest problem: trust without institutions

All of the trust mechanisms in this series (cryptographic, institutional, reputational) share a common structure: they create a framework within which strangers can interact with manageable risk. The framework is always imperfect, always gameable, always dependent on some combination of mathematics, institutions, and human behaviour.

The most ambitious trust systems try to remove the institution entirely. Blockchain-based platforms like Bitcoin attempt to create trust through mathematics alone: no central authority, no reputation intermediary, just a consensus algorithm and a public ledger. The promise is trustless trust: you don’t need to trust anyone, because the protocol enforces the rules.

In practice, “trustless” is a misleading term. You’re still trusting the protocol designers, the miners or validators, the wallet software, and the exchange where you buy and sell. You’ve replaced trust in a bank with trust in a distributed system, but trust hasn’t been eliminated; it’s been redistributed. And the failure modes are different: a bank can reverse a fraudulent transaction; a blockchain cannot. Whether that’s a feature or a bug depends on whether you’re the victim of fraud or the victim of a capricious bank.

Full circle

We started this series with a simple observation: trust is a prediction about future behaviour, and making that prediction is hard when you can’t look someone in the eye. Every post since has explored a different mechanism for making the prediction easier: identity systems that verify who you’re talking to, encryption that keeps the conversation private, certificates that extend trust through chains of authority, and reputation systems that build trust from accumulated observation.

None of them is perfect. All of them are breakable. The history of trust is the history of inventing mechanisms and watching clever people defeat them, then inventing better mechanisms and watching cleverer people defeat those.

But here’s the thing: it works. Not perfectly, not always, but well enough. Billions of people buy things from strangers, share their homes with strangers, get into cars with strangers, and send money to strangers every day, with remarkably low rates of betrayal. The trust infrastructure (cryptographic, institutional, reputational) is invisible when it works, which is almost always. The padlock appears, the stars display, the payment clears, and the package arrives.

The infrastructure is imperfect: it’s held together by mathematics and goodwill and the constant vigilance of people who understand that trust is easier to destroy than to build. But it’s the best system we’ve built so far. And every time you hand your credit card to a stranger, or click “Buy Now” from a seller you’ve never met, or log in with a password you really should change, you’re trusting it. You just don’t notice.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.