In Why Trust Is Hard we saw that digital trust breaks down into authentication, integrity, and confidentiality. In How Identity Works we tackled the identity question, from passports to passkeys. Now we go deeper, into the mathematics that makes all of it possible. Encryption is the foundation of digital trust: without it, passwords are visible, messages are readable, and identity is unverifiable. Here’s how it works.
Hiding in plain sight
Encryption is the transformation of readable information (plaintext) into unreadable information (ciphertext) using a set of rules and a key, such that only someone with the right key can reverse the transformation. That’s the entire idea. Everything else is implementation.
One of the earliest well-documented ciphers is the Caesar cipher, used by Julius Caesar for military correspondence (described by Suetonius in The Twelve Caesars, c. 121 CE). The rule: shift every letter in the alphabet by a fixed number. With a shift of 3, A becomes D, B becomes E, ATTACK becomes DWWDFN. The key is the shift number. If you know the key, you shift back. If you don’t, you have a string of nonsense.
The Caesar cipher has a fatal weakness: there are only 25 possible keys (26 letters minus the identity shift). An attacker can try all of them in under a minute. This is brute force, trying every possible key until one produces readable plaintext. Any cipher whose key space is small enough to brute-force is useless.
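As a rough illustration, here is a minimal Python sketch of the Caesar cipher and the brute-force attack on it. The function name caesar and the restriction to uppercase A-Z are assumptions made for this example, not part of any standard.

```python
import string

ALPHABET = string.ascii_uppercase  # assumption: messages use only A-Z


def caesar(text, shift):
    """Shift each letter by `shift` places, wrapping around the alphabet."""
    out = []
    for ch in text:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)  # leave spaces and punctuation untouched
    return "".join(out)


ciphertext = caesar("ATTACK", 3)
print(ciphertext)  # DWWDFN

# Brute force: undo every one of the 25 possible shifts and inspect the results.
candidates = {key: caesar(ciphertext, -key) for key in range(1, 26)}
for key, guess in candidates.items():
    print(key, guess)
```

Only the line for key 3 reads as English. A person, or a dictionary check, picks it out immediately.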
The fix seems obvious: use a bigger key space. The substitution cipher replaces each letter with a different, arbitrary letter. A might become Q, B might become M, C might become Z. The key is the entire substitution table, 26 letters mapped to 26 letters. The number of possible keys is 26! (26 factorial), which is roughly 4 x 10^26. That’s 400 million billion billion possible keys. Brute force is hopeless.
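The size of that key space is easy to check. The sketch below computes 26! exactly and estimates how long an exhaustive search would take. The rate of one billion keys per second is a hypothetical figure chosen only for illustration.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600
RATE = 10 ** 9  # hypothetical attacker speed: one billion keys per second

substitution_keys = math.factorial(26)
print(substitution_keys)           # 403291461126605635584000000
print(f"{substitution_keys:.2e}")  # 4.03e+26

years = substitution_keys / RATE / SECONDS_PER_YEAR
print(f"{years:.2e} years to try every table at the assumed rate")
```

At that assumed rate the search takes on the order of ten billion years, which is why nobody attacks a substitution cipher by trying keys.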
And yet substitution ciphers were broken routinely by the 9th century. The Arab polymath Al-Kindi (Abu Yusuf Ya’qub ibn Ishaq al-Kindi, c. 801-873 CE) described the technique in his manuscript On Deciphering Cryptographic Messages, the oldest known work on cryptanalysis. His method: frequency analysis. In English, the letter E appears roughly 13% of the time. T appears about 9%. A about 8%. If you count the frequency of each letter in the ciphertext and match the distribution to the known distribution of the plaintext language, you can reconstruct the substitution table without ever knowing the key.
Frequency analysis is devastating because it exploits the structure of the plaintext. The cipher hides individual letters but preserves statistical patterns. Every simple (monoalphabetic) substitution cipher, in every language, is vulnerable to it. Al-Kindi’s insight, that the weakness isn’t in the cipher mechanism but in the structure of the message, remains one of the deepest ideas in cryptanalysis.
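The point that a substitution cipher preserves statistical patterns can be seen in a few lines of Python. This is a sketch under stated assumptions: the helper names make_key and substitute, the seed, and the sample sentence are all invented for the example.

```python
import random
import string
from collections import Counter


def make_key(seed):
    """Build a random substitution table (a permutation of A-Z)."""
    letters = list(string.ascii_uppercase)
    shuffled = letters[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(letters, shuffled))


def substitute(text, key):
    """Replace each letter using the table; other characters pass through."""
    return "".join(key.get(ch, ch) for ch in text.upper())


plaintext = "MEET ME NEAR THE EASTERN GATE AT SEVEN"
key = make_key(seed=42)
ciphertext = substitute(plaintext, key)

plain_counts = Counter(ch for ch in plaintext if ch.isalpha())
cipher_counts = Counter(ch for ch in ciphertext if ch.isalpha())

print(plain_counts.most_common(3))
print(cipher_counts.most_common(3))
```

The letters change but the counts do not. E appears 10 times in the plaintext, so whichever letter replaces E appears 10 times in the ciphertext. With a long enough message, that most frequent ciphertext letter is a strong guess for E, and the rest of the table follows the same way.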
Breaking out of substitution
The response to frequency analysis was the polyalphabetic cipher, which uses multiple substitution alphabets, switching between them according to a keyword. The most famous is the Vigenere cipher, described by Blaise de Vigenere in 1586 (though earlier versions were developed by Leon Battista Alberti around 1467 and Giovanni Battista Bellaso in 1553).
Here’s how it works. You choose a keyword, say, “LEMON”. You repeat it to match the length of your message:
Plaintext: A T T A C K A T D A W N
Keyword: L E M O N L E M O N L E
Each keyword letter specifies a shift: L shifts by 11, E by 4, M by 12, and so on. Each plaintext letter is shifted by a different amount depending on its position. The result: the same plaintext letter encrypts to different ciphertext letters depending on where it appears. Frequency analysis breaks because the frequencies are smeared across multiple alphabets.
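The LEMON example can be worked through in code. This is a minimal sketch that assumes the message and keyword contain only the letters A-Z; the function name vigenere is chosen for illustration.

```python
import string
from itertools import cycle

ALPHABET = string.ascii_uppercase  # assumption: A-Z only, no spaces


def vigenere(text, keyword, decrypt=False):
    """Shift each letter by the amount its aligned keyword letter specifies."""
    sign = -1 if decrypt else 1
    out = []
    for ch, k in zip(text, cycle(keyword)):  # cycle() repeats the keyword
        shift = sign * ALPHABET.index(k)     # L -> 11, E -> 4, M -> 12, ...
        out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
    return "".join(out)


ciphertext = vigenere("ATTACKATDAWN", "LEMON")
print(ciphertext)                                   # LXFOPVEFRNHR
print(vigenere(ciphertext, "LEMON", decrypt=True))  # ATTACKATDAWN
```

The first two letters T of ATTACK sit next to each other in the plaintext but encrypt to X and F, because they fall under different keyword letters.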
The Vigenere cipher was considered unbreakable for nearly 300 years; it was called le chiffre indéchiffrable (“the indecipherable cipher”). Then, in 1863, Friedrich Kasiski published a method for breaking it. If the keyword is shorter than the message (and it always is, in practice), the encryption pattern repeats. By finding repeated sequences in the ciphertext and measuring the distance between them, you can deduce the keyword length, because those distances tend to be multiples of it. Once you know the keyword length, you can split the ciphertext into groups (every 1st letter, every 2nd letter, etc.), and each group is a simple Caesar cipher, vulnerable to frequency analysis.
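The first step of Kasiski's method, finding repeated sequences and measuring the gaps between them, might be sketched as follows. The sample ciphertext is an assumption for illustration: it is CRYPTOISSHORTFORCRYPTOGRAPHY encrypted with the four-letter keyword ABCD. The function name repeat_distances is likewise invented.

```python
from collections import defaultdict
from functools import reduce
from math import gcd


def repeat_distances(ciphertext, length=3):
    """Map each repeated sequence to the gaps between its occurrences."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - length + 1):
        positions[ciphertext[i:i + length]].append(i)
    return {
        seq: [b - a for a, b in zip(pos, pos[1:])]
        for seq, pos in positions.items()
        if len(pos) > 1
    }


# Illustrative ciphertext: keyword ABCD, plaintext repeats the word CRYPTO.
ciphertext = "CSASTPKVSIQUTGQUCSASTPIUAQJB"
distances = repeat_distances(ciphertext)
print(distances)

# The keyword length should divide the common factor of the gaps.
all_gaps = [d for gaps in distances.values() for d in gaps]
likely_multiple = reduce(gcd, all_gaps)
print(likely_multiple)  # 16
```

Every repeat here is 16 positions apart, so the keyword length is probably a divisor of 16: 2, 4, 8, or 16. The analyst tries each candidate, splits the text accordingly, and checks which split produces Caesar-like letter frequencies. Real ciphertexts also contain coincidental repeats, so in practice the common factor is judged from many gaps rather than taken blindly.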
This arms race, between ciphers that try to hide structure and analysts who find it anyway, is the story of cryptography until the 20th century. The ciphers got more complex. The analysts got more clever. Neither side ever won permanently.
The machine age
World War II transformed cryptography from an art into an industrial process.
Enigma, the German cipher machine, was an electromechanical device that implemented a polyalphabetic cipher with a key space so large that brute force was impractical. It used three (later four) rotors, each wiring 26 input contacts to 26 output contacts in a scrambled pattern. The rotors stepped with each keypress, like an odometer, so the substitution alphabet changed with every letter. A plugboard at the front provided an additional layer of substitution. The total number of possible configurations was roughly 1.59 x 10^20, about 159 million trillion.
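The 1.59 x 10^20 figure can be reproduced with some arithmetic. The breakdown below is an assumption, since the text gives only the total: it uses the commonly cited setup of three rotors chosen from a set of five, 26 starting positions per rotor, and ten plugboard cables, and it ignores ring settings.

```python
import math

# Assumed configuration (not stated in the text): 3 rotors picked in order
# from 5, each with 26 start positions, plus 10 plugboard cables.
rotor_orders = 5 * 4 * 3
rotor_positions = 26 ** 3

cables = 10
unplugged = 26 - 2 * cables
# Ways to pair up 20 of the 26 letters with 10 interchangeable cables.
plugboard = math.factorial(26) // (
    math.factorial(unplugged) * math.factorial(cables) * 2 ** cables
)

total = rotor_orders * rotor_positions * plugboard
print(f"{total:.3e}")  # roughly 1.59 x 10^20
```

Almost all of the key space comes from the plugboard. The rotors alone contribute only about a million combinations under these assumptions.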
The British code-breaking effort at Bletchley Park, led by mathematicians including Alan Turing, broke Enigma by exploiting procedural weaknesses rather than mathematical ones. German operators used predictable message formats: weather reports always started with “WETTER” (weather), and daily reports included standard headings. These cribs (known or guessed plaintext segments) gave the analysts a foothold. Turing designed the Bombe, an electromechanical device that could test Enigma configurations against a crib at mechanical speed, eliminating impossible settings until only the correct one remained.
The lesson is timeless: the cipher was strong, but the implementation was weak. The mathematics of Enigma was sound. The humans using it made it breakable. This pattern repeats endlessly in modern cryptography. The algorithm is rarely the problem. The key management, the implementation, the human procedures, those are where things fail.
The Lorenz cipher (codenamed “Tunny” by the British) was even more complex than Enigma, used for high-command communications. Breaking it led to the construction of Colossus, built by Tommy Flowers at the Post Office Research Station in 1943-44, one of the world’s first programmable electronic computers, designed specifically for cryptanalysis. Ten were built. They were classified for decades. Flowers received an award of £1,000 for his work, which didn’t cover his personal expenses on the project. The contribution to the Allied war effort was immeasurable.
Symmetric encryption: one key to rule them all
Everything described so far is symmetric encryption: the same key encrypts and decrypts. The Caesar cipher’s shift number, the Vigenere keyword, the Enigma machine’s rotor settings, all are shared between sender and receiver. If you know the key, you can both encrypt and decrypt.
Modern symmetric ciphers are orders of magnitude more sophisticated than Enigma, but the principle is identical.
DES (Data Encryption Standard) was adopted by the US government in 1977, based on a cipher developed by IBM. It uses a 56-bit key, meaning 2^56 (roughly 72 quadrillion) possible keys. In 1977, this was considered adequate. By 1998, the Electronic Frontier Foundation had built a machine called Deep Crack for $250,000 that recovered a DES key by brute force in about 56 hours. DES was dead, not because the algorithm was flawed, but because the key was too short. Technology made brute force practical.
AES (Advanced Encryption Standard) replaced DES in 2001 after a public competition run by NIST (the US National Institute of Standards and Technology). The winner was Rijndael, designed by two Belgian cryptographers, Joan Daemen and Vincent Rijmen. AES uses key lengths of 128, 192, or 256 bits. A 128-bit key has 2^128 possible values, roughly 3.4 x 10^38. That number is so large that a billion computers, each trying a billion keys per second, would need about 10^13 years to try them all, several hundred times the current age of the universe.
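The gap between a 56-bit and a 128-bit key space is easier to appreciate with numbers. The sketch below assumes a hypothetical attacker testing 10^18 keys per second (a billion machines at a billion keys per second each). That rate is an illustration, not a measurement of any real system.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
RATE = 10 ** 18  # hypothetical: keys tested per second across all machines


def years_to_exhaust(key_bits, keys_per_second):
    """Time to try every key of the given length, in years."""
    return 2 ** key_bits / keys_per_second / SECONDS_PER_YEAR


des_years = years_to_exhaust(56, RATE)
aes_years = years_to_exhaust(128, RATE)

print(2 ** 56)             # 72057594037927936 possible DES keys
print(f"{2 ** 128:.2e}")   # about 3.40e+38 possible AES-128 keys
print(f"DES: {des_years:.2e} years")
print(f"AES-128: {aes_years:.2e} years")
```

At that assumed rate the whole DES key space falls in a fraction of a second, while AES-128 takes on the order of 10^13 years. Each extra key bit doubles the work, so 72 extra bits multiply it by 2^72.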
AES is everywhere. It encrypts your hard drive (FileVault, BitLocker). It protects your web traffic (TLS). It secures your WiFi (WPA2, WPA3). It encrypts your iPhone backups. The US government uses it for classified information up to TOP SECRET (with 256-bit keys). When you hear “military-grade encryption” in a product advertisement, they almost certainly mean AES-256. The phrase is marketing, but the algorithm is real.
The key distribution problem
Symmetric encryption has an agonising weakness: how do you get the key to the other person?
If Alice wants to send Bob an encrypted message, they need to share a key first. But if they can securely share a key, they could presumably also securely share the message itself, and wouldn’t need encryption in the first place. This is the key distribution problem, and it was considered unsolvable for most of human history. Military forces used couriers, diplomatic pouches, one-time pad books physically delivered in advance. All of these require a secure channel that already exists before the encrypted channel can be established.
In 1976, Whitfield Diffie and Martin Hellman published “New Directions in Cryptography”, one of the most important papers in computer science, which proposed a way for two parties to agree on a shared secret over an insecure channel. Their method, now called Diffie-Hellman key exchange, works by exploiting a mathematical one-way function.
The intuition is easier than the mathematics. Imagine Alice and Bob each choose a private colour of paint. They agree on a public colour (say, yellow). Alice mixes her private colour with yellow and sends the result to Bob. Bob mixes his private colour with yellow and sends the result to Alice. Now Alice mixes Bob’s result with her private colour, and Bob mixes Alice’s result with his private colour. They both arrive at the same final colour, but an eavesdropper who saw only the public colour and the two mixed colours can’t easily work out the final colour, because unmixing paint is hard.
In the real protocol, “colours” are replaced by numbers, and “mixing” is replaced by modular exponentiation, raising a number to a power and taking the remainder after dividing by a large prime. The one-way-ness comes from the discrete logarithm problem: given g, p, and g^a mod p, it’s computationally infeasible to find a. You can mix the paint, but you can’t unmix it.
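Here is the exchange with deliberately tiny numbers, as a sketch only. The values p = 23, g = 5 and the two private exponents are toy choices for illustration. Real deployments use primes thousands of bits long (or elliptic curves), and private values are chosen at random.

```python
# Public parameters, visible to everyone (toy-sized for readability).
p, g = 23, 5

# Private values, never transmitted (illustrative choices).
a = 6    # Alice's secret
b = 15   # Bob's secret

# Each side "mixes" its secret with the public base and sends the result.
A = pow(g, a, p)   # Alice sends 8
B = pow(g, b, p)   # Bob sends 19

# Each side mixes the other's public value with its own secret.
alice_secret = pow(B, a, p)
bob_secret = pow(A, b, p)

print(A, B)                      # 8 19
print(alice_secret, bob_secret)  # 2 2
```

Both sides arrive at 2 because (g^b)^a and (g^a)^b are the same number mod p. An eavesdropper sees 23, 5, 8, and 19. With numbers this small they can recover a by trial, but with a 2048-bit prime that search is infeasible.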
Diffie-Hellman didn’t solve encryption itself; it solved key exchange. Once Alice and Bob have a shared secret, they use it as a symmetric key for AES or another cipher. The asymmetric operation is expensive; the symmetric operation is cheap. So you use asymmetric cryptography to bootstrap the symmetric key, then switch to symmetric for the actual data. This is exactly what happens when your browser establishes a TLS connection.
(An important historical note: the British Government Communications Headquarters (GCHQ) independently discovered public-key cryptography before Diffie and Hellman. James Ellis conceived the idea in 1970, Clifford Cocks developed what is essentially the RSA algorithm in 1973, and Malcolm Williamson developed key exchange in 1974. All of this was classified. Ellis, Cocks, and Williamson received no public credit until the work was declassified in 1997.)
Asymmetric encryption: two keys are better than one
Diffie-Hellman solved key exchange, but it doesn’t provide encryption directly. For that, you need public-key encryption, a system where each person has two keys: a public key (shared freely) and a private key (kept secret). Anything encrypted with the public key can only be decrypted with the private key, and vice versa.
RSA, published in 1977 by Ron Rivest, Adi Shamir, and Leonard Adleman at MIT, was the first practical public-key encryption system. Its security rests on a different one-way function: integer factorisation. It’s easy to multiply two large prime numbers together (a computer can do it in microseconds). It’s extraordinarily hard to factor the result back into its prime components. An RSA key is, at its heart, the product of two very large primes. The public key is derived from that product. Recovering the private key requires factoring it, which is computationally infeasible for sufficiently large numbers.
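A toy version of RSA shows the moving parts. This is a sketch with tiny primes chosen for readability (61 and 53 are illustrative values). Real RSA uses primes of a thousand bits or more and adds padding, which is omitted here. The modular inverse via pow(e, -1, phi) needs Python 3.8 or later.

```python
# Two small primes, kept secret (toy values for illustration only).
p, q = 61, 53

n = p * q                  # 3233, the public modulus
phi = (p - 1) * (q - 1)    # 3120, computable only if you know p and q

e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent: inverse of e modulo phi

message = 65
ciphertext = pow(message, e, n)    # encrypt with the public key (e, n)
recovered = pow(ciphertext, d, n)  # decrypt with the private key (d, n)

print(d)           # 2753
print(ciphertext)  # 2790
print(recovered)   # 65
```

Anyone who can factor 3233 into 61 x 53 can recompute phi and therefore d. That takes a moment here and is believed to be infeasible when n has hundreds of digits.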
How large? The current recommended minimum RSA key size is 2048 bits, a number with 617 digits. The largest RSA key publicly factored (RSA-250, 829 bits) required 2,700 CPU-core-years of computation, announced in 2020 by a team led by Fabrice Boudot. Extrapolating, factoring a 2048-bit key with current technology is estimated to take longer than the age of the universe.
But RSA has a practical problem: it’s slow. Encrypting a large file with RSA is roughly 1,000 times slower than encrypting it with AES. The solution is the hybrid approach mentioned earlier: use RSA (or Diffie-Hellman, or elliptic curve cryptography) to exchange a symmetric key, then use AES for the actual data. Speed where you need it, security where you need it.
Elliptic curves: smaller keys, same strength
RSA works, but its keys are large and its operations are computationally expensive. Elliptic curve cryptography (ECC), independently proposed by Neal Koblitz and Victor Miller in 1985, offers the same security with dramatically smaller keys.
An elliptic curve, in this context, is a mathematical curve defined by an equation of the form y^2 = x^3 + ax + b, where the points on the curve (over a finite field) form a mathematical group. You can “add” points on the curve using a geometric operation: draw a line through two points, find where it intersects the curve, and reflect. Repeated addition gives you scalar multiplication: starting from a known point G (the generator), multiply by a secret integer k to get a public point Q = kG.
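Point addition and scalar multiplication can be tried on a toy curve. The sketch below assumes the small textbook-style curve y^2 = x^3 + 2x + 2 over the integers mod 17 with generator G = (5, 1). These parameters are for illustration only and are far too small to be secure. Real curves, including Curve25519, use fields of roughly 256 bits and often a different curve form.

```python
P_MOD = 17   # toy field size (assumption for illustration)
A = 2        # curve: y^2 = x^3 + 2x + 2 (mod 17)
G = (5, 1)   # generator point on that curve


def add(p1, p2):
    """Add two curve points; None stands for the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # a point plus its mirror image cancels out
    if p1 == p2:
        # Tangent slope for doubling a point.
        slope = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        # Slope of the line through two distinct points.
        slope = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (slope * slope - x1 - x2) % P_MOD
    y3 = (slope * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult(k, point):
    """Compute k * point by double-and-add."""
    result = None
    addend = point
    while k:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result


print(scalar_mult(2, G))   # (6, 3)
print(scalar_mult(3, G))   # (10, 6)
print(scalar_mult(19, G))  # None: this toy group has 19 elements
```

Going from k to Q = kG takes a handful of doublings and additions. Going from Q back to k means, on this toy curve, trying at most 19 values; on a 256-bit curve no known method comes close to feasible.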
The one-way-ness: given G and Q, finding k is the elliptic curve discrete logarithm problem (ECDLP), which is believed to be harder than the ordinary discrete logarithm problem for the same key size. A 256-bit elliptic curve key provides roughly the same security as a 3072-bit RSA key. Smaller keys mean faster operations, less bandwidth, and less storage, critical for mobile devices and constrained environments.
One of the most widely used curves in practice is Curve25519, designed by Daniel J. Bernstein in 2005. It was engineered for safety: the parameters were chosen to be resistant to known classes of attacks and to make implementation errors less dangerous. It’s used in Signal, WhatsApp, SSH, TLS 1.3, and many other systems. Bernstein chose the curve’s parameters with deliberate transparency: every constant is derived from simple, verifiable sources, leaving little room for hidden backdoors. (This was a direct response to concerns about the NIST P-256 curve, whose parameters were generated by the NSA using a process that was never fully explained. After the Snowden revelations in 2013, those concerns intensified.)
Hashing: the fingerprint of data
A cryptographic hash function takes an input of any size and produces a fixed-size output (the hash or digest). The properties that make it useful:
- Deterministic: the same input always produces the same output.
- Fast: computing the hash is quick.
- One-way: given a hash, you can’t practically find the input that produced it.
- Collision-resistant: it’s practically impossible to find two different inputs that produce the same hash.
- Avalanche effect: changing even one bit of the input changes roughly half the bits of the output.
The most widely used hash function today is SHA-256 (part of the SHA-2 family, published by NIST in 2001). It produces a 256-bit hash, 64 hexadecimal characters. The SHA-256 hash of “hello” is 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824. Change “hello” to “Hello” and the hash becomes 185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969. Completely different. No pattern. No way to predict how the output changes from the input change.
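These digests can be reproduced with Python's standard hashlib module. The helper names sha256_hex and differing_bits are invented for this sketch. The second helper counts how many of the 256 output bits change between two digests.

```python
import hashlib


def sha256_hex(text):
    """Return the SHA-256 digest of a string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_bits(hex_a, hex_b):
    """Count the bit positions where two hex digests disagree."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


lower = sha256_hex("hello")
upper = sha256_hex("Hello")

print(lower)
print(upper)
print(differing_bits(lower, upper), "of 256 bits differ")
```

The input changed by a single bit (h versus H in ASCII), yet the count printed on the last line should land somewhere near half of 256, which is the avalanche effect in action.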
Hashes are used everywhere: password storage (as we saw in How Identity Works), file integrity verification (download a file, compare its hash to the published hash), digital signatures (sign the hash of a document, not the document itself), blockchain (each block contains the hash of the previous block, creating a tamper-evident chain).
MD5 (1991, by Ronald Rivest) and SHA-1 (1995, by the NSA) were once standard but are now broken for security purposes. In 2004, Xiaoyun Wang and colleagues demonstrated practical collision attacks against MD5: they could find two different inputs with the same MD5 hash in about an hour on an IBM p690 cluster, and later refinements brought this down to under a minute on an ordinary laptop. In 2017, a team from Google and CWI Amsterdam produced the first SHA-1 collision (the SHAttered attack), requiring 6,500 CPU-years and 110 GPU-years of computation. The result: two different PDF files with the same SHA-1 hash. SHA-1 was already being phased out, but SHAttered made the retirement urgent.
Digital signatures: proof that can’t be faked
Combine hashing with asymmetric encryption and you get digital signatures, the mathematical equivalent of a wax seal, but one that can’t be forged.
To sign a document in the classic RSA scheme, you hash it (producing a fixed-size digest), then encrypt the hash with your private key. The encrypted hash is the signature. Anyone can verify the signature by decrypting it with your public key and comparing the result to their own hash of the document. (Other signature schemes, such as ECDSA and Ed25519, use different mathematics but follow the same hash, sign, verify pattern.) If the hashes match, two things are proven:
- Authentication: the document was signed by the holder of the private key.
- Integrity: the document hasn’t been changed since it was signed (if it had, the hash wouldn’t match).
Digital signatures also provide non-repudiation: the signer can’t plausibly deny having signed the document, because only their private key could have produced the signature. (Unless they claim their key was stolen, which is the digital equivalent of “that’s not my handwriting.”)
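The hash-then-sign flow can be sketched with textbook RSA and Python's hashlib. Everything about the key here is an assumption for illustration: the two primes are small, well-known Mersenne primes, there is no padding, and the digest is reduced modulo N to fit. Real systems use a vetted library and a padded scheme, never a construction like this.

```python
import hashlib

# Toy key pair (illustrative and insecure: small, publicly known primes).
P = 2 ** 61 - 1
Q = 2 ** 89 - 1
N = P * Q
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (Python 3.8+)


def digest_as_int(document):
    """Hash the document and squeeze the digest into the range of N."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big") % N


def sign(document):
    """Apply the private key to the document's hash."""
    return pow(digest_as_int(document), D, N)


def verify(document, signature):
    """Undo the signature with the public key and compare hashes."""
    return pow(signature, E, N) == digest_as_int(document)


contract = b"Pay Bob 10 pounds"
signature = sign(contract)

print(verify(contract, signature))                # True
print(verify(b"Pay Bob 1000 pounds", signature))  # False
```

Changing even one character of the document changes its hash, so the old signature no longer matches. Producing a signature that does match requires D, which only the key holder has.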
This mechanism is what makes digital certificates work, what makes software updates verifiable, what makes cryptocurrency transactions unforgeable, and what makes tampering with your passport chip detectable. It’s the single most important primitive in digital trust.
What encryption doesn’t protect
There’s a persistent myth that encryption makes you safe. It doesn’t. It makes specific things harder for specific attackers. Understanding what it doesn’t do is as important as understanding what it does.
Encryption doesn’t hide metadata. If you send an encrypted message to someone, an observer can still see that you sent a message, when you sent it, how long it was, and who you sent it to. The contents are hidden; the fact of communication is not. This metadata is often more revealing than the content. The former NSA general counsel Stewart Baker reportedly said, “Metadata absolutely tells you everything about somebody’s life. If you have enough metadata, you don’t really need content.” The NSA’s bulk metadata collection programme, revealed by Edward Snowden in 2013, was based on exactly this principle.
Encryption doesn’t protect endpoints. If malware on your phone can read your screen, it doesn’t matter that your messages are encrypted in transit, the attacker reads them before encryption or after decryption. This is the endpoint problem, and it’s why nation-state adversaries invest heavily in device exploitation rather than trying to break encryption directly. Why attack the mathematics when you can attack the phone?
Encryption doesn’t authenticate by itself. Encrypting a message doesn’t tell you who sent it. A perfectly encrypted message from an attacker is still an attack. This is why encryption and authentication are usually combined in practice. TLS provides both, and using one without the other is dangerous.
Encryption doesn’t survive the future. A message encrypted with today’s best algorithms is safe today. But an adversary could record the ciphertext and store it for decades, waiting for a computer powerful enough to break the encryption. This is the harvest now, decrypt later threat, and it’s why the cryptographic community is already working on post-quantum cryptography, algorithms designed to resist attacks from quantum computers that don’t yet exist but probably will. NIST published its first finalised post-quantum standards in 2024. They include ML-KEM (derived from CRYSTALS-Kyber) for key establishment and ML-DSA (derived from CRYSTALS-Dilithium) for digital signatures, both based on lattice problems rather than factoring or discrete logarithms.
The beautiful constraint
What makes cryptography remarkable isn’t the complexity of the mathematics. It’s the simplicity of the constraint. Every secure system is built on a single idea: there exist mathematical operations that are easy to perform in one direction and hard to reverse. Multiplying primes is easy; factoring is hard. Raising to a power mod p is easy; taking the discrete logarithm is hard. Adding points on an elliptic curve is easy; finding the scalar multiplier is hard.
If any of these “hard” problems turns out to be easy, if someone finds a fast factoring algorithm, or if quantum computers scale enough to run Shor’s algorithm on large numbers, the security of everything built on that foundation collapses simultaneously. It happened to DES when brute force became practical. It happened to MD5 when collisions became cheap. It will happen to RSA and ECC when (not if) quantum computers mature.
The entire edifice of digital trust rests on the assumption that certain problems are hard. Not proven hard, assumed hard. No one has proved that integer factorisation is inherently difficult. No one has proved that the discrete logarithm problem has no efficient solution. These are open problems in mathematics. The security of your bank account, your medical records, your private messages, all of it rests on problems we believe are hard because, after decades of effort by the world’s best mathematicians, nobody has found an efficient solution.
That’s either deeply reassuring or deeply terrifying, depending on your temperament.
Next: How Certificates Work, how the internet uses all of this cryptography to build chains of trust, and why your browser trusts your bank.