How Randomness Works

October 27, 2026 · 18 min read

Part of Under the Hood — deep dives into the technology we use every day.

You call Math.random() and get a number. It looks random. It feels random. But it’s not random at all – it’s the output of a deterministic algorithm, computed from a seed, producing a sequence that’s completely predictable if you know where it started. And for most purposes, that’s fine. Until it isn’t.

What randomness actually means

Randomness is about unpredictability. A sequence is random if you can’t predict the next value from the previous ones. That sounds simple, but it immediately raises a philosophical question: unpredictable to whom?

A coin flip is unpredictable to you because you don’t know the exact force, angle, air resistance, and surface properties involved. But it’s governed by Newtonian mechanics – in principle, with perfect information, you could predict the outcome. Is it random? For all practical purposes, yes. In the strict mathematical sense, no.

Information theory gives us a precise definition. Claude Shannon, in his 1948 paper that founded the field of information theory, defined entropy as a measure of uncertainty. A system has high entropy when there are many equally likely outcomes. A fair coin has 1 bit of entropy per flip: two equally likely outcomes. A fair six-sided die has about 2.58 bits of entropy per roll: six equally likely outcomes. A biased coin that lands heads 90% of the time has low entropy: the outcome is mostly predictable.

In computing, entropy measures the amount of genuine unpredictability in a system. A computer, being a deterministic machine, doesn’t naturally produce entropy. Left to itself, it does exactly what its program tells it to do, the same way every time. To get entropy, you need to look outside the computation – at the physical world.

Pseudorandom number generators (PRNGs)

Most of the “random” numbers your programs use aren’t random at all. They’re pseudorandom – produced by deterministic algorithms that generate sequences that look random to statistical tests but are completely predictable given the starting state.

A PRNG takes a seed – an initial value – and applies a mathematical function repeatedly to produce a sequence of numbers. Given the same seed, you get the same sequence. Every time. Without exception.

The simplest PRNG is the linear congruential generator (LCG), defined by the recurrence:

X(n+1) = (a * X(n) + c) mod m

where a, c, and m are carefully chosen constants. For example, the Numerical Recipes LCG uses a = 1664525, c = 1013904223, m = 2^32. Given a seed X(0) = 42, the next value is (1664525 * 42 + 1013904223) mod 2^32 = 1083814273. The value after that is computed from 1083814273. And so on.

LCGs are fast and simple, but they have serious statistical flaws. The low-order bits cycle with short periods. Sequential values fall on a small number of hyperplanes in multi-dimensional space (a problem discovered by George Marsaglia in 1968). They’re fine for video game shuffles. They’re terrible for anything requiring statistical quality or security.
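Both the recurrence and the low-order-bit weakness are easy to see in code. Here's a minimal sketch in Python, using the Numerical Recipes constants from above:

```python
# Numerical Recipes LCG constants from the text
a, c, m = 1664525, 1013904223, 2**32

def lcg(seed):
    """Generate the sequence X(n+1) = (a * X(n) + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

gen = lcg(42)
print(next(gen))  # 1083814273 -- matches the hand computation above

# The flaw: with m a power of two and a, c both odd, the lowest bit
# of successive values simply alternates (period 2).
print([x & 1 for _, x in zip(range(8), lcg(42))])  # [1, 0, 1, 0, 1, 0, 1, 0]
```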

The Mersenne Twister

The Mersenne Twister, developed by Makoto Matsumoto and Takuji Nishimura in 1998, was the default PRNG in most programming languages for over two decades. When you called random() in Python, rand() in Ruby, or Math.random() in older JavaScript engines, you were probably getting Mersenne Twister output.

Its name comes from the Mersenne prime 2^19937 - 1, which is the period of the generator – the sequence repeats after 2^19937 - 1 values. That’s a number with about 6,000 digits. For all practical purposes, the sequence never repeats.

The Mersenne Twister passes most standard statistical tests for randomness. Its output distribution is extremely uniform. It’s fast. It’s well-studied. It was a massive improvement over the LCGs it replaced.

But it is not cryptographically secure.

The Mersenne Twister maintains an internal state of 624 32-bit integers (about 2.5 KB). If an attacker can observe 624 consecutive outputs, they can reconstruct the entire internal state and predict every future output. This was demonstrated in practice by James Roper in 2010, among others. The attack isn’t theoretical – it’s straightforward.
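The reconstruction fits in a few lines of Python, since the standard random module is a Mersenne Twister. The untemper step inverts the output tempering to recover each state word; the names victim and clone are illustrative:

```python
import random

def invert_right(y, shift):
    """Invert the tempering step y ^= y >> shift (32-bit)."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x

def invert_left(y, shift, mask):
    """Invert the tempering step y ^= (y << shift) & mask (32-bit)."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ ((x << shift) & mask)
    return x & 0xFFFFFFFF

def untemper(y):
    """Undo the Mersenne Twister output tempering, recovering a state word."""
    y = invert_right(y, 18)
    y = invert_left(y, 15, 0xEFC60000)
    y = invert_left(y, 7, 0x9D2C5680)
    y = invert_right(y, 11)
    return y

victim = random.Random(12345)  # the generator under attack
observed = [victim.getrandbits(32) for _ in range(624)]

# Rebuild the internal state from the 624 observed outputs.
state = [untemper(v) for v in observed]
clone = random.Random()
clone.setstate((3, tuple(state + [624]), None))  # index 624: block fully consumed

# The clone now predicts the victim's future output exactly.
assert [clone.getrandbits(32) for _ in range(10)] == \
       [victim.getrandbits(32) for _ in range(10)]
```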

For shuffling a playlist, simulating dice rolls, or generating random terrain in a video game, this doesn’t matter. For generating cryptographic keys, session tokens, or anything security-sensitive, it’s catastrophic.

True random number generators

A true random number generator (TRNG) derives its output from a physical process that is genuinely unpredictable – not just hard to predict, but fundamentally non-deterministic according to our best understanding of physics.

Common entropy sources include:

  • Thermal noise (Johnson-Nyquist noise): the random voltage fluctuations in a resistor caused by the thermal motion of electrons. This is fundamentally quantum-mechanical in origin.
  • Shot noise: random fluctuations in electrical current caused by the discrete nature of electron flow.
  • Radioactive decay: the timing of nuclear decay events, which is genuinely random according to quantum mechanics.
  • Atmospheric noise: random radio-frequency signals from natural atmospheric processes. Random.org uses this source.

Intel’s modern CPUs include a hardware random number generator (the RDRAND/RDSEED instructions) that uses thermal noise in a digital circuit. When you execute RDRAND, the CPU samples a physical noise source, passes it through a conditioning algorithm, and returns genuinely random bits. This happens in hardware, on-chip, at speeds measured in hundreds of megabits per second.

/dev/random and /dev/urandom

On Unix-like systems (Linux, macOS, BSD), the operating system provides two special files for accessing randomness: /dev/random and /dev/urandom. Understanding the difference between them – and when to use which – is one of those things that separates people who understand security from people who think they do.

Both are interfaces to the kernel’s entropy pool – a store of randomness that the kernel collects from various sources:

  • Timing jitter in disk I/O operations
  • Timing jitter in keyboard and mouse events
  • Timing jitter in interrupt handling
  • Network packet timing
  • Hardware RNG output (RDRAND on Intel, etc.)
  • CPU execution time variability

The kernel continuously gathers entropy from these sources, mixes it into an internal pool, and expands the pool through a CSPRNG (ChaCha20 on modern Linux kernels; older versions used SHA-1) to produce a stream of random bytes.

On Linux, historically, /dev/random would block – stop and wait – if the kernel estimated that the entropy pool was depleted. The idea was that if you’d used up all the “real” randomness, you should wait until more physical events occurred to replenish it. /dev/urandom would never block; it would continue producing output from the CSPRNG even if the entropy estimate was low.

This distinction caused enormous confusion and, ironically, security problems. Developers who wanted “the most secure option” would use /dev/random, and their applications would block unpredictably, especially on headless servers with no keyboard or mouse activity. Some would fall back to less secure alternatives when /dev/random blocked. Others would just hang, which is its own kind of security failure (denial of service).

The expert consensus, articulated clearly by Thomas Huehn and Daniel J. Bernstein, has long been that /dev/urandom is the right choice for virtually all purposes, including cryptographic key generation. The entropy estimation used by /dev/random is conservative to the point of being counterproductive, and the CSPRNG seeded with even modest amounts of initial entropy produces output that is computationally indistinguishable from true randomness.

As of Linux 5.6 (March 2020), the kernel developers agreed: /dev/random now behaves identically to /dev/urandom once the CSPRNG has been initially seeded. The only difference is that /dev/random will block during early boot, before sufficient entropy has been gathered for initial seeding. After that point, they’re the same.

On macOS and most BSDs, /dev/random and /dev/urandom have been functionally identical for years. The macOS kernel originally used the Yarrow CSPRNG (since replaced by Fortuna), seeded from hardware entropy sources.
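In practice you rarely open the device file yourself. In Python, for example, os.urandom() reaches the same kernel CSPRNG through the portable interface. A quick sketch (the direct device read is Unix-only):

```python
import os

# Portable route to the kernel CSPRNG: getrandom(2) on modern Linux,
# /dev/urandom on other Unix-like systems.
key = os.urandom(32)

# Reading the device file directly is equivalent on a modern kernel.
with open("/dev/urandom", "rb") as f:
    more = f.read(32)

assert len(key) == len(more) == 32
```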

CSPRNGs: cryptographically secure pseudorandom number generators

A cryptographically secure PRNG (CSPRNG) is a PRNG that satisfies an additional requirement: even if an attacker observes arbitrarily many output values, they cannot predict the next output value or reconstruct the internal state.

More formally, a CSPRNG must satisfy two properties:

  1. Next-bit unpredictability: given the first k bits of a random sequence, there is no polynomial-time algorithm that can predict the (k+1)th bit with probability significantly better than 0.5
  2. State compromise recovery: if the internal state is compromised (leaked to an attacker), the CSPRNG must be able to recover security once re-seeded with fresh entropy

The Mersenne Twister fails property 1 (you can predict future outputs from 624 observed values). LCGs fail it spectacularly (you can recover the state from a handful of outputs).

Modern CSPRNGs include:

  • ChaCha20 – used by the Linux kernel (since 4.8), OpenBSD, and many applications. A stream cipher by Daniel Bernstein; fast in software and resistant to timing attacks.
  • AES-CTR-DRBG – used by NIST-compliant systems and many HSMs. Runs AES in counter mode; standardised in NIST SP 800-90A.
  • Fortuna – used by macOS and FreeBSD. Designed by Bruce Schneier and Niels Ferguson; uses multiple entropy pools for resilience.
  • HMAC-DRBG – used by NIST-compliant systems. Uses HMAC for extraction and generation; standardised in NIST SP 800-90A.

The general pattern is: take a well-studied cryptographic primitive (a stream cipher, a block cipher, or an HMAC), seed it with real entropy, and use it to generate a stream of random bytes. The security of the CSPRNG reduces to the security of the underlying primitive – if ChaCha20 is secure (and the cryptographic community believes it is), then a ChaCha20-based CSPRNG is secure.
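The pattern can be illustrated with a deliberately toy generator: condition a seed into a key, then hash the key with a counter to produce each output block. SHA-256 is used here purely for illustration; this is not a vetted DRBG, and real systems should use the constructions listed above via the OS or a cryptographic library:

```python
import hashlib
import os

class ToyCounterDRBG:
    """Toy illustration of the seed-a-primitive, run-in-counter-mode pattern.
    NOT a real DRBG -- for illustration only."""

    def __init__(self, seed: bytes):
        self.key = hashlib.sha256(seed).digest()  # condition the seed
        self.counter = 0

    def random_bytes(self, n: int) -> bytes:
        out = b""
        while len(out) < n:
            # Each block is hash(key || counter); the counter never repeats
            block = hashlib.sha256(
                self.key + self.counter.to_bytes(8, "big")
            ).digest()
            out += block
            self.counter += 1
        return out[:n]

drbg = ToyCounterDRBG(os.urandom(32))  # seed with real entropy
stream = drbg.random_bytes(64)
```

Note that the output is fully determined by the seed, which is exactly why seeding with real entropy (here, os.urandom) is the load-bearing step.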

Seeding: where randomness begins

A CSPRNG is only as good as its seed. If the seed is predictable, the entire output stream is predictable. This is the most common source of real-world randomness failures.

The Debian OpenSSL disaster (2008) is the canonical example. In 2006, a Debian maintainer commented out two lines of code in the OpenSSL random number generator because the Valgrind memory analysis tool flagged them as using uninitialised memory. Those two lines were responsible for adding entropy to the seed. Without them, the RNG’s seed was determined entirely by the process ID – a number between 1 and 32,768. This meant there were only 32,768 possible keys for every SSL certificate, SSH key, and other cryptographic material generated on Debian and Ubuntu systems for two years. The vulnerability (CVE-2008-0166) affected millions of keys.
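The scale of that failure is easy to model. In this hypothetical sketch (weak_keygen is illustrative, not Debian's actual code path), the only entropy is a process ID, and an attacker recovers the "key" by simply trying every possible seed:

```python
import random

def weak_keygen(pid: int) -> int:
    # Toy model of the bug: the PRNG seed is just the process ID
    return random.Random(pid).getrandbits(128)

victim_key = weak_keygen(31337)  # the victim's "key"

# 32,768 candidates is nothing -- exhaustive search finishes in well
# under a second on commodity hardware.
recovered_pid = next(
    pid for pid in range(1, 32769) if weak_keygen(pid) == victim_key
)
assert weak_keygen(recovered_pid) == victim_key
```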

The lesson: entropy gathering is load-bearing infrastructure. It looks like noise. It looks like “who cares about a few lines of random state.” And then everything built on top of it collapses.

Other notable seeding failures:

  • The PlayStation 3 ECDSA hack (2010): Sony’s implementation of ECDSA used the same random value k for every signature, instead of generating a fresh random k each time. This made it possible to extract the private key from two signatures, allowing arbitrary code to be signed and run on the PS3. The “random” wasn’t random. It was a constant.

  • The Android SecureRandom vulnerability (2013): A bug in Android’s Java Secure Random implementation meant that the PRNG was seeded with insufficient entropy on some devices. This allowed attackers to steal Bitcoin from Android wallets by predicting the k values used in ECDSA transaction signatures.

Why randomness matters for TLS

When you visit a website over HTTPS, the TLS handshake depends critically on randomness at multiple points:

  1. Client and server randoms: the client and server each send a 32-byte random value during the handshake. These values are mixed into the key derivation, ensuring that even identical conversations between the same parties produce different session keys.

  2. Ephemeral key generation: in modern TLS (1.3), the client and server each generate an ephemeral Diffie-Hellman key pair for each connection. The private key must be generated from a CSPRNG. If the private key is predictable, an eavesdropper can compute the shared secret and decrypt the traffic.

  3. Session ticket encryption: TLS session tickets (which enable session resumption without a full handshake) are encrypted with keys that must be randomly generated.

  4. Certificate private keys: the RSA or ECDSA private key associated with a server’s TLS certificate was generated by a CSPRNG at certificate creation time. If the PRNG was weak (as in the Debian case), the private key is recoverable.

Every link in this chain depends on the quality of the random numbers involved. Break the randomness at any point and the entire security model falls apart. The encryption is still mathematically sound – the random number generator just isn’t feeding it enough unpredictability.
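A simplified sketch of point 1: each side's random value is mixed into key derivation, so repeated connections produce distinct session keys even from the same shared secret. Real TLS 1.3 uses HKDF over a transcript hash; HMAC-SHA-256 here is a stand-in for illustration:

```python
import hashlib
import hmac
import os

# Each side contributes 32 fresh random bytes per handshake.
client_random = os.urandom(32)
server_random = os.urandom(32)

# Stand-in for the (EC)DH shared secret both sides compute.
shared_secret = os.urandom(32)

def derive_session_key(secret: bytes, c_rand: bytes, s_rand: bytes) -> bytes:
    """Toy extractor: real TLS 1.3 uses HKDF (RFC 5869), not bare HMAC."""
    return hmac.new(secret, c_rand + s_rand, hashlib.sha256).digest()

key = derive_session_key(shared_secret, client_random, server_random)

# Same secret, fresh randoms -> a different session key every connection.
key2 = derive_session_key(shared_secret, os.urandom(32), os.urandom(32))
assert key != key2
```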

Why randomness matters for TOTP and MFA

Time-based One-Time Passwords, as described in the MFA post, work by combining a shared secret with the current time to produce a 6-digit code. The security of the system rests on the shared secret being unpredictable.

When you scan a QR code to set up a TOTP app (like Google Authenticator), the QR code contains a base32-encoded shared secret – typically 160 bits of random data. That secret was generated by the server’s CSPRNG. If the CSPRNG is weak, an attacker who knows the weakness can predict the secrets assigned to other users and generate valid TOTP codes for their accounts.

The TOTP specification (RFC 6238) recommends a minimum shared secret length of 128 bits, generated by a “secure random generator.” The word “secure” is doing a lot of heavy lifting in that sentence. It means: not the Mersenne Twister. Not Math.random(). Not time() as a seed. A proper CSPRNG, properly seeded, producing output that is computationally indistinguishable from true randomness.
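Generating such a secret correctly is a one-liner against the OS CSPRNG. In Python, for instance:

```python
import base64
import secrets

# 20 bytes = 160 bits from the OS CSPRNG (RFC 6238 recommends at least 128)
secret = secrets.token_bytes(20)

# Provisioning QR codes carry the secret base32-encoded
encoded = base64.b32encode(secret).decode("ascii")
print(len(encoded))  # 32 characters -- 160 bits encodes with no padding
```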

UUIDs: not all randomness is equal

UUIDs (Universally Unique Identifiers) come in several versions, and only one of them – v4 – is built purely on randomness.

  • UUID v1 – timestamp + MAC address. No randomness (deterministic from its inputs).
  • UUID v3 – MD5 hash of namespace + name. No randomness (deterministic from its inputs).
  • UUID v4 – random. 122 bits of random data.
  • UUID v5 – SHA-1 hash of namespace + name. No randomness (deterministic from its inputs).
  • UUID v7 – timestamp + random. 62 bits of random data plus a sortable timestamp.

UUID v4 – the most commonly used version – is 122 bits of random data with 6 bits reserved for version and variant indicators. If generated from a CSPRNG, the probability of collision is vanishingly small: you’d need to generate about 2.71 x 10^18 UUIDs (2.71 quintillion) before the probability of a single collision reaches 50%. That’s roughly 86 billion UUIDs per second, sustained for a full year.
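Both claims are easy to check: CPython's uuid4() is built from os.urandom, and the 50% collision point follows from the birthday bound:

```python
import math
import uuid

u = uuid.uuid4()       # CPython constructs this from os.urandom(16)
assert u.version == 4  # 6 of the 128 bits encode version and variant

# Birthday bound: ~50% collision probability after sqrt(2 ln 2 * 2^122) IDs
n_half = math.sqrt(2 * math.log(2) * 2**122)
print(f"{n_half:.2e}")  # ≈ 2.71e+18
```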

But if the PRNG is weak, UUID v4 loses its uniqueness guarantees. A Mersenne Twister with a predictable seed can produce the same sequence of UUIDs on different machines. And UUID v1, which uses the machine’s MAC address and the current time, leaks information about when and where it was generated – potentially a privacy concern, and definitely not suitable as a security token.

The newer UUID v7 (RFC 9562, 2024) combines a Unix timestamp with random data, producing UUIDs that are both unique and sortable by creation time. This is useful for database primary keys, where random v4 UUIDs cause performance problems because they scatter inserts across the index. The v7 format preserves temporal locality while maintaining uniqueness through the random component.

Entropy starvation and virtual machines

One of the practical challenges of randomness in modern infrastructure is entropy starvation – running out of entropy sources in environments where the traditional sources don’t exist.

A physical server has keyboard interrupts, mouse movements, disk I/O timing, and network packet jitter. A virtual machine running in AWS, Azure, or Google Cloud might have none of these. There’s no keyboard. There’s no mouse. The disk is a virtualised abstraction. Network timing is filtered through the hypervisor.

This matters in practice. A fresh virtual machine booting for the first time needs random numbers almost immediately – to generate SSH host keys, to seed the web server’s TLS implementation, to initialise the application’s session token generator. If the entropy pool is empty at boot time, these early random numbers will be drawn from a poorly-seeded CSPRNG.

Cloud providers have addressed this with virtio-rng – a virtualised hardware random number generator that passes entropy from the host machine (which has access to hardware RNG) to the guest VM. AWS provides this through its Nitro hypervisor. Linux VMs in the cloud also benefit from the haveged daemon, which generates entropy from CPU timing jitter.

But the problem is not fully solved. Containers, which share the host kernel’s entropy pool, can drain it faster than it’s replenished if many containers are generating keys simultaneously. Serverless functions, which start cold and need randomness immediately, face the same early-boot problem. And IoT devices – resource-constrained, often lacking hardware RNG, sometimes running on cheap microcontrollers – are the worst case. Many IoT security failures trace back to predictable random numbers generated on devices that simply don’t have access to enough entropy.

Randomness in everyday life

Randomness isn’t just a concern for cryptographers. It appears in places you might not expect:

Database sharding: when you distribute data across multiple database servers, you often hash a key to determine which shard holds a given record. The quality of the hash function’s distribution affects whether shards are evenly loaded.

Load balancing: random or weighted-random algorithms distribute requests across backend servers. A biased random number generator would overload some servers and underutilise others.

A/B testing: assigning users to experiment groups requires randomisation that’s both uniform (each group gets the same proportion of users) and deterministic per user (the same user should see the same variant on repeat visits). This is typically done by hashing the user ID with the experiment ID.
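That hash-based assignment can be sketched as follows (the function name and bucketing scheme are illustrative, not any particular platform's implementation):

```python
import hashlib

def assign_variant(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministic per (user, experiment); approximately uniform across users."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return variants[int(bucket * len(variants))]

# Same user, same experiment -> same variant on every visit
assert assign_variant("user-42", "new-checkout") == \
       assign_variant("user-42", "new-checkout")
```

Hashing in the experiment ID means the same user can land in different groups for different experiments, which keeps experiments independent of each other.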

Shuffle algorithms: the Fisher-Yates shuffle, which produces an unbiased permutation, requires a uniform random number generator. A biased RNG produces a biased shuffle. In 1999, an online poker site was found to be vulnerable because its shuffle used a PRNG seeded with the system clock in milliseconds, limiting the possible shuffles to about 86.4 million out of the 52! (approximately 8 x 10^67) possible orderings of a deck. Players who knew the algorithm could predict the deck.
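For reference, an unbiased Fisher-Yates shuffle is only a few lines. The crucial detail is drawing j uniformly from 0..i, done here with the secrets module so the shuffle is backed by the OS CSPRNG:

```python
import secrets

def fisher_yates(items):
    """Return an unbiased random permutation of items."""
    deck = list(items)
    for i in range(len(deck) - 1, 0, -1):
        j = secrets.randbelow(i + 1)  # uniform over 0..i -- the crucial step
        deck[i], deck[j] = deck[j], deck[i]
    return deck

shuffled = fisher_yates(range(52))
```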

The philosophical question

Is anything truly random? Quantum mechanics says yes – the outcome of a quantum measurement is fundamentally non-deterministic. Einstein famously disagreed (“God does not play dice”), but Bell’s theorem (1964) and subsequent experiments (Aspect et al., 1982; the 2022 Nobel Prize in Physics was awarded for this work) have conclusively shown that local hidden variable theories cannot explain quantum correlations. Quantum randomness is, as far as we can tell, irreducible.

This has practical implications. Quantum random number generators (QRNGs) – devices that generate random bits from quantum processes like photon detection or vacuum fluctuations – are commercially available. They provide randomness that is provably unpredictable, not just computationally unpredictable. For most applications, a well-seeded CSPRNG is indistinguishable from a QRNG. But for applications where you need to prove the randomness was genuine (certain cryptographic protocols, quantum key distribution), QRNGs are the only option.

For everyone else – for your web application, your database, your session tokens, your TLS keys – the practical guidance is straightforward:

  1. Use your language’s or operating system’s CSPRNG (/dev/urandom, crypto.getRandomValues(), secrets.token_bytes() in Python, SecureRandom in Java/Ruby)
  2. Never use Math.random(), rand(), or similar non-cryptographic PRNGs for anything security-sensitive
  3. Don’t try to build your own entropy gathering – the operating system does it better than you will
  4. Be aware of entropy starvation in virtualised and containerised environments
  5. If you’re generating cryptographic keys, use the key generation functions provided by your cryptographic library, which handle the randomness correctly

Randomness is one of those foundations that only becomes visible when it breaks. When it works – when your CSPRNG is well-seeded, your TLS keys are strong, your TOTP secrets are unpredictable – everything built on top of it holds firm. When it fails, everything falls. The Debian OpenSSL bug affected two years of keys across millions of machines. The PS3 hack broke the entire platform’s code signing model. The Android vulnerability drained Bitcoin wallets.

The numbers look random. Whether they actually are determines whether the lock on your door is real or decorative.