How Databases Actually Work

July 22, 2026 · 29 min read

A database is the most important piece of software most developers never look inside. You write a query, data comes back, and the space between is a black box full of decades of computer science: B-trees for finding data fast, write-ahead logs for surviving crashes, MVCC for letting readers and writers coexist, and a query planner that makes hundreds of decisions you never see. Understanding what’s happening in that box changes how you write queries, how you design schemas, and how you diagnose problems when everything suddenly gets slow.

A brief history: from navigational to relational

Databases haven’t always worked the way they do now. Before SQL, before tables, before anyone had thought about relational algebra, databases were navigational: you found data by following pointers from one record to another, like traversing a linked list.

IBM’s IMS (Information Management System, 1966), built for the Apollo space programme, used a hierarchical model: data was organised in parent-child trees, and you navigated by walking the tree. CODASYL databases (late 1960s) used a network model: records could have multiple parents, forming a graph. Both were fast for their intended access patterns and painful for anything else.

In 1970, Edgar F. Codd, a researcher at IBM, published “A Relational Model of Data for Large Shared Data Banks”, which proposed organising data into relations (tables) and querying it with a formal language based on relational algebra. The idea was radical: separate the logical structure of data from its physical storage. Let the database figure out how to find the data efficiently. The programmer should describe what they want, not how to get it.

IBM built System R (1974-1979) as a research prototype, inventing SQL in the process. Larry Ellison, hearing about System R, founded Oracle and beat IBM to market with the first commercial SQL database in 1979. IBM eventually released DB2 in 1983. Michael Widenius created MySQL in 1995 as a lightweight alternative. And PostgreSQL traces its lineage to the POSTGRES project at UC Berkeley (1986-1994), led by Michael Stonebraker, which was designed to push the boundaries of what a relational database could do: extensible types, rules, and an emphasis on correctness.

The relational model won. Not because it was the fastest for any specific task, but because it was the most flexible. You could add new queries without restructuring the data. You could join tables in ways the original designers hadn’t anticipated. The separation of logical structure from physical storage meant the database could optimise independently of the application. Forty years later, this is still the dominant paradigm.

Why PostgreSQL is the default

If you’re starting a new project and need a relational database, the answer is almost certainly PostgreSQL. This wasn’t always the case. MySQL dominated the web era (the M in “LAMP stack”), and Oracle and SQL Server dominated the enterprise. But over the past decade, PostgreSQL has become the default choice for a confluence of reasons:

It’s genuinely open source. PostgreSQL is developed by a community under a permissive license (the PostgreSQL License, similar to BSD/MIT). There’s no company that owns it, no commercial version with extra features, no licence audit waiting to happen. MySQL, by contrast, is owned by Oracle, and the relationship between Oracle’s commercial interests and MySQL’s community development has been a source of friction since the 2010 acquisition of Sun Microsystems.

It’s standards-compliant. PostgreSQL follows the SQL standard more closely than any other major database. Features like window functions, common table expressions (CTEs), recursive queries, and lateral joins work exactly as the standard specifies. This matters when you need to write complex queries; PostgreSQL doesn’t force you into proprietary syntax.

It’s extensible. PostgreSQL supports custom data types, custom functions (in SQL, PL/pgSQL, Python, Perl, JavaScript, and others), custom index types, and extensions that add entire feature sets. PostGIS adds geographic data support. TimescaleDB adds time-series optimisation. pg_trgm adds trigram-based text similarity search. The extension ecosystem means PostgreSQL can adapt to use cases that would otherwise require a specialised database.

It’s rock-solid. PostgreSQL has a conservative release culture. Features are thoroughly reviewed before inclusion. Upgrades are well-documented. Data corruption bugs are rare and treated with extreme seriousness. When you put data in PostgreSQL, it stays there.

ACID compliance is the foundation: Atomicity (transactions are all-or-nothing), Consistency (data satisfies all constraints after each transaction), Isolation (concurrent transactions don’t interfere with each other), and Durability (committed data survives crashes). These properties aren’t just theoretical; they’re the reason you can trust a relational database with your business data.

How a query runs: parse, plan, execute

When you send a SQL query to PostgreSQL, it goes through a pipeline that would surprise most developers with its sophistication.

Parsing converts the SQL text into a parse tree, an internal representation of the query’s structure. The parser checks syntax (is this valid SQL?) and resolves identifiers (does this table exist? does this column exist in this table?). If you’ve made a typo, this is where you find out.

Planning (also called optimisation) is where the magic happens. The query planner takes the parse tree and produces an execution plan: a step-by-step recipe for retrieving the data. This is not a simple translation. For any non-trivial query, there are dozens or hundreds of possible execution strategies, and the planner’s job is to find a good one.

Consider a query that joins three tables with a WHERE clause:

SELECT o.id, c.name, p.title
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN products p ON o.product_id = p.id
WHERE c.country = 'AU'
  AND o.created_at > '2026-01-01';

The planner must decide:

Join order: should it join orders-customers first, then products? Or orders-products first, then customers? With three tables there are only a few orderings, but with ten tables, there are millions.
Join algorithm: for each join, should it use a nested loop (scan one table, look up each row in the other), a hash join (build a hash table from one side, probe with the other), or a merge join (sort both sides and merge)?
Index usage: should it use the index on customers.country to find Australian customers first? Or the index on orders.created_at to find recent orders first? Or do a sequential scan on one of the tables because the index wouldn’t help?
Filter placement: should it filter by country before or after the join?

The planner makes these decisions using statistics: information about the data stored in each table. PostgreSQL maintains statistics about the number of rows in each table, the distribution of values in each column (histograms), the number of distinct values, the correlation between physical row order and value order, and more. These statistics are gathered by the ANALYZE command (which runs automatically as part of autovacuum).

The planner estimates the cost of each possible plan (in terms of disk I/O and CPU time) and chooses the plan with the lowest estimated cost. You can see the plan it chose with EXPLAIN:

EXPLAIN ANALYZE
SELECT o.id, c.name, p.title
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN products p ON o.product_id = p.id
WHERE c.country = 'AU'
  AND o.created_at > '2026-01-01';

The ANALYZE keyword actually executes the query and shows real timing alongside the estimates. This is your primary diagnostic tool when a query is slow. The plan tells you exactly what the database is doing and where it’s spending time.

Execution follows the plan. The executor walks the plan tree, calling operators (scan, join, sort, aggregate) that read from indexes and tables, apply filters, combine results, and eventually return rows to the client.

The important insight is that SQL is declarative: you specify what data you want, not how to get it. The query planner decides the “how.” This is both SQL’s greatest strength (you don’t need to think about access patterns for simple queries) and its greatest pitfall (when the planner makes a bad decision, the query is slow, and understanding why requires understanding the planner).

A common cause of planner mistakes is stale statistics. If a table has grown from 1,000 rows to 10 million rows since the last ANALYZE, the planner still thinks it’s small and might choose a nested loop join where a hash join would be far better. Autovacuum runs ANALYZE automatically, but if you’ve just loaded a large batch of data, running ANALYZE manually can immediately improve query plans.

B-trees: the data structure behind your indexes

When you create an index on a column, PostgreSQL (by default) creates a B-tree. B-trees are the most important data structure in database engineering, and understanding them explains most index behaviour.

A B-tree is a balanced tree structure where:

Each node can hold multiple keys (not just one, like a binary tree)
All leaf nodes are at the same depth (the tree is balanced)
Internal nodes contain keys and pointers to child nodes
Leaf nodes contain keys and pointers to the actual table rows

The “B” doesn’t officially stand for anything (Rudolf Bayer and Edward McCreight, who invented them in 1972, never said), though “balanced” is the popular guess.

A B-tree with a branching factor of 100 (each node holds up to 100 keys) can index one million rows in just three levels: the root holds 100 keys, each child holds 100 keys (10,000 total), and each grandchild holds 100 keys (1,000,000 total). Finding any key requires reading at most three nodes: three disk reads. For a billion rows, you need five levels. Five disk reads to find any row among a billion. This logarithmic scaling is why B-trees work.

For range scans (WHERE age > 30 AND age < 50), B-trees are equally efficient. The leaf nodes are linked in order, so once you find the starting point (age = 30), you can walk the leaves sequentially until you reach the end point (age = 50). No tree traversal needed for subsequent rows.

B-trees are stored on disk as pages (8 KB by default in PostgreSQL). Each node is one or more pages. The branching factor is determined by how many keys fit in a page, which depends on the key size. Smaller keys mean more keys per page, which means fewer levels, which means fewer disk reads. This is one reason why indexing a small integer column is more efficient than indexing a large text column.

PostgreSQL also supports other index types for specific use cases: hash indexes for equality-only lookups, GiST (Generalised Search Tree) for geometric and full-text data, GIN (Generalised Inverted Index) for arrays and full-text search, and BRIN (Block Range Index) for very large tables where the indexed column is correlated with physical row order (like a timestamp on an append-only table).

But for the vast majority of indexes (the ones you create every day), B-trees are what you’re using, and their logarithmic lookup and efficient range scan behaviour are why your indexed queries are fast.

One thing that catches people: indexes aren’t free. Every index must be updated on every INSERT, UPDATE, and DELETE that touches the indexed column. A table with ten indexes requires ten index updates per row change. For write-heavy tables, excessive indexing can be a performance problem. The discipline is: add indexes that your queries need, remove indexes that nothing uses. PostgreSQL’s pg_stat_user_indexes view shows which indexes are being scanned and which are sitting idle, costing write performance and providing nothing in return.

The write-ahead log (WAL): surviving crashes

Every database faces the same existential question: what happens if the power goes out in the middle of a write?

If the database writes data directly to the table files, a crash mid-write leaves the files in an inconsistent state. Half a row might be written. An index might point to a row that no longer exists. A transaction that was supposed to be atomic might be half-applied.

PostgreSQL (and virtually every other serious database) solves this with a write-ahead log (WAL). The principle is simple: before you change the data, write down what you’re about to change in a separate, append-only log file. Then make the actual change. If the system crashes:

If the crash happened before the WAL entry was written, the change never happened; the data is still in its old, consistent state
If the crash happened after the WAL entry was written but before the actual data was changed, PostgreSQL replays the WAL on startup and applies the change
If the crash happened after both the WAL and the data were written, everything is fine

The WAL turns an arbitrary series of random writes (updating rows scattered across the disk) into a sequential append to a log file. Sequential writes are dramatically faster than random writes on both spinning disks and SSDs. The actual data pages can be written to disk later, in the background, whenever it’s convenient. This is called checkpointing: periodically flushing dirty pages from memory to disk and recording how far the WAL has been applied.

The WAL also enables replication. A standby server can receive WAL records from the primary and replay them, maintaining an identical copy of the database with a delay of only seconds or less. This is how PostgreSQL achieves high availability: if the primary fails, the standby can take over with minimal data loss. Streaming replication sends WAL records as they’re generated, and synchronous replication ensures the standby has received (or applied) each WAL record before the primary acknowledges the transaction to the client, guaranteeing zero data loss at the cost of write latency.

MVCC: readers don’t block writers

One of the hardest problems in database engineering is concurrency control: allowing multiple transactions to access the same data simultaneously without producing inconsistent results.

The simplest approach is locking: when a transaction reads a row, it takes a shared lock (other readers can proceed, but writers must wait). When a transaction writes a row, it takes an exclusive lock (everyone else waits). This works, but it means readers block writers and writers block readers. Under heavy load, everyone is waiting for everyone else.

PostgreSQL uses Multi-Version Concurrency Control (MVCC), an approach where readers never block writers and writers never block readers. The idea: instead of modifying a row in place, create a new version of the row and let each transaction see the version that was current when the transaction started.

When you update a row in PostgreSQL, the database doesn’t overwrite the old data. It marks the old row version as expired and inserts a new row version with the updated data. Both versions exist simultaneously in the table. A transaction that started before the update sees the old version. A transaction that starts after the update sees the new version. Neither transaction blocks the other.

Each row version has two hidden system columns: xmin (the transaction ID that created this version) and xmax (the transaction ID that expired this version, or 0 if it’s still current). When a transaction reads a row, it checks these values against its own snapshot: a record of which transactions were committed at the time the reading transaction started. If the row version was created by a committed transaction and hasn’t been expired by a committed transaction, it’s visible. Otherwise, it isn’t.

This is elegant but has a cost: dead tuples. Every update creates a new row version and leaves an old one behind. Every delete marks a row as expired but doesn’t remove it. Over time, the table accumulates dead row versions that no transaction can see but that still occupy space on disk and slow down sequential scans.

PostgreSQL’s VACUUM process reclaims dead tuples. Autovacuum, enabled by default, runs VACUUM automatically based on configurable thresholds (typically when 20% of a table’s rows have been updated or deleted). Understanding autovacuum is essential for PostgreSQL administration: if it falls behind, table bloat degrades performance. If it’s too aggressive, it competes with application queries for I/O.

The buffer pool: the real speed of your database

Databases don’t read from disk for every query. That would be impossibly slow. Instead, they maintain a large region of memory called the buffer pool (or shared buffers in PostgreSQL) that caches frequently accessed data pages.

When PostgreSQL needs to read a page (an 8 KB block of table or index data), it first checks the buffer pool. If the page is there (a cache hit), the read completes from memory in microseconds. If the page isn’t there (a cache miss), PostgreSQL reads it from disk (milliseconds for SSDs, tens of milliseconds for spinning disks) and stores it in the buffer pool for future access.

The buffer pool is managed by a clock-sweep algorithm (a variant of LRU) that evicts the least recently used pages when space is needed. In a well-tuned system, the vast majority of reads are cache hits. PostgreSQL’s pg_statio_user_tables view shows the number of heap blocks read from disk versus read from the buffer pool, and you can compute your cache hit ratio:

SELECT
  schemaname, relname,
  heap_blks_hit * 100.0 / NULLIF(heap_blks_hit + heap_blks_read, 0) AS cache_hit_pct
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC;

A healthy system typically has a cache hit ratio above 99%. If it’s significantly lower, either your buffer pool is too small for your working set, or your query patterns are scanning large amounts of data that don’t fit in memory.

The default shared_buffers setting in PostgreSQL is typically 128 MB, far too low for production. The common recommendation is 25% of available system memory, up to about 8-16 GB. Beyond that, diminishing returns set in because the operating system’s filesystem cache also caches the same data, and the two caches can work together.

Understanding the buffer pool explains most database performance characteristics. A query that touches only cached pages is fast. A query that causes many cache misses is slow. An index that reduces the number of pages touched makes the query faster not because “indexes are fast” but because they direct the query to the specific pages that contain the relevant data, and those pages are likely to be in the buffer pool because they’re accessed frequently.

When PostgreSQL isn’t the answer

PostgreSQL is excellent. It’s also not the correct tool for every job. The database landscape is full of specialised systems optimised for specific access patterns, and knowing when to reach for them is as important as knowing PostgreSQL well.

Document stores (MongoDB) store data as flexible JSON-like documents rather than fixed-schema tables. They’re a good fit when your data is genuinely document-shaped: different records have different fields, schemas evolve rapidly, and you typically retrieve entire documents by key. They’re a poor fit when you need complex queries across documents (joins), strict data integrity (foreign keys), or strong consistency. MongoDB has added transactions and schema validation over the years, making it more capable but also more like a relational database, which raises the question of why you’d choose it over PostgreSQL with its JSONB support.

Key-value stores (Redis, DynamoDB) provide sub-millisecond reads and writes by key. Redis keeps data in memory, making it ideal for caches, session stores, rate limiters, and any access pattern that’s “get this thing by its ID, very fast.” DynamoDB is AWS’s managed key-value/document store, designed for massive scale with single-digit millisecond performance. The tradeoff is flexibility: if your access pattern doesn’t fit the key structure, you’re in trouble. There are no ad-hoc joins, no flexible queries, no “give me all records where the amount is greater than 100.”

Column stores (BigQuery, Redshift, ClickHouse) are optimised for analytical queries over large datasets. Instead of storing data row by row (all columns of row 1, then all columns of row 2), they store data column by column (all values of column 1, then all values of column 2). This means a query like “sum the revenue column for the last year” only reads the revenue column and the date column, not the customer name, address, description, and fifty other columns. For analytical workloads that scan millions or billions of rows but only touch a few columns, column stores are orders of magnitude faster than row stores.

Time-series databases (TimescaleDB, InfluxDB) are optimised for data where time is the primary dimension: metrics, logs, sensor readings, stock prices. They handle high-volume ingestion of timestamped data, efficient time-range queries, automatic downsampling (averaging hourly data into daily data as it ages), and time-based retention policies. TimescaleDB is built as a PostgreSQL extension, which means you get time-series optimisation with full SQL and PostgreSQL compatibility.

Graph databases (Neo4j) store and query relationships as first-class citizens. In a relational database, finding all friends-of-friends requires multiple self-joins that get expensive as the depth increases. In a graph database, traversing relationships is the fundamental operation, and depth doesn’t significantly affect performance. If your data is a graph (social networks, recommendation engines, fraud detection, knowledge graphs) a graph database can express queries that would be impractical in SQL.

Search engines (Elasticsearch) are optimised for full-text search: finding documents that match a search query, ranking them by relevance, handling typos, synonyms, stemming, and faceted navigation. PostgreSQL has full-text search (and it’s quite capable for moderate use cases), but Elasticsearch is purpose-built for search at scale with sophisticated relevance tuning, near-real-time indexing, and distributed architecture.

The pattern: PostgreSQL is the correct default because it handles most access patterns well. You reach for a specialised database when your primary access pattern is something PostgreSQL handles adequately but not well enough, and the performance or scalability difference justifies the operational complexity of running another data store.

A common mistake is reaching for a specialised database too early. MongoDB for a startup that could use PostgreSQL with JSONB. Redis as a primary data store instead of a cache. Elasticsearch for a search feature that PostgreSQL’s full-text search could handle. Each additional database is another system to operate, monitor, back up, and keep available. The correct number of databases for most applications is one. Add a second when the first genuinely can’t serve a critical access pattern.

The CAP theorem: the distributed tradeoff

In 2000, Eric Brewer proposed (and in 2002 Seth Gilbert and Nancy Lynch proved) the CAP theorem: a distributed data store can provide at most two of three guarantees:

Consistency: every read returns the most recent write (all nodes see the same data at the same time)
Availability: every request receives a response (the system doesn’t refuse requests)
Partition tolerance: the system continues to operate when network communication between nodes is lost

The popular framing is “pick two.” The practical reality is: network partitions happen (cables get cut, switches fail, cloud availability zones lose connectivity), so you’re really choosing between consistency and availability during a partition.

A CP system (consistent + partition-tolerant) refuses requests during a partition rather than return potentially stale data. PostgreSQL with synchronous replication is CP: if the standby is unreachable, the primary can be configured to stop accepting writes rather than risk the standby falling behind.

An AP system (available + partition-tolerant) continues serving requests during a partition, accepting that different nodes may have different data. DynamoDB (in its default eventually-consistent mode) is AP: reads might return slightly stale data, but the system never refuses a request.

In practice, the CAP theorem is less of a binary choice and more of a spectrum. Most systems let you tune the tradeoff per operation. DynamoDB offers strongly consistent reads at higher cost. PostgreSQL can be configured for asynchronous replication that sacrifices some durability for availability.

The important thing for practitioners is to understand what your database promises and what it doesn’t. “Eventual consistency” means reads might be stale. “Strong consistency” means writes might fail during partitions. There’s no free lunch, only informed tradeoffs.

ORMs: the leaky abstraction

Object-Relational Mappers (ORMs: ActiveRecord, SQLAlchemy, Django ORM, Hibernate) translate between your application’s objects and the database’s tables. They’re convenient for CRUD operations: create a record, read a record, update a record, delete a record. You write user.save() instead of INSERT INTO users (name, email) VALUES ('Craig', 'craig@example.com').

The problem, famously described by Joel Spolsky as the Law of Leaky Abstractions, is that ORMs hide the SQL without eliminating the need to understand it.

The N+1 query problem is the classic ORM trap. You load a list of 100 orders, then access each order’s customer. The ORM loads the orders with one query, then issues a separate query for each customer: 101 queries instead of 1 query with a JOIN. The ORM’s lazy loading behaviour, designed to avoid loading data you don’t need, has turned one efficient operation into 101 inefficient ones.

ORMs also generate SQL that the query planner may not optimise well. A hand-written query can use database-specific features (window functions, CTEs, lateral joins, partial indexes) that the ORM doesn’t express. Complex reporting queries, aggregations across multiple tables, and analytical workloads are almost always better written as raw SQL.

The pragmatic approach: use the ORM for CRUD. It saves time and reduces boilerplate. But learn SQL. Know how to write a JOIN, a subquery, a window function. Know how to read an EXPLAIN plan. When the ORM-generated query is slow, drop down to raw SQL and fix it. The ORM is a productivity tool, not a replacement for understanding your database.

Migrations: changing the shape of production data

Schema changes in production are one of the most nerve-wracking operations in software engineering. Your application is serving traffic. Users are reading and writing data. And you need to add a column, change a type, create an index, or restructure a table, without downtime, without data loss, and without breaking the application.

PostgreSQL’s approach to schema changes has important performance implications:

Adding a nullable column with no default is fast; PostgreSQL just updates the system catalogue. No table rewrite needed. The new column is NULL for all existing rows, and PostgreSQL handles this without touching the data.

Adding a column with a default value used to require a full table rewrite (touching every row to set the default). Since PostgreSQL 11, this is fast for non-volatile defaults: the default is stored in the catalogue and applied lazily.

Creating an index locks the table against writes (by default). For a large table, this can take minutes or hours, during which no inserts, updates, or deletes can proceed. The solution is CREATE INDEX CONCURRENTLY, which builds the index without holding a write lock, at the cost of taking longer and requiring two passes over the table.

Changing a column type usually requires a full table rewrite. On a table with millions of rows, this takes time and holds locks. The workaround is to add a new column with the desired type, backfill it in batches, swap the columns, and drop the old one: a multi-step process that avoids long-running locks.

The discipline of migrations is: make changes that are backwards-compatible with the currently running application code, deploy the migration, then deploy the code that uses the new schema. This means:

Add new columns/tables, don’t remove or rename existing ones
Deploy the migration
Deploy application code that uses the new schema
(Later) Deploy a migration to remove the old columns/tables
Deploy application code that no longer references the old schema

This two-phase approach, sometimes called “expand and contract”, ensures that at every step, the running application code is compatible with the current database schema. It’s more work. It’s also the only way to achieve zero-downtime deployments with schema changes.

Transactions and isolation levels

ACID’s “I”, Isolation, is more nuanced than it first appears. The SQL standard defines four isolation levels, and understanding them matters because the default is usually not what you think it is.

Read Uncommitted: a transaction can see changes made by other transactions that haven’t committed yet. These are called “dirty reads.” Almost no one uses this intentionally because it means you can read data that might be rolled back a moment later.

Read Committed: a transaction only sees changes from committed transactions. This is PostgreSQL’s default. Each statement within a transaction sees a fresh snapshot: if another transaction commits between your first and second SELECT, your second SELECT sees the new data. This can lead to non-repeatable reads: the same query returns different results within the same transaction.

Repeatable Read: the transaction sees a snapshot taken at the start of the transaction’s first statement. No matter what other transactions commit while yours is running, your reads are consistent. PostgreSQL implements this with MVCC snapshots. The cost: if your transaction tries to update a row that another transaction has already modified and committed, PostgreSQL aborts your transaction with a serialisation error, and you need to retry.

Serialisable is the strongest level. Transactions behave as if they were executed one at a time, in some serial order. PostgreSQL implements this using Serialisable Snapshot Isolation (SSI), which detects potential anomalies and aborts transactions that would violate serialisability. This is the safest but most restrictive level; you’ll see more aborted transactions that need retrying.

Most applications run at Read Committed and handle the edge cases in application logic. But if you’ve ever seen a race condition in your application, two users booking the last seat, two processes decrementing the same inventory count below zero, the root cause is often that your isolation level doesn’t prevent the anomaly you’re experiencing. Before adding application-level locks, check whether a higher isolation level solves the problem more cleanly.

Connection pooling: the hidden bottleneck

Every connection to PostgreSQL creates a new operating system process (not a thread; PostgreSQL uses a process-per-connection model). Each process consumes memory: typically 5-10 MB of resident memory, plus whatever memory is needed for query execution. PostgreSQL’s default maximum connections is 100.

Most web applications use connection pools in their application framework (Rails’ connection pool, Django’s CONN_MAX_AGE, HikariCP for Java). But when you have multiple application servers, each with their own connection pool, the total number of database connections can quickly exceed PostgreSQL’s limits.

PgBouncer is a lightweight connection pooler that sits between your application and PostgreSQL. It maintains a pool of connections to PostgreSQL and multiplexes client connections onto them. In transaction pooling mode (the most common), a PostgreSQL connection is assigned to a client only for the duration of a transaction, then returned to the pool. This means 1,000 application connections can share 50 PostgreSQL connections, as long as fewer than 50 are in a transaction simultaneously.

The numbers matter: if each PostgreSQL connection uses 10 MB of memory, 100 connections use 1 GB. With PgBouncer, you can support thousands of application connections with a fraction of the PostgreSQL connections, keeping memory usage manageable and avoiding the performance cliff that hits when PostgreSQL runs out of connections and starts refusing them.

Connection pooling is one of those things that doesn’t matter at all until it matters enormously. If your application occasionally sees “too many connections” errors under load, or if your database server’s memory usage grows linearly with application server count, connection pooling is the fix.

Backups: the discipline you hope to never need

All of the engineering above, the WAL, MVCC, the buffer pool, the planner, is irrelevant if you lose your data. Backups are not optional.

PostgreSQL offers two backup strategies:

Logical backups (pg_dump) export the database as SQL statements or a custom archive format. They’re portable (you can restore to a different PostgreSQL version), flexible (you can dump specific tables), and self-contained. But they require reading every row in every table, which takes time on large databases and puts load on the server. For a 500 GB database, pg_dump might take hours.

Physical backups (using pg_basebackup or tools like pgBackRest and Barman) copy the raw data files and WAL segments. They’re much faster for large databases because they copy files at the filesystem level rather than reading individual rows. Combined with continuous WAL archiving, physical backups enable point-in-time recovery (PITR): you can restore to any moment in time, not just to when the backup was taken. This is invaluable when someone runs DELETE FROM users without a WHERE clause at 3:47 PM and you need to restore to 3:46 PM.

The rule of thumb: use logical backups for small databases and for migration between PostgreSQL versions. Use physical backups with WAL archiving for anything in production. Test your restores regularly. A backup you’ve never tested is a backup that might not work.

What this means for you

Databases are not magic. They’re software, built on data structures and algorithms that have been refined over forty years:

Read EXPLAIN output. When a query is slow, the execution plan tells you why. A sequential scan on a large table means there’s no useful index (or the planner chose not to use one). A nested loop join on a large table means the planner estimated the inner table was small (and was wrong). The plan is the diagnostic.

Understand the buffer pool. Most database performance is memory performance. If your working set fits in the buffer pool, queries are fast. If it doesn’t, queries hit disk. Sizing shared_buffers and understanding your cache hit ratio is more impactful than any query tuning.

Respect the WAL. Write-heavy workloads are ultimately limited by WAL write speed. Batching writes, using COPY instead of INSERT for bulk loads, and sizing WAL-related parameters appropriately (wal_buffers, checkpoint_timeout, max_wal_size) can dramatically affect write throughput.

Use PostgreSQL until you have a specific reason not to. It handles relational data, JSON documents, full-text search, geographic data, and time-series data. It’s the Swiss Army knife of databases. Reach for specialised databases when your scale or access pattern demands it, but know what you’re gaining and what you’re giving up.

Monitor what matters. Track cache hit ratio, transaction rate, replication lag, connection count, and autovacuum activity. PostgreSQL’s built-in statistics views (pg_stat_activity, pg_stat_user_tables, pg_stat_bgwriter) tell you most of what you need to know. Tools like pganalyze, Datadog, or even a simple Grafana dashboard with the pg_stat_statements extension (which tracks query performance statistics) give you visibility into what your database is actually doing.

Between your SQL and your data, there’s a query planner making hundreds of decisions, a buffer pool serving cached pages, a WAL guaranteeing durability, and an MVCC system allowing concurrent access. You don’t need to understand all of it all the time. But when your application is slow, when your database is struggling, when you need to make an architectural decision about where to store data: understanding what’s actually happening on the other side of that SQL query is the difference between debugging and guessing.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.