You type a URL and a page appears. Between that keystroke and those pixels, your browser performs DNS resolution, TCP handshake, TLS negotiation, HTTP request, HTML parsing, CSS computation, layout calculation, paint, and compositing, and it has about 100 milliseconds to feel fast.
The journey of a URL
Let’s trace what happens when you type https://example.com and press Enter. Every step between that keystroke and the page appearing is a potential bottleneck, and understanding the chain explains most of the performance advice you’ve ever heard.
Actually, the journey starts before you even finish typing. Modern browsers run an omnibox that predicts URLs as you type, pre-resolves DNS for likely candidates, and in some cases even pre-renders the predicted page in a hidden tab before you hit Enter. If the prediction is correct, the page appears instantly: not because it loaded fast, but because it loaded before you asked for it. This is Chrome’s “prerender” feature, and it’s genuinely eerie the first time you notice it.
Step 1: DNS resolution. Your browser needs to convert example.com into an IP address (a numeric identifier like 93.184.216.34). This is the internet’s phone book lookup: you have a name, you need a number.
The browser checks its own cache first. Then the operating system’s cache. Then it asks your configured DNS resolver (usually your ISP’s, or a public one like Cloudflare’s 1.1.1.1 or Google’s 8.8.8.8). That resolver may need to walk the DNS hierarchy (root servers, then the .com top-level domain servers, then the authoritative server for example.com) before returning the answer.
This can take anywhere from under a millisecond (cached locally) to 100+ milliseconds (cold lookup requiring multiple round trips across the planet). The answer gets cached at every level, so subsequent requests to the same domain skip most of these steps. But that first lookup, for a domain your browser has never seen before, on a resolver that hasn’t seen it recently? That’s real, measurable latency before a single byte of your page has been fetched.
Step 2: TCP handshake. With the IP address in hand, your browser opens a TCP connection. This is a three-step dance: your machine sends a SYN (synchronise) packet, the server responds with SYN-ACK (synchronise-acknowledge), and your machine sends a final ACK (acknowledge). Three packets, one and a half round trips in total, although the client only has to wait for one of them: it can send its first data right behind the final ACK.
If the round-trip time to the server is 100 milliseconds, the client waits 100 milliseconds before it can send anything useful. If the round trip is 200 milliseconds or more (common on intercontinental paths, such as Perth to Europe or to the east coast of the United States), the wait is at least 200 milliseconds. Just for permission to start talking. No data has been exchanged yet.
This is, incidentally, why CDNs exist: they put servers physically closer to users, reducing round-trip time and therefore handshake cost.
Step 3: TLS negotiation. Since the URL starts with https, the connection needs encryption. TLS (Transport Layer Security) adds another handshake on top of TCP. In TLS 1.2, that’s two more round trips; TLS 1.3 reduced it to one. During this handshake, client and server agree on encryption algorithms, exchange cryptographic keys, and verify the server’s certificate (which proves the server is who it claims to be, not an impersonator).
More round trips, more latency. By the time the first byte of actual content can flow, you’ve already spent 300 to 500 milliseconds on handshakes alone, depending on distance and protocol versions. That’s half a second of the user staring at a blank screen, and you haven’t even started fetching the page.
HTTP/3 and QUIC, the latest protocol improvements, address this by combining the transport and encryption handshakes into a single round trip, and by supporting 0-RTT resumption for repeat connections (if you’ve connected to this server before, you can start sending data immediately, with zero round trips of handshake). These aren’t just incremental improvements; they fundamentally change the cost of establishing a connection, especially on high-latency mobile networks.
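To make the round-trip arithmetic concrete, here is a minimal sketch that multiplies round trips by round-trip time for each protocol combination mentioned above. It counts only the round trips the client must wait for before it can send the request, so TCP is counted as one (the final ACK can travel together with the first data). The type and function names are made up for illustration, and real connections add DNS time, packet loss, and server processing on top.

```typescript
// Round trips the client waits for before the HTTP request can be sent.
// Assumption: TCP counts as one round trip from the client's side.
type Stack = "tcp+tls1.2" | "tcp+tls1.3" | "quic" | "quic-0rtt";

const ROUND_TRIPS: Record<Stack, number> = {
  "tcp+tls1.2": 1 + 2, // TCP handshake, then two TLS 1.2 round trips
  "tcp+tls1.3": 1 + 1, // TCP handshake, then one TLS 1.3 round trip
  "quic": 1,           // transport and encryption handshakes combined
  "quic-0rtt": 0,      // resumed connection: data goes out immediately
};

// Connection setup cost in milliseconds for a given round-trip time.
function setupCostMs(stack: Stack, rttMs: number): number {
  return ROUND_TRIPS[stack] * rttMs;
}

for (const stack of Object.keys(ROUND_TRIPS) as Stack[]) {
  console.log(`${stack}: ${setupCostMs(stack, 100)} ms at a 100 ms round trip`);
}
```

The interesting part is how the gap widens with distance: at a 200 millisecond round trip, the difference between the first and last rows is more than half a second.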
Step 4: HTTP request. Now your browser sends the actual request: a GET for the path / over HTTP/2 (or HTTP/3, if supported), along with headers describing what it can accept, what cookies it holds, what language you prefer, and other metadata. HTTP/1.1 wrote this as a literal line of text, GET / HTTP/1.1; HTTP/2 and HTTP/3 carry the same information as binary-encoded header fields. The server processes the request (fetching from a database, rendering a template, pulling from a CDN cache, running server-side code) and starts streaming back the response.
HTTP/2 brought a major improvement here: multiplexing. In HTTP/1.1, each TCP connection could in practice carry only one request at a time. If you needed ten resources (an HTML file, three stylesheets, six images), the browser had to open multiple connections or queue requests sequentially. HTTP/2 lets the browser send all ten requests over a single connection, interleaved, and the server can send the responses back in any order, interleaved on the same connection. This eliminates head-of-line blocking at the HTTP level (where a slow response holds up everything behind it) and dramatically reduces the overhead of fetching many resources. It does not eliminate it at the TCP level: all streams share one connection, so a single lost packet stalls every stream until it is retransmitted. HTTP/3 removes that remaining bottleneck by running over QUIC, which delivers each stream independently.
Step 5: The response arrives. The first byte of the response reaching your browser is called Time to First Byte (TTFB), and it’s your first measurement of how fast this whole process is going. A good TTFB for a page served from a CDN cache is under 100 milliseconds. A page that needs to query a database and render a template might take 200 to 500 milliseconds. A page that hits a cold cache on an overloaded server can take seconds. From here, the browser’s real work begins.
Parsing HTML: building the DOM
The browser doesn’t wait for the entire HTML document to download before starting work. It streams the HTML, parsing it as bytes arrive, building a tree structure called the Document Object Model (DOM) incrementally. This is important: it means the browser can start building the page while the server is still sending the rest of the response. A well-designed server takes advantage of this by putting the most important content early in the HTML.
The DOM is a tree. The <html> element is the root. Inside it, <head> and <body> are children. Inside <body>, your <div>, <p>, <h1>, and other elements nest into a hierarchy. Every element, every attribute, every piece of text becomes a node in this tree. The browser constructs it top-down, left-to-right, as it reads the HTML.
The parser is remarkably tolerant of bad HTML. Miss a closing </p> tag? The parser infers one. Nest a <div> inside a <p> (which the HTML spec says you shouldn’t)? The parser quietly closes the <p> first so the tree stays valid. Put a <body> inside a <head>? The parser fixes it. This error tolerance dates from the early days of the web: browsers were forgiving, so that anyone could write HTML without needing a compiler. The HTML specification (since HTML5) defines exactly how each type of error should be handled, so all browsers produce the same DOM from the same malformed HTML. It’s a specification for how to interpret mistakes, and it runs to hundreds of pages.
But there’s a catch, and it’s a big one.
When the parser encounters a <script> tag, it stops. Full stop. The parser halts, the browser downloads the script (if it’s external), compiles it, and executes it, because the script might call document.write() or modify the DOM in some way that changes how the rest of the HTML should be parsed. The entire parsing pipeline blocks on a single script. If that script is hosted on a slow third-party server, or if it’s a megabyte of minified JavaScript, your page sits there doing nothing while it loads.
This is why the oldest performance advice in web development is “put your scripts at the bottom of the body.” If the <script> tag is at the end of the HTML, the parser has already built the rest of the DOM before it hits the blocking point. The page can start rendering while the script downloads.
The reason scripts block parsing, by the way, is historical. In the early days of the web, document.write() was commonly used to inject HTML during page load. A script calling document.write('<h1>Hello</h1>') would insert that HTML at the script’s position in the document, and the parser needed to account for it. Today, document.write() is widely considered harmful (Chrome even blocks it in some situations), but the blocking behaviour remains the default because changing it would break backward compatibility with millions of existing pages.
Modern HTML gives you better options. The defer attribute tells the browser to download the script in parallel with parsing but wait to execute it until the DOM is fully built. Multiple defer scripts execute in the order they appear in the HTML, which means they’re safe for scripts that depend on each other. The async attribute tells the browser to download in parallel and execute as soon as it’s ready, which means it might execute before or after the DOM is complete, and the execution order of multiple async scripts is unpredictable. It’s only safe for scripts that don’t touch the DOM during initial load and don’t depend on other scripts.
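The ordering rules are easier to see in a small simulation. The sketch below is a simplified model, not browser code: it assumes every download starts at time zero, ignores how long each script takes to run, and uses made-up file names and timings. Deferred scripts wait for parsing to finish and then run in document order; async scripts run the moment their download completes.

```typescript
type LoadMode = "defer" | "async";

interface ScriptTag {
  file: string;       // hypothetical file name
  mode: LoadMode;
  downloadMs: number; // hypothetical time until the file has arrived
}

// Returns file names in the order this simplified model would execute them.
function executionOrder(tags: ScriptTag[], parseDoneMs: number): string[] {
  const runs: { file: string; at: number; seq: number }[] = [];
  let previousDeferAt = parseDoneMs; // deferred scripts never run before parsing ends

  tags.forEach((tag, seq) => {
    if (tag.mode === "async") {
      // async: runs as soon as it has downloaded, regardless of the parser
      runs.push({ file: tag.file, at: tag.downloadMs, seq });
    } else {
      // defer: waits for parsing, its own download, and every earlier defer script
      const at = Math.max(previousDeferAt, tag.downloadMs);
      previousDeferAt = at;
      runs.push({ file: tag.file, at, seq });
    }
  });

  // Sort by execution time; document order breaks ties.
  return runs.sort((x, y) => x.at - y.at || x.seq - y.seq).map((r) => r.file);
}

const order = executionOrder(
  [
    { file: "framework.js", mode: "defer", downloadMs: 300 },
    { file: "app.js", mode: "defer", downloadMs: 50 },
    { file: "chat.js", mode: "async", downloadMs: 80 },
    { file: "analytics.js", mode: "async", downloadMs: 20 },
  ],
  200,
);
console.log(order);
```

Note that app.js arrives long before framework.js yet still runs after it, because defer preserves document order. The two async scripts run in arrival order, which here is the reverse of their order in the HTML.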
Understanding the difference between defer and async is understanding the difference between cooperative and chaotic parallelism.
There’s also a newer approach: module scripts (<script type="module">). These are deferred by default, support import/export syntax, and have their own scope (they don’t pollute the global namespace). They’re the modern way to load JavaScript, and they bring a standardised module system to the browser, something Node.js developers have long had in the form of CommonJS.
The browser also has a speculative parser (sometimes called the preload scanner) that runs ahead of the main parser. When the main parser blocks on a script, the speculative parser continues scanning the HTML, looking for resources it can start downloading in advance: stylesheets, scripts, images. By the time the blocking script finishes and the main parser catches up, some of those resources may already be downloaded. This is a genuinely clever optimisation, and it means that blocking scripts aren’t quite as catastrophic as they sound, but they’re still bad.
CSS: the other blocking resource
While the HTML parser builds the DOM, it also encounters <link> tags pointing to CSS stylesheets. CSS is render-blocking: the browser will not paint a single pixel until all CSS in the <head> has been downloaded and parsed into a structure called the CSS Object Model (CSSOM).
Why? Because the browser can’t know what anything looks like until it has all the style rules. If a stylesheet loaded later changes the background colour of the entire page from white to black, painting the page white first and then repainting it black would look terrible: a flash of wrong content. So the browser waits. All CSS must be resolved before anything gets drawn.
Note that CSS is render-blocking but not parser-blocking. The HTML parser can continue building the DOM while CSS downloads; it just won’t paint anything until the CSS is ready. This is a crucial distinction: a blocking script stops the parser dead (nothing below it gets parsed), but blocking CSS only stops the renderer (the parser keeps building the DOM, accumulating work that can be painted the instant the CSS arrives). This is why it’s sometimes better to have your CSS load slowly than to have a script load slowly: at least with CSS, the DOM is still being prepared in the background. One caveat: a script that follows a stylesheet cannot execute until that stylesheet has loaded, because the script might ask for computed styles, so slow CSS can still hold up the parser indirectly.
The CSSOM is another tree, but rather than mirroring the DOM it represents the stylesheets themselves: their rules, selectors, and declarations. To arrive at the computed styles for each element, the browser has to match those rules against the DOM and resolve the cascade: the set of rules that determines which styles apply when multiple rules target the same element. The word “cascade” is right there in the name: Cascading Style Sheets. It refers to the waterfall of priority rules that determine which style wins when multiple rules compete.
Specificity (inline styles beat IDs beat classes beat element selectors), source order (later rules override earlier ones for equal specificity), and inheritance (children inherit certain properties like font-family and color from their parents) all feed into the cascade. Then there are !important declarations (which override normal specificity), user-agent stylesheets (the browser’s built-in defaults), and the relatively new cascade layers (@layer), which add yet another dimension to the priority system.
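As a rough illustration of how those priorities combine, the sketch below picks a winner among competing declarations for one property on one element. It is deliberately incomplete: it models only !important, specificity, and source order, and leaves out origins (such as user-agent stylesheets), cascade layers, and inheritance. The interface and function names are invented for the example.

```typescript
interface Specificity {
  inline: number;   // 1 if the declaration comes from a style attribute
  ids: number;      // number of ID selectors
  classes: number;  // number of class selectors
  elements: number; // number of element selectors
}

interface Declaration {
  value: string;
  important: boolean;
  specificity: Specificity;
  order: number; // position in the source; larger means later
}

const RANK = ["inline", "ids", "classes", "elements"] as const;

// Positive when a beats b, negative when b beats a.
function compareDeclarations(a: Declaration, b: Declaration): number {
  if (a.important !== b.important) return a.important ? 1 : -1;
  for (const key of RANK) {
    const diff = a.specificity[key] - b.specificity[key];
    if (diff !== 0) return diff; // the first differing component decides
  }
  return a.order - b.order; // equal specificity: the later rule wins
}

// Assumes at least one declaration is passed in.
function winningDeclaration(declarations: Declaration[]): Declaration {
  return declarations.reduce((best, d) => (compareDeclarations(d, best) > 0 ? d : best));
}

const competing: Declaration[] = [
  // p { color: black }
  { value: "black", important: false, order: 0, specificity: { inline: 0, ids: 0, classes: 0, elements: 1 } },
  // #intro { color: green }
  { value: "green", important: false, order: 1, specificity: { inline: 0, ids: 1, classes: 0, elements: 0 } },
  // .note { color: blue }
  { value: "blue", important: false, order: 2, specificity: { inline: 0, ids: 0, classes: 1, elements: 0 } },
];
console.log(winningDeclaration(competing).value); // green: the ID selector wins
```

The comparison is lexicographic, which is why a single ID outranks any number of classes: components are never added together.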
The computation is not trivial, especially on large pages with complex stylesheets. A modern web application might have thousands of CSS rules, and in principle the browser must evaluate every one against every element in the DOM to determine which rules match. Browsers use clever data structures (hash maps keyed on tag names, class names, and IDs) to avoid most of that work, but for pages with many elements and many rules, style recalculation can still take tens of milliseconds: a significant chunk of the rendering budget.
This is why critical CSS matters. The idea: identify the CSS rules needed to render the content visible in the initial viewport (the “above-the-fold” content), inline those rules directly in the <head> as a <style> block, and load the rest of the CSS asynchronously. The browser can paint the visible content immediately, without waiting for the full stylesheet to download. The rest loads in the background and gets applied once it arrives.
The render tree and layout
Once the browser has both the DOM and the CSSOM, it combines them into a render tree: a tree of only the visible elements, with their computed styles attached.
This combination is where several interesting decisions happen. Elements with display: none don’t appear in the render tree at all; they exist in the DOM but they’re invisible and take up no space, so the browser simply skips them. But elements with visibility: hidden do appear in the render tree: they’re invisible but still take up space, which means the browser must include them in layout calculations. It’s a subtle difference that trips up many developers.
Pseudo-elements like ::before and ::after appear in the render tree even though they’re not in the DOM: they’re generated by CSS, so they only exist after the DOM and CSSOM are combined. The <head> element and its children (which contain metadata, not visual content) are excluded from the render tree. The render tree is the browser’s definitive plan for what needs to be drawn and how.
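The filtering rules above fit in a few lines. This is a toy model under loud assumptions: the node shapes are invented, styles are assumed to be already computed and attached to each node, pseudo-elements are left out, and the inheritance of visibility is ignored. Real engines are far more involved.

```typescript
interface DomNode {
  tag: string;
  style?: { display?: string; visibility?: string };
  children?: DomNode[];
}

interface RenderNode {
  tag: string;
  painted: boolean; // false for visibility: hidden (still occupies space)
  children: RenderNode[];
}

// Returns null for nodes that do not take part in rendering at all.
function buildRenderTree(node: DomNode): RenderNode | null {
  if (node.tag === "head" || node.style?.display === "none") {
    return null; // skipped together with everything inside it
  }
  const children: RenderNode[] = [];
  for (const child of node.children ?? []) {
    const rendered = buildRenderTree(child);
    if (rendered !== null) children.push(rendered);
  }
  return { tag: node.tag, painted: node.style?.visibility !== "hidden", children };
}

const page: DomNode = {
  tag: "html",
  children: [
    { tag: "head", children: [{ tag: "title" }] },
    {
      tag: "body",
      children: [
        { tag: "div", style: { display: "none" }, children: [{ tag: "p" }] },
        { tag: "span", style: { visibility: "hidden" } },
        { tag: "h1" },
      ],
    },
  ],
};
console.log(JSON.stringify(buildRenderTree(page), null, 2));
```

In the output, the hidden <div> and its <p> are gone entirely, while the <span> is still present but flagged as not painted: it keeps its place in layout.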
Next comes layout (also called reflow in some browsers). This is where the browser calculates the exact position and size of every element on the page. Width, height, margin, padding, border, position: all of it gets computed into concrete pixel values.
Layout is a constraint-solving problem, and it’s more complex than most people realise.
A <div> set to width: 50% needs to know the width of its parent. A float: left element affects the position of everything after it. A flex container distributes space among its children according to rules that can involve minimum widths, maximum widths, growth factors, and wrapping. CSS Grid introduces a two-dimensional layout model where rows and columns interact with each other, and where auto sizing depends on content that might itself depend on the available space. The browser walks the render tree, resolving these constraints from the outside in (the viewport width determines the body width, which determines the content width, and so on down the tree).
Text layout is its own deep problem. The browser must measure each glyph, calculate line breaks (considering word boundaries, hyphenation, and the CSS word-break and overflow-wrap properties), handle bidirectional text (mixing left-to-right English with right-to-left Arabic in the same paragraph), apply kerning and ligatures, and respect line-height, letter-spacing, and text-align. All of this feeds back into the box model: the height of a text block depends on how many lines it wraps to, which depends on the available width, which depends on the layout of its container.
Layout is expensive. Changing the width of a single element can trigger layout recalculation for every element in its subtree, and potentially further, if the change affects the height of a parent that affects the layout of siblings. This cascading effect is why changing one CSS property can sometimes take far longer than you’d expect; the browser isn’t just recalculating one box, it’s recalculating every box that depends on it.
This is why performance guides warn against “layout thrashing”: the pattern of reading a layout property (like element.offsetWidth, which forces the browser to calculate layout to give you a current value), writing a change (which invalidates that layout), reading again (which forces another full calculation), and so on in a tight loop. Each read-write cycle forces a full layout pass. A loop that reads and writes layout properties a hundred times triggers a hundred layout passes in a single frame. The browser can’t batch them because each read needs the result of the previous write.
The fix is to batch your reads and writes: do all the reading first (the browser calculates layout once), then do all the writing (the browser invalidates layout but doesn’t recalculate until the next read or the next frame). Libraries like fastdom exist specifically to help with this pattern.
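You can count the difference without a browser. The sketch below uses a toy stand-in for the layout engine (the FakeLayout class is invented for this example and is not a DOM API): a read triggers a layout pass only if something was written since the last pass, which is the behaviour described above.

```typescript
// A toy stand-in for the browser's layout engine (not a real DOM API).
class FakeLayout {
  layoutPasses = 0;
  private dirty = true;
  private widths: number[];

  constructor(widths: number[]) {
    this.widths = [...widths];
  }

  // Reading forces a layout pass if anything changed since the last one.
  readWidth(index: number): number {
    if (this.dirty) {
      this.layoutPasses += 1;
      this.dirty = false;
    }
    return this.widths[index] ?? 0;
  }

  // Writing only invalidates layout; nothing is recalculated yet.
  writeWidth(index: number, width: number): void {
    this.widths[index] = width;
    this.dirty = true;
  }
}

// Interleaved: read, write, read, write... every read finds layout dirty.
function thrash(layout: FakeLayout, count: number): void {
  for (let i = 0; i < count; i++) {
    layout.writeWidth(i, layout.readWidth(i) * 2);
  }
}

// Batched: all reads first (one pass), then all writes.
function batched(layout: FakeLayout, count: number): void {
  const widths: number[] = [];
  for (let i = 0; i < count; i++) widths.push(layout.readWidth(i));
  for (let i = 0; i < count; i++) layout.writeWidth(i, (widths[i] ?? 0) * 2);
}

const startingWidths = Array.from({ length: 100 }, () => 50);
const thrashedLayout = new FakeLayout(startingWidths);
const batchedLayout = new FakeLayout(startingWidths);
thrash(thrashedLayout, 100);
batched(batchedLayout, 100);
console.log(thrashedLayout.layoutPasses, batchedLayout.layoutPasses); // 100 1
```

Both versions end with the same widths; only the number of layout passes differs.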
Paint and compositing
After layout, the browser knows where everything goes. Now it needs to actually put pixels on the screen.
Paint is the process of filling in the pixels: drawing backgrounds, borders, text, images, shadows, and anything else that’s visible. It’s where the abstract tree of boxes and styles finally becomes something visual.
The browser doesn’t paint everything onto a single canvas. Instead, it identifies layers: portions of the page that can be painted independently. An element with position: fixed, an element with a CSS transform, an element with will-change: transform, a <video> element, a <canvas> element: these are all typical candidates for their own layer, though the exact promotion rules vary between browsers. The browser paints each layer into its own bitmap.
Why layers? Because if something on the page changes, the browser only needs to repaint the affected layer, not the entire page. A fixed header that sits on top of scrolling content gets its own layer; when you scroll, only the content layer moves. The header layer stays put. Neither needs to be repainted. This separation is what makes scrolling feel smooth even on pages with complex layouts.
Compositing is the final step: taking all those painted layers and combining them into the final image that appears on screen. It’s the browser’s version of Photoshop’s layer stack: each layer is a transparent bitmap, and compositing arranges them in the correct order, applies any transforms or opacity changes, and produces the final pixel output. This step is handled by the compositor thread, which runs separately from the main thread. The compositor can manipulate layers (moving them, scaling them, adjusting their opacity) without touching the main thread at all.
On most modern devices, compositing is hardware-accelerated: the GPU does the work. GPUs are extraordinarily good at the operations compositing requires: moving rectangles of pixels around, blending them with transparency, and applying affine transforms. This is what GPUs were designed for (before machine learning hijacked them), and it’s why compositing-only operations are so much faster than anything that involves the main thread.
This is the key to understanding why some CSS animations are smooth and others are janky.
Animating transform or opacity is smooth because these properties only affect compositing. The layer has already been painted; the compositor just moves it or fades it, entirely on the GPU, sixty (or more) times per second. The main thread is free to do other work. You can literally have JavaScript doing heavy computation while a CSS transform animation runs at a buttery 60fps, because they’re on different threads.
Animating width, height, margin, top, left, or any property that affects layout is expensive because it triggers the full pipeline: layout, repaint, composite. Every frame. At 60 frames per second, the browser has 16.7 milliseconds per frame to recalculate layout, repaint the affected layers, and composite the result. If layout takes longer than that (and on complex pages, it easily can), frames get dropped and the animation stutters. At 120fps (increasingly common on modern devices), the budget shrinks to 8.3 milliseconds. At 240fps on a gaming monitor, it’s 4.2 milliseconds. The faster the refresh rate, the less time the browser has per frame, and the more important it is to keep work off the main thread.
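The budget figures are just 1000 divided by the refresh rate. A small sketch, with function names chosen for this example, that also estimates how many frames a single slow update misses:

```typescript
// Milliseconds available per frame at a given refresh rate.
function frameBudgetMs(refreshHz: number): number {
  return 1000 / refreshHz;
}

// How many frames are missed if one update takes workMs of main-thread time.
// An update that fits inside a single frame misses none.
function missedFrames(workMs: number, refreshHz: number): number {
  const framesOccupied = Math.ceil(workMs / frameBudgetMs(refreshHz));
  return Math.max(0, framesOccupied - 1);
}

for (const hz of [60, 120, 240]) {
  console.log(`${hz} Hz: ${frameBudgetMs(hz).toFixed(1)} ms per frame`);
}

// The same 20 ms layout pass hurts more as the refresh rate rises.
console.log(missedFrames(20, 60), missedFrames(20, 120), missedFrames(20, 240));
```

This treats the whole budget as available to your code, which is optimistic: the browser needs part of each frame for its own housekeeping.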
This is why “use transform: translateX() instead of left” isn’t a micro-optimisation; it’s the difference between an animation that runs on the GPU compositor thread and one that blocks the main thread sixty times per second.
The will-change CSS property is worth mentioning here. Adding will-change: transform to an element tells the browser to promote it to its own layer in advance, so when the animation starts, the layer is already prepared and compositing can begin immediately. Without it, the browser might need to create the layer on the first frame of the animation, causing a visible stutter. But will-change has a cost: each layer consumes GPU memory, and promoting too many elements can actually degrade performance. It’s a hint, not a magic fix, and it should be applied surgically to elements that will actually animate.
The main thread bottleneck
Understanding the browser’s threading model explains nearly all web performance advice.
Modern browsers (Chrome, Firefox, Edge, Safari) use a multi-process architecture. Each tab typically gets its own process, providing isolation (a crash in one tab doesn’t bring down the browser) and security (a compromised page can’t read another tab’s memory). Within each tab’s process, the critical distinction is between the main thread and everything else.
The main thread handles: HTML parsing, CSS computation, JavaScript execution, layout, paint, DOM manipulation, event handling, and garbage collection. It does all of this on a single thread. When JavaScript is executing, the browser cannot respond to user input. When layout is being calculated, JavaScript cannot run. Everything queues up, waiting for its turn.
This is why a page “freezes” when heavy JavaScript runs. The main thread is busy executing your script, and until it’s done, it can’t process the click event, run the layout pass, or paint the next frame. The page is alive (the network thread is still fetching resources, the compositor is still displaying the last painted frame) but the main thread is blocked, and nothing interactive can happen.
The compositor thread handles scrolling and compositing-only animations. This is why you can usually still scroll a page even when JavaScript is running heavily: scrolling is handled by the compositor, which operates independently. (Unless, of course, JavaScript has attached a touch or wheel event handler that might call preventDefault(), which forces the compositor to check in with the main thread before scrolling. The scroll event itself can’t be cancelled; it’s the touchstart, touchmove, and wheel handlers that matter. This is why passive event listeners ({ passive: true }) exist: they’re a promise to the browser that your handler won’t cancel the scroll, so the compositor can scroll immediately without waiting.)
The network thread handles resource fetching. It runs independently from the main thread, which is why resources continue downloading even when JavaScript is blocking everything else.
There are other threads too. The raster threads (sometimes called tile workers) handle the actual pixel-painting work, converting paint commands into bitmaps on background threads. The audio thread handles Web Audio processing. Web Workers give JavaScript developers a way to run CPU-intensive code off the main thread: a web worker runs in its own thread with its own global scope, communicating with the main thread via message passing. Workers can’t touch the DOM (only the main thread can), but they can do heavy computation, data processing, or cryptography without blocking user interaction.
The main thread is the bottleneck, and nearly every performance problem is a main thread problem. If you understand what the main thread does and what it doesn’t do, you understand where performance problems come from and how to fix them.
The metrics that matter
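Web performance measurement has converged on a set of standardised metrics, the best known of which Google champions as Core Web Vitals (currently Largest Contentful Paint, Interaction to Next Paint, and Cumulative Layout Shift). They measure different aspects of the user experience.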
First Contentful Paint (FCP) is when the first text, image, or canvas render appears on screen. It’s the moment the user sees something instead of a blank page. FCP is affected by everything in the critical rendering path: DNS, TCP, TLS, TTFB, HTML parsing, CSS downloading and parsing, and the initial paint.
Largest Contentful Paint (LCP) is when the largest visible content element (usually a hero image or main heading) finishes rendering. This is a better proxy for “the page looks ready” than FCP, because FCP might just be a navigation bar or a loading spinner. Google considers an LCP under 2.5 seconds to be good.
Interaction to Next Paint (INP) replaced the older First Input Delay (FID) metric in 2024. While FID measured the delay before the browser started processing your first interaction, INP measures the time from any user interaction (click, tap, keypress) to the next visual update, capturing the entire processing time, not just the initial delay. An INP under 200 milliseconds is the target. High INP means the main thread is busy when you’re trying to interact.
INP is arguably the most important of the Core Web Vitals because it captures what users actually care about: “I clicked, and something happened quickly.” A page with perfect LCP and zero CLS but terrible INP feels broken. You can see the content, it’s not jumping around, but nothing responds to your input. The page is a painting, not an application.
The 200-millisecond threshold isn’t arbitrary. It comes from research on human perception of responsiveness. Below about 100 milliseconds, an interaction feels instant: the response and the input seem simultaneous. Between 100 and 300 milliseconds, the user perceives a slight delay but considers it acceptable. Above 300 milliseconds, the delay is noticeable and frustrating. Above a second, the user’s attention breaks and they start wondering if the page is broken.
Cumulative Layout Shift (CLS) measures visual stability: how much elements on the page jump around unexpectedly as it loads. If you’ve ever started reading an article and had the text leap downward because an ad loaded above it, that’s a layout shift. If you’ve tried to tap a button on your phone and the page shifted at the last millisecond so you tapped an ad instead, that’s a layout shift with real consequences.
CLS quantifies this misery. A score under 0.1 is the target. The most common causes: images without explicit width and height attributes (the browser doesn’t know how much space to reserve until the image loads), ads that inject themselves into the page, dynamically injected content (cookie consent banners, newsletter popups, notification bars), and web fonts that change the dimensions of text when they load.
CLS is measured over the entire lifespan of the page, not just during load. Shifts that happen close together are grouped into bursts (called session windows), and the reported score is the worst burst. A page that’s perfectly stable during load but then shifts everything when a lazy-loaded ad appears ten seconds later still gets a bad CLS score. This is intentional; the metric captures the full experience, not just the first few seconds.
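For the curious, the reported number is not a simple running total. As the metric is currently defined, individual shifts are grouped into session windows (each shift less than one second after the previous one, with a window capped at five seconds), and the score is the largest window total. The sketch below applies that grouping to a hypothetical list of shifts; in a real page the entries would come from the browser’s layout-shift performance entries, with shifts that closely follow user input excluded.

```typescript
interface LayoutShift {
  timeMs: number; // when the shift happened
  score: number;  // the individual shift score reported by the browser
}

// Largest session-window total: gaps under 1 s, windows capped at 5 s.
function cumulativeLayoutShift(shifts: LayoutShift[]): number {
  const sorted = [...shifts].sort((a, b) => a.timeMs - b.timeMs);
  let worst = 0;
  let windowTotal = 0;
  let windowStart = 0;
  let previousTime = 0;
  let windowOpen = false;

  for (const shift of sorted) {
    const continues =
      windowOpen &&
      shift.timeMs - previousTime < 1000 &&
      shift.timeMs - windowStart < 5000;
    if (!continues) {
      // Start a fresh window with this shift.
      windowTotal = 0;
      windowStart = shift.timeMs;
      windowOpen = true;
    }
    windowTotal += shift.score;
    previousTime = shift.timeMs;
    worst = Math.max(worst, windowTotal);
  }
  return worst;
}

// Two small shifts during load, then one large shift from a late ad.
const observedShifts: LayoutShift[] = [
  { timeMs: 100, score: 0.05 },
  { timeMs: 600, score: 0.03 },
  { timeMs: 10000, score: 0.2 },
];
console.log(cumulativeLayoutShift(observedShifts)); // 0.2: the late ad decides the score
```

The load-time burst stays under the 0.1 target on its own; the single late shift is what pushes the page over.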
Why pages feel slow
Armed with an understanding of the rendering pipeline and the metrics, the usual suspects become clear. Most performance problems fall into a small number of categories, and most of them trace back to the same root cause: too much work on the main thread, too early in the page load.
Too much JavaScript. The median web page in 2025 ships over 500 kilobytes of compressed JavaScript, which decompresses to several megabytes of source code. Every byte must be downloaded, decompressed, parsed into an abstract syntax tree, compiled to bytecode (or just-in-time compiled to machine code for hot paths), and executed. Downloading and decompression happen off the main thread, and engines can push some parsing and compilation to background threads, but execution, and much of the work leading up to it, lands on the main thread.
The cost isn’t just the download. On a mid-range mobile phone (the kind most of the world actually uses, not the flagship in your pocket), processing 1 MB of JavaScript (parsing, compiling, and running its startup code) can take 2 to 4 seconds. That’s 2 to 4 seconds where the main thread is mostly occupied with JavaScript and cannot respond promptly to user input. The page is there, on screen, but tapping a button does nothing. The user taps again, harder, as if force will help. It won’t. The main thread is busy.
A page that sends 2 MB of JavaScript before showing any content is asking the main thread to do an enormous amount of work before it can respond to a click. And that JavaScript often includes entire UI frameworks, state management libraries, utility libraries, and polyfills, much of which may never be needed for the current page.
Third-party scripts. Analytics, ad networks, chat widgets, A/B testing frameworks, consent management platforms, social media embeds, customer support widgets, heatmap trackers, session recorders: each one adds JavaScript that competes for main thread time. A single poorly written third-party script can block rendering for hundreds of milliseconds. And because they’re third-party, you don’t control when they load, how big they are, or what they do.
Tag managers (Google Tag Manager being the most common) make this worse by making it trivially easy for marketing teams to add scripts without engineering review. A tag manager is a script that loads other scripts, and the scripts it loads can load still more scripts. The result is a cascade of third-party code that nobody on the engineering team approved, nobody monitors for performance regressions, and nobody can easily remove because “marketing needs it.”
The typical news or e-commerce site loads scripts from twenty to fifty different third-party domains. Each script has its own download, parse, compile, and execute cost. Many of them fight for the same main thread time. Some of them add their own event listeners that fire on every scroll, click, or keystroke. The cumulative effect can add seconds to page load time and hundreds of milliseconds to every interaction.
Render-blocking resources. CSS and synchronous JavaScript in the <head> block rendering until they’ve fully loaded and been processed. A large external stylesheet hosted on a different domain requires DNS resolution, TCP handshake, TLS negotiation, and a full download before the browser can paint anything. Multiply this by several stylesheets and scripts from different domains, and the first paint can be delayed by seconds.
Here’s a concrete example of how this cascades. Your HTML references a CSS file on your domain and a JavaScript file from a third-party analytics provider. The CSS blocks rendering. The JavaScript blocks parsing. The analytics script is hosted on a domain the browser hasn’t connected to before, so it needs DNS + TCP + TLS before it can even start downloading. Meanwhile, the CSS file references a web font hosted on yet another domain. The font won’t start downloading until the CSS is parsed and the browser discovers the @font-face rule. Each dependency creates a chain: you can’t start step N until step N-1 completes. This is the critical path, and minimising its length is the single most impactful thing you can do for page load performance.
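The chain in that example can be written down as a small dependency graph. In the sketch below every duration is a made-up number chosen only to show the mechanics (the analytics figure is meant to include its DNS, TCP, and TLS setup), and the graph is assumed to have no cycles. A resource can only start once everything it depends on has finished.

```typescript
interface PageResource {
  id: string;
  durationMs: number;  // hypothetical fetch-and-process time
  dependsOn: string[]; // resources that must finish before this one can start
}

// Finish time of every resource, assuming each starts as early as it can.
function finishTimes(resources: PageResource[]): Map<string, number> {
  const byId = new Map<string, PageResource>();
  for (const r of resources) byId.set(r.id, r);
  const done = new Map<string, number>();

  const visit = (id: string): number => {
    const cached = done.get(id);
    if (cached !== undefined) return cached;
    const resource = byId.get(id);
    if (!resource) throw new Error(`unknown resource: ${id}`);
    let start = 0;
    for (const dep of resource.dependsOn) start = Math.max(start, visit(dep));
    const end = start + resource.durationMs;
    done.set(id, end);
    return end;
  };

  for (const r of resources) visit(r.id);
  return done;
}

const chain: PageResource[] = [
  { id: "html", durationMs: 200, dependsOn: [] },
  { id: "styles.css", durationMs: 150, dependsOn: ["html"] },
  { id: "analytics.js", durationMs: 400, dependsOn: ["html"] }, // includes DNS + TCP + TLS
  { id: "font.woff2", durationMs: 300, dependsOn: ["styles.css"] }, // found only after CSS is parsed
];

const times = finishTimes(chain);
console.log(Object.fromEntries(times));
console.log("critical path ends at", Math.max(...times.values()), "ms");
```

With these numbers the font, not the slow analytics script, finishes last, because it sits at the end of the longest chain. Shortening any link in that chain (for example by preloading the font so it no longer waits for the CSS) moves the end of the critical path.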
Images without dimensions. An <img> tag without width and height attributes causes a layout shift when the image loads, because the browser initially allocates zero space for it and then has to reflow the page when it discovers the image’s actual dimensions. Setting dimensions (or using the CSS aspect-ratio property) lets the browser reserve the correct space upfront. Modern browsers are clever about this: if you set width and height attributes, the browser automatically calculates the aspect ratio and reserves space even before the image starts downloading. It’s one of the simplest, most effective performance improvements you can make: add two attributes to your <img> tags and your CLS score drops.
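The space reservation is plain proportion. A minimal sketch, assuming the image is displayed at some CSS-determined width with its height left automatic; the function name is invented for the example:

```typescript
// Height the browser can reserve before the image loads, derived from the
// width and height attributes and the width the image is laid out at.
function reservedHeight(attrWidth: number, attrHeight: number, renderedWidth: number): number {
  return renderedWidth * (attrHeight / attrWidth);
}

// <img width="1600" height="900"> laid out in an 800-pixel-wide column.
console.log(reservedHeight(1600, 900, 800)); // 450
```

The attributes do not need to match the displayed size; only their ratio matters.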
Web fonts. When a page specifies a custom font, the browser has to download the font file before it can render text in that font. During the download, the browser has two choices: show invisible text (called FOIT, Flash of Invisible Text) or show text in a fallback font, then swap to the custom font when it arrives (called FOUT, Flash of Unstyled Text). FOIT means the user sees blank space where text should be, which feels like the page is broken. FOUT means the text jumps when the font loads, which causes layout shifts. Setting font-display: swap in the @font-face rule forces FOUT, which is generally preferable: at least the user can read something while the font loads.
There’s a clever middle ground: font-display: optional. This tells the browser to use the custom font if it’s already cached from a previous visit, but to use the fallback font without swapping if the custom font isn’t immediately available. No invisible text, no layout shift, and repeat visitors get the custom font. First-time visitors see the fallback, which is a trade-off many designers are willing to make.
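One way to compare these values is to model each as a pair of time windows: a block period (text invisible while waiting) and a swap period (fallback shown, custom font swapped in if it arrives). The sketch below is a simplification with assumed durations: 0 ms block for swap, 100 ms block for fallback and optional, 3 seconds for the longer periods. Real browsers pick their own values, and with optional they may skip the download entirely.

```typescript
type FontDisplay = "block" | "swap" | "fallback" | "optional";
type TextState = "invisible" | "fallback font" | "custom font";

// Assumed period lengths in milliseconds; real browsers choose their own.
const PERIODS: Record<FontDisplay, { blockMs: number; swapMs: number }> = {
  block: { blockMs: 3000, swapMs: Infinity },
  swap: { blockMs: 0, swapMs: Infinity },
  fallback: { blockMs: 100, swapMs: 3000 },
  optional: { blockMs: 100, swapMs: 0 },
};

// What the user sees at time nowMs if the font file arrives at fontArrivesMs
// (both measured from the moment the text is first needed).
function textStateAt(
  display: FontDisplay,
  fontArrivesMs: number,
  nowMs: number
): TextState {
  const { blockMs, swapMs } = PERIODS[display];
  // The custom font is only ever used if it arrives before both periods end.
  const arrivedInTime = fontArrivesMs <= blockMs + swapMs;
  if (fontArrivesMs <= nowMs && arrivedInTime) return "custom font";
  return nowMs < blockMs ? "invisible" : "fallback font";
}

console.log(textStateAt("swap", 800, 500)); // fallback font
console.log(textStateAt("swap", 800, 900)); // custom font (a late swap)
console.log(textStateAt("optional", 800, 900)); // fallback font (no swap)
console.log(textStateAt("optional", 0, 0)); // custom font (already cached)
```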
Font file size matters too. A full font file with every glyph for every language can be hundreds of kilobytes. Subsetting (including only the characters your page actually uses) can shrink font files dramatically. The unicode-range descriptor in @font-face lets you split a font into multiple files by script (Latin, Cyrillic, CJK), downloading only the chunks needed for the current page’s content. Google Fonts does this automatically: its stylesheets split each family into unicode-range subsets, and the browser downloads only the subsets whose characters actually appear on the page.
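The selection logic behind unicode-range can be sketched in a few lines: for each character on the page, find the subset whose range covers it, and fetch only those files. The file names below are hypothetical and the ranges are whole Unicode blocks, which is coarser than what a real font service would ship.

```typescript
interface FontSubset {
  file: string; // hypothetical file name
  ranges: Array<[number, number]>; // inclusive code point ranges
}

const exampleSubsets: FontSubset[] = [
  { file: "font-latin.woff2", ranges: [[0x0000, 0x00ff]] },
  { file: "font-greek.woff2", ranges: [[0x0370, 0x03ff]] },
  { file: "font-cyrillic.woff2", ranges: [[0x0400, 0x04ff]] },
];

// Returns the subset files needed to render the text, in declaration order.
function subsetsNeeded(text: string, available: FontSubset[]): string[] {
  const needed = new Set<string>();
  for (const ch of Array.from(text)) {
    const cp = ch.codePointAt(0);
    if (cp === undefined) continue;
    for (const s of available) {
      if (s.ranges.some(([lo, hi]) => cp >= lo && cp <= hi)) needed.add(s.file);
    }
  }
  return available.filter((s) => needed.has(s.file)).map((s) => s.file);
}

console.log(subsetsNeeded("Hello", exampleSubsets));
// ["font-latin.woff2"]
console.log(subsetsNeeded("Привет, world", exampleSubsets));
// ["font-latin.woff2", "font-cyrillic.woff2"]
```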
What makes pages feel fast
If the diagnosis is “too much work before the first paint,” the treatment is moving work out of the critical path. Every technique in this section is a different way of saying the same thing: do less, sooner, and defer the rest.
Server-side rendering (SSR) means the server sends complete HTML with actual content, rather than an empty <div id="root"> that client-side JavaScript populates after it downloads and executes. With SSR, the browser can parse and render the HTML immediately; the content is there, in the response. The user sees a page while JavaScript loads in the background.
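The difference is easiest to see by comparing the two responses side by side. This is a deliberately tiny sketch: the Article shape, the function names, and the /app.js path are all invented, and a real framework does far more than string templating.

```typescript
interface Article {
  title: string;
  body: string;
}

// Minimal escaping so user content cannot break out of the markup.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Client-side rendering: the response contains no content at all.
function spaShell(): string {
  return '<div id="root"></div><script src="/app.js"></script>';
}

// Server-side rendering: the content is already in the response, and the
// script is deferred so it does not block parsing.
function renderArticle(a: Article): string {
  return (
    `<div id="root"><h1>${escapeHtml(a.title)}</h1>` +
    `<p>${escapeHtml(a.body)}</p></div>` +
    '<script src="/app.js" defer></script>'
  );
}

console.log(spaShell());
console.log(renderArticle({ title: "Fast & simple", body: "Content first." }));
```

With the first response the user stares at an empty page until /app.js has downloaded and run; with the second, the heading and paragraph can be painted as soon as the HTML is parsed.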
This is why frameworks like Next.js, Nuxt, and SvelteKit have made SSR a first-class feature; they recognised that the single-page-application pattern of “send empty HTML, download framework, fetch data, render” was creating a terrible first-load experience. The pendulum has swung: after a decade of client-side rendering being the default for new web applications, the industry has largely concluded that server-rendered HTML is faster for initial loads and that client-side rendering should be reserved for interactive behaviour after the page is visible.
There’s a hybrid approach called hydration: the server sends fully rendered HTML (so the page is visible immediately), and then the client-side JavaScript “hydrates” it, attaching event handlers and making it interactive. The tricky part is that during hydration, the page looks interactive but isn’t yet; buttons are visible but don’t respond to clicks until hydration completes. This gap is sometimes called the “uncanny valley” of web performance. Newer approaches like partial hydration (only hydrate the interactive parts of the page) and islands architecture (treat each interactive component as an isolated “island” in a sea of static HTML) aim to shrink this gap.
Progressive rendering takes this further. HTTP/1.1 supports chunked transfer encoding, which lets the server send the response in pieces (HTTP/2 and HTTP/3 achieve the same effect with their own framing). A smart server can send the <head> (with critical CSS) and the beginning of the <body> immediately, while it’s still computing the rest of the page. The browser starts parsing and rendering that first chunk while the server is still generating the remainder. This is streaming in its most elegant form: the server and browser working in parallel.
The React team’s streaming SSR and the newer React Server Components take this idea to its logical conclusion. Components that need to fetch data render as placeholders initially, and when their data arrives, the server streams the completed HTML into the page. The browser receives a trickle of HTML chunks that progressively fill in the page, each chunk immediately parseable and renderable. The user sees content appearing in stages, from fastest to slowest, rather than waiting for the slowest data source to return before seeing anything.
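The shape of that idea can be sketched with an async generator: send the shell with placeholders first, then emit each section as soon as its data is ready. This is not how React implements it. The Section type, the data-for marker, and the timers standing in for data fetches are all assumptions made for the sake of a runnable example, and the ids and HTML are treated as trusted.

```typescript
interface Section {
  id: string;
  delayMs: number; // stands in for a slow data source
  html: string;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Yields HTML chunks in the order they become available.
async function* streamPage(sections: Section[]): AsyncGenerator<string> {
  // 1. The shell goes out immediately, with a placeholder per section.
  yield (
    "<html><head><title>Demo</title></head><body>" +
    sections.map((s) => `<div id="${s.id}">Loading</div>`).join("")
  );

  // 2. Start every "fetch" at once and emit each result as it completes.
  const pending = new Map<string, Promise<Section>>();
  for (const s of sections) {
    pending.set(s.id, sleep(s.delayMs).then(() => s));
  }
  while (pending.size > 0) {
    const done = await Promise.race(Array.from(pending.values()));
    pending.delete(done.id);
    // Hypothetical marker; the client would move this into its placeholder.
    yield `<template data-for="${done.id}">${done.html}</template>`;
  }

  // 3. Close the document once the slowest section has arrived.
  yield "</body></html>";
}
```

Consuming it with a for await loop and writing each chunk to the response would give the browser something to render after the first yield, with the fastest sections filling in first regardless of the order they were declared in.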
This is, in a sense, a return to how the early web worked. In the 1990s, pages loaded progressively because connections were slow and servers were simple: the HTML came down the wire byte by byte, and the browser rendered it as it arrived. Then we built single-page applications that waited for everything to download before rendering anything, and we lost that progressive quality. Now we’re engineering our way back to it, with considerably more sophistication.
Lazy loading defers resources that aren’t visible in the initial viewport. Adding loading="lazy" to an image tag tells the browser not to fetch that image until the user scrolls near it. This can dramatically reduce initial page weight: if a page has twenty images but only three are visible without scrolling, lazy loading avoids downloading the other seventeen until they’re needed.
A common mistake: applying loading="lazy" to images in the initial viewport. This actually hurts performance, because the browser’s preload scanner would normally discover and start downloading those images immediately, but loading="lazy" tells it to wait until a layout is computed and the browser determines whether the image is near the viewport. For above-the-fold images, use loading="eager" (the default) and fetchpriority="high" to ensure they load as fast as possible.
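If your templates generate image tags, that rule can live in one place. Here is a small sketch of such a helper; the ImageSpec shape and the aboveTheFold flag are hypothetical (something your template would have to know or guess), and the attribute values are assumed to be trusted rather than escaped.

```typescript
interface ImageSpec {
  src: string;
  alt: string;
  width: number;
  height: number;
  aboveTheFold: boolean; // hypothetical flag supplied by the template
}

// Eager and high priority for images visible on load, lazy for the rest.
// Width and height are always emitted so space is reserved either way.
function imgTag(img: ImageSpec): string {
  const loading = img.aboveTheFold
    ? 'loading="eager" fetchpriority="high"'
    : 'loading="lazy"';
  return (
    `<img src="${img.src}" alt="${img.alt}" ` +
    `width="${img.width}" height="${img.height}" ${loading}>`
  );
}

console.log(
  imgTag({ src: "/hero.webp", alt: "Hero", width: 1200, height: 600, aboveTheFold: true })
);
console.log(
  imgTag({ src: "/footer.webp", alt: "Footer", width: 300, height: 200, aboveTheFold: false })
);
```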
Image formats matter enormously for page weight. A hero image saved as a high-quality JPEG might be 500 KB. The same image in WebP format is typically 25 to 35 percent smaller. In AVIF format (a newer codec based on the AV1 video format), it can be 50 percent smaller. Modern browsers support both WebP and AVIF, and the <picture> element lets you serve the optimal format to each browser with graceful fallback.
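As a sketch of what that fallback looks like, here is a helper that emits a <picture> element. It assumes, purely for illustration, that the same image has been exported as .avif, .webp, and .jpg under one base path, and that the alt text needs no escaping. The browser uses the first <source> whose type it supports, so the smallest format goes first and the JPEG in the <img> catches everything else.

```typescript
// Builds a <picture> element with AVIF first, WebP second, JPEG as fallback.
function pictureTag(
  basePath: string,
  alt: string,
  width: number,
  height: number
): string {
  return [
    "<picture>",
    `  <source srcset="${basePath}.avif" type="image/avif">`,
    `  <source srcset="${basePath}.webp" type="image/webp">`,
    `  <img src="${basePath}.jpg" alt="${alt}" width="${width}" height="${height}">`,
    "</picture>",
  ].join("\n");
}

console.log(pictureTag("/img/hero", "Hero", 1200, 600));
```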
Preconnect and preload let you give the browser a head start on resources it’ll need.
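<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin> tells the browser to start DNS resolution, TCP handshake, and TLS negotiation with that server immediately, before it discovers a font request in the CSS. (Google Fonts serves its stylesheets from fonts.googleapis.com but the font files themselves from fonts.gstatic.com, and font requests need the crossorigin attribute to reuse the warmed connection.) By the time it needs the font, the connection is already warm. This can save 100 to 300 milliseconds per cross-origin resource: a significant gain for something that’s a single line of HTML.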
<link rel="preload" href="/hero.webp" as="image"> tells the browser to start fetching that image immediately at high priority, before the parser encounters it in the HTML. Without preload, the browser wouldn’t discover it needs the image until it encounters the <img> tag in the body, by which point it’s already spent time parsing everything before it.
<link rel="dns-prefetch" href="https://analytics.example.com"> is a lighter-weight version of preconnect: it only does the DNS lookup, not the full TCP/TLS handshake. It’s useful for resources you might need but aren’t certain about.
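The difference between the two hints comes down to which parts of connection setup are already paid for when the real request shows up. A rough sketch, with invented timings and the optimistic assumption that the hint had time to finish before the resource was discovered:

```typescript
interface ConnectionCosts {
  dnsMs: number;
  tcpMs: number;
  tlsMs: number;
}

type ResourceHint = "none" | "dns-prefetch" | "preconnect";

// Setup time still left to pay once the real request is discovered.
function remainingSetupMs(hint: ResourceHint, c: ConnectionCosts): number {
  switch (hint) {
    case "none":
      return c.dnsMs + c.tcpMs + c.tlsMs; // nothing done ahead of time
    case "dns-prefetch":
      return c.tcpMs + c.tlsMs; // only the lookup was done early
    case "preconnect":
      return 0; // connection is fully warm
  }
}

// Illustrative numbers, not measurements.
const exampleCosts: ConnectionCosts = { dnsMs: 40, tcpMs: 80, tlsMs: 80 };
console.log(remainingSetupMs("none", exampleCosts)); // 200
console.log(remainingSetupMs("dns-prefetch", exampleCosts)); // 160
console.log(remainingSetupMs("preconnect", exampleCosts)); // 0
```

The trade-off is that a preconnect holds an open connection the browser may never use, which is why the cheaper dns-prefetch suits the "might need it" cases.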
And the fetchpriority attribute (e.g., <img fetchpriority="high">) lets you tell the browser which resources matter most, overriding its default heuristics. The browser normally assigns priorities based on resource type (CSS is high, images are low, scripts are medium), but sometimes you know better; your hero image is more important than the decorative icon in the footer, and fetchpriority lets you say so.
Service workers are scripts that sit between the browser and the network, intercepting every fetch request. A well-configured service worker can cache your site’s assets on first visit and serve them from the local cache on subsequent visits, making repeat loads effectively instant, even offline. This is the technology behind Progressive Web Apps (PWAs), and it’s what makes some sites feel like native apps on return visits.
The service worker has a lifecycle that’s worth understanding. It’s installed on first visit, starts controlling pages from the next navigation onward, and then sits dormant until a fetch event fires. When a request comes in, the service worker can respond from cache (instant), fetch from the network (normal speed), or use a strategy like “cache first, network fallback” or “network first, cache fallback.” The choice of caching strategy determines the trade-off between freshness and speed, and getting it right is one of those things that separates good web experiences from great ones.
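The two named strategies are short enough to sketch. To keep the example runnable outside a browser, this version swaps the real service worker pieces (the fetch event, the Cache API, and fetch itself) for a plain Map and an injected function; the Fetcher type and both function names are made up for the illustration.

```typescript
// Stand-in for the network: resolves with a body or rejects when offline.
type Fetcher = (url: string) => Promise<string>;

// Cache first, network fallback: fastest, but may serve stale content.
async function cacheFirst(
  url: string,
  cache: Map<string, string>,
  network: Fetcher
): Promise<string> {
  const hit = cache.get(url);
  if (hit !== undefined) return hit;
  const body = await network(url);
  cache.set(url, body); // store for the next visit
  return body;
}

// Network first, cache fallback: freshest, but pays network latency
// whenever the network is reachable.
async function networkFirst(
  url: string,
  cache: Map<string, string>,
  network: Fetcher
): Promise<string> {
  try {
    const body = await network(url);
    cache.set(url, body); // keep the cache up to date
    return body;
  } catch (err) {
    const hit = cache.get(url);
    if (hit !== undefined) return hit; // offline, but seen before
    throw err; // offline and never cached
  }
}
```

A common split is cache first for versioned static assets (stylesheets, scripts, fonts) and network first for HTML, where freshness matters more than the last few milliseconds.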
Content Delivery Networks (CDNs) solve a different latency problem: the physical distance between the user and the server. A CDN caches your site’s static assets (images, CSS, JavaScript, fonts) on servers distributed around the world. When a user in Perth requests your stylesheet, the CDN serves it from a server in Sydney or Singapore rather than from your origin server in London. This can cut hundreds of milliseconds off the TTFB for each resource. Modern CDNs like Cloudflare, Fastly, and AWS CloudFront can also cache and serve dynamic HTML at the edge, running your application logic closer to the user.
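A back-of-the-envelope calculation shows why distance matters so much. Light in optical fibre covers roughly 200 km per millisecond, which sets a floor on round-trip time that no amount of server tuning can beat. The distances below are rough great-circle figures assumed for illustration; real cable routes are longer and add routing delay on top.

```typescript
// Light in optical fibre travels at roughly two thirds of its vacuum speed.
const FIBRE_KM_PER_MS = 200;

// Lower bound on one round trip over the given one-way distance.
function minRoundTripMs(distanceKm: number): number {
  return (2 * distanceKm) / FIBRE_KM_PER_MS;
}

// Assumed approximate distances.
const PERTH_TO_LONDON_KM = 14500;
const PERTH_TO_SINGAPORE_KM = 3900;

console.log(minRoundTripMs(PERTH_TO_LONDON_KM)); // 145
console.log(minRoundTripMs(PERTH_TO_SINGAPORE_KM)); // 39
```

A new HTTPS connection needs several round trips before the first byte of the response arrives, so that difference of roughly 100 ms per round trip is paid more than once.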
Testing: seeing the pipeline
All of this (the critical path, the main thread contention, the layout thrashing, the render-blocking resources) is visible if you know where to look. And the tools for looking have never been better.
Chrome DevTools’ Performance tab is a profiler that records everything the browser does during a page load or interaction: HTML parsing, script evaluation, style recalculation, layout, paint, composite, and idle time. The output is a flame chart showing exactly which functions ran, how long they took, and how they related to frames.
The colour coding is worth learning. Long yellow bars are JavaScript execution; if you see a yellow bar that spans an entire frame, JavaScript is blocking the main thread for that whole frame. Purple bars are style recalculation and layout. Green bars are paint and compositing. Blue bars are HTML parsing. Grey bars are miscellaneous browser work, and empty gaps are idle time (the browser had nothing to do, which is good, because idle time means headroom for user interaction).
A frame that took longer than 16.7 milliseconds will show a red triangle: a dropped frame, a moment of jank. String enough of these together and you have a visibly stuttering animation or a noticeably sluggish scroll.
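The 16.7 figure is just the frame budget at 60 frames per second: 1000 ms divided by 60. A small sketch of the check, using made-up frame durations and an invented function name:

```typescript
// Time available per frame at 60 frames per second (about 16.7 ms).
const FRAME_BUDGET_MS = 1000 / 60;

// Counts the frames that took longer than the budget.
function overBudgetFrames(
  frameDurationsMs: number[],
  budgetMs: number = FRAME_BUDGET_MS
): number {
  return frameDurationsMs.filter((d) => d > budgetMs).length;
}

// Two of these four frames (40 ms and 18 ms) miss the budget.
console.log(overBudgetFrames([10, 16, 40, 18])); // 2
```

On a 120 Hz display the same function can be called with a budget of 1000 / 120, which halves the time each frame has to finish.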
The flame chart’s depth tells its own story. A shallow chart means the browser is doing simple work: parse some HTML, apply some styles, paint. A deeply nested chart means complex call stacks: JavaScript calling into framework code calling into library code calling into DOM methods. The deeper the stack, the harder it is to optimise, because the critical path is buried under layers of abstraction.
The Network tab shows a waterfall chart: a timeline of every resource the browser requested. Each bar represents a resource (an HTML file, a stylesheet, a script, an image, a font), and the bar’s length shows how long it took to fetch. The colours indicate the phases: DNS (dark green), TCP (orange), TLS (purple), waiting for the server (green), and downloading (blue). A good waterfall is short and parallel: many resources loading simultaneously, with none blocking others. A bad waterfall is long and serial: resources loading one after another, each waiting for the previous one to complete, usually because of blocking scripts or missing preloads.
Lighthouse, built into Chrome DevTools, runs an automated audit and scores your page on performance, accessibility, best practices, and SEO. It simulates a mid-range mobile device on a throttled connection (roughly equivalent to a Moto G4 on a slow 4G network) and reports the Core Web Vitals alongside specific, actionable recommendations. The throttling is important; it approximates the experience of a typical user, not a developer with a fast MacBook and gigabit fibre. Many teams are shocked by their Lighthouse scores the first time they run it, because they’ve only ever experienced their site on developer hardware.
A word of caution: Lighthouse scores are useful benchmarks but they’re not the whole picture. A score of 100 doesn’t guarantee a great user experience, and a score of 60 doesn’t necessarily mean a bad one. The scores are computed from a synthetic test in a controlled environment. Real-world performance depends on the user’s device, connection, location, browser version, and extensions, none of which Lighthouse can simulate.
WebPageTest (webpagetest.org) runs real browsers from real locations around the world and produces detailed waterfall charts, filmstrips (showing what the page looked like at each point in the load), and a wealth of metrics. It’s the closest thing to seeing your page the way a real user on a real connection in a real city experiences it.
The filmstrip view is revealing. It shows a sequence of screenshots taken at intervals during the page load, typically every 100 milliseconds. You can watch your page materialise from blank screen to first paint to fully loaded, and you can see exactly where time is being spent. A page that shows a blank screen for 2 seconds before suddenly appearing all at once has a very different user experience from one that progressively fills in content over 2 seconds, even if both take the same total time to load. The filmstrip shows you the difference.
Google also provides the Chrome User Experience Report (CrUX): real-world performance data collected anonymously from Chrome users who have opted in. CrUX tells you how your actual users experience your site, not how a synthetic test experiences it. A page that scores perfectly in Lighthouse but poorly in CrUX is being tested under conditions that don’t match reality; the lab is clean, but the field is messy. Real users are on slow connections, old phones, crowded Wi-Fi networks, and trains entering tunnels. CrUX captures that reality.
A brief history of speed
It’s worth stepping back to appreciate how fast browsers have become.
In the mid-1990s, Netscape Navigator on a dial-up modem took minutes to load a page with a few images. In 2004, Gmail launched as a single-page application and was considered bleeding-edge because it loaded content without refreshing the page. In 2010, Google announced that page speed would be a ranking factor in search results, and the performance race began in earnest. In 2015, HTTP/2 shipped, bringing multiplexing and header compression. In 2020, Google introduced Core Web Vitals, which became a ranking signal the following year, tying performance directly to search visibility. Each step raised the bar for what users and search engines consider “fast.”
The hardware has improved dramatically too. A mid-range phone in 2025 has more processing power than a desktop computer from 2010. But the web has grown to fill the available resources: pages are heavier, scripts are larger, and expectations are higher. It’s a Red Queen’s race: hardware gets faster, but websites get heavier, and the user experience stays roughly the same. The pages that feel fast are the ones where the developer deliberately chose restraint.
An operating system in a tab
Step back for a moment and consider what the browser is actually doing.
It’s running a network stack (DNS, TCP, TLS, HTTP/2, HTTP/3). It’s parsing two languages (HTML and CSS). It’s compiling and executing a third (JavaScript), including a just-in-time compiler that optimises hot code paths to near-native speed. It’s running a layout engine that implements the CSS specification, a document of thousands of pages covering flexbox, grid, floats, positioning, text layout, writing modes, and dozens of other layout models. It’s managing a compositing pipeline that talks to the GPU. It’s handling user input (touch events, mouse events, keyboard events, gamepad input, accessibility tree updates). It’s managing memory and running garbage collection. It’s sandboxing untrusted code from millions of different websites. It’s doing all of this in real time, sixty or more times per second, on everything from a flagship desktop to a five-year-old phone.
The engineering that goes into modern browser engines is staggering. Chromium (which powers Chrome, Edge, Opera, Brave, and many others) has over 35 million lines of code. WebKit (Safari) and Gecko (Firefox) are smaller but still measured in millions of lines. These are among the most complex software systems ever built, and they’re open source: anyone can read the code that renders the web.
The browser is, by any reasonable definition, an operating system. Tabs are processes. The main thread is the CPU. The compositor is the GPU scheduler. The service worker is the filesystem cache. JavaScript is the application layer. Web APIs (fetch, IndexedDB, WebSocket, WebRTC, Web Audio, WebGL, WebGPU) are the system calls. The URL bar is the command line.
Consider what the browser’s JavaScript engine alone does. V8 (Chrome’s engine) compiles JavaScript to machine code using multiple compilation tiers: an interpreter (Ignition) for quick startup, an optimising compiler (TurboFan) for hot code paths, and a deoptimisation pathway for when the compiler’s assumptions are invalidated. It manages a garbage-collected heap, handling memory allocation and collection without exposing any of this complexity to the developer. It implements a specification (ECMAScript) that’s updated yearly with new language features. And it does all of this while maintaining backward compatibility with JavaScript written in 1995.
Firefox’s SpiderMonkey, Safari’s JavaScriptCore, and the various JavaScript engines that power server-side runtimes all solve the same problems with different approaches, but V8’s multi-tier compilation is representative. The point is that “executing JavaScript” isn’t simple; it’s a deeply optimised compilation pipeline that happens to run inside a tab in your browser.
When a page feels slow, it’s almost always because someone asked this operating system to do too much work in the critical path. Too many resources to fetch. Too much JavaScript to parse and execute. Too much layout to calculate. Too many layers to paint. The rendering pipeline is a marvel of engineering (it turns text files into interactive visual experiences in fractions of a second) but it has limits, and every millisecond spent on one thing is a millisecond not spent on responding to the user.
The fastest page is the one that sends the least work to the browser: HTML with content, critical CSS inline, images with dimensions and modern formats, fonts with font-display: swap and subsetting, scripts deferred, third-party scripts audited and minimised, and a service worker catching repeat visits.
There’s a useful mental model here: every resource you add to a page is a tax. The HTML itself is free (the browser was going to parse HTML anyway, and it’s fast at it). CSS is a modest tax (render-blocking but usually small). Images are a moderate tax (they don’t block rendering if they have dimensions, and they can load lazily). JavaScript is the heaviest tax: it blocks the main thread during parse and execution, it can trigger layout and paint, and it can grow without anyone noticing until the page feels slow.
The best-performing pages on the web tend to be the simplest. Wikipedia loads fast not because of clever engineering (though their engineering is good) but because the pages are mostly HTML and CSS with minimal JavaScript. Hacker News loads fast because it’s server-rendered HTML with almost no client-side code. The blog you’re reading right now (if I’ve done my job) loads fast because it’s static HTML generated at build time, served from a CDN, with no JavaScript required to display the content.
It’s not complicated. It’s just hard, because the incentives of the modern web (analytics, ads, A/B tests, chat widgets, consent banners, tracking pixels) all push in the opposite direction. Every one of those scripts is someone’s priority. None of them are the user’s priority. The user’s priority is: I typed a URL. Show me the page.
The tension between business requirements and user experience is the central drama of web performance. The marketing team wants analytics. The legal team wants the cookie banner. The support team wants the chat widget. The sales team wants the A/B test. Each request is individually reasonable. Cumulatively, they create a page that takes eight seconds to become interactive on a mid-range phone over a 3G connection. And the people who suffer most are the users with the cheapest devices and the slowest connections, which is to say, the users who can least afford to wait.
Alex Russell, a former Google Chrome engineer, has written extensively about this as a performance inequality gap: the web gets faster on developer hardware (high-end laptops on fast Wi-Fi) and slower on user hardware (budget Android phones on congested mobile networks). Developers don’t feel the pain because they’re testing on the best possible hardware. Their users are not.
The browser is trying. It’s performing a small miracle between your keystroke and those pixels: resolving names, negotiating encryption, parsing languages, solving layout constraints, painting pixels, compositing layers, and doing it all in the time it takes you to blink.
The least we can do, as the people who build the things it renders, is not make it harder than it has to be.