Web Performance Indicators and Optimization Methods

Today I stumbled across MDN's series of articles on performance optimization. I learned a lot of new things and reorganized a lot of what I already knew.

For example, I revisited the loading order of the different resource types in an HTML page and whether they block one another, and I picked up some new optimization techniques, such as the 14 KB rule and dns-prefetch.

This is the entry address: https://developer.mozilla.org/zh-CN/docs/Web/Performance

Performance Metrics for Web Rendering

Before optimizing performance, we need to choose the right metrics for the application and set reasonable optimization goals.

Not all metrics are equally important; which ones matter depends on your application. Once you have chosen them, set a realistic goal for each.

Here are some metrics worth considering:

  • First Meaningful Paint (FMP): when the main content has been rendered on the page

  • Hero Rendering Times: a newer user-experience metric that measures when the content users care about most has been rendered

  • Time to Interactive (TTI): when the page layout has stabilized, key web fonts are visible, and the main thread is free to handle user input, so the user can click on the UI and interact with it

  • Input responsiveness: the time it takes for the interface to respond to user input

  • Perceptual Speed Index (PSI): how quickly the page changes visually while loading; the lower the score, the better

  • Custom metrics, determined by business needs and user experience

FMP is very similar to Hero Rendering Times, but FMP does not distinguish whether the rendered content is useful or is what the user actually cares about.

Based on these metrics and the RAIL performance model, we can set goals such as:

  • Interface response time under 100 ms, at 60 FPS

  • Speed Index under 1250 ms

  • Time to Interactive under 5 s on a 3G network

  • A size budget of less than 170 KB for critical files
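
Goals like these are easiest to enforce when they are written down as data and checked automatically. The following is a minimal sketch of such a check, not a real tool: the `Budget` shape, the field names, and the measured numbers are all made up for illustration, and 170 KB is assumed to mean 170 × 1024 bytes.

```typescript
// Hypothetical budget shape; the limits mirror the example goals above.
interface Budget {
  inputResponseMs: number; // interface response time
  speedIndexMs: number;    // Speed Index
  ttiOn3gMs: number;       // Time to Interactive on a 3G network
  criticalBytes: number;   // total size of critical files
}

const budget: Budget = {
  inputResponseMs: 100,
  speedIndexMs: 1250,
  ttiOn3gMs: 5000,
  criticalBytes: 170 * 1024, // assumption: 170 KB counted as 170 * 1024 bytes
};

// Returns the names of the metrics that are not strictly below their limit.
function overBudget(measured: Budget, limits: Budget): string[] {
  const names = Object.keys(limits) as (keyof Budget)[];
  return names.filter((name) => measured[name] >= limits[name]);
}

// Made-up measurements, for illustration only.
const measured: Budget = {
  inputResponseMs: 80,
  speedIndexMs: 1400,
  ttiOn3gMs: 4200,
  criticalBytes: 200 * 1024,
};

console.log(overBudget(measured, budget)); // [ 'speedIndexMs', 'criticalBytes' ]
```

A check like this could run in CI so that a regression fails the build instead of being noticed later.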

Browser rendering principle and critical rendering path

Fast, responsive websites provide a better user experience. Users expect content that loads quickly and interactions that feel smooth.

The two main causes of poor web performance are waiting for resources to load and the fact that the browser executes, for the most part, on a single thread.

Latency is the main obstacle to loading resources quickly. To achieve fast loading, the developer's goal is to send the requested information as quickly as possible, or at least to make it seem so. Network latency is the time it takes to transmit bytes over the network to the computer. Web performance optimization is what we do to make the page load as quickly as possible.

For the most part, browsers are single-threaded. For smooth interactions, the developer's goal is to keep the site responsive, from smooth scrolling to prompt responses to clicks. Render time is key: the main thread must be able to complete all the work given to it and still be available to handle user interactions. By understanding the single-threaded nature of the browser and minimizing the main thread's responsibilities, we can optimize web performance so that rendering is smooth and responses to interaction are immediate.

Step 1: Navigation

Navigation is the first step in loading a web page. It happens whenever the user requests a page by entering a URL in the address bar, clicking a link, submitting a form, or performing a similar action.

One of the goals of web performance optimization is to reduce the time navigation takes. Ideally this does not take long, but latency and bandwidth can cause delays.

DNS query

The first step in navigating to a web page is finding where its resources are located. If you navigate to https://example.com, the HTML page is located on a server with the IP address 93.184.216.34. If you have never visited this site before, a DNS query is needed.

The browser sends a DNS query to a name server and eventually receives an IP address in response. After this first request, the IP address will likely be cached for a while, which speeds up later requests because the address is read from the cache instead of being requested from the name server again.

A DNS query usually needs to be done only once per hostname for a page load. However, a query is needed for every distinct hostname the page references. If fonts, images, scripts, ads, and metrics all come from different hostnames, each hostname needs its own DNS query.
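
To get a feel for how many lookups a page may trigger, you can count the distinct hostnames among its resource URLs. This is a rough sketch only: the function name and the example hostnames are hypothetical, and it ignores DNS caching, so the result is an upper bound.

```typescript
// Collects the distinct hostnames a page refers to.
// Each one may need its own DNS query on a cold cache.
function uniqueHostnames(pageUrl: string, resourceUrls: string[]): string[] {
  const hosts = new Set<string>([new URL(pageUrl).hostname]);
  for (const resource of resourceUrls) {
    // Relative URLs resolve against the page URL, so they reuse its hostname.
    hosts.add(new URL(resource, pageUrl).hostname);
  }
  return Array.from(hosts);
}

// Hypothetical page and resources, for illustration only.
const pageHosts = uniqueHostnames("https://example.com/", [
  "/styles.css",
  "https://fonts.example.net/font.woff2",
  "https://cdn.example.org/app.js",
  "https://cdn.example.org/vendor.js",
]);

console.log(pageHosts.length); // 3
```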

TCP handshake

Once the server's IP address is known, the browser sets up a connection to the server with the TCP “three-way handshake”. This mechanism lets the two ends that are about to communicate negotiate the parameters of the TCP socket connection before any data is sent over the upper-layer protocol, usually HTTPS.

The three-way handshake is often called “SYN-SYN-ACK”, or more precisely SYN, SYN-ACK, ACK, because TCP exchanges three messages to negotiate and start a session between two computers. This means three messages go back and forth between the client and each server before the request has even been sent.

TLS Negotiation

To establish a secure connection over HTTPS, another handshake is necessary. This handshake, the TLS negotiation, determines which cipher will be used to encrypt the communication, authenticates the server, and establishes a secure connection before any real data is transferred. In the MDN article's count, it adds five more exchanges with the server before the actual request for content is sent.

Although establishing a secure connection increases the time spent waiting for the page to load, the cost is worth it, because the data transmitted between the browser and the web server cannot be decrypted by a third party.

After these eight exchanges (three for TCP, five for TLS), the browser can finally make the request.

Step 2: Response

Once we establish a connection to the web server, the browser sends an initial HTTP GET request on behalf of the user. For websites, this request is usually an HTML file. Once the server receives the request, it will reply with the relevant response headers and HTML content.

<!doctype HTML>
<html>
  <head>
    <meta charset="UTF-8"/>
    <title>My simple page</title>
    <link rel="stylesheet" href="styles.css"/>
    <script src="myscript.js"></script>
  </head>
  <body>
    <h1 class="heading">My Page</h1>
    <p>A paragraph with a <a href="https://example.com/about">link</a></p>
    <div>
      <img src="myimage.jpg" alt="image description"/>
    </div>
    <script src="anotherscript.js"></script>
  </body>
</html>

The response to this initial request contains the first byte of data received. Time to First Byte (TTFB) is the time between the moment the user makes the request, for example by clicking a link, and the receipt of the first packet of HTML. The first chunk of content is usually 14 KB of data.

In the example above, the response is well under 14 KB, but the linked resources are not requested until the browser encounters the links during parsing, as described below.

TCP slow start/14kB rule

The first response packet will be 14 KB. This is part of TCP slow start, an algorithm that balances the speed of a network connection. Slow start gradually increases the amount of data transmitted until the network's maximum bandwidth can be determined.

In TCP slow start, once the initial packet has been acknowledged, the server doubles the size of the next packet to around 28 KB. Each subsequent packet doubles in size again until a predetermined threshold is reached or congestion is experienced.

If you have heard of the 14 KB rule for initial page load, TCP slow start is the reason the initial response is 14 KB, and the reason web performance optimization focuses on that initial 14 KB response. TCP slow start gradually builds up a transmission speed appropriate for the network's capacity to avoid congestion.
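
The doubling described above can be turned into a rough estimate of how many flights a response needs. This is a deliberately simplified model, assuming a 14 KB first flight and clean doubling with no threshold, loss, or congestion; the function name is made up.

```typescript
// Simplified slow-start model: the first flight carries initialKb and
// every following flight is twice as large as the one before it.
function flightsNeeded(responseKb: number, initialKb: number = 14): number {
  let flights = 0;
  let windowKb = initialKb;
  let sentKb = 0;
  while (sentKb < responseKb) {
    sentKb += windowKb; // send one flight
    windowKb *= 2;      // the next flight may be twice as large
    flights += 1;
  }
  return flights;
}

console.log(flightsNeeded(10));  // 1 (fits in the first 14 KB)
console.log(flightsNeeded(40));  // 2 (14 + 28 = 42 KB)
console.log(flightsNeeded(100)); // 4 (14 + 28 + 56 = 98 KB is not quite enough)
```

Under this model, a response that fits in 14 KB arrives one flight earlier than one that is only slightly larger, which is the intuition behind the 14 KB rule.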

Step 3: Parsing and Rendering

Once the browser receives the first chunk of data, it can begin parsing the information received. Parsing is the step in which the browser turns the data it receives over the network into the DOM and CSSOM, which the renderer uses to paint the page to the screen.

The DOM is the browser's internal representation of the markup. The DOM is also exposed and can be manipulated through various JavaScript APIs.

Even if the HTML of the requested page is larger than the initial 14 KB packet, the browser starts parsing and tries to render based on the data it has. This is why it is important for web performance optimization to include everything the browser needs to start rendering the page, or at least a template of the page (the CSS and HTML needed for the first render), in the first 14 KB. But before anything is rendered to the screen, the HTML, CSS, and JavaScript have to be parsed.

Critical Rendering Path (CRP)

A Document Object Model is created when parsing HTML. HTML can request JavaScript, which in turn, can change the DOM. HTML contains or requests styles, which in turn build the CSS object model. The browser engine combines the two to create a render tree. Layout determines the size and position of everything on the page. Once layout is determined, pixels are drawn onto the screen.

Optimizing the critical rendering path shortens the time to first render. Understanding and optimizing it is also important for ensuring that reflows and repaints can happen at 60 frames per second, which keeps user interactions responsive and avoids jank.

Web performance covers server requests and responses, loading, scripting, rendering, layout, and the painting of every pixel to the screen.

A request for a web page starts with a request for an HTML file. The server returns the HTML as response headers and data. The browser then begins parsing the HTML, converting the received bytes into the DOM tree. The browser initiates a request every time it finds a link to an external resource, whether that is a stylesheet, a script, or an embedded image reference. Some requests are blocking, which means parsing of the rest of the HTML is paused until the important resource has been handled. The browser continues to parse the HTML, making requests and building the DOM, until it reaches the end of the file, at which point it constructs the CSS object model. Once the DOM and CSSOM are complete, the browser builds the render tree and computes the styles for all visible content. Once the render tree is complete, layout takes place, defining the location and size of every element in the render tree. After that, the page is rendered, or painted, onto the screen.

DOM (Document Object Model)

DOM construction is incremental. The HTML response turns into tokens, tokens turn into nodes, and nodes turn into the DOM tree. A single DOM node starts with a startTag token and ends with an endTag token. A node contains all the relevant information about its HTML element, and that information is described using tokens. Nodes are connected into the DOM tree according to the token hierarchy: if another pair of startTag and endTag tokens sits between a pair of startTag and endTag tokens, you have a node inside a node, which is how the hierarchy of the DOM tree is defined.

The greater the number of nodes, the longer subsequent events in the critical rendering path will take.
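
The token-to-tree step can be illustrated with a toy builder. This is only a sketch of the idea, not how a real HTML parser works: it assumes well-formed input, ignores text and attributes, and does not know about void elements such as img. The `DomNode` type and `buildTree` function are made up for the example.

```typescript
interface DomNode {
  tag: string;
  children: DomNode[];
}

// Builds a tree from startTag and endTag tokens.
function buildTree(html: string): DomNode {
  const root: DomNode = { tag: "#document", children: [] };
  // Stack of nodes whose endTag has not been seen yet.
  const open: DomNode[] = [root];
  const tokens = html.match(/<\/?[a-z][a-z0-9]*>/gi) ?? [];
  for (const token of tokens) {
    if (token.startsWith("</")) {
      // endTag: the current node is complete (never pop the root).
      if (open.length > 1) open.pop();
    } else {
      // startTag: create a node nested inside the innermost open node.
      const node: DomNode = { tag: token.slice(1, -1).toLowerCase(), children: [] };
      open[open.length - 1].children.push(node);
      open.push(node);
    }
  }
  return root;
}

const tree = buildTree("<html><body><h1>My Page</h1><p>text</p></body></html>");
console.log(JSON.stringify(tree, null, 2));
```

Because a startTag between another pair of tags is pushed under the innermost open node, the nesting of the tokens becomes the nesting of the tree.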

CSSOM

The DOM contains all the content of the page. The CSSOM contains all the styles of the page, that is, the information on how to display the DOM. The CSSOM is similar to the DOM, but different. DOM construction is incremental; CSSOM construction is not. CSS is render-blocking: the browser blocks page rendering until it has received and processed all of the CSS. CSS is render-blocking because rules can be overridden, so content cannot be rendered until the CSSOM is complete.

CSS has its own set of rules for identifying valid tokens. Remember that the C in CSS stands for “cascading”: CSS rules cascade down. As the parser converts tokens into nodes, the descendants of a node inherit its styles. The incremental processing that applies to HTML does not apply to CSS, because later rules may override earlier ones. The CSS object model is built as the CSS is parsed, but it cannot be used to build the render tree until it has been completely parsed, because styles that will be overridden by later parsing should not be rendered to the screen.

In terms of selector performance, less specific selectors are faster than more specific ones. For example, .foo {} is faster than .bar .foo {} because when the browser finds .foo, in the second case it has to walk up the DOM to check whether .foo has an ancestor .bar. The more specific the selector, the more work the browser has to do, but this penalty is not necessarily worth optimizing.

If you measure the time it takes to parse CSS, you will be amazed at how fast browsers are. More specific rules are more expensive because they have to traverse more nodes in the DOM tree, but the extra cost is usually minimal. Measure first, then optimize as needed. Specificity is probably not your low-hanging fruit: optimizing CSS selectors yields improvements only on the order of milliseconds. There are other ways to optimize CSS, such as minification and using media queries to split deferred CSS into non-blocking requests.
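
The extra work for a descendant selector can be sketched as follows. This is a toy matcher that only understands class names joined by descendant combinators; the `El` type and the `matches` function are hypothetical and are not the browser's actual implementation.

```typescript
interface El {
  classes: string[];
  parent: El | null;
}

// Matches a descendant selector such as ".bar .foo", given as ["bar", "foo"].
// It checks the rightmost part first and then walks up the ancestors.
function matches(el: El, classChain: string[]): boolean {
  const parts = classChain.slice();
  const key = parts.pop();
  if (key === undefined || el.classes.indexOf(key) === -1) return false;
  let ancestor = el.parent;
  while (parts.length > 0 && ancestor !== null) {
    // Each remaining part must be found on some ancestor further up.
    if (ancestor.classes.indexOf(parts[parts.length - 1]) !== -1) parts.pop();
    ancestor = ancestor.parent;
  }
  return parts.length === 0;
}

// .bar > (plain div) > .foo
const barEl: El = { classes: ["bar"], parent: null };
const middleEl: El = { classes: [], parent: barEl };
const fooEl: El = { classes: ["foo"], parent: middleEl };

console.log(matches(fooEl, ["foo"]));        // true, no ancestor walk needed
console.log(matches(fooEl, ["bar", "foo"])); // true, after walking two levels up
console.log(matches(fooEl, ["baz", "foo"])); // false, after walking to the root
```

The single-class case returns immediately, while the descendant case may visit every ancestor, which is where the extra cost of more specific selectors comes from.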

Render tree

The render tree captures both content and styles: the DOM and CSSOM trees are combined into the render tree. To construct it, the browser checks every node, starting from the root of the DOM tree, and determines which CSS rules apply.

The render tree contains only visible content. The head section (generally) does not contain any visible information, so it is not included in the render tree. If an element has display: none; set on it, neither it nor any of its descendants appear in the render tree.
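
The filtering rule can be expressed as a small recursive function. This is a sketch under simplifying assumptions: styles are already computed, only the display property is considered, and the `StyledNode` and `RenderNode` types are invented for the example.

```typescript
interface StyledNode {
  tag: string;
  display?: string; // computed display value; undefined means visible
  children?: StyledNode[];
}

interface RenderNode {
  tag: string;
  children: RenderNode[];
}

// Copies the visible part of a styled DOM tree into a render tree.
// The head and any display: none subtree are skipped entirely.
function toRenderTree(node: StyledNode): RenderNode | null {
  if (node.tag === "head" || node.display === "none") return null;
  const children: RenderNode[] = [];
  for (const child of node.children ?? []) {
    const rendered = toRenderTree(child);
    if (rendered !== null) children.push(rendered);
  }
  return { tag: node.tag, children };
}

const page: StyledNode = {
  tag: "html",
  children: [
    { tag: "head", children: [{ tag: "title" }] },
    {
      tag: "body",
      children: [
        { tag: "h1" },
        { tag: "div", display: "none", children: [{ tag: "img" }] },
        { tag: "p" },
      ],
    },
  ],
};

console.log(JSON.stringify(toRenderTree(page)));
```

In the output the head, the hidden div, and the img inside it are all absent, leaving html, body, h1, and p.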

Layout

Once the render tree is built, layout becomes possible. Layout depends on the size of the screen. The layout step determines where and how the elements are positioned on the page, the width and height of each element, and how the elements relate to one another.

What is the width of an element? Block-level elements, by definition, have a default width of 100% of their parent's width. An element with a width of 50% is half the width of its parent. Unless otherwise defined, the body has a width of 100%, meaning it is 100% of the width of the viewport. The width of the device therefore affects layout.

The viewport meta tag defines the width of the layout viewport and so affects layout. Without it, the browser uses the default viewport width, which on browsers that are full-screen by default is generally 960px. On such browsers, like the browser on your phone, setting <meta name="viewport" content="width=device-width"> makes the width the width of the device instead of the default viewport width. The device width changes when the user rotates the phone between landscape and portrait mode. Layout happens every time a device is rotated or the browser is resized.

Layout performance is affected by the DOM: the more nodes there are, the longer layout takes. Layout can become a bottleneck and cause jank if it is needed during scrolling or other animations. A 20 ms delay may be acceptable on load or on orientation change, but it leads to jank during animation or scrolling. Layout occurs whenever the render tree changes, for example when nodes are added, content is altered, or the box model styles of a node are updated.

To reduce the frequency and duration of layout events, batch updates and avoid animating box model properties.
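
Why batching helps can be shown with a toy model. In a real browser, reading a geometric property such as offsetWidth after a style change forces a synchronous layout. The `FakeBox` class below is a made-up stand-in that only counts those forced layouts; it does not touch the DOM.

```typescript
// Toy layout engine: a write marks layout as dirty, and a geometry read
// while dirty forces a layout pass, which is counted.
class FakeBox {
  static layouts = 0;
  private static dirty = false;
  private currentWidth = 100;

  static reset(): void {
    FakeBox.layouts = 0;
    FakeBox.dirty = false;
  }

  get offsetWidth(): number {
    if (FakeBox.dirty) {
      FakeBox.layouts += 1;
      FakeBox.dirty = false;
    }
    return this.currentWidth;
  }

  set width(value: number) {
    this.currentWidth = value;
    FakeBox.dirty = true;
  }
}

// Read, write, read, write: almost every read happens on a dirty layout.
function interleaved(boxes: FakeBox[]): void {
  for (const box of boxes) box.width = box.offsetWidth / 2;
}

// All reads first, then all writes: no read happens on a dirty layout.
function batched(boxes: FakeBox[]): void {
  const widths = boxes.map((box) => box.offsetWidth);
  boxes.forEach((box, i) => { box.width = widths[i] / 2; });
}

function countLayouts(update: (boxes: FakeBox[]) => void): number {
  FakeBox.reset();
  const boxes = [new FakeBox(), new FakeBox(), new FakeBox(), new FakeBox(), new FakeBox()];
  update(boxes);
  return FakeBox.layouts;
}

console.log(countLayouts(interleaved)); // 4
console.log(countLayouts(batched));     // 0
```

Both functions leave the boxes in the same final state; only the order of reads and writes differs.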

Paint

The last step is painting the pixels to the screen. Once the render tree is created and layout has occurred, the pixels can be painted. On load, the entire screen is painted. After that, only the affected areas of the screen are repainted, because browsers are optimized to repaint the smallest area required. Paint time depends on the kind of update being applied to the render tree. Painting is a very fast process, so it is probably not the most effective place to focus when improving performance. The important thing to remember is that both layout and repaint time must be taken into account when measuring how long an animation frame takes. The styles applied to each node increase the paint time, but removing a style that adds 0.001 ms to the paint may not give you the best return on your optimization effort. Remember to measure first. Then you can decide whether it should be an optimization priority.

Preload scanner

While the browser builds the DOM tree, the process occupies the main thread. As it does so, the preload scanner parses the content that is available and requests high-priority resources such as CSS, JavaScript, and web fonts. Thanks to the preload scanner, we do not have to wait until the parser finds a reference to an external resource before requesting it. It retrieves resources in the background so that by the time the main HTML parser reaches them, they may already be in flight or already downloaded. The optimizations the preload scanner provides reduce blocking.

<link rel="stylesheet" href="styles.css"/>
<script src="myscript.js" async></script>
<img src="myimage.jpg" alt="image description"/>
<script src="anotherscript.js" async></script>

In this example, while the main thread is parsing the HTML and CSS, the preload scanner finds the scripts and the image and starts downloading them as well. To make sure a script does not block the process, add the async attribute when the order of JavaScript parsing and execution is not important, or the defer attribute when it is.

Waiting to obtain CSS does not block HTML parsing or downloading, but it does block JavaScript, because JavaScript is often used to query the CSS properties of elements.

Step 4: Interaction

Once the main thread has finished painting the page, you might think we are “all set”, but that is not necessarily the case. If the load includes JavaScript that was correctly deferred and only runs after the onload event fires, the main thread may still be busy and unavailable for scrolling, touch, and other interactions.

Time to Interactive (TTI) measures how long it takes from the first request, the one that led to the DNS query and the SSL connection, until the page is interactive. Interactive here means the point in time after First Contentful Paint at which the page responds to user interaction within 50 ms. If the main thread is occupied parsing, compiling, and executing JavaScript, it is not available and therefore cannot respond to user interaction in a timely manner (in less than 50 ms).

In our example, the image may load quickly, but the anotherscript.js file might be 2 MB and the user's network connection slow. In that case, the user sees the page very quickly but cannot scroll without jank until the script has been downloaded, parsed, and executed. That is not a good user experience. Avoid occupying the main thread.
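
One common way to avoid occupying the main thread is to split long-running work into small slices and yield to the event loop between them. The sketch below shows the idea with setTimeout; the function names and the slice size are arbitrary choices for illustration.

```typescript
// Splits a list into slices of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const slices: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    slices.push(items.slice(i, i + size));
  }
  return slices;
}

// Processes one slice per task and yields between slices, so that pending
// input events can be handled in the gaps.
async function processInChunks<T>(
  items: T[],
  size: number,
  work: (item: T) => void,
): Promise<void> {
  for (const slice of chunk(items, size)) {
    slice.forEach(work);
    await new Promise<void>((resolve) => setTimeout(resolve, 0));
  }
}

const processed: number[] = [];
processInChunks([1, 2, 3, 4, 5], 2, (n) => { processed.push(n); })
  .then(() => console.log(processed));
```

The total amount of work is unchanged, but no single task runs for long, so the page can respond to input between slices.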

Summary of optimization methods

Navigation phase

  • Use dns-prefetch to resolve the IP addresses of domain names ahead of time, reducing navigation time
  • Serve resources from the same hostname where possible to reduce the number of DNS lookups needed
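
As a small illustration of the first point, the hint tags can be generated from the list of resources a page uses. This is a sketch only: the function name and example URLs are hypothetical, and in practice the tags are usually written by hand or by a build tool.

```typescript
// Builds one <link rel="dns-prefetch"> tag per third-party origin.
function dnsPrefetchTags(pageUrl: string, resourceUrls: string[]): string[] {
  const pageOrigin = new URL(pageUrl).origin;
  const origins = new Set<string>();
  for (const resource of resourceUrls) {
    const origin = new URL(resource, pageUrl).origin;
    // The page's own origin is already resolved by the time the HTML arrives.
    if (origin !== pageOrigin) origins.add(origin);
  }
  return Array.from(origins).map(
    (origin) => `<link rel="dns-prefetch" href="${origin}">`,
  );
}

// Hypothetical resources, for illustration only.
const hintTags = dnsPrefetchTags("https://example.com/", [
  "/styles.css",
  "https://fonts.example.net/font.woff2",
  "https://cdn.example.org/app.js",
  "https://cdn.example.org/vendor.js",
]);

console.log(hintTags.join("\n"));
```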

Response phase

  • Render the first screen of the page on the server with SSR
  • Keep the content required for the first screen within 14 KB where possible, so that it can be delivered in the first TCP flight

Parsing phase

  • Take advantage of the preload scanner and of resource hints such as preload and prefetch
  • Place CSS and script tags deliberately: CSS does not block HTML parsing but does block JavaScript execution, and JavaScript blocks HTML parsing unless it is marked defer or async
  • Use a CDN to speed up resource loading
  • Load critical CSS first by splitting out the CSS needed for the first screen

Interaction phase

Optimize the first screen time

Improving page loading speed requires prioritizing resources being loaded, controlling the order in which they are loaded, and reducing the size of these resources.

  • Reduce the number of critical resources by marking them async or deferring their download
  • Optimize the number of requests required and the file size of each request
  • Shorten the critical path by prioritizing critical resources and optimizing the order in which they are loaded

Coding optimization

Data access speed depends on where the data lives. Several factors affect it:

  • Literals and local variables are the fastest to access, while array elements and object members are relatively slow

  • The further a variable lookup has to travel from the local scope toward the global scope, the slower it is

  • The more deeply an object is nested, the slower it is to read

  • The deeper a member sits in the prototype chain, the slower it is to find

  • The recommended practice is to cache object member values in local variables, which speeds up access
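
The caching advice looks like this in practice. The `Config` shape and both functions are invented for the example. Modern engines optimize much of this away, so treat it as a readability-neutral habit and measure before relying on any speedup.

```typescript
// Hypothetical nested configuration object.
interface Config {
  limits: { retry: { max: number } };
}

// Reads config.limits.retry.max and values.length on every iteration.
function sumUncached(values: number[], config: Config): number {
  let total = 0;
  for (let i = 0; i < values.length; i++) {
    total += Math.min(values[i], config.limits.retry.max);
  }
  return total;
}

// Looks up the nested member and the length once, then uses local variables.
function sumCached(values: number[], config: Config): number {
  const max = config.limits.retry.max;
  const length = values.length;
  let total = 0;
  for (let i = 0; i < length; i++) {
    total += Math.min(values[i], max);
  }
  return total;
}

const config: Config = { limits: { retry: { max: 4 } } };
console.log(sumUncached([1, 5, 10], config)); // 9
console.log(sumCached([1, 5, 10], config));   // 9
```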

At runtime, the main performance bottleneck is the high cost of DOM operations. Here are some suggestions for DOM-related performance:

  • Accessing the DOM from JS is expensive, so reduce the number of times you do it. Cache DOM properties and elements, and cache the length of a DOM collection in a variable before using it in a loop. Reading a variable is much faster than reading the DOM.

  • Reflows and repaints are expensive. If an operation would trigger many of them, take the element out of the document flow first and put it back after processing, so that the browser reflows and repaints only twice (once when the element is detached and once when it is reinserted).

  • Make good use of event delegation

Here are some control-flow details that can improve performance slightly and are widely used in large open source projects (such as Vue):

  • Avoid for...in (it also enumerates properties on the prototype chain, so it is slow)

  • Looping in reverse in JS can improve performance slightly

  • Reduce the number of iterations

  • Loop-based iteration is faster than function-based iteration (the figure quoted is 8x, but measure in your own engine)

  • Replacing long if-else and switch chains with Map lookup tables can improve performance
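
The last point can be illustrated with a small lookup table. The status codes and function names are just an example; whether the Map version is actually faster depends on the engine and on the number of branches, so measure before switching.

```typescript
// Branching version: every call walks the chain until a condition matches.
function statusTextIfElse(code: number): string {
  if (code === 200) return "OK";
  else if (code === 404) return "Not Found";
  else if (code === 500) return "Internal Server Error";
  else return "Unknown";
}

// Table version: one lookup regardless of how many entries there are.
const statusTexts = new Map<number, string>([
  [200, "OK"],
  [404, "Not Found"],
  [500, "Internal Server Error"],
]);

function statusTextMap(code: number): string {
  return statusTexts.get(code) ?? "Unknown";
}

console.log(statusTextIfElse(404)); // Not Found
console.log(statusTextMap(404));    // Not Found
console.log(statusTextMap(418));    // Unknown
```

Besides any speed difference, the table keeps the data separate from the control flow, so adding a case means adding one entry.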

Static resource optimization

  • Compress plain-text resources with Brotli or Zopfli

  • Use responsive images via srcset, sizes, and the <picture> element whenever possible. You can also serve WebP images via the <picture> element.

  • Shrink and split the build output using tree-shaking, code-splitting, and similar techniques

Network optimization

  • Use HTTP caching

  • Use HTTP/2 to multiplex requests over a single connection