Browser Cache

Some time ago I ran into a browser caching problem and had no idea how to approach it. Looking back later, I realized I had already come across many of the pieces but had never organized them systematically, so I am using this post to do that.

The post mainly covers the browser’s strong cache, the negotiation cache mechanism, and the corresponding HTTP headers.

I may cover the browser’s Cache API in a future post.

Strong cache and negotiation cache

Browser caching is divided into strong cache and negotiation cache, and there are two obvious differences between them:

  1. If the browser hits the strong cache, it does not send a request to the server at all. With the negotiation cache, the server ultimately decides whether the cached copy can be used, so the client and the server do communicate.
  2. In Chrome, a strong cache hit is shown with the status '200 (from cache)', even though no real HTTP request was made, while a negotiation cache hit is shown with the status '304 (Not Modified)'. Browsers differ in how they present this; Firefox, for example, may report a from-cache hit as 304.

Caching mechanism

First, here is a general overview of the matching process:

  1. Before sending a request, the browser checks the Expires and Cache-Control headers stored with the cached response to decide whether the strong cache is hit (including whether the copy has expired). If it is hit, the resource is taken directly from the cache and no request is sent. If not, go to the next step.
  2. If the strong cache is not hit, the browser sends a request carrying validators derived from the cached response’s Last-Modified and ETag headers, and the server uses them to decide whether the negotiation cache is hit. If it is hit, the resource is taken from the cache. If not, go to the next step.
  3. If neither of the first two steps hits, the resource is fetched from the server.

Request flow

The browser caches the resource after the first request. When the same request is made again, it goes through the following two steps:

  1. The browser reads the headers of the cached resource and uses Expires and Cache-Control from the response headers to decide whether the strong cache is hit. If it is hit, the resource is taken directly from the cache.
  2. If the strong cache is not hit, the browser sends a request to the server carrying If-Modified-Since or If-None-Match, whose values are the Last-Modified or ETag returned by the first response. The server compares these fields to decide whether there is a hit. On a hit, the server returns status code 304 without the resource content, and the browser takes the resource from its cache. Otherwise, the server returns the actual content of the resource and updates the relevant cache fields in the headers.
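The two steps above can be sketched as a small decision function. This is only an illustration of the flow, not how any browser is actually implemented; the CachedEntry record, its field names, and plan_request are hypothetical, and only max-age is considered for the strong cache check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedEntry:
    """Hypothetical record of a stored response; field names are illustrative."""
    stored_at: float                      # epoch seconds when the response was stored
    max_age: Optional[int] = None         # from Cache-Control: max-age, if present
    etag: Optional[str] = None            # from the ETag response header
    last_modified: Optional[str] = None   # from the Last-Modified response header


def plan_request(entry: Optional[CachedEntry], now: float) -> dict:
    """Return what the browser would do next for a repeated request."""
    if entry is None:
        return {"action": "fetch", "headers": {}}

    # Step 1: strong cache. A fresh entry is used without contacting the server.
    if entry.max_age is not None and now - entry.stored_at < entry.max_age:
        return {"action": "use_cache", "headers": {}}

    # Step 2: negotiation cache. Send validators so the server can answer 304.
    headers = {}
    if entry.etag:
        headers["If-None-Match"] = entry.etag
    if entry.last_modified:
        headers["If-Modified-Since"] = entry.last_modified
    if headers:
        return {"action": "revalidate", "headers": headers}

    # No validators stored: nothing to negotiate with, fetch the full resource.
    return {"action": "fetch", "headers": {}}
```

With an entry stored at t=1000 and max-age=60, a request at t=1030 is answered from the cache, while a request at t=1100 is sent to the server with If-None-Match.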

Strong cache

Strong caching is controlled by the Expires or Cache-Control field in the response headers, both of which describe how long the cached resource remains valid.

  • Expires comes from the HTTP/1.0 specification. Its value is a point in time as a GMT-format string, such as 'Expires: Mon, 18 Oct 2066 23:59:59 GMT', and represents the moment the resource expires. If the current time is earlier than that moment, the cache is hit. The drawback is that the expiration time is absolute: if the server clock and the client clock differ by a large margin, caching becomes unreliable. Since such a difference is common, Expires causes some trouble in practice.
  • Cache-Control comes from the HTTP/1.1 specification. The max-age value of this field is the one most often used for the decision, and it is a relative time. For example, 'Cache-Control: max-age=3600' means the resource is valid for 3600 seconds. The Date response header gives the time the message was sent, so the resource is valid during the period from Date to Date + 3600s. In practice, though, I often see a revisit within the max-age period come back as '304 Not Modified', which I attribute to the server time and the local time being different. Cache-Control accepts several other values, which are used less often:
    • no-cache does not use the local copy directly; negotiation cache validation is required.
    • no-store forbids the browser from caching the data at all. Every request fetches the complete resource from the server, similar to 'Disable cache' in the Network panel.
    • public means the response can be cached by everyone, including end users and intermediate proxy servers such as a CDN.
    • private means the response can be cached only by the end user’s browser.

If Cache-Control and Expires are both present, Cache-Control takes precedence over Expires.
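To make the precedence concrete, here is a minimal sketch of how a freshness lifetime could be derived from response headers, assuming headers arrive as a plain dict. The function names freshness_lifetime and is_fresh are hypothetical, and real caches handle more cases (s-maxage, heuristic freshness, malformed dates) than this does.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def freshness_lifetime(headers: dict) -> Optional[float]:
    """Seconds the response stays fresh, or None if the headers do not say."""
    # Cache-Control: max-age wins when it is present.
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.strip().lower() == "max-age" and value.strip().isdigit():
            return float(value.strip())

    # Fall back to Expires, measured against the server's own Date header
    # so that the absolute time is turned into a relative one.
    expires = headers.get("Expires")
    date = headers.get("Date")
    if expires and date:
        delta = parsedate_to_datetime(expires) - parsedate_to_datetime(date)
        return max(delta.total_seconds(), 0.0)
    return None


def is_fresh(headers: dict, age_seconds: float) -> bool:
    """True if a copy that is age_seconds old is still a strong cache hit."""
    lifetime = freshness_lifetime(headers)
    return lifetime is not None and age_seconds < lifetime
```

Measuring Expires against Date rather than the local clock is one way to sidestep the clock difference described above; it is a design choice of this sketch, not something the original post prescribes.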

Negotiation cache

With the negotiation cache, the server decides whether the cached resource can be used. It involves two pairs of header fields that always appear together: if the response to the first request contains Last-Modified or ETag, subsequent requests carry the corresponding request field If-Modified-Since or If-None-Match. If the response has neither Last-Modified nor ETag, the request will not carry the corresponding field.

  • Last-Modified/If-Modified-Since: both values are time strings in GMT format. Last-Modified marks the time the file was last modified. On the next request, the request header If-Modified-Since carries that Last-Modified value, telling the server when the locally cached file was last modified. The server compares it with the file’s actual last modification time to judge whether the resource has changed. If it has not changed, the server returns '304 Not Modified' without the resource content, and the browser uses its local cache. A 304 Not Modified response does not add Last-Modified to its headers to update the locally stored value, because an unchanged resource has an unchanged Last-Modified. If the resource has changed, the server returns the content normally along with a new Last-Modified in the response headers; the browser updates its stored value, and the next request sends the updated value in If-Modified-Since.
  • ETag/If-None-Match: the value is a unique identifier string that the server generates for each resource, and it changes whenever the resource changes. The server computes a hash from the file itself and returns it to the browser in the ETag field. When it later receives If-None-Match, the server compares the two values to determine whether the file content has changed. Unlike Last-Modified, a '304 Not Modified' response does include ETag in its headers, because the ETag has been recalculated on the server, even if it is identical to the previous one.
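The server side of this exchange can be sketched as follows. This is a simplified illustration under stated assumptions: respond and make_etag are hypothetical names, the ETag is a truncated SHA-256 of the body (servers are free to generate it differently), and weak validators and other edge cases are ignored.

```python
import hashlib
from email.utils import formatdate, parsedate_to_datetime


def make_etag(body: bytes) -> str:
    """Hypothetical ETag scheme: a quoted, truncated hash of the content."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(request_headers: dict, body: bytes, mtime: float):
    """Return (status, response_headers, payload) for a GET of one resource."""
    etag = make_etag(body)
    full_headers = {"ETag": etag, "Last-Modified": formatdate(mtime, usegmt=True)}

    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When If-None-Match is present it decides the outcome.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if etag in candidates or "*" in candidates:
            return 304, {"ETag": etag}, b""   # 304 carries the ETag, no body
        return 200, full_headers, body

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since).timestamp()
        # HTTP dates have one-second resolution, so compare whole seconds.
        if int(mtime) <= since:
            return 304, {"ETag": etag}, b""

    return 200, full_headers, body
```

A client that echoes back the ETag or Last-Modified from the first 200 response gets a 304 with an empty payload as long as the body and modification time are unchanged.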

HTTP

Why have ETag

ETag was introduced in HTTP/1.1 mainly to solve several problems with Last-Modified:

  • Some files are rewritten periodically without their content changing (only the modification time changes). In that case we do not want the client to treat the file as modified and GET it again.
  • Some files are modified very frequently, for example N times within one second. If-Modified-Since can only check at one-second granularity, whereas ETag lets the client refresh its cache N times within that second.
  • Some servers cannot accurately obtain the last modification time of a file.
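The first two problems can be reproduced in a few lines. The sketch below assumes a content-hash ETag (the validators helper and the hashing scheme are illustrative, not a standard) and compares it with the Last-Modified string a server would send.

```python
import hashlib
from email.utils import formatdate


def validators(body: bytes, mtime: float):
    """Return (etag, last_modified) for a file; the ETag scheme is an assumption."""
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    return etag, formatdate(mtime, usegmt=True)


# Case 1: the file is rewritten ten minutes later with identical content.
etag_a, lm_a = validators(b"body { color: red }", 1_700_000_000.0)
etag_b, lm_b = validators(b"body { color: red }", 1_700_000_600.0)
touched_same_etag = etag_a == etag_b          # ETag says "unchanged"
touched_same_last_modified = lm_a == lm_b     # Last-Modified says "changed"

# Case 2: two different versions are written within the same second.
etag_v1, lm_v1 = validators(b"v1", 1_700_000_000.2)
etag_v2, lm_v2 = validators(b"v2", 1_700_000_000.8)
burst_same_etag = etag_v1 == etag_v2          # ETag tells the versions apart
burst_same_last_modified = lm_v1 == lm_v2     # Last-Modified cannot
```

In the first case only Last-Modified changes, which would force a needless re-download; in the second case only the ETag changes, so Last-Modified alone would serve a stale copy.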

Cache-Control

Cacheability

  • public

    Indicates that the response can be cached by any cache (including the client that sent the request, proxy servers, and so on), even if the content is not normally cacheable (for example, the response has no max-age directive or Expires header, or the request method for the response is POST).

  • private

    Indicates that the response can only be cached by a single user and cannot be used as a shared cache (that is, the proxy server cannot cache it). Private caches can cache the response content, such as the local browser of the corresponding user.

  • no-cache

    Forces the cache to submit the request to the origin server for validation (negotiation cache validation) before serving the cached copy.

  • no-store

    The cache should not store anything about client requests or server responses, i.e. no cache is used.

Expiration

  • max-age=<seconds>

    Sets the maximum time, in seconds, that the response may be kept in the cache before it is considered expired. Unlike Expires, the time is relative to the time of the request.

  • s-maxage=<seconds>

    Overrides max-age or the Expires header, but only for shared caches (such as proxies); private caches ignore it.

  • max-stale[=<seconds>]

    Indicates that the client is willing to accept a response that has expired. An optional number of seconds limits how far past expiration the response may be.

  • min-fresh=<seconds>

    Indicates that the client wants a response that will remain fresh for at least the specified number of seconds.

  • stale-while-revalidate=<seconds>

    Indicates that the client is willing to accept stale responses while checking for new ones asynchronously in the background. The seconds value indicates how long the client is willing to accept stale responses.

  • stale-if-error=<seconds>

    Indicates that the client is willing to accept a stale response if the check for a new one fails. The seconds value indicates how long after the initial expiration the client is willing to accept the stale response.
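How max-age, stale-while-revalidate, and stale-if-error interact can be shown with a small decision function. This is a sketch under simplifying assumptions: choose_response and its return labels are hypothetical, all values are in seconds, and both stale windows are measured from the moment the response expired.

```python
def choose_response(age: float, max_age: float,
                    stale_while_revalidate: float = 0,
                    stale_if_error: float = 0,
                    origin_failed: bool = False) -> str:
    """Decide how a cache could answer, given the age of its stored copy."""
    if age < max_age:
        return "fresh"                       # still within max-age

    staleness = age - max_age                # seconds past expiration
    if staleness < stale_while_revalidate:
        # Serve the stale copy now and refresh it in the background.
        return "stale-while-revalidate"
    if origin_failed and staleness < stale_if_error:
        # Revalidation failed, but the stale copy is still acceptable.
        return "stale-if-error"
    return "fetch"                           # must get a new response
```

For example, with max-age=600 and stale-while-revalidate=30, a copy that is 610 seconds old is served immediately while a background check runs, and a copy that is 700 seconds old requires a new fetch unless stale-if-error applies.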

Revalidation and reloading

  • must-revalidate

    Once a resource has expired (for example, it has passed max-age), the cache cannot use it to respond to subsequent requests until it has been successfully validated with the origin server.

  • proxy-revalidate

    Same effect as must-revalidate, but it only applies to shared caches (such as proxies) and is ignored by private caches.

  • immutable

    Indicates that the response body does not change over time. The resource (if not expired) does not change on the server, so the client should not send a conditional revalidation request (for example with If-None-Match or If-Modified-Since) to check for updates, even if the user explicitly refreshes the page. In Firefox, immutable can only be used in 'https://' transactions.

Other

  • no-transform

    The resource must not be converted or transformed. HTTP headers such as Content-Encoding, Content-Range, and Content-Type must not be modified by proxies. For example, a non-transparent proxy or Google’s Light Mode might convert image formats to save cache space or reduce traffic on slow links. The no-transform directive does not allow this.

  • only-if-cached

    Indicates that the client only accepts cached responses and does not check with the origin server for an updated copy.
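All of the directives above travel in a single comma-separated header value. A minimal parser might look like the sketch below; parse_cache_control is a hypothetical helper, and it deliberately does not handle quoted arguments that themselves contain commas.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control value into {directive: True | int | str}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        name = name.strip().lower()          # directive names are case-insensitive
        if not sep:
            directives[name] = True          # flag such as no-store or public
        else:
            arg = arg.strip().strip('"')
            # Numeric arguments (max-age, s-maxage, ...) become integers.
            directives[name] = int(arg) if arg.isdigit() else arg
    return directives
```

For instance, parsing 'public, max-age=3600, stale-while-revalidate=60' yields a dict with public set to True and the two numeric directives as integers.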

Priority

Cache-Control > Expires > ETag > Last-Modified

Three-level cache principle

Finally, to summarize the browser’s three-level caching principle:

  1. Look in memory first. If the resource is there, load it directly.
  2. If it is not in memory, look on the hard disk. If the resource is there, load it directly.
  3. If it is not on the hard disk, make a network request.
  4. Store the loaded resource in the cache on the hard disk and in memory.
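The lookup order above can be sketched with two dictionaries standing in for memory and disk. ThreeLevelCache, its attributes, and the fetch callback are hypothetical names for illustration, and copying a disk hit into memory is an assumption of this sketch rather than something the list states.

```python
from typing import Callable, Dict, Tuple


class ThreeLevelCache:
    """Toy model of memory -> disk -> network lookup."""

    def __init__(self, fetch: Callable[[str], bytes]):
        self.memory: Dict[str, bytes] = {}
        self.disk: Dict[str, bytes] = {}     # stand-in for on-disk storage
        self.fetch = fetch                   # performs the network request

    def load(self, url: str) -> Tuple[bytes, str]:
        # 1. Memory first.
        if url in self.memory:
            return self.memory[url], "memory"
        # 2. Then disk; keep a copy in memory for next time (an assumption).
        if url in self.disk:
            body = self.disk[url]
            self.memory[url] = body
            return body, "disk"
        # 3. Otherwise go to the network.
        body = self.fetch(url)
        # 4. Store the result on disk and in memory.
        self.disk[url] = body
        self.memory[url] = body
        return body, "network"
```

Loading the same URL twice hits the network once and memory afterwards; if the memory dict is cleared, which roughly models closing the tab, the next load comes from disk.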

Reference article: https://segmentfault.com/a/1190000021661656