All caches have a set of rules that they use to determine when to serve an object from the cache, if it's available. Some of these rules are set in the protocols (HTTP 1.0 and 1.1), and some are set by the administrator of the cache (either the user of the browser cache, or the proxy administrator).
Generally speaking, these are the most common rules that are followed for a particular request (don't worry if you don't understand the details, it will be explained below):
Together, freshness and validation are the most important ways that a cache works with content. A fresh object will be available instantly from the cache, while a validated object will avoid sending the entire object over again if it hasn't changed.
There are several tools that Web designers and Webmasters can use to fine-tune how caches will treat their sites. It may require getting your hands a little dirty with the server configuration, but the results are worth it. For details on how to use these tools with your server, see the Implementation sections below.
HTML authors can put tags in a document's <HEAD> section that describe its attributes. These Meta tags are often used in the belief that they can mark a document as uncacheable, or expire it at a certain time.
Meta tags are easy to use, but aren't very effective. That's because they're usually only honored by browser caches (which actually read the HTML), not proxy caches (which almost never read the HTML in the document). While it may be tempting to slap a Pragma: no-cache meta tag on a home page, it won't necessarily cause it to be kept fresh, if it goes through a shared cache.
On the other hand, true HTTP headers give you a lot of control over how both browser caches and proxies handle your objects. They can't be seen in the HTML, and are usually automatically generated by the Web server. However, you can control them to some degree, depending on the server you use. In the following sections, you'll see what HTTP headers are interesting, and how to apply them to your site.
HTTP headers are sent by the server before the HTML, and only seen by the browser and any intermediate caches. Typical HTTP 1.1 response headers might look like this:
HTTP/1.1 200 OK Date: Fri, 30 Oct 1998 13:19:41 GMT Server: Apache/1.3.3 (Unix) Cache-Control: max-age=3600, must-revalidate Expires: Fri, 30 Oct 1998 14:19:41 GMT Last-Modified: Mon, 29 Jun 1998 02:28:12 GMT ETag: "3e86-410-3596fbbc" Content-Length: 1040 Content-Type: text/html
The HTML document would follow these headers, separated by a blank line.
Many people believe that assigning a Pragma: no-cache HTTP header to an object will make it uncacheable. This is not necessarily true; the HTTP specification does not set any guidelines for Pragma response headers; instead, Pragma request headers (the headers that a browser sends to a server) are discussed. Although a few caches may honor this header, the majority won't, and it won't have any effect. Use the headers below instead.
The Expires HTTP header is the basic means of controlling caches; it tells all caches how long the object is fresh for; after that time, caches will always check back with the origin server to see if a document is changed. Expires headers are supported by practically every client.
Most Web servers allow you to set Expires response headers in a number of ways. Commonly, they will allow setting an absolute time to expire, a time based on the last time that the client saw the object (last access time), or a time based on the last time the document changed on your server (last modification time).
Expires headers are especially good for making static images (like navigation bars and buttons) cacheable. Because they don't change much, you can set extremely long expiry time on them, making your site appear much more responsive to your users. They're also useful for controlling caching of a page that is regularly changed. For instance, if you update a news page once a day at 6am, you can set the object to expire at that time, so caches will know when to get a fresh copy, without users having to hit 'reload'.
The only value valid in an Expires header is a HTTP date; anything else will most likely be interpreted as 'in the past', so that the object is uncacheable. Also, remember that the time in a HTTP date is Greenwich Mean Time (GMT), not local time.
Expires: Fri, 30 Oct 1998 14:19:41 GMT
Although the Expires header is useful, it is still somewhat limited; there are many situations where content is cacheable, but the HTTP 1.0 protocol lacks methods of telling caches what it is, or how to work with it.
HTTP 1.1 introduces a new class of headers, the Cache-Control response headers, which allow Web publishers to define how pages should be handled by caches. They include directives to declare what should be cacheable, what may be stored by caches, modifications of the expiration mechanism, and revalidation and reload controls.
Interesting Cache-Control response headers include:
Cache-Control: max-age=3600, must-revalidate
If you plan to use the Cache-Control headers, you should have a look at the excellent documentation in the HTTP 1.1 draft; see References and Further Information.
In How Web Caches Work, we said that validation is used by servers and caches to communicate when an object has changed. By using it, caches avoid having to download the entire object when they already have a copy locally, but they're not sure if it's still fresh.
Validators are very important; if one isn't present, and there isn't any freshness information (Expires or Cache-Control) available, most caches will not store an object at all.
The most common validator is the time that the document last changed, the Last-Modified time. When a cache has an object stored that includes a Last-Modified header, it can use it to ask the server if the object has changed since the last time it was seen, with an If-Modified-Since request.
HTTP 1.1 introduced a new kind of validator called the ETag. ETags are unique identifiers that are generated by the server and changed every time the object does. Because the server controls how the ETag is generated, caches can be surer that if the ETag matches when they make a If-None-Match request, the object really is the same.
Almost all caches use Last-Modified times in determining if an object is fresh; as more HTTP/1.1 caches come online, Etag headers will also be used.
Most modern Web servers will generate both ETag and Last-Modified validators for static content automatically; you won't have to do anything. However, they don't know enough about dynamic content (like CGI, ASP or database sites) to generate them; see Writing Cache-Aware Scripts.
Sun Sep 10 20:46:50 EDT 2006