Code: A specific caching example

Chapter 9 - Advanced Web Performance Optimization

Let’s look at a specific example as we build up the caching efficiency for WebSiteOptimization.com's logo, l.gif. First we request the image from Internet Explorer:

GET /l.gif HTTP/1.1
Accept: */*
Referer: http://www.websiteoptimization.com/
Accept-Language: en-us
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705;
.NET CLR 2.0.50727; .NET CLR 1.1.4322; Media Center PC 4.0)
Proxy-Connection: Keep-Alive
Host: www.websiteoptimization.com

To demonstrate the default Apache configuration, we eliminated the cache control directives from our httpd.conf file, and the response was as follows:

HTTP/1.1 200 OK
date: Mon, 22 Oct 2007 23:32:20 GMT
server: Apache
last-modified: Sat, 19 Jun 2004 15:25:21 GMT
etag: "10690a1-4f2-40d45ae1"
accept-ranges: bytes
content-length: 1266
content-type: image/gif

This image was last modified June 19, 2004 and will not be changed for some time. Next we’ll show how to add cache control headers.
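
The response above carries validators (Last-Modified and ETag) but no Expires or Cache-Control header, so nothing tells a cache how long it may reuse the image. A minimal Python sketch of that check follows; the helper name has_explicit_freshness is hypothetical, and the header values are copied from the response above.

```python
def has_explicit_freshness(headers):
    """Return True if the response states its own freshness lifetime."""
    # Header names are case-insensitive, so normalize before comparing.
    names = {name.lower(): value for name, value in headers.items()}
    cache_control = names.get("cache-control", "").lower()
    return "expires" in names or "max-age" in cache_control


# Headers from the default Apache response shown above.
default_response = {
    "date": "Mon, 22 Oct 2007 23:32:20 GMT",
    "server": "Apache",
    "last-modified": "Sat, 19 Jun 2004 15:25:21 GMT",
    "etag": '"10690a1-4f2-40d45ae1"',
    "accept-ranges": "bytes",
    "content-length": "1266",
    "content-type": "image/gif",
}

print(has_explicit_freshness(default_response))  # False
```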

Cache control with mod_expires and mod_headers

For Apache 1.3.x, enable the expires and headers modules by adding the following lines to your httpd.conf configuration file:

LoadModule expires_module libexec/mod_expires.so
LoadModule headers_module libexec/mod_headers.so
AddModule mod_expires.c
AddModule mod_headers.c
...

For Apache 2.0, enable the modules in your httpd.conf file like so:

LoadModule expires_module modules/mod_expires.so
LoadModule headers_module modules/mod_headers.so
...

Target files by extension for caching

One quick way to enable cache control headers for existing sites is to target files by extension. Although this method has some disadvantages (notably the requirement of file extensions), it has the virtue of simplicity. To turn on mod_expires, set ExpiresActive to on:

ExpiresActive On

Next, target your website’s root HTML directory to enable caching for your site in one fell swoop. Note that the default web root shown in the following code (/var/www/htdocs) varies among operating systems.

<Directory "/var/www/htdocs">
    Options FollowSymLinks MultiViews
    AllowOverride All
    Order allow,deny
    Allow from all
    ExpiresDefault A300
    <FilesMatch "\.html$">
        ExpiresDefault A86400
    </FilesMatch>
    <FilesMatch "\.(gif|jpg|png|js|css)$">
        ExpiresDefault A31536000
    </FilesMatch>
</Directory>

ExpiresDefault A300 sets the default expiry time to 300 seconds after access (A); using M300 would set the expiry time to 300 seconds after file modification. The first FilesMatch section sets the expiry time for all .html files to 86,400 seconds (one day). The second FilesMatch section sets the expiry time for all images, external JavaScript, and Cascading Style Sheet (CSS) files to 31,536,000 seconds (one year).
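
To make the A/M codes easier to read, here is a minimal Python sketch that splits a code into its base and its offset in seconds, then prints the offset in days. The function name parse_expires_code is hypothetical; the three codes are the ones used in the configuration above.

```python
import re


def parse_expires_code(code):
    """Split a short expiry code such as 'A300' into (base, seconds)."""
    match = re.fullmatch(r"([AM])(\d+)", code)
    if match is None:
        raise ValueError(f"not an A/M expiry code: {code!r}")
    # A counts from the time of access, M from the file's modification time.
    base = "access" if match.group(1) == "A" else "modification"
    return base, int(match.group(2))


for code in ("A300", "A86400", "A31536000"):
    base, seconds = parse_expires_code(code)
    print(f"{code}: {seconds} s after {base} = {seconds / 86400:g} days")
```

Dividing by 86,400 confirms that A86400 is one day and A31536000 is 365 days.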

Note that you can target your files with a more granular approach using multiple directory sections, like this:

<Directory "/var/www/htdocs/images/logos/">

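For truly dynamic content, you can force resources not to be cached by sending a max-age of zero seconds together with no-store, which tells caches not to keep the resource anywhere (alternatively, you can set the expiry offset to A0 or M0 so that the resource is stale immediately):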

<Directory "/var/www/cgi-bin/">
    Header Set Cache-Control "max-age=0, no-store"
</Directory>

Target files by MIME type

The disadvantage of the preceding method is its reliance on the existence of file extensions. In some cases, webmasters elect to use URIs without extensions for portability. A better method is to use the ExpiresByType command of the mod_expires module. As the name implies, ExpiresByType targets resources for caching by MIME type, like this:

<VirtualHost 10.1.1.100>
    ...
    ExpiresActive On
    ExpiresDefault "access plus 300 seconds"
    <Directory "/var/www/htdocs">
        Options FollowSymLinks MultiViews
        AllowOverride All
        Order allow,deny
        Allow from all
        ExpiresByType text/html "access plus 1 day"
        ExpiresByType text/css "access plus 1 year"
        ExpiresByType text/javascript "access plus 1 year"
        ExpiresByType image/gif "access plus 1 year"
        ExpiresByType image/jpeg "access plus 1 year"
        ExpiresByType image/png "access plus 1 year"
    </Directory>
</VirtualHost>

These httpd.conf directives set the same parameters, only in a more flexible and readable way. For expiry commands you can use access (or its synonym, now) or modification as the base, depending on whether you want to start counting from the time the resource was accessed or from the last time the file was modified. In the case of WebSiteOptimization.com, we chose to use short access offsets for text files likely to change, and longer access offsets for infrequently changing images.
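
The following Python sketch shows how the readable syntax maps back to the second counts used earlier. It is an illustration, not Apache's parser: the names UNIT_SECONDS and interval_seconds are hypothetical, and the unit lengths (a 365-day year, a 30-day month) are assumptions chosen to agree with the one-year value of 31,536,000 seconds used in this section.

```python
# Unit lengths in seconds. Assumption: a year is 365 days and a month is
# 30 days, which agrees with 31,536,000 seconds for "1 year".
UNIT_SECONDS = {
    "second": 1,
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "week": 7 * 86400,
    "month": 30 * 86400,
    "year": 365 * 86400,
}


def interval_seconds(spec):
    """Convert e.g. 'access plus 1 day' to ('access', 86400)."""
    words = spec.split()
    base, rest = words[0], words[1:]
    if rest and rest[0] == "plus":  # the "plus" keyword is optional
        rest = rest[1:]
    if not rest or len(rest) % 2 != 0:
        raise ValueError(f"expected <number> <unit> pairs in {spec!r}")
    total = 0
    # Pairs may be chained, e.g. "1 month 15 days 2 hours".
    for amount, unit in zip(rest[0::2], rest[1::2]):
        total += int(amount) * UNIT_SECONDS[unit.rstrip("s")]
    return base, total


for spec in ("access plus 300 seconds", "access plus 1 day", "access plus 1 year"):
    print(spec, "->", interval_seconds(spec))
```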

Note the AllowOverride All command. This allows webmasters to override these settings with .htaccess files for directory-based authentication and redirection. However, overriding the httpd.conf file causes a performance hit because Apache must traverse the directory tree looking for .htaccess files.
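
To illustrate where that cost comes from, here is a rough Python sketch that lists the .htaccess files a server would have to look for on each request. It assumes overrides are disabled above the web root, so the search starts there; the function name htaccess_candidates and the nested location of l.gif are hypothetical.

```python
from pathlib import PurePosixPath


def htaccess_candidates(override_root, requested_file):
    """List the .htaccess paths from override_root down to the file's directory."""
    root = PurePosixPath(override_root)
    directory = PurePosixPath(requested_file).parent
    # Raises ValueError if the file does not live under override_root.
    relative = directory.relative_to(root)
    candidates = [root / ".htaccess"]
    current = root
    for part in relative.parts:
        current = current / part
        candidates.append(current / ".htaccess")
    return [str(path) for path in candidates]


for path in htaccess_candidates("/var/www/htdocs",
                                "/var/www/htdocs/images/logos/l.gif"):
    print(path)
```

Each extra directory level adds one more file lookup per request, which is why deeply nested content pays more for AllowOverride All.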

After updating the httpd.conf file with the preceding MIME-based code, we restart the Apache HTTP daemon on Linux using this command from the shell prompt:

service httpd restart

HTTP header results

We updated the httpd.conf configuration file with the MIME type code in the preceding section. Let’s look at how the headers change when we request the WebSiteOptimization.com logo (l.gif):

GET /l.gif HTTP/1.1
Accept: */*
Referer: http://www.websiteoptimization.com/
Accept-Language: en-us
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705;
.NET CLR 2.0.50727; .NET CLR 1.1.4322; Media Center PC 4.0)
Proxy-Connection: Keep-Alive
Host: www.websiteoptimization.com

The headers for our home page logo now look like this:

HTTP/1.1 200 OK
Date: Thu, 25 Oct 2007 12:51:13 GMT
Server: Apache
Cache-Control: max-age=31536000
Expires: Fri, 24 Oct 2008 12:51:13 GMT
Last-Modified: Sat, 19 Jun 2004 15:25:21 GMT
ETag: "10690a1-4f2-40d45ae1"
Accept-Ranges: bytes
Content-Length: 1266
Content-Type: image/gif
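
The Expires value in this response should equal the Date value plus the max-age lifetime. A minimal Python sketch of that arithmetic, using only the standard library and the header values copied from the response above:

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

# Values copied from the response headers above.
date = parsedate_to_datetime("Thu, 25 Oct 2007 12:51:13 GMT")
expires = parsedate_to_datetime("Fri, 24 Oct 2008 12:51:13 GMT")
max_age = 31536000  # from Cache-Control: max-age=31536000

lifetime = expires - date
print(lifetime == timedelta(seconds=max_age))  # True
print(lifetime.days)  # 365
```

The two dates are 365 days apart rather than a full calendar year because the interval spans February 29, 2008.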

As a result, this resource now has cache control headers. We left the ETag in because we use a single server. Note that the Server field is also stripped down, to save some header overhead. This is done with the ServerTokens command (the Prod setting reduces the header to the bare product name; Min would still include the version number):

ServerTokens Prod

This minimizes the response header from this:

Server: Apache/1.3.31 (Unix) mod_gzip/1.3.26.1a mod_auth_passthrough/1.8
mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.3.8 FrontPage/5.0.2.2634a mod_ssl/2.8.19
OpenSSL/0.9.7a

to the minimal:

Server: Apache
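
As a rough measure of what this saves, the following Python sketch compares the byte length of the two Server header lines. It assumes the long header, wrapped above for print, is sent as a single line with one space at each wrap point.

```python
# Assumption: the wrapped header above is one line joined by single spaces.
full = ("Server: Apache/1.3.31 (Unix) mod_gzip/1.3.26.1a "
        "mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 "
        "PHP/4.3.8 FrontPage/5.0.2.2634a mod_ssl/2.8.19 OpenSSL/0.9.7a")
minimal = "Server: Apache"

# Header lines are ASCII, so encoded length equals the bytes on the wire
# (not counting the trailing CRLF, which both lines share).
saved = len(full.encode("ascii")) - len(minimal.encode("ascii"))
print(f"{saved} bytes saved per response")
```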

Our images are now cacheable for one year. We could eliminate other headers, such as Cache-Control, ETag, and Accept-Ranges, but we would gain less by doing so.