Compressing the Web

Compression is one of those technologies where it seems like you get something for nothing. Compression saves bandwidth and speeds up web sites by removing redundancy to reduce the amount of data sent. Although compression is not free (the server and browser spend CPU time encoding and decoding), on networks like the Internet transmission time is usually the limiting factor. This chapter shows you how to compress the text in your content to minimize bandwidth costs and maximize speed. See Chapter 12, "Optimizing Web Graphics," and Chapter 13, "Minimizing Multimedia," for information on compressing graphics and multimedia.
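
To see what "removing redundancy" means in practice, the following Python sketch compresses highly repetitive markup and an equal-length block of random bytes. The use of Python's standard zlib module and the sample data are illustrative assumptions, not part of this chapter. The markup should shrink to a small fraction of its size, while the random bytes stay about the same size or grow slightly, because there is no redundancy for the compressor to remove.

```python
import os
import zlib

def compressed_fraction(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)

# Redundant input: one hypothetical table row of markup repeated 200 times.
markup = b'<tr><td class="cell">item</td></tr>\n' * 200
# Random bytes of the same length contain almost no redundancy to remove.
noise = os.urandom(len(markup))

print(f"repetitive markup: {compressed_fraction(markup):.3f}")
print(f"random bytes:      {compressed_fraction(noise):.3f}")
```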

Compression algorithms trade time for space by preprocessing files to create smaller versions of them. The compressed file is later decompressed to reconstruct the original exactly or, for lossy methods, an approximation of it. The process therefore has two components: encoding, which compresses the data, and decoding, which decompresses it, usually at a faster rate. With Moore's Law outpacing Metcalfe's, processing power is cheap relative to bandwidth, so bandwidth concerns usually trump CPU speed considerations.
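
The encode/decode split can be illustrated with a short Python sketch using the standard zlib module (an assumption for illustration; the chapter does not prescribe a library). It encodes the same hypothetical page at a fast level and at a thorough level, then decodes both. Higher levels spend more CPU time searching for a smaller encoding, while decoding cost depends little on the level chosen. Because DEFLATE is lossless, both round trips reproduce the original bytes exactly.

```python
import zlib

# A hypothetical page with plenty of repeated markup.
page = (
    b"<html><head><title>Example</title></head><body>"
    + b"<p>Hello, compressed web.</p>\n" * 100
    + b"</body></html>"
)

# Encoding: level 1 favors speed; level 9 spends more CPU time looking for a smaller output.
fast = zlib.compress(page, 1)
small = zlib.compress(page, 9)

# Decoding: one decompressor handles both streams, and because DEFLATE is
# lossless, each round trip reconstructs the original bytes exactly.
assert zlib.decompress(fast) == page
assert zlib.decompress(small) == page

print(f"original {len(page)} bytes, level 1: {len(fast)}, level 9: {len(small)}")
```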

Tables

  • Table 18.2 - Content Encoding Average Compression Ratios for Different Web Site Categories

Summary

Here's a summary of web compression tips discussed in this chapter:

  • For lighter loads, use simple solutions and default settings.
  • Pre-compress content for maximum speed.
  • For dynamic content, use a module or ISAPI filter designed specifically for this, such as mod_gzip, mod_deflate ru, or mod_hs for Apache, and PipeBoost, VIGOS, or HyperSpace i for IIS.
  • For Apache 2.0.x, there's only one choice: mod_deflate.
  • In a shared hosting situation, try gzip_cnc.
  • Compress HTML files and external JavaScript files referenced in the head.
  • Avoid CSS compression.
  • For maximum speed, or for other servers such as those from Sun or iPlanet, consider a reverse proxy compression solution like AppCelera or VIGOS Website Accelerator.
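
The "pre-compress content" tip can be sketched as a small build step that writes a gzipped copy of each static text file ahead of time, so the server does no compression work per request. This is a minimal Python sketch; the function name `precompress`, the `.gz`-sibling naming convention, and the extension list (HTML and JavaScript, with CSS left out per the tips above) are assumptions. How the server then selects the `.gz` variant for clients that send `Accept-Encoding: gzip` depends on its configuration and is not shown.

```python
import gzip
import shutil
from pathlib import Path

# Hypothetical list of extensions to pre-compress; CSS is deliberately left out.
TEXT_EXTENSIONS = {".html", ".htm", ".js"}

def precompress(root: Path) -> list[Path]:
    """Write a .gz sibling next to each matching text file under root.

    Skips files whose .gz copy is already at least as new as the source,
    and returns the list of .gz files written on this run.
    """
    written = []
    for src in sorted(root.rglob("*")):  # materialize before writing new files
        if not src.is_file() or src.suffix.lower() not in TEXT_EXTENSIONS:
            continue
        dst = src.with_name(src.name + ".gz")
        if dst.exists() and dst.stat().st_mtime >= src.stat().st_mtime:
            continue  # compressed copy is up to date
        with src.open("rb") as f_in, gzip.open(dst, "wb", compresslevel=9) as f_out:
            shutil.copyfileobj(f_in, f_out)  # maximum compression; cost is paid once
        written.append(dst)
    return written
```

Running `precompress(Path("htdocs"))` after each deployment (the directory name is hypothetical) keeps the compressed copies in sync; the modification-time check means unchanged files are not recompressed.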

Further Reading

Books

Data Compression: The Complete Reference
by David Salomon (Springer-Verlag, 2000). Data compression from the ground up.
Compressed Image File Formats: JPEG, PNG, GIF, XBM, BMP
by John Miano (Addison-Wesley, 1999). Discusses the internal implementation of various image formats.
Managing Gigabytes: Compressing and Indexing Documents and Images
by Ian Witten et al. (Morgan Kaufmann, 1999). Data compression for advanced developers.

Articles and Resources

ACT compression test
The Archive Comparison Test, by Jeff Gilchrist.
Compressing .js files, from a lurker's point of view
by Kevin Kiley and Andrew Jarman (mod_gzip mailing list, 17 March 2001).
Data compression resources
Delta Encoding in HTTP
Delta Encoding in HTTP, RFC 3229
by Jeffrey C. Mogul et al. (The Internet Society, 2002).
Dr. Dobb's Journal
Data Compression Resources
The GIF Controversy: A Software Developer's Perspective
by Michael C. Battilana (Las Vegas: Cloanto Italia, 1995).
HTTP Compression Speeds the Web
by Peter Cranstone
Introduction to Data Compression
A draft of a chapter on data compression by Guy Blelloch
JavaScript Guide for JavaScript 1.1
Netscape Communications, 1996. The section "Specifying a File of JavaScript Code" implies that external files should be placed in the head section.
Official comp.compression FAQ
A Performance Analysis of 40 e-Business Web Sites
by Patrick Mills and Chris Loosley, CMG Journal of Computer Resource Management, no. 102 (2001): 28-33. From Keynote Systems; includes page-size averages.
Use HTTP Compression
A brief introduction to content encoding that shows the benefits of compressing your textual content. It concludes that, on average, HTTP compression reduces HTML, CSS, and JavaScript text files to one-fourth of their original size. By Andy King, Dec. 9, 2003.
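
A figure like "one-fourth of the original size" is easy to check against your own content. The Python sketch below gzips a few synthetic HTML, CSS, and JavaScript samples and reports each compressed size as a fraction of the original. The samples and the `savings` helper are hypothetical; because generated samples are more repetitive than real pages, expect them to compress better than the article's average, and substitute your own files for meaningful numbers.

```python
import gzip

def savings(original: bytes) -> tuple[int, int, float]:
    """Gzip a payload and return (original size, compressed size, fraction of original)."""
    compressed = gzip.compress(original, compresslevel=6)
    return len(original), len(compressed), len(compressed) / len(original)

# Synthetic stand-ins for real pages, style sheets, and scripts.
samples = {
    "HTML": b"<ul>"
    + b"".join(b'<li><a href="/item/%d">Item %d</a></li>' % (i, i) for i in range(100))
    + b"</ul>",
    "CSS": b"".join(b".col-%d { float: left; width: %d%%; }\n" % (i, i) for i in range(1, 101)),
    "JavaScript": b"".join(b"function f%d(x) { return x * %d; }\n" % (i, i) for i in range(100)),
}

for kind, body in samples.items():
    before, after, frac = savings(body)
    print(f"{kind:<11} {before:>6} -> {after:>5} bytes ({frac:.0%} of original)")
```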

Specifications

Compression of Individual Sequences via Variable Rate Coding
by Jacob Ziv and Abraham Lempel, IEEE Transactions on Information Theory 24, no. 5 (1978): 530-536. Describes LZ78.
DEFLATE Compressed Data Format Specification version 1.3, RFC 1951
by L. Peter Deutsch (Aladdin Enterprises, 1996).
GZIP File Format Specification version 4.3, RFC 1952
by L. Peter Deutsch (Aladdin Enterprises, 1996).
Hypertext Transfer Protocol -- HTTP/1.0, RFC 1945
by Tim Berners-Lee, Roy T. Fielding, and Henrik F. Nielsen. This RFC includes a content encoding section.
A Technique for High-Performance Data Compression
by Terry A. Welch, IEEE Computer 17, no. 6 (1984): 8-19. Describes the LZW algorithm.
A Universal Algorithm for Sequential Data Compression
by Jacob Ziv and Abraham Lempel, IEEE Transactions on Information Theory 23, no. 3 (1977): 337-343. Describes LZ77.
ZLIB Compressed Data Format Specification version 3.3, RFC 1950
by L. Peter Deutsch and Jean-Loup Gailly.
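
The DEFLATE, ZLIB, and GZIP specifications above build on one another: the zlib (RFC 1950) and gzip (RFC 1952) formats each wrap a raw DEFLATE (RFC 1951) stream in a different header and checksum trailer. A short Python sketch using the standard `zlib` module makes the layering visible; the `wbits` argument selects the wrapper (negative for raw DEFLATE, 15 for zlib, 16 + 15 for gzip), and the sample data and helper name are illustrative.

```python
import zlib

data = b"<p>The same DEFLATE data, three different wrappers.</p>" * 20

def deflate_with(wbits: int) -> bytes:
    """Compress data with a zlib compressobj using the given wbits setting."""
    c = zlib.compressobj(level=9, method=zlib.DEFLATED, wbits=wbits)
    return c.compress(data) + c.flush()

raw = deflate_with(-15)  # bare DEFLATE stream (RFC 1951): no header or trailer
zl = deflate_with(15)    # zlib format (RFC 1950): 2-byte header + 4-byte Adler-32 trailer
gz = deflate_with(31)    # gzip format (RFC 1952): 10-byte header + CRC-32 and size trailer

print(len(raw), len(zl), len(gz))  # zlib adds 6 bytes, gzip adds 18
print(zl[:2].hex(), gz[:2].hex())  # 78da 1f8b
```

In HTTP, `Content-Encoding: gzip` names the RFC 1952 format and `Content-Encoding: deflate` names the zlib (RFC 1950) format, although some browsers expected raw DEFLATE for the latter, one reason gzip became the safer choice.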

Compression Modules/Tools

HTML Compact
Replaces straight HTML with tokenized JavaScript and a small decompressor for JavaScript-enabled browsers. From AntsSoft.
mod_deflate
Included in the Apache 2.0 distribution.
mod_gzip
If you really need fine-tuning and advanced statistics, use mod_gzip instead. Currently an open source SourceForge project.
mod_hs
The commercial version of mod_gzip, created by HyperSpace Communications, Inc. HyperSpace claims a 30 percent performance increase from in-memory compression and the elimination of disk I/O operations.
mod_deflate ru (in Russian)
If you need fine-tuning and the best possible performance, try mod_deflate from sysoev.ru. See also its documentation and the mod_deflate ru tarball.
gzip_cnc
If you don't have access to install modules but do have CGI access, try gzip_cnc from Michael Schröpl.
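
Modules such as mod_gzip and mod_deflate, and CGI-based tools like gzip_cnc, share the same core HTTP decision: compress a response only when the client's `Accept-Encoding` request header says it can decode gzip, then label the result with `Content-Encoding: gzip`. The Python sketch below shows that negotiation in isolation. The function names are hypothetical, it ignores real-world complications such as the `x-gzip` alias and browsers with known compression bugs, and it is not how any of the listed tools is implemented. It also sets `Vary: Accept-Encoding` so shared caches keep compressed and uncompressed copies apart.

```python
import gzip

def parse_accept_encoding(header: str) -> dict[str, float]:
    """Map each content coding in an Accept-Encoding header to its q-value."""
    codings = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        name = fields[0].lower()
        if not name:
            continue
        q = 1.0  # a coding listed without q= is fully acceptable
        for param in fields[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        codings[name] = q
    return codings

def encode_response(body: bytes, accept_encoding: str) -> tuple[bytes, dict[str, str]]:
    """Gzip the body only if the client accepts gzip; return (body, extra headers)."""
    codings = parse_accept_encoding(accept_encoding)
    headers = {"Vary": "Accept-Encoding"}
    q = codings.get("gzip", codings.get("*", 0.0))
    if q > 0:
        headers["Content-Encoding"] = "gzip"
        return gzip.compress(body), headers
    return body, headers
```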

IIS Compression

httpZip
Configurable IIS ISAPI filter for HTTP and HTTPS compression. Includes built-in dynamic caching, reporting, and compression of static and dynamic files. Port80 Software also offers ZipEnable for IIS 6 servers.
Hyperspace i
If you don't mind editing text configuration files, try HyperSpace i from HyperSpace Communications.
IISAccelerator
Vigos' answer to IIS performance.
IISxpress
A rule-based compression engine built on ZLIB. For Windows.
ISAPIZip
Offers a customizable ISAPI compression filter for web servers, covering both HTTP and HTTPS.
PipeBoost
Compression ISAPI filter for Microsoft IIS.
TurboIIS
An HTTP compression ISAPI filter built to work with the isolated worker processes of IIS 6 (it also works with IIS 4 and 5). Compresses and optimizes all files, dynamic or static, including chunked-mode sites, and downsamples images automatically.
SqueezePlay
InnerMEDIA offers compression products and toolkits for Windows-based servers. SqueezePlay also compresses graphics for IIS.
XCompress
Compression ISAPI filter for Microsoft IIS from XCache Technologies.

Proxy-Based Solutions

HyperSpace
Has a compression proxy server.
Packeteer
Offers AppCelera, a "compression, conversion, and caching" proxy.
Redline Networks
Offers Web I/O Accelerator appliances that compress and optimize all static, dynamic, and SSL-delivered content.
Venturi Wireless
Provides several compression solutions, including server- and client-side proxy.
VIGOS
Has the Website Accelerator, reverse-proxy software that compresses and optimizes content.

Benchmarking Tools

Compression Check
From ISAPILabs. Checks a site for HTTP compression and shows its download speeds before and after compression at different bandwidths.
VIGOS Website Analyzer
Free PC-based application that spiders your site and estimates the speed and bandwidth improvements you would realize using their VIGOS Website Accelerator software.
XCompress online analyzer
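
Tools like these report how much faster a page downloads once it is compressed. The underlying arithmetic is simple: transfer time is roughly the size in bits divided by the link speed. This Python sketch applies it to a hypothetical 60,000-byte page compressed to one-fourth of its size (the average cited in "Use HTTP Compression" above). It ignores latency, TCP slow start, and protocol overhead, so real-world gains will differ.

```python
def transfer_seconds(size_bytes: int, bits_per_second: int) -> float:
    """Idealized transfer time: payload bits divided by link speed (no latency or overhead)."""
    return size_bytes * 8 / bits_per_second

# Hypothetical page: 60,000 bytes of HTML and JavaScript, compressed to one-fourth.
original = 60_000
compressed = original // 4

for label, bps in [("28.8 Kbps", 28_800), ("56 Kbps", 56_000), ("1.5 Mbps", 1_500_000)]:
    before = transfer_seconds(original, bps)
    after = transfer_seconds(compressed, bps)
    print(f"{label:>9}: {before:6.2f} s -> {after:5.2f} s")
```

At 56 Kbps, for example, the idealized time drops from about 8.57 seconds to about 2.14 seconds; the slower the link, the more absolute time compression saves.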