Crash course in handling web traffic spikes

So yesterday, there was a small earthquake in Melbourne. Within a few minutes, the Geoscience Australia website was returning 503 errors because it couldn't handle the load.

Why websites crash

Websites stop working under heavy load because the server doesn't have enough resources to process every request. The bottleneck is usually one of:

  1. Bandwidth saturation, indicated by timeouts and super slow load times.
  2. Webserver or cache overload, indicated by refused connections or server (5XX) errors.
  3. Database server overload, indicated by server errors.

A simple database-driven website might process a request like this:

[Figure: Simple website setup]
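In code, that request path might look something like the sketch below (the schema, handler name, and page content are illustrative, not taken from any real site). The point to notice is that the database query and HTML rendering happen on every single request:

```python
# Minimal sketch of a database-driven request path.
# An in-memory SQLite database stands in for the database server.
import sqlite3

def handle_request(conn, path):
    # 1. Map the URL path to a database query.
    slug = path.strip("/")
    row = conn.execute(
        "SELECT title, body FROM pages WHERE slug = ?", (slug,)
    ).fetchone()
    if row is None:
        return 404, "<h1>Not Found</h1>"
    # 2. Render the result into HTML. Without a cache in front,
    #    this work is repeated for every request, even when the
    #    underlying page hasn't changed.
    title, body = row
    return 200, f"<h1>{title}</h1><p>{body}</p>"

# Demo data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (slug TEXT, title TEXT, body TEXT)")
conn.execute(
    "INSERT INTO pages VALUES ('quake', 'Earthquake', 'Felt in Melbourne')"
)

status, html = handle_request(conn, "/quake")
```

Under a traffic spike, thousands of clients request the same page at once, and each one triggers its own database round-trip.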

Set up a squid cache

For sites like the example above, a good cache setup is essential. The cache is another server (or server process) that sits in front of the webserver and serves stored copies of pages that haven't changed. A page only needs to be generated as often as it changes:

[Figure: A webserver behind a cache to reduce load]

Squid is a good open-source starting point if you are administering a server that struggles under load.
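As a rough sketch, a reverse-proxy ("accelerator") setup in squid.conf might look like the fragment below. The hostname, port, and cache size here are assumptions for illustration; it assumes the real webserver has been moved to port 8080 on the same machine, with Squid listening on port 80:

```
# Squid listens where the webserver used to, acting as an accelerator.
http_port 80 accel defaultsite=www.example.com

# The origin webserver, now on port 8080 behind the cache.
cache_peer 127.0.0.1 parent 8080 0 no-query originserver name=origin

# Only proxy for our own site.
acl our_site dstdomain www.example.com
http_access allow our_site
http_access deny all
cache_peer_access origin allow our_site

# Give the cache some room to work with (256 MB in memory).
cache_mem 256 MB
```

With this in place, repeated requests for the same page are answered from the cache, and the webserver and database only see a request when a page actually needs regenerating.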

Round-robin DNS

Without running a hardware load-balancer (read: spending money), you can have clients connect to different servers by using round-robin DNS.

Each time a DNS lookup is issued, a different address can be returned, allowing you to have several caches at work.

[Figure: Example of round-robin DNS]
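In a BIND-style zone file, round-robin DNS is just multiple A records for the same name (the hostname and addresses below are made up; 203.0.113.0/24 is a reserved documentation range). Resolvers typically rotate through the records, spreading clients across the caches:

```
; Three cache servers sharing one hostname. The low TTL (300 s)
; lets you pull a failed server out of rotation quickly.
www.example.com.   300  IN  A  203.0.113.10
www.example.com.   300  IN  A  203.0.113.11
www.example.com.   300  IN  A  203.0.113.12
```

Note that round-robin DNS is load distribution, not true load balancing: if one server goes down, roughly a third of clients will still be sent to it until the records are updated and cached lookups expire.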