Proof that Digg has at least 6 web servers

The Straight p00p: Anyone on the Internet can experimentally determine how many web servers Digg.com has, due to their server configuration. If Digg tweaks their web servers to fix this, they can speed up the site and save everyone bandwidth. Anyone can try these investigative techniques on other sites, too.

The Details: A quick check with Fiddler shows that Digg.com seems to serve all its content from www.digg.com. That hostname resolves to a single IP address (they probably use a load balancer like BIG-IP), but we can still figure out what's going on by requesting a static file from their site and looking at the HTTP response headers. I used the HEAD script that ships with Perl's LWP library:

>head http://www.digg.com/img/comment-1.png
200 OK
Cache-Control: max-age=3600
Connection: Keep-Alive
Date: Tue, 27 Jun 2006 05:02:41 GMT
Accept-Ranges: bytes
ETag: "870384-a72-41414d62cbc00" <-----------------
Server: Apache
Content-Length: 2674
Content-Type: image/png
Last-Modified: Thu, 18 May 2006 19:13:52 GMT

The entity tag (ETag) value by default seems to be composed of three hex fields: the file's inode number, the file size (a72 hex is 2674, matching Content-Length), and the modification time.
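As a quick sanity check on that guess, the fields of the ETag above can be decoded and compared with the other headers. This is a minimal Python sketch, assuming the layout is hex inode, hex size, and hex modification time in microseconds since the Unix epoch; decode_etag is a name made up for this illustration.

```python
from datetime import datetime, timezone

def decode_etag(etag):
    """Split an ETag assumed to be hex inode-size-mtime into its fields."""
    inode_hex, size_hex, mtime_hex = etag.strip('"').split("-")
    mtime_us = int(mtime_hex, 16)  # assumption: microseconds since the Unix epoch
    return {
        "inode": int(inode_hex, 16),
        "size": int(size_hex, 16),
        "mtime": datetime.fromtimestamp(mtime_us // 1_000_000, tz=timezone.utc),
    }

fields = decode_etag('"870384-a72-41414d62cbc00"')
print(fields["size"])   # 2674, the same as Content-Length
print(fields["mtime"])  # 2006-05-18 19:13:52+00:00, the same as Last-Modified
```

Both decoded values line up with the Content-Length and Last-Modified headers in the response, which supports the size and mtime part of the guess. The first field is taken to be the inode.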

Inode numbers only make sense on a particular file system (i.e. they're unique to the volume/partition on the box). So if the server cluster has multiple file systems (e.g. the content is copied onto each box in the cluster), it's highly likely that the inode values will differ from box to box. (Of course, this assumption could be wrong if network file systems or SANs are in use.)
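The effect is easy to reproduce locally. The sketch below, in Python, writes two identical copies of a file, gives them the same modification time, and builds an inode-size-mtime tag for each. The two copies on one machine only stand in for copies on separate boxes; apache_style_etag and the file names are hypothetical, and the tag layout is the assumed one, not Apache's actual code.

```python
import os
import tempfile

def apache_style_etag(path):
    """Build a hex inode-size-mtime tag (assumed layout, mtime in microseconds)."""
    st = os.stat(path)
    return '"%x-%x-%x"' % (st.st_ino, st.st_size, st.st_mtime_ns // 1000)

with tempfile.TemporaryDirectory() as tmp:
    etags = []
    for name in ("copy_a.png", "copy_b.png"):      # hypothetical file names
        path = os.path.join(tmp, name)
        with open(path, "wb") as f:
            f.write(b"x" * 2674)                   # same content and size in both copies
        os.utime(path, (1147979632, 1147979632))   # same access and modification time
        etags.append(apache_style_etag(path))

# Only the inode field differs; the size and mtime fields are identical.
same_tail = etags[0].split("-")[1:] == etags[1].split("-")[1:]
print(etags, same_tail)
```

The inode numbers printed will vary from machine to machine, which is the point: they describe where the file lives, not what it contains.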

To test this theory, I wrote a short quick-hack perl script that makes 200 HEAD requests and shows how many unique inode values are returned. Here's the output:

>perl -w p.pl
"5c01db-a72-41414d62cbc00"
... // lot more output
"4f01c8-a72-41414d62cbc00"
INODES:

870384 = 24
430251 = 30
4f01c8 = 42
6481c8 = 31
981c0 = 34
5c01db = 39

SIZES:

a72 = 200

MTIMES:

41414d62cbc00 = 200

As you can see, there are 6 unique inode values, so that's how I deduced that Digg probably has at least 6 web servers. Back in April, it was reported that Digg had 3 web servers, and just today Digg was stocking up on BestBuy boxes.
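The original quick hack was perl and isn't reproduced here. A rough Python equivalent of the same idea might look like the sketch below; the function names (fetch_etag, tally_etags, survey) are made up for this illustration, and it assumes every ETag has the three-field layout described earlier.

```python
import urllib.request
from collections import Counter

def fetch_etag(url):
    """Send one HEAD request on a fresh connection; return the ETag header or None."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.headers.get("ETag")

def tally_etags(etags):
    """Count how often each inode, size, and mtime field occurs."""
    inodes, sizes, mtimes = Counter(), Counter(), Counter()
    for etag in etags:
        parts = etag.strip('"').split("-")
        if len(parts) != 3:        # skip ETags that don't have the assumed layout
            continue
        inodes[parts[0]] += 1
        sizes[parts[1]] += 1
        mtimes[parts[2]] += 1
    return inodes, sizes, mtimes

def survey(url, n=200):
    """Make n HEAD requests and tally the ETag fields that come back."""
    etags = [e for e in (fetch_etag(url) for _ in range(n)) if e]
    return tally_etags(etags)

# Offline demonstration using three ETags that appear in the output above.
inodes, sizes, mtimes = tally_etags([
    '"870384-a72-41414d62cbc00"',
    '"5c01db-a72-41414d62cbc00"',
    '"4f01c8-a72-41414d62cbc00"',
])
print(len(inodes), dict(sizes), dict(mtimes))
```

Calling survey("http://www.digg.com/img/comment-1.png") would run the live version. The count of distinct inode values is only a lower bound, since 200 requests may not reach every box behind the load balancer.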

What's wrong with differing ETag values? The problem is that if you happen to access server #1 in the cluster and it returns one ETag for that file, then when you later go back to the site, your browser will ask the server "give me that file, but don't give it to me if your file matches the ETag I got earlier" (an If-None-Match request). If you're talking to server #2 this time around, it has a different ETag (even though the file content is the same), so instead of answering with a short 304 Not Modified it has to retransmit the content to you, for basically no reason.
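The exchange can be mimicked with a toy model. The Python sketch below does no networking at all; respond is a hypothetical stand-in for a server holding the one 2674-byte file, and the two ETags are taken from the output earlier in this post.

```python
def respond(server_etag, if_none_match=None):
    """Return (status, body_bytes_sent) for a toy server holding one 2674-byte file."""
    if if_none_match == server_etag:
        return 304, 0      # Not Modified: headers only, no body
    return 200, 2674       # first visit or ETag mismatch: full body is sent

server1 = '"870384-a72-41414d62cbc00"'
server2 = '"4f01c8-a72-41414d62cbc00"'

first = respond(server1)                            # first visit lands on server #1
cached = server1                                    # browser remembers that ETag
same_box = respond(server1, if_none_match=cached)   # revalidation hits server #1
other_box = respond(server2, if_none_match=cached)  # revalidation hits server #2
print(first, same_box, other_box)  # (200, 2674) (304, 0) (200, 2674)
```

The second visit costs nothing when it lands on the same box and the full 2674 bytes when it lands on any other one.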

If Digg reconfigured their web servers to use the same ETags across the cluster (on Apache, for example, the directive FileETag MTime Size leaves the inode out), this wouldn't be an issue, and guys like me couldn't write these kinds of investigative reports. :-)

The technique I explained in this write-up could be used on other sites as well. Found anything interesting?
