HEAD Request Nonsense

Sep 25, 2009

A truly useful item in the HTTP specification is the HEAD request. This is a type of request similar to a GET request; all the HTTP headers are returned as normal but no message body is returned. Think of it like querying your video archive for the meta-data of a large file, but not loading the contents of that file into memory. RFC 2616 describes it like so:

The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response. The metainformation contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request. This method can be used for obtaining metainformation about the entity implied by the request without transferring the entity-body itself. This method is often used for testing hypertext links for validity, accessibility, and recent modification.

Its main usefulness, IMHO, is byte-saving. You can verify that a file exists, and is the proper MIME-type, without actually downloading all of the data contained within that file. When used in server-side scripts or Ajax requests, HEAD requests can make responses snappier if all the data needed is contained within the response headers.
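
To make that concrete, here is a minimal sketch in Python of checking that a file exists and has the expected MIME-type using nothing but a HEAD request. The function name `exists_with_type` and its parameters are my own for illustration, not taken from any of the Orca scripts, and the sketch assumes redirects should count as "not found" since it does not follow them.

```python
import http.client
from urllib.parse import urlsplit


def exists_with_type(url, expected_type=None, timeout=10):
    """Return True if a HEAD request for url gets a 2xx status and, when
    expected_type is given, the Content-Type matches it."""
    parts = urlsplit(url)
    conn_class = (http.client.HTTPSConnection if parts.scheme == "https"
                  else http.client.HTTPConnection)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    conn = conn_class(parts.hostname, parts.port, timeout=timeout)
    try:
        conn.request("HEAD", target)
        response = conn.getresponse()  # headers only; HEAD has no body
        status, headers = response.status, response.headers
    except (OSError, http.client.HTTPException):
        # Connection refused, DNS failure, timeout, malformed reply, etc.
        return False
    finally:
        conn.close()
    if not 200 <= status < 300:
        return False
    # get_content_type() strips parameters such as "; charset=utf-8".
    return expected_type is None or headers.get_content_type() == expected_type
```

Calling something like `exists_with_type(url, "video/mp4")` would answer the video archive question above without pulling a single byte of the video itself.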

I use HEAD requests in both my Orca Ringmaker and Orca Search scripts to make operations faster. For instance, the Ringmaker will send a HEAD request to the next site in the ring to verify that it exists before actually forwarding the user there. If the HEAD request fails, the script either assumes the page's server is down or the page owner has moved it without updating their info in their Ringmaker account.

Sounds like a really efficient use of this type of request, doesn't it? Well, the world is not so bright and shiny, my friends...

Since implementing these methods in my scripts, I occasionally get reports of things not working as they should. The Ringmaker will never go to a certain site, or the Search script refuses to acknowledge that the spider exists, etc. Upon further research, I usually find that these people are hosting their webpages on servers which (for some arcane reason) block HEAD requests.

Usually these servers return 403 Forbidden, but sometimes other error responses. The puzzling bit is that when the same URI is requested using GET, the page is returned as if nothing is amiss. Thus the Orca Search script might report "Spider not found at this URI!" even though actually visiting the spider URI using your web browser will seem to prove otherwise. Odd indeed!
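
If you want to see this mismatch for yourself, a rough sketch along these lines will do it: request the same URI with both methods and compare the status codes. The helper names `status_for` and `blocks_head` are made up for this example, and treating any status of 400 or above as "refused" is my own simplification.

```python
import http.client
from urllib.parse import urlsplit


def status_for(method, url, timeout=10):
    """Return the HTTP status code the server gives for method on url."""
    parts = urlsplit(url)
    conn_class = (http.client.HTTPSConnection if parts.scheme == "https"
                  else http.client.HTTPConnection)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    conn = conn_class(parts.hostname, parts.port, timeout=timeout)
    try:
        conn.request(method, target)
        return conn.getresponse().status
    finally:
        conn.close()  # any GET body is deliberately left unread


def blocks_head(url):
    """True when GET succeeds but HEAD is refused for the same URI."""
    head_status = status_for("HEAD", url)
    get_status = status_for("GET", url)
    return get_status < 400 <= head_status
```

A result of `True` means the server is happy to hand over the whole page but refuses to hand over just its headers, which is exactly the odd behaviour described above.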

Yes, there are reports of some server software having vulnerabilities relating to the HEAD request and denial of service attacks. However, these reports relate to problems in the server software and certainly not the HEAD request method itself! Why is there any such aura of fear around an HTTP request method that is essentially a truncated GET request? Just because it's one of the less frequently used methods is absolutely no reason for such discrimination. In fact, discriminating against it is probably one of the reasons it isn't more widely used! Web projects usually place emphasis on reliability, and a method that some servers arbitrarily refuse is hard to rely on.

If you are currently hosting your site on a server that blocks HEAD requests (you can find out by using Rex Swain's very useful tool), I suggest you ask your host to correct the situation. Blocking HEAD requests is tantamount to sabotaging the way web servers and web applications are supposed to work.

This is the main reason you can disable the "LookAhead" ability on a site by site basis in the Ringmaker, so that sites hosted by servers blocking the HEAD request can actually become viable links in the ring. The downside to this is that the Ringmaker forwards the user to that site without any pre-checking whatsoever, so if the page has moved or doesn't exist, the ring has been broken.
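
The logic can be sketched roughly as follows. This is not the Ringmaker's actual code: `Site`, `look_ahead`, `next_site` and the `is_alive` callback are hypothetical names, and skipping ahead to the following site when a check fails is my assumption about what a ring would want to do.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Site:
    url: str
    look_ahead: bool = True  # hypothetical per-site switch for the pre-check


def next_site(ring: Sequence[Site], current: int,
              is_alive: Callable[[str], bool]) -> Optional[Site]:
    """Pick the next site after index current to forward the user to.

    Sites with look_ahead disabled are accepted without any check; all
    others must pass is_alive (for example, a HEAD request) first.
    """
    for step in range(1, len(ring)):
        candidate = ring[(current + step) % len(ring)]
        if not candidate.look_ahead or is_alive(candidate.url):
            return candidate
    return None  # no other site in the ring qualified
```

Passing the check in as a function keeps the ring logic separate from the networking, so the same loop works whether `is_alive` sends a HEAD request, a GET request, or nothing at all.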

I could switch all of my scripts to use GET requests rather than HEAD requests. The two types of request are essentially the same; the only difference is that one returns a message body and the other does not. But think of the wasted data-transfer when all that is needed are the HTTP headers, especially if the requested page is large. To use a GET request, the script either must wait until the entire response has been downloaded, or must actively close the connection after the headers are received before it can begin to act. The former case adds extra delay to the script and increases memory load on the receiving end. The latter case restricts the script author to using file-request APIs which allow for the interruption of the incoming transfer.
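
For the curious, the "close the connection after the headers" approach might look something like this in Python, whose `http.client` module happens to be one of those APIs that hands back the headers before the body is read. The function names and the choice of 403, 405 and 501 as the statuses that trigger the GET fallback are my own assumptions for this sketch.

```python
import http.client
from urllib.parse import urlsplit


def request_headers(method, url, timeout=10):
    """Return (status, headers) for url; the body, if any, is never read."""
    parts = urlsplit(url)
    conn_class = (http.client.HTTPSConnection if parts.scheme == "https"
                  else http.client.HTTPConnection)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    conn = conn_class(parts.hostname, parts.port, timeout=timeout)
    try:
        conn.request(method, target)
        response = conn.getresponse()  # returns once the headers are parsed
        return response.status, response.headers
    finally:
        # For GET, closing here abandons the part of the body not yet sent.
        conn.close()


def headers_with_fallback(url):
    """Try HEAD first; if the server refuses it, retry with a truncated GET."""
    status, headers = request_headers("HEAD", url)
    if status in (403, 405, 501):  # assumed "HEAD is blocked" signals
        status, headers = request_headers("GET", url)
    return status, headers
```

Even with the early close, the server may already have pushed a fair chunk of the body onto the wire, so this is a workaround rather than a true replacement for a working HEAD request.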

HEAD requests are not dangerous, folks! Rather, they can be very useful and it makes sense to honour them. Make sure your server allows them and keep the web open to new, innovative uses for this type of request.
