The good folks of WebmasterWorld have an interesting thread at the moment titled Techies gone astray - Many .gov sites don't resolve without the "www". Members there have spotted that many US .gov domains simply don't work if you don't type the www in front of the website address.

Almost every SEO will know that this is an issue known as "canonicalisation" - forcing or specifying which is the best URL for any item of content. Failing to do this can mean:

  • Lost visitors (especially if the site doesn't show anything at all or shows an error page)
  • Diluted search performance, as search engines may have to choose between duplicates - and not always in your favour
  • Incorrect analytics and other metrics. Many such systems rely on a URL to count, and if users can end up at different URLs, the counts are split in two. Try searching delicious or other bookmarking systems for the non-www and www URL, for instance.

In summary, failing to map www and non-www together (by permanently redirecting one to the other with a 301 status code) creates two URLs with diluted performance, rather than a single strong URL. It's an easy fix, and one of the first and most basic items on the technical SEO checklist. Here's Google's own guide to canonicalisation.
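As a rough illustration of the redirect rule described above, here is a minimal Python sketch. The host name www.example.gov and the function name canonical_redirect are hypothetical, and a real site would normally do this in its web server configuration rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical canonical host; substitute the site's preferred hostname.
CANONICAL_HOST = "www.example.gov"


def canonical_redirect(url, canonical_host=CANONICAL_HOST):
    """Return (301, location) if the URL is on the bare (non-www) host,
    or None if no redirect is needed."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # The bare host is the canonical host without its leading "www."
    bare_host = canonical_host[4:] if canonical_host.startswith("www.") else canonical_host

    if host == bare_host and host != canonical_host:
        # Keep the path and query so deep links survive the redirect.
        location = urlunsplit(
            (parts.scheme, canonical_host, parts.path or "/", parts.query, parts.fragment)
        )
        return 301, location
    return None


print(canonical_redirect("http://example.gov/about?x=1"))
# (301, 'http://www.example.gov/about?x=1')
```

The point of preserving the path and query is that every non-www URL maps to exactly one www URL, rather than dumping all visitors on the home page.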

So, how good are UK and US government sites at getting these SEO basics right?

How we verified canonicalisation

First we identified a search syntax to give us a sample of US and UK government sites that use www:

[site:gov inurl:www] (add .uk for the UK results)

Then, we extracted 100 unique hostnames from the results.

We fired up SEOThingBot and sent him to the www and non-www versions of those URLs, then exported the results into a spreadsheet. A little manual analysis and COUNTIF() later, we had results for what happens to those non-www requests, and whether they are canonicalised correctly.
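For readers without a crawler to hand, the same check can be approximated in a few lines of Python. This is a sketch and not how SEOThingBot works: the function names are hypothetical, and the rules mapping a response to one of our five categories are assumptions (in particular, it treats any 200 response on the non-www host as a duplicate without comparing the content, which we did by hand).

```python
import http.client
from collections import Counter
from urllib.parse import urlsplit


def fetch_bare(host, timeout=10):
    """Request http://<host>/ once, without following redirects.
    Returns (status, location), or (None, None) if the host is unreachable."""
    try:
        conn = http.client.HTTPConnection(host, timeout=timeout)
        conn.request("GET", "/", headers={"User-Agent": "canonical-check-sketch"})
        resp = conn.getresponse()
        result = (resp.status, resp.getheader("Location"))
        conn.close()
        return result
    except (OSError, http.client.HTTPException):
        # DNS failures, refused connections and timeouts all land here.
        return None, None


def classify(bare_host, status, location):
    """Map a non-www response to a category (assumed rules, see above)."""
    if status is None:
        return "Doesn't exist"
    if status in (301, 302, 303, 307, 308):
        target = (urlsplit(location or "").hostname or "").lower()
        if status == 301 and target == "www." + bare_host:
            return "Canonicalised"
        return "Incorrect redirect"  # wrong status code or wrong target
    if status == 200:
        return "Duplicate of www"  # assumption: same content as www
    return "Broken page"


def tally(bare_hosts):
    """The COUNTIF() step: count how many hosts fall into each category."""
    return Counter(classify(h, *fetch_bare(h)) for h in bare_hosts)
```

Calling tally() with a list of bare hostnames returns a count per category, which is all the spreadsheet step was doing.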

The Results

Unsurprisingly, we did not find that all of the sites tested responded correctly. What may be slightly more surprising is the scale of the problem. Here's a summary:

Summary of non-www responses from UK and US government websites

Non-www status        .gov    .gov.uk
Broken page              2          3
Canonicalised           31         11
Doesn't exist           11         32
Duplicate of www        32         31
Incorrect redirect      24         23

And pretty charts:

Conclusions

What's most shocking about these findings is the sheer number of sites that fail to provide anything to the user at all, with more than a third of UK and 13% of US government sites displaying an error page. Only 11% of UK and 31% of US sites have canonicalised correctly. If you like data (or want to check our workings), we've attached a full spreadsheet at the end of this post.
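If you want to check the percentages without opening the spreadsheet, they can be re-derived from the summary table. A quick sketch, assuming "error page" means the "Broken page" and "Doesn't exist" rows combined (the names results, share and error_rows are ours, for illustration only):

```python
# Counts copied from the summary table: (US .gov, UK .gov.uk)
results = {
    "Broken page": (2, 3),
    "Canonicalised": (31, 11),
    "Doesn't exist": (11, 32),
    "Duplicate of www": (32, 31),
    "Incorrect redirect": (24, 23),
}

US, UK = 0, 1


def share(statuses, column):
    """Percentage of sampled sites in the given column with any of the statuses."""
    total = sum(row[column] for row in results.values())
    hits = sum(results[status][column] for status in statuses)
    return 100 * hits / total


# Assumption: "error page" covers both of these rows.
error_rows = ["Broken page", "Doesn't exist"]

print(share(error_rows, US), share(error_rows, UK))            # 13.0 35.0
print(share(["Canonicalised"], US), share(["Canonicalised"], UK))  # 31.0 11.0
```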

Notable mentions

A couple of sites in the results performed particularly badly. In the UK, the Peak District National Park Authority (peakdistrict.gov.uk) redirects to... itself, causing a redirect loop that the browser has to intercept. In the US, the National Security Agency's nsa.gov (if you have cookies disabled) appends a broken URL to each redirect, which also loops forever. Also, hang your heads in shame, everyone who showed users an error page of one type or another. There are too many to list here!
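Both failure modes are easy to test for. Here is a minimal sketch of loop detection, assuming a hypothetical next_hop function that returns the redirect target for a URL, or None if the URL serves a page. The URLs are placeholders, not the real responses from the sites above.

```python
def detect_redirect_loop(start_url, next_hop, max_hops=10):
    """Return True if following redirects from start_url never settles.

    next_hop(url) must return the redirect target, or None for a final page.
    """
    seen = {start_url}
    url = start_url
    for _ in range(max_hops):
        target = next_hop(url)
        if target is None:
            return False  # reached a page that does not redirect
        if target in seen:
            return True  # redirected to a URL we have already visited
        seen.add(target)
        url = target
    # Each hop produced a new URL but never a page: give up, as browsers do.
    return True


# A site that redirects to itself (placeholder URL).
self_loop = {"http://example.gov.uk/": "http://example.gov.uk/"}
print(detect_redirect_loop("http://example.gov.uk/", self_loop.get))  # True

# A healthy non-www to www redirect.
healthy = {"http://example.gov/": "http://www.example.gov/"}
print(detect_redirect_loop("http://example.gov/", healthy.get))  # False
```

The max_hops cap matters for the second kind of failure: if every redirect appends something to the URL, no address is ever repeated, so only a hop limit will catch it.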

If you want to check out the responses manually, try the excellent HTTP viewer from Rex Swain. And don't forget to check out the WebmasterWorld thread that was the inspiration for this analysis.

Download the complete data spreadsheet