What’s Wrong with Clean-URLs?

Clean URLs are everywhere. No Web 2.0 site is complete without them, and many of the Internet heavyweights have retrofitted their sites in an attempt to please search engines. Many of them completely miss the point.

What exactly is a Clean URL?

A URL is considered “clean” if it contains no question marks, ampersands or equal signs. Almost all SEO guides strongly recommend avoiding these characters in your URLs to achieve a higher ranking. So, instead of index.php?category=42&article=23 they tell you to use /category/42/article/23 – which, admittedly, looks a bit cleaner, but does absolutely nothing for SEO. Many sites rank just fine in Google and other search engines even though they have question marks and ampersands in their URLs. So having a Clean URL might be a lot less important than one might think – what actually matters is having a Semantic URL.
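To show how mechanical this kind of "cleaning" is, here is a minimal Python sketch. The function name query_to_clean_path is made up for illustration, and it assumes every parameter simply becomes a /key/value pair:

```python
from urllib.parse import urlsplit, parse_qsl


def query_to_clean_path(url: str) -> str:
    """Turn each key=value pair of the query string into /key/value.

    Hypothetical helper: it only reshuffles characters and adds no meaning.
    """
    pairs = parse_qsl(urlsplit(url).query)
    return "".join(f"/{key}/{value}" for key, value in pairs)


print(query_to_clean_path("index.php?category=42&article=23"))
# prints: /category/42/article/23
```

The result carries exactly the same information as the original, which is why it does nothing for a reader or a search engine.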

Semantic URLs

A URL might be clean without being readable or understandable for humans, or of any use to search engines. The important thing is to use URLs that carry meaning: Semantic URLs. So instead of index.php?category=42&article=23, you’ll have something like this:

index.php?category=movies&article=the-texas-chainsaw-massacre

This doesn’t look any more elegant than what you had before, but every human and also every search engine can now determine what this particular page is about, solely from looking at the URL.

To make this URL pretty, we can now remove all question marks and ampersands. Furthermore, the meaning of each part – whether it’s the name of a category or the name of an article – is already implied by its position. So in the end we can simply use /movies/the-texas-chainsaw-massacre. It is short, not overly verbose, easy to remember and provides some nice keywords for search engines.
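One way this could look in code is sketched below. The helper names (slugify, semantic_path, parse_semantic_path) are assumptions for the sake of the example, and the slug rule (lowercase, hyphen-separated alphanumerics) is just one reasonable choice:

```python
import re


def slugify(title: str) -> str:
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def semantic_path(category: str, article_title: str) -> str:
    """Build /category/article; each part's meaning is implied by its position."""
    return f"/{slugify(category)}/{slugify(article_title)}"


def parse_semantic_path(path: str) -> dict:
    """Recover the two parts purely from their position in the path."""
    category, article = path.strip("/").split("/")
    return {"category": category, "article": article}


path = semantic_path("Movies", "The Texas Chainsaw Massacre")
print(path)                       # /movies/the-texas-chainsaw-massacre
print(parse_semantic_path(path))  # {'category': 'movies', 'article': 'the-texas-chainsaw-massacre'}
```

In a real application the slug would still have to be looked up in a database, so it needs to be unique within its category.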

Not so Semantic URLs

One site that got it all wrong is Amazon. This is what a typical URL from their site looks like:

http://www.amazon.com/gp/product/B000FI73MA/

Yes, technically it is a Clean URL, but this doesn’t help anyone. It’s stuffed with meaningless numbers and letters – and I have the feeling it will be a 404 in just a few days. There is no point in having a Clean URL if all parameters are still meaningless. IMDB and many others have the same problem.

Then, of course, there’s eBay in a league of its own. This is what a typical URL on eBay looks like:

http://cgi.ebay.com/ebaymotors/Dodge-Challenger-1970-Dodge-Challenger-R-T-Convertible_W0QQcmdZViewItemQQcategoryZ6198QQihZ001QQitemZ110246760299QQrdZ1QQsspagenameZWDVW

Arguably a human can determine what this particular page is about, but this URL still sucks for a number of reasons:

It is verbose – no one could possibly remember it
It looks dirty – replacing all ampersands with “QQ” and all equal signs with “Z” is just plain wrong. It looks like it was hastily implemented by frustrated engineers who were told by the marketing department that they cannot have any ampersands or equal signs in their URLs anymore.
Not all keywords in this URL are helpful for search engines. The words Google actually extracts from the URL are Dodge, Challenger, 1970 and Convertible_W0QQcmd(…). I imagine the last one is not searched for all too often. This happens because the underscore is treated as a normal character and does NOT separate words.
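
Both points can be illustrated with a small Python sketch. This is not how Google or eBay actually work internally: url_words is a deliberately simplified tokenizer that assumes only the underscore rule described above, and decode_ebay_params simply reverses the “QQ” and “Z” substitution on the example URL:

```python
import re


def url_words(segment: str) -> list:
    """Simplified tokenizer: underscores count as word characters, hyphens do not."""
    return re.findall(r"[A-Za-z0-9_]+", segment)


def decode_ebay_params(encoded: str) -> dict:
    """Undo the substitution: "QQ" stood in for "&", the first "Z" for "="."""
    params = {}
    for pair in encoded.split("QQ"):
        key, _, value = pair.partition("Z")
        params[key] = value
    return params


print(url_words("R-T-Convertible_W0QQcmdZViewItem"))
# ['R', 'T', 'Convertible_W0QQcmdZViewItem']
print(url_words("R-T-Convertible-W0"))
# ['R', 'T', 'Convertible', 'W0']
print(decode_ebay_params("cmdZViewItemQQcategoryZ6198QQihZ001"))
# {'cmd': 'ViewItem', 'category': '6198', 'ih': '001'}
```

With a hyphen instead of the underscore, “Convertible” would have been picked up as a word of its own.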

How to do it right

So, whenever you think about Clean URLs, also think about semantics and human readability. A prime example of how to do it right is last.fm. Their URLs just speak for themselves:

http://www.last.fm/music/Tool