PHOBOSLAB

What’s Wrong with Clean-URLs?

Clean URLs are everywhere. No Web 2.0 site is complete without them and many of the Internet heavyweights retrofitted their sites in an attempt to please search engines. Many of them completely miss the point.

What exactly is a Clean URL?

A URL is considered “clean” if it contains no question marks, ampersands or equals signs. Almost all SEO guides strongly recommend keeping these characters out of your URLs to achieve a higher ranking. So, instead of index.php?category=42&article=23 they tell you to use /category/42/article/23 – which, admittedly, looks a bit cleaner, but does absolutely nothing for SEO. Many sites rank just fine in Google and other search engines despite the question marks and ampersands in their URLs. So having a Clean URL might be a lot less important than one might think – what actually matters is having a Semantic URL.
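
To see how purely mechanical this kind of cleaning is, here is a minimal Python sketch. The function name to_clean_url is made up for illustration and not taken from any framework. It turns the query-string URL from above into its “clean” form by simply joining the parameter names and values with slashes:

```python
from urllib.parse import urlsplit, parse_qsl


def to_clean_url(url: str) -> str:
    """Rewrite a query-string URL as a slash-separated path.

    Hypothetical helper for illustration only: each key=value pair
    of the query string becomes two path segments, key then value.
    """
    query = urlsplit(url).query
    segments = []
    for key, value in parse_qsl(query):
        segments.extend([key, value])
    return "/" + "/".join(segments)


print(to_clean_url("index.php?category=42&article=23"))
# prints /category/42/article/23
```

The result has no question marks or ampersands, but 42 and 23 tell a reader or a search engine exactly as little as they did before.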

Semantic URLs

A URL might be clean and still be neither readable nor understandable for humans, nor of any use to search engines. The important thing is to use URLs that have a meaning – Semantic URLs. So instead of index.php?category=42&article=23, you’ll have something like this:

index.php?category=movies&article=the-texas-chainsaw-massacre

This doesn’t look any more elegant than what you had before, but every human and also every search engine can now determine what this particular page is about, solely from looking at the URL.

To make this URL pretty, we can now remove all question marks and ampersands. Furthermore, the meaning of each part – whether it’s the name of a category or the name of an article – is already implied by its position. So in the end we can simply use /movies/the-texas-chainsaw-massacre. It is short, not overly verbose, easy to remember and provides some nice keywords for search engines.
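
As a rough sketch of how this could look in code, the following Python snippet builds such a URL from a category and an article title, and reads the two parts back purely by their position. The names slugify, semantic_url and parse_semantic_url are assumptions made for this example. A real site would also have to deal with two articles that produce the same slug.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Lowercase a title and replace anything that is not a-z or 0-9 with hyphens."""
    # Drop accents and other non-ASCII characters so the slug stays easy to type.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def semantic_url(category: str, title: str) -> str:
    """Build /<category>/<article> from human-readable names."""
    return f"/{slugify(category)}/{slugify(title)}"


def parse_semantic_url(path: str) -> dict:
    """Recover category and article from their position in the path."""
    parts = [part for part in path.split("/") if part]
    if len(parts) != 2:
        raise ValueError(f"expected /<category>/<article>, got {path!r}")
    category, article = parts
    return {"category": category, "article": article}


url = semantic_url("Movies", "The Texas Chainsaw Massacre")
print(url)
# prints /movies/the-texas-chainsaw-massacre
print(parse_semantic_url(url))
# prints {'category': 'movies', 'article': 'the-texas-chainsaw-massacre'}
```

No parameter names appear in the URL; the first segment is always the category and the second is always the article.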

Not so Semantic URLs

One site that got it all wrong is Amazon. This is what a typical URL from their site looks like:

http://www.amazon.com/gp/product/B000FI73MA/

Yes, technically it is a Clean URL, but this doesn’t help anyone. It’s stuffed with meaningless numbers and letters – and I have the feeling it will be a 404 in just a few days. There is no point in having a Clean URL if all of its parameters are still meaningless. IMDB and many others have the same problem.

Then, of course, there’s eBay in a league of its own. This is what a typical URL on eBay looks like:

http://cgi.ebay.com/ebaymotors/Dodge-Challenger-1970-Dodge-Challenger-R-T-Convertible_W0QQcmdZViewItemQQcategoryZ6198QQihZ001QQitemZ110246760299QQrdZ1QQsspagenameZWDVW

Arguably a human can determine what this particular page is about, but this URL still sucks for a number of reasons:

How to do it right

So, whenever you think about Clean URLs, also think about semantics and human readability. A prime example of how to do it right is last.fm. Their URLs just speak for themselves:

http://www.last.fm/music/Tool

Monday, April 28th 2008
— Dominic Szablewski, @phoboslab

8 Comments:

#1 – Flo – Wednesday, April 30th 2008, 23:01

Full Ack!

Thanks for this worthful article.
Flo

#2 – Max – Thursday, May 22nd 2008, 19:29

good article. very interesting and informative.

#3 – millionface.com – Thursday, June 5th 2008, 05:32

The clean URL stuff is for small-time sites like ours. For sites like ebay.com or imdb.com, it doesn't really matter; they're the source.

Informative article though, thanks.

#4 – unitechy – Tuesday, June 10th 2008, 17:34

very informative article thanks a lot.

#5 – Elmak – Wednesday, June 11th 2008, 04:00

Even if they're the source, it still doesn't mean they shouldn't follow an easy to read format. The larger the site the more user-friendly it should be. Ignoring the search engine indexing, the readability of the URL should be of high importance.

#6 – roli – Saturday, June 14th 2008, 00:41

while i wholeheartedly agree it still raises the point about what happens if the pagename is changed. i've been reluctant so far to completely remove some sort of id in my own framework. in my opinion you have the following options:

a) track how the page was named previously (urghs)
b) keep an id but still add more semantic information: /movies/23/the-texas-chainsaw-massacre. which needs more discussion, what happens if a user opens /movies/23/gay-pr0n -- should it still find TCM?
c) don't allow to rename the page (after some time has passed?)

i'm leaning towards b) (with a 301 redirect upon incorrect naming) and for pages that never change names -> a). i haven't yet looked into it in depth nor looked for articles discussing this (or even how other known webapps do it). but i had to mention it :p it relies heavily on the purpose of the page.

for imdb i would welcome /title/tt0324216/the-texas-chainsaw-massacre (id+name) as it simplifies mapping for external uses (unless it has an API you could use instead, of course).
to "get it right" for last.fm is not too hard, though. as an artistname _can_ not be uniquely identified by an id as the 'Tool' page will display data for other artists with the same name as well.

nice and important article nevertheless! :)

#7 – Dominic – Saturday, June 14th 2008, 23:22

You're addressing an interesting problem there. One that I have knowingly ignored so far for my blog.

I'd say option a) would be the "right" way to go: If the page had the old name longer than ~15 minutes, it should be saved somewhere and anyone who enters the old URL should then be redirected to the new one. I believe Wikipedia does this.

Option b) is of course a lot easier to implement, but it also contradicts the idea of an easily memorable URL. This might very well be a non-issue, because we have better options for storing our bookmarks than our brains. However, I do love the fact that I can just type "php.net/strpos" and get to the page I wanted. This, in turn, could also be implemented with option b) through a quick lookup on 404 pages...

#8 – Sebastian – Wednesday, July 9th 2008, 17:21

seo2.0.onreact.com/top-10-fatal-url-design-mistakes