What’s Wrong with Clean-URLs?
Clean URLs are everywhere. No Web 2.0 site is complete without them and many of the Internet heavyweights retrofitted their sites in an attempt to please search engines. Many of them completely miss the point.
What exactly is a Clean URL?
A URL is considered “clean” if it contains no question marks, ampersands or equal signs. Almost all SEO guides strongly recommend keeping these characters out of your URLs to achieve a higher ranking. So, instead of index.php?category=42&article=23 they tell you to use /category/42/article/23, which admittedly looks a bit cleaner but does absolutely nothing for SEO. After all, many sites rank just fine in Google and other search engines despite having question marks and ampersands in their URLs. So having a Clean URL might be a lot less important than one might think. What actually matters is having a Semantic URL.
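The definition above is simple enough to check mechanically. A minimal sketch in Python, where the function name is_clean is purely illustrative and not taken from any SEO tool:

```python
# Characters that make a URL "dirty" under the definition above.
DIRTY_CHARS = set("?&=")


def is_clean(url: str) -> bool:
    """Return True if the URL contains no '?', '&' or '=' characters."""
    return not (DIRTY_CHARS & set(url))


print(is_clean("index.php?category=42&article=23"))  # False
print(is_clean("/category/42/article/23"))           # True
```

Note that the check passes for /category/42/article/23 even though the numbers in it tell a reader nothing, which is exactly the gap between "clean" and "semantic".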
Semantic URLs
A URL might be clean without being readable or understandable for humans, or of any use to search engines. The important thing is to use URLs that carry meaning: Semantic URLs. So instead of index.php?category=42&article=23, you’ll have something like this:
index.php?category=movies&article=the-texas-chainsaw-massacre
This doesn’t look any more elegant than what you had before, but every human and also every search engine can now determine what this particular page is about, solely from looking at the URL.
To make this URL pretty, we can now remove all question marks and ampersands. Furthermore, the meaning of each part (whether it’s the name of a category or the name of an article) is already implied by its position. So in the end we can simply use /movies/the-texas-chainsaw-massacre. It is short, easy to remember and provides some nice keywords for search engines.
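As a rough sketch of the idea that position implies meaning, the following Python converts the query-string form into the short path and back. The helper names semantic_path and parse_semantic_path are made up for this example, and the code assumes exactly two segments (category, then article):

```python
from urllib.parse import parse_qs, urlsplit


def semantic_path(url: str) -> str:
    """Turn index.php?category=X&article=Y into /X/Y."""
    params = parse_qs(urlsplit(url).query)
    return "/" + "/".join([params["category"][0], params["article"][0]])


def parse_semantic_path(path: str) -> dict:
    """Recover the parameter names from the position of each segment."""
    category, article = path.strip("/").split("/")
    return {"category": category, "article": article}


old_url = "index.php?category=movies&article=the-texas-chainsaw-massacre"
print(semantic_path(old_url))
print(parse_semantic_path("/movies/the-texas-chainsaw-massacre"))
```

Nothing is lost by dropping the parameter names: the application knows that the first segment is the category and the second is the article.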
Not so Semantic URLs
One site that got it all wrong is Amazon. This is what a typical URL from their site looks like:
http://www.amazon.com/gp/product/B000FI73MA/
Yes, technically it is a Clean URL, but this doesn’t help anyone. It’s stuffed with bulks of meaningless numbers and letters – and I have the feeling it will be a 404 in just a few days. There is no point in having a Clean URL if all parameters are still meaningless. IMDB and many others have the same problem.
Then, of course, there’s eBay in a league of its own. This is what a typical URL on eBay looks like:
http://cgi.ebay.com/ebaymotors/Dodge-Challenger-1970-Dodge-Challenger-R-T-Convertible_W0QQcmdZViewItemQQcategoryZ6198QQihZ001QQitemZ110246760299QQrdZ1QQsspagenameZWDVW
Arguably a human can determine what this particular page is about, but this URL still sucks for a number of reasons:
- It is verbose – no one could possibly remember it
- It looks dirty – replacing all ampersands with “QQ” and all equal signs with “Z” is just plain wrong. It looks like it was hastily implemented by frustrated engineers who were told by the marketing department that they cannot have any ampersands or equal signs in their URLs anymore.
- Not all keywords in this URL are helpful for search engines. The words Google actually extracts from the URL are Dodge, Challenger, 1970 and Convertible_W0QQcmd(…). I imagine the last one is not searched for all too often. This happens because the underscore is treated as a normal character and does NOT separate words.
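The underscore problem in the last point can be reproduced with an ordinary regular expression. This is only an illustration, not Google's actual tokenizer: it assumes a splitter based on the common \w character class, which counts the underscore as a word character. The eBay URL is shortened here for readability.

```python
import re


def words(text: str) -> list:
    """Split text into tokens of word characters.

    \\w matches letters, digits AND the underscore, so a hyphen ends
    a token while an underscore does not.
    """
    return re.findall(r"\w+", text)


slug = "Dodge-Challenger-1970-Dodge-Challenger-R-T-Convertible_W0QQcmdZViewItem"
print(words(slug))
# ['Dodge', 'Challenger', '1970', 'Dodge', 'Challenger', 'R', 'T',
#  'Convertible_W0QQcmdZViewItem']
```

With a hyphen in place of the underscore, "Convertible" would come out as a keyword of its own.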
How to do it right
So, whenever you think about Clean URLs, also think about semantics and human readability. A prime example of how to do it right is last.fm. Their URLs just speak for themselves:
http://www.last.fm/music/Tool
8 Comments:
Full Ack!
Thanks for this worthful article.
Flo
good article. very interesting and informative.
The clean URL stuff is for small-time sites like ours. For sites like ebay.com or imdb.com, it doesn't really matter; they're the source.
Informative article though, thanks.
very informative article thanks a lot.
Even if they're the source, it still doesn't mean they shouldn't follow an easy to read format. The larger the site the more user-friendly it should be. Ignoring the search engine indexing, the readability of the URL should be of high importance.
while i wholeheartedly agree it still raises the point about what happens if the pagename is changed. i've been reluctant so far to completely remove some sort of id in my own framework. in my opinion you have the following options:
a) track how the page was named previously (urghs)
b) keep an id but still add more semantic information: /movies/23/the-texas-chainsaw-massacre. which needs more discussion, what happens if a user opens /movies/23/gay-pr0n -- should it still find TCM?
c) don't allow to rename the page (after some time has passed?)
i'm leaning towards b) (with a 301 redirect upon incorrect naming) and for pages that never change names -> a). i haven't yet looked into it in depth nor looked for articles discussing this (or even how other known webapps do it). but i had to mention it :p it relies heavily on the purpose of the page.
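A minimal sketch of option b) with the 301 redirect, in Python. The MOVIES dictionary and the resolve function are hypothetical stand-ins for a real database and router; the id alone decides which page is served, and a wrong or outdated name is redirected to the canonical URL:

```python
from typing import Optional, Tuple

# Hypothetical store: numeric id -> current name (slug) of the page.
MOVIES = {23: "the-texas-chainsaw-massacre"}


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, canonical URL) for a /movies/<id>/<slug> path."""
    parts = path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "movies" or not parts[1].isdecimal():
        return 404, None
    movie_id = int(parts[1])
    slug = MOVIES.get(movie_id)
    if slug is None:
        return 404, None
    canonical = f"/movies/{movie_id}/{slug}"
    if parts[2] != slug:
        # The id is valid but the name is wrong: redirect permanently.
        return 301, canonical
    return 200, canonical


print(resolve("/movies/23/the-texas-chainsaw-massacre"))
print(resolve("/movies/23/some-other-name"))
```

Under this scheme /movies/23/anything still finds the page, but the visitor always ends up on the one canonical URL.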
for imdb i would welcome /title/tt0324216/the-texas-chainsaw-massacre (id+name) as it simplifies mapping for external uses (unless it has an API you could use instead, of course).
to "get it right" for last.fm is not too hard, though. as an artistname _can_ not be uniquely identified by an id as the 'Tool' page will display data for other artists with the same name as well.
nice and important article nevertheless! :)
You're addressing an interesting problem there. One that I have knowingly ignored so far for my blog.
I'd say option a) would be the "right" way to go: If the page had the old name longer than ~15 minutes, it should be saved somewhere and anyone who enters the old URL should then be redirected to the new one. I believe Wikipedia does this.
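Option a) could look roughly like the following Python sketch. The SlugRegistry class is invented for this illustration, keeps everything in memory, and leaves out the ~15 minute grace period mentioned above:

```python
from typing import Dict, Optional, Tuple


class SlugRegistry:
    """Remembers retired page names so that old URLs can be redirected."""

    def __init__(self) -> None:
        self.live: Dict[str, int] = {}      # current slug -> page id
        self.retired: Dict[str, int] = {}   # old slug -> page id
        self.slug_of: Dict[int, str] = {}   # page id -> current slug

    def publish(self, page_id: int, slug: str) -> None:
        self.live[slug] = page_id
        self.slug_of[page_id] = slug

    def rename(self, page_id: int, new_slug: str) -> None:
        old_slug = self.slug_of[page_id]
        del self.live[old_slug]
        self.retired[old_slug] = page_id
        self.retired.pop(new_slug, None)  # a reused name is live again
        self.publish(page_id, new_slug)

    def resolve(self, slug: str) -> Tuple[int, Optional[str]]:
        """Return (HTTP status, URL to serve or redirect to)."""
        if slug in self.live:
            return 200, "/" + slug
        if slug in self.retired:
            return 301, "/" + self.slug_of[self.retired[slug]]
        return 404, None


registry = SlugRegistry()
registry.publish(1, "texas-chainsaw")
registry.rename(1, "the-texas-chainsaw-massacre")
print(registry.resolve("texas-chainsaw"))
```

Because retired names point at the page id rather than at another name, a page that is renamed several times still redirects in a single hop.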
Option b) is of course a lot easier to implement, but it also contradicts the idea of an easily memorable URL. This might very well be a non-issue, because we have better options for storing our bookmarks than our brains. However, I also love the fact that I can just type "php.net/strpos" and get to the page I wanted. Which in turn could also be implemented with option b) by a quick lookup on 404 pages...
seo2.0.onreact.com/top-10-fatal-url-design-mistakes