Avoid duplicate content issues in MediaWiki for better SEO

Most MediaWiki instances out there use short URLs to be user friendly and to improve SEO. However, many of us don't know that this can actually lead to a major SEO issue.

For example, Wikipedia uses short URLs, which keeps its URLs pretty and clean. However, MediaWiki continues to serve the same content on the old, long URL too.

That is, http://en.wikipedia.org/wiki/Thrissur can also be accessed at
http://en.wikipedia.org/w/index.php?title=Thrissur.

To make things even worse, almost the same content can be accessed through every revision of that particular page.

http://en.wikipedia.org/w/index.php?title=Thrissur&oldid=544480567 is a permanent link to a revision, but its content is similar to that of the two pages above. This means that a large amount of content on a MediaWiki site is accessible through several URLs.
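
To make the duplication concrete, here is a minimal Python sketch that maps each of the URL forms above back to the same short URL. The function name short_url is hypothetical, and the /wiki/ article path is an assumption taken from the Wikipedia examples; other wikis may be configured differently.

```python
from urllib.parse import urlsplit, parse_qs, quote


def short_url(url, article_path="/wiki/"):
    """Return the short URL for a MediaWiki page URL, or None if no title is found.

    article_path is an assumption based on the Wikipedia examples above.
    """
    parts = urlsplit(url)
    if parts.path.startswith(article_path):
        # Already a short URL: keep the title exactly as it appears in the path.
        title = parts.path[len(article_path):]
    else:
        # Long URL: the title is carried in the "title" query parameter.
        titles = parse_qs(parts.query).get("title")
        if not titles:
            return None
        title = quote(titles[0].replace(" ", "_"), safe=":/()")
    return f"{parts.scheme}://{parts.netloc}{article_path}{title}"


variants = [
    "http://en.wikipedia.org/wiki/Thrissur",
    "http://en.wikipedia.org/w/index.php?title=Thrissur",
    "http://en.wikipedia.org/w/index.php?title=Thrissur&oldid=544480567",
]

# All three variants collapse to a single short URL.
for variant in variants:
    print(short_url(variant))
```

Three different addresses resolve to one page title, which is exactly what a search engine sees as duplicate content.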

From a general user's point of view, this is just technical detail they don't want to be bothered with. As long as the content is available to them, it's not going to be an issue. But from a search engine's point of view, all of these pages count as duplicate content, which can be seen as an attempt to cheat by making the site appear bigger than it is. Since it is considered bad practice, it can significantly hurt search engine rankings.

Wikipedia solves this problem by disallowing search engines from crawling the long URLs in its robots.txt file, even though Google does not recommend this method. For Wikipedia, this is not a big issue, as search engine companies are likely to make manual exceptions and changes in their systems to give Wikipedia articles more preference. However, that is not the case for most of the MediaWiki installations out there.
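
As a rough illustration of the robots.txt approach, the Python sketch below feeds a simplified rule set to the standard library's urllib.robotparser and checks which URLs a crawler may fetch. The two rules are an assumption made for illustration, not a copy of Wikipedia's actual robots.txt file.

```python
from urllib.robotparser import RobotFileParser

# Simplified, assumed rules: block everything under the long-URL script path.
rules = [
    "User-agent: *",
    "Disallow: /w/",
]

parser = RobotFileParser()
parser.parse(rules)

short = "http://en.wikipedia.org/wiki/Thrissur"
long_form = "http://en.wikipedia.org/w/index.php?title=Thrissur"

# The short URL stays crawlable, while the long form is blocked.
short_allowed = parser.can_fetch("*", short)
long_allowed = parser.can_fetch("*", long_form)
print(short_allowed, long_allowed)
```

Note that this only stops crawling of the long URLs; it does not tell the search engine which URL is the preferred one.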

So, what's the workaround? You have two options:

  1. Automatically redirect all other URLs to the short URL of the respective article, or
  2. Specify a canonical URL in the head portion of each page.
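
For the second option, the canonical URL is declared with a link element inside the page's head. The Python sketch below only shows the shape of that tag; canonical_link is a hypothetical helper, not part of MediaWiki or any extension.

```python
from html import escape


def canonical_link(url):
    """Build the <link> element that declares url as the canonical address."""
    # Escape the URL so characters such as & and " are safe inside the attribute.
    return f'<link rel="canonical" href="{escape(url, quote=True)}" />'


print(canonical_link("http://en.wikipedia.org/wiki/Thrissur"))
```

Every variant of the page, including the long URL and the revision permalinks, would carry this same tag pointing at the short URL.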


CanonURL is an extension for MediaWiki that takes the second approach. It automatically adds canonical URLs to all pages served by MediaWiki and thus helps search engines determine which page should be indexed and which should not.

Here's how you can do that:
  1. Download the CanonURL extension
  2. Extract it to the extensions/ directory of your MediaWiki installation
  3. Append the following line to LocalSettings.php:
       require_once( "$IP/extensions/CanonURL/CanonURL.php" );

Check the source of a page served by MediaWiki in your browser and verify that the canonical URL is being added properly to the head section of the HTML page.
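
If you would rather check this with a script than by reading the page source, a minimal Python sketch along these lines could work. It parses a piece of HTML with the standard library's html.parser and collects any canonical links it finds. The class and function names are hypothetical, and the sample HTML is made up for illustration rather than taken from a real MediaWiki page.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        if tag == "link" and rel == "canonical":
            self.canonical.append(attributes.get("href"))


def find_canonical(html_text):
    """Return the list of canonical URLs declared in html_text."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


# Made-up sample page; in practice, pass in the HTML of a page from your wiki.
sample = """<html><head>
<title>Thrissur</title>
<link rel="canonical" href="http://en.wikipedia.org/wiki/Thrissur" />
</head><body>...</body></html>"""

print(find_canonical(sample))
```

An empty list would mean that no canonical URL was added to the page.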

