
Rel=Canonical Hack

There have been reports of a new form of website hacking in which malicious cross-site rel=canonical tags are inserted into the content of a web page. Canonical tags tell search engines which URL is the preferred one for a page's content. This is useful when a page can be reached via different URLs: on an e-commerce website, for example, you might have a page at myimaginaryshop.com/widgets.php?product=doohickey and an identical page at myimaginaryshop.com/bestsellers.php?product=doohickey. The doohickey product can be reached by different routes, but the content is the same on both pages, and we do not want Google to penalise one page for duplicate content or to index the wrong version of the page. Another use is when you publish an article on several websites and use the canonical tag to tell search engines where the original content lives.
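To check which URL a page declares as canonical, look for a `<link rel="canonical" href="...">` element in its source. The Python sketch below uses the standard library's `html.parser` to collect those hrefs. The class and function names are my own, and the sample page assumes (for illustration only) that widgets.php is the preferred URL for the doohickey.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, and any case.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def find_canonicals(html):
    """Return the canonical hrefs found in an HTML string, in order."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonicals


# Hypothetical page: both doohickey URLs would carry this same tag.
page = """<html><head>
<title>Doohickey</title>
<link rel="canonical" href="http://myimaginaryshop.com/widgets.php?product=doohickey">
</head><body>...</body></html>"""

print(find_canonicals(page))
```

Self-closing tags such as `<link ... />` are handled too, because `HTMLParser` routes them through `handle_starttag` by default.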

There has been a spate of cases where sites have been hacked to insert canonical tags into the content of the page. This can be done through XSS attacks, SQL injection, CMS hacking or a number of other common methods. Generally, the canonical tag is inserted covertly into the body of the page, and because it has no visible effect on how the page works, it can go undetected for a long time.

Google have partly pre-empted this situation by issuing guidelines stating that canonical tags in the body of a web page will be ignored. Furthermore, even if the canonical tag is in the head of a document, it is treated only as a suggestion of which URL is the original. If the canonical tag points to a broken link, or the indicated URL does not resemble the page on which the tag is placed, then it is 'untrusted'. A canonical tag is also untrusted when there is non-standard code in the head of the page, which could imply that the head tag is unclosed or that the page is spammy.
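These rules suggest a quick self-check for a site owner: find every canonical tag on a page, note whether it sits in the head or the body, and flag any that point to a different site. The sketch below is a rough approximation under my own assumptions: it treats "resembles the page" as "same host name" and does not fetch targets to detect broken links. It is not Google's actual logic. The names `CanonicalAuditor` and `audit_page` and the domain spammy.example are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class CanonicalAuditor(HTMLParser):
    """Records each canonical tag and whether it appeared inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.found = []  # (href, in_head) tuples, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "body":
            # A <body> start tag ends the head even if </head> is missing.
            self.in_head = False
        elif tag == "link":
            attributes = dict(attrs)
            rel_values = (attributes.get("rel") or "").lower().split()
            if "canonical" in rel_values and attributes.get("href"):
                self.found.append((attributes["href"], self.in_head))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def audit_page(page_url, html):
    """Return (canonical_url, verdict) pairs for one page.

    Heuristic only (an assumption, not Google's algorithm): same host
    stands in for "resembles the page", and targets are not fetched.
    """
    auditor = CanonicalAuditor()
    auditor.feed(html)
    auditor.close()
    page_host = urlparse(page_url).netloc.lower()
    results = []
    for href, in_head in auditor.found:
        target = urljoin(page_url, href)  # resolve relative hrefs
        if not in_head:
            verdict = "in body: ignored, but a sign of tampering"
        elif urlparse(target).netloc.lower() != page_host:
            verdict = "cross-site: check it is intended"
        else:
            verdict = "ok"
        results.append((target, verdict))
    return results


# Hypothetical hacked page: a legitimate tag in the head, an injected one in the body.
hacked = """<html><head><title>Doohickey</title>
<link rel="canonical" href="/widgets.php?product=doohickey">
</head><body><p>Great product</p>
<link rel="canonical" href="http://spammy.example/">
</body></html>"""

url = "http://myimaginaryshop.com/bestsellers.php?product=doohickey"
for target, verdict in audit_page(url, hacked):
    print(f"{target}  [{verdict}]")
```

A tag in the body is still reported even though Google ignores it, because finding one at all is a strong hint that the page has been tampered with.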

Matt Cutts recommends putting the canonical tag at the top of the head section to be extra safe. Personally I'm not sure about this, since I like to have the title tag at the top of the head section because it helps with SEO for that page.

As far as passing on link juice goes, canonical tags are about as effective as a 301 redirect. The difference is that a canonical tag is invisible to site owners unless they view the page's source code, so the reward for hacking sites in this way could be a significant ranking boost for the pages the canonical points to.

What I think would help is if Google Webmaster Tools were updated to show a list of the pages on your site that carry canonical tags. There is already the change of address feature, which operates on a sitewide basis; this new feature could be a page-level version of it.
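Until such a feature exists, a site owner can build a similar page-level report from their own crawl. The sketch below assumes you already have, for each page URL, the list of canonical hrefs found on it (the `crawl` dictionary is hypothetical sample data, and `canonical_report` is my own name). It classifies each tag as pointing to the page itself, elsewhere on the same site, or to another site. Cross-site canonicals are legitimate for syndicated articles, so they are flagged for checking rather than treated as proof of a hack.

```python
from urllib.parse import urljoin, urlparse


def canonical_report(site_host, canonicals_by_page):
    """Summarise canonical tags found across a site.

    canonicals_by_page maps each crawled page URL to the canonical hrefs
    found on it (assumed input, e.g. gathered by your own crawler).
    Returns (page, target, kind) rows sorted by page URL, where kind is
    "self", "internal" or "external".
    """
    rows = []
    for page, hrefs in sorted(canonicals_by_page.items()):
        for href in hrefs:
            target = urljoin(page, href)  # resolve relative hrefs
            if target == page:
                kind = "self"
            elif urlparse(target).netloc.lower() == site_host.lower():
                kind = "internal"
            else:
                kind = "external"
            rows.append((page, target, kind))
    return rows


# Hypothetical crawl results for the example shop.
crawl = {
    "http://myimaginaryshop.com/widgets.php?product=doohickey":
        ["http://myimaginaryshop.com/widgets.php?product=doohickey"],
    "http://myimaginaryshop.com/bestsellers.php?product=doohickey":
        ["/widgets.php?product=doohickey"],
    "http://myimaginaryshop.com/about.php":
        ["http://spammy.example/"],
}

for page, target, kind in canonical_report("myimaginaryshop.com", crawl):
    if kind == "external":
        print(f"CHECK {page} -> {target}")
```

Running this regularly and comparing the output with the previous run would show new cross-site canonicals soon after they appear.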

Categories: General | 26 May 2011