This is a summary of Anne Kennedy’s presentation at Search Marketing Expo / Online Marketer Conference held in Sydney 1-2 May 2012.
Anne Kennedy has co-authored the first book on international SEO and PPC, called Global Search Engine Marketing. Anne provides search engine consulting to hundreds of companies worldwide and formed an international online marketing consortium with Nordic eMarketing in Reykjavik, London, Stockholm, Rome and Beijing.
Duplicate content happens, says Anne. URL duplication is a big one. This is where you see several different versions of the same page being indexed and/or linked to. For example:
and so on.
You should always use the rel=canonical tag to specify the canonical version of each page, and also tell Google in Webmaster Tools which version of your pages to index.
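As a rough illustration of the idea, the Python sketch below collapses a few common URL variants into one canonical URL and builds the matching rel=canonical tag. The host name, the list of ignored query parameters and the function names are all assumptions made for this example; they are not from Anne's talk, and a real site would need its own rules.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters that do not change the page content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def canonical_url(url, scheme="https", host="www.example.com"):
    """Collapse common duplicate URL variants into one canonical form."""
    parts = urlsplit(url)
    # Drop ignored parameters and sort the rest so the order is stable.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key not in IGNORED_PARAMS
    )
    path = parts.path or "/"
    # Treat /index.html as the same page as its directory.
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    # Force one scheme and one host, and drop any #fragment.
    return urlunsplit((scheme, host, path, urlencode(query), ""))


def canonical_link_tag(url):
    """Return the <link> element to place in the page's <head>."""
    return '<link rel="canonical" href="%s">' % escape(canonical_url(url), quote=True)
```

With these assumed rules, canonical_url("http://example.com/index.html?utm_source=news") returns "https://www.example.com/", so every variant of the home page would declare the same canonical URL.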
Anne says to watch your crawl budget. Your crawl budget is the percentage of your site that Googlebot will crawl. Googlebot rarely crawls your entire site, so keep your low-quality pages out of the index by excluding them from your sitemap and blocking them using robots.txt.
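Before relying on robots.txt rules, it can help to check which URLs they actually block. The sketch below uses Python's standard urllib.robotparser module for that. The robots.txt content and the example URLs are invented for illustration, and the parser only approximates how Googlebot itself interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks print versions and search results.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /search
"""


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]
```

Running blocked_urls over the URLs you plan to list in your sitemap is a quick way to confirm that no blocked page is also being submitted for indexing.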
Common Duplication Causes
A very common duplicate content mistake is to have printer-friendly versions of your content. Whatever you do, lose the printer-friendly versions from your sitemap!
Use 301 redirects on your pages, but only when necessary, because not all link value will transfer to your replacement pages. PageRank does not pass 100 percent through a 301 redirect, so keep that in mind.
Think about using a separate XML feed for your product pages, says Anne. Separate out your e-commerce or product-specific pages from your main sitemap and create a sitemap just for them. Upload the two sitemaps separately in your Google Webmaster Tools account.
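A minimal sketch of that split is shown below, assuming product pages can be recognised by a URL path prefix. The "/products/" prefix and the function names are hypothetical. The XML follows the sitemaps.org urlset format with only the required loc element.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Serialise a list of URLs as a minimal sitemap <urlset> document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def split_sitemaps(urls, product_prefix="/products/"):
    """Return (main_sitemap_xml, product_sitemap_xml) for the given URLs."""
    product, main = [], []
    for url in urls:
        # Assumption: product pages all live under one path prefix.
        is_product = urlsplit(url).path.startswith(product_prefix)
        (product if is_product else main).append(url)
    return build_sitemap(main), build_sitemap(product)
```

Each returned string can be written to its own file, for example one for the main site and one for products, and the two files submitted separately.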
Content syndication and site scraping can cause duplicate content headaches. If you are an article syndicator or blogger, make sure you link back to the original article with the title in the anchor text within the article, not the footer, because some syndication sites strip links out of footers. Require syndicators to use the canonical URL version or to exclude the article URL in their robots.txt. This will help Google find the original article more easily.
Another trick is to give syndicators a logo or image to go with the article that links to your article and carries the article title in its alt attribute. Syndicators will often miss those.
Be sure to update your XML sitemap immediately whenever you publish a new article or blog post – you can use WordPress plugins to update your sitemap automatically for this.
If your article is out of date or no longer accurate and you want it gone from the SERPs for good, use a 410 (Gone) status code to tell Google the article is GONE. This is a more permanent signal than a 404.
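How you return a 410 depends on your server or CMS. As one possible illustration, the sketch below is a tiny Python WSGI application that answers 410 for removed articles and 301 for moved pages. The paths and names are invented for this example, and on most real sites this would be handled in the web server or CMS configuration instead.

```python
# Hypothetical paths, for illustration only.
REMOVED_PATHS = {"/blog/outdated-article"}
MOVED_PATHS = {"/old-page": "/new-page"}


def status_for(path):
    """Pick the HTTP status line and extra headers for a request path."""
    if path in REMOVED_PATHS:
        # 410 says the page was removed on purpose and will not return.
        return "410 Gone", []
    if path in MOVED_PATHS:
        # 301 says the page has moved permanently to a new URL.
        return "301 Moved Permanently", [("Location", MOVED_PATHS[path])]
    return "200 OK", []


def app(environ, start_response):
    """Minimal WSGI app that applies status_for to each request."""
    status, headers = status_for(environ.get("PATH_INFO", "/"))
    start_response(status, headers + [("Content-Type", "text/plain")])
    return [status.encode("ascii")]
```

To try it locally, the app can be served with wsgiref.simple_server.make_server from the standard library.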
Don’t put your international content on your English TLD. If you want your content to rank well in a particular international market, you should put the content on a related TLD, e.g. a German-language site should sit on site.de or, at the very least, de.site.com. Your international content will rank better in regional markets if you have links pointing to it from related TLDs, e.g. site.de will rank better in Google.de if it has plenty of .de sites linking to it.
And finally – don’t leave it up to the bots! Take control of your content.