Q and A: Do sitemap crawl errors hurt me in Google?

Dear Kalena

I have a new site that was built in late Sep 2008. I have submitted it to Google and verified it. Every week when it is crawled, it comes up with the same errors.

I’ve been back to my designer multiple times and have done everything he has told me to do, and the errors still exist. These pages are not mine; they belong to a friend who had his site designed at the same place over a year ago.

My question is: does it hurt me with Google if the same errors keep coming up? If so, what can I do about it?

Thanks

Doug

————————————————————–

Dear Doug

No and nothing. Hope this helps!

Q and A: Do regional domains constitute a duplicate content problem?

Dear Kalena…

First of all, I find the info on your site extremely useful – I always look forward to the newsletter! I have been trying to find the time to do the SEO course, but finding the time is always a problem! However, it’s still on my to-do list.

I am trying to sort out a problem regarding duplicate content on my sites. We run local sites for each language/country we trade in (e.g. .fr for France and .co.uk for the UK). Unfortunately, whilst growing the business I never had time to research SEO practices, so I ended up with a lot of sites containing the same duplicate content, including title tags, descriptions etc. I had no idea, of course, how bad this was for organic ranking!

I have now created unique title tags and descriptions for ALL the pages on ALL the sites. I have also rewritten the home page and the paternity testing page (our main pages) so the content is unique for each site in English. The only site with completely unique content on every page is the .com, along with parts of the .co.uk. On the remaining pages that still carry duplicate content, I have added a NOINDEX, FOLLOW tag so that the spiders will not index them. I chose FOLLOW as opposed to NOFOLLOW because I still want the internal links in those pages to be picked up – does this make sense?

Also, having made these changes, how long does it normally take for Google to refresh its filters and start ranking the site? The changes are now about a month old; however, the site is still not ranking.

Also, should this not work, do you have any experience with submitting a reconsideration request through Webmaster Tools? What are the upsides and downsides of this?

Any advice would be greatly appreciated.

Regards
Kevin

Dear Kevin

Thanks for your coffee donation and I’m glad you like the newsletter. Now, about your tricky problem:

1) First up, take a chill pill. There’s no need to lodge a reinclusion request with Google. According to Google’s Site Status Tool, your main site is being indexed and hasn’t been removed from their datacenter results. A standard indexed-page lookup shows that 32 pages from your .com site have been indexed by Google, while a backward-link lookup reveals at least 77 other sites linking to yours. If you’ve put NOINDEX tags on any dupe pages, you’ve covered yourself.
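
If you’d like to double-check those NOINDEX tags rather than take anyone’s word for it, a small script can fetch each page and confirm the robots meta tag is actually there. The sketch below is just an illustration of that check (not a Google tool), and the example.com URLs are placeholders for your own duplicate-content pages.

```python
# A minimal sketch: confirm that given pages carry a "noindex" robots meta tag.
# The URLs listed under __main__ are placeholders - swap in your own pages.
from html.parser import HTMLParser
from urllib.request import urlopen


class RobotsMetaParser(HTMLParser):
    """Collects the content of any <meta name="robots" ...> tag on a page."""

    def __init__(self):
        super().__init__()
        self.robots_content = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            if (attrs.get("name") or "").lower() == "robots":
                self.robots_content.append((attrs.get("content") or "").lower())


def is_noindexed(url):
    """Return True if the page at `url` declares noindex in a robots meta tag."""
    html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
    parser = RobotsMetaParser()
    parser.feed(html)
    return any("noindex" in content for content in parser.robots_content)


if __name__ == "__main__":
    pages = [
        "http://www.example.com/duplicate-page-1.html",
        "http://www.example.com/duplicate-page-2.html",
    ]
    for page in pages:
        print(page, "->", "noindex found" if is_noindexed(page) else "NO noindex tag!")
```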

2) Next, pour yourself a drink and put your feet up. Your .fr site is also being indexed by Google, but there isn’t a dupe content issue because the site is in French, meaning that Googlebot sees the content as completely different. Your .co.uk site is being indexed too and again, there isn’t a dupe content issue, because it looks like you have changed the content enough to avoid tripping any duplicate content filters.

3) Now that you’re relaxed, log in to Google Webmaster Tools and make sure each of your domains is set to its appropriate regional search market. To do this, click on each domain in turn and choose “Set Geographic Target” from the Tools menu. Your regional domains should already be associated with their geographic locations, i.e. .co.uk should already be associated with the UK, meaning that Google will automatically give preference to your site in the SERPs shown to searchers in the UK. For your .com site, you can choose either to associate it with the United States (recommended, as it is your main market) or not to use a regional association at all.

4) Now it’s time to do a little SEO clean-up job on your HTML code. Fire or unfriend whoever told you to include all these unnecessary META tags in your code:

  • Abstract
  • Rating
  • Author
  • Country
  • Distribution
  • Revisit-after

None of these tags are supported by the major search engines and I really don’t know why programmers still insist on using them! All they do is clog up your pages and contribute to code bloat.
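
If you (or your programmer) want a quick way to audit a template before cleaning it up, here’s a rough sketch that scans a local HTML file for the META tags listed above. It’s purely illustrative, and the index.html filename is a placeholder for your own page.

```python
# An illustrative sketch: list which of the unsupported META tags above are
# still present in a local HTML file. "index.html" is a placeholder filename.
import re

UNSUPPORTED = {"abstract", "rating", "author", "country", "distribution", "revisit-after"}

with open("index.html", encoding="utf-8", errors="replace") as f:
    html = f.read()

# Pull the name attribute out of every <meta> tag and keep the unsupported ones.
found = {
    name.lower()
    for name in re.findall(r'<meta\b[^>]*\bname\s*=\s*["\']([^"\']+)["\']', html, re.IGNORECASE)
    if name.lower() in UNSUPPORTED
}

if found:
    print("META tags worth deleting:", ", ".join(sorted(found)))
else:
    print("No unsupported META tags found.")
```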

5) Finally, you need to start building up your site’s link popularity and boost your Google PageRank beyond its current 2 out of 10. And by link building, I mean the good old-fashioned type: seeking out quality sites in your industry and submitting your link requests manually, NOT participating in free-for-all link schemes or buying text links from low-quality link farms.

Good luck!

Google Now Helps You Clean Up 404 Links

Google has just announced the easiest way to obtain inbound links to your site in a short space of time.

Webmaster Tools’ new Crawl Error Sources feature allows you to identify the sources of the 404 Not Found errors reported for your site. Listed next to “Crawl Errors” in the Webmaster Tools control panel, you’ll now find a “Linked From” column showing the number of pages that link to a specific “Not found” URL on your site. Clicking an item in the “Linked From” column opens a separate dialog box listing each page that links to that URL (both internal and external), along with the date it was discovered. You can even download all your crawl error sources to an Excel file.

If your web server doesn’t handle 404s or serve custom error pages very well, Google has also introduced a widget for Apache or IIS: 14 lines of JavaScript that you can paste into your custom 404 page template to help your visitors find what they’re looking for. It provides suggestions based on the incorrect URL.

You can use the “Linked From” source information to fix the broken links within your site, set up redirects to a more appropriate URL, and/or contact the webmasters who are linking to missing pages or using malformed links and ask them to fix them.
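
As a sketch of that workflow, the snippet below turns a saved export of your crawl error sources into skeleton 301 redirect rules for Apache. It assumes (and this is only an assumption, so check your own download) that you saved the export as crawl-errors.csv with the broken URL in the first column; the example.com destination is a placeholder you need to replace by hand.

```python
# A rough, illustrative sketch: turn a saved crawl-error export into skeleton
# Apache "Redirect 301" rules. The filename, column layout and destination URL
# are assumptions/placeholders - adjust them to match your own export.
import csv
from urllib.parse import urlparse

with open("crawl-errors.csv", newline="", encoding="utf-8") as f:
    reader = csv.reader(f)
    next(reader, None)  # skip the header row (remove this line if your file has none)
    broken_paths = sorted({urlparse(row[0]).path for row in reader if row})

for path in broken_paths:
    # Fill in the real destination for each rule before adding it to .htaccess.
    print(f"Redirect 301 {path} http://www.example.com/REPLACE-WITH-CORRECT-PAGE")
```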

Webmasters have been asking for something like this for a long time, so it’s a relief to see it live at last. The official post about the feature is on Google’s Webmaster Central Blog and Matt Cutts goes into more detail on his blog.

Google’s Cross-Product Webinar

Google have announced a free cross-product webinar for webmasters who want to learn more about three of their most-used products (Google Webmaster Tools, Google Analytics and Google Website Optimizer) and how they can work together to enhance your website.

The webinar will be held on 8th July 2008 at 9:00am Pacific Time. To attend, you need to register. Those who can’t make it will be able to access an archived version of the presentation via the same registration URL. This is the first time Google have offered a joint webinar for these products.

Q and A: What is an XML Sitemap and why do I need one?

Hi Kalena

I am not sure what an XML sitemap is. I have gone to websites that will automatically generate a sitemap, but the code they create is not understandable to me and they only cover the first 500 pages.

There are pages on my site that I want indexed and others that don’t matter. I have no idea how to create an XML sitemap that only lists the pages I want indexed. How can I do this? Can you clarify what an XML sitemap is and whether I can include only my important pages in it?

Beverly

Hi Beverly

Thanks for the caffeine donation; I’ll be sure to use it tomorrow when I visit Starbucks.

A sitemap is simply a way for search engines and visitors to find all the pages on your site more easily, and XML is just a popular format for delivering it. To quote Sitemaps.org:

Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.

I personally use XML Sitemaps to build the sitemaps for my own sites and my clients’ sites. I paid for the standalone version so I can create sitemaps for sites with more than 500 pages. At under USD 20, I believe the price is pretty reasonable and their support is pretty good, so it might be worth the investment for you. Apart from that, the instructions for using their web version are quite clear – perhaps you need to have a closer look? These sitemap FAQs should also help.

You can either create a full sitemap of your entire site and edit out any pages you don’t want indexed afterwards, or instruct the generator to skip certain files or sub-directories before running. Once you’ve created and downloaded the XML sitemap file for your site, simply upload it to your web server and follow the instructions to ensure it is picked up by the search engines. If you’ve created a Google Webmaster Tools account, you can log in and enter your sitemap URL directly into the control panel.
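
If you’re comfortable with a little scripting, the sitemap protocol is simple enough that you can also build the file yourself and include only the pages you care about. The sketch below is just an example of that approach; the example.com URLs are placeholders, and the lastmod/changefreq values are optional hints.

```python
# A minimal sketch: hand-build an XML sitemap that lists only chosen pages.
# The URLs below are placeholders - replace them with your own important pages.
from datetime import date
from xml.sax.saxutils import escape

important_pages = [
    "http://www.example.com/",
    "http://www.example.com/products.html",
    "http://www.example.com/contact.html",
]

today = date.today().isoformat()
entries = "\n".join(
    f"  <url>\n"
    f"    <loc>{escape(url)}</loc>\n"
    f"    <lastmod>{today}</lastmod>\n"
    f"    <changefreq>monthly</changefreq>\n"
    f"  </url>"
    for url in important_pages
)

sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    f"{entries}\n"
    "</urlset>\n"
)

# Save as sitemap.xml in your web server's root, then point Google at it.
with open("sitemap.xml", "w", encoding="utf-8") as f:
    f.write(sitemap)

print(sitemap)
```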
