Q and A: Why is Google having trouble indexing our site?

Question

Hello Kalena

One of the sites we manage has a problem.

The homepage at [URL removed] is no longer being indexed by Google. The site was built with Sitefinity 3.7 and is hosted by Rackspace. Something similar has happened twice before. The first time we resolved it using the “index this page” option on the page generated by Sitefinity; the second time we re-created the XML sitemap and linked it directly to Google Webmaster Tools.

This time we can’t find the cause. We checked whether the end user who works in the back-end had made any changes and whether Google Webmaster Tools had reported anything, but nothing came up. Here are some more technical details:

1) The site homepage is [URL removed]. But the site root is [URL removed], which is an empty page that 301-redirects to the home page.

2) In Google Webmaster Tools we set up 2 Sitemaps:

  • The first, at [URL removed], lists the top pages of the home page (static)
  • The second, at [URL removed], is populated with the page content generated by Sitefinity (dynamic)

3) Also, from the back-end options, a ROBOTS meta tag was set at page level for the top pages, as Google suggests.

4) Google reports 5 blocked URLs when crawling our robots.txt, with the message: “Google tried to crawl these URLs in the last 90 days, but was blocked by robots.txt”. This seems suspicious because I can’t see what could be blocking them. The robots.txt file is pretty simple and not restrictive.

Could you give us a hand? I’ve left a generous donation for your coffee fund.

Thanks!
Jim

————————————–

Hi Jim

First up, thanks for the caffeine donation 🙂

As for your problem, oh boy. You’ve got a few different issues going on, so let me address each of them separately:

1) Your XML sitemaps are missing contextual data specified by the Sitemaps protocol. In particular, the <loc> child entries for each URL are malformed. I’m surprised this hasn’t generated an error in Webmaster Tools, but I’m pretty sure it is confusing Googlebot. Check your sitemaps against the protocol and re-generate them if necessary, perhaps with one of the XML generator tools recommended by Google. Personally, I like XML Sitemaps (yes, that’s my affiliate link).

Also, why 2 separate sitemaps for HTML pages? I can understand having separate ones for RSS feeds or structured data stuff, but your standard site pages should all be listed in the one file so you can better manage the content and keep track of indexing history in Webmaster Tools.
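
If you want a quick sanity check on the <loc> entries before re-generating anything, something along these lines might help. It is only a rough sketch, not a full protocol validator: the function name `find_bad_entries` and the sample URLs on example.com are made up for illustration, and it only checks that every <url> entry has a <loc> containing an absolute http(s) address.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace defined by the Sitemaps protocol for <urlset> documents.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def find_bad_entries(sitemap_xml: str) -> list[str]:
    """Describe each <url> entry whose <loc> is missing or not an absolute URL."""
    root = ET.fromstring(sitemap_xml)
    problems = []
    for i, url in enumerate(root.findall(f"{{{SITEMAP_NS}}}url"), start=1):
        loc = url.find(f"{{{SITEMAP_NS}}}loc")
        if loc is None:
            problems.append(f"entry {i}: missing <loc>")
            continue
        text = (loc.text or "").strip()
        parsed = urlparse(text)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"entry {i}: <loc> is not an absolute URL: {text!r}")
    return problems


# Hypothetical sitemap: one good entry, one relative <loc>, one with no <loc>.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/home</loc></url>
  <url><loc>/about</loc></url>
  <url><lastmod>2013-01-01</lastmod></url>
</urlset>"""

for problem in find_bad_entries(SAMPLE):
    print(problem)
```

Run it against the contents of each of your sitemap files. An empty result only means the <loc> entries look sane, not that the whole file follows the protocol.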

2) Your robots.txt file is blocking a number of pages that you have listed in your XML sitemap. So on the one hand you’re telling Google to index pages within a certain directory, but on the other, you’re telling Google they are not allowed to access that directory. This is what the error message is about. You’ve also got conflicting instructions on some of your pages in terms of robots meta tags vs. robots.txt.
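
To see which sitemap URLs your robots.txt is blocking, you could cross-check the two with a few lines of Python. This is a sketch under assumptions: the rules and URLs below are invented placeholders rather than your real files, and `blocked_urls` is just a helper name I picked. It uses the standard library's `urllib.robotparser`, which does simple prefix matching and does not handle Google's wildcard extensions, so treat the result as a first pass.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the given robots.txt rules disallow for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(agent, u)]


# Hypothetical robots.txt and sitemap URLs, for illustration only.
ROBOTS = """User-agent: *
Disallow: /private/
"""

SITEMAP_URLS = [
    "https://www.example.com/home",
    "https://www.example.com/private/report",
]

for url in blocked_urls(ROBOTS, SITEMAP_URLS):
    print("listed in sitemap but blocked by robots.txt:", url)
```

Any URL this prints is one you are asking Google to index and forbidding it to crawl at the same time, so either take it out of the sitemap or lift the Disallow rule.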

3) The 301 redirect on your root directory is your major problem. In fact, that empty landing page is your major problem. Why do you need it? You don’t use Flash and it doesn’t appear to have an IP sniffer for geo-location purposes so I can’t understand why you wouldn’t just put your home page content at the root level and let search engines index it as expected.

The way you have it set up right now essentially tells Google that you have moved all your content to a new location, when you really haven’t. It adds another step to the indexing process, and you are also shooting yourself in the foot, as every 301 contributes to some lost PageRank. Google clearly doesn’t like the setup or isn’t processing it for some reason. There also appear to be several hundred 301s in place for other pages, so I’m not sure what that’s about. I don’t have access to your .htaccess file, but I can imagine it reads like a book!
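
If you want to audit those redirects, one approach is to trace how many hops each URL takes before it reaches a real page. The sketch below is an assumption-laden illustration, not a finished tool: `trace_redirects`, `fake_fetch` and the example.com addresses are hypothetical, and the lookup is faked with a dictionary so the example runs without touching the network. In practice you would swap in a function that requests each URL without following redirects and returns the status code and Location header.

```python
from typing import Callable, Optional
from urllib.parse import urljoin

# A fetch function returns (status code, Location header or None) for a URL.
Fetch = Callable[[str], tuple[int, Optional[str]]]

REDIRECT_CODES = (301, 302, 303, 307, 308)


def trace_redirects(url: str, fetch: Fetch, max_hops: int = 10) -> list[str]:
    """Return every URL visited, starting with `url`, until a non-redirect response."""
    chain = [url]
    while len(chain) <= max_hops:
        status, location = fetch(chain[-1])
        if status not in REDIRECT_CODES or not location:
            break
        next_url = urljoin(chain[-1], location)  # resolve relative Location values
        looped = next_url in chain
        chain.append(next_url)
        if looped:  # stop on a redirect loop
            break
    return chain


# Hypothetical site: the root 301s to /home, which serves the real page.
FAKE_SITE = {
    "https://www.example.com/": (301, "/home"),
    "https://www.example.com/home": (200, None),
}


def fake_fetch(url: str) -> tuple[int, Optional[str]]:
    return FAKE_SITE.get(url, (404, None))


chain = trace_redirects("https://www.example.com/", fake_fetch)
print(" -> ".join(chain), f"({len(chain) - 1} redirect hop(s))")
```

A chain longer than one URL for your root address is the extra step I’m talking about. Ideally the root returns the home page content directly.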

4) Unless you specifically need a robots meta tag for a particular page scenario, I would avoid using them on every page. You can achieve the same results with your robots.txt file, and it’s easier to manage robot instructions in one location than to edit them page by page. That also avoids conflicts like the ones you have now.

Apart from the obvious issues mentioned above, have you considered switching from Sitefinity to WordPress? I’ve struggled to optimize Sitefinity sites for years. It’s a powerful CMS, but it was never built with search engines in mind and always requires clunky hacks to get content optimized. Plus, that’s a really outdated version of Sitefinity.

Given the other issues, it might be time for a total site rebuild?

Best of luck
