Q and A: Why is Google having trouble indexing our site?

Hello Kalena

One of the sites we manage has a problem.

The homepage at [URL removed] is no longer being indexed by Google. The site was built with Sitefinity 3.7 and is hosted by Rackspace. Something similar has happened twice before. The first time, we resolved it using the “index this page” option on the page generated by Sitefinity; the second time, by re-creating the XML sitemap and linking it directly in Google Webmaster Tools.

This time we can’t seem to find the reason. We checked whether the end user who works in the back end had made any changes, and whether Google Webmaster Tools had reported anything, but nothing came up. Here are some more technical details:

1) The site homepage is [URL removed]. But the site root is [URL removed], which is an empty page that sends visitors to the home page using a 301 redirect.

2) In Google Webmaster Tools we set up 2 Sitemaps:

  • The first at [URL removed] is indexing the Top pages of the Home page (static)
  • The second [URL removed] is populated with the page content generated by Sitefinity (dynamic)

3) Also, from the back-end options, a metatag ROBOTS was set at page level for the top pages, as Google suggests.

4) Google reports 5 blocked URLs when crawling our robots.txt with the message: “Google tried to crawl these URLs in the last 90 days, but was blocked by robots.txt”. This seems suspicious because I can’t see what could be blocking them; the robots.txt file is pretty simple and not restrictive.

Could you give us a hand? I’ve left a generous donation for your coffee fund.

Thanks!
Jim

————————————–

Hi Jim

First up, thanks for the caffeine donation 🙂

As for your problem, oh boy. You’ve got a few different issues going on, so let me address each of them separately:

1) Your XML sitemaps are missing contextual data specified by the Sitemaps protocol. In particular, your <loc> child entries per URL are messed up. I’m surprised this hasn’t generated an error in Webmaster Tools, but I’m pretty sure it would be confusing Googlebot. Go check your sitemaps against the protocol and re-generate them if necessary. Maybe use one of the XML generator tools recommended by Google. Personally, I like XML Sitemaps (yes that’s my affiliate link).

Also, why 2 separate sitemaps for HTML pages? I can understand having separate ones for RSS feeds or structured data stuff, but your standard site pages should all be listed in the one file so you can better manage the content and keep track of indexing history in Webmaster Tools.
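If you want a quick sanity check before re-generating anything, something along these lines could flag the broken entries. It’s only a rough Python sketch: the function name `find_bad_loc_entries` and the sample URLs are made up for illustration, and it only checks that each <url> entry has a <loc> child holding an absolute URL, which is just one part of what the Sitemaps protocol asks for.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace defined by the Sitemaps protocol for <urlset>, <url> and <loc>.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def find_bad_loc_entries(sitemap_xml: str) -> list[str]:
    """Describe every <url> entry whose <loc> is missing, empty or not absolute.

    Hypothetical helper for illustration; not part of any sitemap tool.
    """
    root = ET.fromstring(sitemap_xml)
    problems = []
    for index, url_el in enumerate(root.findall(f"{{{SITEMAP_NS}}}url"), start=1):
        loc = url_el.find(f"{{{SITEMAP_NS}}}loc")
        text = (loc.text or "").strip() if loc is not None else ""
        if not text:
            problems.append(f"entry {index}: missing or empty <loc>")
            continue
        parts = urlparse(text)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"entry {index}: <loc> is not an absolute URL: {text}")
    return problems


# Made-up sample: entry 2 uses a relative URL, entry 3 has no <loc> at all.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>/about.aspx</loc></url>
  <url><lastmod>2015-06-01</lastmod></url>
</urlset>"""

for problem in find_bad_loc_entries(SAMPLE):
    print(problem)
```

Run it against each of your sitemap files. An empty result only means the <loc> entries look sane, not that the whole file validates.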

2) Your robots.txt file is blocking a number of pages that you have listed in your XML sitemap. So on the one hand you’re telling Google to index pages within a certain directory, but on the other, you’re telling Google they are not allowed to access that directory. This is what the error message is about. You’ve also got conflicting instructions on some of your pages in terms of robots meta tags vs. robots.txt.
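To see exactly which sitemap URLs are caught by your robots.txt rules, you could run a check like the one below. Treat it as a sketch under a few assumptions: the robots.txt content and URLs are invented, `blocked_sitemap_urls` is a hypothetical helper, and Python’s built-in `urllib.robotparser` only approximates how Googlebot interprets rules (it does not apply Google’s longest-match handling of Allow and Disallow).

```python
from urllib.robotparser import RobotFileParser


def blocked_sitemap_urls(
    robots_txt: str, sitemap_urls: list[str], user_agent: str = "Googlebot"
) -> list[str]:
    """Return the sitemap URLs that robots.txt disallows for the given user agent.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls if not parser.can_fetch(user_agent, url)]


# Invented example data: the sitemap lists a page inside a disallowed directory.
ROBOTS_TXT = """User-agent: *
Disallow: /products/
"""

SITEMAP_URLS = [
    "https://www.example.com/home.aspx",
    "https://www.example.com/products/widget.aspx",
]

for url in blocked_sitemap_urls(ROBOTS_TXT, SITEMAP_URLS):
    print("Listed in sitemap but blocked by robots.txt:", url)
```

Any URL this prints is one you are asking Google to index and forbidding it to crawl at the same time, so either remove it from the sitemap or lift the Disallow rule.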

3) The 301 redirect on your root directory is your major problem. In fact, that empty landing page is your major problem. Why do you need it? You don’t use Flash and it doesn’t appear to have an IP sniffer for geo-location purposes so I can’t understand why you wouldn’t just put your home page content at the root level and let search engines index it as expected.

The way you have it set up right now is essentially telling Google that you have moved all your content to a new location, when you really haven’t. It’s adding another step to the indexing process and you are also shooting yourself in the foot as every 301 contributes to some lost PageRank. Google clearly doesn’t like the setup or isn’t processing it for some reason. There also appear to be several hundred 301s in place for other pages, so I’m not sure what that’s about. I don’t have access to your .htaccess file, but I can imagine it reads like a book!
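With that many 301s in play, it may help to map out how many hops each URL goes through before it lands on real content. Here is a small Python sketch of that idea. The redirect map is entirely made up (in practice you might export it from a crawl or from your server’s rewrite rules), and `redirect_chain` is a hypothetical helper, not part of any tool.

```python
def redirect_chain(
    start: str, redirects: dict[str, str], max_hops: int = 10
) -> list[str]:
    """Follow known 301s from `start` and return every URL visited, in order."""
    chain = [start]
    while chain[-1] in redirects and len(chain) <= max_hops:
        target = redirects[chain[-1]]
        chain.append(target)
        if chain.count(target) > 1:
            break  # redirect loop: stop instead of spinning forever
    return chain


# Hypothetical redirect map: source path -> 301 target.
REDIRECTS = {
    "/": "/home.aspx",
    "/old-products.aspx": "/products-2012.aspx",
    "/products-2012.aspx": "/products.aspx",
}

for start in REDIRECTS:
    chain = redirect_chain(start, REDIRECTS)
    print(f"{len(chain) - 1} hop(s): {' -> '.join(chain)}")
```

Anything showing more than one hop is a candidate for pointing straight at its final destination, and the root entry is the one I would remove altogether.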

4) Unless you specifically need a robots meta tag for a particular page scenario, I would avoid using them on every page. You can achieve the same results with your robots.txt file and it’s easier to manage robot instructions in one location rather than having to edit page by page – avoiding conflicting issues as you have now.
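Before stripping the tags out, you could audit which pages actually carry a robots meta tag and what it says. The sketch below uses Python’s standard `html.parser` module. The page markup and URLs are invented, and `RobotsMetaParser`, `robots_meta_directives` and `noindexed_pages` are hypothetical names used only for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_meta_directives(html: str) -> set[str]:
    """Return the lower-cased robots meta directives found in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def noindexed_pages(pages: dict[str, str]) -> list[str]:
    """Return the URLs whose HTML carries a robots meta 'noindex' directive."""
    return [url for url, html in pages.items() if "noindex" in robots_meta_directives(html)]


# Invented sample pages: URL -> HTML source.
PAGES = {
    "https://www.example.com/home.aspx": "<html><head><title>Home</title></head></html>",
    "https://www.example.com/about.aspx": '<head><meta name="ROBOTS" content="NOINDEX, FOLLOW"></head>',
}

print(noindexed_pages(PAGES))
```

If a URL turns up here and is also listed in your sitemap, that is exactly the kind of conflicting instruction worth cleaning up.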

Apart from the obvious issues mentioned above – have you considered switching away from Sitefinity and over to WordPress? I’ve struggled with optimizing Sitefinity sites for years – it’s a powerful CMS but it was never built with search engines in mind and always requires clunky hacks to get content optimized. Plus that’s a really outdated version of Sitefinity.

Given the other issues, it might be time for a total site rebuild?

Best of luck

——————————————————————–


The 2015 Periodic Table of SEO Success Factors

Earlier this month, the team over at Search Engine Land updated their brilliant Periodic Table of SEO Success Factors.

Now in its 3rd edition, the table is a fantastic SEO resource and one of the few items on my Ubuntu desktop that gets regular eyeball attention. Content is divided between on-page and off-page factors and clearly color-coded to make it visually intuitive, with relevancy weight ranging from -3 to +3.

The new edition references new factors of SEO importance including vertical search, Direct Answers and HTTPS, with mobile friendliness and structured data acquiring a relevancy weight increase in line with recent Google updates.

The idea behind the table is to highlight tasks within the SEO process and to act as a visual reminder about what is most important and what areas to focus on for clients.

Danny Sullivan describes the goal and philosophy of the table:

“Our goal with the Periodic Table Of SEO is to help publishers focus on the fundamentals needed to achieve success with search engine optimization. This means it’s not about trying to list all 200 Google ranking factors or detail Google’s 10,000 sub-factors. It’s not about trying to advise if keywords you want to rank for should go at the beginning of an HTML title tag or the end. It’s not about whether or not Facebook Likes are counted for ranking boosts.

Instead, the table is designed to broadly guide those new to or experienced with SEO into general areas of importance. Title tags are generally important. Think about making sure they’re descriptive. Social sharing is often generally seen as good for SEO. Aim for social shares, without worrying about the specific network.”

While not exactly a cheat-sheet, my SEO students at Search Engine College tell me it is their favorite resource for assignment preparation, so that’s a pretty good endorsement.

The Table can be downloaded as a PDF in large or condensed format, or you can grab the code to embed the infographic directly into your web site.

Revised link building course now available at Search Engine College

After several months of revision, I am pleased to announce that we have just re-launched our Link Building 101 course at Search Engine College.

The course content has been completely updated with new material and videos to reflect Google’s revised stance on acceptable link building tactics. This 10-lesson course also takes into account the impact of recent tweaks to Google’s Panda, Penguin and Hummingbird algorithms.

The course is now available to all students with a paid subscription. We have also added new assessment items including review quizzes, 4 tutor-graded assignments and a final exam available exclusively to subscribers who want to upgrade their subscription to Certification.

Just as a reminder, Certification is for current subscribers who want to benefit from tutor supervision, complete set assessments to reach our knowledge benchmark and receive formal, industry-recognized certification following completion. You can upgrade your subscription to Certification at any point for any course.

Hope to see some of you in our new link building class.