Q and A: Why is Google having trouble indexing our site?

Question

Hello Kalena

One of the sites we manage has a problem.

The homepage at [URL removed] is not getting indexed anymore by Google. The site was made using Sitefinity 3.7 and the hosting is provided by Rackspace. Something similar already happened two times in the past which we resolved using the option “index this page” on the page generated by Sitefinity (1st time) and by re-creating the XML sitemap and linking it directly to Google Webmaster tools (2nd time).

This time we can’t seem to find the reason. We checked whether the end user who works in the back end had made any changes, and whether there were any notifications in the Google Webmaster Tools reports, but nothing came up. Here are some more technical details:

1) The site homepage is [URL removed]. But the site root is [URL removed] which is an empty page with a redirect to the home page using a 301 redirect.

2) In Google Webmaster Tools we set up 2 Sitemaps:

  • The first at [URL removed] is indexing the Top pages of the Home page (static)
  • The second [URL removed] gets populated with the pages content generated by Sitefinity (dynamic)

3) Also, from the back-end options, a metatag ROBOTS was set at page level for the top pages, as Google suggests.

4) Google reports 5 blocked URLs when crawling our robots.txt with the message: “Google tried to crawl these URLs in the last 90 days, but was blocked by robots.txt”. This seems suspicious, because I can’t work out what could be blocking them. The robots.txt file is pretty simple and not restrictive.

Could you give us a hand? I’ve left a generous donation for your coffee fund.

Thanks!
Jim

————————————–

Hi Jim

First up, thanks for the caffeine donation :-)

As for your problem, oh boy. You’ve got a few different issues going on, so let me address each of them separately:

1) Your XML sitemaps are missing contextual data specified by the Sitemaps protocol. In particular, your <loc> child entries per URL are messed up. I’m surprised this hasn’t generated an error in Webmaster Tools, but I’m pretty sure it would be confusing Googlebot. Go check your sitemaps against the protocol and re-generate them if necessary. Maybe use one of the XML generator tools recommended by Google. Personally, I like XML Sitemaps (yes that’s my affiliate link).

Also, why 2 separate sitemaps for HTML pages? I can understand having separate ones for RSS feeds or structured data stuff, but your standard site pages should all be listed in the one file so you can better manage the content and keep track of indexing history in Webmaster Tools.

2) Your robots.txt file is blocking a number of pages that you have listed in your XML sitemap. So on the one hand you’re telling Google to index pages within a certain directory, but on the other, you’re telling Google they are not allowed to access that directory. This is what the error message is about. You’ve also got conflicting instructions on some of your pages in terms of robots meta tags vs. robots.txt.

3) The 301 redirect on your root directory is your major problem. In fact, that empty landing page is your major problem. Why do you need it? You don’t use Flash and it doesn’t appear to have an IP sniffer for geo-location purposes so I can’t understand why you wouldn’t just put your home page content at the root level and let search engines index it as expected.

The way you have it set up right now is essentially telling Google that you have moved all your content to a new location, when you really haven’t. It’s adding another step to the indexing process and you are also shooting yourself in the foot as every 301 contributes to some lost PageRank. Google clearly doesn’t like the setup or isn’t processing it for some reason. There also appear to be several hundred 301s in place for other pages, so I’m not sure what that’s about. I don’t have access to your .htaccess file, but I can imagine it reads like a book!

4) Unless you specifically need a robots meta tag for a particular page scenario, I would avoid using them on every page. You can achieve the same results with your robots.txt file and it’s easier to manage robot instructions in one location rather than having to edit page by page – avoiding conflicting issues as you have now.
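If you want a quick sanity check on point 1 before re-generating anything, something like the rough sketch below can flag entries whose <loc> is missing or not an absolute URL. It only looks at that one requirement of the Sitemaps protocol, and the sample sitemap and the function name find_bad_entries are made up for illustration. They are not taken from Jim's site.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# XML namespace defined by the Sitemaps protocol (sitemaps.org).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def find_bad_entries(sitemap_xml: str) -> list[str]:
    """Describe every <url> entry whose <loc> is missing or not an absolute URL."""
    root = ET.fromstring(sitemap_xml)
    problems = []
    for index, url in enumerate(root.findall("sm:url", NS), start=1):
        loc = url.find("sm:loc", NS)
        value = (loc.text or "").strip() if loc is not None else ""
        parsed = urlparse(value)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"entry {index}: bad <loc> {value!r}")
    return problems


# Hypothetical sample: entry 2 has a relative <loc>, entry 3 has no <loc> at all.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>/products/</loc></url>
  <url><lastmod>2013-09-21</lastmod></url>
</urlset>"""

if __name__ == "__main__":
    for problem in find_bad_entries(SAMPLE):
        print(problem)
```

A clean result here does not mean the sitemap is fully valid. It only means every entry has a usable <loc>.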

Apart from the obvious issues mentioned above – have you considered switching away from Sitefinity and over to WordPress? I’ve struggled optimizing Sitefinity sites for years – it’s a powerful CMS but it was never built with search engines in mind and always requires clunky hacks to get content optimized. Plus that’s a really out-dated version of Sitefinity.

Given the other issues, it might be time for a total site rebuild?

Best of luck

——————————————————————–

Like to learn SEO with a view to starting your own business? Access your Free SEO Lessons. No catch!

 


Q and A: Why doesn’t Google index my entire sitemap?

Question

Hello Kalena

I’ve submitted my sitemap to Google several times, and it doesn’t spider more than 57 pages even when I add more pages. I can’t figure out why and would really appreciate your help!

My website is [URL withheld]. The sitemap I submit to Google is called sitemap.xml. I’m working on the site currently, and I want Google to find the changes and new pages.

Thanks!
Greg

————————————–

Hi Greg

I’ve had a look at your sitemap and your site and I’ve worked out the problem. I think you’re going to laugh :-)

Yes, you have created an XML sitemap containing all your site URLs. Yes, you have uploaded it via your Webmaster Tools account. However, the robots.txt file on your site contains disallow rules that contradict your sitemap.

There are over 30 URLs in your robots.txt with a disallow instruction for Googlebot.  Essentially, you are giving Google a list of your pages and then instructing the search giant not to go near them! Have you re-designed your site lately? Maybe your site programmers made the change during a large site edit or testing phase and forgot to remove the URLs after completion?

All you need to do is edit your robots.txt file to remove the URLs being disallowed and then resubmit your XML sitemap.
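To spot this kind of contradiction yourself, you can test each sitemap URL against your robots.txt rules. Here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content and the URLs are invented examples, not Greg's actual files, and the standard-library parser does plain prefix matching, so it may not agree with Googlebot on rules that use wildcards.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, sitemap_urls, user_agent="Googlebot"):
    """Return the sitemap URLs that robots.txt disallows for the given user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt left over from a site redesign.
ROBOTS = """\
User-agent: Googlebot
Disallow: /products/
Disallow: /about.html
"""

# Hypothetical URLs as they might be listed in sitemap.xml.
URLS = [
    "http://www.example.com/",
    "http://www.example.com/products/widget.html",
    "http://www.example.com/about.html",
    "http://www.example.com/contact.html",
]

if __name__ == "__main__":
    for url in blocked_urls(ROBOTS, URLS):
        print("In sitemap but disallowed:", url)
```

Any URL this prints is one you are asking Google to index and forbidding it to crawl at the same time.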

All the best.

——————————————————————–

Need to learn SEO but not sure where to start? Access your Free SEO Lessons. No catch!

 


Search Industry Job of the Week – Marketing Manager SEO

Job Title: Marketing Manager Search Engine Optimization
Job Reference: MAC01914
Position Type: full time
Name of employer: Macy’s Inc
Location: San Francisco
Date Posted: 21 September 2013
Position description:

Macy’s are looking for a hands-on natural search optimization (SEO) professional, with excellent problem solving skills and a basic understanding of SEO issues involving site architecture; keyword generation; search friendly content; link building strategies; and metrics-driven SEO. Strong communication skills are essential in this role, as the Manager will be integral in socializing SEO across the organization.

Essential Functions:

  • Work with the Director of SEO to test, plan and successfully execute on ROI positive SEO initiatives, including but not limited to keyword generation, content creation, URL selection for XML sitemap submissions, internal linking, and external linking.
  • Work closely with engineering, creative, merchants, and product teams to ensure consistency in SEO strategy and implementation across multiple properties.
  • Measure and report on the effectiveness of SEO strategies in generating increased web traffic and organic revenue.
  • Develop, share and implement SEO best practices across multiple families of businesses.
  • Work with external agencies and in-house teams to build quality external links.
  • Stay up to date with industry trends
  • Support merchant team with the development and creation of proposals to secure incremental vendor co-operative marketing dollars.
  • Mine macys.com’s onsite search logs and organic search referrals on a monthly basis to grow keyword portfolio.
  • Provide a weekly analysis of natural search term performance based on traffic, conversion, and sales.
  • Team up with Marketing Analyst to identify trends in behavior of customers referred by natural search.
  • Co-ordinate the design, testing, and production of new reports, as required, to improve conversion and sales from natural search.
  • Publish reports and findings to the organization on a regularly scheduled basis.
  • Regular, dependable attendance and punctuality.

Qualifications:

  • BA/BS in Marketing with a strong understanding of technical SEO or BA/BS in Engineering or similar technical discipline or relevant work experience.
  • 3+ years of experience in online marketing with a successful track record, either in-house or at an agency/consultancy.
  • Understanding of basic HTML, CSS and code structure as it relates to SEO.
  • Knowledge of XML sitemap submissions, internal linking and keyword generation.
  • Track record of successfully implementing SEO strategies with a goal of increasing traffic and revenue.
  • Familiarity with social networking and bookmarking sites.
  • Attention to detail with a strong focus on analytics.
  • Strong project management and inter-departmental coordination skills.
  • Ability to execute on multiple projects while closely measuring the impact of each project and changing course when needed.

Company Profile:

As the fastest growing part of Macy’s Inc. business, macys.com is achieving record sales and broadening their workforce. With offices in New York and San Francisco, macys.com is the best of all worlds. The entrepreneurial thinking of a Web business complements the stability and support of a national brand. Creativity and ingenuity partner with business acumen and tech savvy to build a unique business poised for continued growth. Employees at macys.com have long term opportunities and are encouraged to utilize their Supervisors and Human Resources for cross-functional movement to further their careers. At macys.com they are committed to giving back to the community by partnering with local charitable organizations. By skillfully combining the power of the Internet with the best in retailing, macys.com is reaching new heights.

Salary range: Unknown
Closing date: Unknown
More info from: Macy’s Careers
Contact: Send resumes via online form to: Macy’s Careers

For more search industry jobs, or to post a vacancy, visit Search Engine College Jobs Board.


Q and A: Does Ask.com Accept XML Sitemaps?

Question

Hi Kalena

I have uploaded my XML sitemap to Google, Yahoo and more recently Bing, thanks to your blog post about the Bing Webmaster Center.

However, I’m wondering if Ask.com accepts XML sitemaps and if so, how do I upload mine to Ask?

thanks
Georgia

————————————–

Hello Georgia

Yes, Ask.com DOES support XML Sitemap submissions. Here’s a blurb about it from their Webmaster Help area:

“Yes, Ask.com supports the open-format Sitemaps protocol. Once you have prepared a sitemap for your site, add the sitemap auto-discovery directive to robots.txt, or submit the sitemap file directly to us via the ping URL”

The ping URL is as follows:

http://submissions.ask.com/ping?sitemap=http%3A//www.yoursite.com/sitemap.xml

To add your sitemap to your robots.txt file, simply include this line:

Sitemap: http://www.yoursite.com/sitemap.xml

Actually it’s not just Ask that supports the addition of sitemaps in robots.txt. Did you know that both Google and Yahoo also support that method of sitemap delivery?

You can either submit your sitemap via the search engine’s appropriate submission interface (e.g. Google Webmaster Tools, Yahoo Site Explorer, Bing Webmaster Center) or specify your sitemap location in your robots.txt file as per the above instructions.
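If you manage several sites, you might prefer to generate the ping URL and the robots.txt line rather than type them by hand. The sketch below builds both, percent-encoding the sitemap address the same way as the example above. The endpoint is simply the one quoted from Ask's help text, and the function names are my own. The sketch only builds the strings and does not send the ping.

```python
from urllib.parse import quote

# Ping endpoint as quoted from Ask.com's Webmaster Help text above.
ASK_PING_ENDPOINT = "http://submissions.ask.com/ping?sitemap="


def ask_ping_url(sitemap_url: str) -> str:
    """Build the ping URL, percent-encoding the colon but leaving slashes as-is."""
    return ASK_PING_ENDPOINT + quote(sitemap_url)


def robots_sitemap_line(sitemap_url: str) -> str:
    """Build the auto-discovery line to add to robots.txt."""
    return f"Sitemap: {sitemap_url}"


if __name__ == "__main__":
    sitemap = "http://www.yoursite.com/sitemap.xml"
    print(ask_ping_url(sitemap))
    print(robots_sitemap_line(sitemap))
```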


Q and A: Why doesn’t Google index my entire site?

Question

Dear Kalena…

I have been on the internet since 2006. I re-designed my site, and for the past year Google has still only indexed 16 pages out of 132.

Why doesn’t Google index the entire site? I use an XML sitemap. I also wanted to know if leaving my old product pages up will harm my ratings. I have the sitemap set up to only index the new stuff and leave the old alone. I have also got the robots.txt file doing this as well. What should I do?

Jason

Hi Jason

I’ve taken a look at your site and I see a number of red flags:

  • Google hasn’t stored a cache of your home page. That’s weird. But maybe not so weird if you’ve stopped Google indexing your *old* pages.
  • I can’t find your robots.txt file. The location it should be in leads to a 404 page that contains WAY too many links to your product pages. The sheer number of links on that page and the excessive keyword repetition may have tripped a Googlebot filter. Google will be looking for your robots.txt file in the same location that I did.
  • Your XML sitemap doesn’t seem to contain links to all your pages. It should.
  • Your HTML code contains duplicate title tags. Not necessarily a problem for Google, but it’s still extraneous code.

Apart from those things, your comments above worry me. What do you mean by “old product pages”? Is the content still relevant? Do you still sell those products? If the answer is no to both, then remove them or 301 redirect them to replacement pages.

Why have you only set up your sitemap and robots.txt to index your new pages? No wonder Google hasn’t indexed your whole site. Googlebot was probably following links from your older pages and now it can’t. Your old pages contain links to your new ones right? So why would you deliberately sabotage the ability to have your new pages indexed? Assuming I’m understanding your actions correctly, any rankings and traffic you built up with your old pages have likely gone also.

Some general advice to fix the issues:

  • Run your site through the Spider Test to see how search engines index it.
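  • Remove indexing restrictions in your robots.txt file and move it to the root of your site, which is where Googlebot looks for it.
  • Add all your pages to your XML sitemap and change all the priority tags from 1 (sheesh!). Priority is relative within your own site, so marking every page as 1 tells search engines nothing.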
  • Open a Google Webmaster Tools account and verify your site. You’ll be able to see exactly how many pages of your site Google has indexed and when Googlebot last visited. If Google is having trouble indexing the site, you’ll learn about it and be given advice for how to fix it.
  • You’ve got a serious case of code bloat on your home page. The more code you have, the more potential indexing problems you risk. Shift all that excess layout code to a CSS file for Pete’s sake.
  • The number of outgoing links on your home page is extraordinary. Even Google says don’t put more than 100 links on a single page. You might want to heed that advice.
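For the last point, a rough way to see how many links a page carries is to count its anchor tags. The sketch below does that with Python's built-in html.parser module. It assumes you have already saved the page's HTML as a string, the sample markup is invented, and it counts every <a> with an href, whether the link is internal or outgoing.

```python
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Collect the href of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def count_links(html: str) -> int:
    """Return the number of <a href="..."> links found in the HTML."""
    counter = LinkCounter()
    counter.feed(html)
    counter.close()
    return len(counter.links)


# Hypothetical fragment: two real links and one named anchor without an href.
SAMPLE_HTML = '<a href="/a">A</a> <a name="top">Top</a> <A HREF="/b">B</A>'

if __name__ == "__main__":
    print(count_links(SAMPLE_HTML), "links found")
```

If the count for your home page is well past 100, that is a good sign it is time to trim.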