My site has been on the internet since 2006. I redesigned it, and a year later Google has still indexed only 16 of its 132 pages.
Why doesn't Google index the entire site? I use an XML sitemap. I also want to know whether leaving my old product pages up will harm my rankings. I have the sitemap set up to index only the new pages and leave the old ones alone, and my robots.txt file does the same. What should I do?
I’ve taken a look at your site and I see a number of red flags:
- Google hasn't stored a cache of your home page. That's weird. But maybe not so weird if you've stopped Google from indexing your *old* pages.
- I can't find your robots.txt file. The URL where it should be (the root of your domain) leads to a 404 page that contains WAY too many links to your product pages. The sheer number of links on that page and the excessive keyword repetition may have tripped a Googlebot filter. Google will be looking for your robots.txt file in the same place I did.
- Your XML sitemap doesn’t seem to contain links to all your pages. It should.
- Your HTML code contains duplicate title tags. Not necessarily a problem for Google, but it’s still extraneous code.
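To check the duplicate title tags (and the link overload on that 404 page) yourself, a small script can count `<title>` elements and links in a page's HTML. This is a minimal sketch using Python's standard-library `html.parser`; the sample markup is made up, and in practice you would feed it the HTML of your own pages.

```python
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Counts <title> elements and <a href> links in one HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.title_count = 0
        self.link_count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so TITLE/title both match.
        if tag == "title":
            self.title_count += 1
        elif tag == "a" and any(name == "href" for name, _ in attrs):
            self.link_count += 1


def audit_html(html: str) -> tuple[int, int]:
    """Return (number of <title> tags, number of links) for an HTML string."""
    parser = PageAudit()
    parser.feed(html)
    parser.close()
    return parser.title_count, parser.link_count


if __name__ == "__main__":
    # Made-up sample page with a duplicated title and two real links.
    sample = """<html><head><title>Shop</title><title>Shop</title></head>
    <body><a href="/a">A</a><a href="/b">B</a><a name="top">anchor</a></body></html>"""
    titles, links = audit_html(sample)
    print(f"title tags: {titles}, links: {links}")
    if titles > 1:
        print("warning: more than one <title> element")
```

A page should report exactly one title tag; a link count in the hundreds is a sign the page is overloaded.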
Apart from those things, your comments above worry me. What do you mean by “old product pages”? Is the content still relevant? Do you still sell those products? If the answer is no to both, then remove them or 301 redirect them to replacement pages.
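The site's server software isn't mentioned, so here is a minimal, server-agnostic sketch of a 301 redirect written as a Python WSGI app. The `REDIRECTS` mapping and the `/old-products/` and `/products/` paths are hypothetical; on Apache or another web server you would normally configure the same redirects in the server itself.

```python
# Hypothetical mapping from retired product URLs to their replacement pages.
REDIRECTS = {
    "/old-products/widget.html": "/products/widget-v2.html",
    "/old-products/gadget.html": "/products/gadget-pro.html",
}


def app(environ, start_response):
    """Minimal WSGI app: 301-redirect retired pages, 404 everything else."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 = Moved Permanently: tells crawlers the old URL has a new home.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    # Placeholder: a real site would serve its normal pages here.
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]


def check(path: str) -> tuple[str, dict]:
    """Call the app directly (no server) and return its status and headers."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    app({"PATH_INFO": path}, start_response)
    return captured["status"], captured["headers"]


if __name__ == "__main__":
    print(check("/old-products/widget.html"))
```

A permanent redirect, rather than simply deleting the old page, gives Google a replacement URL to index in its place.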
Why have you set up your sitemap and robots.txt to index only your new pages? No wonder Google hasn't indexed your whole site. Googlebot was probably following links from your older pages, and now it can't. Your old pages contain links to your new ones, right? So why would you deliberately sabotage the ability to have your new pages indexed? Assuming I'm understanding your actions correctly, any rankings and traffic you built up with your old pages are likely gone too.
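To see why blocking the old pages cuts Googlebot off, you can test robots.txt rules with Python's standard `urllib.robotparser`. The rules and example.com URLs below are hypothetical stand-ins for a robots.txt that blocks an old product section: Googlebot won't crawl a disallowed page, so it never sees the links on it. The helper also shows that crawlers only look for the file at the root of the host.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Hypothetical rules blocking an old product section, as described above.
RULES = """\
User-agent: *
Disallow: /old-products/
"""


def robots_url(page_url: str) -> str:
    """Crawlers only request robots.txt from the root of the host."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def is_crawlable(rules: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/old-products/widget.html",
        "http://www.example.com/products/widget-v2.html",
    ):
        status = "crawlable" if is_crawlable(RULES, url) else "blocked"
        print(f"{url}: {status}")
    print(robots_url("http://www.example.com/products/widget-v2.html"))
```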
Some general advice to fix the issues:
- Run your site through the Spider Test to see how search engines index it.
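- Remove the indexing restrictions from your robots.txt file and move it to where crawlers look for it: the root of your domain (/robots.txt).
- Add all your pages to your XML sitemap and stop setting every priority tag to 1 (sheesh!). Priority is a relative value from 0.0 to 1.0, so giving every page the top score tells Google nothing.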
- Open a Google Webmaster Tools account and verify your site. You’ll be able to see exactly how many pages of your site Google has indexed and when Googlebot last visited. If Google is having trouble indexing the site, you’ll learn about it and be given advice for how to fix it.
- You’ve got a serious case of code bloat on your home page. The more code you have, the more potential indexing problems you risk. Shift all that excess layout code to a CSS file for Pete’s sake.
- The number of outgoing links on your home page is extraordinary. Even Google says don’t put more than 100 links on a single page. You might want to heed that advice.
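For the sitemap advice, here is a minimal sketch that lists every page with its own priority, using Python's standard `xml.etree.ElementTree`. The URLs and priority values are hypothetical; the sitemaps.org protocol defines priority as a relative value from 0.0 to 1.0 (default 0.5), so spreading the values out is what makes them meaningful.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical pages and priorities; a real sitemap would list every page.
PAGES = {
    "http://www.example.com/": 1.0,
    "http://www.example.com/products/": 0.8,
    "http://www.example.com/products/widget-v2.html": 0.5,
}


def build_sitemap(pages: dict[str, float]) -> str:
    """Build a sitemap listing every page with its own priority (0.0-1.0)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, priority in pages.items():
        if not 0.0 <= priority <= 1.0:
            raise ValueError(f"priority for {loc} must be between 0.0 and 1.0")
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes & and <
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap(PAGES))
```

You would save the output as sitemap.xml at the root of the site and resubmit it in Webmaster Tools.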
I think it depends on the content of your site. Only if your content is up to the mark will the crawler crawl your site; otherwise it may take some time, or even longer…
Thank you, Kalena, for the informative answer to Jason's problem. I had the same question about my own website. I recently generated an XML sitemap for my site; it lists 629 pages, and after I submitted it to Google, only 365 of them were indexed. After doing some research, I believe I'm missing my robots.txt file too, so I'm going to address that and see whether the problem persists. Your reply opened my eyes to new areas I need to investigate. Thank you, and I will keep you posted.
PS: If you do check out my site, please let me know if you find other problems.
I am working on a site that is a month old: http://codingraptor.com
The content is unique and I believe provides value to the reader.
Google, Yahoo and Bing all show only the first page of CodingRaptor.com.
I have checked Google Webmaster Tools, submitted my sitemap multiple times, and submitted the site to Google's index after using 'Fetch as Google'.
Still no luck. Search engines are not going beyond the first page.
What am I missing?
The client is now at boiling point, so it's urgent.