How Google Applies Science to Search

SiteProNews have now published my two-part article based on the Webstock 2008 presentation by Google’s Senior Research Scientist, Dr. Craig Nevill-Manning.

Here are Part 1 and Part 2.

In his presentation, Craig, who is New Zealand born and bred, explained how Google uses science to develop more precise search techniques. I found his talk absolutely riveting and typed frantically during the whole thing in my hurry to blog it.

Here are a couple of classic excerpts:

Google used to do a terrible job of defining terms. Craig noticed people were searching for “definition of…” or “what is a…”, so he wanted the search engine to provide better results for these searches. He found lots of web pages that contained glossaries and definitions, so he hacked up a Perl script to extract the glossary formats.

The first recall results were only 50 percent accurate. He wanted to improve this rate, so he did some experiments with the data, but he could never reach an accuracy level he was happy with. Only later did he realize that most of the questions people actually needed answers to could be answered with his crappy little Perl script. He concluded that 100 percent accuracy is not important and that scale matters much more.
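To give a feel for how little code a glossary scraper like this needs, here is a minimal sketch in Python. It is not Craig's script, which was written in Perl and was not shown in any detail. The two line formats it recognises ("Term: definition" and "Term - definition"), the `extract_definitions` function and the sample text are all assumptions made up for illustration.

```python
import re

# Hypothetical glossary formats: "Term: definition" or "Term - definition".
# These patterns are assumptions for illustration only, not the formats
# handled by the script described in the talk.
GLOSSARY_LINE = re.compile(r"^\s*([A-Za-z][\w ]{0,40}?)\s*(?::|\s-\s)\s*(\S.*)$")


def extract_definitions(text):
    """Return a dict mapping lowercased terms to the first definition found."""
    definitions = {}
    for line in text.splitlines():
        match = GLOSSARY_LINE.match(line)
        if match:
            term = match.group(1).strip().lower()
            definition = match.group(2).strip()
            # Keep the first definition seen for each term.
            definitions.setdefault(term, definition)
    return definitions


# Made-up sample page text, standing in for a crawled glossary page.
sample = """Glossary
Crawler: a program that fetches web pages automatically.
Index - a data structure that maps words to the pages containing them.
This line has no definition marker
"""

defs = extract_definitions(sample)
print(defs.get("crawler"))
```

A pattern this crude will miss plenty of glossary layouts and pick up some false matches, which fits the point of the excerpt: run over enough pages, even a rough extractor can answer most of the definitions people actually search for.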

Craig says that once a week, a person at each data center takes a list of all the failed hard disks and walks around the data center with a pile of hard drives, replacing them one at a time. Velcro is Google’s secret weapon! All of Google’s hard disks are velcroed in, which makes servicing and replacement super quick. So curiously, there is no downside to hardware failures at Google, because they are expected and managed via scale.

Fascinating stuff!
