How Google’s Algorithm Rules the Web

An interesting read by Steven Levy about Google’s search algorithm team, though as of this moment the particular search it highlights no longer behaves as described:

Even the Bingers confess that, when it comes to the simple task of taking a search term and returning relevant results, Google is still miles ahead. But they also think that if they can come up with a few areas where Bing excels, people will get used to tapping a different search engine for some kinds of queries. “The algorithm is extremely important in search, but it’s not the only thing,” says Brian MacDonald, Microsoft’s VP of core search. “You buy a car for reasons beyond just the engine.”

Google’s response can be summed up in four words: mike siwek lawyer mi.

Amit Singhal types that koan into his company’s search box. Singhal, a gentle man in his forties, is a Google Fellow, an honorific bestowed upon him four years ago to reward his rewrite of the search engine in 2001. He jabs the Enter key. In a time span best measured in a hummingbird’s wing-flaps, a page of links appears. The top result connects to a listing for an attorney named Michael Siwek in Grand Rapids, Michigan. It’s a fairly innocuous search — the kind that Google’s servers handle billions of times a day — but it is deceptively complicated. Type those same words into Bing, for instance, and the first result is a page about the NFL draft that includes safety Lawyer Milloy. Several pages into the results, there’s no direct referral to Siwek.

The comparison demonstrates the power, even intelligence, of Google’s algorithm, honed over countless iterations. It possesses the seemingly magical ability to interpret searchers’ requests — no matter how awkward or misspelled. Google refers to that ability as search quality, and for years the company has closely guarded the process by which it delivers such accurate results. But now I am sitting with Singhal in the search giant’s Building 43, where the core search team works, because Google has offered to give me an unprecedented look at just how it attains search quality. The subtext is clear: You may think the algorithm is little more than an engine, but wait until you get under the hood and see what this baby can really do.

[From Exclusive: How Google’s Algorithm Rules the Web | Magazine]

Probably because Bing has now indexed the Wired article and various links to it, a Bing search for “mike siwek lawyer mi” currently does return relevant results, or at least results for lawyers in Grand Rapids [1]. Google still gives better results most of the time, so I haven’t switched to a different search engine.


[Screenshot: Bing search results]


[Screenshot: Google search results]

It is quite an interesting article, though, and worth reading in full.

For instance:

The search engine currently uses more than 200 signals to help rank its results.

Google’s engineers have discovered that some of the most important signals can come from Google itself. PageRank has been celebrated as instituting a measure of populism into search engines: the democracy of millions of people deciding what to link to on the Web. But Singhal notes that the engineers in Building 43 are exploiting another democracy — the hundreds of millions who search on Google. The data people generate when they search — what results they click on, what words they replace in the query when they’re unsatisfied, how their queries match with their physical locations — turns out to be an invaluable resource in discovering new signals and improving the relevance of results. The most direct example of this process is what Google calls personalized search — an opt-in feature that uses someone’s search history and location as signals to determine what kind of results they’ll find useful. (This applies only to those who sign into Google before they search.) But more generally, Google has used its huge mass of collected data to bolster its algorithm with an amazingly deep knowledge base that helps interpret the complex intent of cryptic queries.

Take, for instance, the way Google’s engine learns which words are synonyms. “We discovered a nifty thing very early on,” Singhal says. “People change words in their queries. So someone would say, ‘pictures of dogs,’ and then they’d say, ‘pictures of puppies.’ So that told us that maybe ‘dogs’ and ‘puppies’ were interchangeable. We also learned that when you boil water, it’s hot water. We were relearning semantics from humans, and that was a great advance.”
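(An aside from me, not from the article: the query-rewriting idea Singhal describes can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions, not Google’s method. The `sessions` data and the `substitution_pairs` function are hypothetical names I made up; the sketch simply counts how often one word is swapped for another between consecutive queries in the same session.)

```python
from collections import Counter


def substitution_pairs(sessions):
    """Count single-word swaps between consecutive queries in each session.

    `sessions` is a list of query lists, one list per (hypothetical) user
    session. Only rewrites that keep the same number of words and change
    exactly one of them are counted.
    """
    counts = Counter()
    for queries in sessions:
        for before, after in zip(queries, queries[1:]):
            a, b = before.split(), after.split()
            if len(a) != len(b):
                continue
            diffs = [(x, y) for x, y in zip(a, b) if x != y]
            if len(diffs) == 1:
                # Sort the pair so (dogs, puppies) and (puppies, dogs) match.
                counts[tuple(sorted(diffs[0]))] += 1
    return counts


# Made-up sessions echoing the examples in the quote.
sessions = [
    ["pictures of dogs", "pictures of puppies"],
    ["cute puppies", "cute dogs"],
    ["boiling water", "hot water"],
]

pairs = substitution_pairs(sessions)
for pair, n in pairs.most_common():
    print(pair, n)
```

Pairs that are swapped often become synonym candidates. Note that nothing here looks at context, which is exactly how a naive system ends up with the problem described next.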

But there were obstacles. Google’s synonym system understood that a dog was similar to a puppy and that boiling water was hot. But it also concluded that a hot dog was the same as a boiling puppy. The problem was fixed in late 2002 by a breakthrough based on philosopher Ludwig Wittgenstein’s theories about how words are defined by context. As Google crawled and archived billions of documents and Web pages, it analyzed what words were close to each other. “Hot dog” would be found in searches that also contained “bread” and “mustard” and “baseball games” — not poached pooches. That helped the algorithm understand what “hot dog” — and millions of other terms — meant. “Today, if you type ‘Gandhi bio,’ we know that bio means biography,” Singhal says. “And if you type ‘bio warfare,’ it means biological.”
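(Another aside from me: the context idea can also be sketched in Python. This is a toy illustration under my own assumptions, not the algorithm the article refers to. The `documents`, the `STOPWORDS` set, and the `context_vector` and `cosine` functions are all hypothetical. The sketch describes a phrase by the words that appear alongside it, then compares phrases by the cosine similarity of those word counts.)

```python
import math
from collections import Counter

# Tiny made-up stopword list so filler words do not dominate the comparison.
STOPWORDS = {"a", "the", "with", "on", "at", "in", "and"}


def context_vector(phrase, documents):
    """Count the words that co-occur with `phrase` across the documents.

    Uses plain substring matching, which is good enough for a sketch but
    would be too crude for real text.
    """
    phrase_words = set(phrase.split())
    vec = Counter()
    for doc in documents:
        if phrase in doc:
            vec.update(
                w for w in doc.split()
                if w not in phrase_words and w not in STOPWORDS
            )
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


# Made-up documents standing in for crawled pages.
documents = [
    "hot dog with mustard on bread",
    "ate a hot dog at the baseball game",
    "sausage with mustard on bread at the baseball game",
    "the puppy chewed a toy in the garden",
    "a puppy and a dog played in the garden",
]

hot_dog = context_vector("hot dog", documents)
sausage = context_vector("sausage", documents)
puppy = context_vector("puppy", documents)

print("hot dog vs sausage:", cosine(hot_dog, sausage))
print("hot dog vs puppy:  ", cosine(hot_dog, puppy))
```

In this toy data, “hot dog” shares its neighbours (mustard, bread, baseball) with “sausage” and none with “puppy”, so the first similarity comes out higher. That is the intuition behind treating “hot dog” as food rather than as a warm pet.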

Footnotes:
  1. Update: I’ve since noticed that the Bing results are actually not as precise as Google’s.
