Google on Using a Knowledge Base of Articles to Make Searches Smarter

When a search at a search engine includes a person’s name, or the name of a particular place, or a book, or a band, or an album, there might be some confusion as to which person (or place or thing) is being searched for.

Danny Sullivan, Race Car Driver

Case in point, there’s a well known race car driver by the name of Danny Sullivan. There’s also a well known journalist who writes about the search industry by the name of Danny Sullivan.

Danny Sullivan, Journalist and Technologist

If I were to perform a search on Google for “Danny Sullivan,” might it make sense for the search engine to group results based upon the different Danny Sullivans that it finds, and serve me relevant results for both within the top ten search results?

Google has explored ways of handling what it calls “named entities” where there might be more than one specific person, or place, or thing that shares a name, and “disambiguating” (exploring the different senses or meanings behind) those named entities so that it is possible to distinguish between them easily.

A recent paper, Using Encyclopedic Knowledge for Named Entity Disambiguation (pdf), written by Razvan Bunescu and Marius Pasca of Google, explores how a knowledge base of articles, like the pages of Wikipedia, could be used to identify different named entities.

So, in Wikipedia, there is an entry for the driver Danny Sullivan, and another for Danny Sullivan (technologist). Could Google use the fact that there are two different Danny Sullivans listed in a knowledge base like Wikipedia to provide smarter search results?
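As a rough illustration of how article titles alone can signal ambiguity, here is a minimal Python sketch. It assumes titles follow the "Name (qualifier)" convention seen in the two Danny Sullivan entries. The title list and the helper names `split_title` and `build_name_index` are hypothetical, made up for this example, and are not taken from the paper or the patent application.

```python
import re
from collections import defaultdict

# Hypothetical article titles, modeled on the entries mentioned in this post.
TITLES = [
    "Danny Sullivan",
    "Danny Sullivan (technologist)",
    "Steve Williams (caddy)",
]

# Matches a trailing parenthetical qualifier such as "(technologist)".
_QUALIFIER = re.compile(r"^(?P<name>.*?)\s*\((?P<qualifier>[^()]*)\)$")


def split_title(title):
    """Return (name, qualifier); qualifier is None when the title has none."""
    match = _QUALIFIER.match(title)
    if match:
        return match.group("name"), match.group("qualifier")
    return title, None


def build_name_index(titles):
    """Map each lowercased bare name to the article titles that share it."""
    index = defaultdict(list)
    for title in titles:
        name, _ = split_title(title)
        index[name.lower()].append(title)
    return dict(index)


name_index = build_name_index(TITLES)

# A name with more than one article behind it is a candidate for disambiguation.
ambiguous_names = [name for name, found in name_index.items() if len(found) > 1]
```

In this toy data only "danny sullivan" ends up with more than one article, so it is the only name flagged as ambiguous.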

It’s not surprising that other search engines are exploring the same area. I wrote back in March about Microsoft’s exploration of a similar topic in Can Web Search Use Wikipedia to Understand References to Names?

A patent application on this subject from Google, naming the authors of the Google paper as inventors, was published at the end of last week.

It goes into more detail on how helpful a collection of documents, such as the pages of Wikipedia, might be in understanding which named entity a query refers to:

Disambiguation of Named Entities
Inventors: Razvan Constantin Bunescu and Alexandru Marius Pasca
US Patent Application 20070233656
Published October 4, 2007
Filed: June 29, 2006

Abstract

Named entities are disambiguated in search queries and other contexts using a disambiguation scoring model. The scoring model is developed using a knowledge base of articles, including articles about named entities. Various aspects of the knowledge base, including article titles, redirect pages, disambiguation pages, hyperlinks, and categories, are used to develop the scoring model.

The patent and paper provide a nice look at a method of ranking and grouping search results that relies less on links and more on extracting and understanding the context of certain concepts (in this case, named entities).
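To make the idea of context-based scoring a little more concrete, here is a small Python sketch. It ranks candidate articles for an ambiguous name by how much their text overlaps with the other words in a query, using plain term-frequency cosine similarity. That similarity measure is my simplification and should not be read as the scoring model the patent application describes, which also draws on redirect pages, disambiguation pages, hyperlinks, and categories. The article snippets and the function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, hand-written stand-ins for knowledge base article text.
ARTICLES = {
    "Danny Sullivan": "american racing driver who won the indianapolis 500 race",
    "Danny Sullivan (technologist)": (
        "journalist and technologist who writes about search engines "
        "and search marketing"
    ),
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(context, articles):
    """Return (title, score) pairs sorted from best to worst match."""
    context_vec = Counter(tokenize(context))
    scored = [
        (title, cosine(context_vec, Counter(tokenize(text))))
        for title, text in articles.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# The query words around the name act as the disambiguating context.
ranking = rank_candidates("danny sullivan search engine conference", ARTICLES)
best_title = ranking[0][0]
```

With these toy snippets, a query that mentions "search" ranks the technologist first, while a query such as "indianapolis 500 race winner" ranks the driver first. A real system would need far richer article text and features than this.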

12 thoughts on “Google on Using a Knowledge Base of Articles to Make Searches Smarter”

  1. Interesting idea, but I would have thought that search engines would have more pressing things on their agenda than this sort of thing. If I want to find out about Danny Sullivan the technologist and he is not on page one, I can just go to page two – No big deal. I guess it could give search engines a more personalised touch though.

    I have noticed that Google is doing something similar to this for general searches. For example, do a Google search for commerce.

  2. With many folks researching approaches to ranking pages, it makes sense for Google to try out a number of different approaches, which is why we see someone trying to solve ranking issues that might be less dependent upon linking and keyword matching.

    The commerce example that you point out isn’t a bad one.

    The patent application uses the example of “Steve Williams,” which is kind of interesting. The disambiguation page at Wikipedia lists a number of people named Steve Williams:

    - Steve Williams (jazz drummer), Shirley Horn’s accompanist and band leader
    - Steve Williams (rock drummer), drummer for heavy metal group Budgie
    - Steve Williams (wrestler), an American professional wrestler
    - Steven James Williams, another American professional wrestler, better known as Stone Cold Steve Austin
    - Steve Williams (rugby player), Wales international rugby union player
    - Steve Williams (blogger), a rider, writer, photographer, and author of the Scooter in the Sticks blog
    - Steve Williams (rower), a British rower
    - Steve Williams (goalkeeper), a current football goalkeeper for Wycombe Wanderers
    - Steve Williams (midfielder), a former football midfielder for Southampton and Arsenal
    - Steve Williams (caddy), a New Zealander who is the caddy for golfer Tiger Woods
    - Steve Williams (musician), a keyboardist for the power metal band Power Quest
    - Steven W. Bailey, who played the character of “Steve Williams” on the reality tv show My Big Fat Obnoxious Fiance
    - Steve Williams (motorcycle racer)

    Knowing that this many public figures share the name Steve Williams might influence how Google groups and ranks results on a search for Steve Williams.

  3. As we know, this is about grouping. If the user interface can reflect the aggregated results, it will be better. For example, in the Advanced Search options, I might pick grouping options:
    People name, industry, Wikipedia categories, yellow categories.

    So, when I search for Windows, I could separate results in the computing field from results in the house building field.

    I have seen some startup firms start doing similar things for looking up people. However, I think giants like Google and MS have more power and resources to deliver more accurate results.

  4. Hi Andy,

    Thanks for stopping by. Instead of providing options in the Advanced Search area, would it be better if group options were provided to a searcher after the search on the main search results page, so that a searcher could choose from the different groups that might be created from looking at a knowledge base that may be relevant for specific queries?

    I could see something like that offered as part of a Universal Search/Blended search approach.

  5. Pingback: » Pandia Weekend Wrap-up Oct 14
  6. Great post. And a great idea from Google. I think this will be very useful. Sometimes I’ve had that problem searching names. I think the solution is to categorize them according to their fields. Thanks for sharing the information. Cheers!

  7. Hi Henry,

    Thanks. I believe that Google is continuing to work on using named entities like this. It should be interesting to see how it develops.
