How a Search Engine Might Interpret Ambiguous Queries through Entity Tags

When someone searches at a search engine, they tend to use only a handful of words, or fewer, to find information about a topic. That leaves the search engine with the challenge of finding web pages and other results in response while also trying to understand the intent behind the search.

If someone enters “new york pizza sunnyvale” (without the quotation marks) into a search box at Google or Yahoo or Bing, it’s not quite clear whether they are looking for: (1) pizza in New York, in a neighborhood or area referred to as Sunnyvale, (2) New York style pizza in a place called Sunnyvale, (3) a place called “New York Pizza,” in Sunnyvale, or (4) some other result.

One way to try to understand the intent behind a query like this is to break the words in the query down into entities, and apply type labels to those entities. With the “new york pizza sunnyvale” example, that could be done a few ways:

[new york pizza]/food [sunnyvale]/location
[new york pizza]/business [sunnyvale]/location
[new york]/location [pizza]/food [sunnyvale]/location

This kind of attempt to disambiguate, or find the meanings or senses behind the words and phrases used in a query, could be helpful in finding results that better match what a searcher may be looking for.
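The bracketed interpretations above can be generated mechanically. Here is a minimal sketch in Python, assuming a tiny hand-made lexicon that maps phrases to possible entity tags. The lexicon, its contents, and the function names are all hypothetical, made up for illustration; they are not taken from any search engine or from the patent filing discussed below.

```python
from itertools import product

# Hypothetical toy lexicon: phrase -> entity tags it could carry.
# A real system would draw on far larger sources; this is illustration only.
LEXICON = {
    "new york": ["location"],
    "new york pizza": ["food", "business"],
    "pizza": ["food"],
    "sunnyvale": ["location"],
}


def segmentations(tokens):
    """Yield every split of tokens into contiguous phrases found in LEXICON."""
    if not tokens:
        yield []
        return
    for end in range(1, len(tokens) + 1):
        phrase = " ".join(tokens[:end])
        if phrase in LEXICON:
            for rest in segmentations(tokens[end:]):
                yield [phrase] + rest


def interpretations(query):
    """Return every tagged interpretation as a list of (phrase, tag) pairs."""
    results = []
    for seg in segmentations(query.lower().split()):
        tag_choices = [LEXICON[phrase] for phrase in seg]
        for tags in product(*tag_choices):
            results.append(list(zip(seg, tags)))
    return results


def render(interp):
    """Format an interpretation in the bracketed style used in this post."""
    return " ".join(f"[{phrase}]/{tag}" for phrase, tag in interp)


if __name__ == "__main__":
    for interp in interpretations("new york pizza sunnyvale"):
        print(render(interp))
```

With this toy lexicon, the query yields the same three interpretations listed above. Any phrase missing from the lexicon simply produces no interpretation, which is one reason a real system would need a much richer source of entities.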

When I perform the search “new york pizza sunnyvale” in Google, the top result is Giovanni’s New York Pizzeria in Sunnyvale, California. At Yahoo, my top result is a place called New York Pizza in Mountain View, California. A search at Bing gives me a top result showing a directory of pizza places in Sunnyvale that serve New York style pizza. Most of the other top ten results at all three search engines are about pizza in California, rather than results about pizza in New York.

If a search engine were to break a query down into entities and apply labels to them, it would then have to choose among those candidate interpretations to decide which might be closest to the intent of a searcher. It could potentially do that by creating a confidence score for each of the possible interpretations, based upon information found in online dictionaries or encyclopedias, web pages, and other kinds of documents found online.
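To make the idea of a confidence score concrete, here is a rough sketch in Python. It assumes two invented inputs: counts of how often a phrase appears with a given tag in reference sources, and a prior for each sequence of tags. All of the numbers, names, and the scoring formula itself are assumptions for illustration, not how any search engine is known to do it.

```python
# Hypothetical evidence counts: (phrase, tag) -> times seen in reference sources.
# Every number here is invented for the example.
EVIDENCE = {
    ("new york pizza", "food"): 30,
    ("new york pizza", "business"): 50,
    ("new york", "location"): 900,
    ("pizza", "food"): 800,
    ("sunnyvale", "location"): 400,
}

# Hypothetical prior for how common each sequence of tags is in queries.
PATTERN_PRIOR = {
    ("food", "location"): 0.3,
    ("business", "location"): 0.5,
    ("location", "food", "location"): 0.2,
}

CANDIDATES = [
    [("new york pizza", "food"), ("sunnyvale", "location")],
    [("new york pizza", "business"), ("sunnyvale", "location")],
    [("new york", "location"), ("pizza", "food"), ("sunnyvale", "location")],
]


def tag_probability(phrase, tag):
    """Share of the evidence for this phrase that supports this tag."""
    total = sum(count for (p, _), count in EVIDENCE.items() if p == phrase)
    return EVIDENCE.get((phrase, tag), 0) / total if total else 0.0


def raw_score(interp):
    """Pattern prior multiplied by the per-entity tag probabilities."""
    score = PATTERN_PRIOR.get(tuple(tag for _, tag in interp), 0.0)
    for phrase, tag in interp:
        score *= tag_probability(phrase, tag)
    return score


def confidences(interps):
    """Normalize raw scores so they sum to 1 across the candidates."""
    raw = [raw_score(interp) for interp in interps]
    total = sum(raw)
    return [r / total if total else 0.0 for r in raw]


def best_interpretation(interps):
    """Return the candidate with the highest confidence."""
    scored = zip(confidences(interps), interps)
    return max(scored, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    for conf, interp in zip(confidences(CANDIDATES), CANDIDATES):
        print(f"{conf:.2f}", interp)
```

With these made-up numbers, the "business in a location" reading comes out on top, but different evidence would change the winner, which is the point of scoring rather than hard-coding a choice.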

The tags assigned to different entities found within queries could cover a wide range of labels, such as:

  • Product names,
  • Locations,
  • Person names,
  • Organizations,
  • Media,
  • Events,
  • etc.

This kind of query interpretation system could be built from training data collected from human judges, which would be used to train a model that scores interpretations of queries.
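As a very rough illustration of training a scoring model from human judgments, the Python sketch below fits a small logistic regression by hand. The judged examples, the choice of features (tag counts and tag sequence), and the learning settings are all assumptions made for this example; the patent filing does not say its models look like this.

```python
import math

# Hypothetical judged data: (interpretation, 1 if a judge accepted it, else 0).
JUDGED = [
    ([("new york pizza", "business"), ("sunnyvale", "location")], 1),
    ([("new york pizza", "food"), ("sunnyvale", "location")], 0),
    ([("new york", "location"), ("pizza", "food"), ("sunnyvale", "location")], 0),
]


def features(interp):
    """Sparse features: a bias, a count per tag, and the full tag sequence."""
    feats = {"bias": 1.0}
    tags = [tag for _, tag in interp]
    for tag in tags:
        key = f"tag={tag}"
        feats[key] = feats.get(key, 0.0) + 1.0
    feats["pattern=" + ">".join(tags)] = 1.0
    return feats


def predict(weights, feats):
    """Logistic score in (0, 1) for one feature vector."""
    z = sum(weights.get(name, 0.0) * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=200, lr=0.5):
    """Plain stochastic gradient ascent on the logistic log-likelihood."""
    weights = {}
    for _ in range(epochs):
        for interp, label in examples:
            feats = features(interp)
            error = label - predict(weights, feats)
            for name, value in feats.items():
                weights[name] = weights.get(name, 0.0) + lr * error * value
    return weights


if __name__ == "__main__":
    model = train(JUDGED)
    unseen = [("joes diner", "business"), ("boston", "location")]
    print(round(predict(model, features(unseen)), 3))
```

Because the features here look only at tags, the model generalizes to queries it has never seen that share a tag pattern with the training data. Three examples are far too few for anything real; the sketch only shows the shape of the loop from judgments to a scoring function.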

A Yahoo patent application published last week explores how such a system might be used:

Search Query Disambiguation
Inventors: Gilad Mishne, Raymond Stata, and Fuchun Peng
US Patent Application 20100205198
Published August 12, 2010
Filed: February 6, 2009

Abstract

Disclosed herein is a system and method of query disambiguation. At least one model is generated using training data, which model can be used to score, or rank, possible interpretations identified for a query, which can be used to select an interpretation from a number of possible interpretations.

A selected interpretation can be used to process a web search request, e.g., to generate search results that relate to the selected query interpretation, rank or order the items in the search result based on relevance to the selected query interpretation, and/or identify a presentation to be used to display the search results based on the selected query interpretation.

Conclusions

The patent filing goes into a fair amount of detail about how a system like this might be used, but the basic concept, that entities might be identified from query terms and labeled, is at the heart of the approach.

For some queries, more than one interpretation may be identified with a certain level of confidence, and search results might contain pages covering those interpretations.
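One way to picture that blending is sketched below in Python: keep every interpretation whose confidence clears a threshold, then divide the result slots among them in proportion to confidence. The threshold value, the proportional split, and the function names are my own assumptions for illustration; the patent filing does not prescribe this particular scheme.

```python
def select_interpretations(scored, threshold=0.25):
    """Keep (name, confidence) pairs at or above the threshold, best first."""
    kept = [(name, conf) for name, conf in scored if conf >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def allocate_slots(selected, total_slots=10):
    """Split result slots among interpretations in proportion to confidence.

    Uses the largest-remainder method so the slots always sum to total_slots.
    """
    total_conf = sum(conf for _, conf in selected)
    if not selected or total_conf <= 0:
        return {}
    exact = {name: total_slots * conf / total_conf for name, conf in selected}
    slots = {name: int(share) for name, share in exact.items()}
    leftover = total_slots - sum(slots.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - slots[n], reverse=True)
    for name in by_remainder[:leftover]:
        slots[name] += 1
    return slots


# Hypothetical confidences for the three readings of the pizza query.
SCORED = [
    ("business in sunnyvale", 0.50),
    ("pizza in new york", 0.32),
    ("food style in sunnyvale", 0.18),
]

if __name__ == "__main__":
    chosen = select_interpretations(SCORED)
    print(allocate_slots(chosen))
```

With these invented confidences, two readings clear the threshold and share the ten slots, while the weakest reading is dropped entirely.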

In addition to helping decide which web pages to return in search results, query interpretations might sometimes trigger specialized results, such as a local search map result, or certain kinds of advertisements.

The patent filing also branches off to explore how numeric terms might be interpreted when found in queries, and provides a large number of examples. For instance, “Godfather 3” might be interpreted to be equivalent to “GodFather III,” but “firefox 3” might not be seen to be equivalent to “firefox III.”
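The Godfather and Firefox contrast suggests that whether a number can be rewritten depends on what kind of entity it is attached to. Here is a small Python sketch of that idea, assuming a hypothetical rule that only entities tagged as media treat Arabic and Roman numerals as interchangeable. The rule, the tag names, and the helper functions are illustrative assumptions, not details from the filing.

```python
# Roman numeral symbols, largest first; enough for values 1 through 39.
ROMAN = [(10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]

# Hypothetical rule: only these entity tags treat "3" and "III" as equivalent.
NUMERAL_EQUIVALENT_TAGS = {"media"}


def to_roman(n):
    """Convert an integer from 1 to 39 to a Roman numeral string."""
    if not 1 <= n <= 39:
        raise ValueError("only 1 through 39 are supported in this sketch")
    out = []
    for value, symbol in ROMAN:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def numeric_variants(phrase, tag):
    """Return the phrase plus a Roman-numeral variant when the tag allows it."""
    words = phrase.split()
    variants = {phrase}
    if tag not in NUMERAL_EQUIVALENT_TAGS or not words:
        return variants
    last = words[-1]
    if last.isascii() and last.isdigit() and 1 <= int(last) <= 39:
        variants.add(" ".join(words[:-1] + [to_roman(int(last))]))
    return variants


if __name__ == "__main__":
    print(numeric_variants("godfather 3", "media"))
    print(numeric_variants("firefox 3", "product"))
```

Here the film title gains a "godfather III" variant while the software name is left alone, because the decision hangs on the entity tag rather than on the number itself.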

11 thoughts on “How a Search Engine Might Interpret Ambiguous Queries through Entity Tags”

  1. It is always nice to know that search engines are doing their best to show the best possible answers to different queries from searchers. And I was just wondering, what if it’s the other way around and we try to educate people to search properly? Well, I guess with all the millions of people searching the net, that would be hard.

  2. Wow, excellent article. Finding the right information when searching on search engines can be a pain for sure. Using these tags is a great way to categorize what I am looking for in order to get a more relevant response, a great idea… but I guess it also raises the question: if search engines started to do this, how many people would actually use them, since we are all lazy by nature and would tend to continue to type in new york pizza even without the quotes?

  3. Hi Andrew,

    I’m not sure if there is any particular “best way to search.” The biggest stumbling block might be when people search for information on a topic that they don’t know much about, and aren’t very sure of what words to use in their queries, but there are others as well.

    It’s also possible that some people might see too much guesswork from a search engine as to the intent behind a search as a negative.

  4. I bet Bill is on to something huge! Search engines could be using this data right now (albeit for testing small data samples). They could probably get away with using it discreetly for a while. And since the “entity data” is user maintained, it would be difficult for black-hat SEOs to spam or manipulate the data :-)

  5. Hi David,

    Thanks. There’s no need for us to use the tags shown in my example – that’s something that the search engine itself is trying to do on its own. Google did come out with a patent a while back which had searchers add labels to queries they might use, but this approach has the search engine trying to decide upon the proper labels itself.
