How Search Engines May Try to Match Searchers’ Intents from Analysis of Search Engine Query Logs
When you type a search query term into a search box at Google or Yahoo or Live.com, the search engines might go through their indexes, and try to find the most relevant and important pages in their databases for the word or phrase that you want to find out more about.
But those search engines might try to improve the results that they show to you by trying to understand the intent behind a search rather than just looking for pages that match keywords that you typed as a search query.
Search Engines and Searcher Intent
What do the search engines themselves reveal about the importance of considering the intent behind a search?
Google tells us on one of their search help pages that:
Search is rarely absolute. Search engines use a variety of techniques to imitate how people think and to approximate their behavior. As a result, most rules have exceptions. For example, the query [ for better or for worse ] will not be interpreted by Google as an OR query, but as a phrase that matches a (very popular) comic strip. Google will show calculator results for the query [ 34 * 87 ] rather than use the ‘Fill in the blanks’ operator. Both cases follow the obvious intent of the query.
Yahoo also mentions that they try to consider the intent behind a search on a help page of theirs titled How are web documents ranked:
Search engines don’t have the ability to ask questions, so they rely on the search terms you enter to interpret and determine the intent of your search.
Microsoft’s Satya Nadella, senior vice president of their search, portal and advertising platform group, also described how user intent plays a role in the results Live.com shows to searchers in a presentation from August of 2008:
I believe this notion of understanding user intent–being able to analyze (search queries) and come up with search patterns and use them to shape the search experience–is one of the most important areas for us.
One set of intentions behind searches was identified in a paper from 2002, A taxonomy of web search, which broke searches into three types: informational, transactional, and navigational.
Informational searches are conducted by a searcher to fill some kind of informational need that they might have. Transactional searches are made to help a searcher conduct a task on the web. Navigational searches are intended to help a searcher find a specific page.
Identifying the intent behind a search can be a difficult task, as Google Researcher Dan Russell has noted in a number of presentations, including one at The San Francisco Bay Area Chapter of ACM SIGCHI in 2006, where he described some of the approaches that Google takes to learn about intentions behind searches. One of those approaches involves looking at the log files of the search engines, and seeing how people refine the searches that they perform during search sessions.
A couple of newly published patent applications from Yahoo describe how that search engine might look at log files to identify whether some queries carry an intent that might not be easily seen from just looking at the search query itself.
Explicit and Implicit Searcher Intent
Some searches carry an intent that is easier for a search engine to interpret than others, and those searches might be considered to have an explicit intent behind them.
For example, if you’re searching for a new pair of shoes, or a camcorder, you might include some words in your search that tell a search engine about that intention, such as “cheap sneakers” or “buy a camcorder.” A search engine might see the use of the words “cheap” or “buy” as an explicit indication that you want to make a purchase online.
If your search includes the word “reviews,” it may signal to the search engine that you want informational pages about specific kinds of products or services. A search for a place type and a geographic location might indicate to a search engine that you are conducting a local search, and you may be shown local search results when you type something like “San Jose Library” into a search box.
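The explicit cues described above can be illustrated with a minimal sketch. The cue word lists and the `explicit_intent` function are hypothetical names chosen for illustration; the signals a real search engine uses are not public, and this only mirrors the “cheap,” “buy,” and “reviews” examples.

```python
# Hypothetical cue words taken from the examples in the text;
# a real search engine's signals are not public.
TRANSACTIONAL_CUES = {"buy", "cheap"}
INFORMATIONAL_CUES = {"review", "reviews"}


def explicit_intent(query):
    """Return a coarse intent label if the query contains an explicit cue word."""
    words = set(query.lower().split())
    if words & TRANSACTIONAL_CUES:
        return "transactional"
    if words & INFORMATIONAL_CUES:
        return "informational"
    return None  # no explicit cue; any intent would have to be inferred


for q in ["cheap sneakers", "buy a camcorder", "camcorder reviews", "olympics"]:
    print(q, "->", explicit_intent(q))
```

A query like “olympics” returns no label here, which is the kind of case where a search engine might turn to its query logs instead.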
Other searches may not have such a clear intent. The patent applications from Yahoo provide an example of a search for the term [olympics]. The best results to show a searcher might involve showing results from the Olympics from a specific year, even though the search didn’t include a year.
The search engine might look through its log of search queries from previous searchers, and see that many of the search queries that people used to find out information about the Olympics included a year within the query, such as:
- The 2004 Summer Olympics in Athens,
- The 2006 Winter Olympics in Turin,
- The 2008 Summer Olympics in Beijing,
- The 2010 Winter Olympics in Vancouver, or
- The 2012 Summer Olympics in London.
Many searches are “time sensitive,” and mining search engine query logs to see a pattern like this might help a search engine understand the intent behind a search, and influence which search results are shown to searchers. It’s possible that a search engine might boost rankings for pages that might show the most popular intents, or that they might rerank search results to show a broad range of intents behind a query that has an implicit intent behind it.
For example, if most of the people searching for the word “Olympics” tend to click on pages for the 2010 Olympics or refine their search query to include the year 2010, then the search engine might start boosting search results that are relevant for the 2010 Olympics.
An alternative approach might be to look at those search engine query logs, see the percentages of people who click on results for Olympics associated with specific years or refine their queries to include a specific year, and show a diverse mix of search results across those years.
So, if 50 percent of searchers looking for “Olympics” seem to want the 2010 Olympics, 30 percent appear to want the 2012 Olympics, and the remaining searches for the term don’t seem to have a specific date attached to them, then roughly half of the top ten (or top 100, or some other number of) search results might be about the 2010 Olympics, almost another third might be about the 2012 Olympics, and the rest might be more general pages about the Olympics, without necessarily having years attached to them.
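The arithmetic in that example can be sketched in a few lines. This is only an illustration under stated assumptions: the `allocate_slots` function, the intent labels, and the largest-remainder rounding rule are my own choices, not something the patent filings specify, and the percentages are assumed to sum to 100.

```python
def allocate_slots(intent_percentages, total_slots=10):
    """Split result slots across intents in proportion to their share.

    Uses integer arithmetic and hands any leftover slots to the intents
    with the largest remainders (an assumed rounding rule).
    """
    slots = {k: p * total_slots // 100 for k, p in intent_percentages.items()}
    remainders = {k: p * total_slots % 100 for k, p in intent_percentages.items()}
    leftover = total_slots - sum(slots.values())
    for intent in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        slots[intent] += 1
    return slots


# Percentages from the example: 50% want 2010, 30% want 2012, the rest are general.
shares = {"2010 Olympics": 50, "2012 Olympics": 30, "general": 20}
print(allocate_slots(shares, total_slots=10))
```

With ten slots, this split gives five results for 2010, three for 2012, and two general pages. A different number of slots simply rescales the same proportions.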
The patent filings are:
Extracting Query Intent from Query Logs
Invented by Priyank S. Garg, Kostas Tsioutsiouliklis, Bruce T. Smith, and Timothy M. Converse
US Patent Application 20090043749
Published February 12, 2009
Filed August 6, 2007
Techniques are provided for storing queries received by a search engine in a query log.
For a particular query term in the query, it is determined how many queries in the query log contain that particular query term and an intent-indicating term, and determined how many queries in the query log contain that particular query term without an intent-indicating term.
Based on the ratio between the number of queries in the query log that contain the particular query term and the intent-indicating term and the number of queries in the query log that contain the particular query term without the intent-indicating term, it is determined whether the particular query term is an intent-qualified query term.
In response to determining that the particular query term is an intent-qualified query term, data is stored in a computer-readable medium that identifies the query term as an intent-qualified query term.
Implicit-intent queries that contain the intent-qualified query term are processed based, at least in part, on the intent associated with the intent-qualified query term.
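The ratio test in this abstract might look something like the following sketch. It assumes that a four-digit year is the intent-indicating term, that queries are split on whitespace, and that a term qualifies when the ratio meets a threshold; the `is_intent_qualified` name, the `min_ratio` value, and the sample log are all hypothetical, since the filing summary above gives no specific numbers.

```python
import re

# Assumption: a four-digit year stands in for the "intent-indicating term".
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def is_intent_qualified(term, query_log, min_ratio=1.0):
    """Compare queries that pair `term` with a year against those that do not."""
    with_intent = 0
    without_intent = 0
    for query in query_log:
        if term not in query.lower().split():
            continue  # this query does not contain the term at all
        if YEAR.search(query):
            with_intent += 1
        else:
            without_intent += 1
    if with_intent == 0:
        return False
    if without_intent == 0:
        return True  # every occurrence of the term came with a year
    return with_intent / without_intent >= min_ratio


# Made-up sample log for illustration only.
log = ["olympics 2010", "2012 olympics tickets", "olympics",
       "2010 olympics schedule", "pizza recipe", "pizza"]
print(is_intent_qualified("olympics", log))
print(is_intent_qualified("pizza", log))
```

In this toy log, “olympics” appears with a year three times and without one once, so it would be flagged as intent-qualified, while “pizza” would not.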
Estimating the Date Relevance of a Query from Query Logs
Invented by Farzin Maghoul and Kostas Tsioutsiouliklis
US Patent Application 20090043748
Published February 12, 2009
Filed August 6, 2007
Techniques are provided for maintaining data that indicates for a plurality of query terms whether the plurality of query terms are date-qualified query terms.
A query is received, and in response to receiving the query, the query is inspected to determine that the query contains a particular date-qualified query term.
Then it is determined that the particular date-qualified query term has been associated with a plurality of dates, and it is determined which of the plurality of dates with which to associate the date-qualified query term for the query, based at least in part on the frequency with which each particular date of the plurality of dates has been associated with the particular date-qualified query term.
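Choosing a date by frequency, as this abstract describes, can be sketched with a simple counter. The function names, the use of four-digit years as the only kind of date, and the sample log are assumptions made for illustration; the filing may associate dates with terms in ways this does not capture.

```python
import re
from collections import Counter

# Assumption: dates are represented only as four-digit years.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def date_counts(term, query_log):
    """Count how often each year appears in logged queries containing `term`."""
    counts = Counter()
    for query in query_log:
        if term in query.lower().split():
            counts.update(YEAR.findall(query))
    return counts


def most_likely_date(term, query_log):
    """Return the year most frequently associated with `term`, or None."""
    counts = date_counts(term, query_log)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Made-up sample log for illustration only.
log = ["olympics 2010", "2010 olympics schedule", "olympics 2012", "olympics"]
print(date_counts("olympics", log))
print(most_likely_date("olympics", log))
```

The same counts could also feed a diversified result set rather than a single winning date, depending on which of the two approaches a search engine prefers.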
The patent filings focus upon providing examples of identifying time sensitive and date sensitive search intents, but the methods that they describe can be used to find other implicit intents behind queries.
We don’t know whether, or to what extent, the methods behind these two patent applications have been incorporated into Yahoo’s search results, but we do see that the intent behind some query terms in a search can influence the types of results that we receive at the major search engines, such as navigational queries showing a top result with sitelinks, or a local type query showing one box or ten box map results.
If you perform a search at Google or Yahoo or Live.com, chances are that they will be considering the intention behind your search, and may show you results that are influenced by what the search engine believes the intent behind your query might have been.
One place a search engine might look is its query log files, to see if it can glean an implicit intent behind your search terms by seeing which results previous searchers chose, or by looking at how searchers rewrote or refined their search queries.
If you search for “Olympics” without including a year, the search results you see might focus upon the 2010 and 2012 Olympics, since it appears to be a time sensitive query.
If you’re a site owner or working with one, and you are performing keyword research on specific search phrases for the pages of a site, it’s also important to keep in mind that the search engines might be considering more than how many times a search phrase shows up on a page in title elements, headings, or text, or in anchor text pointing to those pages, or the PageRank (or link popularity or page quality) of those pages, when they return results to searchers.
The search engines might be attempting to understand the intentions behind the search phrase, to show searchers the pages that they believe will match those intents.