How a Search Engine Might Identify Possible Query Suggestions

Like the information architects who organize the content on websites, search engine designers should aspire to provide users with scent at every step of their information-seeking process. Techniques like query suggestions, faceted search and results clustering all offer users the opportunity to make progress on their next step, rather than always having to restart the information-seeking process from scratch. Indeed, faceted search is a popular technique for offering users such guidance.

While users are ultimately responsible for expressing their information needs, it is the search engine’s job to act like a reference librarian and help the users in this process.

Reconsidering Relevance and Embracing Interaction
by Daniel Tunkelang

When you search at Google, you may notice some alternative search query suggestions within your results.

Those query suggestions offer related terms that might be helpful when the terms you used don’t quite surface the information that you’re looking for. A search engine might locate and identify potential query suggestions in a number of different ways. One is to look through the search engine’s query log files. Another involves looking at the frequency of terms that show up in the documents found in search results, or in the search result snippets, for a particular query.

Search result-based query suggestions are the focus of a Google patent originally filed in 2005, and granted to Google this week. It’s quite possible that Google has moved on to other methods of identifying query suggestions, but the processes described within the patent appear to have been influential in later work involving the expansion of queries in search results, classification of web pages, and related processes. A couple of white papers from the inventors of the patent describe the process behind this approach in a great amount of detail.

A check at Google Scholar revealed 150 citations to the “Web-Based Kernel” paper.

In short, the papers describe how a search engine might decide that someone who searches for “AI” most likely means “artificial intelligence,” rather than some other term that the abbreviation might be short for. They explain how a search for Steve Ballmer might include a query suggestion for “Microsoft CEO,” while a search for Bill Gates (a former Microsoft CEO) might include a query suggestion for “Microsoft Founder,” rather than “Microsoft CEO.”

They also explain how the terms “NASA” and “Space exploration” might be seen to be more related than “vacation travel” and “space travel,” even though the first two phrases don’t share a single term and the second two both include the word “travel.”
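
To make the NASA example a little more concrete, here is a minimal Python sketch of the general idea: represent each phrase by the terms found in search results for it, then compare those term vectors. The snippets in TOY_RESULTS are invented for illustration, raw term counts stand in for whatever term weighting the papers actually use, and the function names are my own, so treat this as a sketch under those assumptions rather than the method from the papers.

```python
import math
from collections import Counter

# Hypothetical snippets standing in for real search results (invented data).
TOY_RESULTS = {
    "NASA": [
        "nasa space agency launches rocket mission to explore mars",
        "nasa astronauts space station mission exploration",
    ],
    "space exploration": [
        "space exploration mission rocket mars astronauts",
        "history of space exploration nasa agency mission",
    ],
    "vacation travel": [
        "vacation travel deals hotels flights beach resorts",
        "cheap travel packages family vacation hotels",
    ],
    "space travel": [
        "space travel rocket astronauts mission orbit",
        "future of space travel nasa mars mission",
    ],
}


def context_vector(phrase):
    """Build a unit-length term-count vector from the snippets for a phrase."""
    counts = Counter()
    for snippet in TOY_RESULTS[phrase]:
        counts.update(snippet.split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()}


def similarity(phrase_a, phrase_b):
    """Cosine similarity between the context vectors of two phrases."""
    vec_a = context_vector(phrase_a)
    vec_b = context_vector(phrase_b)
    return sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())


nasa_vs_exploration = similarity("NASA", "space exploration")
vacation_vs_space = similarity("vacation travel", "space travel")
print(nasa_vs_exploration > vacation_vs_space)
```

With this toy data, the first pair scores higher because the result snippets share many terms, even though the phrases themselves share none; the second pair overlaps only on “travel.”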

The patent provides another example of when a searcher might need a query suggestion:

Language difficulties might cause a person to search using the wrong keywords. A person who lacks familiarity with the language of the content being searched might use the wrong keywords. Even a person who is familiar with the language of the content might make mistakes.

For example, a British citizen who seeks information about temporarily obtaining a car in the United States might search “car for hire” rather than “car for rent.” The latter query more accurately reflects conventional usage in United States English and is likely to produce better search results.

The patent is:

Generating query suggestions using contextual information
Invented by Mehran Sahami and Timothy D. Heilman
Assigned to Google Inc.
US Patent 7,725,485
Granted May 25, 2010
Filed August 1, 2005

Abstract

A search engine receives a query from an end-user. The search engine executes the query on a content database and identifies a set of matching content. The search engine utilizes the matching content to generate a query vector describing the end-user query.

The search engine searches a repository of other vectors, called “centroids,” to produce a ranked set of centroids matching the query vector. These centroids are converted into search queries and form a set of candidate queries. The search engine filters the candidate queries to identify ones that are likely to be meaningful to the end-user. The selected candidate queries are returned to the end-user as query suggestions.

In simple terms, here’s a high-level overview of how the process described in the patent may work:

The search engine:

  1. Receives a query from a searcher
  2. Selects one or more pages in response to the query
  3. Chooses the highest-weighted terms from each of those pages
  4. Identifies the most common terms from those pages
  5. Looks to see if this process has been done before for that particular query, and if so finds previous collections of those most common terms (search results change over time, and the terms collected may differ)
  6. Calculates a degree of similarity between each previously stored collection of terms (if any) and the most recent
  7. Sorts the previous collections of terms to see which matches most closely the newest
  8. Converts terms from the most highly ranked of previous collections of terms into candidate query suggestions
  9. Examines those candidate query suggestions in a ranked order
  10. Adds candidate query suggestions to a set of suggestions if they contain a certain level of new terms that are not included in the original query, and
  11. Provides the set of query suggestions to the searcher, in response to the original query
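
The steps above can be sketched roughly in Python. This is only an illustration under several assumptions, not the patent’s implementation: raw term counts stand in for whatever term weighting the search engine uses, each stored collection of terms (centroid) is assumed to be keyed by the query that produced it, and the names term_vector, cosine, suggest, and all of the toy data are hypothetical.

```python
import math
from collections import Counter


def term_vector(documents, top_n=10):
    """Steps 3-4: keep the top_n most frequent terms, scaled to unit length."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
    top = dict(counts.most_common(top_n))
    norm = math.sqrt(sum(c * c for c in top.values()))
    return {term: c / norm for term, c in top.items()}


def cosine(u, v):
    """Step 6: similarity between two unit-length term vectors."""
    return sum(weight * v.get(term, 0.0) for term, weight in u.items())


def suggest(query, results, centroids, max_suggestions=3, min_new_terms=1):
    """Steps 5-11: rank stored centroids, then keep candidates with new terms."""
    query_vec = term_vector(results)
    # Step 7: sort stored queries by how closely their centroid matches.
    ranked = sorted(
        centroids,
        key=lambda stored: cosine(query_vec, centroids[stored]),
        reverse=True,
    )
    original_terms = set(query.lower().split())
    suggestions = []
    # Steps 8-10: walk candidates in ranked order, filtering on new terms.
    for candidate in ranked:
        new_terms = set(candidate.lower().split()) - original_terms
        if len(new_terms) >= min_new_terms:
            suggestions.append(candidate)
        if len(suggestions) == max_suggestions:
            break
    return suggestions


# Invented centroid repository, keyed by the query that produced each one.
centroids = {
    "car rental": term_vector(["car rental airport deals", "cheap car rental rates"]),
    "car hire": term_vector(["car hire airport rates", "cheap car hire deals"]),
    "hire a lawyer": term_vector(["hire a lawyer legal advice", "find a lawyer attorney fees"]),
}
# Invented results for the searcher's query.
results = ["car hire rental airport rates", "rent a car cheap rental deals"]

top_suggestion = suggest("car for hire", results, centroids, max_suggestions=1)
print(top_suggestion)  # ['car rental']
```

In this toy run, “car hire” matches the results closely but is skipped because it adds no terms beyond the original query, which is the job of the filter in step 10.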

The process described could use the full documents identified within the search results for the original query, or a summary or short snippet (possibly up to 1,000 words) from those documents.

It’s also possible that terms identified by this method could be used in other ways, such as providing additional query terms that might be used to expand the original search, or by providing terms that might be used to classify web pages to help identify appropriate advertisements for those pages.

When the search engine looks for previous collections of terms (or centroids) that might have been identified in the past in relation to a particular query, it might take those from a few different sources, such as:

  • Queries culled from real-world queries received by the search engine during a given time period,
  • A set of training queries fed to the search engine by an administrator, and/or
  • Hand-coded data.

While I’ve given a rough overview of the processes described in the patent, it goes into considerably more detail. If you want to dive into it, I highly recommend that you first read the two papers I linked to above, which are easier to read and understand, and include a number of other examples.

Other Approaches to Query Suggestions

This Google patent was originally filed in 2005, and the whitepapers that describe the processes within it date from around the same time. While the search result-based process it uses appears to have been influential, given the number of times the first paper linked above has been cited in other white papers, a large number of other papers describe other ways to identify possible query suggestions.

Many of the more recent ones look at the query logs of the search engines to see things such as other queries used in the same query sessions from searchers, or pages clicked upon during query sessions, or other search-log based approaches to identifying query suggestions. Here are some of those, including a couple of videos and slide presentations, mostly from 2007 to the present.

Google

Microsoft

Yahoo

  • Mining Broad Latent Query Aspects from Search Sessions (pdf)

Yahoo Query Flow Graphs

Academic

Conclusion

At the start of this post, I included a quote from Daniel Tunkelang, who joined Google some time after publishing the paper the quote was taken from. His suggestion that search engines are beginning to act more like reference librarians than simple indexes of the web is a point that should be given some careful thought.

When you write for the web, and you focus upon specific terms or phrases hoping that someone will search using those terms, you need to keep in mind that the search engines might suggest alternative queries to that searcher. Those suggested query terms are most likely generated automatically by the search engines through a process like the one described in the Google patent, or in the papers that I listed above.

Those query suggestions may also change over time – if the search engine is using a search result-based approach, query suggestions may be based upon a certain number of the top results for that query, and those results change. If the search engine is using a search log approach, the suggestions may change based upon other terms used by searchers during the same search session, and/or pages being clicked upon and viewed in search results, or other user behavior-based activity.
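
As a rough illustration of the search log approach, here is a minimal Python sketch that counts which queries show up in the same sessions as a given query. The sessions are invented, and real search log methods also look at clicks and other signals, so the names and data here are assumptions for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical search sessions, each a list of queries from one searcher.
SESSIONS = [
    ["car for hire", "car rental", "cheap car rental"],
    ["car for hire", "car rental"],
    ["car rental", "airport car rental"],
    ["car for hire", "taxi service"],
]


def build_cooccurrence(sessions):
    """Count how often each pair of distinct queries shares a session."""
    cooccurrence = defaultdict(Counter)
    for session in sessions:
        unique_queries = set(session)
        for query in unique_queries:
            for other in unique_queries:
                if other != query:
                    cooccurrence[query][other] += 1
    return cooccurrence


def suggest_from_log(query, cooccurrence, k=1):
    """Return the k queries most often seen in sessions with this query."""
    return [q for q, _ in cooccurrence.get(query, Counter()).most_common(k)]


cooccurrence = build_cooccurrence(SESSIONS)
log_suggestion = suggest_from_log("car for hire", cooccurrence)
print(log_suggestion)  # ['car rental']
```

Because the counts come straight from the log, the suggestions shift as searchers’ behavior shifts, which is one reason they may change over time.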

It’s also possible that some of the processes used to create query suggestions could be adapted to find additional terms to expand queries with, or to classify pages into different categories to present as search results broken down by category (as Bing sometimes does).

If you are creating pages for a website, and you decide to focus upon specific terms for the pages you develop, you may want to look at the suggestions that the search engines provide in search results for those terms, and keep an eye upon them. They might provide some ideas for changes to those pages, or for additional pages if the suggestions are relevant for what you offer on your web site.

14 thoughts on “How a Search Engine Might Identify Possible Query Suggestions”

  1. It always seems to me that the possible suggestions are based on search numbers and volume more than anything else. I know it is probably an overly simplistic conclusion, but the simplest explanation is sometimes the best?

  2. Hi John,

    Good question. I’m of the opinion that an educated explanation is usually the best. It would be nice if the process was often a simple one as well, but it isn’t always.

    It’s quite possible that suggestions are chosen in part based upon search volume. If you were in a library, and you went up to the reference librarian’s desk and asked for help finding information about a particular subject, the librarian might ask you if you wanted to find the best resources on the topic, or the most popular, or might ask you a few more questions to narrow down what you were looking for.

    If Google is acting like a reference librarian when offering suggestions, it might do something similar.

    There are times when suggestions show up at the top of search results, and when I see those, they often provide suggestions that cover different topics that might be associated with my query, some of which may provide results very different from the list of pages that show up in my search results. Those suggestions may give me a diverse range of choices.

    There are also times when suggestions show up at the bottom of search results, and they often appear to be popular alternatives on my query – and more related to the search results that I’m seeing.

    It’s possible that the top of results queries are found using a different approach than the bottom of results queries. How are they different? How did the search engine come up with them?

    Even if a search engine chooses based mostly on search numbers, how does it decide which suggestions to choose amongst, to compare numbers?

    We’re somewhat fortunate that we have some possible explanations of how the search engines might choose amongst those in the whitepapers and patents that people from the search engines have provided that can give us some clues.

    And, some of the methods that are used in providing search suggestions may also be used to expand our queries to give us a wider range of search results. Because of that, it’s probably worth spending some time paying more attention to them.

  3. I do a lot of keyword research with seobook (free) and some other paid keyword tools that I have. I forgot about google’s keyword suggestion tool and actually just went over to Google to see what their suggestions would be for a keyword phrase I am currently writing a blog about.

    Thanks for the reminder. It’s the simple and free tools that are most useful to remember!

  4. This keyword suggestion tool is also present in youtube. Try typing a word and a dropdown appears showing possible keywords that you may be searching for and these are also keywords that most people are using for their search. This helps me a lot on how to properly name my youtube videos.

  5. Hi Heather,

    This post wasn’t so much about the Google Keyword Suggestion tool as it was about some of the queries that Google might suggest along with search results after you’ve performed a search at Google. It’s worth looking at those query suggestions in addition to Google’s keyword suggestion tool, especially when you are doing keyword research.

  6. Hi Andrew,

    YouTube also has a Keyword Suggestion Tool in addition to the predictive query suggestions that show up in a dropdown under the search bar.

  7. Hi Bill. Good stuff as usual. Taking a LOT of notice of Google’s query suggestions has always been fundamental for me in keyword research. Just makes sense for creating a link profile with a healthy ratio of related anchor text links.

  8. Hi Steve,

    Thanks. Definitely, those query suggestions can provide some nice hints as to what the search engine and searchers might find important, whether you’re viewing them while doing keyword research, getting ideas for anchor text, or even just searching.

  9. Thanks Bill, nice post. Actually I commented on this post the first time I visited your blog, but you didn’t approve my comment. Maybe you thought it was spam, but I’m not. Lol. Anyway, I actually use Google’s related suggestions as keywords for many of my posts. Thanks

  10. Hi Alamin,

    There are times when some comments make it into my spam queue or moderation queue and end up not getting approved. It’s possible that I might have thought that your post was spam, and I apologize that it wasn’t published. I do get a lot of comments that don’t seem to add anything to a post or appear to be only ways for people to get links back to their pages without any intent to raise related questions or points, and those might not get published.

    Looking at query suggestions can present additional ideas for keyword research, and are definitely worth paying attention to.

  11. Now that was a lot of information that I have obviously been lacking. I hear so many different ways to do things anymore that I’m being very selective on where I research. This is definitely a site that I trust Bill. Lots of the information that I have learned from your site has really given me the extras that I need. Queries can be tricky to deal with depending on how you want to approach things. They can also be very frustrating, especially when they do not give you the results you are looking for.

  12. Hi Tony,

    Thanks for your kind words about my site, and I’m happy to hear that it has been helpful to you.

    As a searcher and a site owner, I try to pay a lot of attention to how the search engines interpret the queries that I use, and that visitors to my site might use.

    It’s interesting to see what query suggestions the different search engines might offer to searchers as well. When you already know something about what it is that you are searching for, some of those can seem unnecessary. But when you don’t know much about the topic, they can sometimes be very useful.

Comments are closed.