Predictive Search Query Suggestions

When you start typing a query into a search box at many search engines, you may see a dropdown appear under the search box which offers selectable suggestions for query terms even before you may have finished typing. The suggestions may also provide alternative URLs for web pages if you are typing the address of a web page into the search box.

We’ve seen a few patent filings in the past that describe this kind of behavior, but they haven’t gone into a lot of detail about how those specific suggestions might have been chosen.

A patent application published by Google this week gives us a little more insight into the search suggestions that it offers. Interestingly, it’s possible that the query suggestions that I see might be different than the ones that you may be offered, based upon things such as whether or not either of us:

  • Is using a mobile device to connect to the search engine or a desktop computer
  • Might be identifiable as a member of a group profile interested in certain topics or categories of sites
  • Has a search history that the search engine can use to bias those suggestions towards something we are interested in
  • Is viewing a specific page which has a specific profile attached to it, and is using a search toolbar for our search
  • May be connecting to the Web at different connection speeds, or is using different connection types
  • Could have set our browsing preferences differently in our browser or through the search engine for things such as preferred language
  • Others

The patent filing also describes filters that might keep certain terms and phrases from showing up in search suggestions. More on those filters below.

Predictive Search Suggestion Interfaces

Predictive search suggestions have become pretty popular, and they tend to look pretty similar from one search engine to another. Even though they may look similar, it’s possible that the way each search engine comes up with suggestions may vary drastically. Regardless of that, I thought it would be interesting to take a look at how a number of search engines present their suggestions, and see if they provided any information about those suggestions on their pages.

Google:

Google describes their query suggestion approach on one of their help pages titled Features: Google Suggest. At one point before query suggestions were integrated into Google’s Web search, Google had a separate page in their experimental labs called “Google Suggest” where you could receive query suggestions. While that page is no longer available, the Google Suggest FAQ still exists.

an example of Google predictive queries

Yahoo Search Assist

Yahoo’s query suggestions have a slightly different look and feel, in a scrollable box that opens below their search box, and they are known as Yahoo Search Assist.

an example of Yahoo Search Assist predictive queries

Microsoft Live Search Suggestions

Microsoft Live calls their predictive query suggestions Search Suggestions.

an example of Microsoft Live Search Suggestion predictive queries

Ask.com

While I found a patent application from Ask.com on search suggestions, it mainly described an interface for suggestions without much detail on how those suggestions were derived. It also didn’t look much like the query suggestions offered today on Ask.com. There isn’t much else on the Ask.com site about their predictive query suggestion approach.

an example of ask.com predictive queries

Cuil.com

On Cuil’s Features page (no longer available) is a subtle dig at Google in their description of their Search query suggestions, where they tell us:

When you type a query, sometimes you’ll see a search suggestion with an icon representing a website. Click on this link and you will go directly to that website. We let you look before you leap, because not everyone feels lucky.

Presumably, the mention of the word “lucky” refers to Google’s “I’m Feeling Lucky” button on the front page of that search engine, which normally brings you directly to the first result in the search results for a query typed into Google’s search box. Here’s what Cuil’s search suggestions look like:

an example of Cuil predictive queries

Patent Filings for Predictive Query Suggestions

There have been a number of papers and patent filings involving predictive query suggestions from the major commercial search engines. I’ve written about a few of them in the past. If you’d like to see those posts, they are available here:

The latest patent filing that I’ve seen on predictive query suggestions was published this week from Google:

Method and System for Autocompletion Using Ranked Results
Invented by Kevin A. Gibbs, Sepandar D. Kamvar, Taher H. Haveliwala, and Glen M. Jeh
Assigned to Google
US Patent Application 20090119289
Published May 7, 2009
Filed December 29, 2008

Abstract

A set of ordered predicted completion strings are presented to a user as the user enters text in a text entry box (e.g., a browser or a toolbar). The predicted completion strings can be in the form of URLs or query strings. The ordering may be based on any number of factors (e.g., a query’s frequency of submission from a community of users). URLs can be ranked based on an importance value of the URL. Privacy is taken into account in a number of ways, such as using a previously submitted query only when more than a certain number of unique requesters have made the query.

The sets of ordered predicted completion strings is obtained by matching a fingerprint value of the user’s entry string to a fingerprint to table map which contains the set of ordered predicted completion strings.
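To make the fingerprint-to-table idea a little more concrete, here is a minimal sketch of how such a lookup could work. It is only an illustration, not Google's implementation: the hash function (a truncated SHA-256), the function names `fingerprint`, `build_map`, and `predict`, and the sample queries and frequencies are all my own assumptions.

```python
import hashlib


def fingerprint(text: str) -> int:
    """Hypothetical fingerprint: first 8 bytes of a SHA-256 digest of the lowercased text."""
    digest = hashlib.sha256(text.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_map(ranked_queries, max_per_prefix=3):
    """Build a fingerprint -> ordered completions map from (query, frequency) pairs.

    Queries are visited from most to least frequent, so each list of
    completions is already in ranked order when it is stored.
    """
    table = {}
    for query, _freq in sorted(ranked_queries, key=lambda item: (-item[1], item[0])):
        for end in range(1, len(query) + 1):
            completions = table.setdefault(fingerprint(query[:end]), [])
            if query not in completions and len(completions) < max_per_prefix:
                completions.append(query)
    return table


def predict(table, partial):
    """Look up the ordered completions for whatever the user has typed so far."""
    return table.get(fingerprint(partial), [])


# Made-up historical queries and submission counts.
table = build_map([
    ("baseball", 900),
    ("fantasy baseball", 650),
    ("baseball scores", 400),
    ("baseboard heaters", 120),
])
print(predict(table, "baseba"))
```

Because the ranked lists are computed ahead of time, serving a suggestion is a single hash and a single dictionary lookup per keystroke, which is presumably part of the appeal of this kind of design.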

Where this differs most from some of the previous patent filings on the topic is telling us that the query suggestions shown for one searcher may differ from the query suggestions shown for other searchers based upon a number of different possible signals.

While one method of ranking and displaying specific query suggestions may depend upon how frequently queries shown as suggestions may have been submitted to the search engine in the past, there are other factors that can influence which suggestions are shown to whom. I started this post with a list of some of those signals.

User personalization information may play a role in determining which query suggestions you might see as you search. The patent filing tells us:

For instance, user personalization information may include information about subjects, concepts or categories of information that are of interest to the user. The user personalization information may be provided directly by the user, or may be inferred with the user’s permission from the user’s prior search or browsing activities, or may be based at least in part on information about a group associated with the user or to which the user belongs (e.g., as a member, or as an employee).

It’s also possible that the predictive queries shown to a searcher may be influenced by search queries that are stored locally on the searcher’s computing device. So, if you’ve searched for a topic before, queries from your local search history that match what you’re typing can be offered to you, along with new suggestions which might be taken from the search engine’s cache of previous queries, or from a database of queries if the cache doesn’t contain many suggestions.
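A rough sketch of that merging step might look like the following. The function name `merge_suggestions`, the `min_cached` threshold, and the choice to list local history matches first are assumptions of mine; the patent filing doesn't spell out those details.

```python
def merge_suggestions(partial, local_history, cache, database, limit=5, min_cached=2):
    """Combine locally stored queries with suggestions from the search engine.

    Assumption: local history matches are shown first, and the query
    database is consulted only when the cache yields few matches.
    """
    prefix = partial.lower()

    def matches(queries):
        return [q for q in queries if q.lower().startswith(prefix)]

    remote = matches(cache)
    if len(remote) < min_cached:
        # The cache came up short, so fall back to the larger database.
        remote = remote + matches(database)

    merged = []
    for query in matches(local_history) + remote:
        if query not in merged:  # drop duplicates while keeping order
            merged.append(query)
    return merged[:limit]


suggestions = merge_suggestions(
    "baseb",
    local_history=["baseball cards value", "weather"],
    cache=["baseball"],
    database=["baseball", "baseball scores", "basketball"],
)
print(suggestions)
```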

FingerPrints and Search Query Suggestions

The search queries that may be suggested for your search can be based upon a “fingerprint” associated with that search. Each query (or partial query as you type) can have a number of different fingerprints associated with it based upon a number of different factors, such as:

  • Profile information provided by the user, including things like location
  • Information taken from the request itself, such as language
  • Information associated with the user based upon user behavior signals such as previous searches during a search session
  • Device-type – a handheld might receive fewer predictive queries due to its smaller screen size
  • Connection-speed
  • Connection type
  • Importance Factors Associated with Query Terms – query terms having lower importance factors could be removed from the predictions before terms having higher importance factors
  • Categories Associated with Users – different sets of fingerprint-to-table maps might be used for respective categories of users, where those categories or topics are associated with the user
  • Historic Queries Associated with web sites – a partial search query received from a particular website (perhaps through a toolbar search) might be mapped to predicted results generated from historical queries received from the same website, or from a group of websites that might be seen to be similar to that particular website
  • Misspellings – if a query being typed in could be considered to be a “conspicuously misspelled word,” predictive queries for the correctly spelled word may be merged with the predicted results
  • Concepts extraction – the terms in the query might be analyzed to extract concepts from the search terms indicating a particular category of information, such as “technology”, “food”, “music” or “animals.”
  • Community Membership – queries from searchers sharing at least one similar characteristic such as: “belonging to the same workgroup, using the same language, having an internet address associated with the same country or geographic region, or; the like.”
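One way to picture how a few of those factors could steer the suggestions is to keep several maps and pick one per request. The sketch below is only an assumption-laden illustration: `RequestContext`, `MAPS`, `MAX_RESULTS`, and the sample suggestions are all hypothetical, and plain lowercase prefixes stand in for real fingerprints to keep it short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RequestContext:
    """Hypothetical bundle of signals taken from the user and the request."""
    language: str = "en"
    category: Optional[str] = None  # e.g. a topic associated with the user
    device: str = "desktop"


# One prefix -> ranked suggestions map per (language, category) pair.
MAPS = {
    ("en", None): {"ja": ["java", "jaguar cars", "japan"]},
    ("en", "sports"): {"ja": ["jaguars schedule", "jaguars roster", "jamaica bobsled"]},
}

# Handhelds get fewer suggestions because of their smaller screens.
MAX_RESULTS = {"desktop": 10, "handheld": 2}


def select_map(ctx):
    """Prefer a map for the user's category, else fall back to the general map."""
    return MAPS.get((ctx.language, ctx.category)) or MAPS.get((ctx.language, None), {})


def suggest(ctx, partial):
    limit = MAX_RESULTS.get(ctx.device, 10)
    return select_map(ctx).get(partial.lower(), [])[:limit]


print(suggest(RequestContext(), "ja"))
print(suggest(RequestContext(category="sports", device="handheld"), "ja"))
```

The same partial query yields different suggestions for the two contexts, which is the behavior the patent filing describes: what I see for a few typed letters need not be what you see.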

Filters

Some query suggestions may not appear in the dropdown box because of filters that keep them from showing up. There are a number of different types of filters that might be involved, such as:

A Privacy Filter – Since the number of queries that have been received by the search engine is one of the signals looked at to decide whether a term or phrase should show up as a query suggestion, terms that haven’t been searched for by a certain number of “unique submitters” may not be shown to searchers.

Infrequently Submitted Query Filter – eliminates queries which are infrequently submitted and probably not likely to be selected by a user.

An Appropriateness Filter – blocks certain queries based upon factors such as particular keywords in a query, and the content of search result pages that correspond to the query.

A Recency Filter – blocks query suggestions that may have been submitted earlier than a particular historical point in time, which might be hours, days, weeks, months, or years. So, if a particular query term was used commonly last year, but not so much this year, it might not be shown.

An Antispoofing Filter – could be used to prevent certain queries or URLs from showing up in predictions if the prediction system sees them in a large number of artificially generated queries or URL submissions.
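Here is a small sketch of how filters like these could be applied to a query log before any suggestions are built. Everything in it is an assumption for illustration: the log format of (query, user id, timestamp) tuples, the function name `authorized_queries`, and the thresholds. In this simplified version, counting each submitter only once per query serves as the antispoofing step, and the unique-submitter minimum doubles as the infrequent-query filter.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def authorized_queries(log, now, min_unique=3, max_age=timedelta(days=30),
                       blocked_words=frozenset()):
    """Return queries that pass every filter, most widely submitted first.

    log is an iterable of (query, user_id, submitted_at) tuples.
    """
    submitters = defaultdict(set)
    for query, user_id, submitted_at in log:
        if now - submitted_at > max_age:
            continue  # recency filter: too old to count
        if set(query.lower().split()) & blocked_words:
            continue  # appropriateness filter: contains a blocked keyword
        # A set keeps one entry per user, so repeated submissions of the
        # same query by one user cannot inflate its count (antispoofing).
        submitters[query].add(user_id)

    # Privacy filter: require at least min_unique distinct submitters.
    counts = {q: len(users) for q, users in submitters.items() if len(users) >= min_unique}
    return sorted(counts, key=lambda q: (-counts[q], q))


now = datetime(2009, 5, 7)
recent, stale = datetime(2009, 5, 1), datetime(2008, 1, 1)
log = (
    [("baseball", u, recent) for u in ("u1", "u2", "u3")]
    + [("fantasy baseball", u, recent) for u in ("u1", "u2", "u3", "u4")]
    + [("secret project", "u9", recent)] * 5          # one user, many repeats
    + [("old fad", u, stale) for u in ("u1", "u2", "u3")]
    + [("bad word", u, recent) for u in ("u1", "u2", "u3")]
)
result = authorized_queries(log, now, blocked_words=frozenset({"bad"}))
print(result)
```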

Conclusion

The patent application from Google provides more details and examples on how it might come up with different query suggestions for different searchers. What I thought was important was knowing that the predictive query suggestions that I see when I search might be different than the ones that you see.


37 thoughts on “Predictive Search Query Suggestions”

  1. Poor Cuil. When it first came out I had such high hopes, but their biggest flaw was putting pictures next to the results, because 75% of the time the pictures were terribly wrong.

  2. That is interesting.

    At one time I found these search suggestions annoying, but they do help when doing a search on a topic that you are relatively unfamiliar with.

    Sometimes these search suggestions really do help you “map out the lay of the land” when doing a search in unfamiliar territory.

  3. Hello, Interesting article. Have you tried to test google suggest to show different suggestions for the same query?

  4. For the most part, I only found these assist features as interesting to see what comes up as a popular topic, but they have provided some phrasing ideas that help focus in on my search.

    What I have not seen yet is the development of semantic search being incorporated into these features. That may be a way forward to help searchers discover more. For example, you could type in “capriotada” and be shown “bread pudding” as being semantically related. This could help search become a better form of discovery. Wonder if Wolfram Alpha will do that.

  5. Pingback: » Pandia Search Engine News Wrap-up 10
  6. These predictive search suggestions are the type of thing I’ve always thought were there; I never really noticed them start to appear during searches. I guess they don’t really do any harm as you can still do your normal search if you want to. All companies seem to be doing it the same way so I don’t think anyone’s gaining any particular advantage over it.

    Your example of ‘baseb’ shows how diverse the results can be however – having fantasy baseball 2nd on MSN is a strange one considering baseball is the second word rather than the first…

  7. Hi Chris,

    The thing that has bothered me most about Cuil are those pictures that, like you note, are often “terribly wrong.” I hope that Cuil does get better when it comes to those, or drop them and focus upon showing more relevant search results.

  8. Hi People Finder,

    If you don’t know too much about a topic, Google’s mix of predictive search suggestions, as well as their query refinement suggestions can be helpful. I just get worried that some great information might be missed because it hasn’t been searched for frequently enough to show up in those suggestions…

  9. Hi Maciej,

    The patent filing just came out, and I wrote about it before doing too much testing. I’ve started doing some testing. No conclusions so far.

  10. Hi Frank,

    Good points. I don’t believe I usually pay too much attention to the query suggestions that drop down most of the time. I already know what I’m going to search for, regardless of what suggestions might be offered. But, if semantically related suggestions were shown, I would probably pay more attention.

    The Google and Yahoo patent filings on predictive query suggestions do indicate that they would look to more types of suggestions than just ones that might share letters with the query you’re typing into a search box. For instance, in this Google patent filing, we’re told:

    In some embodiments, the terms in the query are analyzed to extract concepts embodied in the search terms indicating a particular category of information (e.g., “technology, “food”, “music” or “animals”). One or more predicted results from queries related to one or more of the extracted concepts are merged with the predicted results returned to the user.

    That could make things pretty interesting.

  11. Hi Adam,

    That’s actually the Yahoo result that shows “baseball” as the second word in the phrase suggested (fantasy baseball). I’m actually a little surprised that the other search engines aren’t showing more results like that. The results from one search engine to another are more diverse than I expected, too.

  12. Hello Bill,
    I am at a loss to see the value of the predictive query tool based on what I’ve seen when it goes tragically awry. When I use “why are black…” as my search query, the suggestions made do not remotely approach what I would be interested in, and this happens when I use “why are white,” “why are Chinese,” or Irish or the ethnic group of your choice. With spell check and behaviorial relevance ranking, do we really need to get suggestions from a machine with a bad sense of humor or individuals who need to evaluate their preconceived notions?

  13. Hi Bill – good find on the patent app.

    The suggestions are useful about 50% of the time. When they are not then they are irritating – bit like the Microsoft paperclip thing.

    Something else to switch off, I guess…

  14. Hi Bill,
    Cuil’s implementation (the category matches) is interesting.

    Have you seen any examples where site searches are doing suggests based on matches against titles of documents?

    I really appreciate all the great content and research that you publish on your blog.

  15. Those are pretty annoying. I guess it’d be better if the suggestions were just displayed at the bottom of the search results page. I usually know what I’m looking for, don’t need Google’s help there.

  16. Hi marianne,

    Some of the first patent filings and papers that I came across about predictive queries described one of the major reasons for developing this technology to be an aid to people who were using handheld and mobile devices, with small keyboards (or even numerical keyboards). If a search engine could predict the query that you were trying to type in, it could save you from having to type the full query. I think that was a good idea.

    The results that you see are taken from “behaviorial relevance rankings,” where past queries from other searchers are often one of the main base sources of the predictive queries that you see.

  17. Hi John,

    Thanks. I wonder how many people actually use the query suggestions that are offered. It would be nice to see a whitepaper from one of the search engines where they discuss how frequently people choose predictive query suggestions when they are doing a search, and how frequently people turn off those query suggestions…

  18. Hi Pete,

    I thought it was interesting that Cuil offered categories as well. I think that’s a good idea.

    It’s probably likely that titles play a role in the rankings of pages that show up in search results in response to the queries suggested, but hard to tell what role titles might have in the actual suggestions of queries. It looks as though the suggestions are triggered by information found in query logs. It would be interesting if query suggestions offered might also be determined by information found in search results for the query being predicted, including titles of those documents, URLs, and snippets from those results.

  19. Hi Stancje,

    Google already offers query refinement suggestions within search results. The purpose behind those is to offer alternatives that might help people who may not know too much about the topics that they are searching for. One main purpose behind the predictive query suggestions is to offer the suggestions as a shortcut, to help searchers reduce the amount of typing that they may have to do when searching.

  20. This is also an interesting post, though worries me that some of these companies will try to enforce these patents and it’s us, the searching public, that will lose out.

    Have you seen Yahoo’s http://developer.yahoo.com/yui/autocomplete/ Typeahead UI Pattern in their (open source) Design Pattern Library? It doesn’t deal with how to populate the suggestions list, but it does address all the issues I can think of in implementation.

    Avi

  21. As an end user, these predictive query suggestions are helpful for me because they eliminate the work of typing and hitting the enter key, and most of the time they offer the exact thing I was looking for.

    On the other side, these suggestions are a good chance for web site owners to optimize their sites for these searches to catch more visitors.

    What’s your opinion on this, William?

  22. Hi Avi,

    Thank you. It is interesting to see the different search engines experiment with different approaches to predictive queries. It would be a shame to see them actively attempting to exclude others from experimentation with them as well, on the basis of their patent filings – I think you’re right that the public would be the ones losing out if they did that.

    Thanks also for the link to the Yahoo User Interface library design page. It’s really nice to be able to see some of the code behind the implementation of something like predictive queries. The related articles for mobile devices linked to near the bottom of that page were interesting too:

  23. Hi Agra,

    I think that there’s some value in the search engines providing predictive queries, but hope that the predictions don’t detour people from the queries that they may have actually intended too often.

    It is smart for site owners to type candidate keyword terms or phrases into search engines, and see how the search engines react, in terms of offering predictive queries, seeing what shows up in the search results, looking at any query suggestion refinements, and finding out whether or not blended image and video and news and book and other types of searches show up amongst those search results. Doing that may provide some ideas on other potential keyword terms/phrases to optimize for, and some other ideas on how to approach optimizing for the candidate terms, and possibly even whether it’s a good idea or not to target those terms.

  24. Hi William,
    You say Google delivers individual Predictive Search Query Suggestions based upon things like our queries on a mobile device (e.g. iPhone) or our statements in community profiles. Where is it described how Google connects these different pieces of information? In Germany this approach would conflict with the law …

    br,
    dirk

  25. Hi Dirk,

    Really good question. Thanks. Privacy and privacy laws are one of the things that I’m very concerned about when it comes to search engines.

    My list of different sources of information that Google may use to come up with predictive searches comes from the patent application that I’ve linked to above (Method and System for Autocompletion Using Ranked Results), in the section of the description that starts with this line:

    [0045]An applicable fingerprint-to-table map 510 may be selected based on a number of different factors associated with a user or a request.


    It’s possible that predictive search may not be using all of those information sources described in the patent application. The patent filing also notes that it would only use information such as that taken from a person’s profile if the person agreed to it being used that way. From the same paragraph:

    Similarly, an individual user may, with his/her permission, have a user profile that specifies information about the user or about a group associated with the user, and that “personalization information” may be used to identify a respective set of fingerprint-to-table maps for use when predicting results for that user.

    I don’t know German law well enough to know if such a use is permissible in the way that they describe it. We have to remember that what they describe in a patent application may have been implemented differently, as well.

  26. I didn’t know search engines can look at all those things you mention to suggest search options. I was more under the impression that the suggestions are based on keyword matches – broad and lateral.

    They have keyword tools and can also track most search patterns. I was thinking that is all they use to come up with suggestions.

    To add to Dirk’s point above it should not be allowed if they are using information from my phone or machine to suggest search options.

  27. Hi Ravi,

    The days of search engines solely looking at keyword matches ended almost a decade ago, if not longer. They need to look at query log files to track search patterns.

    I don’t think that using that type of information in an aggregated manner, where personally identifiable information isn’t included, is going to be a violation of privacy laws. And if you create a profile of information about yourself, and if you use personalized search to influence the search results that you might see, I’m not sure if that would be a problem either.

  28. Interesting. I’m wondering if the patent conflicts in anyway with Yahoo’s search assist, Bing autocomplete, Ask (and so many other sites)?

  29. Hi Jonathan,

    There is some overlap with the ideas and method presented in patents and whitepapers from other search engines. I listed a number of those in my post above. I’m sure that each has its own unique features, but I couldn’t tell you if one challenged another legally on the basis of its patent filings whether there might be a problem. I’m leaving that up to the legal teams at the search engines to explore. :)

  30. Hi Bill,

    do you know which period of time is used by Google to refine Search Query Suggestions? A month or more than that?

    Thank you for your help!

  31. Hi Dirk,

    We aren’t given a clear indication of how far back Google might look, but there is a mention of a recency filter in the patent filing, which suggests that Google might only look a certain distance back in time to generate query suggestions:

    [0050]One or more filters 504 are used to determine queries authorized for further processing. For example, filters can eliminate certain queries based on various criteria. In some embodiments, a privacy filter 504 prevents queries which have not been received from more than a certain number of unique submitters to be included in the authorized historical queries list 506. This could be accomplished by examining the unique identifier associated with each query, if one exists, and identifying only those queries which have been submitted by at least n unique submitters, where n is a number chosen based on privacy concerns (e.g., three or five unique submitters). In some embodiments, the filters 504 include a filter that eliminates queries which are infrequently submitted and therefore not likely to be selected by a user. In some embodiments, the filters 504 include an appropriateness filter 504 that blocks certain queries from inclusion based on a number of different factors such as the presence of one or more particular keywords in a query, and/or based on the content of the search results or documents that correspond to the query. Other types of filters could be easily imagined. For example, a filter could block queries submitted earlier than a particular historical point in time, such that the authorized historical queries list 506 represent recently submitted queries. What is considered recent depends on the embodiment (e.g., hours, days, weeks, months, or years).* In yet another example, an anti-spoofing filter 504 could be use to prevent the query/URL prediction system from being spoofed by a large number of a artificially generated queries or URL submissions. For instance, an anti-spoofing filter 504 might filter out multiple submissions of the same query or URL received from the same user or from the same client computer.

    * Emphasis mine.

    So, I can’t give you a definite answer to your question, but I can state that Google likely wouldn’t want to use data going so far back in time that it would be stale.

  32. Hi, I wanted to know on what basis Google suggestions are shown. Is it totally based on search rates of the terms? Can they be manipulated?

  33. Hi Nidhi,

    I’ve identified a number of possible signals that a search engine might consider when determining which predictive suggestions to show.

    There seem to be two main factors right now:

    1. that a suggestion be something that autocompletes a query – to save a searcher from having to type out a full query
    2. based upon how frequently people search for a specific related term.

    But, if you look in the section above labelled “FingerPrints and Search Query Suggestions,” there are other factors that they may consider either now or in the future. Of course, the search engines might look at others, too.

    Can they be manipulated?

    Possibly, but I wouldn’t make any guarantees on the ability to do so.

  34. Hi Matthew,

    There are a lot of things that I like about ask.com, but I’ve been disappointed by results that I’ve seen from them, by things like how slow it can be for new content to appear in their search results, and how slowly old content removed from the Web can take to disappear from their index. I’d love to see them become more competitive.

  35. Ask.com has to remain the single most under-rated resource in online search.
