Predictive Search Query Suggestions
When you start typing a query into a search box at many search engines, you may see a dropdown appear under the search box which offers selectable suggestions for query terms even before you may have finished typing. The suggestions may also provide alternative URLs for web pages if you are typing the address of a web page into the search box.
We’ve seen a few patent filings in the past that describe this kind of behavior, but they haven’t gone into a lot of detail about how those specific suggestions might have been chosen.
A patent application published by Google this week gives us a little more insight into the search suggestions that it offers. Interestingly, it’s possible that the query suggestions that I see might be different than the ones that you may be offered, based upon things such as whether or not either of us:
- Is using a mobile device to connect to the search engine or a desktop computer
- Might be identifiable as a member of a group profile interested in certain topics or categories of sites
- Has a search history that the search engine can use to bias those suggestions towards something we are interested in
- Is viewing a specific page which has a specific profile attached to it, and is using a search toolbar for the search
- May be connecting to the Web at a different connection speed, or using a different connection type
- Could have set our browsing preferences differently in our browser or through the search engine for things such as preferred language
The patent filing also describes filters that might keep certain terms and phrases from showing up in search suggestions. More on those filters below.
Predictive Search Suggestion Interfaces
Predictive search suggestions have become pretty popular, and they tend to look pretty similar from one search engine to another. Even though they may look similar, it’s possible that the way each search engine comes up with suggestions may vary drastically. Regardless of that, I thought it would be interesting to take a look at how a number of search engines present their suggestions, and see if they provided any information about those suggestions on their pages.
Google describes their query suggestion approach on one of their help pages titled Features: Google Suggest. At one point before query suggestions were integrated into Google’s Web search, Google had a separate page in their experimental labs called “Google Suggest” where you could receive query suggestions. While that page is no longer available, the Google Suggest FAQ still exists.
Yahoo Search Assist
Yahoo’s query suggestions have a slightly different look and feel, in a scrollable box that opens below their search box, and they are known as Yahoo Search Assist.
Microsoft Live Search Suggestions
Microsoft Live calls their predictive query suggestions Search Suggestions.
While I found a patent application from Ask.com on search suggestions, it mainly described an interface for suggestions without much detail on how those suggestions were derived. It also didn’t look much like the query suggestions offered today on Ask.com. There isn’t much else on the Ask.com site about their predictive query suggestion approach.
On Cuil’s Features page (no longer available) is a subtle dig at Google in their description of their Search query suggestions, where they tell us:
When you type a query, sometimes you’ll see a search suggestion with an icon representing a website. Click on this link and you will go directly to that website. We let you look before you leap, because not everyone feels lucky.
Presumably, the mention of the word “lucky” refers to Google’s “I’m Feeling Lucky” button on the front page of that search engine, which normally brings you directly to the first result in the search results for a query typed into Google’s search box.
Patent Filings for Predictive Query Suggestions
There have been a number of papers and patent filings involving predictive query suggestions from the major commercial search engines. I’ve written about a few of them in the past. If you’d like to see those posts, they are available here:
- Can Google Read Your Mind? Processing Predictive Queries
- Google Improving Mobile Search
- Google predicting queries
- Yahoo’s Predictive Queries, Invisible Tabs, and Temporal and Monetization Bias Experiments
- Predictive Queries versus Unique Searches
- Yahoo’s “Universal Search” and Vertical Search Suggestions
The latest patent filing that I’ve seen on predictive query suggestions was published this week from Google:
Method and System for Autocompletion Using Ranked Results
Invented by Kevin A. Gibbs, Sepandar D. Kamvar, Taher H. Haveliwala, and Glen M. Jeh
Assigned to Google
US Patent Application 20090119289
Published May 7, 2009
Filed December 29, 2008
A set of ordered predicted completion strings are presented to a user as the user enters text in a text entry box (e.g., a browser or a toolbar). The predicted completion strings can be in the form of URLs or query strings. The ordering may be based on any number of factors (e.g., a query’s frequency of submission from a community of users). URLs can be ranked based on an importance value of the URL. Privacy is taken into account in a number of ways, such as using a previously submitted query only when more than a certain number of unique requesters have made the query.
The sets of ordered predicted completion strings is obtained by matching a fingerprint value of the user’s entry string to a fingerprint to table map which contains the set of ordered predicted completion strings.
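To make the lookup described in that abstract a little more concrete, here is a minimal Python sketch of a fingerprint-to-table map. It is only an illustration under my own assumptions: the hash used as a fingerprint (a truncated SHA-1), the function names `fingerprint`, `build_map`, and `suggest`, and the sample query log are all hypothetical, and the patent filing may well do this differently.

```python
import hashlib
from collections import Counter, defaultdict


def fingerprint(partial_query):
    """Hypothetical fingerprint: a short hash of the lowercased partial query."""
    return hashlib.sha1(partial_query.lower().encode("utf-8")).hexdigest()[:16]


def build_map(query_log, max_suggestions=5):
    """Build a fingerprint-to-table map from a list of previously submitted queries.

    Each table is a list of completions ordered by submission frequency.
    """
    counts = Counter(q.strip().lower() for q in query_log)
    by_prefix = defaultdict(list)
    for query, count in counts.items():
        # Register the query under every prefix a searcher might have typed so far.
        for end in range(1, len(query) + 1):
            by_prefix[query[:end]].append((query, count))
    table_map = {}
    for prefix, entries in by_prefix.items():
        # Most frequently submitted first; ties broken alphabetically.
        entries.sort(key=lambda item: (-item[1], item[0]))
        table_map[fingerprint(prefix)] = [q for q, _ in entries[:max_suggestions]]
    return table_map


def suggest(table_map, partial_query):
    """Return the ordered completions for whatever has been typed so far."""
    return table_map.get(fingerprint(partial_query), [])


# Made-up query log, for illustration only.
SAMPLE_LOG = ["hotmail", "hotmail", "hotmail", "hotels", "hotels", "hot dog recipes"]
SAMPLE_MAP = build_map(SAMPLE_LOG)
```

With this made-up log, calling `suggest(SAMPLE_MAP, "hot")` returns the three stored queries with the most frequently submitted one first. The point of the sketch is that the expensive ranking work happens ahead of time, so answering each keystroke is a single hash and a single table lookup.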
Where this filing differs most from some of the previous patent filings on the topic is in telling us that the query suggestions shown to one searcher may differ from the query suggestions shown to other searchers, based upon a number of different possible signals.
One way of ranking and displaying query suggestions may depend upon how frequently those queries have been submitted to the search engine in the past, but other factors can influence which suggestions are shown to whom. I started this post with a list of some of those signals.
User personalization information may play a role in determining which query suggestions you might see as you search. The patent filing tells us:
For instance, user personalization information may include information about subjects, concepts or categories of information that are of interest to the user. The user personalization information may be provided directly by the user, or may be inferred with the user’s permission from the user’s prior search or browsing activities, or may be based at least in part on information about a group associated with the user or to which the user belongs (e.g., as a member, or as an employee).
It’s also possible that the predictive queries shown to you may be influenced by search queries that are stored locally on your computing device. So, if you’ve searched for a topic before, matching queries from your locally stored search history can be offered to you alongside new suggestions, which might be taken from the search engine’s cache of previous queries, or from a database of queries if the cache doesn’t contain many suggestions.
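Here is a small Python sketch of how locally stored queries might be combined with suggestions from a cache and a database. The function name `merge_suggestions`, the choice to list the searcher's own queries first, and the rule for when to fall back to the database are my assumptions for illustration; the patent filing doesn't commit to these details.

```python
def merge_suggestions(partial_query, local_history, cache, database, limit=5):
    """Combine a searcher's locally stored queries with server-side suggestions.

    local_history: the searcher's own past queries, most recent first.
    cache, database: dicts mapping a partial query to an ordered list of suggestions.
    """
    prefix = partial_query.lower()
    merged = []
    # Assumption: the searcher's own matching queries are offered first.
    for query in local_history:
        if query.lower().startswith(prefix) and query not in merged:
            merged.append(query)
    # Then suggestions from the cache, falling back to the database
    # when the cache doesn't hold enough to fill the list.
    server_side = list(cache.get(prefix, []))
    if len(merged) + len(server_side) < limit:
        server_side += database.get(prefix, [])
    for query in server_side:
        if query not in merged:
            merged.append(query)
    return merged[:limit]
```

For example, with a local history containing "python tutorial", a cache holding "python" and "pycharm" for the partial query "py", and a limit of three, the database is never consulted and the searcher's own earlier query leads the list.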
Fingerprints and Search Query Suggestions
The search queries that may be suggested for your search can be based upon a “fingerprint” associated with that search. Each query (or partial query as you type) can have a number of different fingerprints associated with it based upon a number of different factors, such as:
- Profile information provided by the user, including things like location
- Information taken from the request itself, such as language
- Information associated with the user based upon user behavior signals such as previous searches during a search session
- Device-type – a handheld might receive fewer predictive queries due to its smaller screen size
- Connection type
- Importance Factors Associated with Query Terms – query terms having lower importance factors could be removed from the predictions before terms having higher importance factors
- Categories Associated with Users – different sets of fingerprint-to-table maps might be used for respective categories of users, where those categories or topics are associated with the user
- Historic Queries Associated with web sites – a partial search query received from a particular website (perhaps through a toolbar search) might be mapped to predicted results generated from historical queries received from the same website, or from a group of websites that might be seen to be similar to that particular website
- Misspellings – if a query being typed in could be considered to be a “conspicuously misspelled word,” predictive queries for the correctly spelled word may be merged with the predicted results
- Concepts extraction – the terms in the query might be analyzed to extract concepts from the search terms indicating a particular category of information, such as “technology,” “food,” “music,” or “animals.”
- Community Membership – queries from searchers sharing at least one similar characteristic such as: “belonging to the same workgroup, using the same language, having an internet address associated with the same country or geographic region, or; the like.”
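A few of the factors in the list above (user categories, device type, and importance factors) can be sketched together in Python. Everything specific here is hypothetical: the function name `personalized_suggestions`, the per-device limits in `MAX_SUGGESTIONS`, and the idea of storing each suggestion with a numeric importance score are my assumptions, meant only to show how different searchers could end up seeing different lists for the same typed characters.

```python
# Hypothetical limits: a handheld shows fewer suggestions on its smaller screen.
MAX_SUGGESTIONS = {"desktop": 10, "handheld": 4}


def personalized_suggestions(partial_query, device_type, category,
                             maps_by_category, default_map):
    """Pick suggestions for a partial query based on who is asking and from what device.

    maps_by_category: {category: {partial_query: [(suggestion, importance), ...]}}
    default_map: {partial_query: [(suggestion, importance), ...]}
    """
    prefix = partial_query.lower()
    # Use the map for the searcher's category if it has this prefix,
    # otherwise fall back to the general map.
    category_map = maps_by_category.get(category, {})
    candidates = category_map.get(prefix) or default_map.get(prefix, [])
    limit = MAX_SUGGESTIONS.get(device_type, 10)
    # When the list must be shortened, the lowest-importance entries drop off first.
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    return [suggestion for suggestion, _ in ranked[:limit]]
```

In this sketch, a searcher associated with a "music" category who types "ja" could be shown music-related completions from that category's map, while someone on a handheld with no known category would get a shortened version of the general list.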
Some query suggestions may not appear in the dropdown box because of filters that keep them from showing up. There are a number of different types of filters that might be involved, such as:
A Privacy Filter – Since the number of queries that have been received by the search engine is one of the signals looked at to decide whether a term or phrase should show up as a query suggestion, terms that haven’t been searched for by a certain number of “unique submitters” may not be shown to searchers.
Infrequently Submitted Query Filter – eliminates queries which are infrequently submitted and probably not likely to be selected by a user.
An Appropriateness Filter – blocks certain queries based upon factors such as particular keywords in a query, and the content of search result pages that correspond to the query.
A Recency Filter – blocks query suggestions that may have been submitted earlier than a particular historical point in time, which might be hours, days, weeks, months, or years. So, if a particular query term was used commonly last year, but not so much this year, it might not be shown.
An Antispoofing Filter – could be used to prevent certain queries or URLs from showing up in predictions if the prediction system sees them in a large number of artificially generated queries or URL submissions.
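Four of the filters above can be sketched as simple checks in Python. The thresholds, the record fields, and the function names `passes_filters` and `filter_suggestions` are assumptions I've made up for illustration. The appropriateness check here only looks at keywords, the recency check uses the last time a query was submitted as a stand-in, and the antispoofing filter is left out because the description doesn't say how artificially generated queries would be detected.

```python
from datetime import datetime, timedelta


def passes_filters(record, now, min_unique_submitters=50, min_submissions=100,
                   max_age=timedelta(days=365), blocked_keywords=frozenset()):
    """Decide whether a logged query may be shown as a suggestion.

    record: dict with 'query', 'submissions', 'unique_submitters', 'last_submitted'.
    All thresholds are hypothetical defaults.
    """
    # Privacy filter: too few distinct people have submitted this query.
    if record["unique_submitters"] < min_unique_submitters:
        return False
    # Infrequently submitted query filter.
    if record["submissions"] < min_submissions:
        return False
    # Appropriateness filter (keyword check only in this sketch).
    if any(word in blocked_keywords for word in record["query"].lower().split()):
        return False
    # Recency filter: the query hasn't been seen since before the cutoff.
    if now - record["last_submitted"] > max_age:
        return False
    return True


def filter_suggestions(records, now, **thresholds):
    """Return the queries that survive every filter, in their original order."""
    return [r["query"] for r in records if passes_filters(r, now, **thresholds)]
```

A query that only a couple of people have ever typed, such as one containing a person's home address, would fail the first check no matter how many times those few people submitted it.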
The patent application from Google provides more details and examples on how it might come up with different query suggestions for different searchers. What I thought was important was knowing that the predictive query suggestions that I see when I search might be different than the ones that you see.