Search Suggestions from Previously Submitted Searcher Queries

I came across an interesting Search Engine Land post last week. It inspired me to search and see if I could find a related patent from Google:

Google is suggesting searches based on users’ recent activity

I tried reproducing search suggestions shown to the author of the Search Engine Land article, but Google would not return those. Google may be experimenting with a limited number of searchers instead of showing those results to all searchers. I did find a patent about similar search suggestions.

When Google shows search suggestions on something you may have looked for in the past, that predicted query suggestion is likely related to a patent I’ve written about before, Autocompletion using previously submitted query data.

I wrote about an update in a continuation patent, but did not provide many details about how it works: How Google Predicts Autocomplete Query Suggestions is Updated.

Some interesting parts on identifying search suggestions and ranking them inspired me to write this post.

Search Suggestions Based on Previously Submitted Query Data

This patent is about: “using previously submitted query data to anticipate a user’s search request.”

Google has a long memory, and it remembers a lot about what someone might search for.

The description includes many assumptions that search engineers make about searchers (often an interesting reason to read through patents). Here are some from this patent that are worth thinking about:

Internet search engines aim to identify documents or other items that are relevant to a user’s needs and to present the documents or items in a manner that is most useful to the user. Such activity often involves a fair amount of mind-reading–inferring from various clues what the user wants. Certain clues may be user-specific. For example, the knowledge that a user is requesting a mobile device, and knowledge of the location of the device, can result in much better search results for such a user.

Clues about a user’s needs may also be more general. For example, search results can have elevated importance, or inferred relevance, if several other search results link to them. If the linking results are themselves highly relevant, then the linked-to results may have particularly high relevance. Such an approach to determining relevance may be premised on the assumption that, if authors of web pages felt that another web site was relevant enough to be linked to, then web searchers would also find the site to be particularly relevant. In short, the web authors “vote up” the relevance of the sites.

Other various inputs may be used instead of, or in addition to, such techniques for determining and ranking search results. For example, user reactions to particular search results or search result lists may be gauged, so that results on which users often click will receive a higher ranking. The general assumption under such an approach is that searching users are often the best judges of relevance, so that if they select a particular search result, it is likely to be relevant, or at least more relevant than the presented alternatives.

A Summary of the Search Suggestions Process Based on Previously Submitted Queries

The Description for this patent begins with a summary of the process in the patent. The “Detailed Description” that follows is about how search at Google works, and what powers this search suggestion process.

Search suggestions may be based on user queries searched for before.

In the summary section of the patent, we are told about how the patent may address some assumptions:

Anticipating a user’s search request involves a method for processing query information that includes:

  • Receiving query information at a server system, with a part of a query from a searcher
  • Obtaining a set of predicted queries relevant to the part of the searcher’s query, based on that part of the query and on data indicative of searcher behavior relative to previously submitted queries
  • Providing the set of predicted queries to the searcher
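
As a rough illustration of those three steps, a minimal Python sketch might look like the following. The function name `predict_queries`, the in-memory list of past queries, the simple prefix match, and the frequency ordering are all assumptions made for illustration; the patent does not spell out an implementation.

```python
from collections import Counter


def predict_queries(partial_query, previous_queries, limit=5):
    """Return previously submitted queries that start with the partial query.

    Hypothetical helper: matching by prefix and ordering by how often a
    query was submitted are illustrative assumptions, not the patent's method.
    """
    prefix = partial_query.strip().lower()
    if not prefix:
        return []
    counts = Counter(q.strip().lower() for q in previous_queries)
    matches = [q for q in counts if q.startswith(prefix)]
    # Most frequently submitted first; ties are broken alphabetically.
    matches.sort(key=lambda q: (-counts[q], q))
    return matches[:limit]


# Step 1: the server receives part of a query ("seo").
# Step 2: it obtains predictions from previously submitted queries.
# Step 3: the ordered predictions are what would be sent back to the searcher.
history = ["seo patents", "seo tools", "seo patents", "search suggestions"]
suggestions = predict_queries("seo", history)
```

A real system would draw on far richer data than a list of strings, but the shape of the exchange (partial query in, ordered predictions out) is the same.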

The patent also points out more features of the process, such as ordering the set of predicted queries based upon ranking criteria.

Those ranking criteria may be based upon data indicative of searcher behavior relative to previously submitted queries.

Data about the searcher’s behavior relative to those previously submitted queries may include:

  • Click data
  • Location-specific data
  • Language-specific data
  • Other similar types of data

The patent points out the following as advantages of following the process described in the patent:

A search assistant receives query information from a search requestor before the searcher has finished entering the query.

Information associated with previous user (or users) searches (such as click data associated with search results) is collected. From the query information and the previous search information, a set of predicted queries is produced and provided to the search requestor for presentation.

The patent can be found at:

Autocompletion using previously submitted query data
Inventors: Michael Herscovici, Dan Guez, and Hyung-Jin Kim
Assignee: Google Inc.
US Patent: 9,740,780
Granted: August 22, 2017
Filed: December 1, 2014


A computer-implemented method for processing query information includes receiving query information at a server system. The query information includes a portion of a query from a search requestor. The method also includes obtaining a set of predicted queries relevant to the portion of the search requestor query based upon the portion of the query from the search requestor and data indicative of search requestor behavior relative to previously submitted queries. The method also includes providing the set of predicted queries to the search requestor.

Analysis of Ranking and Selection of Search Suggestions Based Upon Previous Query Data

The “Detailed Description” section of this search suggestions patent provides some insightful analysis about search at Google.

Relevance and Backlinks and a Rank Modifying Engine Lead to Ranking For Many Results at Google

This patent points out some of how search works at Google. It tells us that:

  1. The purpose of the patent is to “improve the relevance of results obtained from submitting search queries.”
  2. It describes ranking documents for a query as something that can be “performed using traditional techniques for determining an information retrieval (IR) score for indexed documents because of a given query.” And the relevance of a particular document to a query term may be inferred by looking at the general level of back-links to a document containing matches for a search term. As the patent tells us:

    In particular, if a document is linked to (e.g., is the target of a hyperlink) by many other relevant documents (e.g., documents that also contain matches for the search terms), it can be inferred that the target document is particularly relevant. This inference can be made because the authors of the pointing documents presumably point, for the most part, to other documents that are relevant to their audience.

  3. We are given more details about some results being even more relevant than ones with backlinks. We are told that:

    If the pointing documents are in turn the targets of links from other relevant documents, they can be considered more relevant, and the first document can be considered particularly relevant because it is the target of relevant (or even highly relevant) documents. Such a technique may be the determinant of a document’s relevance or one of multiple determinants. The technique is exemplified in some systems that treat a link from one web page to another as an indication of quality for the latter page so that the page with the most such quality indicators is rated higher than others. Appropriate techniques can also be used to identify and eliminate attempts to cast false votes to artificially drive up the relevance of a page.

  4. There is another step that could potentially make some results even more relevant, which involves what is referred to as a rank modifier engine:

    To further improve such traditional document ranking techniques, the ranking engine can receive an additional signal from a rank modifier engine to assist in determining an appropriate ranking for the documents. The rank modifier engine provides one or more prior models, or one or more measures of relevance for the documents based on one or more prior models, which can be used by the ranking engine to improve the search results’ ranking provided to the user. In general, a prior model represents a background probability of document result selection given the values of multiple selected features, as described further below. The rank modifier engine can perform one or more of the operations described below to generate the one or more prior models, or the one or more measures of relevance based on one or more prior models.

  5. This is a more detailed description of ranking than we normally see from Google. The section above references a Rank Modifier Engine, which is described in more detail further down this post.

    Indexing, Scoring, Ranking, and Rank Modifier Engine

    The information retrieval system from this patent includes many different components:

    • Indexing engine
    • Scoring engine
    • Ranking engine
    • Rank modifier engine

    Scoring Engine

    A scoring engine may provide scores for document results based on many different features including:

    • Content-based features that link a query to document results
    • Query-independent features that generally state the quality of document results

    Content-based features include aspects of document format, such as query matches to a title or anchor text in an HTML (HyperText Markup Language) page.

    The query-independent features can include aspects of document cross-referencing, such as a rank of the document or the domain.

    Moreover, the particular functions used by the scoring engine can be tuned, to adjust the various feature contributions to the final IR score, using automatic or semi-automatic processes.
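
    One simple way to picture tunable feature contributions is a weighted sum. The sketch below is only that: the weighted-sum form, the function name `ir_score`, and every feature name and weight are assumptions for illustration, since the patent says only that the contributions can be tuned.

```python
def ir_score(content_features, query_independent_features, weights):
    """Combine feature values into one score as a weighted sum.

    The weighted sum and every feature name used here are assumptions
    for illustration; they are not taken from the patent.
    """
    features = {**content_features, **query_independent_features}
    # Features without a configured weight contribute nothing to the score.
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


# Hypothetical weights; tuning would mean adjusting these numbers.
weights = {"title_match": 2.0, "anchor_text_match": 1.5, "document_rank": 1.0}
score = ir_score(
    {"title_match": 1.0, "anchor_text_match": 0.0},  # content-based features
    {"document_rank": 0.5},                          # query-independent feature
    weights,
)
```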

    Ranking Engine

    A ranking engine can produce a ranking of document search results for display to a searcher based on IR scores received from the scoring engine and possibly one or more signals from the rank modifier engine.

    A tracking component may record information about individual searcher selections of the search results presented in the ranking. The patent describes how selections may be tracked using JavaScript, a proxy system, or a toolbar plug-in:

    For example, the tracking component can be embedded JavaScript code included in a web page ranking that identifies user selections (clicks) of individual document results and also identifies when the user returns to the results page, thus indicating the amount of time the user spent viewing the selected document result. In other implementations, the tracking component can be a proxy system through which user selections of the document results are routed, or the tracking component can include pre-installed software at the client (e.g., a toolbar plug-in to the client’s operating system). Other implementations are also possible, such as by using a feature of a web browser that allows a tag/directive to be included in a page, which requests the browser to connect back to the server with a message(s) regarding link(s) clicked by the user.

    Logged selection information could capture for each selection:

    • Query (Q)
    • Document (D)
    • Time (T) on the document
    • Language (L) employed by the user
    • Country (C) where the user is likely located (e.g., based on the server used to access the IR system).
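
    Those five values could be held in a small record type. This is a sketch only; the class name `SelectionRecord` and its field names are hypothetical, chosen to mirror the Q, D, T, L, and C labels above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectionRecord:
    """One logged selection; class and field names are hypothetical."""

    query: str               # Q: the query the user entered
    document: str            # D: the document result that was clicked
    time_on_document: float  # T: seconds spent on the document
    language: str            # L: language employed by the user
    country: str             # C: country where the user is likely located


record = SelectionRecord(
    query="seo patents",
    document="https://example.com/patents",
    time_on_document=95.0,
    language="en",
    country="US",
)
```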

    Other information about a searcher’s interactions with presented rankings may also be recorded:

    • Negative information, such as presented document results that were not clicked on
    • Position(s) of click(s) in the user interface
    • IR scores of clicked results
    • IR scores of all results shown before the clicked result
    • Titles and snippets shown to the user before the clicked result
    • The user’s cookie
    • Cookie age
    • IP (Internet Protocol) address
    • User agent of the browser
    • Etc

    More information may also be recorded for use in building a prior model, as described later in this post.

    Rank Modifier Engine

    Similar information (e.g., IR scores, position, etc.) may be recorded for an entire session, or many sessions, including every click that occurs both before and after a current click.

    The information stored in the result selection logs is used by the rank modifier engine to generate one or more signals for the ranking engine.

    The stored information in the search results selection logs along with the information collected by the tracking component may also be accessible by a search assistant, which is also a component of the information retrieval system.

    Along with receiving information from these components, the search assistant could also monitor a user’s entry of a search query.

    On receiving a partial search query, the query along with the information (e.g., click data) from the tracking component and the results selection log(s) may be used to predict a searcher’s contemplated complete query.

    Based on this information, predictions may be ordered according to one or more ranking criteria before being presented to assist the user in completing the query.

    Presentation of a Search Suggestion

    As a searcher enters a search query, the searcher’s input is monitored.

    Before a searcher signals they have completed entering the search query, a part of the query goes to the search engine.

    Also, data such as click data (or other types of previously collected information) may be sent with the query portion.

    The part of the query sent may be:

    • A few characters
    • A search term
    • More than one search term
    • Any other combination of characters and terms

    The search engine receives the partial query and the data (e.g., click data) for processing and makes predictions about the searcher’s contemplated complete query.

    Relevant information may be retrieved for processing with the received partial query to produce search suggestion predictions.

    Predictions may be ordered according to one or more ranking criteria.

    So, queries that have been submitted at a higher frequency may be ordered before queries submitted at lower frequencies.

    The search engine may also use various types of information for ranking and ordering predicted queries as search suggestions.

    Information about previously entered search queries may be used to make ordered predictions.

    Previous queries may include search queries associated with the same user, another user, or from a community of users.

    If one of the predicted queries is what the searcher intended as the desired query, the searcher may select that predicted query and proceed without having to finish entering the desired query.

    Or, if the predicted queries do not reflect what the searcher had in mind, then the searcher can continue entering the desired search query, which could trigger one or more other sets of search suggestions.

    Ranking User Submitted Previous Queries as Search Suggestions

    A few different processes may rank and order predicted search queries:

    • Ordering predicted search queries by frequency of submission by a community of users
    • Using time constraints, with search queries ordered by the last time/date value of the query
    • Using personalization information or community information about subjects, concepts, or categories of information of interest to the searcher (from prior search or browsing information)
    • Using personalization information from a group the searcher is associated with or belongs to (for example, as a member or an employee)
    • Ordering according to first ranking criteria, such as predefined popularity criteria, and then possibly reordering if any of the predicted search queries match the user’s personalization information, to place the matching predicted search queries at or closer to the top of the ordered set
    • Using information provided by the tracking component and the result selection log(s), such as click data, language-specific data, and country-specific data
    • Using processed click data (e.g., aggregated click data for a given query): for each query, a score may be calculated by summing click data (e.g., weighted clicks) on documents associated with the query, and predicted queries may be ordered based upon the score (e.g., with higher values representing better)
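
    Two of those approaches, scoring by summed weighted clicks and then moving personalized matches toward the top, can be sketched together. The function name `rank_predicted_queries`, the click weights, and the substring test used for matching interests are all assumptions for illustration.

```python
def rank_predicted_queries(candidates, click_log, click_weights, interests=()):
    """Order candidate queries by summed weighted clicks, then personalize.

    click_log maps a query to the click types ("long", "medium", "short")
    recorded on documents associated with that query. The weights and the
    substring-based interest match are illustrative assumptions.
    """

    def score(query):
        return sum(click_weights.get(kind, 0.0) for kind in click_log.get(query, []))

    # First ranking criteria: higher summed click score comes first.
    ordered = sorted(candidates, key=score, reverse=True)
    # Reordering: queries matching the searcher's interests move to the top
    # while keeping their relative score order.
    matching = [q for q in ordered if any(topic in q for topic in interests)]
    others = [q for q in ordered if q not in matching]
    return matching + others


candidates = ["jaguar car", "jaguar animal", "jaguar logo"]
click_log = {
    "jaguar car": ["long", "long", "short"],
    "jaguar animal": ["long", "medium"],
    "jaguar logo": ["short"],
}
click_weights = {"long": 1.0, "medium": 0.5, "short": 0.1}

by_clicks = rank_predicted_queries(candidates, click_log, click_weights)
personalized = rank_predicted_queries(
    candidates, click_log, click_weights, interests=("animal",)
)
```

    Without interests, the order follows the click scores alone; with an interest in "animal", the matching query moves ahead of one that scored higher.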

    An Information Model Based On Earlier Submitted Query Data to Obtain Search Suggestions Predictions

    This model can predict query data that may satisfy a searcher the most by looking at long click information. A timer can track how long a user views or “dwells” on a document.

    That amount of time is “click data”.

    More time dwelling on a document is a “long click”, indicating a user found the document to be relevant for their query.

    A brief period viewing a document is a “short click”, interpreted as a lack of document relevance.

    Click data is a count of each click type (e.g., long, medium, short) for a particular query and document combination.

    This click data from model queries for a given document can create a quality of result statistic for that document to enhance a ranking of a document.

    Quality of result statistic can be a weighted average of the count of long clicks for a given document and query.
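
    Here is one possible reading of that statistic as a sketch. The thresholds separating long, medium, and short clicks, the per-type weights, and the function names are assumptions; the passages described here do not give specific numbers.

```python
from collections import Counter

# Hypothetical thresholds in seconds; not taken from the patent.
SHORT_MAX = 10.0
LONG_MIN = 60.0


def classify_click(dwell_seconds):
    """Bucket a dwell time into a click type."""
    if dwell_seconds >= LONG_MIN:
        return "long"
    if dwell_seconds <= SHORT_MAX:
        return "short"
    return "medium"


def quality_of_result(dwell_times, weights=None):
    """Weighted average over click types for one query/document pair."""
    weights = weights or {"long": 1.0, "medium": 0.5, "short": 0.0}
    counts = Counter(classify_click(t) for t in dwell_times)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(weights[kind] * n for kind, n in counts.items()) / total


# Two long clicks, one medium click, and one short click.
stat = quality_of_result([120.0, 75.0, 30.0, 4.0])
```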

    This description from the patent tells us about how click data might be stored in tuples:

    A search engine (e.g., the search engine) or other processes may create a record in the model for documents that are selected by users in response to a query or a partial query. Each record within the model (herein referred to as a tuple) is at least a combination of a query submitted by users, a document reference selected by users in response to that query, and aggregation of click data for all users that select the document reference in response to the query. The aggregate click data can be viewed as an indication of document relevance. In various implementations, model data can be location-specific (e.g. country, state, etc) or language-specific. For example, a country-specific tuple would include the country from where the user query originated, whereas a language-specific tuple would include the language of the user query. Other extensions of model data are possible.
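
    The aggregation described in that passage could be sketched as a dictionary keyed by query and document, with the country optionally added to the key. The function name `build_model`, the shape of the input, and the use of click-type counts as the aggregated data are assumptions for illustration.

```python
from collections import defaultdict


def build_model(selections, by_country=False):
    """Aggregate click counts per (query, document) key.

    selections is an iterable of (query, document, click_type, country).
    With by_country=True the country is added to the key, giving
    country-specific records. Structure and names are assumptions.
    """
    model = defaultdict(lambda: defaultdict(int))
    for query, document, click_type, country in selections:
        key = (query, document, country) if by_country else (query, document)
        model[key][click_type] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(counts) for key, counts in model.items()}


selections = [
    ("seo patents", "doc-a", "long", "US"),
    ("seo patents", "doc-a", "short", "GB"),
    ("seo patents", "doc-a", "long", "US"),
]
model = build_model(selections)
country_model = build_model(selections, by_country=True)
```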

    The model may also include post-click behavior tracked by the tracking component.

    This patent includes information about how Google may use click tracking data when ranking search suggestion predictions. It tells us about the data collected about clicks:

    The information gathered for each click can include:

    (1) the query (Q) the user entered,
    (2) the document result (D) the user clicked on,
    (3) the time (T) on the document,
    (4) the interface language (L) (which can be given by the user),
    (5) the country (C) of the user (identified by the host that they use, such as www-store-co-uk to show the United Kingdom), and
    (6) more aspects of the user and session.

    Time (T) can be measured as the time between the initial click through to the document result until the time the user comes back to the main page and clicks on another document result.
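
    That measurement can be sketched as the gap between consecutive clicks from the same results page. The function name `dwell_times` and the input format are hypothetical, and leaving the last click without a time is a choice made here, not something stated in the patent.

```python
def dwell_times(click_events):
    """Estimate time (T) on each clicked result from one results page.

    click_events is a list of (timestamp_seconds, document) tuples in click
    order. T for a click is the gap until the next click. The last click has
    no following click, so its time is left as None. Names are hypothetical.
    """
    times = []
    for (start, document), (next_start, _) in zip(click_events, click_events[1:]):
        times.append((document, next_start - start))
    if click_events:
        times.append((click_events[-1][1], None))
    return times


events = [(0.0, "doc-a"), (12.0, "doc-b"), (140.0, "doc-c")]
measured = dwell_times(events)
```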

    An assessment can be made about the time (T) and whether it indicates a longer or a shorter view of the document result, since longer views are generally indicative of quality for the clicked-through result. This assessment about the time (T) can further be made in conjunction with various weighting techniques.

    Beyond Long Clicks

    Document views from the selections can be weighted based on viewing length information to produce weighted views of the document result.

    So, rather than distinguishing long clicks from short clicks, a wider range of click through viewing times can be included in the assessment of result quality, where longer viewing times in the range are given more weight than shorter viewing times.
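
    A continuous weighting of that kind might look like the sketch below. The linear ramp, the 120-second cap, and the function names are assumptions chosen only to show longer views counting for more than shorter ones.

```python
def view_weight(dwell_seconds, cap_seconds=120.0):
    """Map viewing time to a weight between 0 and 1.

    A linear ramp with a cap is an illustrative assumption; any function
    that grows with viewing time would fit the description.
    """
    if dwell_seconds <= 0:
        return 0.0
    return min(dwell_seconds, cap_seconds) / cap_seconds


def weighted_views(dwell_times_seconds):
    """Sum the weights of all views of one document result."""
    return sum(view_weight(t) for t in dwell_times_seconds)


# Views of 30, 60, and 240 seconds get weights of 0.25, 0.5, and 1.0.
total = weighted_views([30.0, 60.0, 240.0])
```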

    Predicted Search Suggestions

    Google will sometimes display search suggestions using autocomplete that are based upon user data from a searcher’s previous search history, or from the history of someone the searcher may be associated with, such as a fellow member of an organization or a co-worker.

    While results related to those previous queries can be ranked based upon relevance and backlinks, the search suggestions may favor queries whose results earned long clicks, meaning that searchers spent a long time viewing them.

    So under this patent, predictions about search suggestions chosen using autocomplete may best meet a searcher’s informational needs by being searches that include results remembered as resulting in long clicks and long viewing times.

20 thoughts on “Search Suggestions from Previously Submitted Searcher Queries”

  1. An interesting finding, I guess the click data and dwell time is only been used on the browser suggested search query, in order to provide a personalized experience based on their search behaviour.

  2. Hi Amine,

    Yes, the click data and dwell time data isn’t being used to rank live search results, but rather to identify pages that people have seemed to like using those measures as a sign of search success.

  3. The most interesting part was for me that different Information Retrieval Model. The relationship of Indexing, Scoring, Rank Modifier Engine, and Ranking Engine is also interesting. First time I see these interesting terms “Rank Modifier and Ranking Engine.” I wonder that do they have a relation with RankBrain and Neutral Matching.

    Also, thank you for this awesome article. Seobythesea is an SEO Treasure.

  4. Hi Paw,

    There is a link to the patent in the post, and you are welcome to read through that, and see what it says about how they may be using information that shows long clicks and how much time people spend on results for queries that they may select as predictive suggested searches. They are not using dwell time as a ranking signal, but they are using it as a signal of searcher satisfaction associated with results that are shown for specific queries. Read through the patent. It is possible to look at other patents from the same inventors, and you may see them refer to dwell time in those as well.

  5. Hi Koray.

    Thanks. I have seen some of those terms before in patents from Hyung-Jin Kim. Look through patents he has worked on. You will see a number of those here:

    I am seeing a lot of references to a Rank Modifier Engine in:

    Modifying search result ranking based on implicit user feedback (US Patent 10,229,166)

    I don’t believe that there is a connection to RankBrain and Neural Matching. I do believe that they do have something to do with user data, which we can’t be sure that RankBrain or Neural Matching are really looking at. It is worth looking into how Google is potentially using such data.

  6. Hi, This is a very informative and well detailed explained blog .it is very interesting .it is very helpful for me and I hope you will post this type of blog further.

  7. Long Clicks seems to be quite new for me showing the new data to the searchers. But yeah it is true Google is day by day using our own behavior to show us the search data.

  8. Hi Gautam,

    I first came across the idea of long clicks as something that people at Google look for to see if people are appreciating the search results they see from a book about Google by Steven Levy called “In the Plex.” I recommend reading it if you can. It is filled with helpful information about how Google works, and humanizes what they do at the search engine.

    The patent was interesting because it described tracking approaches that can measure long clicks, which I hadn’t seen anywhere before.

    Yes, like any webmasters, Google looks at their analytics to see how good of a job they might be doing for their customers (searchers everywhere.)

  9. Hi Priyank,

    Thank you. I’ve been writing about patents I have been finding from Google since 2005. I have around 20 to write about in a queue right now, and Google doesn’t seem to be slowing down in being granted new ones. 🙂

  10. Thanks Bill,
    This is a great resource for those looking more detailed information – hard to find online, so as someone famous once said – I’ll be back!
    Its amazing how data from our own usage is being monitored and re-used especially within the advertising sector also.

  11. Thanks Bill,
    This is a great resource for those looking more detailed information – hard to find online… As an SEO Agency in Liverpool and Belfast we can see the benefits.
    Its amazing how data from our usage being analysed and re-used especially within the marketing and advertising sector also.

  12. thank you for giving such a great information . I think I have learned something new today . Thanks a lot for this information

  13. Hello Bill, What techniques are you talking about when you say? > Appropriate techniques can also be used to identify and eliminate false vote attempts to artificially increase the relevance of a page ? Thank you for your reply

  14. Hi Eric,

    That line about false vote attempts is part of the patent that I quoted, which tells us that sometimes people might point links at a page to make it seem more important than it really is, by doing things such as creating thin content pages that might link to those pages or stuffing links to a page on some other pages that didn’t originally link to those pages. The patent is referring to such links as “false votes.”

  15. Hey, Well Information about very best blog about ranking engine and rank modifier engine like all types of query. So thank you for sharing about best important information.
    Nice Blog.
