I came across an interesting Search Engine Land post last week. It inspired me to search and see if I could find a related patent from Google.
I tried reproducing search suggestions shown to the author of the Search Engine Land article, but Google would not return those. Google may be experimenting with a limited number of searchers instead of showing those results to all searchers. I did find a patent about similar search suggestions.
When Google shows search suggestions on something you may have looked for in the past, that predicted query suggestion is likely related to a patent I’ve written about before, Autocompletion using previously submitted query data.
I wrote about an update in a continuation patent, but did not provide many details about how it works: How Google Predicts Autocomplete Query Suggestions is Updated.
Some interesting parts on identifying search suggestions and ranking them inspired me to write this post.
Search Suggestions Based on Previously Submitted Query Data
This patent is about: “using previously submitted query data to anticipate a user’s search request.”
Google has a long memory, and it remembers a lot about what someone might search for.
The description includes many assumptions that search engineers make about searchers (often an interesting reason to read through patents). Here are some from this patent that are worth thinking about:
Internet search engines aim to identify documents or other items that are relevant to a user’s needs and to present the documents or items in a manner that is most useful to the user. Such activity often involves a fair amount of mind-reading–inferring from various clues what the user wants. Certain clues may be user-specific. For example, the knowledge that a user is requesting a mobile device, and knowledge of the location of the device, can result in much better search results for such a user.
Clues about a user’s needs may also be more general. For example, search results can have elevated importance, or inferred relevance, if several other search results link to them. If the linking results are themselves highly relevant, then the linked-to results may have particularly high relevance. Such an approach to determining relevance may be premised on the assumption that, if authors of web pages felt that another web site was relevant enough to be linked to, then web searchers would also find the site to be particularly relevant. In short, the web authors “vote up” the relevance of the sites.
Other various inputs may be used instead of, or in addition to, such techniques for determining and ranking search results. For example, user reactions to particular search results or search result lists may be gauged, so that results on which users often click will receive a higher ranking. The general assumption under such an approach is that searching users are often the best judges of relevance, so that if they select a particular search result, it is likely to be relevant, or at least more relevant than the presented alternatives.
A Summary of the Search Suggestions Process Based on Previously Submitted Queries
The Description for this patent begins with a summary of the process in the patent. A “Detailed Description” is about how search at Google works, and what powers this search suggestion process.
Search suggestions may be based on user queries searched for before.
In the summary section of the patent, we are told about how the patent may address some assumptions:
Anticipating user search requests involves certain methods for processing query information. Those include:
- Receiving query information at a server system, including a part of a query from a searcher
- Obtaining a set of predicted queries relevant to the part of the searcher’s query, based on that partial query and on data indicative of the searcher’s behavior relative to previously submitted queries
- Providing the set of predicted queries to the searcher
The patent also points out more features of the process, such as ordering the set of predicted queries based upon ranking criteria when obtaining them.
Those ranking criteria may be based upon data indicative of the searcher’s behavior relative to previously submitted queries.
Data about the searcher’s behavior relative to those previously submitted queries may include:
- Click data
- Location-specific data
- Language-specific data
- Other similar types of data
The patent points out the following as advantages of following the process described in the patent:
A search assistant receives query information from a search requestor before the searcher has finished entering the query.
Information associated with previous user (or users) searches (such as click data associated with search results) is collected. From the query information and the previous search information, a set of predicted queries is produced and provided to the search requestor for presentation.
The patent can be found at:
Autocompletion using previously submitted query data
Inventors: Michael Herscovici, Dan Guez, and Hyung-Jin Kim
Assignee: Google Inc.
US Patent: 9,740,780
Granted: August 22, 2017
Filed: December 1, 2014
A computer-implemented method for processing query information includes receiving query information at a server system. The query information includes a portion of a query from a search requestor. The method also includes obtaining a set of predicted queries relevant to the portion of the search requestor query based upon the portion of the query from the search requestor and data indicative of search requestor behavior relative to previously submitted queries. The method also includes providing the set of predicted queries to the search requestor.
Analysis of Ranking and Selection of Search Suggestions Based Upon Previous Query Data
The “Detailed Description” section of this search suggestions patent provides some insightful analysis about search at Google.
Relevance and Backlinks and a Rank Modifying Engine Lead to Ranking For Many Results at Google
This patent points out some of how search works at Google. It tells us that:
- The purpose of the patent is to “improve the relevance of results obtained from submitting search queries.”
- It describes ranking documents for a query as something that can be “performed using traditional techniques for determining an information retrieval (IR) score for indexed documents because of a given query.” And the relevance of a particular document for a query term may be inferred by looking at the general level of back-links to a document containing matches for a search term. As the patent tells us:
In particular, if a document is linked to (e.g., is the target of a hyperlink) by many other relevant documents (e.g., documents that also contain matches for the search terms), it can be inferred that the target document is particularly relevant. This inference can be made because the authors of the pointing documents presumably point, for the most part, to other documents that are relevant to their audience.
- We are given more details about some results being even more relevant than ones with backlinks. We are told that:
If the pointing documents are in turn the targets of links from other relevant documents, they can be considered more relevant, and the first document can be considered particularly relevant because it is the target of relevant (or even highly relevant) documents. Such a technique may be the determinant of a document’s relevance or one of multiple determinants. The technique is exemplified in some systems that treat a link from one web page to another as an indication of quality for the latter page so that the page with the most such quality indicators is rated higher than others. Appropriate techniques can also be used to identify and eliminate attempts to cast false votes to artificially drive up the relevance of a page.
- There is another step that could potentially make some results even more relevant, which involves what is referred to as a rank modifier engine:
To further improve such traditional document ranking techniques, the ranking engine can receive an additional signal from a rank modifier engine to assist in determining an appropriate ranking for the documents. The rank modifier engine provides one or more prior models, or one or more measures of relevance for the documents based on one or more prior models, which can be used by the ranking engine to improve the search results’ ranking provided to the user. In general, a prior model represents a background probability of document result selection given the values of multiple selected features, as described further below. The rank modifier engine can perform one or more of the operations described below to generate the one or more prior models, or the one or more measures of relevance based on one or more prior models.
This is a more detailed description of ranking than we normally see from Google. The section above references a Rank Modifier Engine that will be described in more detail further down in this post.
Indexing, Scoring, Ranking, and Rank Modifier Engine
The information retrieval system from this patent includes many different components:
- Indexing engine
- Scoring engine
- Ranking engine
- Rank modifier engine
A scoring engine may provide scores for document results based on many different features including:
- Content-based features that link a query to document results
- Query-independent features that generally state the quality of document results
Content-based features include aspects of document format, such as query matches to a title or anchor text in an HTML (HyperText Markup Language) page.
The query-independent features can include aspects of document cross-referencing, such as a rank of the document or the domain.
Moreover, the particular functions used by the scoring engine can be tuned, using automatic or semi-automatic processes, to adjust the various feature contributions to the final IR score.
A ranking engine can produce a ranking of document search results for display to a searcher based on IR scores received from the scoring engine and possibly one or more signals from the rank modifier engine.
Logged selection information could capture for each selection:
- Query (Q)
- Document (D)
- Time (T) on the document
- Language (L) employed by the user
- Country (C) where the user is likely located (e.g., based on the server used to access the IR system).
Recorded information about a searcher’s interactions with presented rankings:
- Negative information, such as presented document results that were not clicked on
- Position(s) of click(s) in the user interface
- IR scores of clicked results
- IR scores of all results shown before the clicked result
- Titles and snippets shown to the user before the clicked result
- The user’s cookie
- Cookie age
- IP (Internet Protocol) address
- User agent of the browser
More information may also be recorded for use in building a prior model (as described further down in this post).
Rank Modifier Engine
Similar information (e.g., IR scores, position, etc.) may be recorded for an entire session, or many sessions, including every click that occurs both before and after a current click.
The rank modifier engine may use the information stored in the result selection logs to generate one or more signals to the ranking engine.
The stored information in the search results selection logs along with the information collected by the tracking component may also be accessible by a search assistant, which is also a component of the information retrieval system.
Along with receiving information from these components, the search assistant could also monitor a user’s entry of a search query.
On receiving a partial search query, the query along with the information (e.g., click data) from the tracking component and the results selection log(s) may be used to predict a searcher’s contemplated complete query.
Based on this information, predictions may be ordered according to one or more ranking criteria before being presented to assist the user in completing the query.
Presentation of a Search Suggestion
As a searcher enters a search query, the searcher’s input is monitored.
Before a searcher signals they have completed entering the search query, a part of the query goes to the search engine.
Also, data such as click data (or other types of previously collected information) may be sent with the query portion.
The part of the query sent may be:
- A few characters
- A search term
- More than one search term
- Any other combination of characters and terms
The search engine receives the partial query and the data (e.g., click data) for processing and makes predictions about the searcher’s contemplated complete query.
Relevant information may be retrieved for processing with the received partial query to produce search suggestion predictions.
Predictions may be ordered according to one or more ranking criteria.
So, queries that have been submitted at a higher frequency may be ordered before queries submitted at lower frequencies.
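The frequency-based ordering described above can be sketched in a few lines of Python. This is only an illustration, not the patent's implementation: the function name `predict_queries`, the in-memory `query_log` list, and simple prefix matching are all assumptions made for the example.

```python
from collections import Counter


def predict_queries(partial, query_log, limit=5):
    """Return up to `limit` logged queries that start with `partial`,
    ordered from most to least frequently submitted."""
    prefix = partial.strip().lower()
    if not prefix:
        return []
    # Count how often each (normalized) query was submitted before.
    counts = Counter(q.strip().lower() for q in query_log)
    matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
    # Higher frequency first; alphabetical order breaks ties.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in matches[:limit]]


# Hypothetical log of previously submitted queries.
query_log = [
    "patent search", "patent search", "patent search",
    "patent lawyer", "patent lawyer",
    "paternity leave",
    "pasta recipes",
]

print(predict_queries("pat", query_log))
# → ['patent search', 'patent lawyer', 'paternity leave']
```

Each new character typed would call the function again with a longer prefix, which narrows the set of suggestions.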
The search engine may also use various types of information for ranking and ordering predicted queries as search suggestions.
Information about previously entered search queries may be used to make ordered predictions.
Previous queries may include search queries associated with the same user, another user, or from a community of users.
If one of the predicted queries is what the searcher intended as the desired query, the searcher may select that predicted query and proceed without having to finish entering the desired query.
Or, if the predicted queries do not reflect what the searcher had in mind, then the searcher can continue entering the desired search query, which could trigger one or more other sets of search suggestions.
Ranking User Submitted Previous Queries as Search Suggestions
A few different processes may rank and order predicted search queries:
- Ordering predicted search queries by how frequently a community of users has submitted them
- Using time constraints, with search queries ordered by the last time/date value of the query
- Using personalization information or community information about subjects, concepts, or categories of information of interest to the searcher (from prior search or browsing information)
- Using personalization information from a group the searcher is associated with or belongs to (such as a member or an employee)
- Ordering by first ranking criteria, such as predefined popularity criteria, and then possibly reordering if any of the predicted search queries match the searcher’s personalization information, to place the matching predicted search queries at or closer to the top of the ordered set
- Using information provided by the tracking component and the result selection log(s), such as click data, language-specific data, and country-specific data
- Using processed click data (e.g., aggregated click data for a given query). For each query, a score may be calculated by summing click data (e.g., weighted clicks) on documents associated with the query, and predicted queries may be ordered based upon the score (e.g., higher values representing better)
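Two of the approaches in the list above, scoring by summed click data and reordering for personalization, can be combined in a small sketch. Everything named here (`rank_predictions`, the `weighted_clicks` mapping, the `interests` terms, and substring matching as the personalization test) is a hypothetical stand-in; the patent does not specify data structures or matching rules.

```python
def rank_predictions(candidates, weighted_clicks, interests=()):
    """Order candidate queries by summed weighted clicks, then move
    queries that match the searcher's interests to the top."""

    def score(query):
        # Sum the weighted clicks on every document associated with the query.
        return sum(weighted_clicks.get(query, {}).values())

    # First ranking criterion: higher click score first.
    by_score = sorted(candidates, key=lambda q: (-score(q), q))

    # Optional reorder: personalized matches keep their relative order
    # but are placed ahead of non-matching queries.
    terms = [term.lower() for term in interests]
    matches = [q for q in by_score if any(t in q.lower() for t in terms)]
    others = [q for q in by_score if q not in matches]
    return matches + others


# Hypothetical aggregated click data: query -> {document: weighted clicks}.
weighted_clicks = {
    "jaguar car": {"doc1": 4.0, "doc2": 1.5},
    "jaguar animal": {"doc3": 3.0},
    "jaguar os": {"doc4": 0.5},
}
candidates = list(weighted_clicks)

print(rank_predictions(candidates, weighted_clicks))
# → ['jaguar car', 'jaguar animal', 'jaguar os']
print(rank_predictions(candidates, weighted_clicks, interests=["animal"]))
# → ['jaguar animal', 'jaguar car', 'jaguar os']
```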
An Information Model Based On Earlier Submitted Query Data to Obtain Search Suggestions Predictions
This model can predict query data that may satisfy a searcher the most by looking at long click information. A timer can track how long a user views or “dwells” on a document.
That amount of time is “click data”.
More time dwelling on a document is a “long click”, indicating a user found the document to be relevant for their query.
A brief period viewing a document is a “short click”, interpreted as a lack of document relevance.
Click data is a count of each click type (e.g., long, medium, short) for a particular query and document combination.
This click data from model queries for a given document can create a quality of result statistic for that document to enhance a ranking of a document.
The quality of result statistic can be a weighted average of the count of long clicks for a given document and query.
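As a rough illustration of the click types and the quality of result statistic, here is a minimal Python sketch. The dwell-time thresholds and the per-type weights are invented for the example, since the patent does not give specific values, and the weighted average shown is just one plausible reading of the description.

```python
from collections import Counter

# Assumed values for illustration only; the patent gives no numbers.
SHORT_MAX_SECONDS = 10
LONG_MIN_SECONDS = 60
CLICK_WEIGHTS = {"short": 0.0, "medium": 0.5, "long": 1.0}


def click_type(dwell_seconds):
    """Classify a click by how long the user stayed on the document."""
    if dwell_seconds >= LONG_MIN_SECONDS:
        return "long"
    if dwell_seconds <= SHORT_MAX_SECONDS:
        return "short"
    return "medium"


def quality_of_result(dwell_times):
    """Weighted average of click-type counts for one query/document pair."""
    counts = Counter(click_type(t) for t in dwell_times)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    weighted = sum(CLICK_WEIGHTS[kind] * n for kind, n in counts.items())
    return weighted / total


# Four clicks on the same document for the same query: one short,
# one medium, and two long.
print(quality_of_result([5, 30, 90, 120]))
# → 0.625
```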
This description from the patent tells us about how click data might be stored in tuples:
A search engine (e.g., the search engine) or other processes may create a record in the model for documents that are selected by users in response to a query or a partial query. Each record within the model (herein referred to as a tuple: <document, query, data>) is at least a combination of a query submitted by users, a document reference selected by users in response to that query, and aggregation of click data for all users that select the document reference in response to the query. The aggregate click data can be viewed as an indication of document relevance. In various implementations, model data can be location-specific (e.g. country, state, etc) or language-specific. For example, a country-specific tuple would include the country from where the user query originated, whereas a language-specific tuple would include the language of the user query. Other extensions of model data are possible.
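One way to picture such a record is as a small data structure keyed on the document and query, with optional country and language fields for the location-specific and language-specific variants. The class and method names below (`ClickTuple`, `ClickModel`, `record_click`, `lookup`) are hypothetical and only meant to make the quoted description concrete.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ClickTuple:
    """One model record: a document, a query, and aggregated click data."""
    document: str
    query: str
    clicks: Counter = field(default_factory=Counter)  # click type -> count
    country: Optional[str] = None    # set for country-specific tuples
    language: Optional[str] = None   # set for language-specific tuples


class ClickModel:
    """Aggregates click data from all users for each record key."""

    def __init__(self):
        self._records = {}

    def record_click(self, query, document, click_type,
                     country=None, language=None):
        key = (document, query, country, language)
        if key not in self._records:
            self._records[key] = ClickTuple(
                document, query, country=country, language=language)
        self._records[key].clicks[click_type] += 1
        return self._records[key]

    def lookup(self, query, document, country=None, language=None):
        return self._records.get((document, query, country, language))


model = ClickModel()
model.record_click("used cars", "example.com/cars", "long", country="UK")
model.record_click("used cars", "example.com/cars", "short", country="UK")
model.record_click("used cars", "example.com/cars", "long", country="US")

uk_record = model.lookup("used cars", "example.com/cars", country="UK")
print(uk_record.clicks["long"], uk_record.clicks["short"])
# → 1 1
```

Keeping the country and language in the key means clicks from UK searchers and US searchers on the same document and query are aggregated separately.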
The model may also include post-click behavior tracked by the tracking component.
This patent includes information about how Google may use click tracking data when ranking search suggestion predictions. It tells us about collected data about clicks:
The information gathered for each click can include:
(1) the query (Q) the user entered,
(2) the document result (D) the user clicked on,
(3) the time (T) on the document,
(4) the interface language (L) (which can be given by the user),
(5) the country (C) of the user (identified by the host that they use, such as www-store-co-uk to show the United Kingdom), and
(6) more aspects of the user and session.
Time (T) can be measured as the time between the initial click through to the document result until the time the user comes back to the main page and clicks on another document result.
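Measuring time (T) this way can be sketched as the gap between consecutive result clicks in a session. The function name `dwell_times` and the `(timestamp, document)` input format are assumptions for illustration. The last click in a session has no following click, so its time is left unknown here.

```python
def dwell_times(session_clicks):
    """Return (document, seconds) pairs for one session's result clicks.

    `session_clicks` is a list of (timestamp_seconds, document) tuples.
    Time on a document runs from its click until the next result click.
    """
    ordered = sorted(session_clicks)
    times = []
    for index, (clicked_at, document) in enumerate(ordered):
        if index + 1 < len(ordered):
            next_clicked_at = ordered[index + 1][0]
            times.append((document, next_clicked_at - clicked_at))
        else:
            # No later click to measure against.
            times.append((document, None))
    return times


# Hypothetical session: clicks at 0s, 45s, and 50s.
session = [(0, "doc-a"), (45, "doc-b"), (50, "doc-c")]
print(dwell_times(session))
# → [('doc-a', 45), ('doc-b', 5), ('doc-c', None)]
```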
An assessment can be made about the time (T) and whether it indicates a longer or shorter view of the document result, since longer views are generally indicative of quality for the clicked-through result. This assessment about the time (T) can further be made in conjunction with various weighting techniques.
Beyond Long Clicks
Document views from the selections can be weighted based on viewing length information to produce weighted views of the document result.
So, rather than distinguishing long clicks from short clicks, a wider range of click through viewing times can be included in the assessment of result quality, where longer viewing times in the range are given more weight than shorter viewing times.
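A continuous weighting like this could look something like the sketch below. The patent only says longer viewing times get more weight, so the capped linear weight, the 120-second cap, and the function names are all assumptions chosen to keep the example simple.

```python
def view_weight(seconds, cap=120.0):
    """Map a viewing time to a weight between 0 and 1.

    Longer views weigh more, and anything at or beyond `cap` seconds
    counts as a full view. The cap value is an illustrative assumption.
    """
    return min(max(seconds, 0.0), cap) / cap


def weighted_views(viewing_times, cap=120.0):
    """Sum the weights of all views of a document result."""
    return sum(view_weight(t, cap) for t in viewing_times)


# Three views of the same result: 30s, 60s, and 240s.
print(weighted_views([30, 60, 240]))
# → 1.75
```

Three raw views become 1.75 weighted views, because the two shorter visits count for less than a full view.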
Predicted Search Suggestions
Google will sometimes display search suggestions using autocomplete, based upon user data from a searcher’s previous search history, or from the history of someone with whom the searcher may be associated, such as a fellow member of an organization or a co-worker.
While results related to those previous queries can be ranked based upon relevance and backlinks, the search suggestions may include ones whose results earned long clicks, meaning long viewing times.
So under this patent, predictions about search suggestions chosen using autocomplete may best meet a searcher’s informational needs by being searches that include results remembered as resulting in long clicks and long viewing times.