Finding a Search Suggestion Patent about Previously Submitted Searcher Queries
I came across an interesting Search Engine Land post last week. I wanted to see if I could find a related patent from Google:
I tried the searches behind the suggestions shown to the author of the Search Engine Land article, but Google would not return those suggestions for me. Google may be experimenting with them for a limited number of searchers instead of showing them to all searchers. I did, however, find a patent about similar search suggestions.
When Google shows search suggestions for something that you may have looked for in the past, that predicted query suggestion is likely related to a patent I’ve written about before. That patent was Autocompletion using previously submitted query data.
I wrote about an updated continuation patent but did not provide many details about how it works: How Google Predicts Autocomplete Query Suggestions is Updated.
Some interesting passages in it about identifying and ranking search suggestions inspired me to write this post.
Search Suggestions Based on Previously Submitted Query Data
This patent is about: “using previously submitted query data to anticipate a user’s search request.”
Google has a long memory. It remembers a lot about what someone might search for.
The description includes many assumptions that search engineers make about searchers, and those assumptions are often a good reason to read through patents. Here are some from this patent that are worth thinking about:
Internet search engines aim to identify documents or other items relevant to a user’s needs and present the documents or items in a manner that is most useful to the user. Such activity often involves a fair amount of mind-reading: inferring from various clues what the user wants. Certain clues may be user-specific. For example, the knowledge that a user is making a request from a mobile device, and knowledge of the device’s location, can result in much better search results for such a user.
Clues about a user’s search needs may also be more general. For example, search results can have elevated importance or inferred relevance if several other search results link to them. If the linking results are themselves highly relevant, then the linked-to results may have particularly high relevance. Such an approach to determining relevance may use the assumption that if authors of web pages felt that another website was relevant enough to link to, web searchers would also find the site particularly relevant. In short, the web authors “vote up” the relevance of the sites.
Various other inputs may be used instead of, or in addition to, such techniques for determining and ranking search results. For example, user reactions to particular search results or search result lists may be gauged, so that results on which users often click will receive a higher ranking. The general assumption under such an approach is that searching users are often the best judges of relevance. If they select a particular search result, it is likely to be relevant or more relevant than the presented alternatives.
A Summary of the Search Suggestions Process Based on Previously Submitted Queries
The description of this patent begins with a summary of the process it covers, which includes how search at Google works and what powers the search suggestion process.
Search suggestions draw on queries that have been searched for before.
In the summary section of the patent, we learn about how the patent may address some assumptions:
Anticipating and responding to user search requests involves a method for processing query information, which includes:
- Receiving query information at a server system, including a portion of a query from a searcher
- Obtaining a set of predicted queries relevant to that portion of the searcher’s query, based on the partial query and on data indicative of searcher behavior relative to previously submitted queries
- Providing the set of predicted queries to the searcher
The patent also describes further features of the process. For example, obtaining the predicted queries can include ordering the set of predicted queries based on ranking criteria.
Those ranking criteria can be based on the data indicative of searcher behavior relative to previously submitted queries.
Data about searcher behavior relative to those previously submitted queries may include:
- Click data
- Location-specific data
- Language-specific data
- Other similar types of data
Advantages of Following The Patent on Search Suggestions
The patent points out the following as advantages of following the process described in the patent:
A search assistant receives query information from a searcher before the searcher has completely entered the query.
Information associated with previous searches, which could include click data associated with search results, is combined with that query information to provide the searcher with a set of predicted queries.
The patent is at:
Autocompletion using previously submitted query data
Inventors: Michael Herscovici, Dan Guez, and Hyung-Jin Kim
Assignee: Google Inc.
US Patent: 9,740,780
Granted: August 22, 2017
Filed: December 1, 2014
A computer-implemented method for processing query information includes receiving query information at a server system. The query information includes a portion of a query from a search requestor. The method also includes obtaining a set of predicted queries relevant to the portion of the search requestor query based upon the portion of the query from the search requestor and data indicative of search requestor behavior relative to previously submitted queries. The method also includes providing the set of predicted queries to the search requestor.
The “Detailed Description” section of this search suggestions patent provides some insightful analysis about search at Google.
Relevance, Backlinks, and a Rank Modifier Engine Help Rank Many Results at Google
This patent points out some aspects of how search works at Google. It tells us that:
- The purpose of the patent is to “improve the relevance of results obtained from submitting search queries.”
- It describes ranking documents for a query as something that can be “performed using traditional techniques for determining an information retrieval (IR) score for indexed documents in view of a given query.” The relevance of a particular document to a query term may also be inferred from the general level of back-links to a document that contains matches for a search term. As the patent tells us:
In particular, if a document is linked to (e.g., is the target of a hyperlink) by many other relevant documents (e.g., documents that also contain matches for the search terms), it can be inferred that the target document is particularly relevant. This inference is because the authors of the pointing documents presumably point, for the most part, to other documents that are relevant to their audience.
- The patent also explains how some results can be even more relevant than others that simply have backlinks:
If the pointing documents are in turn the targets of links from other relevant documents, they can be more relevant, and the first document can be particularly relevant because it is the target of relevant (or even highly relevant) documents. Such a technique may be the determinant of a document’s relevance or one of multiple determinants. The technique works in some systems that treat a link from one web page to another as an indication of quality for the latter page, so the page with the most such quality indicators is rated higher than others. Appropriate techniques can identify and eliminate attempts to cast false votes to drive up the relevance of a page artificially.
- Another step, involving a rank modifier engine, could make some results even more relevant:
To further improve such traditional document ranking techniques, the ranking engine can receive an additional signal from a rank modifier engine to assist in determining an appropriate ranking for the documents. The rank modifier engine provides one or more prior models, or one or more relevance measures for the documents based on one or more prior models, which the ranking engine can use to improve the ranking of search results provided to the user. In general, a prior model represents a background probability of document result selection given the values of many selected features, as described further below. The rank modifier engine can perform one or more of the operations described below to generate one or more prior models or one or more relevance measures based on one or more prior models.
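The backlink reasoning quoted above (a document is more relevant when relevant documents point to it, and more so when those documents are themselves pointed to by relevant documents) resembles the well-known PageRank idea. As a rough illustration only, here is a simplified PageRank-style power iteration in Python. It is not the method from the patent; the function name `link_scores`, the damping factor of 0.85, and the fixed iteration count are assumptions chosen for the sketch.

```python
from __future__ import annotations


def link_scores(links: dict[str, list[str]], damping: float = 0.85,
                iterations: int = 50) -> dict[str, float]:
    """Hypothetical PageRank-style sketch: a page's score grows with the
    scores of the pages that link to it."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}  # start from a uniform score
    for _ in range(iterations):
        # Every page keeps a small baseline share of the total score.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A link acts like a vote, split evenly across outgoing links.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for other in pages:
                    new_scores[other] += damping * scores[page] / n
        scores = new_scores
    return scores


links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = link_scores(links)
print(max(scores, key=scores.get))  # c
```

Page "a" has only one inbound link, yet it scores well above "b" and "d" because that link comes from "c", the highest-scoring page. That mirrors the "target of relevant (or even highly relevant) documents" idea in the quote.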
The Use of a Rank Modifier Engine
This is a more detailed description of ranking than we normally see from Google. The passage above refers to a rank modifier engine, which is described in more detail further down this post.
Indexing, Scoring, Ranking, and Using a Rank Modifier Engine
The information retrieval system from this patent includes many different components:
- Indexing engine
- Scoring engine
- Ranking engine
- Rank modifier engine
A scoring engine may provide scores for document results based on many different features including:
- Content-based features that link a query to document results
- Query-independent features that generally indicate the quality of document results
Content-based features include aspects of document format, such as query matches to a title or anchor text in an HTML (HyperText Markup Language) page.
The query-independent features can include document cross-referencing, such as a rank of the document or the domain.
Moreover, the particular functions used by the scoring engine can be tuned to adjust the various feature contributions to the final IR score, using automatic or semi-automatic processes.
A ranking engine can produce a ranking of document search results for display to a searcher based on IR scores received from the scoring engine and possibly one or more signals from the rank modifier engine.
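To make the division of labor between the scoring engine and the ranking engine more concrete, here is a minimal Python sketch. The feature names, the weights, and the choice to apply the rank modifier signal as a multiplicative boost are all assumptions for illustration; the patent does not say how the IR score and the modifier signal are combined.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Doc:
    url: str
    title_match: float   # content-based: query match in the title (0..1)
    anchor_match: float  # content-based: query match in anchor text (0..1)
    domain_rank: float   # query-independent: quality of the domain (0..1)


# Hypothetical, hand-picked weights; per the patent, such feature
# contributions could instead be tuned automatically or semi-automatically.
WEIGHTS = {"title_match": 0.5, "anchor_match": 0.3, "domain_rank": 0.2}


def ir_score(doc: Doc) -> float:
    """Scoring engine: a weighted sum of the document's features."""
    return sum(weight * getattr(doc, name) for name, weight in WEIGHTS.items())


def rank(docs: list[Doc], modifier: dict[str, float] | None = None) -> list[str]:
    """Ranking engine: order by IR score, optionally boosted by a
    (hypothetical) per-document signal from a rank modifier engine."""
    modifier = modifier or {}
    ordered = sorted(docs, key=lambda d: ir_score(d) * modifier.get(d.url, 1.0),
                     reverse=True)
    return [d.url for d in ordered]


docs = [Doc("a.example", 1.0, 0.0, 0.2), Doc("b.example", 0.4, 0.6, 0.9)]
print(rank(docs))                      # ['b.example', 'a.example']
print(rank(docs, {"a.example": 1.2}))  # ['a.example', 'b.example']
```

Here the modifier signal is enough to lift a.example above b.example, even though its IR score alone (0.54 versus 0.56) is slightly lower.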
For each selection, logged selection information could capture:
- Query (Q)
- Document (D)
- Time (T) on the document
- Language (L) employed by the user
- Country (C) where the user is likely located (e.g., based on the server used to access the IR system).
Recorded information about a searcher’s interactions with presented rankings could also include:
- Negative information, such as presented document results that were not clicked on
- Position(s) of click(s) in the user interface
- IR scores of clicked results
- IR scores of all results shown before the clicked result
- Titles and snippets shown to the user before the clicked result
- The user’s cookie
- Cookie age
- IP (Internet Protocol) address
- User agent of the browser
More recorded information is used in building a prior model (as described below in this post).
Rank Modifier Engine
Similar recorded information (e.g., IR scores, position, etc.) can be kept for an entire session or for many sessions, including every click that occurs both before and after a current click.
The rank modifier engine uses the information stored in the result selection logs to generate one or more signals for the ranking engine.
The stored information in the search results selection logs and the information collected by the tracking component may also be accessible by a search assistant, which is also a component of the information retrieval system.
Along with receiving information from these components, the search assistant could also monitor a user’s entry of a search query.
On receiving a partial search query, the search assistant may use it, along with information (e.g., click data) from the tracking component and the result selection log(s), to predict the searcher’s contemplated complete query.
These predictions may then be ordered according to one or more ranking criteria before being offered to help the searcher complete the query.
Presentation of a Search Suggestion
As a searcher enters a search query, the searcher’s input is monitored.
Before the searcher signals that they have finished entering the search query, a portion of the query is sent to the search engine.
Data such as click data (or other previously collected information) may also be sent along with the query portion.
The part of the query sent may be:
- A few characters
- A search term
- More than one search term
- Any other combination of characters and terms
The search engine receives the partial query and the data (e.g., click data) for processing and makes predictions about the searcher’s contemplated complete query.
Relevant information may be processed together with the received partial query to produce search suggestion predictions.
Predictions may be ordered according to one or more ranking criteria.
For example, queries submitted more frequently may be ranked higher than queries submitted less frequently.
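Here is a minimal sketch of frequency-based prediction, assuming the simplest possible setup: the partial query is treated as a prefix, the candidate pool is a plain list of previously submitted queries, and ties are broken alphabetically. The function name `predict_queries` and the sample log are hypothetical; a real system would draw on much richer data.

```python
from __future__ import annotations

from collections import Counter


def predict_queries(partial: str, query_log: list[str], k: int = 3) -> list[str]:
    """Return up to k previously submitted queries that start with the partial
    query, ordered from most to least frequently submitted."""
    prefix = partial.lower()
    counts = Counter(q.strip().lower() for q in query_log)
    matches = [(query, n) for query, n in counts.items() if query.startswith(prefix)]
    # Higher submission frequency first; ties broken alphabetically.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [query for query, _ in matches[:k]]


log = ["pizza near me", "pizza dough", "pizza near me", "pizza hut",
       "pizza near me", "pizza dough", "pasta recipes"]
print(predict_queries("piz", log))  # ['pizza near me', 'pizza dough', 'pizza hut']
```

Typing more characters ("pizza d") narrows the prefix and triggers a new, smaller set of predictions, which matches the idea that continued typing can produce other sets of suggestions.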
Ranking and Ordering Predicted Queries as Search Suggestions
The search engine may also use various types of information for ranking and ordering predicted queries as search suggestions.
Information about previously entered search queries may be used to make ordered predictions.
Previous queries may include search queries associated with the same user, another user, or a community of users.
If one of the predicted queries is what the searcher intended as the desired query, the searcher may select that predicted query and proceed without having to finish entering the desired query.
Or, if the predicted queries do not reflect what the searcher had in mind, then the searcher can continue entering the desired search query, which could trigger one or more other sets of search suggestions.
Ranking User Submitted Previous Queries as Search Suggestions
A few different processes may rank and order predicted search queries:
- Ordering predicted search queries by how frequently a community of users has submitted them
- Using time constraints, with search queries ordered by the most recent time/date value of the query
- Using personalization information or community information about subjects, concepts, or categories of information of interest to the searcher (from prior search or browsing information)
- Using personalization information from a group the searcher is associated with or belongs to (e.g., as a member or an employee)
- Ordering according to first ranking criteria, such as predefined popularity criteria, and then possibly reordering if any of the predicted search queries match the user’s personalization information, to place the matching predicted search queries at or closer to the top of the ordered set of predicted search queries
- Using information provided by the tracking component and the result selection log(s) (e.g., click data, language-specific data, and country-specific data) for ranking and ordering the predicted search queries
- Using processed click data (e.g., aggregated click data for a given query) for ranking and ordering predicted search queries: for each query, a score may be computed by summing click data (e.g., weighted clicks) on documents associated with the query, and predicted queries may be ordered by that score (with higher values representing better queries)
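Two of the approaches in the list above can be sketched together in Python: ordering by a popularity criterion and then moving predictions that match the searcher's personalization information toward the top, and ordering by a score summed from (weighted) click data. Matching personalization by whole words, and every name and number below, are assumptions made for illustration.

```python
from __future__ import annotations


def order_by_popularity(submissions: dict[str, int]) -> list[str]:
    """First ranking criterion: most frequently submitted queries first."""
    return sorted(submissions, key=lambda q: (-submissions[q], q))


def personalize(ordered: list[str], interests: set[str]) -> list[str]:
    """Move predictions that mention one of the searcher's interests toward
    the top. sorted() is stable, so popularity order is kept within groups."""
    def matches(query: str) -> bool:
        return any(word in interests for word in query.split())
    return sorted(ordered, key=lambda q: not matches(q))


def click_score(query: str, clicks: dict[tuple[str, str], float]) -> float:
    """Sum the (weighted) clicks on every document associated with a query."""
    return sum(weight for (q, _doc), weight in clicks.items() if q == query)


def order_by_clicks(queries: list[str],
                    clicks: dict[tuple[str, str], float]) -> list[str]:
    """Higher click scores represent better predictions."""
    return sorted(queries, key=lambda q: click_score(q, clicks), reverse=True)


submissions = {"jaguar price": 50, "jaguar animal": 30, "jaguar speed": 20}
print(personalize(order_by_popularity(submissions), {"animal"}))
# ['jaguar animal', 'jaguar price', 'jaguar speed']

clicks = {("jaguar price", "d1"): 2.0, ("jaguar price", "d2"): 1.5,
          ("jaguar animal", "d3"): 4.0}
print(order_by_clicks(["jaguar price", "jaguar animal"], clicks))
# ['jaguar animal', 'jaguar price']
```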
An Information Model Based on Earlier Submitted Query Data to Obtain Search Suggestion Predictions
This model can help predict which queries may best satisfy a searcher by looking at long-click information. For example, a timer can track how long a user views or “dwells” on a document.
That dwell time is recorded as “click data.”
A relatively long dwell time on a document is called a “long click,” and it suggests the searcher found the document relevant to their query.
A brief view of a document is a “short click,” which may be interpreted as a sign that the document lacks relevance.
Click data counts each click type (e.g., long, medium, short) for a particular query and document combination.
Click data from the model for a given query and document can be used to create a quality-of-result statistic for that document, which can then be used to enhance the document’s ranking.
A quality-of-result statistic can be a weighted average of the long clicks for a given document and query.
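Here is a minimal sketch of how dwell times might be bucketed into short, medium, and long clicks and turned into a quality-of-result statistic as a weighted average. The thresholds (30 and 120 seconds) and the per-type weights are hypothetical values; the patent does not give concrete numbers here.

```python
from __future__ import annotations

from collections import Counter

SHORT_MAX_SECONDS = 30.0  # hypothetical: below this, a click is "short"
LONG_MIN_SECONDS = 120.0  # hypothetical: at or above this, a click is "long"

# Hypothetical weights: long clicks count fully, short clicks not at all.
CLICK_WEIGHTS = {"long": 1.0, "medium": 0.5, "short": 0.0}


def click_type(dwell_seconds: float) -> str:
    """Classify one click by how long the searcher dwelled on the document."""
    if dwell_seconds < SHORT_MAX_SECONDS:
        return "short"
    if dwell_seconds >= LONG_MIN_SECONDS:
        return "long"
    return "medium"


def quality_of_result(dwell_times: list[float]) -> float:
    """Weighted average over all clicks for one query/document pair."""
    if not dwell_times:
        return 0.0
    counts = Counter(click_type(t) for t in dwell_times)  # clicks per type
    weighted = sum(CLICK_WEIGHTS[kind] * n for kind, n in counts.items())
    return weighted / len(dwell_times)


# One short, one medium, and two long clicks: (0.0 + 0.5 + 2 * 1.0) / 4
print(quality_of_result([5, 45, 200, 300]))  # 0.625
```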
This description from the patent tells us how click data might be stored in tuples:
A search engine (e.g., the search engine) or other processes may create a record in the model for documents that users select in response to a query or a partial query. Each record within the model (herein referred to as a tuple) is at least a combination of a query submitted by users, a document reference selected by users in response to that query, and an aggregation of click data for all users that select the document reference in response to the query. The aggregate click data can be an indication of document relevance. In various implementations, model data can be location-specific (e.g., country, state, etc.) or language-specific. For example, a country-specific tuple would include the country where the user query originated, whereas a language-specific tuple would include the language of the user query. Other extensions of model data are possible.
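The tuple-based model can be sketched as an aggregation over logged selections, keyed by query and document reference, with optional country-specific or language-specific extensions. The `Selection` record, the 120-second long-click threshold, and the choice to aggregate into click and long-click counts are assumptions for this illustration, not details taken from the patent.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, NamedTuple

LONG_CLICK_SECONDS = 120.0  # hypothetical long-click threshold


class Selection(NamedTuple):
    """One logged result selection (a hypothetical record layout)."""
    query: str
    document: str
    dwell_seconds: float
    language: str
    country: str


def build_model(log: Iterable[Selection],
                extra_keys: tuple[str, ...] = ()) -> dict[tuple, dict[str, int]]:
    """Aggregate click data per (query, document) tuple. Passing
    extra_keys=("country",) or ("language",) makes the tuples
    country-specific or language-specific."""
    model: dict[tuple, dict[str, int]] = defaultdict(
        lambda: {"clicks": 0, "long_clicks": 0})
    for s in log:
        key = (s.query, s.document) + tuple(getattr(s, k) for k in extra_keys)
        model[key]["clicks"] += 1
        if s.dwell_seconds >= LONG_CLICK_SECONDS:
            model[key]["long_clicks"] += 1
    return dict(model)


log = [Selection("jaguar", "d1", 200.0, "en", "uk"),
       Selection("jaguar", "d1", 10.0, "en", "us"),
       Selection("jaguar", "d2", 150.0, "en", "us")]
print(build_model(log)[("jaguar", "d1")])  # {'clicks': 2, 'long_clicks': 1}
print(build_model(log, ("country",))[("jaguar", "d1", "uk")])  # {'clicks': 1, 'long_clicks': 1}
```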
Post-Click Behavior May Also Come From the Tracking Component
The model may also include post-click behavior tracked by the tracking component.
This patent includes information about how Google may use click tracking data when ranking search suggestion predictions. It describes the data collected about clicks:
The information gathered for each click can include:
(1) Query (Q) the user entered,
(2) Document result (D) the user clicked on,
(3) Time (T) on the document,
(4) Interface language (L) (which can be given by the searcher),
(5) Country (C) of the user (identified by the host that they use, such as www-store-co-uk in the United Kingdom), and
(6) More aspects of the user and session.
Time (T) can be measured as the time between the initial click-through to the document result and the searcher returning to the main page and clicking on another document result.
An assessment can be made about the time (T) and whether it indicates a longer or a shorter view of the document result, since longer views are generally indicative of quality for the click-through result. This assessment about the time (T) can further be used in conjunction with various weighting techniques.
Beyond Long Clicks
Document views resulting from the selections can be weighted based on viewing length information to produce weighted views of the document result.
So, rather than only distinguishing long clicks from short clicks, a wider range of click-through viewing times can be used in assessing result quality, with longer viewing times in the range given more weight than shorter ones.
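Here is a minimal sketch of the two ideas above: measuring the time (T) between the click-through and the searcher's next result click, and then weighting views continuously instead of bucketing them into long and short clicks. The linear, capped weighting function and the 300-second cap are assumptions; the patent only says that longer viewing times gain more weight.

```python
from __future__ import annotations


def view_time(click_ts: float, next_click_ts: float) -> float:
    """Time (T) from the click-through on a result until the searcher
    returns and clicks another result (timestamps in seconds)."""
    return max(0.0, next_click_ts - click_ts)


def view_weight(t: float, cap: float = 300.0) -> float:
    """Hypothetical weighting: grows linearly with viewing time, capped at 1.0."""
    return min(t, cap) / cap


def weighted_views(times: list[float]) -> float:
    """Sum of weighted views for one document result."""
    return sum(view_weight(t) for t in times)


times = [view_time(0.0, 30.0), view_time(100.0, 250.0), view_time(400.0, 1000.0)]
print(times)                  # [30.0, 150.0, 600.0]
print(weighted_views(times))  # about 1.6 (0.1 + 0.5 + 1.0)
```

A linear ramp is only one choice; any function that increases with viewing time would fit the description, and the cap simply keeps a single very long view from dominating.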
Predicted Search Suggestions
Google will sometimes display autocomplete search suggestions based on data from queries in a searcher’s previous search history, or from the history of someone the searcher may be associated with, such as a fellow member of an organization or a co-worker.
While results for those previous queries can be ranked based on relevance and backlinks, the search suggestions may also take into account which of those results searchers clicked on and viewed for a long time.
So under this patent, the queries that autocomplete predicts may be the ones most likely to meet a searcher’s informational needs, because their results are remembered as having earned long clicks and long viewing times.