Some patents and white papers focus on search queries: how a search engine might respond to what a searcher is looking for, and how it might help fulfill the informational and situational needs that searcher has.
It is often tempting to think of search queries in terms of the intent behind them, whether they are informational, transactional, or navigational. I recommend that people read the paper A Taxonomy of Web Search by Andrei Broder (who is now at Google) to learn more about the intents behind search queries.
With the Web becoming more Semantic and being more about finding entities, it’s possible that looking at search queries to see if they trigger the appearance of an answer box or a featured snippet will become more common.
In general, the subject matter of this specification relates to identifying or generating augmentation queries, storing the augmentation queries, and identifying stored augmentation queries for use in augmenting user searches. An augmentation query can be a query that performs well in locating desirable documents identified in the search results. The performance of an augmentation query can be determined by user interactions. For example, if many users that enter the same query often select one or more of the search results relevant to the query, that query may be designated an augmentation query.
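To make the "determined by user interactions" idea concrete, here is a minimal sketch of how a query might be designated an augmentation query from click data. The log format, the thresholds, and the function name are my own assumptions for illustration; the patent does not spell out specific values or data structures.

```python
from collections import defaultdict


def designate_augmentation_queries(log, min_impressions=3, min_click_rate=0.5):
    """Return queries whose results were selected often enough.

    `log` is a hypothetical list of (query, clicked) pairs, where `clicked`
    says whether the searcher selected at least one result for that query.
    Both thresholds are illustrative assumptions, not values from the patent.
    """
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for query, clicked in log:
        impressions[query] += 1
        if clicked:
            clicks[query] += 1

    return sorted(
        query
        for query, seen in impressions.items()
        if seen >= min_impressions and clicks[query] / seen >= min_click_rate
    )


# Made-up interaction log for illustration only.
LOG = [
    ("eiffel tower height", True),
    ("eiffel tower height", True),
    ("eiffel tower height", False),
    ("tower", False),
    ("tower", False),
    ("tower", True),
    ("paris", True),
]

designated = designate_augmentation_queries(LOG)
print(designated)
```

A query that is seen too rarely, or whose results are rarely selected, is left out, which is one simple reading of a query that "performs well."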
In addition to actual queries submitted by users, augmentation queries can also include synthetic queries that are machine generated. For example, an augmentation query can be identified by mining a corpus of documents and identifying search terms for which popular documents are relevant. These popular documents can, for example, include documents that are often selected when presented as search results. Yet another way of identifying an augmentation query is mining structured data, e.g., business telephone listings, and identifying queries that include terms of the structured data, e.g., business names.
These augmentation queries can be stored in an augmentation query data store. When a user submits a search query to a search engine, the terms of the submitted query can be evaluated and matched to terms of the stored augmentation queries to select one or more similar augmentation queries. The selected augmentation queries, in turn, can be used by the search engine to augment the search operation, thereby obtaining better search results. For example, search results obtained by a similar augmentation query can be presented to the user along with the search results obtained by the user query.
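The matching step described above might look something like the following sketch. It assumes a simple term-overlap (Jaccard) score between the submitted query and each stored augmentation query; the patent only says that terms are "evaluated and matched," so the scoring method, the threshold, and every name here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AugmentationQuery:
    text: str
    click_rate: float  # hypothetical performance signal, used only to break ties


def tokenize(query):
    """Lowercase the query and split it into a set of terms."""
    return set(query.lower().split())


def jaccard(a, b):
    """Share of terms two queries have in common (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_augmentation_queries(user_query, store, min_overlap=0.3, limit=2):
    """Pick stored augmentation queries whose terms resemble the user's query."""
    user_terms = tokenize(user_query)
    matches = []
    for candidate in store:
        score = jaccard(user_terms, tokenize(candidate.text))
        if score >= min_overlap and candidate.text.lower() != user_query.lower():
            matches.append((score, candidate))
    # Best term overlap first, then the better-performing query.
    matches.sort(key=lambda pair: (pair[0], pair[1].click_rate), reverse=True)
    return [candidate.text for _, candidate in matches[:limit]]


# Made-up augmentation query data store.
STORE = [
    AugmentationQuery("cheap flights to paris", 0.62),
    AugmentationQuery("paris hotel deals", 0.48),
    AugmentationQuery("python list comprehension", 0.71),
]

selected = select_augmentation_queries("flights paris", STORE)
print(selected)
```

The search engine could then run the selected queries alongside the original one and blend the two sets of results, as the passage describes.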
In October of 2015, members of the Google Brain team announced a new algorithm, described in this post from Search Engine Land: Meet RankBrain: The Artificial Intelligence That’s Now Processing Google Search Results. Gregory S. Corrado, one of the Google Brain team members who gave Bloomberg News a long interview on RankBrain, was a co-inventor on a patent that was granted this August, along with other members of the Google Brain team.
In the SEM Post article, RankBrain: Everything We Know About Google’s AI Algorithm, we are told that RankBrain uses concepts from Geoffrey Hinton involving Thought Vectors. The summary in the description from the patent tells us about how a word vector approach might be used in such a system:
Particular embodiments of the subject matter described in this specification can be implemented so as to realize one or more of the following advantages. Unknown words in sequences of words can be effectively predicted if the surrounding words are known. Words surrounding a known word in a sequence of words can be effectively predicted. Numerical representations of words in a vocabulary of words can be easily and effectively generated. The numerical representations can reveal semantic and syntactic similarities and relationships between the words that they represent.
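As a rough illustration of how numerical representations can "reveal semantic and syntactic similarities," the sketch below compares tiny hand-made word vectors with cosine similarity. The three-dimensional vectors are invented for the example; real systems learn vectors with hundreds of dimensions from large text collections, and nothing here is taken from the patent itself.

```python
import math

# Toy vectors made up for illustration; they are not learned from data.
VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "banana": [0.1, 0.0, 0.9],
}


def cosine(a, b):
    """Cosine similarity: close to 1.0 when two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors):
    """Return the other word whose vector is closest to `word`."""
    return max(
        (other for other in vectors if other != word),
        key=lambda other: cosine(vectors[word], vectors[other]),
    )


nearest = most_similar("king", VECTORS)
print(nearest)
```

Words used in similar contexts end up with vectors that point in similar directions, which is what would let a system treat related query terms as close to one another.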
How Google May Use Synonym Substitutions to Rewrite Queries
A couple of months ago, I wrote about a Google patent that involved rewriting queries, titled Investigating Google RankBrain and Query Term Substitutions. There’s likely a lot more to how Google’s RankBrain approach works, but I came across a patent that seems to be related to the patent I wrote about in that post and thought it was worth sharing and starting a discussion about. The patent I wrote about in that post was Using concepts as contexts for query term substitutions. The title for this new patent was very similar to that one (Synonym identification based on categorical contexts), and the more recent patent was granted on December 1st of this year.
Recently I wrote about Google’s Enriched Results Patent, where Google looked at query terms searched for, and for some of them the search engine returned special “enriched” search results that showed off things such as financial information when the query might have been something like a financial stock market term, such as “GooG” for Google.
At Search Engine Land in 2007, I wrote about Google’s OneBox patent. Much like Google looking for query terms that might return an enriched search result, under the OneBox patent Google might decide among a range of seven different types of search results, including things such as news results, images, videos, local results and others.
At Google’s 15th anniversary celebration last summer, shortly after Hummingbird was introduced, Tamar Yehoshua, Google VP of Search, showed us conversational search at Google by first demonstrating a query asking for “pictures of the Eiffel Tower”, and then following up with the query “How tall is it?”
In that second query, Google had to not only remember that the Eiffel Tower was being asked about, but also recognize the Eiffel Tower when it was being referred to as “it.” That is part of the new “conversational search” that Google is now engaging in, using something known by linguists as a “coreference.” I wanted to write about coreferences to clear up confusion that people might have had about them.
I’ve been exploring some of the different search results that we see at Google, including things such as rich snippets and question-answering results, and came across a couple of patent filings from Google that describe something called “Enriched Results.”
You’ve seen enriched results before. As the first of the patent filings tells us, these results tend to be for things such as: