A couple of months ago, I wrote about a Google patent that involved rewriting queries, in a post titled Investigating Google RankBrain and Query Term Substitutions. There’s likely a lot more to how Google’s RankBrain approach works, but I came across a patent that seems to be related to the one I wrote about in that post, and thought it was worth sharing to start a discussion. The patent I wrote about in that post was Using concepts as contexts for query term substitutions. The title of this new patent is very similar (Synonym identification based on categorical contexts), and it was granted on December 1st of this year.
The new patent starts off describing a scenario that is a good example of how it works. The inventors tell us:
Recently I wrote about Google’s Enriched Results Patent, where Google looked at query terms searched for, and for some of them the search engine returned special “enriched” search results that showed off things such as financial information when the query might have been something like a financial stock market term, such as “GooG” for Google.
At Search Engine Land in 2007, I wrote about Google’s OneBox patent. Much like Google looking for query terms that might return an enriched search result, under the OneBox patent Google might decide among a range of seven different types of search results, including things such as news results, images, videos, local results and others.
At Google’s 15th anniversary celebration last summer, shortly after Hummingbird was introduced, Tamar Yehoshua, Google VP of Search, showed us conversational search at Google by first demonstrating a query asking for “pictures of the Eiffel Tower”, and then following up with the query “How tall is It?”
In that second query, Google had to not only remember that the Eiffel Tower was being asked about, but also recognize the Eiffel Tower when it was referred to as “it.” That is part of the new “conversational search” that Google is now engaging in, using something known by linguists as a “coreference.” I wanted to write about coreferences to clear up confusion that people might have had about them.
I’ve been exploring some of the different search results that we see at Google, including things such as rich snippets and question-answering results, and came across a couple of patent filings from Google that describe something called “Enriched Results.”
You’ve seen enriched results before. As the first of the patent filings tells us, these results tend to be for things such as:
When I’m looking for something at a search engine, I will often start out with a particular query and then, depending upon the kinds of results I see, change the query terms I use. It appears that Google has been paying attention to this kind of search behavior. A patent granted to Google earlier this month describes watching the queries performed by a searcher during a search session. It may give more weight to the words and phrases used earlier in the session, and less weight to terms that are added on as the session continues.
This patent seems like part of an evolution of algorithms from Google that has brought us to their Hummingbird update.
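To make the idea of weighting earlier session terms more heavily a little more concrete, here is a minimal sketch in Python. It is not taken from the patent: the function name `session_term_weights`, the `decay` parameter, and the exponential decay itself are all assumptions I made up for illustration. The only thing it borrows from the description above is that a term first seen early in a session gets more weight than one added later.

```python
def session_term_weights(session_queries, decay=0.5):
    """Assign each term a weight based on the query in which it first appears.

    Hypothetical scheme: terms from the first query get weight 1.0, and each
    later query's new terms get a weight multiplied by `decay` once more.
    """
    weights = {}
    for position, query in enumerate(session_queries):
        for term in query.lower().split():
            # Only the first appearance of a term sets its weight.
            if term not in weights:
                weights[term] = decay ** position
    return weights


session = [
    "airedale terrier",
    "airedale terrier puppies",
    "airedale terrier puppies breeders",
]
weights = session_term_weights(session)
```

In this example, "airedale" and "terrier" keep the full weight because they opened the session, while "puppies" and "breeders" get progressively less. A real system would almost certainly use different signals and numbers.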
Added 2013-11-10 – Google was granted a continuation version of this same patent (Search queries improved based on query semantic information) on November 5th, 2013, where the claims section has been completely re-written in some interesting ways. It describes using a substitute term for one of the original terms in the query, and using an inverse document frequency count to see how many times that substitute term appears in the result set for the modified version of the query and for the original version of the query. The timing of this update of the patent is interesting. The link below points to the old version of the patent, so if you want you can compare the claims sections.
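As a rough illustration of the comparison described in those rewritten claims, the sketch below counts how many documents in two result sets contain a substitute term, and turns each count into a smoothed inverse document frequency. The function names, the sample result sets, and the smoothing formula are my own assumptions for the sake of the example, not details from the patent.

```python
import math


def document_frequency(term, results):
    """Count how many result documents contain the term as a whole word."""
    term = term.lower()
    return sum(1 for doc in results if term in doc.lower().split())


def smoothed_idf(term, results):
    """Smoothed IDF over a result set; the +1 smoothing is an assumption."""
    df = document_frequency(term, results)
    return math.log((1 + len(results)) / (1 + df))


# Hypothetical result sets for an original query ("car repair") and a
# modified query that uses the substitute term "auto" ("auto repair").
original_results = [
    "car repair shop near me",
    "auto and car repair tips",
    "car repair manual",
]
modified_results = [
    "auto repair shop",
    "auto body repair",
    "auto repair and car service",
]

df_original = document_frequency("auto", original_results)
df_modified = document_frequency("auto", modified_results)
idf_original = smoothed_idf("auto", original_results)
idf_modified = smoothed_idf("auto", modified_results)
```

Here the substitute term shows up in one of the three original results and in all three of the modified results, so its IDF is lower for the modified result set. How Google might actually use a difference like that is something the claims would need to be read closely for.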
Back in September, Google announced that they had started using an algorithm, code-named “Hummingbird,” that rewrites queries submitted by searchers. At the time, I was writing a blog post about a patent from Google that seemed like it might be closely related to the update, because its focus was upon re-writing long and complex queries while paying more attention to all the words within those queries. I called the post The Google Hummingbird Patent because the patent seemed to be such a good match.
Google introduced a new algorithm by the name of Hummingbird to the world today at the garage where Google started as a business, during a celebration of Google’s 15th Birthday. Google doesn’t appear to have replaced previous signals such as PageRank or many of the other signals that they use to rank pages. The announcement of the new algorithm told us that Google actually started using Hummingbird a number of weeks ago, and that it potentially impacts around 90% of all searches.
It’s being presented as a query expansion or broadening approach that can better understand longer natural language queries, like the ones that people might speak, instead of the shorter keyword matching queries that someone might type into a search box.
When you search, especially for topics that you know little about, chances are that you might not include the most relevant terms in your query, or you might use words that may have ambiguous meanings.
One area where search engines focus a lot of attention is reformulating queries, through query suggestions and query expansion, to help searchers meet their situational and informational needs more quickly.
When you search, you might see a number of query suggestions at the bottom of the results that were first returned, like the ones above on a search for [find airedale terrier puppies]. Or a search engine might include synonyms or substitute queries to expand your original query.
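Here is a minimal sketch of what synonym-based query expansion could look like, using the query from the example above. The `SYNONYMS` table, the function names, and the OR-style output format are hypothetical and only meant to show the general idea. A search engine would draw its substitutes from much richer data, such as query logs.

```python
# Hypothetical synonym table, invented for this example.
SYNONYMS = {
    "find": ["locate"],
    "puppies": ["puppy", "pups"],
}


def expand_query(query, synonyms):
    """Return, for each query term, the term followed by its substitutes."""
    return [[term] + synonyms.get(term, []) for term in query.lower().split()]


def to_boolean_query(expanded):
    """Render the expanded terms as a simple OR-grouped query string."""
    parts = []
    for alternatives in expanded:
        if len(alternatives) == 1:
            parts.append(alternatives[0])
        else:
            parts.append("(" + " OR ".join(alternatives) + ")")
    return " ".join(parts)


expanded_query = to_boolean_query(
    expand_query("find airedale terrier puppies", SYNONYMS)
)
print(expanded_query)
# (find OR locate) airedale terrier (puppies OR puppy OR pups)
```

Terms without an entry in the table, such as "airedale" and "terrier", pass through unchanged, so the expanded query still matches everything the original would have.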