Google introduced a new algorithm by the name of Hummingbird to the world today at the garage where Google started as a business, during a celebration of Google’s 15th Birthday. Google doesn’t appear to have replaced previous signals such as PageRank or many of the other signals that they use to rank pages. The announcement of the new algorithm told us that Google actually started using Hummingbird a number of weeks ago, and that it potentially impacts around 90% of all searches.
It’s being presented as a query expansion or broadening approach that can better understand longer natural language queries, like the ones people might speak, as opposed to the shorter keyword-matching queries someone might type into a search box.
For example, the kind of query where it might work best could be something like [What is the best place to find and eat Chicago deep dish style pizza?]. Here Google might use synonym and substitute query rules, combined with an analysis of the other non-skip words in the query, to understand the context of a query term and a potential replacement for it. It could then reformulate (or replace) the terms being searched upon and provide potentially better results.
Google might look at the query [What is the best place to find and eat Chicago deep dish style pizza?], and understand that a searcher looking for results for that query would likely be more satisfied with the use of “restaurant” instead of “place”.
The use of “restaurant” instead of “place” might be considered as a potential synonym or substitute based upon substitution rules which focus upon co-occurring terms that might show up in search results when those terms are searched upon, or co-occurring terms in query sessions.
Google’s analysis of relationships between search entities, such as the relationships between queries, might in some cases show that a substitution improves searcher satisfaction, based upon things such as how long someone dwells on a page after selecting it from a set of search results.
This week Google published a patent that builds upon the three patents I mention in the recent seobythesea.com posts linked above. It describes a process that seems like a very good match for the Hummingbird algorithm announced today:
Synonym identification based on co-occurring terms
Invented by Abhijit A. Mahabal, Takahiro Nakajima, Zachary A. Garrett, and Kenji Inoue
Assigned to Google
US Patent 8,538,984
Granted September 17, 2013
Filed: April 3, 2012
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for:
- Identifying a particular query term of an original search query,
- Identifying a candidate synonym for the particular query term in context with an other non-adjacent query term of the original search query that is not adjacent to the particular query term in the original search query,
- Accessing stored data that specifies, for a pair of terms that includes the particular query term and the candidate synonym of the particular query term, a respective confidence value for the other non-adjacent query term,
- Determining that, in the stored data, the confidence value for the other non-adjacent query term satisfies a threshold, and
- Determining to revise the original search query to include the candidate synonym of the particular query term, based on determining that the confidence value for the other non-adjacent query term satisfies the threshold.
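The steps in that abstract can be sketched in a few lines of Python. This is only an illustration of the idea, not Google’s implementation: the `CONFIDENCE` table, every value in it, the 0.5 threshold, the choice of "greater than or equal to" as what it means to satisfy the threshold, and the function name `revise_query` are all assumptions made up for this example.

```python
# Hypothetical stored data: (query term, candidate synonym) -> {context term: confidence}.
# Every number here is invented for illustration.
CONFIDENCE = {
    ("place", "restaurant"): {"pizza": 0.9, "eat": 0.8, "park": 0.1},
    ("car", "auto"): {"mechanic": 0.9, "railroad": 0.05},
}

THRESHOLD = 0.5  # assumed cutoff; the patent does not give a value here


def revise_query(terms, threshold=THRESHOLD):
    """Return the query terms, adding a candidate synonym right after a term
    when some non-adjacent term in the same query supports the substitution."""
    revised = []
    for i, term in enumerate(terms):
        revised.append(term)
        for (original, synonym), contexts in CONFIDENCE.items():
            if original != term:
                continue
            # Only consider context terms that are not adjacent to this term.
            non_adjacent = [t for j, t in enumerate(terms) if abs(j - i) > 1]
            if any(contexts.get(t, 0.0) >= threshold for t in non_adjacent):
                revised.append(synonym)
    return revised


print(revise_query(["best", "place", "to", "eat", "pizza"]))
print(revise_query(["railroad", "freight", "car"]))
```

In this sketch, “place” picks up “restaurant” because “eat” and “pizza” appear elsewhere in the query with high confidence values, while “car” is left alone when the only non-adjacent context term is “railroad”.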
The patent tells us that a co-occurrence measure is used to evaluate candidate term/synonym pairs, based upon how frequently those terms (or compound words or phrases) appear together, appear in related user queries (for example, in consecutive queries within a query session), or tend to appear together in related query results.
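To make the query-session part of that idea concrete, here is a small Python sketch that counts how often one term is swapped for another between consecutive queries in a session. It is a simplification under my own assumptions: the function name `session_cooccurrence`, the whitespace tokenization, and the rule that a pair counts only when one term drops out and the other appears are all hypothetical, and the patent describes other co-occurrence signals that this ignores.

```python
from collections import Counter
from itertools import product


def session_cooccurrence(sessions):
    """Count term pairs that show up in consecutive queries of one session.

    A pair (a, b) is counted when `a` occurs only in the earlier query and
    `b` occurs only in the later one, i.e. the searcher swapped a for b.
    """
    counts = Counter()
    for session in sessions:
        # Walk over each query and the one that immediately follows it.
        for earlier, later in zip(session, session[1:]):
            before, after = set(earlier.split()), set(later.split())
            for a, b in product(before - after, after - before):
                counts[(a, b)] += 1
    return counts


# Made-up query sessions, for illustration only.
sessions = [
    ["car mechanic", "auto mechanic"],
    ["car repair", "auto repair"],
    ["railroad car", "railroad museum"],
]
counts = session_cooccurrence(sessions)
print(counts[("car", "auto")], counts[("car", "museum")])
```

Raw counts like these would still need to be normalized and combined with context (the other terms in the query) before they could serve as confidence values.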
Google might consider a number of synonyms from a synonym database, to see how well those fit within the context of the whole query. For example, the terms “car” and “auto” are often considered synonyms, especially when they appear in queries such as [car mechanic] or [auto mechanic], but might not be considered synonyms within the context of a query such as [railroad car] or [railroad auto].
It’s unlikely that someone searching for [railroad car] would want results for [railroad auto] added to those results, or substituted for them. As with the “substitute rules” for queries described in my post linked to above, similar rules can be created for synonyms, and both can be used to build that synonym or substitute database. That database can contain data about the level of confidence that terms might be synonyms or substitutes based upon things like co-occurrence data, and whether or not they might be synonyms or substitutes based upon rules involving other terms that might be within the same query.
A patent filed by Google in 2005 covers a lot of the same ground, and is cited by the patent examiner as a related patent: Determining query term synonyms within query context. I wrote a post about it after it was granted, How Google May Expand Searches Using Synonyms for Words in Queries. So the basic ideas behind this kind of query expansion have been floating around Google for a number of years.
While people seem satisfied with typing keywords into a search box, they appear more likely to drop that focus on matching keywords when they speak a query. We’re more likely to see someone type [chicago style pizza restaurant] into a search box, and someone speak the query [What is the best place to find and eat Chicago deep dish style pizza?] into their phone.
The patent provides a number of additional examples of how the words within a query might be used contextually to better understand other words that might be replaced within that query with synonyms or substitutes.
It is possible that the Hummingbird algorithm works somewhat differently than what is described in the claims and/or description of this patent, but they seem to be a pretty good match. Is this the Google Hummingbird patent? What do you think?