How Google May Rewrite Your Search Terms
Contained within the announcement Google made earlier this year about the Hummingbird update is the idea that the search engine might rewrite queries, substituting some of the terms within them, when it thinks doing so might improve the results that searchers see. A very recent Google patent describes how Google might use a data-driven approach to explore how effective those substitutions might be.
There is a history of Google making changes to queries and results to try to provide better search results.
Titles – In January of 2012, a Google Webmaster Central blog post told us that Google might sometimes change the title of a page in search results if they thought the new title might lead to more clicks and views of a page. While that might not be what the author of a page intended, it shows that Google is trying to make it easier for people to find the information they are searching for. I’ve run across sites where all the pages had the same titles, but unique main headings, and saw Google add the text for the main headings to those titles for each page.
In the screenshot below from a Matt Cutts video on snippets (the page includes some great suggestions for titles as well), Matt offers some unsolicited advice for Starbucks, telling them that it might not be a bad idea to replace the word “homepage” in their title with “coffee”, since few people probably search for “starbucks homepage”, and many more likely search for “starbucks coffee”.
Queries – Google also has a history of rewriting the queries searchers perform if they think that the rewritten query might lead to results that better match the intent behind the original. This can include returning pages that include synonyms or good substitutes. Google also has a long history of showing possible query refinements within search results, and even asking searchers at the top of the results if they intended something different, especially when it looks like a searcher may have misspelled a query term.
OneBoxes – Google will also display OneBox results based upon indications that searchers might prefer to see things like definitions or weather boxes or local results in response to some queries. It’s not just a matter of which search results are most “relevant” for a query, but rather which results they think searchers might prefer to see based upon a number of factors. These can include the click rates that the OneBox results receive.
Hummingbird is specifically aimed at returning better (higher quality) search results
Google announced the Hummingbird update earlier this year, on their 15th anniversary. Its focus is on rewriting long and complex queries of the type that people might speak on mobile devices, but hidden within that circumstance is an intent to also improve the quality of search results for all queries. Describing what was said at a press conference held the day that the update was announced, Search Engine Land’s Danny Sullivan told us:
In particular, Google said that Hummingbird is paying more attention to each word in a query, ensuring that the whole query — the whole sentence or conversation or meaning — is taken into account, rather than particular words. The goal is that pages matching the meaning do better, rather than pages matching just a few words.
How does Google work to return pages that “better match the meaning” of a query? Part of that challenge is in better re-writing a query to uncover such pages. It’s not a matter of finding pages in search results that have some of the words from the original query and looking for pages that might have more high quality links to them, or more Facebook likes, or more Google +’s, or some other kind of “correlation” between ranking signals and ranking search results. It involves ways to try to better interpret the words within the original queries and doing a better job of finding pages that better match the intent behind a search.
Google’s Hummingbird update involves changing some of the words within an original long and complex query to capture the meaning behind those words, rather than just returning pages in search results that contain all the words within the original query. This can be done by looking for synonyms or substitute terms for words within those queries from places like search results or from Query Sessions.
Within similar contexts, the documents that Google returns for those substitutes or synonyms might share many of the same words as the documents returned for the words that they are replacing. For example, a search that includes “cat food” within it might be rewritten as a search that includes “pet cat food” instead of just “cat food.” If you do a search for each of those terms, many of the same words (referred to in this context as co-occurring words) might show up within the documents that appear as search results for each, as in the screenshot at the top of this post.
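To make the idea of co-occurring words a little more concrete, here is a rough sketch of my own in Python. It is not taken from Google or from the patent. The function names, the tiny stopword list, the sample documents, and the choice of a Jaccard overlap score are all assumptions made purely for illustration.

```python
import re
from collections import Counter

# Assumption: a tiny hand-picked stopword list, just for this toy example.
STOPWORDS = {"a", "an", "and", "the", "for", "of", "to", "in", "is", "with"}

def cooccurring_words(query, documents, top_n=10):
    """Return the top_n most frequent words in documents, skipping query words and stopwords."""
    query_words = set(query.lower().split())
    counts = Counter()
    for doc in documents:
        for word in re.findall(r"[a-z]+", doc.lower()):
            if word not in query_words and word not in STOPWORDS:
                counts[word] += 1
    return {word for word, _ in counts.most_common(top_n)}

def overlap(words_a, words_b):
    """Jaccard similarity: shared words divided by all distinct words."""
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

# Hypothetical result text for each query (made up for this example).
results_cat_food = [
    "Best dry kitten and adult cat food brands reviewed",
    "Wet cat food with chicken and salmon",
]
results_pet_cat_food = [
    "Pet cat food: dry and wet brands for kitten",
    "Healthy pet cat food with salmon",
]

words_original = cooccurring_words("cat food", results_cat_food)
words_substitute = cooccurring_words("pet cat food", results_pet_cat_food)
print(sorted(words_original & words_substitute))
print(overlap(words_original, words_substitute))
```

A high overlap score in a sketch like this would hint that the two queries surface similar documents, which is the kind of signal that might make one term a reasonable substitute for the other.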
But how might Google decide whether or not the rewritten query or replacement terms might lead to higher quality results? Do they come closer to matching the intent of the person who performed the search with the pages returned?
A patent from Google granted this week explores how Google might test and investigate rules that they follow in finding substitutes/synonyms for terms in a query when doing that kind of re-writing, to see how well received those are by searchers. The patent is:
Removing substitution rules
Invented by Dan Popovici and Jeremy D. Hoffman
Assigned to Google
United States Patent 8,600,973
Granted December 3, 2013
Filed January 3, 2012
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for removing substitution rules. According to one implementation, a method includes:
- Identifying a revised search query that was revised to include a substitute term of a query term;
- Identifying search results that were generated using the revised search query, wherein each search result references a resource;
- Determining, by one or more computers, that none of the resources referenced by a subset of the search results include the substitute term of the query term; and
- In response to determining that none of the resources referenced by the subset of search results include the substitute term of the query term, incrementing a no-match score for the substitute term.
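The steps in that abstract can be sketched in a few lines of Python. This is only my reading of the claim language, not Google's implementation. The function names, the use of the top three results as the "subset", and the sample page text are assumptions for illustration.

```python
import re
from collections import defaultdict

def contains_term(text, term):
    """True if term appears in text as a whole word or phrase (case-insensitive)."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

def update_no_match_score(scores, query_term, substitute, resources, subset_size=3):
    """Increment the rule's no-match score if no resource in the subset has the substitute."""
    # Assumption: the "subset of the search results" is simply the top few results.
    subset = resources[:subset_size]
    if subset and not any(contains_term(text, substitute) for text in subset):
        scores[(query_term, substitute)] += 1
    return scores[(query_term, substitute)]

# Scores are keyed by a hypothetical (query term, substitute term) rule.
scores = defaultdict(int)

# A revised query where the substitute "auto" never shows up in the top results.
update_no_match_score(scores, "car", "auto", [
    "Used car prices and reviews",
    "Car rental near you",
    "Compare car insurance quotes",
])

# A revised query where the substitute "picture" does appear, so nothing is incremented.
update_no_match_score(scores, "photo", "picture", [
    "Free picture hosting and sharing",
    "Photo editing tips",
])

print(dict(scores))
```

A rule whose no-match score keeps climbing would be one where the substitute term rarely appears in the pages returned, which is the sort of evidence the patent suggests could lead to that rule being removed.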
Google may test these substitute queries by (1) exploring whether or not the substitute terms appear within the search results returned, and (2) checking to see if searchers click upon those particular results where the substitutes do appear. The screenshot below is from the patent I called The Hummingbird Patent, and I mentioned this substitution process and substitution rules in my post, The Google Hummingbird Patent?
The patent tells us that the advantages of using this approach include:
- Substitute term rules which do not improve search quality can be identified empirically from search result data.
- Substitute term rules which generate only a few additional search results may still be helpful if the users respond to the substitute term rule with positive feedback.
- Specific contexts in which substitute term rules improve search quality can be identified, and the general context substitute term rule may be modified accordingly.
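Those advantages suggest keeping track of searcher feedback for each rule in each context. Here is a hedged Python sketch of how that bookkeeping might look. The log format, treating a neighboring query word as the "context", and the click-rate and impression thresholds are all hypothetical choices of mine, not details from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class RuleStats:
    impressions: int = 0  # revised queries where a result containing the substitute was shown
    clicks: int = 0       # how many of those led to a click on such a result

def aggregate(click_log):
    """Group feedback by (rule, context)."""
    stats = defaultdict(RuleStats)
    for entry in click_log:
        key = (entry["rule"], entry["context"])
        stats[key].impressions += 1
        if entry["clicked_substitute_result"]:
            stats[key].clicks += 1
    return stats

def contexts_to_keep(stats, rule, min_ctr=0.2, min_impressions=2):
    """Contexts where the rule drew enough clicks to look helpful (thresholds are made up)."""
    keep = []
    for (logged_rule, context), s in stats.items():
        if (logged_rule == rule
                and s.impressions >= min_impressions
                and s.clicks / s.impressions >= min_ctr):
            keep.append(context)
    return sorted(keep)

# Hypothetical log for the rule "car" -> "auto" in two query contexts.
rule = ("car", "auto")
click_log = [
    {"rule": rule, "context": "repair", "clicked_substitute_result": True},
    {"rule": rule, "context": "repair", "clicked_substitute_result": True},
    {"rule": rule, "context": "repair", "clicked_substitute_result": False},
    {"rule": rule, "context": "toy", "clicked_substitute_result": False},
    {"rule": rule, "context": "toy", "clicked_substitute_result": False},
]

print(contexts_to_keep(aggregate(click_log), rule))
```

In this toy data the rule would survive for queries about car repair but not for queries about toy cars, which mirrors the idea of narrowing a general rule down to the contexts where searchers respond well to it.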
Google may test the titles that they change based upon whether or not those changes improve the click-throughs on those results, and may decide whether or not people want to see OneBox results by whether or not people click upon them. In much the same way, this data-driven approach to seeing whether or not synonym or substitute rules for changing queries result in actual clicks can give Google an idea of how helpful those changed queries might be.