How Google Might Make Better Synonym Substitutions Using Knowledge Base Categories

A couple of months ago, I wrote about a Google patent on rewriting queries, in a post titled Investigating Google RankBrain and Query Term Substitutions. There’s likely a lot more to how Google’s RankBrain approach works, but I came across a patent that seems to be related to the one I wrote about in that post, and thought it was worth sharing and starting a discussion about. The patent I wrote about in that post was Using concepts as contexts for query term substitutions. The title of this new patent is similar to that one (Synonym identification based on categorical contexts), and the more recent patent was granted on December 1, 2015.

The new patent starts off describing a scenario that is a good example of how it works. The inventors tell us:

For example, learning that “restaurants” is a good synonym for “food” in the query [food in San Francisco] is relatively straightforward, because the volume of query traffic including the query term “San Francisco” is very large. For much smaller cities, such as Grey Bull, Wyo., the query stream may have never seen any supporting evidence for this synonym substitution.

Because both cities are entities that fit into the same category, that of “Cities,” a synonym that works in queries about one of them could potentially work in queries about the other. That’s what the inventors of this patent tell us specifically, using the San Francisco and Grey Bull example:

For example, if “San Francisco” and “Grey Bull” are both cities, and “restaurants” is a good synonym for “food” in queries about San Francisco, the synonym relationship may apply to queries related to “Grey Bull” as well. Thus, the category “city” may be considered a useful category when identifying synonyms for query expansion in circumstances such as this.
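
Here is a minimal sketch in Python of how that kind of generalization could work. It is only an illustration of the idea in the passage above, not the patent's actual method; the toy knowledge base, the observed substitutions, and the function names (lift_to_categories, find_synonym) are all hypothetical.

```python
# Hypothetical toy knowledge base: entity -> category.
ENTITY_CATEGORY = {
    "san francisco": "city",
    "grey bull": "city",
}

# Hypothetical substitutions that already have query-stream support,
# keyed by (query term, context entity).
OBSERVED_SYNONYMS = {
    ("food", "san francisco"): "restaurants",
}


def lift_to_categories(observed, entity_category):
    """Re-key entity-specific substitutions by the entity's category."""
    lifted = {}
    for (term, entity), synonym in observed.items():
        category = entity_category.get(entity)
        if category is not None:
            lifted[(term, category)] = synonym
    return lifted


def find_synonym(term, entity, observed=OBSERVED_SYNONYMS,
                 entity_category=ENTITY_CATEGORY):
    """Return a synonym for `term` in the context of `entity`, or None."""
    # Prefer evidence seen directly for this entity.
    direct = observed.get((term, entity))
    if direct is not None:
        return direct
    # Otherwise fall back to evidence from other entities in the same category.
    category = entity_category.get(entity)
    if category is None:
        return None
    return lift_to_categories(observed, entity_category).get((term, category))


print(find_synonym("food", "grey bull"))
```

In this sketch, a query about Grey Bull borrows the "food" to "restaurants" substitution from San Francisco only because both entities share the "city" category. An entity that is missing from the knowledge base gets no substitution at all.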

So, we are told that the process in this patent identifies categories from a knowledge base, each covering a number of entities, where a synonym substitution learned for one entity could potentially apply to other entities within that same category in similar contexts. The process involves identifying those entities in a query stream, and determining whether their category is one that the patent calls a “coherent” category.

The patent tells us that a coherent category is one in which a certain threshold of terms tend to co-occur in a query stream involving those entities. It tells us, for instance, that a category that includes entities that are cities, villages, and towns might see a lot of co-occurring terms involving hotels and roads. If the number of co-occurring terms appearing in that query stream meets a certain threshold, the category would be considered coherent, and synonyms learned for one entity in that category could then possibly be applied to the others.
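
One possible reading of that threshold test can be sketched in Python. This is an assumption-heavy illustration rather than the patent's definition of coherence: the stopword list, the substring matching of entities, and the parameters term_threshold and min_entities are all hypothetical choices made for the example.

```python
from collections import defaultdict

# Hypothetical stopword list so words like "in" don't count as context terms.
STOPWORDS = {"in", "the", "of", "for", "near"}


def shared_context_terms(queries, entities, min_entities=2):
    """Return terms that co-occur with at least `min_entities` distinct entities."""
    term_entities = defaultdict(set)
    for query in queries:
        q = query.lower()
        for entity in entities:
            if entity in q:  # naive substring match, for illustration only
                rest = q.replace(entity, " ")
                for term in rest.split():
                    if term not in STOPWORDS:
                        term_entities[term].add(entity)
    return {t for t, ents in term_entities.items() if len(ents) >= min_entities}


def is_coherent(queries, entities, term_threshold=2, min_entities=2):
    """Treat a category as coherent if enough context terms are shared."""
    shared = shared_context_terms(queries, entities, min_entities)
    return len(shared) >= term_threshold


city_queries = [
    "hotels in san francisco",
    "hotels in grey bull",
    "roads grey bull",
    "roads san francisco",
]
print(is_coherent(city_queries, ["san francisco", "grey bull"]))
```

With the toy queries above, "hotels" and "roads" each appear alongside both cities, so the category clears a threshold of two shared terms. A grab-bag category whose entities share no context terms would not.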

The patent in question is:

Synonym identification based on categorical contexts
Invented by: Zachary A. Garrett, Takahiro Nakajima, Tasuku Oonishi
Assignee: Google
US Patent 9,201,945
Granted December 1, 2015
Filed: March 8, 2013


Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training recognition canonical representations corresponding to named-entity phrases in a second natural language based on translating a set of allowable expressions with canonical representations from a first natural language, which may be generated by expanding a context-free grammar for the allowable expressions for the first natural language.

Take Aways

When I wrote about the query term substitution patent I referred to at the start of this post, I included a number of examples of queries that were rewritten based upon substitutions of query terms that might seem reasonable to a search engine looking at words that tended to show up, or co-occur, in a query stream involving those search terms.

For instance, someone searching for [New York Yankees stadium] was likely searching for results that involved “baseball” since queries that included “New York Yankees” and “stadium” also often included the term “baseball.”
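
To make the co-occurrence idea concrete, here is a small Python sketch that counts which other terms show up in queries containing a given set of phrases. The sample queries and the function name cooccurring_terms are made up for illustration, and a real query stream would need far more careful tokenization and filtering.

```python
from collections import Counter


def cooccurring_terms(queries, required_phrases):
    """Count terms appearing in queries that contain every required phrase."""
    counts = Counter()
    for query in queries:
        q = query.lower()
        if all(phrase in q for phrase in required_phrases):
            rest = q
            # Strip the required phrases so only the extra terms are counted.
            for phrase in required_phrases:
                rest = rest.replace(phrase, " ")
            counts.update(rest.split())
    return counts


# Hypothetical sample of a query stream.
sample_queries = [
    "new york yankees baseball stadium",
    "new york yankees stadium baseball tickets",
    "new york yankees stadium parking",
    "boston red sox stadium",
]
counts = cooccurring_terms(sample_queries, ["new york yankees", "stadium"])
print(counts.most_common(1))
```

In this made-up sample, "baseball" is the term that most often accompanies both phrases, which is the kind of signal that could suggest it as added context for the query.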

That patent didn’t use the term “co-occur,” nor did it explain, as this one does, how a knowledge base might be used to carry substitutions across entities that are in the same category. Still, the idea that a shared context like an entity category can be used to trigger substitutions in a query is interesting.

It’s worth spending time with both patents, reading through each of them multiple times, and thinking about how they might be used.

34 thoughts on “How Google Might Make Better Synonym Substitutions Using Knowledge Base Categories”

  1. Interesting thought, Bill! It applies more to the context, which stresses making a single page richer in information rather than trying to divide things into pages.

    The landing page to targeted keyword theory I believe will slowly start to become less relevant and it would be more of a single landing page rich enough to target what you could think 🙂

  2. I appreciate everything you have added to my knowledge base. Admiring the time and effort you put into your blog and detailed information you offer.

  3. Hi Bill,
    I guess I read the recent post that you mention in this one, Investigating Google RankBrain and Query Term Substitutions.
    Bill, I would like to know if Google has changed the way it handles user search queries again.

  4. Google plays fair. Some companies spend thousands of dollars on SEO. You can have a great ranking for free if you know what you’re doing. Good piece

  5. I would like to encourage all of you really like your blog. Did you design this website yourself or did you hire someone to do it for you? Please reply as I’m looking to create my own blog and would like to find out where u got this from. Many thanks bookmark this page to your most used service to help get the word out.

  6. Hi Alan,

    Thank you.

    The theme I used is one of the default WordPress themes, though I made a number of tweaks to the CSS of the theme, and searched through the Library of Congress Website for the Japanese prints that I used as rotating masthead images on the site. There are a lot of great images there that are old enough to be in the Public Domain, and I’ll probably make more changes as time passes.

  7. Hey Bill,

    Thank you very much for this blog post.
    As I run my own blog and website –
    I find this most useful.
    Keep up the good work!

  8. Hi Daniel,

    Happy to hear that you find this post useful. I was excited learning more about how Google was working on rewriting queries, and using Knowledge base information to learn about synonyms.

  9. A big thank you for this article. It was a little bit difficult to translate for a French guy, but I think I understood and learned a lot by reading this. First in English and hopefully on my new SEO job. Hope my English was not too bad.

  10. Thank you, Bill! Your information was very useful to me and the blogs I run. Still, ranking is difficult.

  11. HI
    I just loved reading your articles.

    The best thing that I really like about your articles is that you cover each and every thing in them, which makes them more helpful.
    I have seen that people love to read articles that are easy to understand and can help a lot. And you always write such articles.

    I would also like to suggest one thing. You should try to keep your paragraphs short so that people aren’t scared off before reading them.
    Short and cute paragraphs increase interest of the reader to read the complete article.

    I hope you wouldn’t mind my suggestion.
    Either way, Thanks for this wonderful article.

  12. Good God! Nearly every day it seems I learn of more ways we are spied on! Thank you, Martin, for bringing this to my attention.

    Other companies doing this besides Silverpush appear to be Adobe, Drawbridge, Flurry (purchased by Yahoo last year), and Tapad.

    The only defenses against this for now seem to be, as you said, muting the microphone, or putting enough physical distance between devices so that audible signals cannot be picked up by the microphones.

  13. Hi Winecooler,

    Silverpush was uncovered by the FTC as being involved in this. I found some other companies doing this type of stuff other than the ones you mention, doing things like combining the audio watermarking with cookies, and using similar tracking. These activities aren’t mentioned much on most SEO sites, and I felt I had to publish something when I saw Google publish those patent filings.

  14. great bill, your post has broken my every silenced corner. Admiring your post is very beneficial for every newcomers. i understood. Thanks for your post

  15. Hi Bill Slawski,
    It was a great experience to read your useful post.
    It cleared all the doubts that I had before reading your most useful & precious post on what Google is doing for the betterment of the search engine.

  16. google anything done to hopefully make it better, and I appreciate what he has done today

  17. Hi,

    I am not sure how Google can patent the English language. They haven’t invented synonyms. This is just adding some sophistication to an otherwise rudimentary search. It will be interesting to see the weighting of an exact match vs. a synonym match.

  18. Hi Bill,

    First of all, a great post indeed for me, because I am a newcomer and I also need this type of thing. Thanks a lot, please keep sharing.


  19. I acknowledge all that you have added as far as anyone is concerned base. Appreciating the time and exertion you put into your site and nitty gritty data you offer.

  20. This is a topic near and dear to my heart. When searching for “braces” there are two very different meanings. While orthodontists put on braces there are also neck braces and back braces. Interestingly, when you search “braces” in my city the organic search results are for metal braces for teeth while the images are neck braces.


  21. Hi Derek,

    It is interesting that Google chooses one type of braces for organic search and a different type for image search. The way you stated that has me wondering whether those results are similar in my location and in other locations. I am seeing braces for teeth in organic results and for images here in San Diego.

  22. I just loved reading your post!!! You spent much time on that!! It will be informative for all newcomers.

    thank you Bill!!!

  23. Hello Bill Slawski, this is my first time visiting here. I found so many useful articles on your blog, especially this discussion. From the many comments on your posts, I guess I am not the only one having all the pleasure here. Thanks

  24. Thanks for the best blog. It was very useful for me.
    Keep sharing such ideas in the future as well. Thanks for giving me the useful information.

  25. Hi Techgeeks onsite,

    I enjoyed the ideas in this patent, and could see the value of using the approach being described. Thanks for your kind words.
