How Google Finds ‘Known For’ Terms for Entities

Google finds terms and phrases to associate with businesses, locations, and other entities as terms of interest for those entities. These terms can influence what shows up in search results and in knowledge panels for those entities. Consider them part of a growing knowledge base of concepts, entities, entity attributes, and keywords that shapes the new Google after Hummingbird. Semantics plays a role in identifying the things that specific entities are known for.

The Red Truck Bakery in Warrenton, Virginia

For example, the Warrenton, Virginia, Red Truck Bakery (local to me) is known for:

  • Great tasting locally roasted coffee
  • Baked goods that include locally grown produce
  • A red truck parked in their lot that originally belonged to Tommy Hilfiger
  • A CIA-trained chef who owns the place and was a longtime Art Director for the Smithsonian
  • A communal farmer’s table where customers from town share breakfast and lunch

What are you known for? What are the things that you write about or sell online known for? What about the celebrities who fill the pages of weekly tabloids, the figures who have shaped our history, or the businesses listed in the yellow pages of local phone books?


Google was granted a patent last week that describes how the search engine might process and extract patterns from data it finds on the Web. We’re told that it might make “observations about form, behavior, or the nature of concepts represented by data” that can be used to create useful intelligence about those concepts.

Documents from a corpus of web pages can be analyzed to identify keywords or categories associated both with those documents and with entities on the Web. The patent mostly focuses on businesses at specific locations, but its teachings can be applied to other types of entities as well.

The patent shows “known for” terms in a screen shot of example local search results.

[Screenshot from the patent: example local search results showing “known for” terms]

Actual local search results show them as well, without the “known for” language:

[Screenshot: actual local search results for the Red Truck Bakery showing these terms]

In addition to local search results, the patent tells us that these “known for” terms can be used in other ways, such as within other search results. The patent is:
Assigning terms of interest to an entity
Invented by Jason Lee, Tamara I. Stern, Gregory J. Donaker, and Sasha J. Blair-Goldensohn
Assigned to Google
United States Patent 8,589,399
Granted November 19, 2013
Filed: March 26, 2012

Abstract

The subject matter of this specification can be embodied in, among other things, a method that includes identifying resources relating to an entity, where each resource includes multiple terms and is included in a corpus of resources relating to multiple entities.

Candidate terms from the resources for potentially associating with the entity and a category associated with the entity are identified. A relative frequency of the candidate terms in the identified resources is compared to a frequency of the candidate terms associated with other entities. Each of the candidate terms are weighted, for example, based on a source of the candidate term and the relative frequency of the candidate term.

A weighted frequency of each candidate term is calculated based on the weights, and candidate terms are selected as representative terms for the entity based on the weighted frequency.
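The core comparison in the abstract, how often a term appears in pages about one entity versus pages about other entities, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Google's implementation: "frequency" is assumed to mean average occurrences per document found by naive substring matching, a small smoothing constant is assumed to avoid division by zero, and `term_frequency` and `relative_frequency` are hypothetical names.

```python
def term_frequency(term: str, docs: list[str]) -> float:
    """Average number of times `term` occurs per document (naive substring count)."""
    if not docs:
        return 0.0
    needle = term.lower()
    return sum(doc.lower().count(needle) for doc in docs) / len(docs)


def relative_frequency(term: str, entity_docs: list[str],
                       category_docs: list[str], smoothing: float = 1e-6) -> float:
    """Frequency of `term` in the entity's pages relative to its frequency in
    pages about entities in the same category. The assumed smoothing constant
    avoids division by zero for terms the category pages never use."""
    return term_frequency(term, entity_docs) / (term_frequency(term, category_docs) + smoothing)


entity_docs = ["Try the granola and the coffee.", "Their granola is homemade."]
category_docs = ["Fresh bread and cookies.", "Cakes, pies and coffee.",
                 "Granola bars at the counter.", "Croissants baked daily."]
print(round(relative_frequency("granola", entity_docs, category_docs), 2))  # 4.0
print(round(relative_frequency("coffee", entity_docs, category_docs), 2))   # 2.0
```

In this toy data, both terms appear more often for the entity than for the category, but "granola" stands out twice as strongly, which is the kind of signal that would make it a distinguishing term.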

How Terms Associated with an Entity are Identified

Terms that might potentially be associated with an entity, such as a local business, a person, or a place, may be extracted from web pages about that entity. For instance, as we see in the local search result above for the Red Truck Bakery, one of the terms associated with the bakery is “Granola.”
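As a rough illustration of that extraction step, the sketch below treats every one- and two-word phrase on a page as a candidate term and counts how often each appears. The n-gram approach is an assumption made for illustration; the patent does not say candidate terms are extracted this way, and `candidate_terms` is a hypothetical name.

```python
import re
from collections import Counter


def candidate_terms(pages: list[str], max_n: int = 2) -> Counter:
    """Count every 1..max_n word phrase (n-gram) on each page as a candidate term."""
    counts: Counter = Counter()
    for page in pages:
        # Lowercase the page and keep runs of letters and apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", page.lower())
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts


pages = ["Homemade granola and coffee.", "The granola is great."]
counts = candidate_terms(pages)
print(counts["granola"], counts["homemade granola"])  # 2 1
```

Phrases are built within each page only, so no candidate spans the boundary between two pages.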

The next steps, according to the patent:

  1. A category associated with the entity is determined (Red Truck Bakery is in the category of Bakeries, for example).
  2. For each candidate term, the frequency with which it appears in the pages is determined (such as how often the term “granola” shows up in pages returned for “Red Truck Bakery”).
  3. The candidate terms are weighted based on the source in which each term is found and on the term’s relative frequency. The relative frequency is the frequency of the candidate term in the resources about the entity (appearances of “granola” in all pages returned for “Red Truck Bakery”) relative to its frequency in the subset of the corpus about entities in the determined category (appearances of “granola” in pages about all bakeries). Here, “granola” shows up much more frequently for the Red Truck Bakery than for other bakeries.
  4. A weighted frequency of each candidate term is calculated from the assigned weights. One or more candidate terms are selected as representative terms for the entity based on that weighted frequency, and the selected terms are associated with the entity in a data repository. Since the Red Truck Bakery makes its own granola, and granola isn’t something most other bakeries are known for, it’s considered a term that is “representative” of the Red Truck Bakery.
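The weighting and selection steps above might look something like the following sketch. The patent says a term’s weight can depend on its source and its relative frequency, but it does not give a formula; here the source types, their weight values, the choice to multiply summed source weights by relative frequency, and the names `Mention`, `weighted_frequencies`, and `representative_terms` are all hypothetical. A plain dictionary stands in for the “data repository.”

```python
from dataclasses import dataclass

# Hypothetical per-source weights; sources not listed here get a weight of 1.0.
SOURCE_WEIGHTS = {"review": 1.0, "editorial": 1.5, "business_site": 0.5}


@dataclass
class Mention:
    term: str
    source: str  # a key into SOURCE_WEIGHTS


def weighted_frequencies(mentions: list[Mention],
                         rel_freq: dict[str, float]) -> dict[str, float]:
    """Sum the source weight of every mention of a term, then scale that sum by
    the term's relative frequency (entity pages vs. category pages)."""
    totals: dict[str, float] = {}
    for m in mentions:
        totals[m.term] = totals.get(m.term, 0.0) + SOURCE_WEIGHTS.get(m.source, 1.0)
    return {term: total * rel_freq.get(term, 0.0) for term, total in totals.items()}


def representative_terms(mentions: list[Mention], rel_freq: dict[str, float],
                         k: int = 3) -> list[str]:
    """Return up to k terms with the highest positive weighted frequency,
    breaking ties alphabetically."""
    scores = weighted_frequencies(mentions, rel_freq)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, score in ranked[:k] if score > 0]


# Stand-in for the data repository: entity name -> selected representative terms.
repository: dict[str, list[str]] = {}
mentions = [Mention("granola", "review"), Mention("granola", "editorial"),
            Mention("coffee", "review"), Mention("coffee", "review"),
            Mention("bread", "review")]
rel_freq = {"granola": 4.0, "coffee": 2.0, "bread": 0.5}
repository["Red Truck Bakery"] = representative_terms(mentions, rel_freq, k=2)
print(repository)  # {'Red Truck Bakery': ['granola', 'coffee']}
```

"Bread" is mentioned, but its low relative frequency (common at every bakery) keeps it out of the top terms, which mirrors the granola reasoning in step 4.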

Terms “related” to one of these candidate terms might also be identified. A related term could be:

  • A plural of the first candidate term;
  • A substantially similar semantic variation of the first candidate term;
  • A synonym of the first candidate term; and/or
  • A subphrase of the first candidate term.
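One way to picture how related terms could be folded together is the sketch below, which merges each term’s count into the most frequent related term. It is a simplified illustration under several assumptions: plurals are detected by adding “s” or “es,” synonyms come from a tiny hypothetical lookup table, a subphrase is a contiguous run of words inside a longer phrase, and semantic variations are not modeled at all. All function names are hypothetical.

```python
# Hypothetical synonym table; a real system would need a far richer source.
SYNONYMS = {"espresso drinks": "coffee drinks"}


def is_plural_of(a: str, b: str) -> bool:
    """Very rough English plural check: 'pies' -> 'pie', 'peaches' -> 'peach'."""
    return a in (b + "s", b + "es")


def is_subphrase_of(a: str, b: str) -> bool:
    """True if the words of `a` appear as a contiguous run inside the longer phrase `b`."""
    ta, tb = a.split(), b.split()
    return len(ta) < len(tb) and any(
        tb[i:i + len(ta)] == ta for i in range(len(tb) - len(ta) + 1))


def related(a: str, b: str) -> bool:
    return (is_plural_of(a, b) or is_plural_of(b, a)
            or SYNONYMS.get(a) == b or SYNONYMS.get(b) == a
            or is_subphrase_of(a, b) or is_subphrase_of(b, a))


def merge_related(counts: dict[str, int]) -> dict[str, int]:
    """Fold each term's count into the most frequent related term seen so far."""
    merged: dict[str, int] = {}
    for term, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        target = next((t for t in merged if related(term, t)), term)
        merged[target] = merged.get(target, 0) + n
    return merged


print(merge_related({"pie": 5, "pies": 2, "fruit pie": 1, "coffee": 3}))
# {'pie': 8, 'coffee': 3}
```

Merging before scoring keeps “pie,” “pies,” and “fruit pie” from splitting one signal into three weaker ones.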

Some terms might be excluded from consideration, such as stop words and words that fall into certain predefined categories. Those categories can include:

  • Terms that refer to a location of the entity;
  • Terms that are variations of a name of the entity;
  • Contact information associated with the entity;
  • Terms included in a list of stop words associated with the entity’s category;
  • Terms that are common in documents associated with the determined category; or
  • Temporal terms.
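A filter covering those exclusion categories could be sketched as follows. The word lists, the US-style phone number pattern, the rule that a term is a name or location variation when all of its words appear in the name or location, and the 50% threshold for “common in the category” are all illustrative assumptions; `keep_candidate` is a hypothetical name, not anything described in the patent.

```python
import re

# Illustrative, hypothetical word lists; a real system would use much larger ones.
STOP_WORDS = {"the", "and", "a", "great", "place"}
TEMPORAL_TERMS = {"today", "tomorrow", "yesterday", "morning", "evening", "weekend",
                  "monday", "tuesday", "wednesday", "thursday", "friday",
                  "saturday", "sunday"}
PHONE_RE = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")  # US-style numbers only


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z']+", text.lower()))


def keep_candidate(term: str, name: str, location: str,
                   category_stop_words: set[str],
                   category_doc_share: dict[str, float],
                   max_share: float = 0.5) -> bool:
    """Return False if `term` falls into any of the assumed exclusion categories."""
    t = term.lower().strip()
    if PHONE_RE.fullmatch(t) or "@" in t or t.startswith("http"):
        return False  # contact information
    if t in STOP_WORDS or t in category_stop_words:
        return False  # generic or category-specific stop words
    if t in TEMPORAL_TERMS:
        return False  # temporal terms
    tw = words(t)
    if not tw or tw <= words(location):
        return False  # no words at all, or only words from the entity's location
    if tw <= words(name):
        return False  # a variation of the entity's name
    if category_doc_share.get(t, 0.0) > max_share:
        return False  # appears in too large a share of the category's documents
    return True


candidates = ["granola", "the", "Warrenton", "red truck", "540-555-0100",
              "Saturday", "bread", "farmer's table"]
kept = [c for c in candidates
        if keep_candidate(c, name="Red Truck Bakery", location="Warrenton, Virginia",
                          category_stop_words={"bakery", "baked"},
                          category_doc_share={"bread": 0.8, "granola": 0.1})]
print(kept)  # ['granola', "farmer's table"]
```

Only the terms that could plausibly set this bakery apart from other bakeries survive the filter.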

The patent goes into much more detail about how certain terms might be identified, or purposely omitted from consideration, as “known for” terms that might differentiate entities in similar categories.

As a quick example, a Thai restaurant might be grouped in a category with other Thai restaurants but be known for a particular chef, or for a special served only at that restaurant. That chef or that menu special might be understood as terms associated with that particular restaurant (entity), could show up in local search results for that entity, might be seen as keywords specifically associated with the entity, and could potentially appear in a knowledge base result related to the entity as well.


25 thoughts on “How Google Finds ‘Known For’ Terms for Entities”

  1. Excellent as ever, Bill. Might this encourage ‘entity farms’ amongst the blacker hat community? I.e., sites dedicated to creating associations and ‘known for’ connections… This might be a good thing if the data is trustworthy, but what if there was a nefarious motive for making associations?

  2. Bill,

    There are those who are good at what they do, and then there are those like yourself. You are a great search professional, Bill. I look forward to each article you publish. Here it is 6:20am and I’m reading your blog.

    Thank you for that,
    Gregory Smith

  3. I’m excited about this one. As a digital marketer, the very first thing I talk to potential clients about is “what is unique about you?”
    In other words, what sets you apart from the other 25 similar businesses you call competition?
    Many times, the client knows what makes him/her different or better, but they have never used it in any type of advertising be it digital or traditional. So that’s typically where we start – with differentiating them.
    I’m glad that Google is going to also recognize these efforts – it only makes sense.
    AL

  4. I think it’s important that the things a business is known for also help it in search. If a company is “known for” doing good business, people should know this along with the fact that they sell sushi, you know.

  5. This definitely seems to be in line with what Hummingbird and the future of Google are all about. The knowledge graph and all that jazz. I’m sure some people will game the system and might create “known for” farms, but is that any different than the spammy backlinks of the past? We will just have to adapt and learn how to spot those fake sites and get rid of them.

  6. Hi David,

    I can’t even begin to imagine someone being able to create a “content farm” that would attempt to manipulate and abuse semantics in a meaningful way that Google couldn’t just blacklist and ignore.

  7. Hi Owen,

    I’m really not convinced that it would be possible to create “known for” farms that could manipulate the semantics of “known for” terms.

  8. Fantastic stuff again, Bill. Thanks for posting this.

    A couple of stray thoughts:

    1. I wonder how long it will be before the “known for” (AKA “at a glance”) snippets actually become integrated into the Knowledge Graph, such that each of those words/phrases is clickable and returns other local search results when you click on it.

    2. At least to me, Google’s apparent need for customer-review content to populate the “known for” snippets – and in turn the Knowledge Graph – partly explains why Google has been pushing Plus reviews so hard. If Google essentially can’t use Yelp’s review content (the way it used to in the “Reviews from around the Web” section of Google Places), then there aren’t a whole lot of popular sites from which Google can scrape tons of review content. So that would be at least one reason to encourage Plus users to be like Yelpers – very prolific.

  9. Regarding the Farms.

    I think a farm is a possibility, though it could be done “white hat.” Through inbound channels you can brand your business as having “the best chocolate cake,” pushing that mention in every piece of content or recipe you put online as part of your strategy, and asking customers to mention it in reviews, and by that manipulate the results.

    Kinda reminds me of optimizing content for keywords. If you think about it thoroughly, the “known for” on an entity functions like a keyword does for a website today. It differs by the “entity” and the link, but those are mostly internal structural changes, not a different idea from the correlations of keywords to a website (kinda like building a template class in OOP: you can have different variable types coming in, but it still functions the same).

    Find what the website is about by checking on-page info -> evaluate from off page sources -> calculate total relevancy of the keyword and link to the website if necessary.

  10. Doesn’t Google Hummingbird help a great deal with this then? Everything will be a lot more specific. Even more so than it was.

    I don’t know, Google is a complex beast, and does anyone truly understand what’s going on? Weirdly, there have been pre-video adverts on YouTube recently by Hotmail about how Google tracks your e-mailing preferences via keywords. So it’s Google allowing adverts… which have a go at Google for being privacy-flouting hawks? I presume this one slipped through the net by accident.

  11. Interesting stuff Bill, I’m really glad you have your finger on the patent pulse. Thanks for the post.

    As a semhead I still find it odd (but understandable) that people want to try to ‘game’ Google. I’m afraid it’s that ‘old school’ working with ‘new thing’ SEO mentality that will hinder people’s search visibility as even more semtech gets employed by the search companies.

    Perhaps not everyone has understood that semtech like ‘entity recognition’ essentially has a self-learning capability, meaning the longer it and other tech like ‘hummingbird’ are used, the easier it becomes to spot the good things your business is known for by other people as well as data spam.

    Serendipity will be your best friend in the SERPs of tomorrow ;)

  12. Wow, this patent stuff is pretty highfalutin, but I guess it has to be that way.

    Reading point 3, in the context of where it talks a lot about the “candidate term”, couldn’t that term be replaced by keyword or key phrase? Doesn’t what it is saying suggest that consistent use of a word or phrase (or synonyms of) throughout the pages of a website will ultimately influence search results? If that is the case where does this fit into semantic search? Isn’t it saying that whilst the keyword is dead, long live the keyword?

    Obviously what we read here is the wrapper so to speak of the patent, but it does suggest that in writing effective content for the semantic search age, keywords still play an important role. In other words ‘things not strings’ is really ‘things including strings’. Thoughts anyone?

  13. @Bill Slawski
    While it might seem unlikely you never quite know just what people will think up. It might not manifest itself exactly as I’ve described but if it’s one thing, there are those who would find ways to game the system.

    I was more responding to @David Sewell’s initial comment.

    Either way, I still think that we’re in exciting times, especially for SEO. Google is shaking things up and only the flexible will be able to roll with the updates :)

  14. Hi Bill!

    Another excellent post from you. I just love reading all your articles because they are indeed very informative and helpful, just like this post. I hope to read more of your articles. As a matter of fact, I am your fan! I will definitely share this to friends. Thanks for sharing :)

  15. Great article, and it gave me an “aha” moment, as Oprah would say. I do a lot of local citation building for my clients, and I’ve noticed a trend of these “known for” types of descriptions showing up later. I’m thinking I may have some influence in getting these created. I’ll be paying closer attention from here on out.

  16. Sounds interesting, and it will definitely create good opportunities for local search experts. But I think marketers and business owners should pay more attention to their “Zagat” presence, which Google might use more to carve out those results. What’s your view?

  17. Google Hummingbird has had a great effect on search results. This is one of the additions. Thanks for the excellent article, Sir.

  18. Curious to know whether or not this is going to be a problem in HTML5. As a small business owner, I do all my SEO in house, and I am always researching any new leads to help maintain a healthy website rank.

    Snippets still need to capture your audience, I’d prefer to create my own through meta data.
