Entity Associations with Websites and Related Entities


Understanding Relationships such as Entity Associations

When we talk about how websites are related, it’s not unusual for us to talk about links between sites and pages. Google pays a lot of attention to such links. They are at the heart of one of its most well-known ranking signals – PageRank. PageRank is more than 15 years old, predating the origin of Google itself in the BackRub search engine.

Google is exploring other signals for ranking pages in search results. These include social signals used in reputation scores for authors. Google may also look at relationships between words that appear together on pages ranking for the same queries, as well as relationships between pages that appear in the same search results and in the same search sessions. A Google paper presented at an October 2013 natural language processing conference, Open-Domain Fine-Grained Class Extraction from Web Search Queries (pdf), provides some interesting hints at a possible Google of the future.

Entity Associations are Part of the Future of SEO

Google is also interested in building a knowledge base of concepts to better understand things like what different businesses or entities are ‘Known for’. It also wants to define entities better through ‘is a’ relationships. Pages for specific entities may show up at the top of search results because they seem to be the pages people are looking for when that entity is included in a query, like the first two results on a search for [Roald Dahl], as seen in the image below:

Search results showing authoritative results for Roald Dahl and then results for books he wrote.

Drawing Connections Between Different Named Entities with Entity Associations

A Google patent application on related entities published earlier this year also explores drawing connections between different named entities (specific people, places, or things) by looking at entity associations with specific websites and by understanding “related entities” for those original entities. An Entity Association is when a specific entity is connected with a particular website. This may be because a site is considered authoritative for that entity, or a page from the site is considered a navigational result for a query that includes that entity.

On a search for “John Wayne,” the official John Wayne website is the top result in Google and the second result is the John Wayne Wikipedia page. Those may rank well not because of traditional ranking signals such as PageRank and information retrieval scores based upon relevance, but rather because they are pages that are authoritative on the entity “John Wayne,” and great responses to those queries as navigational results.

While the Roald Dahl search result from the patent application shows books authored by Roald Dahl, the Knowledge Panel result for John Wayne shows movies that he starred in, along with other people whom searchers also look for when they search for John Wayne, who are considered related entities.

Knowledge Panel at Google for John Wayne and related entities.

How similar are the processes for including related entities within a set of search results, and including related entities within a knowledge panel in Google Results? This patent application tells us that it looks at search results to try to identify related entities. At the same time, the knowledge panel results also appear to look at query log files and find things that people also search for when they search for an entity that triggers a knowledge panel result. The patent filing is:

Related Entities
Invented by Peter Jin Hong, Pravir K. Gupta, Nathaniel J. Gaylinn, Ramakrishnan Kazhiyur-Mannar, Kavi J. Goel, Omer Bar-or, Jack W. Menzel, Christina R. Dhanaraj, Jared L. Levy, Shashidhar A. Thakur, Grace Chung, and Benson Tsai
US Patent Application 20130238594
Published September 12, 2013
Filed: February 22, 2013


Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying entities that are related to an entity to which a search query is directed. One of the methods includes:

  • Receiving a search query, wherein the search query has been determined to relate to the first entity of a first entity type, and wherein one or more entities of a second entity type have a relationship with the first entity;
  • Receiving search results for the search query;
  • Determining that a count of search results identifying a resource containing a reference to the first entity satisfies a first threshold value;
  • Determining that a count of search results identifying a resource having the second entity type as a relevant entity type satisfies a second threshold value; and
  • Transmitting information identifying one or more entities of the second entity type as part of the response to the search query.
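The claimed steps above can be sketched in a few lines of Python. This is only an illustration of the two threshold checks, not Google's implementation: the `SearchResult` fields, the function name, the default threshold values, and the reading of "satisfies a threshold" as "greater than or equal to" are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SearchResult:
    """One search result; all fields are illustrative assumptions."""
    url: str
    entities: Set[str] = field(default_factory=set)        # entities the resource refers to
    relevant_types: Set[str] = field(default_factory=set)  # entity types relevant to the resource


def related_entities_to_show(
    first_entity: str,
    second_type: str,
    related: List[str],
    results: List[SearchResult],
    first_threshold: int = 2,   # hypothetical value
    second_threshold: int = 2,  # hypothetical value
) -> List[str]:
    """Return the related entities only if both counts satisfy their thresholds."""
    # Count results whose resource contains a reference to the first entity.
    entity_count = sum(1 for r in results if first_entity in r.entities)
    # Count results whose resource has the second entity type as a relevant type.
    type_count = sum(1 for r in results if second_type in r.relevant_types)
    if entity_count >= first_threshold and type_count >= second_threshold:
        return list(related)
    return []


# Made-up results for a query about Roald Dahl (first entity, a person),
# where the second entity type is "book".
results = [
    SearchResult("https://example.com/official", {"Roald Dahl"}, {"book"}),
    SearchResult("https://example.com/biography", {"Roald Dahl"}, {"book"}),
    SearchResult("https://example.com/other", set(), set()),
]
books = ["Matilda", "The BFG"]
print(related_entities_to_show("Roald Dahl", "book", books, results))
```

With the sample data, both counts reach the assumed thresholds, so the related books are returned; raising either threshold above 2 would return an empty list instead.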

Here’s an abbreviated look at the entity associations process described in the patent filing, using images from the related entities patent application:

A flowchart from the patent showing the creation of an association between a query and a web page.

Search results from a query are explored to see whether there are authoritative resources for an entity within them. If so, then those results are said to be targeted towards that entity.

Screenshot from the patent showing the identification of related entities for the query.

If the search result titles and snippets also contain related entities, they may be identified and included within a related entity database.
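As a rough sketch of that step, the snippet below scans result titles and snippets for a list of already-known related entities and counts how many results mention each one. The function name, the simple case-insensitive substring match, and the sample titles and snippets are assumptions for illustration; the patent filing does not specify how references to entities are recognized.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def find_related_in_results(
    results: Iterable[Tuple[str, str]],  # (title, snippet) pairs
    known_related: List[str],            # candidate related entities
) -> Counter:
    """Count how many results mention each candidate entity in a title or snippet."""
    counts: Counter = Counter()
    for title, snippet in results:
        text = f"{title} {snippet}".lower()
        for entity in known_related:
            # Naive matching: a case-insensitive substring test (an assumption).
            if entity.lower() in text:
                counts[entity] += 1
    return counts


# Made-up titles and snippets for a [Roald Dahl] query.
sample_results = [
    ("Roald Dahl - Official Site", "Home of Matilda and The BFG"),
    ("Roald Dahl - Wikipedia", "British novelist, author of Matilda"),
]
candidates = ["Matilda", "The BFG", "The Twits"]
found = find_related_in_results(sample_results, candidates)
print(found.most_common())
```

Entities that turn up in the counts could then be written to a related entity store keyed by the original entity; candidates that never appear, like the third one here, are simply left out.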

Screenshot showing the ordering of related entities and their inclusion in a database.

The patent does tell us that these related entities might be presented in ranked order and it provides some of the signals used to order the related entities. (Note that there’s not a link involved at all.)

Ranking Scores for Related Entities

These scores can be based at least in part on:

  • How often someone searches related entities after submitting a query for the first entity.
  • How globally popular related entities might be (sounds like search volume).
  • How often a recognized reference to related entities co-occur in a same previously submitted query as a recognized reference to the original entity.
  • Whether data indicates that two or more of the related entities of the second entity type are members of a set of entities that has a specified order, in which case the ranking matches that order (for example, if the entity is a person with children and the children are usually listed in birth order).
  • Whether data indicates that two or more of the related entities are better known as part of a broader entity, in which case the broader entity replaces them in the ordering of related entities.
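One way to picture how the first three signals might combine is a simple weighted sum. The patent filing lists the signals but does not say how they are combined, so the weights, the 0-to-1 scale for each signal, the function name, and the sample values below are all assumptions. The last two signals (a specified order, and replacement by a broader entity) are left out of this sketch.

```python
from typing import Dict, List, Tuple

# Each entity maps to (follow_up_rate, global_popularity, query_co_occurrence),
# each assumed to be normalized to the range 0..1.
Signals = Dict[str, Tuple[float, float, float]]


def rank_related_entities(
    signals: Signals,
    weights: Tuple[float, float, float] = (0.5, 0.2, 0.3),  # hypothetical weights
) -> List[str]:
    """Order related entities by a weighted sum of their signals, highest first."""
    w_follow, w_pop, w_co = weights

    def score(name: str) -> float:
        follow_up, popularity, co_occurrence = signals[name]
        return w_follow * follow_up + w_pop * popularity + w_co * co_occurrence

    return sorted(signals, key=score, reverse=True)


# Made-up signal values for three placeholder related entities.
sample_signals = {
    "Entity A": (0.6, 0.9, 0.5),
    "Entity B": (0.8, 0.7, 0.7),
    "Entity C": (0.1, 0.2, 0.1),
}
print(rank_related_entities(sample_signals))
```

Here "Entity B" ranks first even though "Entity A" is more popular overall, because the assumed weights favor follow-up searches and query co-occurrence over raw popularity.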

Entity Associations Take-Aways

When Google decides to associate an entity with a particular query, it may also identify whether related entities show up in those search results in places like titles and snippets. It may include those entities within the search results. Again, this wouldn’t require matching keywords with the original query or a PageRank analysis.

The patent application shows how this would work within search results, but it seems to apply to knowledge panel results as well.

As Google’s knowledge base grows, things like Entity Associations and related entities will continue to be a part of it.

I’ve written a few posts about named entities. These are some that I wanted to share:

Last Updated June 26, 2019.


19 thoughts on “Entity Associations with Websites and Related Entities”

  1. Thanks, Grant

    I’m pretty excited about this movement towards how search results can be expanded by understanding relationships between entities better, and by looking at semantic relationships uncovered in things like search results and in query logs. It’s pretty exciting watching how Google is evolving to take such relationships into account.

    I don’t think we’re going to see links disappear any time soon, nor will matching words in a query with words on potential search results pages go away, but they do seem like they are going to play a lesser role in the future.

  2. Great analysis Bill

    Adds to “is a” the connections of “is related to”, “is more important than”, “is more popular than” and others… When relating queries to entities to other entities.

    I tied a few of these concepts together in explaining query “expansion” in natural language search, where query analysis depends on a lot of relative questions to determine rankability potential and strength.

    And not a link in sight 🙂

    Good stuff


  3. Links, I am sure, will be phased out, and that would be good for websites as a whole. A lot of time is wasted in building irrelevant links, and most of the time it is done to manipulate rankings. Great article as usual. Great start to the New Year. Thanks.

  4. Thanks again for the breakdown Bill – sure saves time when trying to stay on top of things! This is all pretty interesting, and makes a lot of sense. Imo it’s a bit sad where all this has been going, though. It seems to solidify Search as an afterthought or a reaction to what exists. Instead of Search = discovery, Search = delivery and I think a lot of value is lost via that approach.

  5. Does this incorporate the use of schema, i.e., are they using the schema to classify, for example, a Local Business, and then defining the relationship or relativity to the query?

    Thanks for the great article and all the best for 2014!

  6. Hello Bill,
    Another great article to understand the future of search! No doubt, 2013 was the year of experiments on entity integration in search queries by Google, and that trend started from the very beginning of last year. Though PR influence in ranking has greatly diminished, links still hold a major role as a deciding ranking factor. Yes, the space for spam links was strongly constricted by Google during the last two Penguin updates in 2013. Understanding entities means deciphering human psychological behavior, and I look forward to seeing how Google will work this out through its algorithm.

    Wish you and fellow commenter a great year ahead!

  7. Hey Bill. Thanks for your contributions (again).

    I was reading and thinking .. yeah yeah old stuff, crafted SERPs, ranking only a factor for segments of the page (query-dependent), etc. and then I thought wow Bill has found his niche… “communicating” the relatively complex issues to “everyman”, at the niche (seo industry) level.

    And then the second half of your article highlighted (for me) that many SEO people don’t “see” how today’s Google has monetized them so comprehensively. Specifically, linking.

    SEOs used to argue that without SEOs, Google would have a much harder time knowing the good from the spam. This was because SEOs were “forced” to build increasingly-targeted, increasingly-authoritative content, even to the point where it wasn’t profitable to do so. The past few years have seen a destruction of that industry (partly because the profit was taken off the table), but some of the BASICS remain fixed in SEO publishing, such as co-citation.

    Everyone starts out linking out to the authorities, to earn a semantic relevance in the eyes of the crawler/classifiers.

    Well, look at your flowchart decision diamond “404” above — if ranking URLs are supporting the authority of an “entity”, then the traffic intent is assumed to be navigational for that entity (or that entity’s URLs).

    The obvious take-away is if WidgetCo ranks dozens of owned pages for Topic A, they all support the idea that searcher wants WidgetCo. That’s SEO strategy from 2012-13.

    Another view: if ranking URLs link to WidgetCo URL as authority for Topic “A”, then user wants WidgetCo (or WidgetCo page with highest relevance for Topic A). SEO strategy from 2011-12 (link networks, blog networks, guest articles, domain stacking, etc).

    So in reality, co-citation (anywhere) pointing to authorities can be used to SUBVERT your ranking position, because (as you note) Google can determine that your own support for the authority of The Entity (and its owned pages) means even you agree the user probably wants THEM. There’s only “10” results on page 1, so someone has to go to page 2.

    This is not new… this is part of what has been labeled “brand preference” for over a year. In my opinion it’s also the root of much of the poor-quality SERPs we’ve been served… Google can’t tell criticism from praise. It has a hard time telling a “compare” co-citation from a “contrast” co-citation (and based on my observations, is using domain factors to make that decision).

    So Google uses SEO efforts (to find and associate owned content with most relevant/authoritative resources indexed) as a way to bypass said owned content – a Judo approach to SEO fighting. That’s one of the parts that Google left in place during the SEO attacks of 2013: you’ve been safe to link out to brands without nofollows, partly because Google can use that against you.

    To sum up my lengthy note: co-citation can hurt you.

    Of course it’s strongly query-dependent (intent-dependent) and like most modern SEO, not a simple matter even when dissected into meaningful parts like this. But it’s definitely NOT true that co-citation is either helpful or benign.

  8. Great article, as always, Bill. I’ve always been fascinated by the path and process of the searcher and how their queries evolve, not only in individual sessions, but over time as they become more accustomed to Google’s capabilities as a search engine. Anticipating the needs of the searcher is going to be more important than ever in 2014, but the truth is that those who have been thriving at SEO have long considered its impact and importance.

    My kingdom for access to search query string patterns for my client’s niches! Or for access to any substantial amount of query session data, really. Anyone have any recommendations on books or studies on searcher behavior?

  9. Hi Chase,

    You’re welcome. I’m not sure that the value of search as a means of discovery is lost through a process like this. In cases where people are searching for information on topics that they might not know a lot about, surfacing related entities that might show up in search results for the same initial query, or in searchers’ queries during search sessions that happen at the same time as the initial query does provide additional options for searchers to explore if they want to click and do so. This approach seems to open up doors to things that searchers might not have otherwise looked at before.

  10. Hi Dan,

    Thanks! Google’s dependence on links might not go away completely – it’s still an integral part of how pages are ranked on the search engine, even though they might not carry as much weight as they once did. Anyone relying solely upon links may want to consider expanding their marketing strategy to consider other signals as well.

    Looking forward to a fun and interesting new year – hope you have a good one.

  11. Hi Dillip,

    Someone asked me at a search conference in 2007, “What’s new in search?” and I mentioned things like phrase-based indexing and named entities. We’re likely to see an even bigger influence from them as we move forward.

    I agree that search engines attempting to decipher human psychological behavior and how we relate different words and different entities together will bring some significant changes to us.

    Happy New Year to you, too!

  12. Hi Andre,

    Thanks. Google has been applying an understanding of entities to more than just local search, and while it’s not a bad idea to make it easier for the search engines to understand the entities that appear on your pages using things like Schema.org metadata, it’s a process they are working on regardless of that kind of markup.

    You have a great 2014, too.

  13. Hi Robert,

    Trying to get into the heads of searchers is something that both search engines and SEOs are striving for. At least Google has its query and click logs to give it ideas about what people are looking for, and it is using them in ways that might show things like what searchers also tend to look for after they’ve searched for a particular entity.

    We can get some hints from tools like the one at http://ubersuggest.org/ as to other things people are looking for, or the query refinements that Google will often show for particular queries.

  14. Hi Bill,

    So if I am understanding this correctly, Google is going to try to move away from relying so much on links for ranking, and is instead going to rely on Trust Anchors? In your example of John Wayne, the second result is a Wikipedia page about John Wayne, which I would assume has a link to the official John Wayne website on it. Would this mean it doesn’t matter if it’s a nofollow or dofollow link, because it’s passing authority rather than passing Page Rank or the number of inbound links a domain has?

    If so, then I think this is actually a very good idea on Google’s part, and in some ways justifies their pushing of the Google+ platform which may morph into Google’s version of Wikipedia wrapped with a bit of Social Networking. This may even make more sense with businesses in the Google local, as Google+ pages verified by Google may pass the same type of authority and relationships to the domain or business it is associated with as a Wikipedia trust at some point?

    I am totally new to all of this, but I can tell I am going to learn a lot here on your site. It looks like I have a lot of reading to do. Hope you have a great 2014! 🙂

  15. Thanks for another informative article. Although Google’s sophistication levels continue to increase, surely effective SEO remains beautifully simple. For me that means producing unique informative content and pushing it to those who may be interested in it. Hopefully that will remain the best way to be authoritative and relevant.

  16. What an extremely insightful article! Thank you Mr. Slawski for truly shedding some light on Website Entities. Can’t wait for another article as well!

Comments are closed.