Entity Associations with Websites and Related Entities

When we talk about how web sites are related, it’s not unusual for us to talk about links between sites and pages. Google pays a lot of attention to such links, and they are at the heart of one of its best-known ranking signals, PageRank. PageRank is now more than 15 years old, predating the origin of Google itself in the BackRub search engine.

Google is exploring other signals that may be used to rank pages in search results, including social signals that may result in reputation scores for authors, in relationships between words that might appear together on pages ranking for the same queries, and in relationships between pages that show up in the same search results and in the same search sessions. The Google paper presented at an October 2013 natural language processing conference, Open-Domain Fine-Grained Class Extraction from Web Search Queries (pdf), provides some interesting hints at a possible Google of the future.

Google also seems to be very interested in building a knowledge base of concepts that better understands things like what different businesses or entities are ‘Known for’ or by defining entities better in ‘is a’ relationships. Sometimes pages for specific entities show up at the top of search results because they seem to be the page that people are looking for when they include that entity within a query, like the first two results on a search for [Roald Dahl], as seen in the image below:

Search results showing authoritative results for Roald Dahl and then results for books he wrote.

A Google patent application published earlier this year also explores drawing connections between different named entities (specific people, places, or things) by looking more closely at how certain entities might be associated with specific websites, and by understanding “related entities” for those original entities.

For example, on a search for “John Wayne,” the official John Wayne website shows up as the top result in Google and the second result is the John Wayne Wikipedia page. It’s possible that those rank well not necessarily because of what we might think of as traditional ranking signals such as PageRank and information retrieval scores based upon relevance, but rather because they are pages that have been identified as authoritative on the entity “John Wayne,” and great responses to those queries as navigational results.

While the Roald Dahl search result from the patent application shows books authored by Roald Dahl, the Knowledge Panel result for John Wayne shows movies that he has starred in, and other people whom searchers also look for when they search for John Wayne.

Knowledge Panel at Google for John Wayne

How similar are the processes for including related entities within a set of search results, and including related entities within a knowledge panel in Google results? This patent application tells us that it looks at search results to try to identify related entities, while knowledge panel results appear to also draw on query log files to find things that people search for alongside an entity that triggers a knowledge panel result. The patent filing is:

Related Entities
Invented by Peter Jin Hong, Pravir K. Gupta, Nathaniel J. Gaylinn, Ramakrishnan Kazhiyur-Mannar, Kavi J. Goel, Omer Bar-or, Jack W. Menzel, Christina R. Dhanaraj, Jared L. Levy, Shashidhar A. Thakur, Grace Chung, and Benson Tsai
US Patent Application 20130238594
Published September 12, 2013
Filed: February 22, 2013

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying entities that are related to an entity to which a search query is directed. One of the methods includes:

  • Receiving a search query, wherein the search query has been determined to relate to a first entity of a first entity type, and wherein one or more entities of a second entity type have a relationship with the first entity;
  • Receiving search results for the search query;
  • Determining that a count of search results identifying a resource containing a reference to the first entity satisfies a first threshold value;
  • Determining that a count of search results identifying a resource having the second entity type as a relevant entity type satisfies a second threshold value; and
  • Transmitting information identifying the one or more entities of the second entity type as part of the response to the search query.
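The two threshold checks in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SearchResult` class, the function name, the default threshold values, and the reading of "satisfies a threshold" as "greater than or equal to" are all hypothetical and do not come from the patent filing.

```python
from dataclasses import dataclass, field


@dataclass
class SearchResult:
    """Hypothetical stand-in for one search result and its annotations."""
    url: str
    mentioned_entities: set = field(default_factory=set)  # entities the resource refers to
    relevant_types: set = field(default_factory=set)      # entity types the resource is relevant to


def related_entities_to_show(results, first_entity, second_type, related,
                             min_entity_refs=3, min_type_hits=2):
    """Return the related entities only if both count thresholds are met.

    The threshold defaults are invented for this sketch.
    """
    # Count results whose resource contains a reference to the first entity.
    entity_refs = sum(1 for r in results if first_entity in r.mentioned_entities)
    # Count results whose resource has the second entity type as a relevant type.
    type_hits = sum(1 for r in results if second_type in r.relevant_types)
    if entity_refs >= min_entity_refs and type_hits >= min_type_hits:
        return list(related)
    return []


# Made-up result set for a query about Roald Dahl (first entity, type "author"),
# with "book" as the second entity type.
results = [
    SearchResult("https://example.com/a", {"Roald Dahl"}, {"book"}),
    SearchResult("https://example.com/b", {"Roald Dahl"}, {"book"}),
    SearchResult("https://example.com/c", {"Roald Dahl"}, set()),
    SearchResult("https://example.com/d", set(), set()),
]
books = ["Matilda", "The BFG"]
shown = related_entities_to_show(results, "Roald Dahl", "book", books)
```

With this made-up data, three results reference the first entity and two are relevant to the "book" type, so both thresholds are met and the related books are returned with the response.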

Here’s an abbreviated look at the process described in the patent filing, using images from the patent application:

A flowchart from the patent showing the creation of an association between a query and a web page.

Search results from a query are explored to see whether or not there are authoritative resources for an entity within them. If so, then those results are said to be targeted towards that entity.

Screenshot from the patent showing the identification of related entities for the query.

If the search result titles and snippets also contain related entities, they may be identified and included within a database of related entities.
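As a rough illustration of that step, the sketch below scans result titles and snippets for a list of already-known candidate entities and stores the counts. The substring matching, the function name, the dictionary used as a "database," and the sample titles and snippets are all assumptions made for this example; the patent filing does not describe its entity recognition at this level of detail.

```python
from collections import Counter


def find_related_entities(results, known_entities):
    """Count how many results mention each known entity in the title or snippet."""
    counts = Counter()
    for title, snippet in results:
        text = f"{title} {snippet}".lower()
        for entity in known_entities:
            # Naive case-insensitive substring match, standing in for real entity recognition.
            if entity.lower() in text:
                counts[entity] += 1
    return counts


# Invented (title, snippet) pairs for a query about Roald Dahl.
results = [
    ("Roald Dahl - Official Site", "Stories including Matilda and The BFG."),
    ("Matilda (novel)", "A children's book by Roald Dahl."),
    ("Roald Dahl biography", "British author of novels and short stories."),
]

# A plain dict plays the role of the related-entities database here.
related_db = {"Roald Dahl": find_related_entities(results, ["Matilda", "The BFG"])}
```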

Screenshot showing the ordering of related entities and their inclusion in a database.

The patent does tell us that these related entities might be presented in a ranked order, and provides some of the signals that could be used to order the related entities. (Note that there’s not a link involved at all.)

Ranking scores for Related entities can be based at least in part on:

  • How often someone searches for the related entity after submitting a query for the first entity.
  • How globally popular the related entity might be (sounds like search volume).
  • How often a recognized reference to the related entity co-occurs in a same previously submitted query as a recognized reference to the original entity.
  • Whether there is data indicating that two or more of the related entities of the second entity type are members of a set of entities that has a specified order, in which case the ranking can match that order. (For example, if the entity is a person with children, the children are usually listed in birth order.)
  • If there is data indicating that two or more of the related entities are better known as being part of a broader entity; and replacing them with the broader entity in ordering of the related entities.
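To make the ordering signals above more concrete, here is a minimal sketch that combines the first three into a weighted score and lets a known natural order (such as birth order) override it. The weights, field names, and sample numbers are hypothetical, since the patent filing publishes none, and the "broader entity" replacement signal is left out to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class RelatedEntitySignals:
    name: str
    follow_up_rate: float  # share of sessions that search this entity after the first entity
    popularity: float      # global popularity, normalized to 0..1
    co_occurrence: float   # share of past queries naming both entities


# Hypothetical weights; nothing like these appears in the patent filing.
WEIGHTS = {"follow_up_rate": 0.5, "popularity": 0.2, "co_occurrence": 0.3}


def score(s):
    """Weighted sum of the three numeric signals."""
    return (WEIGHTS["follow_up_rate"] * s.follow_up_rate
            + WEIGHTS["popularity"] * s.popularity
            + WEIGHTS["co_occurrence"] * s.co_occurrence)


def rank_related(signals, natural_order=None):
    """Rank by score, unless a known natural order covers every entity."""
    if natural_order is not None:
        position = {name: i for i, name in enumerate(natural_order)}
        if all(s.name in position for s in signals):
            return [s.name for s in sorted(signals, key=lambda s: position[s.name])]
    return [s.name for s in sorted(signals, key=score, reverse=True)]


# Invented signal values for three placeholder entities.
candidates = [
    RelatedEntitySignals("Entity A", 0.10, 0.90, 0.05),
    RelatedEntitySignals("Entity B", 0.40, 0.30, 0.20),
    RelatedEntitySignals("Entity C", 0.20, 0.50, 0.10),
]
ranked = rank_related(candidates)
```

In this sketch the very popular "Entity A" still ranks below "Entity B," because follow-up searches and co-occurrence are weighted more heavily than raw popularity. That weighting is a design choice of the example, not something the patent states.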

Take-Aways

When Google decides to associate an entity with a particular query, it may also identify whether or not there are “related” entities showing up in those search results within places like titles and snippets, and include those entities within the search results as well. This wouldn’t require matching keywords with the original query or a PageRank analysis.

The patent application shows how this would work within search results, but it seems to be applicable to knowledge panel results as well.

As Google’s knowledge base grows, things like relationships between entities will likely be a part of it.


19 thoughts on “Entity Associations with Websites and Related Entities”

  1. Great analysis Bill

    Adds to “is a” the connections of “is related to”, “is more important than”, “is more popular than” and others… When relating queries to entities to other entities.

    I tied a few of these concepts together in explaining query “expansion” in natural language search, where query analysis depends on a lot of relative questions to determine rankability potential and strength.

    And not a link in sight :-)

    Good stuff

    Cheers

  2. Thanks, Grant

    I’m pretty excited about this movement towards how search results can be expanded by understanding relationships between entities better, and by looking at semantic relationships uncovered in things like search results and in query logs. It’s pretty exciting watching how Google is evolving to take such relationships into account.

    I don’t think we’re going to see links disappear any time soon, nor will matching words in a query with words on potential search results pages go away, but they do seem like they are going to play a lesser role in the future.

  3. Thanks again for the breakdown Bill – sure saves time when trying to stay on top of things! This is all pretty interesting, and makes a lot of sense. Imo it’s a bit sad where all this has been going, though. It seems to solidify Search as an afterthought or a reaction to what exists. Instead of Search = discovery, Search = delivery and I think a lot of value is lost via that approach.

  4. Links, I am sure, will be phased out, and that would be good for websites as a whole. A lot of time is wasted building irrelevant links, and most of the time they are used to manipulate rankings. Great article as usual. Great start to the New Year. Thanks.

  5. Hello Bill,
    Another great article for understanding the future of search! No doubt, 2013 was the year Google experimented with integrating entities into search queries, a trend that started at the very beginning of last year. Though PR’s influence on ranking has greatly diminished, links still hold a major role as a deciding ranking factor. Yes, Google strongly constricted the space for spam links during the last two Penguin updates in 2013. Understanding entities means deciphering human psychological behavior, and I look forward to seeing how Google works this out through its algorithm.

    Wishing you and your fellow commenters a great year ahead!

  6. Does this incorporate the use of schema? That is, are they using schema to classify, for example, a Local Business, and then defining its relationship or relevance to the query?

    Thanks for the great article and all the best for 2014!

  7. Great article, as always, Bill. I’ve always been fascinated by the path and process of the searcher and how their queries evolve, not only in individual sessions, but over time as they become more accustomed to Google’s capabilities as a search engine. Anticipating the needs of the searcher is going to be more important than ever in 2014, but the truth is that those who have been thriving at SEO have long considered its impact and importance.

    My kingdom for access to search query string patterns for my client’s niches! Or for access to any substantial amount of query session data, really. Anyone have any recommendations on books or studies on searcher behavior?

  8. Hey Bill. Thanks for your contributions (again).

    I was reading and thinking .. yeah yeah old stuff, crafted SERPs, ranking only a factor for segments of the page (query-dependent), etc. and then I thought wow Bill has found his niche… “communicating” the relatively complex issues to “everyman”, at the niche (seo industry) level.

    And then the second half of your article highlighted (for me) that many SEO people don’t “see” how today’s Google has monetized them so comprehensively. Specifically, linking.

    SEOs used to argue that without SEOs, Google would have a much harder time knowing the good from the spam. This was because SEOs were “forced” to build increasingly-targeted, increasingly-authoritative content, even to the point where it wasn’t profitable to do so. The past few years have seen a destruction of that industry (partly because the profit was taken off the table), but some of the BASICS remain fixed in SEO publishing, such as co-citation.

    Everyone starts out linking out to the authorities, to earn a semantic relevance in the eyes of the crawler/classifiers.

    Well, look at your flowchart decision diamond “404” above — if ranking URLs are supporting the authority of an “entity”, then the traffic intent is assumed to be navigational for that entity (or that entity’s URLs).

    The obvious take-away is if WidgetCo ranks dozens of owned pages for Topic A, they all support the idea that searcher wants WidgetCo. That’s SEO strategy from 2012-13.

    Another view: if ranking URLs link to WidgetCo URL as authority for Topic “A”, then user wants WidgetCo (or WidgetCo page with highest relevance for Topic A). SEO strategy from 2011-12 (link networks, blog networks, guest articles, domain stacking, etc).

    So in reality, co-citation (anywhere) pointing to authorities can be used to SUBVERT your ranking position, because (as you note) Google can determine that your own support for the authority of The Entity (and its owned pages) means even you agree the user probably wants THEM. There are only “10” results on page 1, so someone has to go to page 2.

    This is not new… this is part of what has been labeled “brand preference” for over a year. In my opinion it’s also the root of much of the poor-quality SERPs we’ve been served… Google can’t tell criticism from praise. It has a hard time telling a “compare” co-citation from a “contrast” co-citation (and based on my observations, is using domain factors to make that decision).

    So Google uses SEO efforts (to find and associate owned content with most relevant/authoritative resources indexed) as a way to bypass said owned content – a Judo approach to SEO fighting. That’s one of the parts that Google left in place during the SEO attacks of 2013: you’ve been safe to link out to brands without nofollows, partly because Google can use that against you.

    To sum up my lengthy note: co-citation can hurt you.

    Of course it’s strongly query-dependent (intent-dependent) and like most modern SEO, not a simple matter even when dissected into meaningful parts like this. But it’s definitely NOT true that co-citation is either helpful or benign.

  9. Hi Chase,

    You’re welcome. I’m not sure that the value of search as a means of discovery is lost through a process like this. In cases where people are searching for information on topics that they might not know a lot about, surfacing related entities that might show up in search results for the same initial query, or in searchers’ queries during search sessions that happen at the same time as the initial query does provide additional options for searchers to explore if they want to click and do so. This approach seems to open up doors to things that searchers might not have otherwise looked at before.

  10. Hi Dan,

    Thanks! Google’s dependence on links might not go away completely. Links are still an integral part of how pages are ranked on the search engine, even though they might not carry as much weight as they once did. Anyone relying solely upon links may want to consider expanding their marketing strategy to consider other signals as well.

    Looking forward to a fun and interesting new year – hope you have a good one.

  11. Hi Dillip,

    Someone asked me at a search conference in 2007, “What’s new in search?” and I mentioned things like phrase-based indexing and named entities. We’re likely to see them have an even bigger influence as we move forward.

    I agree that search engines attempting to decipher human psychological behavior, and how we relate different words and different entities together, will bring some significant changes to us.

    Happy New Year to you, too!

  12. Hi Andre,

    Thanks. Google has been applying an understanding of entities to more than just local search, and while it’s not a bad idea to make it easier for the search engines to understand the entities that appear on your pages using things like Schema.org metadata, it’s a process they are working on regardless of that kind of markup.

    You have a great 2014, too.

  13. Hi Robert,

    Trying to get into the heads of searchers is something that search engines and SEOs are both striving for. At least Google has its query and click logs to look at to give it ideas about what people are looking for, and is using them in ways that might show things like what searchers also tend to look for after they’ve searched for a particular entity.

    We can get some hints from tools like the one at http://ubersuggest.org/ as to other things people are looking for, or the query refinements that Google will often show for particular queries.

  14. Hi Bill,

    So if I am understanding this correctly, Google is going to try to move away from ranking and relying so much on links, and is instead going to rely on Trust Anchors? In your John Wayne example, the second result is the Wikipedia page about John Wayne, which I would assume has a link to the official John Wayne website on it. Would this mean it doesn’t matter if it’s a nofollow or dofollow link, because it’s passing authority rather than passing PageRank or the number of inbound links a domain has?

    If so, then I think this is actually a very good idea on Google’s part, and in some ways justifies their pushing of the Google+ platform, which may morph into Google’s version of Wikipedia wrapped with a bit of social networking. This may make even more sense for businesses in Google local, as Google+ pages verified by Google may at some point pass the same type of authority and relationships to the domain or business they are associated with as a Wikipedia page does.

    I am totally new to all of this, but I can tell I am going to learn a lot here on your site. It looks like I have a lot of reading to do. Hope you have a great 2014! :)

  15. Thanks for another informative article. Although Google’s sophistication levels continue to increase, surely effective SEO remains beautifully simple. For me that means producing unique, informative content and pushing it to those who may be interested in it. Hopefully that will remain the best way to be authoritative and relevant.

  16. What an extremely insightful article! Thank you, Mr. Slawski, for truly shedding some light on Website Entities. Can’t wait for another article as well!
