How Google May Identify Main Entities


A Google patent granted this week describes how Google might try to understand the main entities that appear on Web pages, and how that awareness might influence what the search engine shows in search results.

An entity is a specifically named person, place, or thing (including ideas and objects) that could be connected to other entities based upon relationships between them. Some pages may focus upon certain entities as their main entities, while other pages may include additional information about entities that are related in some manner to those first entities. When some entities appear on pages, they may be presented in an ambiguous manner that doesn’t make them the main topic of the page they appear upon.

Entities are said to exist in a graph that connects them to other entities based upon relationships between them. For instance, Google and Bing are both search engines, both internet domains, both employers of many search engineers, and both have CEOs, Vice Presidents, Marketing staff, headquarters, data centers, and Web indexes. There are a lot of related entities that might show up on Web pages about both.
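To make the idea of an entity graph a little more concrete, here is a minimal sketch in Python. It is only an illustration, not how Google or Bing store this data: the triples, the relationship names ("is_a", "has"), and the function names are all assumptions that I made up for the example.

```python
from collections import defaultdict

# Hypothetical (entity, relationship, related entity) triples.
# These are illustrative only and do not come from the patent.
TRIPLES = [
    ("Google", "is_a", "Search Engine"),
    ("Bing", "is_a", "Search Engine"),
    ("Google", "has", "CEO"),
    ("Bing", "has", "CEO"),
    ("Google", "has", "Web index"),
    ("Bing", "has", "Web index"),
    ("Google", "has", "Data centers"),
]


def build_graph(triples):
    """Map each entity to the set of (relationship, related entity) pairs."""
    graph = defaultdict(set)
    for subject, relationship, related in triples:
        graph[subject].add((relationship, related))
    return graph


def shared_relationships(graph, first, second):
    """Return the relationships that two entities have in common."""
    return graph[first] & graph[second]


entity_graph = build_graph(TRIPLES)
common = shared_relationships(entity_graph, "Google", "Bing")
```

In this toy graph, `common` holds the three relationships that Google and Bing share, which is the kind of overlap that could lead related entities to show up on pages about both.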

This view of entities being related to each other, and belonging to an “Entity Graph,” is very similar to what is described in the Microsoft patent I wrote about recently in How Bing May Expand Queries Based upon Finding Entities Within them. A number of the ideas behind that patent and this one are similar, in that some knowledge about an entity might cause a search engine to display information about related entities.

This newly granted patent describes how Google may work to identify and understand main entities that appear upon pages, to be able to answer queries or questions about those pages.

The patent starts by trying to identify candidate entities for a page, and to understand an entity graph for that page and the relationships between those entities. It then looks at how those entities might best be represented when the search engine chooses that page to respond to a query.

Scoring an identified main entity

This process may start by generating a score for each identified entity that reflects how central it is to a page or resource. That score may be based at least in part on the weights of the outgoing edges of the node corresponding to that entity in a second entity graph; in other words, on how strong its relationships with the other entities are. The score may also be based on a count of the number of times the identified central entity appears in a query log of queries to a search engine, or “upon a frequency of occurrence of the identified central entity from the first resource.”
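The patent doesn’t give a formula for that score, so the following is only a rough sketch of how the three signals it names might be combined. The weighted sum, the default weights, and every name in the code are my own assumptions for illustration.

```python
from collections import Counter


def centrality_score(entity, graph, query_log_counts, resource_mentions,
                     w_edges=1.0, w_queries=0.5, w_frequency=0.5):
    """Combine three signals into one score (a hypothetical weighted sum).

    graph: maps an entity to {neighbor: edge weight} (its outgoing edges).
    query_log_counts: maps an entity to how often it appears in a query log.
    resource_mentions: list of entity mentions found in the resource.
    """
    # Signal 1: total weight of the entity's outgoing edges.
    edge_weight = sum(graph.get(entity, {}).values())
    # Signal 2: how many times the entity appears in the query log.
    query_count = query_log_counts.get(entity, 0)
    # Signal 3: how often the entity occurs in the resource itself.
    frequency = Counter(resource_mentions)[entity]
    return (w_edges * edge_weight
            + w_queries * query_count
            + w_frequency * frequency)


# Made-up example data for a page about a TV show.
graph = {
    "Arrow": {"Oliver Queen": 0.5, "Starling City": 0.25},
    "Oliver Queen": {"Arrow": 0.5},
}
query_log_counts = {"Arrow": 4}
resource_mentions = ["Arrow", "Oliver Queen", "Arrow"]

arrow_score = centrality_score("Arrow", graph, query_log_counts, resource_mentions)
queen_score = centrality_score("Oliver Queen", graph, query_log_counts, resource_mentions)
```

With this made-up data, “Arrow” scores higher than “Oliver Queen” because it has stronger edges, more query log appearances, and more mentions on the page. A real system would likely tune or learn how those signals are combined.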

Advantages described in the patent from this approach include:

  • Entities representing main topics of a resource can be identified
  • Entities representing more peripheral topics can be discarded
  • Topical entities from a resource directed at one topic can be more clearly identified
  • Entities mostly used in a different context than that of the resource (page, site) can be identified
  • A combined entity may be created from identified entities so that their scope is limited to the topics related to the resource.

Displaying Additional Entity Content

A user’s web browsing experience can be enhanced by providing additional content that is interesting and is relevant to resources that are being presented to the user. Because that additional content is generated using only main entities representing topics of the resources, the additional content’s relevance to the first resource and value to the user can be improved. For example, depending on the resource presented to the user, the additional content can include:

  • related video content
  • news content
  • image content
  • web pages
  • price comparison
  • map content
  • business listing content
  • and so on

Because so much variety of types of content can be added, the patent tells us that, “the user’s web browsing experience can be improved, and can be adjusted based on their browsing history.”

The patent is:

Identifying central entities
Invented by Tomer Shmiel, Ziv Bar-Yossef, Alexander Sobol, Eran Ofek, Haran Pilpel, Eldad Barkai, Yossi Matias
Assigned to Google
US Patent 9,009,192
Granted April 14, 2015
Filed: June 3, 2011

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for identifying central entities. In one aspect, a method includes obtaining candidate entities for a first resource;

  • Filtering a first entity graph whose nodes represent different entities found in a plurality of resources to remove nodes that do not correspond to a candidate entity, wherein pairs of nodes in the filtered first entity graph that are connected by an edge correspond to pairs of candidate entities that are associated with the same resource;
  • Generating a second entity graph for the first resource from the filtered first entity graph, wherein the second entity graph does not include nodes from the filtered first entity graph that are not connected to other nodes in the filtered first graph; and
  • Identifying candidate entities that are represented by nodes in the second entity graph as being main entities for the first resource.
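The steps in the abstract can be sketched in a few lines of Python. This is my own simplified reading of it, not the patent’s implementation: I am assuming an undirected graph stored as symmetric adjacency sets, and the function name and example entities are made up.

```python
def identify_main_entities(first_graph, candidates):
    """Return the second entity graph: connected candidate entities only.

    first_graph: maps each entity to the set of entities it shares a
    resource with (assumed symmetric). candidates: entities found on
    the page being analyzed.
    """
    candidates = set(candidates)

    # Step 1: filter the first graph down to candidate entities, keeping
    # only the edges that connect two candidates.
    filtered = {
        node: {neighbor for neighbor in neighbors if neighbor in candidates}
        for node, neighbors in first_graph.items()
        if node in candidates
    }

    # Step 2: build the second graph by dropping nodes that are no longer
    # connected to any other node in the filtered graph.
    second_graph = {
        node: neighbors for node, neighbors in filtered.items() if neighbors
    }

    # Step 3: the nodes that remain are treated as the main entities.
    return second_graph


# Made-up example: "Toyota" appears on the page, but none of its
# relationships are to other entities found on the page.
first_graph = {
    "Arrow": {"Oliver Queen", "Stephen Amell"},
    "Oliver Queen": {"Arrow", "Stephen Amell"},
    "Stephen Amell": {"Arrow", "Oliver Queen"},
    "Toyota": {"Japan"},
    "Japan": {"Toyota"},
}
candidates = ["Arrow", "Oliver Queen", "Toyota"]

main_entities = sorted(identify_main_entities(first_graph, candidates))
```

Here “Toyota” is a candidate, but it ends up isolated after filtering and is discarded, leaving “Arrow” and “Oliver Queen” as the main entities. That matches the advantage listed earlier about discarding entities representing more peripheral topics.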

Take Aways

Knowing which entities are important to a page can influence what other pages and results a search engine might show when that page is a good response to a query. Those can include things like news stories, videos, images, and more. This is how a Knowledge Base can influence search results.

If I search for a popular TV series such as “Arrow” or “The Flash,” the search engine can likely look through a source such as its query log files and identify the people who are important figures acting in the show, important characters from the show, and important plot lines and themes. It might use knowledge base information to help inform that knowledge even more. It can identify which entities associated with the shows are central figures, and which might be less important. It may gain a sense of what the shows have in common, such as both having origins in the world of comic books and both being written about in many of the same sources.

If you switch that thinking over from TV shows to business entities, like Microsoft or Google or Apple, the same approach applies. A search engine could look through a source such as Google’s query log files, and knowledge base sources such as Wikipedia and Freebase, to learn more about the entity graph and themes associated with those businesses. From that, it could learn what additional content to display in search results when someone performs a relevant query.

Google Related News
Related News for a story might be identified by looking at the central entity it is about.

An awareness of related entities on a topic that you care about when creating content about it on the Web can be helpful. It’s something that you should likely investigate.

I’ve written a few posts about named entities. These are some that I wanted to share:

Last Updated June 26, 2019.


18 thoughts on “How Google May Identify Main Entities”

  1. Hi Nadia,

    Google being able to look through query log files to see what people searched for before they choose a page, provides them with a lot of information, such as, in this case, examples of entities that might be the most central entities for specific pages on a site.

  2. This is really a major up-gradation in the algorithm, very tough to identify primary entities and establish the relationship with other entities. This would require a really smart bot which could gauge the context of the web-page first and then goes on to identify the entities.

  3. “the user’s web browsing experience can be improved, and can be adjusted based on their browsing history.”

    Am I correct in saying this is leading towards a tailored search experience depending on the user/machine? As in, your results would be different to another user’s based on your search history?

  4. Hi Bill!

    Really interesting! This is going to be a major algorithm up-gradation for sure.

    Best Regards
    Miraj Gazi

  5. Hi Lee,

    It’s saying that a person’s search history can influence the search results they are shown – so yes, there’s an element of personalization described by the patent’s writers.

  6. Way overdue in my opinion. This will positively touch everything from search to rebranding.

  7. I do like that searches are improving in ways such as this but I almost feel like this would continually bring the user the same page and for multiple searches. Of course the algorithm still could be tweaked to prevent this I suppose?

  8. Hi Trish,

    There are a lot of pages on the web that cover the same topics, but different reasons why some pages and sites rank best for those queries. Google having an approach to help it understand which topics might be the central topic for a page or a site doesn’t necessarily mean that the same page would be the top ranking page in Google’s search results for that query.

  9. Came across this amazing stuff , wanted to say I really enjoyed it . In this article you have shared new and great information that is really helpful for all of us . Thanks for sharing.

  10. Hi Bill, Google are trying to fine tune and, as you say, “personalise” search results. It’s all about giving the user the best results they can based on more specific data that is segmented more and more. Thanks for the article

  11. Google is constantly updating its algorithm and it’s really important to keep the content as good as possible. Your article adds more insight to that.

  12. Interesting article Bill, I think the idea in principle will be brilliant for many users. I think that search engines need to be careful that they don’t overdo this, as it might cause harm to the user’s search if they just provide information which they “think” the user will want to see. There are also questions like how exactly this personalisation will be enforced. I can see how it might work with a user being logged into something like Gmail, but for a family of various ages that share a home computer, this just won’t work at all.

  13. Every time i come onto this blog. I get something new to learn. Today again i come to know about this concept. Thank You so, much for sharing these kinds of article.
    This helps me to coming more closer to Google.
    Thanks For This Article.

  14. @Bill,

    When is the next major algorithm update coming? Will the next update cause any havoc in the SEO industry?

    Mia

  15. Hi Mia,

    I have no answer for you. Google will update when they decide to, and they don’t provide much in the way of warnings about those. Google historically has released 500-600 updates of their algorithms over the past few years, and the next one that impacts a lot of sites will happen when it happens.
