Understanding Relationships such as Entity Associations
When we talk about the relationships between websites, it’s not unusual for us to talk about links between sites and pages. Google pays a lot of attention to such links. They are at the heart of one of its best-known ranking signals: PageRank. PageRank is more than 15 years old; it predates Google itself, originating in the BackRub search engine.
Google is also exploring other signals for ranking pages in search results. These include social signals, such as reputation scores for authors. Google may also look at relationships between words that appear together on pages ranking for the same queries, and at relationships between pages that appear in the same search results and in the same search sessions. A Google paper presented at an October 2013 natural language processing conference, Open-Domain Fine-Grained Class Extraction from Web Search Queries, provides some interesting hints about what a future Google might look like.
Entity Associations are Part of the Future of SEO
Google wants to build a knowledge base of concepts to better understand things like what different businesses or entities are ‘Known for’. The search engine is also interested in better defining entities through ‘is a’ relationships. Pages for specific entities may show up at the top of search results because they seem to be the pages people are looking for when that entity appears in a query. For example, consider the first two results on a search for [Roald Dahl], as seen in the image below:
Drawing Connections Between Different Named Entities with Entity Associations
A Google patent application on related entities published earlier this year also explores drawing connections between different named entities, which could be specific people, places, or things. It does this by looking at entity associations with specific websites and by identifying “related entities” for those original entities. An entity association is a connection between a specific entity and a particular website. It may exist because the site is authoritative for that entity, or because a page from the site is a navigational result for a query that includes that entity.
On a search for “John Wayne,” the official John Wayne website is the top result in Google, and the second result is the John Wayne Wikipedia page. Those pages may rank well not because of traditional ranking signals such as PageRank and relevance-based information retrieval scores, but because they are authoritative for the entity “John Wayne” and make great navigational responses to those queries.
What is In A Knowledge Panel for An Entity?
While the Roald Dahl search result from the patent application shows books authored by Roald Dahl, the Knowledge Panel result for John Wayne shows movies he starred in, along with related entities: other people that searchers also look for when they search for John Wayne.
How similar are the processes for including related entities within a set of search results and within a knowledge panel in Google results? The patent application tells us that it looks at search results to try to identify related entities. Knowledge panel results, on the other hand, also appear to draw on query logs to find things that people also search for when they search for an entity that triggers a knowledge panel. The patent filing is:
Invented by Peter Jin Hong, Pravir K. Gupta, Nathaniel J. Gaylinn, Ramakrishnan Kazhiyur-Mannar, Kavi J. Goel, Omer Bar-or, Jack W. Menzel, Christina R. Dhanaraj, Jared L. Levy, Shashidhar A. Thakur, Grace Chung, and Benson Tsai
US Patent Application 20130238594
Published September 12, 2013
Filed: February 22, 2013
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying entities related to an entity that a search query refers to. One of the methods includes:
- Receiving a search query, wherein the search query relates to a first entity of a first entity type, and where entities of a second entity type have a relationship with the first entity;
- Obtaining search results for the search query;
- Determining that a count of search results identifying a resource containing a reference to the first entity satisfies a first threshold value;
- Determining that a count of search results identifying a resource having the second entity type as a relevant entity type satisfies a second threshold value; and
- Transmitting information identifying one or more entities of the second entity type as part of the response to the search query.
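The claim steps above can be sketched roughly in Python. This is a minimal, hypothetical sketch rather than Google’s implementation: the `SearchResult` fields, the `related` lookup table keyed by (entity, entity type), and reading “satisfies a threshold” as “is greater than or equal to” are all assumptions, not details from the patent application.

```python
from dataclasses import dataclass, field


@dataclass
class SearchResult:
    """Hypothetical search result; these fields are illustrative, not from the patent."""
    url: str
    referenced_entities: set[str] = field(default_factory=set)
    relevant_entity_types: set[str] = field(default_factory=set)


def related_entities_for_query(
    first_entity: str,
    second_entity_type: str,
    results: list[SearchResult],
    related: dict[tuple[str, str], list[str]],
    first_threshold: int,
    second_threshold: int,
) -> list[str]:
    """Return entities of second_entity_type related to first_entity,
    or an empty list if either threshold test fails."""
    # Count results whose resource contains a reference to the first entity.
    mentions_first = sum(1 for r in results if first_entity in r.referenced_entities)
    # Count results whose resource has the second entity type as a relevant type.
    has_second_type = sum(
        1 for r in results if second_entity_type in r.relevant_entity_types
    )
    # Assumption: "satisfies a threshold" means "is at least the threshold".
    if mentions_first >= first_threshold and has_second_type >= second_threshold:
        # Information to transmit with the response to the search query.
        return related.get((first_entity, second_entity_type), [])
    return []
```

For a [Roald Dahl] query, `second_entity_type` might be something like “book”, so the function would return Roald Dahl’s books only when enough results both mention him and are about books.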
A Look at the Entity Association Process
Here’s an abbreviated look at the entity association process described in the patent filing, using images from the related entities patent application:
Are There Authoritative Resources for an Entity on the Web?
The search results for a query are checked to see whether they contain authoritative resources for an entity. If they do, those results are shown for that entity.
If the titles and snippets of those search results contain related entities, those entities may be stored in a related entities database.
The patent tells us that these related entities might be ranked, and it provides some of the signals used to order them. (Note that there’s not a link involved at all.)
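A rough sketch of the two steps above, under loudly stated assumptions: `authoritative_urls` stands in for however Google decides a resource is authoritative for an entity, `related_entity_db` stands in for the related entities database, and plain case-insensitive substring matching stands in for real entity recognition in titles and snippets. None of these names come from the patent filing.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical search result with only the fields this sketch needs."""
    url: str
    title: str
    snippet: str


def associate_and_extract(
    entity: str,
    results: list[Result],
    authoritative_urls: dict[str, set[str]],
    related_entity_db: dict[str, list[str]],
) -> tuple[bool, list[str]]:
    """Return (is_associated, related entities found in titles and snippets)."""
    # Step 1: does any result come from a resource treated as authoritative for the entity?
    known = authoritative_urls.get(entity, set())
    associated = any(r.url in known for r in results)
    if not associated:
        return False, []
    # Step 2: look for known related entities in the result titles and snippets.
    # Naive substring matching is an assumption, not real entity recognition.
    text = " ".join(f"{r.title} {r.snippet}" for r in results).lower()
    found = [
        name for name in related_entity_db.get(entity, []) if name.lower() in text
    ]
    return True, found
```

Note that neither step looks at links or at keyword matches with the original query; the check relies only on which resources rank and what their titles and snippets mention.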
Ranking Scores for Related Entities
These scores can be based in part on:
- Whether searchers search for the related entities after submitting a query for the first entity.
- Whether a recognized reference to a related entity co-occurs with a recognized reference to the original entity in the same previously submitted query.
- Whether data indicates that two or more of the related entities of the second entity type are members of a set of entities that has a specified order, in which case the related entities may be ordered to match it (for example, if the entity is a person with children, the children are usually listed in birth order).
- Whether data indicates that two or more of the related entities are better known as part of a broader entity, in which case they may be replaced with that broader entity when the related entities are ordered.
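The patent lists these signals but, as described here, not how they are combined. The sketch below assumes a simple weighted sum of the first two signals (a follow-up search rate and a same-query co-occurrence rate), then applies the two ordering adjustments. The weights, the rates, and the rule of reordering ordered-set members within the positions they already hold are all hypothetical choices for illustration.

```python
from typing import Dict, List, Optional


def rank_related_entities(
    candidates: List[str],
    followup_rate: Dict[str, float],
    cooccurrence_rate: Dict[str, float],
    known_order: Optional[List[str]] = None,
    broader: Optional[Dict[str, str]] = None,
    w_followup: float = 0.5,
    w_cooccur: float = 0.5,
) -> List[str]:
    """Order related entities using hypothetical versions of the listed signals."""
    # Signals 1 and 2: follow-up searches and same-query co-occurrence (assumed rates).
    scores = {
        c: w_followup * followup_rate.get(c, 0.0)
        + w_cooccur * cooccurrence_rate.get(c, 0.0)
        for c in candidates
    }
    # Signal 4: replace members of a broader entity with that entity,
    # keeping the best score among its members.
    if broader:
        merged: Dict[str, float] = {}
        for entity, score in scores.items():
            key = broader.get(entity, entity)
            merged[key] = max(score, merged.get(key, float("-inf")))
        scores = merged
    ranked = sorted(scores, key=lambda e: scores[e], reverse=True)
    # Signal 3: if two or more ranked entities belong to an ordered set (e.g. children
    # by birth order), rearrange them into that order within the positions they hold.
    if known_order:
        slots = [i for i, e in enumerate(ranked) if e in known_order]
        if len(slots) >= 2:
            in_order = sorted((ranked[i] for i in slots), key=known_order.index)
            for slot, entity in zip(slots, in_order):
                ranked[slot] = entity
    return ranked
```

Reordering only within the occupied positions is one way to honor a known order without letting it override the scores of unrelated entities; a real system could just as plausibly group ordered members together.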
Entity Associations Take-Aways
When Google decides to associate an entity with a particular query, it may also check whether related entities show up in those search results in places like titles and snippets, and it may then include those entities within the search results. Again, this wouldn’t require matching keywords from the original query or a PageRank analysis.
The patent application shows how this would work within search results, but the same approach seems to apply to knowledge panel results as well.
As Google’s knowledge base grows, things like Entity Associations and related entities will continue to be a part of it.
I’ve written a few posts about named entities. These are some that I wanted to share:
- Do You Have a Named Entity Strategy for Marketing Your Web Site?
- How I Came to Love Entities and Start Doing Entity Optimization
- How Google Uses Named Entity Disambiguation for Entities with the Same Names
- How Named Entities Connected to Trending Topics can address real time search results
- Not Brands but Entities: The Influence of Named Entities on Google and Yahoo Search Results
- How Knowledge Base Entities Rank in Searches
- Finding Entity Names in Google’s Knowledge Graph
- Google Gets Smarter with Named Entities: Acquires MetaWeb
- Entity Associations with Websites and Related Entities
- How Google Might Identify Entity Synonyms Using Anchor Text
- Extracting Facts for Entities from Sources such as Wikipedia Titles and Infoboxes
- Extracting Semantic Classes and Corresponding Instances from Web Pages and Query Logs
- How Google May Identify Main Entities
- How Google’s Knowledge Graph Updates Itself by Answering Questions
Last Updated June 26, 2019
19 thoughts on “Entity Associations with Websites and Related Entities”
I’m pretty excited about this movement towards how search results can be expanded by understanding relationships between entities better, and by looking at semantic relationships uncovered in things like search results and in query logs. It’s pretty exciting watching how Google is evolving to take such relationships into account.
I don’t think we’re going to see links disappear any time soon, nor will matching words in a query with words on potential search results pages go away, but they do seem like they are going to play a lesser role in the future.
Great analysis Bill
Adds to “is a” the connections of “is related to”, “is more important than”, “is more popular than” and others… When relating queries to entities to other entities.
I tied a few of these concepts together in explaining query “expansion” in natural language search, where query analysis depends on a lot of relative questions to determine rankability potential and strength.
And not a link in sight 🙂
Links, I am sure, will be phased out, and that would be good for websites as a whole. A lot of time is wasted building irrelevant links, and most of the time they are built to manipulate rankings. Great article as usual. Great start to the New Year. Thanks.
Thanks again for the breakdown Bill – sure saves time when trying to stay on top of things! This is all pretty interesting, and makes a lot of sense. Imo it’s a bit sad where all this has been going, though. It seems to solidify Search as an afterthought or a reaction to what exists. Instead of Search = discovery, Search = delivery and I think a lot of value is lost via that approach.
Does this incorporate the use of schema, i.e. are they using the schema to classify, for example, a Local Business, and then defining its relationship or relevance to the query?
Thanks for the great article and all the best for 2014!
Another great article for understanding the future of search! No doubt, 2013 was the year Google experimented with integrating entities into search queries, and that trend started at the very beginning of last year. Though PR’s influence on ranking has been greatly diminished, links still hold a major role as a deciding ranking factor. Yes, Google strongly constricted the space for spam links during the last two Penguin updates in 2013. Understanding entities means deciphering human psychological behavior, and I look forward to seeing how Google will successfully work this out through its algorithm.
Wish you and fellow commenters a great year ahead!
Hey Bill. Thanks for your contributions (again).
I was reading and thinking .. yeah yeah old stuff, crafted SERPs, ranking only a factor for segments of the page (query-dependent), etc. and then I thought wow Bill has found his niche… “communicating” the relatively complex issues to “everyman”, at the niche (seo industry) level.
And then the second half of your article highlighted (for me) that many SEO people don’t “see” how today’s Google has monetized them so comprehensively. Specifically, linking.
SEOs used to argue that without SEOs, Google would have a much harder time knowing the good from the spam. This was because SEOs were “forced” to build increasingly-targeted, increasingly-authoritative content, even to the point where it wasn’t profitable to do so. The past few years have seen a destruction of that industry (partly because the profit was taken off the table), but some of the BASICS remain fixed in SEO publishing, such as co-citation.
Everyone starts out linking out to the authorities, to earn a semantic relevance in the eyes of the crawler/classifiers.
Well, look at your flowchart decision diamond “404” above — if ranking URLs are supporting the authority of an “entity”, then the traffic intent is assumed to be navigational for that entity (or that entity’s URLs).
The obvious take-away is if WidgetCo ranks dozens of owned pages for Topic A, they all support the idea that searcher wants WidgetCo. That’s SEO strategy from 2012-13.
Another view: if ranking URLs link to WidgetCo URL as authority for Topic “A”, then user wants WidgetCo (or WidgetCo page with highest relevance for Topic A). SEO strategy from 2011-12 (link networks, blog networks, guest articles, domain stacking, etc).
So in reality, co-citation (anywhere) pointing to authorities can be used to SUBVERT your ranking position, because (as you note) Google can determine that your own support for the authority of The Entity (and its owned pages) means even you agree the user probably wants THEM. There are only “10” results on page 1, so someone has to go to page 2.
This is not new… this is part of what has been labeled “brand preference” for over a year. In my opinion it’s also the root of much of the poor-quality SERPs we’ve been served… Google can’t tell criticism from praise. It has a hard time telling a “compare” co-citation from a “contrast” co-citation (and based on my observations, is using domain factors to make that decision).
So Google uses SEO efforts (to find and associate owned content with most relevant/authoritative resources indexed) as a way to bypass said owned content – a Judo approach to SEO fighting. That’s one of the parts that Google left in place during the SEO attacks of 2013: you’ve been safe to link out to brands without nofollows, partly because Google can use that against you.
To sum up my lengthy note: co-citation can hurt you.
Of course it’s strongly query-dependent (intent-dependent) and like most modern SEO, not a simple matter even when dissected into meaningful parts like this. But it’s definitely NOT true that co-citation is either helpful or benign.
Great article, as always, Bill. I’ve always been fascinated by the path and process of the searcher and how their queries evolve, not only in individual sessions, but over time as they become more accustomed to Google’s capabilities as a search engine. Anticipating the needs of the searcher is going to be more important than ever in 2014, but the truth is that those who have been thriving at SEO have long considered its impact and importance.
My kingdom for access to search query string patterns for my client’s niches! Or for access to any substantial amount of query session data, really. Anyone have any recommendations on books or studies on searcher behavior?
This is great information Bill, as always thanks for sharing your insight.
You’re welcome. I’m not sure that the value of search as a means of discovery is lost through a process like this. In cases where people are searching for information on topics they might not know a lot about, surfacing related entities that show up in search results for the same initial query, or in searchers’ queries during the same search sessions, does provide additional options for searchers to explore if they want to click through. This approach seems to open doors to things that searchers might not otherwise have looked at.
Thanks! Google’s dependence on links might not go away completely – it’s still an integral part of how pages are ranked on the search engine, even though they might not carry as much weight as they once did. Anyone relying solely upon links may want to consider expanding their marketing strategy to consider other signals as well.
Looking forward to a fun and interesting new year – hope you have a good one.
Someone asked me at a search conference in 2007, “What’s new in search?” and I mentioned things like phrase-based indexing and named entities. We’re likely going to see them have an even bigger influence as we move forward.
I agree that search engines attempting to decipher human psychological behavior, and how we relate different words and different entities together, will bring some significant changes to us.
Happy New Year to you, too!
Thanks. Google has been applying an understanding of entities to more than just local search, and while it’s not a bad idea to make it easier for the search engines to understand the entities that appear on your pages using things like Schema.org metadata, it’s a process they are working on regardless of that kind of markup.
You have a great 2014, too.
Trying to get into the heads of searchers is something that search engines and SEOs are both striving for. At least Google has its query and click logs to look at to give it ideas about what people are looking for, and it is using them in ways that might show things like what searchers also tend to look for after they’ve searched for a particular entity.
We can get some hints from tools like the one at http://ubersuggest.org/ as to other things people are looking for, or the query refinements that Google will often show for particular queries.
You’re welcome, Alex.
So if I am understanding this correctly, Google is going to try to move away from relying so much on links for ranking, and is instead going to rely on Trust Anchors? In your John Wayne example, the second result is a Wikipedia page about John Wayne, and I would assume that Wikipedia page has a link to the official John Wayne website on it? This would mean it doesn’t matter if it’s a nofollow or dofollow link, because it’s passing authority rather than PageRank or the number of inbound links a domain has?
If so, then I think this is actually a very good idea on Google’s part, and in some ways it justifies their pushing of the Google+ platform, which may morph into Google’s version of Wikipedia wrapped with a bit of Social Networking. This may make even more sense for businesses in Google Local, as Google+ pages verified by Google may at some point pass the same type of authority and relationships to the domain or business they are associated with as Wikipedia does?
I am totally new to all of this, but I can tell I am going to learn a lot here on your site. It looks like I have a lot of reading to do. Hope you have a great 2014! 🙂
Well, great post. Thanks for sharing.
Thanks for another informative article. Although Google’s sophistication continues to increase, surely effective SEO remains beautifully simple. For me that means producing unique, informative content and pushing it to those who may be interested in it. Hopefully that will remain the best way to become authoritative and relevant.
What an extremely insightful article! Thank you, Mr. Slawski, for truly shedding some light on Website Entities. Can’t wait for another article as well!