An Affiliated Page Link is a Link between Pages From the Same Owners
Might Google rank links to pages differently based on a perception of how related or affiliated those pages might be to each other? For instance, if three pages authored by the same person link to a fourth page, and two other pages, each written by other people, also link to that fourth page, should the three links from the same author count as passing along three times as much link weight as the links from the independently written pages?
A patent granted to Google today shows how the search engine might analyze how “affiliated” pages or sites are to each other, and how their degree of affiliation might influence the amount of weight passed along by each link.
So, for instance, a page that has two links pointing to another page might not pass along twice as much link weight as a single link from that page. A site with 20 links from its pages to another page may not pass along 20 times as much link weight as a link from one page would.
There are a few different ways that Google might determine how affiliated pages might be to each other, and the patent provides many examples of how pages or sites might be considered to be affiliated with one another.
Interlinking between pages and sites: For instance, Google might look at all the links between pages on the web, and pages or sites that are more closely interlinked to each other might be considered to be affiliated.
Traffic patterns: Pages or sites that are visited by many users in the same search or browsing session might also be considered to be affiliated.
Similarity of hostnames: Pages that share a domain name or are on subdomains of the same domain can be considered affiliated.
Similarity of IP addresses: The Internet Protocol (IP) addresses of two web servers may be compared, and if the leading two or three components (octets) of the IP address are identical, affiliation may be inferred.
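The IP-based test can be sketched roughly like this (the function name and the choice of comparing three octets, i.e. a /24 block, are my own illustrative assumptions, not details from the patent):

```python
def same_ip_block(ip_a: str, ip_b: str, octets: int = 3) -> bool:
    """Infer possible affiliation when the leading octets of two
    IPv4 addresses match (e.g. the first three octets, a /24 block)."""
    return ip_a.split(".")[:octets] == ip_b.split(".")[:octets]

# Servers in the same /24 block may be inferred to be affiliated:
same_ip_block("203.0.113.10", "203.0.113.77")   # True
same_ip_block("203.0.113.10", "198.51.100.77")  # False
```

As the patent suggests, this kind of check would likely be combined with other signals, since many unrelated sites share IP blocks under shared hosting.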
We don’t know if Google is using this method or not, but they may be. The patent was originally filed in 2004, and its inventors include Krishna Bharat, who, amongst many other things, invented Google News; Amit Singhal, who is Google’s present Head of Search Quality; and Paul Haahr, a co-inventor listed on several Google patents, including one on Information Retrieval based upon Historical Data, another on identifying meaningful stopwords in keywords, one on how multi-stage query processing might happen at Google, and one on how query refinements might be identified.
The patent is:
Determining quality of linked documents
Invented by Krishna Bharat, Amit Singhal, and Paul Haahr
Assigned to Google
US Patent 7,783,639
Granted August 24, 2010
Filed June 30, 2004
A ranking component ranks documents, such as web pages or websites, to obtain a ranking score that defines a quality judgment of the document. The ranking score of a particular document is based on the ranking score of the documents which link to it and based on affiliation among the documents.
To sum up the concept involved in this patent as succinctly as possible: a ranking score calculated for a page might be based upon a function that may (1) limit the value passed along through links from affiliated pages to some maximum value, while (2) adding independent values from non-affiliated pages.
So, for example, a sitewide link to a page might pass along more value than a single link from the same site, but the amount of link weight it passes along may be capped at some maximum amount, because all of the links are affiliated, since they appear on the same domain.
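The general shape of such a capped scoring function might be sketched as follows (the function name, the weights, and the specific cap value are illustrative assumptions, not details from the patent):

```python
def ranking_score(independent_weights, affiliated_groups, cap):
    """Add link weight from independent (non-affiliated) pages in full,
    but cap the combined contribution of each group of affiliated pages."""
    score = sum(independent_weights)
    for group in affiliated_groups:
        score += min(sum(group), cap)
    return score

# Twenty sitewide links of weight 1.0 from one affiliated site are
# capped at 3.0, while three independent links each count in full:
ranking_score([1.0, 1.0, 1.0], [[1.0] * 20], cap=3.0)  # 6.0
```

Under this sketch, twenty affiliated links together contribute no more than the cap, while each independent link adds its full value.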
Interestingly, the patent doesn’t mention PageRank, the original link-based ranking algorithm developed at Stanford by Google’s founders Larry Page and Sergey Brin, which analyzes links between pages to come up with a query-independent rank for those pages, as described in the Stanford patent Method for node ranking in a linked database.
But the process described in this patent does echo a number of the processes described in the PageRank patent, while also including a “set location component” that analyzes documents in a database (such as Google’s index of web pages) and groups those documents into sets of related documents. The patent shows a screenshot of a limited link graph of the Web, with some pages shown as affiliated sets.
Google’s patent does describe many alternative approaches to determine how much of a contribution value might be passed along to the final ranking score of a page being linked to by non-affiliated pages, and the maximum value that affiliated pages might pass along.
This Affiliated Page Link patent was filed the same month in 2004 as Google’s Reasonable Surfer patent, which told us that the weight or contribution of a link on a page to the ranking score of a page being linked to might vary from link to link based upon a score involving features of the link, the page the link appears upon, and the page being linked to.
While there’s nothing within either patent that indicates they might be related to each other, it’s possible that both may be, or may have been, used by Google.
Google uses many ranking signals, and some are query-dependent signals that depend upon the specific query used to rank pages in search results, such as whether or not the query terms appear upon the pages themselves.
Others are query-independent signals that try to weigh the quality or importance of a page, such as PageRank, which looks at the quality and quantity of links pointing to a page to determine how “important” that page might be.
The Affiliated Page Link approach tries to limit a calculation of the importance score of a page by links pointing to that page based upon whether or not the source of those links is perceived to be independent or not. The patent tells us, for instance, that “additional links by the same author…should not excessively raise the ranking score” of a document.
I’ve seen many people writing about sitewide links from the same domain surmise that those links don’t pass along as much link weight as they might if they were individual links from different domains. That would be likely under this affiliated page link approach. Is this patent the reason why they might not?
If you are a site owner, what sites might Google consider to be affiliated with your site or sites?
I’ve written a few posts about links. These were ones that I found interesting:
5/30/2006 – Web Decay and Broken Links Can be Bad for Your Site
12/11/2007 – Google Patent on Anchor Text Indexing and Crawl Rates
1/10/2009 – What is a Reciprocal Link?
5/11/2010 – Google’s Reasonable Surfer: How the Value of a Link May Differ Based upon Link and Document Features and User Data
8/24/2010 – Google’s Affiliated Page Link Patent
7/13/2011 – Google Patent Granted on PageRank Sculpting and Opinion Passing Links
11/12/2013 – How Google Might Use the Context of Links to Identify Link Spam
12/10/2014 – A Replacement for PageRank?
4/24/2018 – PageRank Update
Last Updated July 1, 2019.
68 thoughts on “Google’s Affiliated Page Link Patent”
Thank you very much for this extremely educational article. There are so many things to take into consideration when trying to rank in Google.
WOW, am I glad I stumbled on this article today. This changes quite a few things. I have already forwarded this link to my colleagues and we will definitely have a discussion about this in the office – this can potentially change our SEO strategy in the coming days.
I’ve always felt that sitewide links, or blogroll links providing hundreds or 1000’s of links, wouldn’t give your site the same benefit as 1000 different links from different domains. I love reading your patent research; you are one of the few who actually break down real data. Thanks for such a great detailed post. 🙂
A detailed explanation, thank you.
It makes sense that some links are stronger than others, and that yes, people have affiliated links from other sites that they own. As a blog creator/helper (I need a better job title), this may impact the links we gain through client sites if they are hosted by us.
It’s certainly something to think about.
So I guess we can refer to this as the Modulated PageRank Patent.
Is there some truth in becoming part of like-minded communities, offering valuable content to readers without seeking personal gain? If you always add good quality content to your site, then with time and diligence your audience, including Google, will find you?
For newbies, that seems like the best formula. It takes time to learn and implement. These articles are helpful; however, sometimes it takes 3, 4, or 10 times of reading, then doing, before they start to make sense.
Offering value is the key!
You’re welcome. Definitely there are lots of things to take into consideration. For example, there has been some speculation for a while that Google doesn’t pass along as much in the way of link-based ranking value for things like sitewide links, but no real official word on the topic from Google. While there’s no guarantee that Google has followed, or will follow, this patent, it’s good to see something from the search engine that says they might not give as much weight to links that are affiliated in some manner.
You’re welcome. I know some sites link to the people who designed their sites somewhere on their pages, often in their footers. Are those links enough for Google to consider the sites to be affiliated? I’m not sure, but I’d recommend obtaining links to your site in other ways as well regardless, whether by attracting those by creating great content and blog posts, or submitting to quality directories, or in other ways as well.
I spent some time trying to think of a name for either the patent or the variant or flavor of PageRank that it might be considered. As I mentioned in the post, the patent doesn’t actually even mention PageRank, which surprised me a little.
This patent does have some interesting implications that might suggest altering some strategies towards linkbuilding and SEO. I’ve always believed in pursuing links from a very wide variety of sources, and not relying upon any one approach too much.
It seems like an intuitive thing, that a search engine would be careful about giving too much credit for a sitewide link or a link in a blog roll, but it’s nice to get a statement like this patent that search engineers feel the same way. Thanks.
I think Google is definitely using this, especially with social media these days. It’s real easy for someone to create a bunch of articles and link to their original.
I wonder if ‘time’ is considered here as well. Google’s algorithm might know general time patterns that it takes for documents to link to certain types of new documents. And therefore, if say, 5 documents were created today about “xxxx”, then they may all be the same person/ company promoting themselves.
AS usual, great article and thanks for the insight!
Thanks. It’s quite possible that Google is using this approach. All of the major search engines have admitted in one place or another (patent filings, white papers, and blog posts) that they keep an eye on a link graph of the Web, and track changes to links from sites over time, to identify things such as abnormal levels of link growth or other unusual changes. The timing of the creation of new links and new pages might be something that indicates some affiliation between links.
Creating great content that people want to link to and discuss and refer others to, and being involved in a community of like minded people isn’t a bad start, but it potentially falls short in a number of ways.
It really helps to have a site that is search engine friendly, and that a search engine’s crawling programs will be able to crawl and index. It helps to make sure that you use words on your pages that people who might be interested in what you offer will likely use to search for your pages and expect to see on your pages. It helps to intelligently include those words in meaningful places on your pages, such as in page titles, headings, anchor text pointing to the pages, and in the content of the pages.
The web is filled with some great noncommercial pages and with some excellent commercial pages, so the concept of whether or not you might be seeking personal gain isn’t really something that search engines are too concerned with when ranking your pages in search results. Though there is the possibility that they might be more likely to send searchers to more commercially oriented sites when there appears to be an intent to buy something, and to more informational sites when there’s an indication of an intent to learn something rather than purchase.
I wrote a post not too long ago that addressed your question and some related ones, while discussing what SEO is, and you may find that helpful:
Thank you for the educational piece as usual. However, I must say that there are some additional filters on the sitewide links. Hopefully, I will find some patent on it. I am still stuck at the basics of IRS 101.
If your interpretation of this patent is correct, then I have even bigger regrets for switching to VPS hosting instead of shared hosting. With shared hosting, there are multiple IP addresses, so it’s probably better for SEO if you have multiple websites.
It is really surprising that it doesn’t mention PageRank when its concept is so much like PageRank. This patent will definitely change some things.
I wouldn’t worry about that. Please check Matt Cutts’ view on the topic. He has shared a lot of insight about it, and one that rings a bell is the Webmaster Videos on YouTube. As long as your host has done his job, then there is nothing to worry about. In fact, watch the video as he explains it in detail.
Remember the fact that most sites out there are running on shared hosting. Bluehost alone hosts millions.
Thanks. There are definitely some other things that may influence the impact of sitewide links, such as whether a search engine might decide to “merge” links that it finds on a page to the same page. Regardless of the existence of other filters or patents on the subject, this one is an additional one that we’re first seeing in writing from one of the search engines just now.
The IP address information by itself may not be one of the better indications of whether or not more than one site might be affiliated with another, and it’s possible that it might be looked at in combination with some other factors as well, for Google to make that determination.
I thought it was a little odd that the patent didn’t mention PageRank either, but after spending some time thinking about it, I’m not sure that it really needed to.
Matt has done a number of videos on IP addresses, but I don’t believe this particular topic has been raised or addressed by any of his videos – whether or not the value of links pointing to a page might be influenced by factors that show that the sites where those links come from might be shown to be affiliated in some manner, such as coming from IP addresses that might be very near to each other.
You are right though – most sites are running on shared hosting, and as far as the examples of how sites might be seen to be affiliated with one another, that one may be the weakest of the bunch.
You are right on that; I was referring to shared hosting and ranking. I doubt having your site on an IP with tons of other sites has any impact, unless of course there are 200 porn sites and yours is the only non-adult one. Which was addressed by Matt in an issue or video, I think.
In regard to the IP thing, there has to be something for sure, as I reckon that puts some limitations and control over the spam. It is easy to fight a few, but when you have billions of pages things can get hard, and it could be one of the ways for Google to fight many forms of spam (networks and such).
Yes, you are right. These are the factors I think Google uses to see if a page is from the same person:
– IP address
– Whois domain data
– Registration in Webmaster Tools
– Maybe Google Analytics and Google Adsense Data
Of course if you link to your own sites the importance is lower if you ask me.
I perceive this as being about ID’ing blogrolls, sitewides, etc., as well as large multi-server domains that might link to a site multiple times by different authors at different points. Take a large university. Having worked at one for years on the web side, the chem department and the biology dept and the vet school and the physiology dept all have a legit reason to link to a site that sells lab testing equipment, like test tubes, but they could have no idea each other did so, because they are located in different places across campus, and on different servers. All four links should be credited, IMO. But, if that same lab testing equipment site has a blogroll link somewhere that produced 612 links last year via the 612 posts, then like Michael M. said, modulate!
Thank you so much for this article. It’s a fact that technology and the different algorithms are changing every moment, so I’m glad that your blog always has the latest updates.
Matt did discuss that very issue, and mentioned that most of the time the sites that you may share an IP address with won’t affect your site – unless, like you say, the other sites are all spam or porn. I usually check most sites I’m working on using Microsoft/Bing’s IP search operator – just in case.
Fighting spam definitely doesn’t scale well on the Web. I wouldn’t be surprised if Google checked IP addresses when analyzing the values that links might pass along, especially since it’s something they necessarily come across when crawling pages anyway.
I’m not sure if Google would look at Whois Domain data, even though they are a registrar themselves. I know that Matt Cutts has stated a number of times that they don’t.
But you’re right, Google does have a number of ways that they could potentially tie sites together based upon things like registration for Google’s tools.
There are a number of legitimate reasons to link to your own sites, including common ownership, and offering complementary services on other sites. I don’t think this patent is really intended to penalize pages and sites as much as it is apply some reason and limits when there are multiple links from pages that are affiliated with other pages and sites.
Thanks for those examples – I agree with you about the purpose behind this patent. As I first started reading it, I was wondering if it would make mention of link spam and excessive reciprocal linking and similar topics, and I was a little surprised that it really didn’t. While it does address those types of issues to a degree, its focus does seem to be more upon things like sitewide links in a footer or sidebar, or blogroll links that appear on every page of a site.
University site links are another great example, and many Universities have separate schools or departments with a certain level of autonomy as well. It’s possible though, that they may be seen to be affiliated with each other, since they may be on subdomains, or all link heavily to main University pages, or both, and the value of their links in the example that you provide may be capped off at some maximum value.
Thank you. I don’t know how up to the minute the news about this particular patent might be. It was originally filed a number of years ago. But it possibly explains some things that Google might be doing involving links that we hadn’t really heard about before, from Google itself.
Thank you, I always wonder how Google determines which sites are affiliated. For example, all links from blogger to my blog are considered internal links.
Hey Bill, very impressed that you got this info especially since no one truly knows Google’s search engine algorithm!
Definitely something to think about. There is a subtle difference between “internal links” and “affiliated links,” though internal links (links with the exact same domain) are likely to be affiliated, affiliated links don’t have to be on the same domain. I don’t think that Google would treat all links from “blogger.com” pages as if they were internal links to blogspot blog pages, but it may be possible that they might treat them as affiliated.
When Google shows pages from sub-domains in search results, they usually treat them as if they are different websites, and in many cases they may be under the control of very different people who often aren’t affiliated with each other. For example, many people who blog under subdomains at wordpress.com are likely doing so without any relationship between other bloggers there.
Google gave us a list of four examples they might use in the patent, but I’m guessing that there may be exceptions to a number of those, and there may be other things that they are looking at as well.
Only Google knows how it works; for normal website owners, creating quality content and links is the only way to go.
It’s always a bad idea to “share” PageRank between your own projects through “affiliate” links. I prefer creating a proper set of independent links for each project so I can avoid global penalties, and it’s easier to resell each site.
Another example of links which G. might take as “affiliate” could be an editorial site, with lots of pages of content linking from each content page to its corresponding product in its own store, placed in a subdomain. Sadly, I think it’s my case, but it made sense to link to the product detail page, in order to promote sales.
Thank you for the seriously educational post. In a way, it is frustrating that Google is so complicated to start with and is always evolving the algorithms it uses, but I guess I should be happy because otherwise SEO as a profession wouldn’t be what it is.
While this is anecdotal, based on a small sample of competitors’ backlinks I was checking this morning, I would not totally discount the value of sitewide links and affiliated links quite yet. One particularly spammy PR2 site seems to be getting a big benefit from being in the sitewide blogroll of a PR5 blog. It is frustrating that other folks’ gray hat linking still seems to work, and mildly surprising that some gray hat linking that I might have had some involvement with in the distant past still seems to be producing results.
I agree with Eric Ward. This patent in its simplest applications would seem to be a way for Google to identify blogrolls, sitewides, interlinking of multiple company domains, etc. But if you think about it, the application of this patent could go far beyond that. It could be used to identify link wheels, link farms, and countless other linking schemes that are the flavor of the day… now and in the future.
I think many of us here have known for some time that sitewide links and blog roll links were likely “devalued” somehow. This patent just gets the wheels turning thinking about all the ways it can be used to tweak a URL’s scores on all kinds of link-related ranking factors.
I could see the results from the application of the patent being used to devalue the amount of PageRank passed to a target URL from sitewide links on another site. It could affect the quality scores for the inbound links used for PR calculations.
The results of the application of this patent could also be used to devalue the importance of link text used in sitewides, blog rolls, and other affiliated links. In other words, 1000 sitewide links to URL X with link text “my keyword phrase” would not affect the target URL’s ranking for “my keyword phrase” and derivatives of that phrase nearly as much as 1000 independent links, one from each of 1000 non-affiliated sites, all using “my keyword phrase” as the link text.
So this could affect query and non-query based ranking factors… essentially almost any link-dependent ranking factor.
Fascinating! Thanks for bringing it to light!
Thanks. It’s hard to know too much about the details behind what the search engines are doing to rank pages if you don’t work for the search engines themselves. But sources like patents and whitepapers from the search engines can provide some hints, which is why I like keeping an eye on them.
If you’re working on sites that you intend to sell, I think you’re right in trying to keep them as independent from each other as possible. I think that makes each more attractive to potential buyers as well.
I’m not sure that I would call what is presented here a penalty as much as a recognition that not every link should carry as much weight as others, especially when the cost of creating and pointing links to your own pages, or to pages on sites that you might be affiliated with, can be fairly cheap, and may not be as good an indication of value or quality as the independent views of others who might link to those pages because they find them useful or valuable.
I do think there’s some value in creating informational type content that can be associated with products or services. For one thing, it provides a chance to help people make more informed decisions when they are considering buying something, so there’s some value to consumers. For another, it can help merchants rank for informational type queries, in addition to transactional ones, so it may increase sales.
There may be some diminished value in the weights of multiple links pointing to the same content using such an editorial approach, but there’s still some value in those links, from a link-based ranking approach, and the other benefits to consumers and sellers are a plus as well.
I’m a firm believer in making informed decisions on the basis of the best information that you can find. There’s definitely value in creating quality content and links, but there’s also value in keeping your eyes open, and watching out for things like this patent.
There have been public statements by representatives from the search engines like Matt Cutts that things like sitewide links from a page may not pass along the full value of those links, and that’s something that many people who have been paying careful attention to search engines and search rankings have observed as well.
What’s nice about this patent is that (1) it’s directly from the search engine itself, (2) it validates a lot of the observations from others, (3) it reinforces what we’ve been hearing from people like Matt Cutts, and (4) it gives us an idea of one possible approach that Google may be using to determine the value of links between pages where there is a possible relationship between the people who created those links.
You’re welcome. I’ve been reading a lot about health and nutrition lately, and the roles of vitamins and supplements in health, and the many factors that can influence how healthy we are, such as allergies, blood types, body types, environmental conditions, and so on. I’ve come to the conclusion that the human body is infinitely more complex than a search engine, and it’s something that we have to live with every day.
But I agree with you – search and search engines are constantly evolving, and that’s part of what makes SEO as interesting for me as it is.
I think that’s a good point. The approach described in this patent doesn’t say that they might devalue sitewide links completely, but rather that those links might not provide a purely additive value. Instead, the value of those links might be capped off at some point.
If you have a sitewide link to a page from a site that has 10 pages, and 5 of those have a PageRank of 5, another 3 have a PageRank of 4, and the last 2 have a PageRank of 2, you may not get the full value of link weight from all 10 pages, as you would if each of those links were on pages that weren’t seen as affiliated with each other, but chances are that you would get at least more than the value of a link from a single PageRank 5 page.
Hi Canonical SEO,
I agree with Eric as well, and this could be used as a way to help identify affiliated pages that were created primarily to boost rankings such as link farms and link rings.
The patent itself didn’t describe those types of “manipulative” links, nor did it discuss whether or not the value of anchor text might be affected, but I think it wouldn’t be a stretch to say that those kinds of things would be potentially considered as well.
Donnie, I think the question of whether or not Google tracks and/or analyzes timing patterns of posts/links is a great one. That seems like a very logical factor to analyze when trying to determine if a poster is affiliated or not.
We’ve seen timestamps from blog and forum posts appear in Google search results, so they are definitely paying attention to those. Likely one reason for why Google is showing those is to give a potential visitor an idea of when something might have been written, but those timestamps can also possibly be helpful to a search engine in other ways as well.
Thanks for this – I really enjoy indepth takes on what can be very complicated and near abstract topics that are too often mistakenly just breezed over by others in the industry.
You’re welcome. I do believe this topic is one that should be considered by everyone who either has more than one website, which may link to each other, or who has a web site that might be considered by the search engines to be affiliated with other sites based upon either how they might be linked together, or might be visited by many of the same people.
Thanks for your post. That’s true, Google tries to organize the web in its own way: they do find affiliated websites. That’s why it’s better to make your links point to others and to get links from others too.
You’re welcome. It’s not so much that Google is organizing the Web as it is Google is creating its own index or model of the Web. When we search at Google, we don’t actually search the Web, but rather that index that Google has created. One of the focuses behind that index is attempting to understand relationships between web pages, including the links between them, so their attempts to discover whether or not certain pages are affiliated with one another shouldn’t be surprising.
Since the patent took so long (must have had multiple rejections to take so long) and the nature of the internet has changed so much in that time, do you think that methodology may have morphed over the years into something somewhat different?
I suspect that there’s some expectation on the part of the people filing patents like this one that it will take time for a patent to be granted, and it’s likely that changes will happen on the Web. It’s possible that some of the content within the description of the patent may appear dated, but the basic concepts, about things like the importance of links, identifying ownership of sites, and relationships between pages that link to each other, are still valid concerns.
With any patented process, the processes detailed in the descriptions are likely to have changed – the description isn’t supposed to limit how a patent owner might implement the process described. If the processes in the patent have changed considerably, it’s possible that a Google or Yahoo or Microsoft might file a new patent, or a continuation of the old one that adds some modifications.
It takes 6 years from filing until completion – wow! If I were to purchase 20 websites placed on different servers with links pointing back to only one, there would be a benefit but only if the 19 sites were of relevant content. Of course, you also have to consider the age of the domains since registration and a host of other strategies. Target your content, get people interested and let the World do the hard work. Much like this blog 😉
It’s not unusual for it to take a number of years from when a patent is filed until it’s granted.
I’m not sure that sites being on different servers would make a difference. Instead of looking to see if sites are on the same block of IP addresses (or in addition to that), Google’s taking a close look at the linking patterns between sites, and possibly other data as well.
Hi Bill, very detailed explanation of search engine Algorithm. Now I understand why Google gives more weight to the backlinks from a specific page.
That was a nice Info. Hats off to you and thanks for sharing!!
Thank you for the excellent post. Search engines are constantly evolving, and not all for the better. I found it very educational.
Hi Geek Revealed,
Thanks. This is potentially only part of the analysis that Google may do when it determines how much weight to give a link, but I think it’s an important aspect of the analysis. If Google didn’t diminish the weight that a link from an affiliated page or site might bring, then if someone set out to create lots and lots of related pages solely for the purpose of linking to their other pages, those links together would carry a lot more weight than a link from an independent source given freely based upon a desire to link to something of value.
Thank you, Marianne
The search engines are constantly evolving, and sometimes it isn’t for the better. In this case, I think it is.