Google’s Paid Link Patent

There are things that we just don’t know about search engines: things that aren’t shared with us in an official blog post, in a search engine representative’s comment at a conference, or in a published white paper. We often learn some aspects of how search engines work through patents, but the timing of those is controlled more by the US Patent and Trademark Office than by the search engines themselves.

For example, back in 2003 Google was filing some of their first patents that identified changes to how their ranking algorithms worked, and among those was one with a name similar to the original Stanford PageRank patents filed by Lawrence Page. It has some hints about PageRank and Google’s link analysis that we haven’t officially seen before.

If you want a bit of a history lesson you can see the first couple of those PageRank patents at Method for scoring documents in a linked database (US Patent 6,799,176) and Method for node ranking in a linked database (US Patent 6,285,999).

The similarly named patent is:

Ranking nodes in a linked database based on node independence
Invented by Paul Haahr, Martin Kaszkiel, Amit Singhal
Assigned to Google Inc.
US Patent 8,719,276
Granted May 6, 2014
Filed January 4, 2011

Abstract

A system includes a ranking component that ranks nodes, such as web sites, to obtain ranking values that define a quality judgment of the nodes. The ranking values are based on links between the nodes and, among other things, deemphasize links between affiliated nodes. Additionally, the amount of rank that any particular node can contribute to another node may be capped at a threshold level, thus tending to prevent some nodes from unduly influencing the ranking values.

This is the fourth version of the patent. It is a continuation of the first one filed in 2003, with the original three officially abandoned by Google. It’s based upon PageRank and it’s aimed at addressing some problems with PageRank. These include people “paying another site, with high rank, to link to the web site.” It’s also aimed at:

In general, any artificial attempts to improve the ranking of a web site by “tuning” the web site to a specific ranking algorithm does not improve the user-perceived quality of the web site and may thus decrease the overall performance of the search engine.

One of the mysteries about Google is the role that Amit Singhal, presently the head of Google search quality, had in the early days of Google.

He supposedly helped re-write Google’s ranking algorithms in 2001 according to articles such as Exclusive: How Google’s Algorithm Rules the Web. What we aren’t told is what changes he implemented in that particular re-write. At this point, it’s still a mystery waiting to be uncovered. Amit Singhal is one of the named inventors on this patent, and it makes some changes to the way that PageRank may be used when ranking pages. Is it the “re-write” of Google’s ranking algorithm?

We don’t know if the changes described in the patent are ones that were implemented by Google, or if Google has held off on making the changes. Google’s persistence in seeing this patent through from being filed in 2003 to the third followup being granted a week or so ago as a continuation patent shows us that they are serious about it being used, though whether or not it has been used over the last decade is something that we can’t be sure of.

Today, Google’s Matt Cutts responded by video to a question from Barry Schwartz, “Was there a key moment in your spam fighting career where you made a mistake that you regret, related to spam?” Interestingly, the answer involves one of the topics that this recently granted patent covers – not fighting paid links more actively.

The patent doesn’t go into a great amount of detail on how it might identify paid links, but we’ve seen Google penalize sites over the years because they are linked to by paid links.

The patent does discuss how it might cluster links from “affiliated” or related sites. These are sites which might “have related or shared organizational control, or otherwise do not appear independent.”

In other words, ranking component may determine that multiple nodes should be clustered when there is a high probability that all of the nodes are controlled by a single entity. Ranking component may automatically classify nodes into clusters based on one or more of a number of possible factors. For example, the determination of affiliation can be based on node graph structure, similarity of node content (e.g., text or structure), ownership records, manually entered information, or other factors.

One implementation for determining affiliated nodes may be based simply on common ownership information as given by a WHOIS search.
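
As a rough illustration of that simplest case, the sketch below groups sites by a shared owner string. The `whois_owner` mapping, the function name, and the site names are made up for the example. The patent doesn’t spell out a data format, and a real system would presumably combine ownership with the other factors it lists.

```python
from collections import defaultdict

def cluster_by_owner(whois_owner):
    """Group sites whose (hypothetical) WHOIS owner field matches.

    whois_owner: dict mapping site name -> owner string, or None if unknown.
    Returns a list of sets; each set is one cluster of affiliated sites.
    """
    by_owner = defaultdict(set)
    clusters = []
    for site, owner in whois_owner.items():
        if owner is None:
            # Unknown ownership: treat the site as independent.
            clusters.append({site})
        else:
            # Normalize case and whitespace so trivial differences still match.
            by_owner[" ".join(owner.lower().split())].add(site)
    clusters.extend(by_owner.values())
    return clusters

# Illustrative data only; these are not real WHOIS records.
example = {
    "shop-a.example": "Acme Holdings",
    "shop-b.example": "acme  holdings",
    "blog.example": "Jane Doe",
    "mystery.example": None,
}
clusters = cluster_by_owner(example)
```

Here the two “shop” sites land in one cluster because their owner strings match after normalization, while the other two sites stay on their own.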

The patent also tells us that some nodes or sites might be determined to be a “trusted authority” which might pass along a certain threshold of link value. This calculation of how much of an authority a node might be looks like it’s part of a process very similar to the calculation of PageRank.

It appears that even a node that might be a completely trusted authority might not pass along a full vote, but instead may be limited to a threshold.

The following conclusion in the patent points to different ways that it might limit the weight of links pointed from one node (or site) to other pages when pages are seen to be affiliated with one another.

The calculated ranks reflect a number of desirable properties when ranking nodes based on node quality. Multiple links from affiliated nodes are deemphasized, thus reducing the possible effect on the rank by a single entity, such as a commercial “link farm” attempting to artificially boost the rank of certain nodes.

Additionally, because the maximum vote amount that a single node can contribute is capped to a full vote value, “super nodes” that receive an extremely high number of inbound links, and thus would otherwise have an extremely high rank, are restricted from having undue influence on the ranks of the nodes to which it links. Further, because authority nodes contribute a set vote amount regardless of the number of nodes linked to, nodes are discouraged from hoarding rank by only linking to a few sites.
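
To make those properties a little more concrete, here is a minimal sketch, not the patent’s actual formula. It assumes each linking site already has a rank, that affiliated sites have already been grouped into clusters, that a cluster’s links to a target count only once (using its strongest member), and that one full vote equals 1.0. The function and parameter names, and the choice of taking the maximum per cluster, are assumptions made for illustration.

```python
FULL_VOTE = 1.0  # assumed cap; the patent only says votes are capped at a threshold

def score_target(inbound, rank, cluster_of, cap=FULL_VOTE):
    """Sum capped votes for one target, counting each affiliated cluster once.

    inbound: iterable of site names that link to the target
    rank: dict site -> rank value (non-negative float)
    cluster_of: dict site -> cluster id; sites missing here are their own cluster
    """
    best_per_cluster = {}
    for site in inbound:
        cid = cluster_of.get(site, site)
        # No single node passes along more than the cap.
        vote = min(rank.get(site, 0.0), cap)
        # Deemphasize affiliated links: keep only the strongest vote per cluster.
        best_per_cluster[cid] = max(best_per_cluster.get(cid, 0.0), vote)
    return sum(best_per_cluster.values())

# Made-up ranks: three link-farm sites, one very high-rank "super node", one independent site.
rank = {"farm1": 0.2, "farm2": 0.3, "farm3": 0.1, "supernode": 50.0, "indie": 0.4}
cluster_of = {"farm1": "farm", "farm2": "farm", "farm3": "farm"}
score = score_target(["farm1", "farm2", "farm3", "supernode", "indie"], rank, cluster_of)
```

With these invented numbers the target’s score comes out to roughly 1.7 rather than the 51 or so that a plain sum of ranks would give: the three farm sites count as a single 0.3 vote, and the super node is held to one full vote.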

So this patent has added to our knowledge of SEO the ideas that Google may assign sites some level of trust/authority that may determine how much link weight each can pass along, and that Google actively attempts to understand when sites are affiliated with each other by things like common ownership or control (including control via paying for links).

Takeaways

When I first saw this patent, I was reminded of another Google patent that I wrote about in Google’s Affiliated Page Link Patent, which shares Amit Singhal as a co-inventor and provides more details on how Google might identify “affiliated” or co-owned web sites.

The discussion in this patent about thresholds is the first I can recall from Google on how they might score link weight from a site and limit how much might get passed along based upon some kind of threshold. There probably are other thresholds associated with ranking signals, though they might be unknown and unnamed mysteries, as the thresholds from this patent were until a week ago.


13 thoughts on “Google’s Paid Link Patent”

  1. Thanks Bill!
    Distilling the details down to a broadly digestible level is an art form you’ve perfected.

  2. Fascinating article once again Bill, I always look forward to reading your analysis of big G’s patents.

    I wonder how this affects geographical sites and their impact on one another; I mean many big brands have different country-specific sites that sit on separate domains but that link to each other.

    This patent would suggest that these links are down-weighted somewhat right?

  3. Thanks, Rick.

    That’s definitely one of the challenges around writing about patents – trying to keep some of the nuances that come with a patent while also making it accessible and pointing out what makes it important. :)

  4. Hi Steve,

    Thanks. I’m not sure if Google was really thinking about sites on a geographic level when they first started tackling topics such as sites with common ownership. With Google’s adoption of hreflang, we’ve seen them concerned about showing visitors the right version of a site in the right language. I’m not sure that Google is treating those co-owned sites as attempts to manipulate PageRank, though it’s possible that Google might apply a threshold to the amount of PageRank it allows to be passed along. So yes, they might be down-weighted to a degree – but there’s no way to tell how much.

  5. I’m wondering how you would apply this sort of link value when it comes to paying a network of sites to instantly update your NAP as well as deep links into specific product pages, bios, and more. Take for example the ability to instantly update an anchor text link in realtime across a network of sites that includes most IYP’s – except sites like Google, LinkedIn, Manta… Yext I’m asking about Yext

  6. Hi Bill,

    Thanks for the heads up now, I need to pick your brain “please”

    Here is the scenario; an SEO manages two independent websites which are in the same niche for two unconnected clients and each site has its own unique Google analytics id which was set up by each client.

    The SEO uses the same white hat strategy for both sites i.e. both sites are getting links from the same sources. But if one client without the knowledge of the SEO went out and bought a batch of paid links from a bad neighbourhood and incurred a Google penalty is there a possibility that the other site that the SEO manages which has not been involved in paid links could also be penalised under common ownership, because Google’s cookies will have identified that the SEO is managing both sites and each site has a similar backlink network?

  7. Interesting stuff again, thank you. I’ve seen the rumblings of discontent online regarding Penguin 3.0. There doesn’t seem to be any knowledge of whether it exists or not, but everyone’s panicking all the same. If it is real, it will be intriguing to find out if ranking signals change at all. I’ll bet Matt Cutts is getting pretty fed up of the endless barrage of questions he receives.

  8. Hi Geoff,

    I would like to hope that common management of two different sites as indicated by things such as common Google Analytics account logins or Google Webmaster Tools login accounts wouldn’t be the kind of thing used to determine whether those sites are related.

    If I took off my site owner hat or SEO hat, and instead looked at this issue from the perspective of a search engineer, I could find myself asking, “would it hurt to at least let such common ownership of management accounts for analytics or webmaster tools provide something to explore?”

    So, considering those two different sites to be related based upon the same person or agency just managing accounts isn’t something that I could see happening, but having that raise a red flag for further exploration from search engineers seems possible.

  9. Great post. If it was not for you we would never know all this stuff. Thanks a lot, otherwise it is Greek to most of us. You break it down to look so simple. Thanks.

  10. Another solid post Bill. Thoroughly enjoyed the read. Gotta agree with Dan Carter up above on your breakdown of these topics. Makes it easy for folks like myself who are relatively new to this field.

    PS: I’ve now come to appreciate why/how it takes you [quite] some time between posts :]. Said it before, but the planning, detail and thoughtfulness in your posts is really a great service to the Internet!

    Great work bud. Big fan.

  11. Wow talk about distilling a complex topic down to simple language that us newbys can understand. Thanks for the diligence in your research!
