Google Patent Granted on Web Link Spam

When a search engine indexes pages and other documents on the web, hoping to provide meaningful and relevant results to searchers, it doesn’t just rely upon the content found on web pages, but also considers the quality and quantity of links pointing to those pages.

[Image: examples of link farms and clique attacks]

A search engine like Google might determine that a page is relevant to a specific query based upon the content found on that page, and the anchor text found in links pointing to the page.

It might also look at what it considers “relationships” between pages by looking at how pages are linked to each other. PageRank is one method that Google states it uses to view those links and assign a measure of importance to pages that are linked to from other pages. This measure, or rank, might be simplified as the probability that someone would arrive at a certain page if they were randomly clicking on links on the pages that they’ve surfed.
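To make the random surfer idea concrete, here is a minimal Python sketch of the simplified, textbook form of PageRank. It is not Google’s production system: the `pagerank` function, the 0.85 damping factor, the fixed iteration count, and the four-page `toy_web` graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by repeated redistribution of scores.

    links maps each page to the list of pages it links to.
    damping is the (assumed) chance the surfer follows a link
    instead of jumping to a random page.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline from random jumps.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by every other page.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

In this toy graph, page c collects the most links and ends up with the highest score, while d, which nothing links to, ends up with the lowest.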

This combination of relevance in content and anchor text, as well as importance based upon link relationships helps to determine the order that pages show up in response to queries from searchers. While it’s possible that Google might proceed to rerank a certain number of those top search results based upon other signals, this method of determining the top results can influence whether or not a page might be seen by searchers.

However, there’s a problem with link-based ranking methods such as PageRank. It’s possible that the structure of links between pages can be deliberately manipulated to artificially inflate the ranks of some pages.

A patent granted to Google today describes a way that the search engine might use to identify two different methods of spamming pages, and take action against artificially inflated importance (or PageRanks) for pages.

This method of identifying link spam involves looking at a sampling of links to a page to see if the search engine can identify certain characteristics that appear in a couple of different types of manipulative linking.

Link Farms and Clique Attacks

A search engine might explore a number of links to a page to see if those links share certain characteristics that differ from the links pointing to an authentic page (one that doesn’t engage in manipulative linking).

The description in the Google patent specifically picks out two types of link spam, link farms and clique attacks, and explains how links involved in those behaviors might be different than links to authentic pages.

Link Farm – A link farm is usually a large set of pages that were created primarily to point to a single page, in order to falsely give the impression that the page pointed to is important.

An example might be the home page of an ecommerce site whose rank is being artificially increased by the creation of many “dummy” web pages that all link to the home page. Those links might cause the site to appear higher in search results if the links from the link farm are considered by the search engine.

In a link farm, all of the pages pointing to the central page will tend to have very low importance scores (or PageRanks). Authentically important pages will be more likely to have links from some high-ranked pages pointing to them in addition to links from low-ranked pages.
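One rough way to picture that difference is to measure what share of the pages linking to a target have low importance scores. The sketch below is only an illustration of the idea, not the math in the patent: the `low_rank_inlink_fraction` function, the 0.05 threshold, the page names, and the scores are all invented for the example.

```python
def low_rank_inlink_fraction(page, links, scores, low_threshold=0.05):
    """Return the fraction of pages linking to `page` whose score is low.

    links maps each page to the list of pages it links to.
    scores maps each page to an importance score (made up here).
    low_threshold is an assumed cutoff for "very low importance".
    """
    inlinkers = [
        source
        for source, targets in links.items()
        if page in targets and source != page
    ]
    if not inlinkers:
        return 0.0
    low = sum(1 for source in inlinkers if scores[source] < low_threshold)
    return low / len(inlinkers)


# Hypothetical link data: "shop" is fed only by dummy pages,
# while "news" has one strong and one weak page linking to it.
links = {
    "dummy1": ["shop"],
    "dummy2": ["shop"],
    "dummy3": ["shop"],
    "portal": ["news"],
    "blog": ["news"],
}
scores = {"dummy1": 0.01, "dummy2": 0.01, "dummy3": 0.01,
          "portal": 0.30, "blog": 0.02}

shop_fraction = low_rank_inlink_fraction("shop", links, scores)
news_fraction = low_rank_inlink_fraction("news", links, scores)
```

Here every page linking to "shop" falls under the threshold, while only half of the pages linking to "news" do. A real system would presumably need far more than a single cutoff to separate the two cases.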

Clique Attacks – Another type of link spam is the clique attack, or web ring, which is a set of pages that predominantly point at each other, to present a false appearance of authority or importance.

The pages in this kind of clique attack, or web ring, don’t link much outside of the other pages in the ring, unlike authentically important pages, which tend to link out to other pages as well. Their links to each other might cause each site to appear higher in search results if the links from the web ring are considered by the search engine.
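The inward-looking nature of a clique can be sketched as the share of a group’s outgoing links that stay inside the group. This is a simple illustration under assumed names and data (`internal_link_fraction`, the `links` dictionary, and the page names are hypothetical), and it is not the formula from the patent.

```python
def internal_link_fraction(group, links):
    """Return the fraction of outlinks from `group` that point back into it.

    group is an iterable of page names.
    links maps each page to the list of pages it links to.
    """
    members = set(group)
    internal = 0
    total = 0
    for page in members:
        for target in links.get(page, []):
            total += 1
            if target in members:
                internal += 1
    return internal / total if total else 0.0


# Hypothetical data: r1-r3 only link to each other,
# while a and b also link out to unrelated pages.
links = {
    "r1": ["r2", "r3"],
    "r2": ["r1", "r3"],
    "r3": ["r1", "r2"],
    "a": ["b", "x", "y"],
    "b": ["a", "x", "z"],
}

ring_fraction = internal_link_fraction(["r1", "r2", "r3"], links)
open_fraction = internal_link_fraction(["a", "b"], links)
```

The ring keeps all of its links inside the group, while the second pair sends two thirds of its links elsewhere, which is closer to what the post describes for authentic pages.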

Taking Action on Artificially Inflated Importance

When pages that are likely to be link spam have been located, under this patent, Google might take action to account for the “artificially inflated importance” of those pages.

As a first step, a human review or another algorithm might be used to examine whether or not those pages are part of a link spam scheme.

If a page is determined to be link spam, or a candidate for link spam, the following measures might be taken:

  1. Links from the page might not be considered at all in determining link importance of other pages.
  2. The impact of links from the page might be reduced in importance.
  3. A predetermined penalty might be applied to the importance of links from the page.
  4. The importance of the page might be reduced in a way that doesn’t rely upon links.
  5. The importance of the page might be reduced in a way that doesn’t rely upon links, while also reducing the importance of links from the page.
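The first two measures in the list above can be sketched as a weight applied to links from flagged pages when a target’s link-based score is added up. This is a simplified illustration, not the patent’s method: the `weighted_inlink_score` function, the `flagged_weight` parameter, and the page names and scores are assumptions.

```python
def weighted_inlink_score(page, links, scores, flagged, flagged_weight=0.0):
    """Sum the score passed to `page` by its inlinks, discounting flagged sources.

    links maps each page to the list of pages it links to.
    scores maps each page to an importance score (made up here).
    flagged is the set of pages suspected of link spam.
    flagged_weight of 0.0 ignores their links entirely (measure 1);
    a value between 0 and 1 reduces their impact (measure 2).
    """
    total = 0.0
    for source, targets in links.items():
        if page in targets:
            weight = flagged_weight if source in flagged else 1.0
            # Each source splits its score evenly across its outlinks.
            total += weight * scores[source] / len(targets)
    return total


# Hypothetical data: two suspected farm pages and one ordinary review page.
links = {"farm1": ["shop"], "farm2": ["shop"], "review": ["shop", "other"]}
scores = {"farm1": 0.1, "farm2": 0.1, "review": 0.4}
suspects = {"farm1", "farm2"}

unadjusted = weighted_inlink_score("shop", links, scores, flagged=set())
ignored = weighted_inlink_score("shop", links, scores, suspects, 0.0)
reduced = weighted_inlink_score("shop", links, scores, suspects, 0.5)
```

With no pages flagged, the farm links contribute fully. Setting the weight to zero leaves only the review page’s contribution, and a weight of 0.5 falls in between.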

The patent does go into depth on some of the math behind the identification of link spam in link farms and clique attacks, and is worth spending time with if you want to delve deeper into how Google might use the methods described in the patent:

Method for detecting link spam in hyperlinked databases
Invented by Sepandar D. Kamvar, Taher H. Haveliwala, and Glen M. Jeh
Assigned to Google
US Patent 7,509,344
Granted March 24, 2009
Filed August 18, 2004

48 thoughts on “Google Patent Granted on Web Link Spam”

  1. Hey Bill

    This is an interesting one. Again the thing that pops out to me is that this was filed in 2004. I think it’s fair to say that things have likely progressed rapidly from there.

    Best rgds
    Richard

  2. Very interesting, things have certainly come a long way since this was filed. Another great post!

  3. Hi Richard,

    This patent does seem to fill in a void in understanding how Google might be looking at links and link spam. For a long time, many people were mistakenly attributing Yahoo’s concept of TrustRank to Google, and in this patent we see a completely different approach.

    It is likely that Google has incorporated other or additional ways of finding and acting on link spam since this was originally filed, but I think it provides some insight into an area where we hadn’t seen too much directly on the topic from Google previously.

    Hi Wollongong,

    Thank you. It has been a few years, and it’s probable that Google and the other search engines have been picking up many new ideas through resources like the AIRWeb workshops.

  4. It shows that it certainly isn’t worth risking your site’s rankings by joining a web ring or link farm. You don’t want to find this out the hard way. It’s very hard to get out of one once you’re in one, too, and it can take your site ages to recover.

    I can’t believe this took the best part of 5 years to pass!

  5. Hi Adam,

    I agree. :) Participating in link farms or clique attacks probably isn’t a good idea.

    Patents can take a while to get through the examination stage. It appears that the patent examiner was raising some objections based at least in part upon one of the original PageRank patents, and some similarities to another patent application filed by most of the people listed as inventors on this patent:

  6. This patent application provides some insight into just how poorly Google understood Webspam five years ago. It also raises some questions. How is it that Google so consistently manages to botch up the well-documented definitions established by the SEO industry?

    A link farm is NOT a group of Web sites that all point to one central site. In order to be a link farm, every member site must link to every other member site.

    And Webrings have absolutely nothing to do with cliques, Web spam, or search engines. In a Webring, every member site shares navigation code (that may or may not be search-indexable) to help visitors find other sites in the Webring. Webrings were very popular in the 1990s and Yahoo! actually bought the largest Webring service, Webring.org.

  7. Hi Michael,

    Thanks. I agree with your criticism, especially of the use of terms that Google included in the patent. I wouldn’t say that these are terms that have been well defined and established by the SEO industry though – industrial and academic researchers, site owners, web users, and others have also helped give them meaning.

    As you note, web rings have been around since the very early days of the web, before Google or Yahoo, and their purpose was to provide a way to navigate from one site to another that shared some common theme or purpose. Webring.org itself dates back to 1994, and has survived being purchased and then abandoned by Yahoo. I don’t like Google’s use of the term “web ring” in this patent either, but think their use of “clique attack” is more useful.

    Google’s definition in the patent of a link farm as a group of many pages pointing to one page is too narrow; a link farm can also involve a large number of pages linking excessively to each other.

  8. Think of the patent application language as a snapshot of Google’s thinking in 2004. Although the patent was just granted this year, they formulated these ideas and arguments many years ago (in Internet terms).

    Google has rolled out two major redesigns of its search technology (Bigdaddy/Google 2.0 in 2006 and Searchology/Google 3.0 in 2007).

    This year we’re seeing them roll out new semantic features that have been hinted at in some patent applications as well. However, no patent is really going to provide us with much information about what Google may be doing now.

  9. I know that quite a few sites use directory content management systems to create hundreds and even thousands of links to their web sites. This doesn’t seem to work as well now as it has in the past for things like ‘Google bombing’, etc.

    Judging from this Google patent, it will be even less effective in the future.

  10. Hi Soren,

    This patent does seem limited, doesn’t it. :) Google doesn’t discuss too much of their approach to fighting spam in public, but they have published at least a couple of other patent filings that involve identifying web spam that take different approaches.

    My post titled Google Patent on Web Spam, Doorway Pages, and Manipulative Articles involves a granted patent from Google originally filed in 2003. It provides a wider and more complex approach to identifying web spam. Another post, Phrase Based Information Retrieval and Spam Detection provides some information on how spam pages could be identified in a phrase-based indexing system.

    Google has also been a participant in the AIRWeb workshops with other search industry and academic members. I think it’s safe to say that Google does know more about link spam than what is reflected in this patent.

  11. Thanks, Michael.

    It can take a long time for a patent to go from just filed application to granted patent. One of Google’s vice presidents, Udi Manber, mentioned in an interview last April that Google updated their search algorithm over 450 times in 2007 alone. Thankfully not every change is captured in a patent filing. :)

  12. It is hard to distinguish which sites are link farms and which are not, because Google can easily accuse sites of being link farms rather than web rings.

  13. Hi Albert,

    It can be difficult to distinguish between pages that are linking together because those links provide value to visitors, and pages that link together solely to increase each other’s ranks. But, I’ve seen many pages where the site owner explicitly states that the purpose for linking to each other is to help one another raise their search rankings. In that case, it does become pretty easy to tell that those links aren’t there to help visitors to the site find helpful related resources…

  14. A really illuminative article, props! There is one question for me: I own multiple pages with the same IP and don’t try to hide this “network” from Google. They are all linked like a web ring but is it really damaging if I just advise visitors of my other projects? I’m quite scared Google don’t realise that I don’t want to cheat them. ;)

    Greetings, Florian

  15. Hi Florian,

    Thank you. Regarding your multiple pages, it may be a question of scale and what the search engines might perceive as going on. If you’re talking about a handful of sites, or a dozen, that’s not as bad as hundreds or thousands. If it appears that you’re not hiding anything, and that your links are there because of common ownership of legitimate and reasonable looking sites, that’s much better than many hundred fake blogs, or splogs, linking together.

    The search engines don’t like sites that link to each other primarily to increase each other’s rankings, which offer very little to searchers in terms of value. If your pages also link out to other sites, and look like useful resources for visitors, you have much less to worry about than sites which scrape and aggregate content from other pages, and which are all or mostly low ranking and low quality pages.

  16. I’ve been a website designer for over 13 years and this is the first I’ve heard of a clique attack. I wonder if it’s possible for the algorithms to mistake a site that legitimately has many links pointing to it for a link farm.

  17. Hi Joe,

    It’s possible that you’ve tripped over clique attacks many times without recognizing them, or knowing that someone at Google was calling them by that name.

    It is possible that some aspects of legitimate linking, such as webrings, may seem similar to clique attacks, but usually when someone gets involved in a web ring, they are also getting links from other places as well, and linking out to other sites in different ways, too.

  18. If people today are stupid enough to get involved with a link farm then they deserve to be spanked by the search engines. There is enough written online about the subject that anyone caught doing this should be well aware. It is funny the lengths some people will go to increase their online rankings.

  19. Hi Bill,

    There are some people who get involved in link farms knowing the potential risk, who do it anyway. There are others who end up getting so caught up in other aspects of putting their business online that they don’t pay enough attention to the kinds of practices that might cause problems with search engines. Anyone putting a web site online, and hoping that it might show up well in search engines should spend some serious time with the search engine guidelines, which do warn about link farms.

    I’ve seen sites that have been penalized for participating in link exchange programs who have learned that lesson the hard way. For those sites, reinclusion is sometimes possible after cleaning up their sites. Many don’t know how to go about building links for their sites, and do pursue some link building possibilities that they shouldn’t.

  20. Interesting article and interesting comments, particularly about the gap between the filing in 2004 and the grant in 2009. You would think it would all be out of date by now. It would be interesting to know how much of this is built into ‘Caffeine’.

  21. Hi Mark,

    Google’s Caffeine is an update of how Google stores and accesses information in their databases – the basic infrastructure of their data storage and collection. The impact of that will likely be that they can store more information, and access it more quickly. I don’t think that there will be a direct impact upon the way that they identify link spam, and act upon that identification, but it’s possible that their more robust infrastructure may have an indirect impact by allowing them to bring more resources to bear upon web spam.

  22. It makes you wonder if link building in SEO will become a thing of the past. With spam becoming more of a problem, how long will it take for Google to exclude it from their algorithm…

  23. Hi Neil,

    Links had value in the days before search engines that relied upon them for ranking, and they likely will after search engines place less reliance on links. There’s still value to providing links to visitors on your site.

    I do think the search engines are finding more ways to address link spam, advanced beyond what is described in the Google patent that is the topic of this post. Google’s exclusive license to use PageRank does expire next year, but the PageRank that they use today is likely very much different than the PageRank of the 90s. We also know that Google and the other major search engines look at a very large number of other signals in ranking pages, and will likely continue to do so.

  24. Very informative Bill. And thanks for your answer of Florian’s question. I was wondering the same thing!!!

    And I agree with Bill Gassett. I don’t know why people bother with link farms. It’s common knowledge.

  25. Hi Lori,

    You’re welcome.

    There are people who will attempt to take advantage of things like link farms or splogs or paid links or other web spam as an inexpensive way of creating back links, even though the risks associated with them can be high, especially when the aim is a short term benefit. I do believe that the search engines are getting better at identifying web spam, but there may always be people who will attempt to test the search engines, and gain some kind of benefit over others from doing so, even though the risk is high.

  26. I have a local competitor that is involved with a large web ring, and he dominates the search engines. When I first started to build my online presence, I was copying his methods. I now know what he is doing is spam, but he still ranks high.

    I don’t get it. He is clearly manipulating the search engines, but he continues to rank at the top for every major keyword in my area for real estate. How can he be obvious spam and still not get slapped?

  27. @Neil

    I would be very, very surprised if search engines dropped link count as a gauge for ranking web pages. Other than contextual links, what other form of information could a “computer” use as a foundation to make contextual decisions with? Sure you have things like “bounce rate”, but these are used as a gauge to evaluate the “trustworthiness” of the link. Personally, I think that the search engines are “stuck” with using links as a foundational means to tap human opinion for the foreseeable future.

    Mark

  28. Hi Lisa,

    Keep on doing what you are doing, and avoid spam if you can. While your competitor may be succeeding presently, if what he is doing might be perceived as spam by the search engines, then he is vulnerable to losing his rankings at any point.

    The search engines attempt to address spam methods programmatically, rather than on a case-by-case basis. So while someone might get away with something for a while, chances are that it will catch up to them. It’s also possible that some of the links that you see for his site aren’t counted by the search engines as well at this point.

  29. Hi Mark,

    There are other areas that the search engines are exploring to determine the importance of a web page, from user-behavior signals other than bounce rate (such as time spent on a page), to annotations in bookmarks and tags and search wiki’s and social networks. Links may continue to play a role in how pages are ranked by search engines, but that role may become smaller and smaller in the future.

  30. Did some research on the top ranking “make money online” bloggers and affiliates and there is a huge amount of SERP “manipulation” going around. What do you think about this niche and the tactics being used?

    Best regards,
    Trond$Moneyonline.net

  31. Hi Trond,

    The “make money online” niche reminds me of the old newspaper scam classifieds promising people untold riches stuffing envelopes.

  32. It’s the biggest irony, Bill. These gurus supposedly teaching how to make money online are making money online by telling others how they make money online.

    This is pretty much how it’s done:

    Guru: Pay me, and I’ll tell you how I made money.
    Noob: OK, I paid you. Now tell me your secret.
    Guru: Well, there it is. I just made money online.
    Noob: That’s it?
    Guru: Yep. Now go do the same.

    After the Noob sells a few eBooks, he’s a Guru.

    And repeat.

    Beautiful system, eh?

  33. Hi Brandon,

    Not quite sure the word guru is appropriate. Snake oil salesman fits a little better.

    I’m not quite sure how that’s associated with the topic of this post, but I agree with you.

  34. Brandon: I’ve been working in this industry since 96. I’ve also written books about SEO, social media marketing and digital strategy. Also, I have more than 20 International and recognized technical certifications (Cisco CCNP, CCIE Written Exam, Microsoft MCSE, Master CIW, ..)

    I don’t see any good reason why I should NOT be able to offer my knowledge and services to my readers out there? I’m an ordinary guy who loves my job and like to help people. If my experience and knowledge can help others achieving their goals without all the blood and tears (like me), how is this considered to be more “monkey business” than New York SEO? I built my own SEO company in 2006 and sold it in 2009 for a nice amount of money but the reason why I did it was because the money did NOT motivate me. I simply missed the nerds stuff and being able to have more freedom. Owning my own company was not what I expected. I like SEO better than CEO. I’m much better as a SEO than a CEO also.. ;-)

    I’m sure both you and Bill are earning some money in the “cloud” too, right?
    Want a banana? I’ll take some peanuts please :-)

    Best regards,
    Trond

  35. Hi Trond,

    There’s a difference between what you are doing by providing actual value in terms of books and services that aim at helping others, and what some people offer that are the equivalent of “how to make money stuffing envelopes” scams.

    Unfortunately, there are a lot of scams that offer people the opportunity to make money online, and don’t. It’s not hard to find them.

  36. Hey Bill,

    The link to the patent is not live at present so maybe you can email an updated version as I’d be interested to read further.

    A few things spring to my mind on reading this post.

    1. It’s still a grey area as to how much link juice is allocated for each link from a site. Google have never been very specific on whether any page has a certain amount that is diluted every time a link is given out or if each link is effectively run in parallel and doesn’t dilute at all. Matt Cutts always seems to dodge this question as well.

    2. Provided people give good, valued content, I don’t see any issue with returning the favour and linking back to their site, even with anchor text, because any manual review will still view the page as relevant and valued content, so the pages linking off are also more likely to be relevant with good content too.

    3. I just also wanted to touch on the point Brandon made. What he describes is a Pyramid / Ponzi type scheme. Now the difference between this and the way most MLM companies work is that in the former there is absolutely NO value given, whereas in the latter the products / services have great value (for the most part) and so the user gets something good. If they then want to recommend this on to someone else then great. Fortunately most MLM companies nowadays offer good products and services, but there are still a few scams out there, which is a shame.

  37. Hi Justin,

    The link to the patent is working properly. Maybe there was a problem with the patent office website this morning.

    Chances are that there was always a different amount of PageRank passed along by different links on a page, since the launch of Google. Matt Cutts has said a number of times over the past few years that different links on a page pass along different amounts of PageRank.

    I really have no interest in debating the merits of MLM sites, linking to them, discussing them, or promoting them in any way.

  38. I know that quite a few sites use directory content management systems to create hundreds and even thousands of links to their web sites. This doesn’t seem to work as well now as it has in the past for things like ‘Google bombing’, etc.

    Judging from this Google patent, it will be even less effective in the future.

  39. Hi Roy,

    If anyone is relying upon the search engines finding those links, and using them to “improve” the rankings of pages, then the links will also be viewed and measured and analyzed to see if they might have been created to attempt to manipulate rankings. I think this is an area that Google keeps getting better at detecting, so that if they don’t detect them now, it may only be a matter of time.
