How a Search Engine Might Fight Googlebombing

The first known appearance of the term “googlebomb” was in an article by Adam Mathes in the online magazine uber.nu, in which he asked readers to help him play a joke on a friend by making the friend’s website rank highly for the term “talentless hack.”

You’ve possibly noticed that some pages rank well in Google search results for terms or phrases that don’t actually appear on those pages, because other pages link to them using those words as the link text. For example, search for “click here” and the top search result at Google is the Adobe Reader download page, which millions of links across the Web point to using “click here” as the anchor text.

I’ve used the term “Googlebomb” in this post, but this is something that happens at Yahoo and Bing as well. Given enough links from enough pages using the same text to point to a specific page, there’s a chance that the page being linked to might rank very well in search results from any of the major search engines, even if the content of the page has nothing to do with the text in those links.

When people link to pages, the text used in those links is often descriptive of what people might find at the pages being linked to. This can help a search engine understand what the page being pointed to is about. Search engines have been associating the text in links with the pages that they refer to since the early days of the Web. As Google’s founders, Sergey Brin and Lawrence Page, note in one of the first white papers about Google, The Anatomy of a Large-Scale Hypertextual Web Search Engine, the idea is something that they incorporated into Google, but it didn’t start with them:

This idea of propagating anchor text to the page it refers to was implemented in the World Wide Web Worm [McBryan 94] especially because it helps search non-text information, and expands the search coverage with fewer downloaded documents.

We use anchor propagation mostly because anchor text can help provide better quality results. Using anchor text efficiently is technically difficult because of the large amounts of data which must be processed. In our current crawl of 24 million pages, we had over 259 million anchors which we indexed.

Trying to understand the relevance of a page from the links pointing to it is often referred to as hypertext relevance, and while it’s been employed by search engines for almost as long as there have been search engines on the Web, it’s also been manipulated by people for personal, political and commercial purposes.
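
To make the anchor propagation idea from the Brin and Page quote a little more concrete, here’s a minimal sketch in Python. It isn’t how any real search engine is implemented; the function names, the example URLs, and the simple “count matching anchor terms” scoring are all assumptions made up for illustration. It credits the words in each link’s anchor text to the page being linked to, and then ranks pages for a query using nothing but those credited words.

```python
from collections import defaultdict

def build_anchor_index(links):
    """Credit each anchor-text term to the page the link points to.

    links is an iterable of (source_url, anchor_text, target_url) tuples.
    Returns {term: {target_url: number_of_links_using_that_term}}.
    """
    index = defaultdict(lambda: defaultdict(int))
    for _source, anchor_text, target in links:
        for term in anchor_text.lower().split():
            index[term][target] += 1
    return index

def rank_by_anchor_text(index, query):
    """Score pages by how many inbound links use each query term."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for target, count in index.get(term, {}).items():
            scores[target] += count
    # Highest score first; ties broken alphabetically by URL.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical link data: the target pages never need to contain the words.
links = [
    ("a.example", "click here", "reader.example/download"),
    ("b.example", "click here", "reader.example/download"),
    ("c.example", "click here for news", "news.example"),
]
index = build_anchor_index(links)
top = rank_by_anchor_text(index, "click here")
print(top)
```

The download page wins the query purely on the strength of what other pages say about it, which is exactly the property that makes a Googlebomb possible.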

The “talentless hack” Googlebomb was intended as a joke, but one of the most famous Googlebombs was inspired by political activism, with a large number of people linking to George Bush’s presidential biography page on the White House website using the phrase “miserable failure” in the anchor text of their links. A September 2005 statement on the Official Google Blog, Googlebombing ‘failure’, explained why that page was showing up for that query:

Google’s search results are generated by computer programs that rank web pages in large part by examining the number and relative popularity of the sites that link to them. By using a practice called googlebombing, however, determined pranksters can occasionally produce odd results. In this case, a number of webmasters use the phrases [failure] and [miserable failure] to describe and link to President Bush’s website, thus pushing it to the top of searches for those phrases.

In January 2007, a post on the Google Webmaster Central blog, A quick word about Googlebombs, told us that Google had solved the “Miserable Failure” Googlebomb:

We wanted to give a quick update about “Googlebombs.” By improving our analysis of the link structure of the web, Google has begun minimizing the impact of many Googlebombs. Now we will typically return commentary, discussions, and articles about the Googlebombs instead. The actual scale of this change is pretty small (there are under a hundred well-known Googlebombs), but if you’d like to get more details about this topic, read on.

The post doesn’t tell us how the problem was solved, other than mentioning an improvement to the way that they analyze links, and that the solution was an algorithmic one. How did Google solve the problem?

The only patent or white paper reference to Googlebombs that I’ve seen from Google appears in the Google patents on Phrase-Based Indexing. I hadn’t seen any references from the other search engines about how they may have attempted to solve the problem until today, when Yahoo was granted a patent describing how they fight “search engine hijacking.” The patent uses the example of a query for “miserable failure” returning the presidential biography page.

The Google phrase-based indexing approach that I mentioned may or may not be the method referred to in the Google Webmaster Central post above. But it may account for the President’s bio page starting to show up in search results again a few months later, when the word “failure” was added to that bio page. Here’s a snippet from the first phrase-based indexing patent:

[0156] This approach has the benefit of entirely preventing certain types of manipulations of web pages (a class of documents) in order to skew the results of a search. Search engines that use a ranking algorithm that relies on the number of links that point to a given document in order to rank that document can be “bombed” by artificially creating a large number of pages with a given anchor text which then point to a desired page.

As a result, when a search query using the anchor text is entered, the desired page is typically returned, even if in fact this page has little or nothing to do with the anchor text. Importing the related bit vector from a target document URL1 into the phrase A related phrase bit vector for document URL0 eliminates the reliance of the search system on just the relationship of phrase A in URL0 pointing to URL1 as an indicator of significance or URL1 to the anchor text phrase.

Once the White House staff added “failure” to the bio page, it suddenly became relevant, under a phrase-based indexing approach, for all of those links pointing to it that used “miserable failure” as anchor text.
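
Here’s a loose Python sketch of that idea, and it’s a big simplification rather than the patent’s actual scoring. The related-phrase table, the function names, and the “fraction of related phrases found” weighting are all assumptions I’ve made for illustration. The point is only that an anchor phrase counts for a target page to the extent that phrases related to it also show up on that page.

```python
# Hypothetical related-phrase table. A real system would presumably learn
# these relationships from co-occurrence statistics across its index.
RELATED_PHRASES = {
    "miserable failure": ["failure", "disaster", "fiasco", "mistake"],
}

def related_bit_vector(anchor_phrase, document_text):
    """One bit per related phrase: is that phrase present in the document?"""
    text = document_text.lower()
    return [phrase in text for phrase in RELATED_PHRASES.get(anchor_phrase, [])]

def anchor_weight(anchor_phrase, target_text):
    """Weight an anchor by the share of its related phrases on the target."""
    bits = related_bit_vector(anchor_phrase, target_text)
    if not bits:
        return 0.0
    return sum(bits) / len(bits)

# Invented stand-ins for the bio page before and after the word was added.
bio_before = "Biography of the president, his education and career in public service."
bio_after = bio_before + " He pledged that failure was not an option."

print(anchor_weight("miserable failure", bio_before))
print(anchor_weight("miserable failure", bio_after))
```

With no related phrases on the page, the “miserable failure” links carry no weight at all in this toy version. Add a single related word and they start to count again, which matches what happened with the bio page.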

Yahoo and Bing are also subject to Googlebombs, and a search at both Yahoo and Bing for “miserable failure” shows the George Bush White House bio in the top four results. The Yahoo patent describes a way of defusing Googlebombs using sentiment analysis, and if it works, it’s possible that Microsoft might want to license the approach from Yahoo.

The patent is:

Mitigation of search engine hijacking
Invented by Shanmugasundaram Ravikumar and Bo Pang
Assigned to Yahoo!
US Patent 7,870,131
Granted January 11, 2011
Filed: December 13, 2007

Abstract

The subject matter disclosed herein relates to mitigation of search engine hijacking. In one example implementation, a sentiment value associated with anchortext in a search engine result may be determined.

Similarly, a sentiment value of one or more web pages referenced by the anchor text may also be determined. A divergence between sentiment values associated with the anchortext and a web page may then determined.

Here’s the technical language on how the Yahoo method works, straight from the patent:

More specifically, given an anchortext-page pair (q, p), a sentiment classifier may be applied to the anchortext and the web page separately, resulting in the sentiment of the anchortext (C(p)) and the sentiment of the web page (C(q)). In the case where C(p) U C(q)={acceptable, unacceptable}, a determination may be made to see whether the anchortext q is trying to hijack web page p. Where Pq is the set of all pages with anchortext q, and Qp is the set of all anchortexts for page p, hijacking may be indicated where C(p)={acceptable} and C(q)={unacceptable}. This may correspond to a case in which an invalid anchortext tries to hijack a valid web page. In this case, anchortext q may be declared as hijacking page p if the multi-set Pq, treated as a distribution, has low entropy and if most of the anchortext in the set Qp are “acceptable”. Such a result may indicate that the goal of the anchortext q is to slander web page p as web page p is also indicated as having a significant amount of other “labelings”(in the form of diverse, and mostly “acceptable” anchortexts).

Likewise, for example, hijacking of search engine may be indicated in cases where anchortext has a sentiment value that is acceptable and the web page has a sentiment value that is unacceptable. Ranking component may determine that such hijacking is occurring if a set of anchortexts referencing the web page has a distribution with low entropy, and if a majority of web pages within a set of web pages containing the anchortext have an acceptable sentiment value. In such a case, the acceptable anchortext sentiment value diverges from the unacceptable web page sentiment value, and such divergence may be shown not to be a normal occurrence due to the low entropy of the set of anchortexts referencing the web page.

In other words, if the anchor text used to point to a page has a negative sentiment value, and the text on the web page being pointed to has a positive sentiment value, then that anchor text may not be used by the search engine to determine what the page is about. Likewise, if the anchor text has a positive sentiment value, and the text on the page linked to has a negative sentiment value, then the anchor text also may not be applied to the page being pointed to.

A link using “miserable failure” as anchor text expresses a negative sentiment, and the bio page of the former president expresses positive sentiments. Under this system, presumably, the “miserable failure” text wouldn’t be applied to the bio page.
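
A small Python sketch may help show how the pieces of the patent’s test fit together. Treat it as a sketch under stated assumptions, not Yahoo’s implementation: the word-list “classifier,” the entropy threshold of 1.0 bit, the example URLs, and all of the function names are mine. It only covers the first case from the patent (negative anchor text, positive page), and it checks the two conditions the patent names: the pages that anchor text points to have low entropy, and most of the other anchor texts for the page are “acceptable.”

```python
import math
from collections import Counter

# Toy lexicon standing in for a trained sentiment classifier (an assumption).
NEGATIVE_WORDS = {"miserable", "failure", "terrible", "awful"}

def classify(text):
    """Label text 'unacceptable' if it contains any negative word."""
    words = set(text.lower().split())
    return "unacceptable" if words & NEGATIVE_WORDS else "acceptable"

def entropy(items):
    """Shannon entropy (in bits) of a multiset treated as a distribution."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def is_hijacking(anchor, page_url, page_text, links, max_entropy=1.0):
    """links is a list of (anchor_text, target_url) pairs.

    max_entropy is an invented threshold; the patent gives no number.
    """
    if classify(anchor) != "unacceptable" or classify(page_text) != "acceptable":
        return False
    pages_for_anchor = [url for text, url in links if text == anchor]    # P_q
    anchors_for_page = {text for text, url in links if url == page_url}  # Q_p
    if not pages_for_anchor or not anchors_for_page:
        return False
    acceptable = sum(classify(text) == "acceptable" for text in anchors_for_page)
    mostly_acceptable = acceptable > len(anchors_for_page) / 2
    return entropy(pages_for_anchor) <= max_entropy and mostly_acceptable

# Invented example data: 50 bomb links plus a few ordinary ones.
BIO_URL = "whitehouse.example/bio"
BIO_TEXT = "biography of the president and his years of public service"
links = (
    [("miserable failure", BIO_URL)] * 50
    + [("president biography", BIO_URL), ("official bio", BIO_URL),
       ("white house", BIO_URL), ("miserable failure", "blog.example/post")]
)
print(is_hijacking("miserable failure", BIO_URL, BIO_TEXT, links))
```

In this example, almost every “miserable failure” link points to the same page, so the entropy is low, and the page’s other labels are neutral, so the anchor text gets flagged. Note that I treat Qp as a set of distinct anchor texts; counting every bomb link separately would let the bomb itself outvote the legitimate labels.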

I’m not sure if Yahoo tried this out, and with Bing now powering Yahoo’s search results, it’s impossible to test whether it would have been effective had Yahoo implemented it. At this point, Yahoobombing and Bingbombing still seem to work.

Conclusion

Is Phrase-Based indexing responsible for the disappearance of Googlebombs at Google?

There are two different sets of phrase-based indexing patents that were published by Google. The first set described a number of ways that it could be used by the search engine. The second set described how the system could be incorporated into a large-scale search engine index like Google’s. Phrase-based indexing would stop the “miserable failure” query from showing George Bush’s bio, and would explain why the bio started appearing again for a query using just “failure” once the White House added that word to the page after the “miserable failure” Googlebomb was defused.

Is Google using some kind of sentiment analysis approach to solve Googlebombing, like the one described in the Yahoo patent?

It’s possible, but it’s hard to say whether or not the Yahoo approach even works, at this point.

48 thoughts on “How a Search Engine Might Fight Googlebombing”

  1. Great article as always. Putting the negative/positive sentiment aside, it seems that websites are still ranking based on the number of links and keywords in anchor text. Is this going to go away?

    M. Azrikan

  2. When they first addressed the Googlebombing issue it seemed that it was a simple fix, the keyword must appear on the page at least once or it wouldn’t rank for that term. But that has problems – a page about “automobiles” could never rank for the term “cars” without using the word on the page. Now it seems that some pages do rank for synonyms without using the keyword, perhaps that is one part of a sentiment algorithm?

  3. Thanks for introducing me to this approach.
    But sentiment analysis isn’t very effective. Perhaps that’s the reason the fight against bombing doesn’t work so well ;-)

  4. I think it’s quite hard to detect if a page or an external anchor text link is intended for a negative dispute if search engines will try to base it through sentiments. As for smart Googlebombers, it will just give them more ideas on how to bomb a site through skimming words within the content (ex. if a content contains the word “site” or “spam”, they can just simply build external links pointing to with the anchor text “bad site” or “spam site”), just a thought.

  5. Something tells me that Google bombing isn’t going away. Don’t get me wrong, the engineers at Google are smart, very smart, but if the part of the algorithm that diffuses Google bombs was written by men, then it can and will be reverse-engineered by men. 40 PHD’s holed up in a room somewhere just can’t compete with the army of BH-SEOers looking to shank their system.

    Still though, I applaud Google for trying and I hope they succeed as I am one of their customers.

  6. WOW i really thought this was a thing of the past and that there was no such thing as a bad link any more – just low quality, poor relevancy links.

  7. Google desperately fights googlebombing by trying to find other ranking approaches like real-time results from viral social media hype.

  8. I seriously had no idea that this was still as severe of an issue as it is. Great post and thanks for the information. Good luck to Google on this matter.

  9. Google might not be saying how they solved it so that those doing the googlebombing would not know how it was solved, thus it would be harder for them to override that solution. But if Google did indeed use what was described in that yahoo patent then at least it worked.

  10. I think that Google tries to counter Google bombing with a part of their QDF … interesting, and I’d like to learn more about others’ experiments if they have any.

  11. Hi Mike,

    Definitely some of the basics of search engine optimization, such as the quantity and quality of links to pages, the content upon those pages, and the keywords in anchor text are still around. But, chances are good that even those have been evolving over the years, and the search engines are looking at a good number of other signals as well.

    It’s hard to tell how a Google or Yahoo might use sentiment analysis. We do know that Google is using sentiment analysis as part of the process behind deciding which reviews to show about businesses in local search. See this paper for some examples:

    http://www.ryanmcd.com/papers/local_service_summ.pdf

    The patent involving sentiment analysis is from Yahoo, and it’s possible that Google may be focusing upon a different approach, like the phrase-based indexing method that I mention in the post.

  12. Hi Jim,

    From the Google blog post addressing googlebombing, they made it sound like a couple of Google engineers spent a leisurely weekend coming up with a solution for the problem. They didn’t explicitly describe how they did it though, and it wasn’t necessarily that if the word in the anchor text didn’t appear on the page then it wouldn’t count. I still come across many pages in search results that rank well for terms that don’t appear on those pages, and if I look in Google’s cache for the page, it tells me that the term or phrase only exists in links to that page.

    The phrase based indexing approach actually accounts for that, to a degree though.

    The synonym issue is another matter entirely. This Official Google Blog post addresses one way that they will rank pages for synonyms, even though the original query terms don’t appear on the pages themselves:

    Helping computers understand language

  13. Hi Renaud,

    If Bing hadn’t started powering Yahoo’s results, we might have been able to do some testing on Yahoo to see how well or poorly they handled this kind of search engine hijacking (the term they use to refer to Googlebombing on Yahoo). The sentiment analysis approach is one that Yahoo came up with. Hard to say if it’s something that Google uses.

    The difficulty though, is that all search engines still find anchor text useful in understanding pages being linked to. Except when that anchor text attempts to intentionally mislead the search engines. Should the search engines err on the side of using that anchor text, especially in cases like when the link uses text like “click here,” or “read more”? Those are pretty neutral terms. Maybe Google does use some level of sentiment analysis in fighting Google bombing?

  14. Hi Ben,

    The good news is that things like phrase-based indexing, and some level of sentiment analysis can both possibly make Googlebombing more difficult. The bad news is that it’s probably still something that people can do if they are smart enough about it.

  15. Hi Mark,

    I think some amount of Googlebombing will probably be with us for a while to come. As I mentioned in the post, it’s still very helpful for a search engine to look at the text in links to a page to try to get an idea of what a page is about. And the anchor text in some links probably contain very valid criticisms when linking to pages, from satire, to serious concerns.

    Google has knocked out some of the most obvious and most publicized Googlebombs. Chances are, they’ve been focusing upon googlebombing campaigns that seem like concerted efforts to manipulate what web pages rank well for, based upon anchor text. But what if that anchor text isn’t part of some concerted effort?

  16. Hi Mark

    As long as the search engines look at anchor text to determine the relevancy of a page, I think there’s going to be some concern about whether people are going to try to use anchor text to manipulate search results.

  17. Hi Shailender,

    Google seems to have fixed some of these Googlebombs, though chances are there are still any number of them out on the Web. As I noted in my post though, Yahoo/Bing still have the “miserable failure” problem.

  18. Hi Andreas,

    I agree – increasing the number and types of signals that the search engines look at, and exploring how they might make the ones they use now more immune to abuse seems to be a battle that search engineers are waging on a regular basis. As for realtime rankings, there are ways they’ve been exploring to try to determine the quality of those ranking signals as well.

  19. Hi James,

    Thanks. Googlebombing isn’t something that gets talked about much since Google solved a number of the most well known google bombs, and if Google indeed is using phrase-based indexing, it should take care of a number of Googlebombs as well. At least until people start making better googlebombs.

  20. Hi Andrew,

    They didn’t tell us how they solved the problem. And that was definitely on purpose.

    I’ve included two possible approaches – Google’s phrase-based indexing, and Yahoo’s sentiment analysis. They may be part of the solution, or Google may have come up with something completely different. Chances are good that Google won’t be disclosing that information with us. :)

  21. Hi David,

    The query deserves freshness approach seems to be an attempt to provide timely information when the search engine thinks the topic a query is about may benefit from pages that present fresh news and information. I’m not sure at this point if it’s something that might be helpful in fighting off Googlebombs, but it’s definitely part of the mix in determining what we see in search results.

  22. Hi Bill. Great research and article. I think Google’s method of discounting websites from the SERPs if the searched-for phrase doesn’t appear on the page, is the best way of fighting Googlebombing. Do you think Yahoo / Bing will eventually match this method of indexing?

  23. Hi Web design Solihull,

    I think Google’s method of discounting websites from the SERPs if the searched-for phrase doesn’t appear on the page, is the best way of fighting Googlebombing.

    Except that’s not what Google is doing, and I don’t think you’ll be able to find anything on the Web anywhere that says that it is. The phrase-based indexing approach might limit how much weight a link passes along if the anchor text in a link isn’t “related” to the query that a page is ranking for, and that’s part of how it fights Googlebombing. But, the searched for phrase doesn’t have to appear on the page itself.

  24. I don’t think Google needs to devote too much energy to combating Google bombing. It was a fad that has largely died out. If anyone has the time, energy, inclination and resources to organise a Google Bomb, then good luck to them. It’s not really that damaging to anything.

  25. I agree with the previous comment by Jon Rhodes. I think Google bombing is not a major issue in itself. It may be a minor legal risk – but Google can probably defeat any complaints by “algorithmic rankings” argument. It does provoke some thoughts on possible relevance improvements – but again, from SE point of view, if so many people say GWB is a miserable failure, then maybe he is.

    One technical note I can make is that I can still observe “keyword only in link text” results in Google cache sometimes when looking for more obscure terms. So while having the term on the page seems to have a higher weight now, it is not strictly required.

  26. Hi Jon,

    I’m not sure that I agree that Googlebombs aren’t really damaging to anything. There have been a considerable number of recent complaints in both mainstream media and on blogs that Google’s search results appear to be less relevant than they were in the past. The whole point behind a Googlebomb is to make a page rank well for a term that the page really isn’t relevant for, so this kind of search engine hijacking is something that search engines should be concerned about. As I noted above, Bing/Yahoo still seem to be ineffective at stopping fairly unsophisticated Googlebombs, with the George Bush website still ranking well for the term “miserable failure.”

    Rather than a “fad,” Googlebombing is an approach that takes advantage of search engines’ use of hypertext to understand what a page being linked to might be about, and is an attack on that kind of ranking.

  27. Hi Val,

    It’s likely that the major search engines will continue to look at the text of links pointing to pages to get a sense of what a page is about. Google didn’t want to get rid of the possibility that you will continue to see “keyword only in link text” results. As I quoted from the Brin/Page paper above:

    We use anchor propagation mostly because anchor text can help provide better quality results.

    But, like any ranking signal, there is a risk that people will attempt to manipulate it. Googlebombing is a fairly simple approach to attempt to manipulate search results. Google may have come up with a way to cancel out the effects of fairly simple Googlebombs, but that approach still has some limits, as we saw when the White House webmaster added the word “failure” back to George Bush’s bio, and the page started ranking well for that term in Google.

    Chances are that whatever approach Google used to defeat those simple Googlebombs may have made it more difficult to Googlebomb, but likely didn’t resolve the problem for good. Chances are also good that a more sophisticated way of Googlebombing will spring up, if it hasn’t already.

  28. I like Googlebombing. It reminds me of a time when you queried Google for something like “stupid donkey” and got a VIP in the top ranking position.
    So if anchor text used to point to a page has a negative sentiment value, and the text on the web page being pointed to has a positive sentiment value, then the relevance of that anchor text may not be used by the search engine to analyze what the page is about.
    That is the future.

  29. Nice post. This is really a question how these search engines will overcome this “Bombing” on the linked text that provides irrelevant information.

    I am a freelance web developer and working on PHP/MySQL from the past 5 years. I am very much interested in SEO stuffs thus I liked this very much, especially the term “GoogleBombing”

    Thanks,
    MBelgaila

  30. Isn’t all anchor text(ually) sound link building activity merely a targeted (commercially so) form of small-scale Googlebombing? How then have Google largely mitigated this “problem” in weeks back in 2006/2007?

  31. Hi Poker Falcon,

    The patent is from Yahoo, and we don’t know what Microsoft might be doing with Yahoo’s patents. It’s possible that if the Yahoo patents would be useful to them, they would license them, but that’s not certain.

    It’s possible that Google might use some kind of sentiment analysis with hypertext analysis if they thought it would help. It’s worth investigating.

  32. Hi Matthew,

    There’s a difference between using anchor text in a link that is relevant for a page being pointed towards, and a widespread effort to use irrelevant anchor text to get a page to rank well for something other than what the page is about.

  33. Pingback: ¿Se pueden clasificar las páginas de web de positivas o negativas? | Ayuda - Buscadores
  34. It seems to me that Google’s ability to combat any kind of manipulation is growing exponentially. A few months after this article was written, I’m wondering if “bombing’ is effective at all now. ???

  35. Hi Dan,

    It’s possible that there may still be some power to influence search results through an approach like Googlebombing, but maybe not in the brute force type manner that was used in the past. If lots of people point links to a page with text like “miserable failure” and the page’s content has nothing to do with that, the page might not rank well for that term. But if the text in those links was much more related to the actual content of the page, then those links might have more impact.

  36. Excellent overview Bill on GoogleBombing. It really does not come as a surprise to me when you see the volume of links in your example for the phrase “click here”. We all know how much value links have and it is not like you are going to see a page on the net that is about that phrase anyways.

  37. Hi Bill,

    Thanks. There are an incredible amount of those “click here” links. It looks a little like once a page hits a certain amount of links with certain anchor text, regardless of how related that text might be or not, the page may rank for that term regardless of what Google might be doing to limit the impact of googlebombing.

  38. Hello Bill,

    I came to this post from the one you posted today, it seems phrase based indexing has come a long way in the time between both posts (January-December 2011).

  39. Hi Eliseo,

    I’m not sure how much phrase-based indexing has changed in that time period. This post discusses how Yahoo might be using a sentiment analysis approach to try to stop Googlebombing rather than focusing upon how Google might be treating phrase based indexing differently. The patent I’m discussing in the post is Yahoo’s rather than Google’s.

    Chances are that Google is exploring how they can use sentiment analysis as well, but we don’t know if this is an area where they are experimenting with it.

  40. I thought Google bombing was when you dropped a rival’s site through spamming the arse out of it. Just googled, and that’s called Google bowling; learn something new every day. How do you get your head around all this??

    We ran a social bookmarking campaign for a client. I was half asleep when I ordered the campaign and mistakenly placed the client’s anchor against a totally different site. The site got around 500 shares on a totally unrelated phrase. It got as high as the 2nd page and still sits on, I think, the 3rd page now.

    Gutted GW Bush got Google to change the results, surely he was fairly ranked there.

    Thanks for all the info, may the force be with you

  41. Hi Duncan,

    Right, Google bombing and Google bowling are two different things, but share similarities in that both are attempts to manipulate or harm rankings for a particular page or site based upon activities outside of that site.

    There is definitely a potential problem with a search engine letting anchor text relevance play too strong a role in how pages are ranked without having some way to limit or control attempts to manipulate a ranking signal like it. Both Google’s phrase-based indexing and the Yahoo patent above attempt to do that, and provide some interesting ways to try to stop it.
