Post your URL in an SEO forum and get labeled as a Web Spammer? Maybe.
Some site owners and internet marketers attempt to increase how well their websites rank in search engines by buying links to their sites or exchanging links with others. Those kinds of activities are frowned upon by the major search engines because that kind of manipulation can impact which pages show up in search results. As Google notes on one of their help pages on link schemes:
Your site’s ranking in Google search results is partly based on an analysis of those sites that link to you. The quantity, quality, and relevance of links count towards your rating. The sites that link to you can provide context about the subject matter of your site and can indicate its quality and popularity. However, some webmasters engage in link exchange schemes and build partner pages exclusively for the sake of cross-linking, disregarding the quality of the links, the sources, and the long-term impact it will have on their sites. This violates Google’s webmaster guidelines and can negatively impact your site’s ranking in search results.
There are also SEO forums where people publicly discuss exchanging links to manipulate search results.
Microsoft has published a patent application that describes how they might target (and possibly hand-pick) Search Engine Optimization (SEO) related forums where they believe such activity may occur, and crawl those forums to see if they can identify requests for link exchanges.
Forum Mining for Suspicious Link Spam Sites Detection
Invented by Bin Gao, Tie-Yan Liu, Hang Li, and Congkai Sun
Assigned to Microsoft
US Patent Application 20090198673
Published August 6, 2009
Filed: February 6, 2008
An anti-spam technique for protecting search engine ranking is based on mining search engine optimization (SEO) forums. The anti-spam technique collects web pages such as SEO forum posts from a list of suspect spam websites and extracts suspicious link exchange URLs and corresponding link information from the collected webpages.
A search engine ranking penalty is then applied to the suspicious link exchange URLs. The penalty is partially determined by the link information associated with the respective suspicious link exchange URL.
To detect more suspicious link exchange URLs, the technique may propagate one or more levels from a seed set of suspicious link exchange URLs generated by mining SEO forums.
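To make the seed-and-propagate idea a little more concrete, here is a minimal Python sketch. It is not Microsoft’s implementation: the suspicion scores, the decay factor, the number of levels, and the penalty formula are all assumptions made for illustration. The abstract only tells us that the penalty is partially determined by the link information for each URL, and that the technique may propagate one or more levels from a seed set.

```python
from collections import deque


def propagate_suspicion(seed_scores, outlinks, levels=1, decay=0.5):
    """Spread suspicion from seed URLs to the pages they link to.

    seed_scores: URL -> initial score (hypothetical, e.g. based on how often
                 the URL showed up in link exchange threads).
    outlinks:    URL -> list of URLs that page links to.
    levels:      how many hops beyond the seed set to propagate.
    decay:       share of a page's score passed on to each page it links to.
    """
    scores = dict(seed_scores)
    frontier = deque((url, 0) for url in seed_scores)
    while frontier:
        url, depth = frontier.popleft()
        if depth >= levels:
            continue  # stop once the propagation limit is reached
        passed = scores[url] * decay
        for target in outlinks.get(url, []):
            # Keep the strongest suspicion any path has assigned to a URL.
            if passed > scores.get(target, 0.0):
                scores[target] = passed
                frontier.append((target, depth + 1))
    return scores


def penalized_rank(base_rank, suspicion, max_penalty=0.9):
    """Scale a ranking score down in proportion to its suspicion score."""
    return base_rank * (1.0 - max_penalty * min(suspicion, 1.0))
```

With a single seed that links to a second site, which in turn links to a third, propagating one level pulls in only the second site at half the seed’s score; the third site would need a second level. Capping the number of levels and passing along only a fraction of each score reflects the intuition that pages several hops away from a forum post are weaker evidence of a link exchange than the URLs named in the post itself.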
There’s a nice discussion in the background section of the description in the patent filing about some of the methods that search engines have developed to try to identify web spam, including a few paragraphs on the evolution of web spamming approaches:
Web spamming techniques have also evolved in time. The first generation spam involved keyword stuffing when the ranking was dependent on document similarity. The second generation spam involved link farms when the ranking was largely dependent on on-site popularity. The third generation spam uses mutual link exchange through “mutual admiration societies” when ranking is largely dependent on page reputation. In general, third-generation Web spamming is harder to detect than the previous generations.
Link spamming techniques, including buying/selling links, exchanging links, and constructing link farms, are a major category of the commonly used spam techniques. Link spamming refers to the cases where spammers set up structures of interconnected pages to boost their rankings in link structure-based ranking systems such as PageRank. Since link analysis is a crucial factor for commercial search engines, link spam is among the most popular and harmful techniques for search engines nowadays.
The patent application also defines and discusses anti-link spam approaches such as TrustRank, BadRank, and SpamRank, and how they attempt to detect link spam and webspam automatically. We’re told that those methods aren’t effective in certain situations and that the “link spam problem has yet to be solved.”
One attempt at a solution is to pay more attention to places where people may be openly discussing the exchange of links on the web, and to use URLs identified in those discussions as a “seed set” to crawl, identifying other pages that those URLs link to. The patent filing refers to these places as “search engine optimization (SEO) forums,” which may be manually selected.
Search engine ranking penalties may be applied to URLs that have been identified through the methods described in the patent filing, which rely upon finding URLs mentioned in discussions of link exchanges without actually visiting the sites themselves or analyzing the content of those sites. We’re told that:
To conveniently and efficiently exchange link trade information, spammers usually log onto SEO forums to communicate with each other for trading links, including link exchange, link sale, and recommendation link exchange.
These forums are increasingly more popular. Spammers post requests for “link exchange,” “buy & sell a link,” and “recommendation exchange” in these forums, along with the URLs of their websites, and other interested spammers may reply to the requests and provide the URLs of their websites.
In recognition of these activities, instead of searching and analyzing these spamming websites themselves, the technique described herein identifies the URLs of them by analyzing the context in the posts by spammers on the SEO forums.
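The extraction step described in that passage can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cue phrases are borrowed from the request types quoted above, the URL pattern is a simple regular expression, and the function name is hypothetical. The patent describes analyzing the context of posts; a plain phrase match like this one is a much cruder stand-in.

```python
import re

# Hypothetical cue phrases, modeled on the request types quoted above.
TRADE_CUES = ("link exchange", "exchange links", "buy & sell a link",
              "recommendation exchange")

# Simple (assumed) pattern for absolute URLs in post text.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def extract_suspicious_urls(posts):
    """Count URLs that appear in posts mentioning a link-trading cue.

    The count is a hypothetical stand-in for the "link information" the
    patent attaches to each suspicious URL.
    """
    counts = {}
    for post in posts:
        text = post.lower()
        if not any(cue in text for cue in TRADE_CUES):
            continue  # no sign of link trading in this post
        for url in URL_PATTERN.findall(post):
            url = url.rstrip(".,;:!?)")  # drop trailing punctuation
            counts[url] = counts.get(url, 0) + 1
    return counts
```

A sketch like this would flag any URL dropped into a link exchange thread, regardless of who posted it, which is one reason a real system would likely treat such a match as a starting point for further checks rather than as proof on its own.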
There are many forums where search engine optimization is discussed that provide helpful and useful information to people who participate in those forums.
They may offer a chance for people to discuss best practices, exchange ideas on how to create better experiences for their visitors, and offer constructive criticism on design and other aspects of a site. Many forums operate as a Community of Practice or an online Third Place as envisioned by Ray Oldenburg.
But there are also forums where discussions about “links for sale” or “exchanging links” or “reciprocal links” may take place. I’m not sure why the researchers at Microsoft felt that they needed to file a patent to protect the idea of finding such sites and using them to attempt to identify potential webspam.
The patent application does go into much more detail on some of the processes that Microsoft (and possibly other search engines) might use and is recommended reading if you participate in a forum that discusses SEO.
50 thoughts on “Are Search Engines Doing SEO Forum Mining to Find Web Spam?”
Pretty interesting and sort of pretty funny, IMHO.
A test of this might reflect a review of certain categories in DP forum and then test the sites within those categories on Bing.
And to think that I pretty much relied on that place years ago. LOL
It’s crazy that this patent specifically addresses “search engine optimization (SEO) forums.” Yes, these are places where link selling and purchasing is discussed, but by no means the only places, and probably the LAST places where a savvy “black hat” SEO is going to expose him or herself. (And since when did link exchange invariably equal “spam”? Sites link to one another (“exchange links”) all the time without intending to game the search engines).
As well, those that “efficiently exchange link trade information” are not necessarily “spammers.” As you say, you may want to examine this patent if you’re active in such forums – not so much because you might be “outed” as because you (or rather, a domain you’re associated with) might be found guilty by association.
To that final point, this introduces a potential malicious exploit. Might it not be possible for forum participants to introduce competitor URLs into forums with the suggestion that one may buy/sell/exchange links there?
> I find it hard to believe that search engines haven’t been doing this for years.
Yes they have! In fact, we actually struggled to get them interested in the forums back in 98/99 (well, actually there was really just one back then – Jim’s Search Engine Forums) – invited them and inspired them to come there. Now they won’t leave 🙂
I find it hard to believe that search engines haven’t been doing this for years. Perhaps they have.
All very good points. I’m very much concerned about two of the points that you raise.
The first being that people might post their competitors’ links into forums with offers for link exchanges.
The second being people who know very little about search engines and their guidelines who might come to a forum looking for help with their sites, post to a thread about link exchanges, and possibly be penalized for it. The patent filing mentions that they might penalize without even visiting the URLs in question or examining the content at those URLs.
I remember lurking in Jim’s Search Engine Forums some, in those early days, though I never signed up and participated. It’s amazing how much more accessible the search engines have become in the past 10 years, with multiple “evangelists” from the search engines participating in blogs, forums, newsgroups, and social networks like Twitter.
It’s not really a surprise that people from search engines likely have been visiting SEO forums. We know that one person or another from Google was posting on WebmasterWorld under the name “GoogleGuy” for a long period of time.
The idea that search engines might be doing something like what this patent describes – crawling forums and looking for indications of things like link exchanges – is probably something that has crossed the mind of anyone who administers or moderates a forum that discusses SEO. What’s most surprising to me is that one of the search engines would file a patent application on doing so.
If people would just stop buying links they would never get into this problem. Link exchanges then? Well, I can’t see any reason why you should write and ask for links on a public forum.
I have difficulty understanding where the “invention” is within this patent application. How does obtaining a patent for this invention protect Microsoft? Seems far fetched, but is their goal here to send a shot across the bow of the SEO community, knowing that Bill would discover the filing?
Does that mean you’re penalized for going to an SEO forum and innocently asking members to review your site? Where do they draw the line with this filing?
I’m not sure that the search engines really mind people buying links. It’s the buying of PageRank (or other link equity) that bothers them.
I caution people against buying links or engaging in link exchanges. I’ve seen sites that have been penalized for participating in link exchanges. Regardless, people do use public forums to try to gain links solely to increase their rankings in search engines.
I’m not quite sure of the incentive behind this filing either.
I’m not filled with enough hubris to think that they filed it solely because I might blog about it. There were a lot of other patent filings worth blogging about this week.
There is an actual process for attempting to find link exchanges in forums within this patent application. I didn’t go into the fine details, but it does exist. I’m not sure that it is an application that will be granted one day, but Microsoft does seem to churn out a good number of patents and patent applications every week.
It would take more than just posting to a forum, and innocently asking members to review your site to possibly have it labeled as spam.
A search crawler would actually look for information in a forum thread that indicated that it might be about buying or exchanging links first. It would then look for some other characteristics that indicated that thread might be about acquiring links in exchange for money or other links back.
The patent is difficult reading, but if you give it a try (start at the description part), there’s some interesting information in there.
I’m not sure if actually crawling these sites would solve their problem. The majority of link sales and exchanges are done through private messaging on these forums, the urls are not actually posted. They would need to be checked and infiltrated manually, which they are anyway.
Two points here that have over the last six months turned my head slightly from the whiter than white approach I recommend to clients…
1. people might post their competitors’ links into forums with offers for link exchanges…
2. people who know very little about search engines and their guidelines…
I think there is an awful lot of scaremongering from the search engines because they don’t know what to do about these two aspects.
That’s not to say I have decided to play a little dirty, but just to highlight that there are thousands of sites out there making a ton of cash because they are getting away with it.
I know of some VERY well known brands that are clearly buying links for SEO benefit… it is blatantly obvious and yet they are not getting penalised… why so?
What annoys me is that I play straight only to see others getting away with utter filth.
The thing I can’t understand is how so many companies openly advertise the sale of links to improve SEO rankings… surely the search engines could approach them as a prospective client (or already have) to uncover the network of sites and then block its SEO value?
They must have done that…. or as Bill pointed out, perhaps they “don’t mind us buying links” after all and maybe the scaremongering is a poor method of quality control?
Could this also be applied to comments left on SEO blogs like this one with a link to your site?
I suspect that you’re right on both points – that the majority of link exchanges on forums are probably discussed via private messaging, and that some participants on webmaster related forums may be from the search engines. But those kinds of discussions do still take place publicly, and I suspect that at least a few people who may end up participating in those kinds of link exchanges don’t realize that they may be hurting themselves by doing so.
So what’s to stop me posting all my competitors’ websites and getting them flagged as spam? Just like the LSI stuff that Google scares webmasters into believing they use, it’s just not practical.
The day that the search engines decide to punish webmasters for things they do not control is the day spam will really rule the search results.
I understand your frustration. I heavily recommend to anyone who has a web site, and an interest in showing up in search results, that they spend some significant time going through the guidelines from the search engines and trying to understand what they mean in those guidelines. I’m concerned as well about the possibility that someone might try to use a method like this to make it look like their competitors are engaging in conspiracies to manipulate search rankings.
As I responded to Stefan, “I’m not sure that the search engines really mind people buying links. It’s the buying of PageRank (or other link equity) that bothers them.”
We’ve been told by Matt Cutts, and others at the search engines, to do things like disclose paid relationships through nofollow values for rel attributes in links, so that the search engines can understand that a link is paid for. There’s a huge media buying/link buying industry on the Web that sells links, including Google’s DoubleClick. The search engines don’t mind people buying links. It’s buying links in a way that passes along PageRank and manipulates search results that they are concerned about.
I do believe that you’re most likely right that there are some very well-known brands that are getting away with buying links that shouldn’t be. And there are sites online that openly advertise the sale of links to improve SEO rankings, or that promote link exchange programs to do so.
Google has taken action against a number of big brands, and sites that sell links or set up link exchanges. The following post from Matt Cutts is a couple of years old, but it’s worth reading for anyone who wants to learn more about some of the steps that Google might take when they discover paid links that pass along PageRank:
How to report paid links
There are very many ways to have people link to your sites that don’t require paid links or link exchanges.
Hi People Finder,
Good question, and the thought that SEO blogs might also be focused upon crossed my mind. The patent application doesn’t expressly mention SEO blogs, but I know that I do get some visitors from the search engines every so often. I don’t believe that they visit with an intent to penalize people who might participate in conversations here. Matt Cutts has left a couple of comments over the years, with links back to his site, as have a few other people from different search engines.
I do know that sites other than forums, that discuss and engage in automated link exchange programs and in the buying and selling of links for ranking purposes have been penalized by the search engines. I haven’t participated in those types of exchanges or purchases, and I won’t.
My weekend was wonderful. I hope yours was as well. I wondered why they felt the need to file a patent as well. I enjoyed reading your post, and I think that the custom search engine that you created was a great idea.
I agree with your concern about penalizing webmasters for things that they have no control over.
What you describe does appear to be a potential problem with the process described in this patent application. However, the patent does go on to state that the mention of a URL under such circumstances may possibly be considered only preliminary evidence, and that the process might then take some further steps, such as looking to see if actual linking is going on between URLs that might be mentioned in such threads.
Microsoft has also released a large number of other whitepapers and patent filings on web spam that consider other aspects of identifying web spam, including looking at the content of pages. While this patent explicitly states that this process wouldn’t necessarily consider the content of the pages under consideration as web spam, I can’t see the search engine not investigating more fully.
On a side note, I have seen a number of references by Google in patent filings to something known as PLSI, or probabilistic latent semantic indexing, but that’s a very different animal than LSI, and I’ve never ever seen a reference in Google’s patent applications to LSI.
I think it comes down to numbers. If Microsoft really wants to compete with Google by hiring thousands of manual reviewers then fair play to them. I just can’t see anything as basic as this holding water.
As far as LSI is concerned you are 100% correct; there is nothing patented by Google with regard to LSI. It’s just an assumption held by a great number of SEO professionals that it’s there. It has given rise to the whole idea of using related phrases on the same page in order to not appear spammy. Whilst we white hats play by the implied rules, there are plenty of other webmasters out there making a fortune by good old-fashioned keyword stuffing.
The point I’m trying to make is that the search engines tend to let stories like this circulate in the webmaster community, and it’s overly easy to jump to conclusions and hang on rumour and implication.
I thought the same thing as one of the previous posters: what is to stop a competitor from blasting your site all over forums like that in hopes of getting you penalized? I like the idea in theory, but reality makes implementation seem quite difficult.
Bill: “We’ve been told by Matt Cutts, and others at the search engines to do things like disclose paid relationships through things like nofollow values for rel attributes in links, so that the search engines can understand that a link is paid for. ”
Bill, that is absolute nonsense. There is no way you can “disclose” that a link is paid for by using “rel=’nofollow'”. If that were truly the case, then every blog and forum and social media site that has implemented “rel=’nofollow'” is claiming that it sells links.
This ridiculous disclosure subterfuge should be eschewed by everyone — you don’t disclose anything by putting nofollow on a link. All you do is block the PageRank and anchor text from flowing.
It’s possible that Microsoft would use this as a way of identifying possible link exchanges, but then look for additional information. If they did, that might be a safer approach, less likely to be abused.
The question I was asked was whether or not search engines minded that links are paid for, and my answer was that they didn’t as long as there was some way to indicate that. Google has stated in a number of different places that the use of nofollow would do that. Here’s what they say on their About rel=”nofollow” page:
* My emphasis
Their page on Paid Links doesn’t use the term “machine-readable disclosure,” but it does recommend the use of nofollow on those links.
The use of rel=“nofollow” was originally intended for sections of web sites where webmasters enable visitors to add content and links, as an attempt to combat link spam. I didn’t think that the approach would be all that helpful at the time.
People also started using “rel=nofollow” to try to manipulate the flow of PageRank through pages of their site instead of taking the steps to build well structured sites. I also haven’t thought much of that idea.
I agree with you that the use of “rel=nofollow” is a poor mechanism for disclosing paid links (as well as a poor method of fighting link spam and trying to manipulate the flow of PageRank through a site), but I’m not making that argument. And I’m not making the statement that the use of “rel=nofollow” anywhere is an indication to search engines of paid links, as opposed to someone trying to not pass along PageRank for one reason or another.
But we have been told by Matt Cutts and by Google that they will interpret it as a “machine-readable disclosure.”
Going one step further, it bothers me that the word “nofollow” was used in the first place. Since it can be easily confused with the meta robots “nofollow” value, it makes things worse instead of better. The same with the use of the word “sitemap” for XML sitemaps.
Very interesting article. I was under the impression that exchanging links was a legitimate way to build links, not so?
There normally isn’t anything wrong with exchanging links, but sometimes search engines might have problems with them.
I understand that Google recently changed its webmaster guidelines to include the word “excessive” when referring to the exchange of links. One line on their link schemes page of “things to avoid” now reads:
Businesses that have relationships or common owners often do exchange links. It’s often a good thing to link to businesses that provide services or goods that complement what you offer – doing so provides value to your customers. And it wouldn’t hurt if they linked back if a link to your site was a useful resource on theirs. Blogs that cover the same or related topics often link back and forth to each other in blogrolls as well, and search engines also likely see no problems with those links.
It’s when sites seem to link back and forth solely to attempt to increase each others rankings in the search engines that a problem arises. Many reciprocal link exchange programs have shown up on the Web, where people exchange links or pay for links, pretty much to show up higher in search results. Sometimes these types of exchanges show up in places like forums, and sometimes they are advertised on web sites. The search engines are paying attention to them. Exchanging links in an “excessive” manner can create the perception that you may be linking solely to gain rankings in search results.
They should not consider forums, or they should make all forums nofollow. In that case internet marketers will not spam the forums.
I admire the time and effort you put into your blog and detailed information you offer!
Thank you. The patent isn’t about preventing forums from being spammed, but rather how a search engine might use an automated method of trying to find out information about link exchange programs or paid links offers on forums.
They have been doing this for years. We have all seen proof of it in one way or another.
Why wouldn’t they cross-reference forum material programmatically?
They already scrape our content. It’s just a matter of building a filter and a cross-referencing formula.
They do it. We know it.
As for the patent? My guess is one of their desk jockeys got bored one day and had nothing else to do.
There is also the possibility that they are leading into more relaxed rules/laws on how they use our content,
and putting a patent in to do this could potentially open Pandora’s box later.
Google should understand that Link Exchange or Posting Links in Forums are the only ways for smaller website to get backlinks.
Why would any website link to a small, unknown website, even if it has good content, unless it’s getting a backlink? Google has gone mad.
I understand Google wants to end paid links because they hurt its business, but killing link exchanges and links in forum signatures is stupid; now Google may even have problems with links in comments.
I think Google is trying to force everyone to forget SEO and advertise with it. Surely Google is Evil and I hope Bing will kill this evil.
Very good blog, Bill. I know you won’t agree with me on this, but I think Google is wrong and Evil.
I’m not sure why anyone felt that there was a need to apply for a patent to do this either. But, having a patent filing from one of the search engines that describes how they might identify link exchanges and offers for paid links on forums provides us with a primary source from one of the major search engines that we can point towards when someone asks us about whether or not they should participate in a discussion about exchanging or buying links on a forum.
I’ve seen sites which appeared to be penalized by at least one search engine, where link exchanges were offered to visitors on the pages of those sites. After those offers, and the links acquired through them, were removed, the pages appeared to no longer be penalized after a while. Having a resource directly from one search engine (other than search engine webmaster guidelines) that can be shown to people in that situation can help influence a decision to remove that kind of content from pages.
Thanks. There are other ways to get backlinks for smaller web sites, than participating in discussions to exchange links or buy links on forums.
The patent filing didn’t say that people shouldn’t include links to their pages in signature files at forums, though there are many other ways to get links to your site as well.
The patent application is from Microsoft (Bing), and not Google. But consider it quite possible that Google and Yahoo can see those forums as well, and may be paying attention to discussions regarding exchanging and buying links.
I think link exchanges are a difficult one, as decent PR sites never want to spread their links around too much, and people do like to hoard it somewhat. Fair enough though, they’ve earned it. What I have found with sites though is that by going dofollow they can actually increase traffic massively, so PR ceases to matter and the presence of dofollow saves them.
The times of Black Hat SEO are over. You have to provide real content to make your site interesting.
And also when you post on blogs to get a backlink to your site, try to write something useful.
I have a couple of blogs of my own, and the comment spamming is really annoying.
Hi Orchid Box,
I’d rather have fewer comments and better conversations – it’s not a question of “hoarding” PageRank, or increasing traffic. If someone leaves a thoughtful or meaningful comment, I’m happy to see it – that’s why I have comments enabled here.
I do like people stopping by and commenting here, but I’m not really fond of automated and manual comment spam, with anchor text instead of names in the name fields, and canned comments that they’ve posted over and over elsewhere, or comments that show that they haven’t bothered to even read the post that they are commenting upon.
I think that this is just another attempt by Google to dictate the way that the internet “looks and feels”. This can happen when one company controls 70% of the market. Hopefully, increased competition from MSN/Yahoo will provide a better experience for all internet users.
The patent filing is actually from Microsoft, and not Google. So it shouldn’t be a reflection on Google. However, I wouldn’t be surprised if Google and Yahoo were also both doing something similar.
It wouldn’t surprise me but I doubt if the spiders target things as spam. They probably have a department of forum surfers who do it manually.
As far as the fellow saying forums and link exchanges are the only way for the little guy to get links, that’s nuts. The web is full of places to get links it just takes a little work and effort.
I don’t have any issues with the search engines hunting down the paid link posts people KNOW that Google doesn’t like it and if they’re dumb enough to publicize it then Darwin is going to go to work.
In part, I think you’re right about reviewing for spam on forums.
I think it’s safe to say that search engines would rather automate the process of crawling web pages, identifying useful, unique and indexable information to index, and also identifying information that might be spam and either putting it aside for further review or skipping past it completely. With billions of URLs on the Web, relying upon manual reviewers could become pretty costly in terms of time, manpower, and expenses. But I wouldn’t be surprised if some forums were visited regularly by some search engineers. We do know that WebmasterWorld was frequented on a regular basis by someone from Google calling himself GoogleGuy, who would even participate in some threads.
The web is full of a variety of opportunities to get links that do go beyond link exchanges and purchases promoted in forums.
I am in complete agreement with them doing this. It would be a smart and efficient way to get an idea of the link neighborhoods that spammers are using to manipulate the engines. Bill, what are your thoughts on link neighborhoods as a whole (myth or fact), and if Google actually devalues sites based on IP range?
Hi Answer Blip,
I’ve always had a suspicion that someone from the search engines has been keeping an occasional eye on at least some of the major forums involving design and search, not only to learn about spam, but also to learn about what people think of their search engines, to understand better some of the challenges of designers and developers, and so on. Automating some of that process isn’t a bad idea at all.
We’ve been told for many years by Google to avoid linking to bad neighborhoods, but without any real description from Google as to what a “bad neighborhood” might actually be. There have been a number of whitepapers released, like Stanford/Yahoo’s paper on TrustRank from 2004, Combating Web Spam with TrustRank, which looks at links between sites and tells us that some sites might be devalued. It’s not the only paper like that out there, and many of the AIRWeb papers (for workshops on Adversarial Information Retrieval) discuss similar topics.
I don’t believe that Google would devalue sites based upon IP range alone, but that might be one hint amongst a number of others that there might be some kind of shared ownership or other relationship between sites, and if there were “problems” with some of those sites, there could be an issue if some of the other signals are there, such as excessive linking between those pages, and certain kinds of content. I do like to look at which sites are on the same IP address as one that I might be working upon to see what else is there.
@dwippy @Answer Blip
The first thing your competitor is going to do is go post your URL in DP asking for a link exchange. Guess who will be the first to find your request and URL together…Google…and then, down you go. You will not know until your traffic drops to zero, but the damage will have been done at that point.
Scraping forums for link-exchange offenders is a terrible idea and is one anti-spam tactic that will be immediately abused.
Chances are that the search engines have been “unofficially” looking at places like forums that offer link exchanges and purchases for a while, before anyone had a notion of possibly patenting the idea.
I’d imagine that this kind of information on its own wouldn’t be enough to cause a search engine to penalize a site, but it might be a good place for a search engine to start investigating. There’s some potential for someone to try to use something like this to intentionally harm competitors. Hopefully the search engines have taken that into consideration.
Since 2001 I’ve become a member of numerous webmaster forums and communities, and I always left links to my websites in my signature. Because it is natural. Because if a web forum provides an opportunity to insert a link, why shouldn’t a user use it? And now some smart brains from MS want to penalize me for doing it? This doesn’t sound clever, huh?
It’s likely that the major search engines have been keeping an eye on some of the larger webmaster forums and communities in some fashion for a fair amount of time, regardless of whether they’ve been doing it manually or using an automated system like the one described here.
Does Microsoft actually use a system like this one? I would suspect that they might.