A paper presented by Microsoft researchers at the AIRWeb’07 conference this past May explores some of the methods that people use to try to trick search engines. The paper, A Taxonomy of JavaScript Redirection Spam (pdf), provides a nice overview of those methods.
In this paper, we study common JavaScript redirection spam techniques on the web.
Our findings indicate that obfuscation techniques are very prevalent among JavaScript redirection spam pages.
These obfuscation techniques limit the effectiveness of static analysis and static feature based systems.
Based on our findings, we recommend a robust counter measure using a light weight JavaScript parser and engine.
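To get a feel for why obfuscation is a problem for static analysis, here is a minimal sketch in Python. It assumes a deliberately naive detector that pattern-matches script text for assignments to `location`; the pattern, the function name `looks_like_redirect`, and the example.com URLs are my own illustrations and not anything taken from the paper.

```python
import re

# Hypothetical static detector (an assumption for illustration, not the
# paper's method): flag script text that assigns to a location property
# or calls location.replace() / location.assign().
REDIRECT_PATTERN = re.compile(
    r"\b(?:window\.|document\.|self\.|top\.)?location(?:\.href)?\s*=[^=]"
    r"|\blocation\.(?:replace|assign)\s*\(",
    re.IGNORECASE,
)


def looks_like_redirect(script: str) -> bool:
    """Return True if the script text matches the naive redirect pattern."""
    return REDIRECT_PATTERN.search(script) is not None


# A plain redirect: the word "location" appears literally in the source.
plain = 'window.location = "http://example.com/landing";'

# The same redirect with the property names split into string fragments,
# so the literal word "location" never appears in the script text.
obfuscated = (
    'var w = window; '
    'w["loc" + "ation"]["hr" + "ef"] = "http://example.com/landing";'
)

print(looks_like_redirect(plain))       # the plain form is flagged
print(looks_like_redirect(obfuscated))  # the split-string form slips through
```

Both snippets would send a browser to the same place, but only the first is visible to a text pattern. That gap is presumably why the authors suggest actually parsing and running the script rather than just reading it.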
There are legitimate reasons to use redirects on Web sites, and there are less than legitimate ones. The paper details both kinds. Here’s an example of one:
Doorway pages are used by both legitimate and spam sites to improve rankings for certain search terms. The doorway page is specifically designed and optimized to rank high for certain search terms. Doorway pages can improve user experience by introducing the site to the user and clearly stating what the site is about.
However, the problem occurs when the site targets terms that are completely inappropriate to the site’s topic. Visitors who search on those terms may click on the doorway page, but then are quickly redirected to a spam site.
The paper also explores the different approaches to using JavaScript within those questionable areas. What makes it work well is that the authors actually examined a large number of web sites to see how prevalent this kind of JavaScript redirection might be. When a random sampling of web pages turned up only tens of pages using JavaScript redirects, they switched to a different method which focused upon more popular pages.
The authors took a list of the top 5000 most popular English queries, and found the top 200 search result URLs from each using Live Search (search.live.com). That gave them a set of 782,937 unique URLs which they then labeled as being popular.
To explore the use of redirection on blogs, they decided to focus upon sites at blogspot.com. They used what they believed were the top 100 most monetizable keywords from Live Search to extract 934,876 blog sites which contained one or more of those keywords in the subdomain area of the blog’s URL.
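The subdomain filter could be sketched roughly like this in Python. The keyword list below is a placeholder (the actual 100 keywords aren't listed here), and the function names are hypothetical; this is only my guess at the general shape of such a filter, not the researchers' code.

```python
from urllib.parse import urlparse

# Placeholder keywords, assumed for illustration only.
KEYWORDS = ["loan", "insurance"]

BLOGSPOT_SUFFIX = ".blogspot.com"


def blogspot_subdomain(url: str):
    """Return the subdomain part of a blogspot.com URL, or None otherwise."""
    host = urlparse(url).hostname or ""
    if not host.endswith(BLOGSPOT_SUFFIX):
        return None
    return host[: -len(BLOGSPOT_SUFFIX)]


def has_keyword_in_subdomain(url: str, keywords=KEYWORDS) -> bool:
    """True if any keyword appears inside the blog's subdomain."""
    subdomain = blogspot_subdomain(url)
    return subdomain is not None and any(k in subdomain for k in keywords)


print(has_keyword_in_subdomain("http://cheap-loans.blogspot.com/"))
print(has_keyword_in_subdomain("http://my-garden.blogspot.com/"))
```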
Of the popular sites, roughly 1 in 289 (2,712 / 782,937) contained a JavaScript redirect. The incidence was much greater in those blogspot pages, with about 1 in 130 (7,196 / 934,876) of them containing JavaScript redirects.
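For anyone who wants to check the arithmetic, the ratios follow directly from the raw counts reported above. A small Python sketch (the helper name `one_in` is mine):

```python
# Counts as reported for the two URL sets.
popular_total, popular_redirects = 782_937, 2_712
blogspot_total, blogspot_redirects = 934_876, 7_196


def one_in(total: int, hits: int) -> float:
    """How many pages, on average, per page containing a redirect."""
    return total / hits


for label, total, hits in [
    ("popular", popular_total, popular_redirects),
    ("blogspot", blogspot_total, blogspot_redirects),
]:
    # Print both the "1 in N" form and the equivalent percentage.
    print(f"{label}: 1 in {one_in(total, hits):.1f} ({hits / total:.2%})")
```

Either way you express it, the rate in the blogspot set is a bit more than double the rate in the popular set.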
So blogspot really are all spammers 😉
Hi Joost,
Not necessarily. The sites that were looked at weren’t chosen at random, but instead were specifically chosen because they might be more likely to have some kind of redirect upon them.
The paper was more of a study to see what kind of JavaScript spam redirects were in use on popular and blogspot pages than it was a survey of where spam appears on the web. Blogspot pages were chosen as a subject because others had pointed them out as a place where spam might appear, but this study only targeted certain blogspot sites, and not all of them.
The ones chosen were pages which had terms in the subdomain name that were popularly chosen as keywords in ad campaigns. The numbers cited in the study don’t apply across the board to all blogspot domains, but rather to ones that were more likely than not, according to assumptions made by the researchers, to contain some kind of web spam.
Hehe I stand “corrected” 🙂
It was an easy conclusion to jump to.
I’ve seen a few blog posts and forum threads where people talked about starting to blog at blogspot, and then deciding that they really would rather be on a WordPress or Movable Type blog because those offer more features and the use of plugins.
It’s difficult to do some kind of redirect from those blogspot pages to a new blog, and the use of something like a 301 HTTP status code isn’t a possibility at blogspot. I’m wondering at this point how many of the “spam” pages that were uncovered in this research actually used redirects to point to new versions of blogs.
We’re seeing more exploitation of “open redirectors” on major sites. PhishTank currently has active phishes which exploit open redirectors on Google Maps, Microsoft Live, AOL, and eBay. These allow building URLs that will get through most spam filters. These attacks don’t even require JavaScript; they’re just URLs with parameters.
Operators of domains whose reputation is worth stealing now need to be aware of any means of redirection through their domain. There are “open redirector” exploits. There’s an exploit of the Google “I’m feeling lucky” feature. There are exploits of login pages which attempt to return the browser to the page before the login. There are redirects involving .swf files uploaded to photo-sharing sites. All these holes need to be plugged.
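As a rough illustration of plugging the login-return kind of hole, a site can check the redirect parameter before honoring it. This is a minimal Python sketch under my own assumptions: the allowed host names, the function name `safe_redirect_target`, and the fallback of "/" are all hypothetical, and a real site would need to adapt the rules to its own URLs.

```python
from urllib.parse import urlparse

# Hypothetical list of hosts this site is willing to redirect to.
ALLOWED_HOSTS = {"www.example.com", "example.com"}


def safe_redirect_target(target: str, default: str = "/") -> str:
    """Return target only if it stays on an allowed host, else default."""
    # Reject empty values, backslashes, and control characters, which
    # some browsers normalize in surprising ways.
    if not target or "\\" in target or any(ord(c) < 0x20 for c in target):
        return default
    try:
        parsed = urlparse(target)
    except ValueError:
        return default
    if not parsed.scheme and not parsed.netloc:
        # Relative URL: accept only site-rooted paths such as "/account".
        return target if target.startswith("/") else default
    # Absolute URL: require http(s) and an explicitly allowed host.
    # Scheme-relative URLs ("//host/path") have no scheme and land here too.
    if parsed.scheme in ("http", "https") and parsed.hostname in ALLOWED_HOSTS:
        return target
    return default


print(safe_redirect_target("/account"))
print(safe_redirect_target("http://evil.test/login"))
```

The design choice here is an allowlist rather than a blocklist: anything the check doesn't positively recognize falls back to the default page instead of being passed along.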
We now have SiteTruth down-rate the entire base domain when PhishTank reports an active phish anywhere in the domain. It’s drastic, but it works.
Hi John,
Thanks for the warning on some of the other potential abuses that may happen to a site.
I wonder if we might see a followup to this paper that covers some of the other exploits that you mention. It seems like a good next step. Maybe the researchers involved in this one might find that topic worth pursuing.