These obfuscation techniques limit the effectiveness of static analysis and of systems based on static features.
There are legitimate reasons to use redirects on Web sites, and there are less than legitimate reasons. The paper details both legitimate reasons for redirects as well as questionable reasons. Here’s an example of one:
Doorway pages are used by both legitimate and spam sites to improve rankings for certain search terms. The doorway page is specifically designed and optimized to rank high for certain search terms. Doorway pages can improve user experience by introducing the site to the user and clearly stating what the site is about.
However, the problem occurs when the site targets terms that are completely inappropriate to the site’s topic. Visitors who search on those terms may click on the doorway page, but then are quickly redirected to a spam site.
The authors took a list of the top 5000 most popular English queries, and found the top 200 search result URLs from each using Live Search (search.live.com). That gave them a set of 782,937 unique URLs which they then labeled as being popular.
To explore the use of redirection on blogs, they decided to focus on sites at blogspot.com. They used what they believed were the top 100 most monetizable keywords from Live Search to extract 934,876 blog sites that contained one or more of those keywords in the subdomain part of the blog’s URL.
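To make that selection step concrete, here is a rough sketch of how a blogspot URL might be checked for keywords in its subdomain. It is only an illustration, not the authors' code: the keyword list, the function names, and the simple substring match are all my assumptions, since the paper's actual 100 keywords and matching rules aren't reproduced here.

```python
from urllib.parse import urlsplit

# Hypothetical stand-ins; the study's real keyword list is not given in this post.
KEYWORDS = ["mortgage", "insurance", "casino"]

BLOGSPOT_SUFFIX = ".blogspot.com"


def blogspot_subdomain(url):
    """Return the blog's subdomain for a blogspot.com URL, or None otherwise."""
    host = (urlsplit(url).hostname or "").lower()
    if not host.endswith(BLOGSPOT_SUFFIX):
        return None
    sub = host[: -len(BLOGSPOT_SUFFIX)]
    # Drop a leading "www." so www.foo.blogspot.com is treated like foo.blogspot.com.
    if sub.startswith("www."):
        sub = sub[4:]
    return sub or None


def matching_keywords(url, keywords=KEYWORDS):
    """List the keywords that appear as substrings of the blog's subdomain."""
    sub = blogspot_subdomain(url)
    if sub is None:
        return []
    return [k for k in keywords if k in sub]


if __name__ == "__main__":
    print(matching_keywords("http://cheap-mortgage-rates.blogspot.com/"))
```

Note that a keyword appearing only in the path (for example example.com/mortgage) doesn't count here; only the subdomain is examined, which is what the study describes.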
So blogspot really are all spammers 😉
Not necessarily. The sites that were looked at weren’t chosen at random, but instead were specifically chosen because they might be more likely to have some kind of redirect upon them.
The ones chosen were pages which had terms in the subdomain name that were popularly chosen as keywords in ad campaigns. The numbers cited in the study don’t apply across the board to all blogspot domains, but rather to ones that were more likely than not, according to assumptions made by the researchers, to contain some kind of web spam.
Hehe I stand “corrected” 🙂
It was an easy conclusion to jump to.
I’ve seen a few blog posts and forum threads where people talked about starting to blog at blogspot, and then deciding that they really would rather be on a WordPress or Movable Type blog because it offers more features, and the use of plugins.
It’s difficult to do some kind of redirect from those blogspot pages to a new blog, and the use of something like a 301 HTTP status code isn’t a possibility at blogspot. I’m wondering at this point how many of the “spam” pages that were uncovered in this research actually used redirects to point to new versions of blogs.
Operators of domains whose reputation is worth stealing now need to be aware of any means of redirection through their domain. There are “open redirector” exploits. There’s an exploit of the Google “I’m feeling lucky” feature. There are exploits of login pages which attempt to return the browser to the page before the login. There are redirects involving .swf files uploaded to photo-sharing sites. All these holes need to be plugged.
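For the login-page case in particular, one common way to plug the hole is to check the return address before redirecting to it. The sketch below is just one possible approach under my own assumptions: the host name www.example.com, the ALLOWED_HOSTS set, and the function name is_safe_redirect are all made up for illustration and don't come from any particular site or framework.

```python
from urllib.parse import urlsplit

# Hypothetical: the hosts this site is willing to redirect to.
ALLOWED_HOSTS = {"www.example.com"}


def is_safe_redirect(target, allowed_hosts=ALLOWED_HOSTS):
    """Return True only for local paths or absolute URLs on an allowed host."""
    # Reject empty values, protocol-relative URLs (//host/...), and backslashes,
    # which some browsers treat like forward slashes.
    if not target or target.startswith("//") or "\\" in target:
        return False
    parts = urlsplit(target)
    if not parts.scheme and not parts.netloc:
        # A relative reference: accept only paths rooted at this site.
        return target.startswith("/")
    # An absolute URL: require http(s) and an explicitly allowed host.
    return parts.scheme in ("http", "https") and parts.hostname in allowed_hosts


if __name__ == "__main__":
    for candidate in ["/account", "http://spam.example.net/", "//spam.example.net/"]:
        print(candidate, is_safe_redirect(candidate))
```

A login handler would fall back to a fixed page, such as the home page, whenever the check fails, rather than following the supplied address.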
We now have SiteTruth down-rate the entire base domain when PhishTank reports an active phish anywhere in the domain. It’s drastic, but it works.
Thanks for the warning on some of the other potential abuses that may happen to a site.
I wonder if we might see a followup to this paper that covers some of the other exploits that you mention. It seems like a good next step. Maybe the researchers involved in this one might find that topic worth pursuing.