Yahoo Robots Combat Spam and Phishing

How much of a responsibility do search engines have to police the internet and protect their users from email, instant message, and web spam, phishing fraud, and misuse of chat rooms? How can search engines be socially responsible and work towards keeping consumers from harm?

How effective can they be at proactively identifying and fighting phishing attempts as they index the Web?

A newly published patent application from Yahoo tells us that people using the Web have an expectation that their service providers will help protect them from such abuses, and Yahoo describes how they might use robot programs to help identify emails, IMs, chat rooms, and Web sites where inappropriate activities take place.

The document goes into detail on how a Yahoo robot might interact with others in a chat room, and through email and instant messaging programs, to identify spam and fraud, including phishing attacks aimed at obtaining passwords, credit card details, Social Security numbers, and other confidential information.

These robots could passively or actively seek out abusive activity on the Web, and collect information about the abuse, such as “a sender’s network address, a sender’s account name/address, Universal Resource Locators (URLs) associated with the abusive communication or web page, or the like.”

Under this approach, a robot could log in to web pages, sign up for newsletters, sign in to chat rooms and initiate conversations, receive emails and respond to them, and proactively take other steps to identify spam and phishing attacks.

Depending upon where and how the abuse takes place, Yahoo could respond to spam and phishing in different ways, such as blocking or filtering messages and web pages, warning the sender, or disabling accounts.
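To make the collect-then-respond idea a little more concrete, here is a minimal Python sketch. It is not taken from the patent filing: the `AbuseReport` record, the `collect_report` and `choose_response` functions, the channel names, the response labels, and the `REPEAT_LIMIT` threshold are all hypothetical names and values that I've assumed for illustration. It only shows how the kinds of details the filing mentions (a sender's network address, account name, and associated URLs) might be gathered and mapped to one of the responses described above.

```python
import re
from dataclasses import dataclass, field

# Simple URL matcher for illustration only; real filters are far more careful.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")

# Assumed threshold: how many earlier offenses trigger disabling an account.
REPEAT_LIMIT = 3


@dataclass
class AbuseReport:
    """Hypothetical record of what a robot learned about one abusive item."""
    channel: str          # assumed labels: "email", "im", "chat", or "web"
    sender_account: str   # account name or address of the sender
    sender_address: str   # network address of the sender
    urls: list = field(default_factory=list)


def collect_report(channel, sender_account, sender_address, body):
    """Gather the sender details and any URLs found in the message body."""
    urls = URL_PATTERN.findall(body)
    return AbuseReport(channel, sender_account, sender_address, urls)


def choose_response(report, prior_offenses=0):
    """Pick a response based on where the abuse happened (assumed policy)."""
    if report.channel == "web":
        return "block_page"
    if prior_offenses >= REPEAT_LIMIT:
        return "disable_account"
    if report.channel == "email":
        return "filter_messages"
    return "warn_sender"


report = collect_report(
    "email",
    "phisher01",
    "203.0.113.7",
    "Verify your account at http://example.test/login now",
)
print(report.urls, choose_response(report))
```

The policy in `choose_response` is just one possible reading of "depending upon where and how the abuse takes place"; an actual system would presumably weigh much more evidence before disabling anyone's account.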

Dynamic Combatting of Spam and Phishing Attacks
Invented by Richard Sinn and Miles Libbey
US Patent Application 20080177841
Published July 24, 2008
Filed January 19, 2007


A self training set of robots are configured to proactively search for selective communication abuses over a network. Robots may enter a chat room to proactively send messages. The robots then analyze patterns and/or content of a received message for potential abuse.

Robots may also passively reside on/off line without publishing their network address. If a message is received, the message may be interpreted to be SPAM/SPIM.

Robots may also perform a variety of other actions, such as access websites, and analyze received messages to determine if the messages indicate abuse. If abuse is detected, information may also be obtained to enable blocking or filtering of future messages from the sender, or access to/from an abusive website.

The information also may be used to retrain robots, so that the robots may learn from and share their collective knowledge of abusive actions.
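The passive robot in that abstract works a lot like a honeypot: because its address is never published, nothing legitimate should ever arrive there, so anything that does can be treated as SPAM/SPIM. Here is a rough Python sketch of that rule, along with a very crude stand-in for robots sharing what they learn. The `PassiveRobot` class, its method names, and the shared blocklist are my own assumptions for illustration, not details from the filing, and a plain shared set is far simpler than whatever retraining the patent application has in mind.

```python
class PassiveRobot:
    """Hypothetical robot that listens on an address it never publishes."""

    def __init__(self, address, shared_blocklist):
        self.address = address
        # One set shared by all robots, standing in for "collective knowledge".
        self.shared_blocklist = shared_blocklist

    def receive(self, sender, recipient, body):
        """Treat any message sent to the unpublished address as unsolicited."""
        if recipient != self.address:
            return False  # not addressed to this robot, so ignore it
        # Nobody was given this address, so the message is assumed abusive.
        self.shared_blocklist.add(sender)
        return True

    def is_known_abuser(self, sender):
        """Check whether any robot has already flagged this sender."""
        return sender in self.shared_blocklist


shared = set()
robot_a = PassiveRobot("robot-a@example.test", shared)
robot_b = PassiveRobot("robot-b@example.test", shared)

robot_a.receive("spammer@example.test", "robot-a@example.test", "Cheap pills")
print(robot_b.is_known_abuser("spammer@example.test"))
```

Because both robots hold the same blocklist, a sender caught by one is immediately known to the other, which is the simplest version of the sharing described in the abstract.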

Fraud running loose on the internet makes the Web much less safe for all of us. It’s great to see Yahoo attempting to find ways to identify spam, phishing, and misuse of chat environments.

Resources on Phishing

In researching this post, I found a number of articles, blog posts, organizations, and academic papers about phishing, and about some of the technology being developed to address phishing. It’s by no means a complete list – there are many more resources on the web about phishing, but hopefully it’s a good introduction to the topic.

The Ocean is Full of Phish – An introduction to phishing, and phishing attacks, with some thoughts on technical solutions to stopping phishing.

The Anti-Phishing Working Group – A worldwide organization “focused on eliminating the fraud and identity theft that result from phishing, pharming and email spoofing of all types.” Members include some of the leading technology service providers in the world, including Microsoft, Yahoo, MySpace, and eBay.

The Messaging Anti-Abuse Working Group (MAAWG) – Aimed at preventing online messaging abuse, another organization sponsored by very large technology groups. The press releases section provides many details of activities undertaken by members of this group.

Federal Trade Commission – Spam Summit: The Next Generation of Threats and Solutions – The staff from the Federal Trade Commission’s Division of Marketing Practices put out this report in November 2007, providing a look at spam and phishing from the perspective of the FTC.

Spam, Phishing, and the Looming Challenge of Big Botnets (pdf) – What dangers might be around the corner for consumers as spammers start utilizing huge networks of virus-compromised computers, called botnets? What does this mean for technologies intended to fight phishing attacks, such as SSL certificates and DomainKeys/DKIM?

Combating the Botnet Scourge (pdf) – Information about botnets, and some basic and advanced defense approaches to address them.

A Forensic Framework for Tracing Phishers (pdf) – Some ideas on using phishing methods against those engaged in phishing, to help investigators and service providers to “pro-actively fight phishing online fraud by identifying and tracing the involved actors.”

CANTINA: A Content-Based Approach to Detecting Phishing Web Sites (pdf) – an approach to examining the content of web sites to identify ones used in phishing fraud activities.

One small step for email, one giant leap for Internet safety – Yahoo’s May 2007 announcement of the use of DomainKeys Identified Mail (DKIM) to authenticate the originating domain of emails from organizations that are using the technology.

Fighting phishing with eBay and PayPal – Google’s July 2008 announcement of the use of DomainKeys Identified Mail (DKIM) in Gmail for messages from eBay and PayPal.

Phinding Phish: Evaluating Anti-Phishing Tools (pdf) – A paper from late 2006, which compares tools that help web surfers identify phishing activity, focusing upon the effectiveness of 10 popular anti-phishing toolbars.

You’ve Been Warned: An Empirical Study of the Effectiveness of Web Browser Phishing Warnings (pdf) – A paper written by Carnegie Mellon University researchers for CHI 2008, held in April 2008, which explores the effectiveness of the warnings that anti-phishing programs provide, and ways to improve upon those warnings.

9 thoughts on “Yahoo Robots Combat Spam and Phishing”

  1. You know this sounds very cool and I’m glad Yahoo is being proactive and doing things FIRST but there is something weird about this…something more esoteric. I mean reading the green quote block in the middle – the whole thing looks like it came out of a futuristic science fiction book where your shit is being read 24/7, your communication is being monitored by “robots” etc… lol maybe im just being weird.

  2. Hi Abdul,

    Thanks. There were a lot of interesting papers and pages on phishing and botnets out there – probably enough to fill at least a couple of books.

    Hi Srai,

    A lot of the patent filings and papers that I’ve seen discuss robots in the context of crawling pages to collect information for indexing. I haven’t seen many that discuss having robots log into pages, or answer emails, or participate in chat to find phishing activities. They are definitely something else.

    Hi Jordan,

    You’re welcome.

    Hi Adrian,

    The patent reminded me of Paul Ford’s (very) short story, Robot Exclusion Protocol.

    I don’t think that you’re being weird. When robots like the ones that index web pages start logging into email and chatrooms and webpages, and try to guess the intent, and index the activities of people, it does sound like science fiction.

  3. I am planning to move to SERP optimizing on yahoo and I found your post very informative thanks for taking time to discuss it

  4. SPAM and Phishing emails contain domains and emails. There has to be a way for users (or preferably an automated program) to block emails with certain URLs in them altogether. For example, I have been receiving a lot of SPAM from domains such as,, A user should be able to block all emails containing these domains. At the same time, the email provider should notify the registered organization or agent to stop these emails. This can easily be accomplished through Whois. Similarly, if a user reports an email that 5 or more other folks have reported previously, the contained URL and/or email address should automatically be blocked and the owning organization notified.

    In some cases, the SPAM includes an email address that comes from my own email provider. The email is sent to the SPAM box but the email address is not blocked/deleted. I do not know why.

  5. Hi Mike,

    All very good points. I wrote a post about an approach described by Microsoft where they looked at the domains that show up in email spam to identify web sites that might contain web spam. The post is at: Microsoft on Determining Search Engine Spam From Email Spam

    There are a number of free email services that have legitimate users and other people who may use those addresses to send spam – so blocking on a domain level might not be ideal.

    I know that filtering software does exist which will look for URLs included in email messages and block the transmission of those emails. There’s also a process known as email spoofing in which the “from” address isn’t actually the source of the email. That may be what you’re seeing when you see spam in your spam box from your own email provider – the “from” address is actually faked.
