Category Archives: Web Spam

Google Maps Using Photos to Identify Spam?

A couple of interesting patent applications from Google surfaced recently, involving the use of photography in Local Search to identify whether businesses actually exist, might be closed, or might be Web spam.

Google Street View Car in Bristol, Byrion Smith, Some rights reserved

The first of these looks at Street View images, and is:

Systems and Methods of Correlating Business Information to Determine Spam, Closed Businesses, and Ranking Signals
Inventors: Andrea Frome, Howard Wellington Trickey, Melanie Clements, Ethan G. Russell, Paul Eastlund, Diego Ariel Gertzenstein, Douglas Richard Grundman, Baris Yuksel
Assigned to: Google, Inc.
US Patent Application 20150154607
Published June 4, 2015
Filed: February 24, 2011
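
The filing is about correlating business information with what shows up in imagery at a listing’s claimed location. As a very loose illustration of that idea (the signal names, the one-year cutoff, and the labels below are my own assumptions for the sketch, not language from the patent), such a correlation might look something like this:

```python
# A very loose sketch of correlating imagery-derived signals with a Maps
# listing. The signal names, thresholds, and labels are assumptions made for
# this illustration; they are not taken from the patent filing.

from dataclasses import dataclass

@dataclass
class ListingSignals:
    storefront_detected: bool    # any storefront found at the claimed address
    signage_matches_name: bool   # storefront text in imagery matches the listing name
    photo_age_days: int          # age of the most recent imagery at that location

def classify_listing(s: ListingSignals) -> str:
    """Roughly label a listing based on what imagery at its address suggests."""
    if not s.storefront_detected:
        return "possible fake-business spam"   # nothing visible at the claimed address
    if not s.signage_matches_name and s.photo_age_days < 365:
        return "possibly closed or renamed"    # recent imagery shows different signage
    return "no imagery-based concern"

print(classify_listing(ListingSignals(True, True, 120)))    # no imagery-based concern
print(classify_listing(ListingSignals(False, False, 30)))   # possible fake-business spam
```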

Continue reading Google Maps Using Photos to Identify Spam?

How Google might fight Web Spam in Social Networks

For as long as most SEOs can remember, Google has had one person in charge of leading its fight against Web spam: Matt Cutts. Over the years, his position evolved into that of a spokesperson for Google, speaking on actions the company might take to fight web spam and low-quality content. Matt Cutts is presently on an extended leave of absence from Google.

my fingerprint (index, left hand), Stefano Mortellaro, Some Rights Reserved

News came out a few days ago that Google would be replacing Matt Cutts as its Head of Web Spam, but that the new person in charge wouldn’t be as vocal as Matt Cutts had been, nor would Google reveal his or her identity.

Continue reading How Google might fight Web Spam in Social Networks

Google Turns to Deep Learning Classification to Fight Web Spam

In the past few years, Google has been busy building what has become known as the Google Brain team, which started out by having its deep learning system watch videos until it learned to recognize cats.

Google has been hiring a number of people to add to the abilities of its deep learning team, including a pricey acqui-hire in the UK earlier this year, as described in More on DeepMind: AI Startup to Work Directly With Google’s Search Team.

Web Spam Classification Patent

Continue reading Google Turns to Deep Learning Classification to Fight Web Spam

Google Patent Attacks Reverse Engineering of Local Search Listings

The title of a Google patent reached out and grabbed me as I was skimming through Google’s patents. It’s the kind of title that captures your attention, positioning the patent as a weapon in the war Google wages against people who try to spam the search engine.

The title for the patent is Reverse engineering circumvention of spam detection algorithms. The context is local search, where some business owners might be striving to show up in results in places where they don’t actually have a business location, or where heavy competition might convince them that having additional or better entries in Google Maps is going to help their business.

The result of such efforts might be for their local listings to disappear completely from Google Maps results. The category Google seems to have placed such listings under is “Fake Business Spam.”

Google spam score flow chart from patent
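
The flow chart sketches how a spam score moves a listing through the system. As a loose illustration only (the features, weights, and thresholds below are assumptions made up for this sketch, not values from the patent), a listing-level spam score might gate whether a listing is kept, reviewed, or removed:

```python
# Toy spam-score gate for local listings. The features, weights, and
# thresholds are assumptions made for this sketch, not values from the patent.

def listing_spam_score(features: dict) -> float:
    """Combine a few listing-level features into a rough 0..1 spam score."""
    weights = {
        "keyword_stuffed_name": 0.5,    # e.g. "Best Cheap Plumber NYC 24/7"
        "virtual_office_address": 0.3,  # address shared by many unrelated listings
        "duplicate_phone_number": 0.2,  # phone number reused across listings
    }
    return sum(weights[f] for f, present in features.items() if present and f in weights)

def route_listing(score: float) -> str:
    if score >= 0.7:
        return "remove from Maps results"   # treated as fake business spam
    if score >= 0.4:
        return "queue for manual review"
    return "keep the listing"

score = listing_spam_score({"keyword_stuffed_name": True, "virtual_office_address": True})
print(score, route_listing(score))   # 0.8 remove from Maps results
```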

Continue reading Google Patent Attacks Reverse Engineering of Local Search Listings

Google’s Paid Link Patent

There are things that we just don’t know about search engines; things that aren’t shared with us in an official blog post, in a search engine representative’s conference comments, or through a publicly published white paper. Often we do learn some aspects of how search engines work through patents, but the timing of those is controlled more by the US Patent and Trademark Office than by the search engines themselves.

For example, back in 2003 Google was filing some of its first patents identifying changes to how its ranking algorithms worked, and among those was one with a name similar to the original Stanford PageRank patents filed by Lawrence Page. It contains some hints about PageRank and Google’s link analysis that we hadn’t officially seen before.

If you want a bit of a history lesson you can see the first couple of those PageRank patents at Method for scoring documents in a linked database (US Patent 6,799,176) and Method for node ranking in a linked database (US Patent 6,285,999).
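
As a quick refresher on the node-ranking idea in those patents (this is the textbook PageRank power-iteration formulation, not a reproduction of either patent’s exact claims):

```python
# Textbook PageRank computed by power iteration, as a refresher on the
# node-ranking idea behind the patents above; not either patent's exact method.

def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """links maps each page to the list of pages it links out to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:  # a page with no outgoing links spreads its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))
```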

Continue reading Google’s Paid Link Patent

How Google Might Use the Context of Links to Identify Link Spam

With Google’s Penguin update, it appears that the search engine has been paying significantly more attention to link spam, such as attempts to manipulate links and anchor text pointing to a page. The Penguin update launched on April 24, 2012, and was accompanied by a post on the Official Google Webmaster Central Blog titled Another step to reward high-quality sites.

The post tells us about efforts that Google is undertaking to decrease Web rankings for sites that violate Google’s Webmaster Guidelines. The post is written by Google’s Head of Web Spam, Matt Cutts, and in it Matt tells us that:

…we can’t divulge specific signals because we don’t want to give people a way to game our search results and worsen the experience for users, our advice for webmasters is to focus on creating high quality sites that create a good user experience and employ white hat SEO methods instead of engaging in aggressive webspam tactics.
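
The post doesn’t divulge signals, but one example of link context a spam classifier could plausibly weigh is how unnaturally concentrated the anchor text pointing at a page is. The measure and the threshold below are my own simplifications for illustration, not anything Google has confirmed:

```python
# Illustration only: measuring how concentrated a page's inbound anchor text is.
# A near-uniform, keyword-rich anchor profile is one hint of manipulated links.
# The 0.6 threshold is an assumption for this sketch, not a known Google value.

from collections import Counter

def anchor_concentration(anchors: list) -> float:
    """Fraction of inbound links that share the single most common anchor text."""
    if not anchors:
        return 0.0
    counts = Counter(a.strip().lower() for a in anchors)
    return counts.most_common(1)[0][1] / len(anchors)

anchors = ["cheap blue widgets"] * 90 + ["Acme Widgets"] * 5 + ["click here"] * 5
if anchor_concentration(anchors) > 0.6:
    print("anchor text profile looks manipulated")
```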

Continue reading How Google Might Use the Context of Links to Identify Link Spam

Google Scoring Gibberish Content to Demote Pages in Rankings?

This week, Google was awarded a patent that describes how it might score content based on how much gibberish it contains; that score could then be used to demote pages in search results. Gibberish content here refers to content that is likely representative of spam.

The patent describes gibberish content as web pages that may contain a number of high-value keywords but that might have been generated through:

  • Using low-cost untrained labor (from places like Mechanical Turk)
  • Scraping content and modifying and splicing it randomly
  • Translating from a different language

Gibberish content also tends to include text sequences that are unlikely to represent natural language: strings that aren’t structured in the conversational syntax that typically occurs in resources such as web documents.
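
One plausible way to turn that observation into a score (the model and numbers below are toy assumptions of mine, not the patent’s method) is to ask how probable a page’s word sequences are under a language model built from natural text; scraped-and-spliced or machine-translated content tends to produce improbable word pairs:

```python
# A minimal gibberish scorer based on word-pair (bigram) probabilities from a
# reference corpus. A low average probability suggests text that isn't natural,
# conversational language. The corpus and smoothing here are toy assumptions.

import math
from collections import Counter

reference = "the quick brown fox jumps over the lazy dog the dog sleeps".split()
unigrams = Counter(reference)
bigrams = Counter(zip(reference, reference[1:]))

def gibberish_score(text: str) -> float:
    """Average negative log-probability per bigram; higher means more gibberish-like."""
    words = text.lower().split()
    if len(words) < 2:
        return 0.0
    total = 0.0
    for prev, word in zip(words, words[1:]):
        # Add-one smoothing so unseen word pairs don't give zero probability.
        prob = (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(unigrams))
        total += -math.log(prob)
    return total / (len(words) - 1)

print(gibberish_score("the quick brown fox"))           # lower: natural-looking
print(gibberish_score("fox widget cheap dog buy the"))  # higher: spliced-looking
```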

Continue reading Google Scoring Gibberish Content to Demote Pages in Rankings?

Google Granted Patent on Invisible Text and Hidden Links

As long as there have been search engines, there have been people trying to take advantage of them to get pages to rank higher. It’s not unusual to see within many SEO site audits a section on negative practices that a search engine might frown upon, and Google lists a number of those practices in its Webmaster Guidelines. Linked from the Guidelines is a Google page on Hidden Text and Links, where Google tells us to be wary of doing things such as the following (a simplified check for a few of these is sketched after the list):

  • Using white text on a white background
  • Locating text behind an image
  • Using CSS to position text off-screen
  • Setting the font size to 0
  • Hiding a link by only linking one small character—for example, a hyphen in the middle of a paragraph
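
Some of those checks are mechanical enough to sketch in a few lines. The rules below are simplified assumptions that mirror the list above; real detection would work against fully rendered pages and cover far more cases:

```python
# Simplified illustration of a few checks from the list above, run against the
# resolved CSS styles of a text node. The rules and cutoffs are assumptions for
# this sketch, not how Google actually detects hidden text.

def _px(value: str) -> float:
    """Parse a CSS pixel length like '-9999px' into a number; 0 on failure."""
    try:
        return float(value.replace("px", ""))
    except ValueError:
        return 0.0

def hidden_text_flags(style: dict) -> list:
    """Flag obvious hidden-text patterns in the resolved styles of a text node."""
    flags = []
    if style.get("color") == style.get("background-color"):
        flags.append("text color matches background (e.g. white on white)")
    if _px(style.get("font-size", "16px")) == 0:
        flags.append("font size set to 0")
    if _px(style.get("text-indent", "0")) <= -999:
        flags.append("text positioned off-screen with CSS")
    return flags

print(hidden_text_flags({"color": "#ffffff", "background-color": "#ffffff",
                         "font-size": "0px", "text-indent": "-9999px"}))
```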

Continue reading Google Granted Patent on Invisible Text and Hidden Links