We can make your web site easier to find, and easier to use.

How Google Might Disassociate Webspam from Content

Manipulative repetitive anchor text, blog comments filled with spam, Google bombs, and obscene content could be the targets of a system described in a patent granted to Google today. The patent gives arbiters (human and possibly automated) ways to disassociate some content found on the Web, such as web pages, from other content, such as links to that content.

A couple of images of screens from a content management system that allows someone to make judgments on comments associated with a video and on whether search results for a particular query appear to be manipulated.

In an Official Google Blog post, Another step to reward high-quality sites, Google’s Head of Webspam Matt Cutts wrote about an update to Google’s search results targeted at webspam that they’ve now started calling the Penguin update. The day after, I wrote about some patents and papers that describe the kinds of efforts Google has made in the past to try to curtail web spam in my post Google Praises SEO, Condemns Webspam, and Rolls Out an Algorithm Change.

The patent doesn’t describe in detail an algorithmic approach to identifying practices that might have been used to manipulate the rankings of pages in search results. Instead, it tells us about a content management system that people engaged in identifying content impacted by such practices might use to disassociate certain content from webpages and other types of online content.

Google Praises SEO, Condemns Webspam, and Rolls Out an Algorithm Change

Yesterday, Google’s Distinguished Engineer Matt Cutts published a post on the Google Webmaster Central Blog titled Another step to reward high-quality sites that started out by praising SEOs who help improve the quality of web sites they work upon. The post also noted:

In the next few days, we’re launching an important algorithm change targeted at webspam. The change will decrease rankings for sites that we believe are violating Google’s existing quality guidelines.

We’ve always targeted webspam in our rankings, and this algorithm represents another improvement in our efforts to reduce webspam and promote high quality content.

This isn’t something new, but it sounds like Google is turning up the heat some on violations of their guidelines, and we’ve seen patents and papers in the past that describe some of the approaches they might take to accomplish this change.

How a Search Engine May Automate Web Spam Reports and Search Feedback

How much does feedback from searchers impact the search results that we see at Bing or Google? How do those search engines process and respond to that feedback?

The links that Google and Bing present for searchers to provide feedback on search results are listed at the bottom of each engine’s search results pages. If there were instead a link after each search result where someone could provide feedback, how much of an impact would that change have, and would the search engines be able to handle the feedback that they received?

A patent granted to Microsoft this week describes how the search engine may automate processes for “dissatisfaction reports” that are manually submitted by searchers, and how the search engine may file its own dissatisfaction reports in some instances. While some of the feedback that search engines receive may include web spam reports, they may also receive feedback that something is “broken” with the search engines, or that a URL that should be showing for a specific query isn’t, or that the results just weren’t helpful.

Providing Feedback at Bing and Google

How Google Might Filter Out Duplicate Pages from Bounce Pad Sites

I hadn’t heard the term “Bounce Pad” applied to websites before, but it’s useful to know the language of search engines, and the things they might look for when crawling and indexing webpages and serving results to searchers. Determining whether a site is a bounce pad involves an analysis of the redirects appearing on the site, as in the image below from a Google patent granted this week:

Screen shot from the bounce pad patent showing calculation of redirect score and spam score to determine whether a site is a bounce pad.

One of the mysteries associated with Google’s search results is how it determines which pages to show when there are duplicate or substantially duplicated documents within its index. A search engine doesn’t want to show searchers a list of search results that contains substantially the same pages, so when it finds pages that are pretty close to being the same, it will create a “cluster” of those pages and choose a representative page to display.

That kind of duplication can happen for a number of reasons, such as someone copying content from another page (with or without permission or license to do so), the majority of the content on a page being a manufacturer’s or publisher’s description, a content management system set up so that the same page gets published more than once at different URLs, content being republished on a mirror site or on sites set up so that the others can handle the overflow if one of them gets too much traffic, and more.
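
The clustering step described above, grouping near-identical pages and choosing one to display, can be sketched roughly as follows. This is only an illustration under stated assumptions: the excerpt does not say how Google measures similarity or picks the representative, so word shingles with Jaccard similarity stand in for the comparison, and the `threshold` value, the `scores` dictionary, and all function names are hypothetical.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word runs) in a text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_duplicates(pages, threshold=0.8):
    """Greedily group pages whose shingle sets are similar enough.

    pages maps URL -> page text. Each page joins the first cluster whose
    seed page it resembles; otherwise it starts a new cluster.
    The threshold is an assumed value, not one taken from the patent.
    """
    clusters = []  # list of (seed shingle set, [member URLs])
    for url, text in pages.items():
        sig = shingles(text)
        for seed, members in clusters:
            if jaccard(sig, seed) >= threshold:
                members.append(url)
                break
        else:
            clusters.append((sig, [url]))
    return [members for _, members in clusters]


def pick_representative(urls, scores):
    """Choose the cluster member with the highest (hypothetical) quality score."""
    return max(urls, key=lambda u: scores.get(u, 0.0))
```

Given a dictionary of page texts, `cluster_duplicates` returns lists of URLs that would be treated as one cluster, and `pick_representative` selects the single URL to show from each. A real search engine would compare pages at a far larger scale and with different signals; the greedy pass here is chosen only because it is short and easy to follow.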

Document Level Classifiers and Google Spam Identification

There have been a number of news opinion pieces and blog posts appearing on the Web in recent months telling us that Google has become less useful because of web spam from pages scraping content from other sites, as well as from low-quality articles on content farms. Google’s head of Web Spam, Matt Cutts, responded to those criticisms by announcing some new efforts at Google to make those kinds of pages not rank as well in search results. From the Official Google Blog, on January 21, 2011:

As we’ve increased both our size and freshness in recent months, we’ve naturally indexed a lot of good content and some spam as well. To respond to that challenge, we recently launched a redesigned document-level classifier that makes it harder for spammy on-page content to rank highly.

The new classifier is better at detecting spam on individual web pages, e.g., repeated spammy words—the sort of phrases you tend to see in junky, automated, self-promoting blog comments.

Irony, Thy Name is Microsoft

Microsoft was granted a new patent today, Search ranger system and double-funnel model for search spam analyses and browser protection (US Patent 7,873,635), which provides a detailed look at how Bing might attempt to identify search spammers who redirect traffic from search results pages to pages filled with advertising or other content intended to earn the spammers some money.

The patent uses Google’s AdSense as an example of the kind of advertising that these spammers might use in one of these cloaking schemes.

Ironically, Google’s Matt Cutts also uncovered an interesting Bing affiliate scheme today, from a company that Ad Age calls Facebook’s third largest advertiser in the third quarter of last year.

How Google Might Fight Web Spam Based upon Classifications and Click Data

When you enter a set of keywords into Google, the search engine attempts to find all the pages it can that contain those keywords, and returns a set of results ordered by a combination of relevance and importance scores. But many of the pages that could be returned in response to such a search may not be very good matches for a topic related to the query terms used, or may be spam pages.

According to a Google patent filed in 2006 and granted today, around 90 percent of web pages that could be returned for topics such as computer games, movies, and music are spam pages, which exist only to “misdirect traffic from search engines.” The patent tells us that those pages are usually unrelated to those “topics of interest” and try to get a visitor to purchase things such as pornography, software, or financial services.

The patent presents an automated process that might be used by the search engine to classify documents based in part upon user-behavior data, to help weed out web spam.

How a Search Engine Might Crowdsource Web Spam Identification

The term crowdsourcing was coined by Wired correspondent Jeff Howe, in a 2006 article titled The Rise of Crowdsourcing, where he described how a crowd of people might use their spare time to help in solving problems or creating content, or in addressing other issues that a single person or organization might have difficulties addressing on their own. Could a search engine effectively rely upon searchers to help clean up web spam in search results?

A crowd of people milling about, waiting on Lincoln's second inauguration speech.

What if search engines added a “feedback” button to every page that they showed in search results where searchers could report pages in those results as web spam? Or what if they added a spam button to their toolbar that searchers could click to identify pages they found through a search as spam?
