Google on Letting Searchers Remove Pages from the Web

In the future, we may all be able to join Google engineer Matt Cutts in fighting spam on Google, or at least in removing pages from our searches and browsers.

A new patent application from Google points at giving people the power to remove pages or even sites from web searches and browsing. Matt Cutts is one of the co-authors. (You may have seen this before as a Google experiment.)

Why remove pages?

Sometimes the search results include a web page that the user deems undesirable. This web page may be deemed undesirable by the user because the web page is spam, the web page relates to content unrelated to the user’s interests, the web page contains content that the user dislikes or finds offensive, or for some other reason.

The patent application is:

Removing documents

Invented by Sanjay Ghemawat, John Piscitello, Simon Tong, and Matt Cutts
US Patent Application 20070043721
Published February 22, 2007
Filed: August 22, 2005

Abstract

A system may present information regarding a document and provide an option for removing the document. The system may also receive selection of the option and remove the document when the option is selected. The system may aggregate information regarding documents that have been removed by a group of users and assign scores to a set of documents based on the aggregated information.

The description of how this might work is broken down into the following sections. Please keep in mind that this is a patent application, and the processes involved may be implemented differently, or not at all.

  • Overview
  • How a Removal Feature is Implemented
  • Removal Options
  • Creating A Remove List
  • Alternative Documents for Removed Pages
  • Removed Pages and Search Results
  • Buffy Example
  • Improving Search Results
  • Local Remove Lists

Overview

What it involves – A remove feature that a user can use to show that they dislike a document.

Individual use – Remove the document from the user’s browser, including search results, so the user doesn’t have to view that document again.

Collective use – Removal information collected from a group of users may be used to improve the quality of search results for that group or for another group of users.

How it’s used – The remove feature may be implemented as part of a browser.

How a Removal Feature is Implemented

The remove feature software may use:

  • A plug-in,
  • An applet,
  • A dynamic link library (DLL),
  • A bookmark, or
  • A similar executable object or process.

The remove feature software may be presented within a web browser window as:

  • A toolbar button (from an add-on toolbar or a browser toolbar),
  • A menu item of a web browser,
  • A link (for instance, on a search results page),
  • A frame, or
  • Some other element.

It could be part of the browser or an interface between the browser and the web.

The remove feature software may be activated:

  • Automatically when the web browser starts, or
  • When instructed by a user.

Removal Options

Options on what to remove may permit a user to remove:

  1. The document,
  2. The site associated with the document,
  3. Any related documents, such as documents of the same type, hosted by the same server, associated with the same domain, or classified the same as the document, and/or
  4. A document or a set of documents for a particular set of queries, subjects or categories of searches, or types of document corpora (e.g., general web versus product search documents), etc., but not for others.

Options on how long may permit a user to remove it for:

  1. A single search,
  2. A single search session only (a sequence of queries or interactions by the same user),
  3. All searches/sessions, or
  4. A specific period of time.
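To make the two sets of options a little more concrete, here's a minimal Python sketch of what a single remove-list entry might look like. The class names, the `covers` and `active` methods, and treating a "site" as every URL on the same host are my own assumptions for illustration, not anything the patent application spells out. The "related documents" and per-query options are left out to keep it short.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional
from urllib.parse import urlparse


class Scope(Enum):
    """What to remove (hypothetical subset of the options above)."""
    DOCUMENT = "document"  # only this exact URL
    SITE = "site"          # every URL on the same host


class Duration(Enum):
    """How long to remove it for."""
    SINGLE_SEARCH = "single_search"
    SESSION = "session"
    ALWAYS = "always"
    PERIOD = "period"      # until an expiry time


@dataclass
class RemoveEntry:
    url: str
    scope: Scope
    duration: Duration
    search_id: Optional[str] = None     # used with SINGLE_SEARCH
    session_id: Optional[str] = None    # used with SESSION
    expires_at: Optional[float] = None  # epoch seconds, used with PERIOD

    def covers(self, url: str) -> bool:
        """Does this entry apply to the given URL?"""
        if self.scope is Scope.DOCUMENT:
            return url == self.url
        # Scope.SITE: compare hosts only.
        return urlparse(url).netloc == urlparse(self.url).netloc

    def active(self, now: float, search_id: str, session_id: str) -> bool:
        """Is this entry still in force for the current search?"""
        if self.duration is Duration.SINGLE_SEARCH:
            return search_id == self.search_id
        if self.duration is Duration.SESSION:
            return session_id == self.session_id
        if self.duration is Duration.PERIOD:
            return self.expires_at is not None and now < self.expires_at
        return True  # Duration.ALWAYS


# Example: remove a whole site for the current session only.
entry = RemoveEntry("http://spam.example/a", Scope.SITE, Duration.SESSION,
                    session_id="s1")
print(entry.covers("http://spam.example/b"))     # True
print(entry.active(0.0, "q1", session_id="s2"))  # False
```

An entry only hides a page when it both covers the URL and is still active, which keeps the "what" and "how long" choices independent of each other.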

Creating A Remove List

A remove list could be set up on your computer, on a Google server, or spread out over both.

A page may be selected for removal when visiting the page, or from a list of search results presented from a search at Google or through a Google toolbar.

Upon making a removal selection, the person using this feature could make a choice amongst the options listed above about what they want to remove, and for how long.

Those options would be stored on a remove list, which could be associated with a specific searcher.

It appears from the document that there would be one remove list holding multiple options, rather than a separate list for each type of option (such as remove for a single session, or remove for all searches). But it may also be possible to have separate lists for different options.

Separate remove lists may be set up for each user, based upon a user identifier, such as:

  • An IP address associated with the computer used by the user, or
  • Login information associated with the user.
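Here's a rough sketch of per-user remove lists keyed by one of those identifiers. The `user_key` helper, the `RemoveListStore` class, the tuple format for entries, and the choice to prefer login information over an IP address when both are available are all hypothetical; the application only lists both as possible identifiers.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (url, what to remove, how long): a hypothetical entry format
Entry = Tuple[str, str, str]


def user_key(login: Optional[str], ip_address: str) -> str:
    """Choose a user identifier: login info if present, else the IP address."""
    return f"login:{login}" if login else f"ip:{ip_address}"


class RemoveListStore:
    """One remove list per user identifier, held in memory.

    A single list can mix entries with different options, which is how
    the application seems to describe it.
    """

    def __init__(self) -> None:
        self._lists: Dict[str, List[Entry]] = defaultdict(list)

    def remove(self, key: str, url: str, what: str = "document",
               how_long: str = "always") -> None:
        self._lists[key].append((url, what, how_long))

    def entries(self, key: str) -> List[Entry]:
        # .get avoids creating an empty list for unknown users.
        return list(self._lists.get(key, []))


store = RemoveListStore()
alice = user_key("alice", "192.0.2.7")
store.remove(alice, "http://spam.example/", what="site")
store.remove(alice, "http://offtopic.example/page", how_long="session")
print(len(store.entries(alice)))                   # 2
print(store.entries(user_key(None, "192.0.2.7")))  # []
```

The last line shows one weakness of IP-based identifiers: the same person, seen without a login, gets a different (empty) list.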

Alternative Documents for Removed Pages

If you have been using the remove feature, and you go to visit a web page:

1. A determination is made as to whether a document is on your remove list.

2. When you go to visit a page, and it isn’t on the remove list, you may see the page.

3. If the document is on the remove list, you could be redirected to an alternative document instead of the page you previously removed.

4. The alternate document that may be presented to a user might include a message letting you know that you had removed the document.

5. It may also present an option to access the document even though it is on your remove list.

6. If you choose to view the page, it may be shown to you without taking it off the remove list.

7. The alternate document may provide the option of taking the document off the remove list.

8. Selecting that option may take the document off the list and present the unremoved page.
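The visiting flow in those steps can be sketched as a small Python class. `Visitor`, `fake_fetch`, and the wording of the alternative page are hypothetical stand-ins; the comments map each branch to the numbered steps.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class Page:
    url: str
    body: str


Fetch = Callable[[str], Page]  # hypothetical page fetcher


@dataclass
class Visitor:
    remove_list: Set[str] = field(default_factory=set)

    def visit(self, url: str, fetch: Fetch) -> Page:
        # Steps 1-2: not on the remove list, so show the page as usual.
        if url not in self.remove_list:
            return fetch(url)
        # Steps 3-5 and 7: show an alternative document with a notice
        # and options to view the page anyway or take it off the list.
        return Page(url, f"You removed {url}. [View anyway] [Unremove]")

    def view_anyway(self, url: str, fetch: Fetch) -> Page:
        # Step 6: show the page but leave it on the remove list.
        return fetch(url)

    def unremove(self, url: str, fetch: Fetch) -> Page:
        # Step 8: take the page off the list, then show it.
        self.remove_list.discard(url)
        return fetch(url)


def fake_fetch(url: str) -> Page:
    return Page(url, f"content of {url}")


v = Visitor({"http://spam.example/"})
print(v.visit("http://spam.example/", fake_fetch).body)
# You removed http://spam.example/. [View anyway] [Unremove]
```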

Removed Pages and Search Results

1. You type a query into a search box through the toolbar or at the search engine.

2. Your search results are scored and sorted and listed in response.

3. A determination is made as to whether any of the results are on your remove list.

4. If none of the documents are on your remove list, then they will be presented to you.

5. If one or more of the pages are on your remove list, then the results might be modified so that those removed pages are filtered out.

6. There may be an indication made somehow that specific documents were filtered from your results.

7. If the remove list (or part of it) is stored locally on your computer, it may intercept the documents that are removed, and cause them not to be shown.

8. If the toolbar filters some results, it may indicate which ones in a “distinguished manner.”
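Filtering a ranked result list against a remove list, while keeping track of what was hidden so that it can be indicated to the searcher, might look something like this. The function names and the wording of the notice are assumptions for illustration.

```python
from typing import List, Set, Tuple


def filter_results(ranked: List[str],
                   remove_list: Set[str]) -> Tuple[List[str], List[str]]:
    """Split ranked result URLs into (shown, filtered), keeping rank order."""
    shown = [url for url in ranked if url not in remove_list]
    filtered = [url for url in ranked if url in remove_list]
    return shown, filtered


def render(ranked: List[str], remove_list: Set[str]) -> str:
    shown, filtered = filter_results(ranked, remove_list)
    lines = [f"{i}. {url}" for i, url in enumerate(shown, start=1)]
    if filtered:
        # Step 6: indicate that some results were filtered out.
        lines.append(f"({len(filtered)} result(s) from your remove list hidden)")
    return "\n".join(lines)


print(render(["a.example", "spam.example", "b.example"], {"spam.example"}))
# 1. a.example
# 2. b.example
# (1 result(s) from your remove list hidden)
```

Because filtering happens after ranking, the same function could run on a server or inside a toolbar holding a local copy of the list.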

This last part of this section is important enough to quote directly instead of summarizing:

In addition, a search engine (e.g., the Google search engine) may optionally collect and track removal data from users and use such removal data to score documents for searches by all or a subset (e.g., geographical) users. For example, if a large number of users remove a certain search result for a given set of search queries, the search engine may use that information to adjust the score for that document (for those set of search queries and/or others). In other words, removal data may be used as a scoring signal by a search engine or other search application.

I’m not sure how many “signals” Google uses at this point to rank documents during a search, but remove list data may join those.
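As a rough illustration of removal data acting as a query-specific signal, here's a sketch that counts distinct users who removed a result for a query and demotes results past a threshold. The `threshold` and `penalty` values, and the simple multiplicative demotion, are my own hypothetical choices; the patent application doesn't say how the adjustment would be calculated.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def removal_counts(events: Iterable[Tuple[str, str, str]]) -> Counter:
    """Count distinct users who removed each (query, url) pair.

    events are (user_id, query, url) tuples gathered from many remove lists.
    """
    distinct = set(events)  # the same user removing twice counts once
    return Counter((query, url) for _, query, url in distinct)


def adjust_scores(query: str, scores: Dict[str, float], counts: Counter,
                  threshold: int = 100, penalty: float = 0.5) -> Dict[str, float]:
    """Demote results that at least `threshold` users removed for this query."""
    return {url: score * penalty if counts[(query, url)] >= threshold else score
            for url, score in scores.items()}


events = [("u1", "buffy", "x.example"), ("u1", "buffy", "x.example"),
          ("u2", "buffy", "x.example"), ("u3", "angel", "x.example")]
counts = removal_counts(events)
print(adjust_scores("buffy", {"x.example": 1.0, "y.example": 0.8}, counts,
                    threshold=2))
# {'x.example': 0.5, 'y.example': 0.8}
```

Note that the same page keeps its full score for "angel," since removals only count against the queries they were made for.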

Buffy Example

There’s an example of how this removal process works in the patent application (under the section labeled “example”), using the page: www.upn.com/shows/buffy

It’s pretty straightforward, so rather than summarize it, I’ll just mention it here. Not sure why the authors chose that page. Maybe they’re fans of Buffy the Vampire Slayer (or maybe not, if they are removing the page from searches).

Improving Search Results

1. Google may collect remove list information associated with a group of users. If those lists are stored on a user’s computer, they may be transmitted to a server. If the remove lists are collected on Google’s server, that transmission doesn’t have to happen.

2. In one version, remove list information associated with a group of users might be aggregated.

3. As an example, remove list information associated with only “legitimate” users might be aggregated to reduce the effects of spamming.

4. Legitimacy of a user might be determined based upon:

  • Amount of time the user spent accessing the search engine,
  • Interactions of the user with the search engine,
  • Valid login information,
  • Whether the user has posted a bond or some sort of deposit (?),
  • Reputation or ratings of the user, or whether the user is known in some other manner,
  • A relationship (such as advertiser) with the search engine, and/or
  • Other information that may distinguish a legitimate user from an illegitimate user.

5. Remove list information associated with “some” identified set of users may be aggregated. Examples might include:

  • Users within a particular geographic region (such as in the United States),
  • Users with a defined relationship (such as friends within an online community like Orkut, address book contacts, users associated with a specific web site, or users identified by a particular user), etc.

6. Google may assign scores for search result documents based partially upon the remove list information.

7. The scores could be partially based upon factors independent of search queries and precomputed, and partially dependent upon the particular search query involved.

8. The total score may be based on a function of one or more features of a document, such as an information retrieval (IR) score, a link-based score, and/or a remove list score.

9. The remove list score that would go into this total score may be determined based upon remove list information associated with a page. For example:

  • How many users removed each document, and/or
  • How many users removed each document that another particular user first removed (where the particular user may be the user that provided the search query).

10. The remove list information associated with a group of users may be used to determine scores for the group or another group of users (including or separate from the group). Examples:

  • When remove information is aggregated for the group of “legitimate” users, the remove list information may be used to determine scores for documents for all users.
  • When remove list information associated with contacts within an address book is aggregated, that remove list information may be used to determine scores for those users.

11. Google may combine the IR score, link-based score, and remove list score for a total score used to rank a document.

12. This removal of “undesirable” documents from search results may serve to improve a user’s search experience.
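Pulling several of those steps together, here's a hedged sketch: filter users with a legitimacy test, count how many legitimate users removed each document, and fold that into a total score alongside an IR score and a link-based score. The legitimacy test (a valid login plus a minimum amount of time on the search engine), the 60-minute cutoff, using the fraction of the group that removed a document as the remove list score, and the weighted linear combination are all assumptions; the application only says the total score may be a function of these features.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass
class User:
    user_id: str
    valid_login: bool
    minutes_on_engine: float
    remove_list: Set[str]


def is_legitimate(user: User, min_minutes: float = 60.0) -> bool:
    # Hypothetical test built from two of the listed signals.
    return user.valid_login and user.minutes_on_engine >= min_minutes


def aggregate(users: Iterable[User]) -> Dict[str, int]:
    """Count how many legitimate users removed each document."""
    counts: Dict[str, int] = {}
    for user in users:
        if is_legitimate(user):
            for url in user.remove_list:
                counts[url] = counts.get(url, 0) + 1
    return counts


def total_score(ir: float, link: float, removed_by: int, group_size: int,
                w_ir: float = 1.0, w_link: float = 1.0,
                w_remove: float = 1.0) -> float:
    """Weighted combination of IR, link-based, and remove list scores."""
    # Remove list score: fraction of the group that removed the document.
    remove_score = removed_by / group_size if group_size else 0.0
    return w_ir * ir + w_link * link - w_remove * remove_score


users = [
    User("a", True, 120, {"spam.example"}),
    User("b", True, 90, {"spam.example", "ok.example"}),
    User("c", False, 500, {"ok.example"}),  # no valid login: ignored
]
counts = aggregate(users)
legit = sum(is_legitimate(u) for u in users)
print(counts["spam.example"], legit)  # 2 2
print(total_score(0.5, 0.25, counts["spam.example"], legit))  # -0.25
```

Swapping in a different group (say, address book contacts) would only change which users are passed to `aggregate`.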

Local Remove Lists

In addition to working with web based documents, this remove list process could also be used for documents on a computer searched for with a desktop search.

Conclusion

This “remove this result” option has appeared in Google’s personalized results, and a couple of quick-eyed commenters covered it the first time around. Marc Hil Macalua wrote about it in a September 2005 post titled Remove Result Option in Google Personalized Search. Loren Baker covered this in Google Testing Remove Result and Spam Report Options.

Loren also points out a post that Matt Cutts wrote on this User Interface Experiment – UI Fun: Remove Result. Will “remove results” return? Guess that depends upon how the experiment turned out.

Added: I created a thread at Cre8asite on this topic: Google Meets Digg: Letting Users Remove (bury) Pages


16 thoughts on “Google on Letting Searchers Remove Pages from the Web”

  1. Wow, fascinating, Bill.

    My guess is that the main issue of importance here would be that Google become extremely adept at determining legitimacy (so that Barnes&Nobles can’t claim Amazon.com results are spam, etc.).

    On the one hand, I really like this patent. It genuinely could improve the web! And, make life very tough for spammers. So, that seems great.

    On the other hand, I would be concerned about being wrongly categorized in a group of users to whom something is not seen as relevant, when it might be relevant to my particular search. Do you see what I mean?

    Great writeup on this. Really food for thought!
    Miriam

  2. Thanks, Miriam.

    I usually try to only assign one category to a post, but this one was tough. I ended up putting it under personalization rather than my Web spam category because the remove results experiment originally appeared in Google’s Personalized search, and it can be used for different groups of users.

    But when they expand it to the group of “legitimate” users, it becomes a way to try to defeat spam.

    One of the things that scares me about this is that people might remove pages during a search session not because they are spam, and not because they aren’t good pages, but rather to filter those out to find other results. If people start removing pages from results so that they can see minority or alternative results on a topic, sites that are rightfully authorities might be ranked lower.

    I suspect that a result removed from an individual search session would be treated differently than a result removed from all future sessions.

    I’d love to see the data Google collected during their experiment.

  3. Oh, yes, Bill,
    I see what you mean about the danger of that. Hmm. That’s a poser. In fact, that could even become a big problem if people simply don’t understand how to use the remove function properly or what it’s really for.

    Well, darn…it seems like it could be so helpful for combatting spam, but the dynamic between search engines and humans creates such a wide margin for error when it comes to judging what is really valuable.

    I can see this becoming a very big and heated topic of discussion!
    Kind Regards,
    Miriam

  4. Welcome to abuse ville! Think of the millions of computers hackers control to run DDOS etc. All they do is spit out an app to mark a certain website as “unwanted” from all those computers over a period of time and BAM another website removed.

    This one is ripe for abuse and my feeling is that it will probably NOT affect the regular search results. It would affect people who have “asked” to be affected by it. Meaning, I can tick a box to say “adjust my results based on world wide user opinion to help remove spam (recommended)” or something such like. This still will cause issues with sites getting bombed out of the search engines.

    I guess they are just putting the patent in so that they have it rather than because they want to use it. God help us the day that this comes about on the main search.

  5. If these results are applied globally (even with the option to remove) then think what will happen to important sites with unpopular messages that get a knee-jerk negative vote. For instance, a product that shreds large standing trees in seconds. Yes, it is horrific but an excellent tool for wildfire fighting…

  6. David,

    There are a lot of processes described in patent filings that have the potential to be abused if not implemented in an intelligent fashion. I have to agree that this is one of those that could be harmful.

    One thing I’ve heard most of the search engineers I’ve listened to say, though, when it comes to looking at a single potential ranking signal, is to keep in mind that this is just one thing that could be looked at, and there are many others, too.

    What if, instead of penalizing sites based upon their inclusion in many thousands, hundreds of thousands, or millions of removal lists, this flagged sites for review so that other signals could be looked at? What if by itself it really had little meaning, but in conjunction with other things it might?

    Andres,

    When I was reading this, I thought of people claiming that stories were being buried in Digg not because they were or weren’t good stories, but rather because of the source, or because some opinions expressed may not have been popular. In some ways, this is a little like the “bury” function on Digg.

    As I wrote above, it might not be too bad as long as it would be only a single facet of a many-factored approach.

  7. Google do invite spam reporting already, of course:

    “The best way to tell us about a site that doesn’t meet our guidelines is by submitting a Spam Report at http://www.google.com/contact/spamreport.html. Spam Reports are submitted directly to our engineers”

    …but they don’t use the information very well. Even the most obvious/outrageous/crude spammy sites can sit in more or less the same position in the search results for months, regardless of how many individuals use the spam report facility to “submit directly to [the] engineers”.

  8. Hi David,

    Good to see you.

    The feature reminds me of the bury feature of Digg, but the potential impact is much more serious. I would hope that Google would make it pretty hard for a page to be adversely impacted.

    Hi Roy,

    Thanks for commenting.

    There are a couple of different ways to directly report spam to Google. There’s the page that you point to, and there’s another one that you can reach through the Webmaster Central tools after logging in. I’ve heard a couple of search engineers from Google mention that they give those reports priority because they know who is submitting them.

    I’ve heard that when a site is submitted as spam, Google may look at the page in question, but they will usually try to find a solution that will affect a lot of sites at the same time instead of one at a time.

  9. Pingback: » Gedanken zur personalisierten Suche
  10. Pingback: Google - All 2008 Nominees » SEMMYS.org
