How a Search Engine May Automate Web Spam Reports and Search Feedback

How much does feedback from searchers impact the search results that we see at Bing or Google? How do those search engines process and respond to that feedback?

The links that Google and Bing present for searchers to provide feedback on search results are located at the bottom of each engine's search results pages. If there were instead a link after each search result where someone could provide feedback, how much of an impact would that change have, and would the search engines be able to handle the feedback they receive?

A patent granted to Microsoft this week describes how the search engine may automate processes for “dissatisfaction reports” that are manually submitted by searchers, and how the search engine may file its own dissatisfaction reports in some instances. While some of the feedback that search engines receive may include web spam reports, they may also receive feedback that something is “broken” with the search engines, or that a URL that should be showing for a specific query isn’t, or that the results just weren’t helpful.

Providing Feedback at Bing and Google

The link to provide feedback at Bing is located in the footer on search result pages using the anchor text “tell us what you think”.

The footer for a Bing search result page, with a link users can click upon to provide feedback

If you click upon that link, a box pops up where you can tell Bing whether or not you found what you were looking for, along with a box where you can provide more information (without providing personally identifiable information).

The feedback box that pops up on a Bing search result page, where a user can say whether they found what they were looking for

Provide feedback, either positive or negative, and click on the “send” button, and Bing will thank you for helping to make Bing better.

Google’s link is displayed under the pagination of results at the bottom of the search results pages with the text “Give us feedback.”

The footer for a Google search result page, with a link users can click upon to provide feedback

If you click upon that link, you are brought to a new page where you have a choice of different items you can report upon, as seen in the image below:

The text of a feedback page from Google

One of the things that I like about this page is that they also include a link to the Google Web Search Help Forum, so that you can also ask for help in addition to reporting something. You don’t always get a response from someone who works at Google when you choose that route, but often the responses you receive are helpful.

When you do provide feedback at Bing or Google, you usually won’t receive a response from anyone at either of the search engines, and we really don’t have any idea what happens with that feedback after that. The Microsoft patent gives us a little more insight into some of the processes that may take place.

The patent is:

Automatic diagnosis of search relevance failures
Invented by Li-wei He, Wenzhao Tan, Jinliang Fan, Yi-Min Wang, and Xiaoxin Yin
Assigned to Microsoft Corporation
US Patent 8,041,710
Granted October 18, 2011
Filed: November 13, 2008

Abstract

Search relevance failures are diagnosed automatically. Users presented with unsatisfactory search results can report their dissatisfaction through various mechanisms.

Dissatisfaction reports can trigger automatic investigation into the root cause of such dissatisfaction. Based on the identified root cause, a search engine can be modified to resolve the issue creating dissatisfaction thereby improving search engine quality.

Why is a searcher dissatisfied with search results? Here are some of the reasons listed in the patent:

  • The searcher knows of a good or relevant URL that isn’t returned in their search
  • URLs returned in results are irrelevant, dead, or duplicative
  • A site listed is malicious (contains spyware, viruses, etc.)
  • A site listed is from the wrong market

The Bing search engine may also submit its own feedback to this automated feedback system. It might do so by identifying pages that show up in search results but receive few or no clicks for particular queries, and that, when clicked upon, have visitors spending very little time on those pages before returning for a new search.
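The patent doesn't spell out how those implicit signals would be measured, but the idea can be sketched roughly as follows. The thresholds and the `ResultStats` record shape here are purely illustrative assumptions, not values from the patent:

```python
from dataclasses import dataclass

# Hypothetical cutoffs -- the patent does not specify actual values.
MIN_CLICK_RATE = 0.01      # clicks per impression
MAX_DWELL_SECONDS = 10.0   # average time on page before returning to search

@dataclass
class ResultStats:
    url: str
    query: str
    impressions: int
    clicks: int
    avg_dwell_seconds: float  # mean dwell time before returning to results

def auto_dissatisfaction_report(stats: ResultStats) -> bool:
    """Flag a query/URL pair the engine itself might report as unsatisfactory:
    shown often, rarely clicked, and quickly abandoned when it is clicked."""
    if stats.impressions < 1000:   # not enough data to judge
        return False
    click_rate = stats.clicks / stats.impressions
    rarely_clicked = click_rate < MIN_CLICK_RATE
    quickly_abandoned = stats.clicks > 0 and stats.avg_dwell_seconds < MAX_DWELL_SECONDS
    return rarely_clicked or quickly_abandoned
```

A result with a 0.2% click rate, or one that searchers abandon within seconds, would trigger an automatic report under this sketch, while a well-clicked result with long dwell times would not.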

Benefits of an automated search diagnostic system

We’re told in the patent that this automated search diagnostic system can provide many benefits over conventional manual approaches to diagnostics, such as:

  1. Automation is more efficient because it can eliminate unnecessary waste of manual effort to perform repetitive diagnostic tasks.
  2. Accuracy is improved by eliminating guesswork that can lead to erroneous root cause categorization.
  3. Manual responses to feedback can result in inconsistency when different people take actions in response to the feedback.
  4. The system is more comprehensive since it can analyze the feedback received and help to prioritize responses and provide metrics to measure overall organizational performance.
  5. An automated system can be useful in revealing patterns that may suggest best fixes.

The automated system might classify the feedback received into a number of different categories and point to possible corrective actions.
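The patent doesn't describe the classifier itself; a minimal keyword-based sketch might look like the following, where the category names, keyword rules, and corrective actions are my own illustrative stand-ins based on the reasons listed above:

```python
# Toy feedback classifier. Categories loosely mirror the dissatisfaction
# reasons listed in the patent; keywords and actions are illustrative only.
CATEGORIES = {
    "missing_url":  (["missing", "not returned", "should show"], "check crawl/index status"),
    "dead_link":    (["dead", "404", "broken link"],             "recrawl and prune"),
    "malicious":    (["virus", "spyware", "malware"],            "flag for safety review"),
    "wrong_market": (["wrong country", "wrong language"],        "review market classifier"),
}

def classify_report(text: str) -> tuple[str, str]:
    """Map free-text feedback to a (category, corrective action) pair."""
    text = text.lower()
    for category, (keywords, action) in CATEGORIES.items():
        if any(keyword in text for keyword in keywords):
            return category, action
    return "uncategorized", "queue for manual review"
```

A real system would presumably use a trained classifier rather than keyword matching, but the output shape is the same: each report lands in a category that points to a corrective action.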

While the people who end up reviewing feedback may make some changes to search results on a case-by-case basis, there’s probably an emphasis upon identifying opportunities to make corrections and improvements that impact as many sites as possible.

The patent provides an example of one kind of investigation into feedback received, in the case of someone reporting that a relevant or good URL isn’t being displayed in search results. When the root cause is investigated, it’s possible in that instance that:

  • The page in question hasn’t been crawled by the search engine (why not?)
  • The page wasn’t indexed correctly (Is there a reason why it wasn’t?)
  • The page didn’t make it into the candidate document set for a particular query (Does it contain all of the words in the query?)
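Those stages suggest a simple diagnostic funnel: walk the pipeline in order and stop at the first stage where the page dropped out. The lookup functions in this sketch are hypothetical stand-ins for internal search-engine checks; none of these names come from the patent:

```python
from typing import Callable

def diagnose_missing_url(
    url: str,
    query: str,
    was_crawled: Callable[[str], bool],
    was_indexed: Callable[[str], bool],
    in_candidate_set: Callable[[str, str], bool],
) -> str:
    """Walk the pipeline stages in order and report the first failing one."""
    if not was_crawled(url):
        return "not crawled: check robots.txt, crawl budget, discoverability"
    if not was_indexed(url):
        return "not indexed: check canonicalization, duplicate filters, penalties"
    if not in_candidate_set(url, query):
        return "not a candidate: page may lack some of the query terms"
    return "ranked too low: relevance scoring is the likely root cause"
```

The point of ordering the checks this way is that each stage only matters if the previous one passed: there is no sense asking why a page ranks poorly if it was never crawled.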

A dissatisfaction reporting system might be set up to have different levels of reports, so that for instance someone at the search engine might be able to provide highly detailed and explicit reports, while feedback from searchers might be more limited.

The patent also tells us that other approaches might be used to enable search engines to receive feedback about sites listed in search results.

For instance, Google now provides users with the ability to both +1 pages that they see in search results and block sites that appear in those results. While those actions don’t give users the ability to provide the kind of detail that the feedback forms do, they are signals that Google is paying attention to. See, for instance, the official Google Webmaster Central blog post, High-quality sites algorithm goes global, incorporates user feedback:

In some high-confidence situations, we are beginning to incorporate data about the sites that users block into our algorithms.

The patent provides much more detail on how this automated feedback system might work, and is recommended reading if you’re interested in how a search engine might use feedback to reshape its search results, or find ways to improve the results it returns based upon reports from its users.

Conclusion

I recently wrote about a Yahoo patent that described different ways that a search engine might measure Search Success, and one of the methods that was discussed as having a great amount of value was in viewing reports from searchers about their experiences while searching.

One complaint that I sometimes see or hear in comments here or elsewhere is that people who provide feedback to search engines by reporting problems in search results, such as spam or irrelevant results, don’t see changes made or receive responses to their reports.

I have no idea how much feedback the search engines receive, but imagine that a very small number of people actually do provide feedback in response to a very small number of searches. Given the volume of searches at Google or Bing or Yahoo, that can still mean that they are receiving a lot of feedback.

Finding ways to intelligently cluster that input and prioritize corrective actions makes sense. Trying to find ways to provide algorithmic solutions also makes sense in that the more problems that can be solved with a single change, the better.
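One simple way to cluster that input is to group reports by query and rank the queries by report volume, so that a single fix addresses the largest number of dissatisfied searchers first. The report shape here (query/complaint dictionaries) is my own illustrative assumption, not something described in the patent:

```python
from collections import Counter

def prioritize(reports: list[dict]) -> list[tuple[str, int]]:
    """Group dissatisfaction reports by query and return queries ordered
    by how many reports each one received (most-reported first)."""
    counts = Counter(report["query"] for report in reports)
    return counts.most_common()
```

Real prioritization would likely also weight by query volume and report severity, but even this simple grouping turns a stream of individual complaints into a ranked worklist.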

How much of a role does this kind of feedback have in changing the search results that we do see?

A very recent interview by Eric Enge with Google Research Head Peter Norvig included some discussion of the algorithmic changes that take place at Google on a regular basis:

We test tens of thousands of hypotheses each year, and make maybe one or two actual changes to the search algorithm per day. That’s a lot of ideas, and a lot of changes.

It means the Google you’re using this year is improved quite a bit from the Google of last year, and the Google you’re using now is radically different from anything you used ten years ago.

I couldn’t help but wonder, though: what if Bing or Google added a “feedback” button after every search result, and provided searchers with the kind of detailed reporting interface that we see above from Google? And what if the search engines had a response system that could effectively manage those responses?

How much would that impact the search results that we receive?


21 thoughts on “How a Search Engine May Automate Web Spam Reports and Search Feedback”

  1. it’s way too time consuming to “fill out” something and provide enough feedback to make any difference to relevance, ranks or anything for one individual website unless you found spam or pornography or a casino site – ideas like Google +1 and Facebook Like are the future of “quick loading” feedback that WILL make a rank difference for individual (also ir-)relevancies. Also the 5 star rating in Google Places is nice and comfy to use so it works, but the provide feedback is more of a pulsating link that millions click every day because they have something to complain about, and most of the time it’s rubbish like: “why is my website not on page one when I search it”, so NO, nobody in the world can be asked to do the slave’s job of trying to filter out the 1% of useful feedback from this. I can imagine that it is more like an alarm bell that always rings so Google doesn’t take it seriously (like the story of “The Boy Who Cried Wolf” lol), but if something is broken on Google, which happened sometimes, like when they tried to cache Flash and half of the SERPs were results that had Flash code within the description, this “silly boy” would have gotten attention because he was “squeaking” when the heat of the clicks on this button would have doubled or tripled, so ONLY THEN some editor would have said, oi, we’re getting an unusual flood of emails from “the bored shepherd boy” =)

  2. Hi Ron,

    Good points. The patent does mention that they might also implement feedback mechanisms like voting tools.

    I do think that if feedback buttons were available after each search result that there would be people who would take the time to use them, even though it could potentially be time consuming. I also think that if that feedback were tied to a Google Account, so that people logged in to provide feedback, it might lessen further the amount of feedback received, but could potentially result in more valuable feedback – at the very least, feedback that could be weighted based upon previous contributions and feedback from the person leaving it, and could possibly find some weight in some kind of user rank or author rank based upon other signals, such as that person’s interactions with local search reviews and contributions at social networks such as Google Plus.

    The point behind providing some level of automation to this system would be so that each piece of feedback wouldn’t be viewed independently, but rather might be clustered in a manner so that, for example, if lots of feedback about the search results for a particular query was received from a large number of searchers, that might potentially be prioritized.

  3. Hi Dewaldt,

    Thanks. Funny that you’re providing feedback on a post about feedback. :)

    You’re the first person to ask about it, but I did increase my font size after playing around with it and deciding that I liked it better. I also changed the font type being used as well, and thought that it looked better with the larger font size. I think it gives the site a somewhat different feel. Happy to hear that you liked the change.

  4. Thanks for the up-building feedback, guys! =) I’ll write more often when I get a chance to, but I started my SEO biz only a few months back and I have become busier than I had ever thought I would, and grew to be one of the highest-ranking web people in Australia. I helped many in OZ and also in NZ to rank number 1 for major keywords, so I managed to prove that I do have a “touch” for this. I would be happy to find like-minded people and even maybe ask for some support.

    Shalom, from OZ

    :P

  5. Great research, I did not know they even had a feedback form.
    It would be interesting to see how many people actually use the feedback buttons on Google, as I guess most, like myself, just retype the query if it does not produce the expected results.

  6. Hey Bill,

    Automated web spam reporting would be a welcome introduction. I’ve thought about this numerous times over the past year after becoming just so frustrated at the ever increasing amount of complete junk on the web. There are a number of methods that could help automate this process to help reduce the hundreds of thousands of automated spam pages being published every day on the web.

    There are the obvious factors such as:-

    * Checking IP/host to determine details used to create sites and compare timestamps.
    * Checking Whois / domain registration details to compare frequency of new sites being created.
    * Instances of CMS installs on the same domain (sub domains/sub directories).

    Then there are common factors used by web spammers for speed:-

    * Does the website/blog use a default/standard CMS theme?
    * Has the web author updated their ‘About’ page or does it remain the default blurb?
    * Is the content of the site scraped from another source?
    * Does the site have any of its original content or is it just regurgitated feeds?

    What I would like to see is a funnel path for web spam:-

    Every page on the web must…

    1. Pass conflictions or discrepancies with associated content created at similar time.
    2. Pass all unique content checks and not be flagged as being duplicate content.
    3. Pass all criteria set for quantity of external links and the authority of each.
    4. Pass all relevancy of associated sites in neighbourhood and external linking.
    5. Pass all criteria set for ads to copy ratio.
    … Many other numerous stages could be added to this filter process.

    If any web page doesn’t pass the above criteria, just devalue the entire page instantly. I’m 99.99% (recurring) sure it would be this simple.

    @Rob – I just wanted to pick up on something you mention, that it’s way too time consuming to fill out forms and submit feedback about the state of the search results. I conduct most of my browsing/surfing and any processes where I am excessively using search engines in Google Chrome. In Chrome, I use the ‘Personal Blocklist’ extension to block domains/hosts at one simple click of a button – it takes two seconds at most. As this extension was developed by Google, I’m pretty sure they are going to be paying attention to the data received from it. It does a great job at improving your search experience as you can block any spammy/low quality domains and no pages from those domains will ever appear in your search results again. Genius.

    I also am a regular submitter of the Blogspot Spam Report where you can report blogspot blogs to Google that are spammy/low quality just by entering the URL. It’s just nice to know there are some measures in place out there to report this spam.

  7. Hi Jason,

    It is kind of hard to find that feedback button on Bing search results, and Google’s isn’t all that easy to find either.

    I haven’t used either much myself, and do refine my queries when I don’t find what I’m looking for right away. I actually prefer performing a few searches using different terms and phrases because I want to try to make sure that I don’t miss something worth seeing.

  8. Hi Geoff,

    Thanks for the long and thoughtful comment. I suspect that we are seeing the search engines starting to consider the kinds of things that you’ve mentioned.

    Some of the things that you describe, I’m not quite sure that the search engines are sophisticated enough to handle yet, such as identifying the original source of content that’s been duplicated in more than one place. And I think there’s a role for the duplication of content in some contexts, such as a site that syndicates articles with permission, as a way of curating similar content from multiple sources in one place, or content that might be published by a legitimate news wire service that provides publishers with an opportunity to add to that content or customize it in some manner. Authorship markup is one approach that may help in that area.

    Algorithms like Google’s Panda seem to have their sights set on bringing about those kinds of improvements as well.

    Adding the ability for people to +1 sites or block them in search results doesn’t provide the ability to provide feedback in as detailed a manner as filling out a form, but they are definitely the kinds of things that make it easier for people to give Google an idea of the things they want to see, or don’t want to see.

  9. Hi Bill,

    Maybe the search engines aren’t quite as sophisticated to identify and assess each of these steps on an individual basis, but by introducing a funnel process, the results of the previous steps could definitely help a search engine draw a conclusion on anything it would have otherwise found difficult without.

    You’re right though, I think Authorship markup is one approach that is definitely geared towards the same thinking along with a number of other operations we are seeing from Google either rolled out or in the form of patents at this stage. :)

  10. The usual high standard of post that we have come to expect from you, Bill. I am slightly embarrassed to say that I was completely unaware that Google or Bing had this feature.
    I tend to agree with one of your other readers that social sites and likes are probably better for measuring.
    Glad to see the bigger font too. Very helpful for a blind old chap like me.

  11. Hi Geoff,

    I agree with you. By considering a number of different signals, the search engine can increase its confidence in making a decision and taking action. Authorship markup is a step towards an identity service that would help in a number of ways.

  12. Hi Steve,

    Thank you. It’s pretty interesting seeing what kinds of feedback mechanisms different websites have, and the processes that might be hidden behind those. With very large websites, giving people a chance to provide comments, suggestions, and so on might mean that the people handling those could easily be buried if they try to handle them one at a time.

    Happy to hear another positive vote for the larger font. Thanks.

  13. Hi Bill.
    Just a quick one today. Like many people, I have a number of websites/blogs that I visit regularly. Yours is among them.
    I have learned a lot from my visits here and for that I’m grateful. You always take the time to respond to my comments and I just wanted you to know, that means a real lot to me.
    Do you have somewhere that I could leave a testimonial? I would be more than happy to do so.

  14. Bill,

    I stumbled upon this post today and I was surprised to see that it had some elements of what I thought too – Google, why not a positive approach to getting user feedback?

    I was thinking about framing the content than a feedback link against each result. And this framing being done for content at random times and locations for random users to prevent any kind of misuse. But otherwise voting or rating tools that captures the details would be better than a detailed form or survey.

  15. Hi Steve,

    Thanks very much for your kind words. I’m happy to hear that you’re finding value in the posts I write and in the comments here.

    I do have some testimonials up at my LinkedIn profile ( http://www.linkedin.com/in/slawski ), and you could send me one there. Or you could email me one, and I would be happy to publish it on my about page. I’ve been thinking about adding some there, and it would be great to have yours.

  16. Hi Rajesh,

    Thanks for the link to the discussion over at Webmaster World on blocking sites as a user feedback mechanism.

    The reason or purpose for people blocking sites isn’t necessarily really clear, and it’s possible that such an action isn’t based upon the quality of content found on those sites as much as it might be upon philosophical or religious or political disagreements, or upon reducing the amount of content found within search results based upon convenience, or many other reasons.

    I do think that some people would be happy to fill out a detailed form, while others may only invest enough time to simply cast a vote in some manner. Making it easier for people to fill out a form (whether long or short) when available could potentially add a lot of value in terms of feedback that Google might use, especially when that feedback could potentially contain things that the search engineers at Google might not have anticipated.

  17. Interesting – I think the process is already automated, however….

    If you look at the typical Google Search page, they’ve got a ton of redirects and javascript lurking about to figure out what you’re looking at. They know what you click and, I suspect, how often you repeated your search or backed out and clicked on another result.

    In that environment, you should be able to detect webspam via:

    - High ranking records (per algorithm) that get few click-throughs (human detection in description)
    - If a click occurs, searcher exits and resumes their search
    - The exit occurs almost immediately (eg. second google search click within 15 – 20 seconds)

    Some of my friends who are into SEO suspect this is going on. Actually, I think I’ve got a spammer attacking one of my projects right now by simulating this tactic. One of my marginal keywords gets “bursts” of high bounce rate traffic – visible on a daily / hourly chart – on a page which hasn’t changed significantly since the site was launched. The difference is statistically significant (the page had a <10% bounce rate and long sessions in the first days of the site, and on days when the spammer stays home, this still holds); on days when the spammer attacks, my bounce rate moves to around 50% (70% for a several hour window). I track SERPs very closely (scraping several times a day if there are fresh changes and/or funny business going on) and have noticed Google drops after several particularly vicious sessions….

  18. Hi John,

    I agree that Google is likely using a system like this, and that they probably already have some kind of “spam” identification in place based upon clicks, or lack of clicks.

    Google was granted a patent covering fighting web spam based upon classifications and click data, which I wrote about in “How Google Might Fight Web Spam Based upon Classifications and Click Data”:

    http://www.seobythesea.com/2010/08/how-google-might-fight-web-spam-based-upon-classifications-and-click-data/

    One aspect of it is that they might only use this process for topics that tend to acquire a lot of web spam pages, so there’s a classification element at the front end.

    Regarding the incident with the person or people clicking upon one of your pages, if they aren’t switching IP addresses, browsers, etc., they might be giving themselves away.

Comments are closed.