Microsoft’s Approach to Identifying Quality Search Results Based on User Feedback?

When we see a change at one of the major search engines like the Panda update at Google, it’s not a bad idea to look at whether or not one of the other search engines has done something similar, or at least published some research on a similar approach.

A flowchart from the Microsoft patent showing how patterns involving user-behavior might influence dynamically improving search results.

Interestingly, a patent granted to Microsoft this week (though originally filed back in 2004) describes how the quality of search results might be judged, and those results possibly changed, based upon user feedback. The patent is:

Automated satisfaction measurement for web search
Invented by Oliver Hurst-Hiller, Eric Watson, and Susan T. Dumais
Assigned to Microsoft
US Patent 7,937,340
Granted May 3, 2011
Filed March 22, 2004

Abstract

Context-based user behavior data is collected from a search mechanism. This data includes, for a given query, user feedback (implicit and explicit) on the query and context information on the query. A predictive pattern is applied to the context-based user behavior data in order to produce predicted user satisfaction data.

Data mining techniques may be used to create and improve one or more predictive patterns. Predicted user satisfaction data can be used to monitor or improve search mechanism performance, via a display reporting the performance or identification of any queries with a shared characteristic and sub-par user satisfaction.

A dynamically-improving search mechanism uses the predicted user satisfaction data to improve the performance of the search mechanism.
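The abstract's idea of flagging "queries with a shared characteristic and sub-par user satisfaction" can be sketched in a few lines of Python. This is only an illustration built on my own assumptions. Each record is assumed to already carry a predicted satisfaction score between 0 and 1. The `characteristic` function (here, a hypothetical short/long query bucket) and the 0.5 threshold are also assumptions. The patent doesn't specify either one.

```python
from collections import defaultdict
from statistics import mean


def flag_subpar_groups(records, characteristic, threshold):
    """Group queries by a shared characteristic and return each group whose
    average predicted satisfaction falls below `threshold`.

    `records` is a list of dicts with hypothetical keys "query" and
    "predicted_satisfaction" (a score between 0 and 1).
    """
    groups = defaultdict(list)
    for rec in records:
        groups[characteristic(rec["query"])].append(rec["predicted_satisfaction"])
    averages = {key: mean(scores) for key, scores in groups.items()}
    return {key: avg for key, avg in averages.items() if avg < threshold}


def query_length_bucket(query):
    # Hypothetical characteristic: short queries vs. long (4+ word) queries.
    return "long" if len(query.split()) >= 4 else "short"


# Made-up records for illustration only.
records = [
    {"query": "panda update", "predicted_satisfaction": 0.8},
    {"query": "microsoft patent", "predicted_satisfaction": 0.7},
    {"query": "how to remove thin pages from a site", "predicted_satisfaction": 0.4},
    {"query": "best patio umbrella for windy decks", "predicted_satisfaction": 0.3},
]

print(flag_subpar_groups(records, query_length_bucket, threshold=0.5))
```

With these toy records only the "long" group falls below the threshold. In a real system the characteristic might instead be a query category or a type of result. The flagged groups are the sort of thing the abstract says could be surfaced in a performance report.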

A whitepaper from Microsoft which shares an author with the patent and seems to be somewhat related is Improving Web Search Ranking by Incorporating User Behavior Information. We’re told in the paper that:

In this paper we explored the utility of incorporating noisy implicit feedback obtained in a real web search setting to improve web search ranking.

We performed a large-scale evaluation over 3,000 queries and more than 12 million user interactions with a major search engine, establishing the utility of incorporating “noisy” implicit feedback to improve web search relevance.

Implicit Feedback

Implicit feedback about how satisfied a searcher is with a web page that they found in a search result might be collected by a search engine. This kind of information isn’t provided explicitly by a searcher, but rather is implicit in the searcher’s actions or inactions.

For instance, someone printing a page that they’ve found through a search may mean that they found value in that page.

The print dialog from my printer, opened while viewing the patent, as an example of implicit user feedback.

While it’s possible that they are printing because they found something of interest on the page unrelated to their search, in many cases there is likely some relationship between the search and the act of printing.

Some user behavior signals tracked may involve navigation through a site or the display of a page, such as when:

  • A hyperlink has been clicked to navigate to a different page
  • The history is used for navigation to a different page
  • The address bar is used to navigate to a different page
  • The favorites list is used to navigate to a different page
  • A document has been completely loaded and initialized
  • Scrolling is taking place
  • A document is printed
  • A document is added to the favorites list
  • The window gains focus
  • The window loses focus
  • A window has been closed
  • The user selects, cuts, or pastes portions of the displayed page
  • Navigation back to the search results page

Other signals could involve:

  • User dwell time on a page
  • A new query initiated by the same user
  • Other sequences of user behaviors

Other information about searchers’ behaviors is likely to be collected as well. For example, when a hyperlink is clicked, the position of the link may be recorded, along with the size of the content involving that element (such as an image or the length of the anchor text), the area of content where the link is located, and the type of content.
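To make the signals above a little more concrete, here is a minimal Python sketch that turns a log of events from one visit to a search result into implicit feedback features. These include click position, dwell time, printing, scrolling, and returning to the results page. The event names, the `Event` class, and the log format are all hypothetical. The patent lists the kinds of behaviors it might track, but it doesn't describe a data format.

```python
from dataclasses import dataclass, field

# Hypothetical events that take the searcher away from the clicked page;
# dwell time ends at the first one of these after the click.
LEAVE_EVENTS = {"back_to_results", "new_query", "window_closed"}


@dataclass
class Event:
    kind: str         # hypothetical names, e.g. "result_click", "print", "scroll"
    timestamp: float  # seconds since the query was issued
    detail: dict = field(default_factory=dict)


def implicit_features(events):
    """Summarize one visit to a search result as implicit-feedback features."""
    features = {
        "click_position": None,
        "dwell_seconds": None,
        "printed": False,
        "added_favorite": False,
        "scroll_events": 0,
        "returned_to_results": False,
        "new_query": False,
    }
    click_time = None
    left_at = None
    for e in sorted(events, key=lambda ev: ev.timestamp):
        if e.kind == "result_click":
            click_time = e.timestamp
            features["click_position"] = e.detail.get("position")
        elif e.kind == "print":
            features["printed"] = True
        elif e.kind == "add_favorite":
            features["added_favorite"] = True
        elif e.kind == "scroll":
            features["scroll_events"] += 1
        elif e.kind in LEAVE_EVENTS:
            # Only the first leave event after the click ends the visit.
            if click_time is not None and left_at is None:
                left_at = e.timestamp
            if e.kind == "back_to_results":
                features["returned_to_results"] = True
            elif e.kind == "new_query":
                features["new_query"] = True
    if click_time is not None and left_at is not None:
        features["dwell_seconds"] = left_at - click_time
    return features


visit = [
    Event("result_click", 1.0, {"position": 2}),
    Event("scroll", 5.0),
    Event("scroll", 9.0),
    Event("print", 40.0),
    Event("back_to_results", 61.0),
]
print(implicit_features(visit))
```

A visit where the page was scrolled, printed, and read for about a minute before the searcher came back might look like a fairly satisfied one. A quick return to the results page with no scrolling might suggest the opposite. How much weight each feature deserves is exactly what a predictive pattern would have to learn.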

Explicit Feedback Data

The patent also tells us that it might consider using more explicit feedback to measure the satisfaction of a searcher with a page that they’ve found in search results.

This might be as simple as asking, via a dialog box, a question such as, “Did this answer your question?” and allowing a response to be entered.

The ability for searchers to block certain sites from their search results is an explicit form of user feedback. In the Google interview I mention in the conclusion of this post, we are told that Google looked at this signal to see whether its approach with Panda resulted in similar sites or pages being identified as less than satisfactory results.

Google’s +1 button might be seen as an explicit feedback mechanism. However, as many people writing about the button have pointed out, it presently appears in search results before a searcher visits a page, and it’s hard to judge how good a result is before you visit it.

A recent Google television advertisement for Google Chrome shows a +1 button built into the browser’s toolbar. That may be one way to get around the problem of seeing the +1 button in the search results before you visit the page itself.

Predictive Patterns

The kind of implicit and explicit feedback above might be associated with the contexts in which they were found, and patterns identified with that information might be used to predict a user’s satisfaction with particular search results for particular queries.

This type of pattern prediction might be part of a data mining system that uncovers possible trends, patterns, and relationships in a range of data. Such a system could use classification methods such as support vector machines, decision trees, neural networks, and language models.
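Decision trees are one of the classification methods listed above. Here is a minimal sketch of the simplest possible decision tree, a one-level "decision stump." It learns a single threshold on implicit features that best predicts satisfaction. I'm assuming that answers to an explicit question like "Did this answer your question?" serve as the satisfied/unsatisfied labels. The feature names and the six training sessions are made up for illustration, and the patent doesn't describe its models at this level of detail.

```python
def predict(stump, row):
    """Return 1 if the session is predicted satisfied, else 0."""
    feature, threshold, polarity = stump
    at_or_above = row[feature] >= threshold
    # polarity 1: satisfied when the value is at or above the threshold;
    # polarity -1: satisfied when the value is below it.
    return int(at_or_above) if polarity == 1 else int(not at_or_above)


def train_stump(rows, labels):
    """Try every feature, observed threshold, and polarity; keep the split
    with the fewest training errors (ties keep the first one found)."""
    best_errors, best_stump = None, None
    for feature in rows[0]:
        for threshold in sorted({row[feature] for row in rows}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                errors = sum(predict(stump, r) != y for r, y in zip(rows, labels))
                if best_errors is None or errors < best_errors:
                    best_errors, best_stump = errors, stump
    return best_stump


# Hypothetical sessions: implicit features plus an explicit yes (1) / no (0)
# answer to "Did this answer your question?"
sessions = [
    {"dwell_seconds": 95, "printed": 1, "returned_to_results": 0},
    {"dwell_seconds": 120, "printed": 0, "returned_to_results": 0},
    {"dwell_seconds": 60, "printed": 0, "returned_to_results": 0},
    {"dwell_seconds": 8, "printed": 0, "returned_to_results": 1},
    {"dwell_seconds": 15, "printed": 0, "returned_to_results": 1},
    {"dwell_seconds": 4, "printed": 0, "returned_to_results": 1},
]
answers = [1, 1, 1, 0, 0, 0]

stump = train_stump(sessions, answers)
print(stump)
print(predict(stump, {"dwell_seconds": 70, "printed": 0, "returned_to_results": 0}))
```

On this toy data, the "returned to results" flag would separate the sessions just as well as dwell time does. The stump simply keeps the first perfect split it finds. A full decision tree would keep splitting on further features, which is closer to what a real predictive pattern would need with noisy data.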

That kind of prediction might drive a “dynamically-improving search mechanism” to improve how a search engine performs.

For example, the predicted user satisfaction data 425 may indicate that, of two different presentations of search results, one of the presentations results in higher predicted user satisfaction. The dynamically-improving search mechanism adjusts to provide the better presentation for search results more often, thus improving predicted user satisfaction for the future.

Other refinements may occur using this same mechanism. In addition to providing different presentations of search results (such as different orderings of results on a results page), spell-correction, query refinement suggestions, news or shopping results, or categorization user interface may be provided to the user in different situations.

The dynamically-improving search mechanism 410 can be used to compare the user satisfaction with these solutions or features.
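Showing "the better presentation for search results more often" resembles what is usually called a multi-armed bandit problem. The sketch below uses epsilon-greedy, one simple bandit strategy. That choice is my assumption, since the patent doesn't name an algorithm. The presentation names, their typical scores, and the 10% exploration rate are hypothetical.

```python
import random


class PresentationChooser:
    """Epsilon-greedy choice among alternative result presentations.

    Most of the time, show the presentation with the highest average
    predicted satisfaction so far; with probability `epsilon`, show a random
    one so the other options keep getting measured.
    """

    def __init__(self, presentations, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.totals = {p: 0.0 for p in presentations}
        self.counts = {p: 0 for p in presentations}

    def mean(self, presentation):
        n = self.counts[presentation]
        return self.totals[presentation] / n if n else 0.0

    def choose(self):
        # Try every presentation at least once before exploiting.
        for p, n in self.counts.items():
            if n == 0:
                return p
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.counts))
        return max(self.counts, key=self.mean)

    def record(self, presentation, predicted_satisfaction):
        self.totals[presentation] += predicted_satisfaction
        self.counts[presentation] += 1


# Hypothetical simulation: layout_b tends to earn higher predicted satisfaction.
typical_score = {"layout_a": 0.55, "layout_b": 0.70}
noise = random.Random(42)
chooser = PresentationChooser(["layout_a", "layout_b"], epsilon=0.1, seed=7)
for _ in range(2000):
    shown = chooser.choose()
    chooser.record(shown, typical_score[shown] + noise.uniform(-0.05, 0.05))

print(chooser.counts)
```

A small share of traffic still goes to the weaker presentation. Because of that, the mechanism could notice if that presentation started doing better. The same loop could compare other features the patent mentions, such as spell-correction or query refinement suggestions.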

Conclusion

It’s hard to tell whether Google has incorporated user feedback directly into their scoring of the quality of pages. They may instead be using it to evaluate their system for grading quality, and then making tweaks to the signals being used.

Google’s Amit Singhal and Matt Cutts told us in The ‘Panda’ That Hates Farms: A Q&A With Google’s Top Search Engineers that the Panda update looks “for signals that recreate that same intuition, that same experience that you have as an engineer and that users have.” It’s possible that Google is using some kind of classification system that might either incorporate user behavior signals into page rankings, or use them as feedback to evaluate the signals chosen to rerank pages in search results.

The kind of algorithmic approach that I pointed to in Searching Google for Big Panda and Finding Decision Trees may be in part what’s behind the Panda update, but it’s clear that user behavior plays a role in how a page or site might be evaluated by Google.


44 thoughts on “Microsoft’s Approach to Identifying Quality Search Results Based on User Feedback?”

  1. Some great insights here Bill. I certainly think that the user behavior factor is going to play a big role going forward. The release of Google’s +1 button is going to be a huge step in this direction.

  2. Thanks for the post. How do you think Microsoft is going to track user behavior that involves navigation?

  3. Pingback: My Fears
  4. I’m wondering what kind of methods will be popping up to gather explicit searcher feedback, and how these evolve–I can see dialogue boxes quickly becoming something that people don’t want to deal with.

  5. The problem with this technique is that it will probably make the work 10x easier for spammers trying to rank.

    I don’t know how hard it would be for someone with some programming skills to create a robot that simulates a user on the site (let’s say, clicking 2 or 3 links and staying around for some time to lower the bounce rate) and then run it on the site through, say, 100 proxies.

    Also, some sort of user input like an approval button (Digg or YouTube style) or the Google +1 button may be manipulated just as easily as the other stats.

    At least the way the algorithm currently works makes the results harder to manipulate, or at least that’s what I think.

  6. In terms of sharing the information with the user, it’s important that all the information they need is available in one way or another. But the most important thing is to give users a way to filter the information however they want, because everybody looks at information differently depending on who they are and what they do.

    This means that all the information should be in one bag, with one ID for each element, so users could choose the column and the keyword they want to look for.

    It’s a little bit like a database with SQL queries.

  7. Great information here. There is a lot going on out there regarding the tracking, gathering and interpretation of user data. Everything and everyone is being tracked and interpreted. Is this good or bad? I think good. In the long run it should make for a much more personalized online experience. Thanks again!

    Dave

  8. Very good information. Really hard for a lay person (like myself) to keep up with the ever changing world of technology.

  9. Yes, indeed an informative post. If you have Google Analytics installed on your blog or site, for example, I would hope that Google would build a feature into Analytics to show this, such as the scoring of the quality of pages that Bill mentioned. Being able to see these extra scores for your site could be very beneficial to webmasters. I think it’s a good thing.

  10. Good stuff, and not too hard for us non-techies to follow. I tend to like any mechanism that acts like a free market where quality is rewarded. My concern is that much like “American Idol” and our political elections, the masses won’t always get it right.

  11. Very good point about the +1 button being of little use on the search engine. It would be much better in the browser, plus it means Google could push Chrome a lot more and possibly even corner that market too.

  12. Really informative post. I think that user behavior is going to play a major role in the future. The release of Google’s +1 button will be a big step in that direction. Thanks for sharing.

  13. Tracking user behavior is a very big task. The +1 button really helps with that. Very cool idea.

  14. Hi Derek,

    Thank you. It’s hard to tell how much of a role the +1 button might have in ranking pages, but I expect Google will spend a considerable amount of time testing it to see if information about it might help improve search results.

  15. Hi Kentaro,

    I’m not sure that most people would take the time to respond to dialog boxes either. I’ve probably seen hundreds of boxes appear on websites asking me if I would be willing to fill out a brief survey of what I found on a site. I may have answered yes to one of them.

  16. Hi Fabio,

    If the user data collected can be associated with a specific user account, then the credibility of that information might be tied to something like a reputation score associated with that account. It might take considerably more work to create many “credible” accounts than most spammers may care to take the time to create.

    It’s also likely that the search engines have become much better at identifying automated attempts to create user data.

  17. Hi Alex J,

    I’m not quite sure what you’re getting at.

    Most commercial search engines aren’t likely to share with searchers the algorithms behind how they rank pages, and the kinds of information that they use in those rankings.

    At one point in time, we did see Yahoo release something called Yahoo Mindset, which let you use a slider to get either more “shopping” type results, or more “informational” type results when you searched. You would slide the bar towards one side or another, and that would influence the rankings of pages. Yahoo ended that experiment a while back. There was also a similar feature hidden in Microsoft Live Search before they turned to Bing that would let you use a slider to indicate how much “commercial intent” you wanted to see in your search results.

  18. Hi Dave,

    Thanks. There certainly is a lot of data that the search engines are collecting about how people use websites. The question is whether they can use that data in a meaningful way that improves the quality of a searcher’s experience while searching.

  19. Hi Kevin,

    That’s an interesting thought – for the search engines to provide some kind of indication of how they are measuring the quality of web pages. Google’s Webmaster Tools has some reporting tools built into it to tell you about things that they see that are errors, like some duplicate content issues involving meta descriptions, soft 404 errors, and others. If they were to tell us about quality issues, it might help to improve the quality of pages on the Web.

  20. Hi Pat,

    Thank you – I strive to simplify some of the more complex ideas that I find in places like patents, without simplifying them too much. It’s a difficult thing to do.

    Popularity isn’t necessarily a good replacement for quality. Mainstream sites that cover a subject often outrank better sources that may have smaller, more technical audiences. That’s one of the problems with PageRank as well – it tends to reward more popular sites, while possibly not giving as much credit to more niche sites that may explore a topic in more depth.

  21. Hi Natalie,

    I’m thinking back to when Google used to include a smiley face button on their toolbar that you could click on when you found a page that you liked. That really didn’t catch on at the time, but it was a few years before Facebook likes arrived. See:

    http://www.webmasterworld.com/forum80/106.htm

    Maybe the +1 button on Chrome will prove to be more popular.

  22. Hi Anuj,

    I’m suspecting that Google and Bing have been looking at a lot of user-behavior signals over the past few years with the intent of figuring out how best to use them as ranking signals.

  23. Hi Tessa,

    It is a big task. Chances are that the amount of data the search engines collect about user behavior is greater than the amount of data they collect about pages found on the Web.

  24. One must consider how search engines are able to track such activities and how reliable the data they receive truly is. For example, how does Bing monitor user behavior once users leave the search results? Google has the upper hand here with AdSense, Analytics and their Toolbar – all of which allow Google to record data. However, many individuals and sites do not use any of Google’s services outside of search. All of this raises privacy concerns, but for Bing to drill deeper into user behavior, they will have to embrace and entice webmasters to install similar features as Google has done.

  25. Hi Beth,

    Bing has their own toolbar as well, and they’ve written both whitepapers and patents that provide some details on how information collected from their toolbar has been used in many different ways to describe how people browse the web and interact with web pages. They can also drill down into their search query log files to see the kinds of searches that people perform. They can then connect those search queries and query sessions, through their toolbar, to browsing activities on pages outside of their search.

    Chances are good that Google doesn’t use AdSense or Analytics data to inform their organic search, and I think they’ve stated many times that they don’t. They collect so much information through both their query log files and toolbar that they likely have more data about user behavior than they know what to do with.

  26. Wow, if Google can make this happen I am pretty sure it will rock. I believe users’ feedback is very important.

  27. Hi Andrew,

    The patent is Microsoft’s, but I think that Google is exploring a lot of the same territory. I agree – user feedback is growing in importance in how search results are ranked, which query suggestions we see, and more.

  28. Bill, a very insightful post.

    Care to take a guess as to why no one (other than some rumored) is coming out of Panda?

    - If Google had re-run the algo we would have seen many site-owners screaming with joy. It’s safe to say that a nice percentage have made improvements compared to pre-Panda, namely removing ‘bad pages.’ Not every site is eHow with 457,813,547,894,214,578 pages, so deleting bad ones and buffing up what’s left can be managed.

    - If they haven’t re-run it yet, why? To teach a lesson?

    Must not forget that Google has to calculate even for sites that Google doesn’t have too much data for. And this mob rule has to be limited or HuffPost will rank for every word they mention in their (semi-stolen) articles. And of course it can be gamed.

  29. Hi

    Care to take a guess as to why no one (other than some rumored) is coming out of Panda?

    Panda presents a different way of evaluating web pages than Google used in the past, and there may be no “coming out” of Panda. Rather than trying to return to pre-Panda rankings, it might be better for most sites to focus upon improving the quality of what they offer, making smart choices about how they present information, and looking for marketing gaps and niches that they might not have explored before. There’s more to Panda than just removing “bad pages.”

  30. I am starting to believe that Google is throwing sand in the eyes of devastated webmasters, who are ruining themselves even more by spending money on a mission impossible. Content, I believe, is no longer the deciding factor for the sites that were hit (not all niches were analyzed by Panda). Google did give lots of traffic back in April to several popular sites that got hit, and as far as I can tell it was based on ‘a lot of people visit them.’ This, of course, removed the pressure on Google to do something about the others, since the sites still hit are now smaller ones and can be called ‘bad’ much more easily.

    >>> There’s more to Panda than just removing “bad pages.”

    I am afraid that there’s a lot more to it than removing bad pages *and* fixing the existing ones. You need to become a ‘brand’ or call it quits, since almost all of the traffic is going to a few, or way fewer, sites.

    I never believed that Google could accurately differentiate and analyze all site content the way Singhal described it post-Panda, so instead they are probably using ‘popularity’ as the main deciding factor. Theories are good and all, but analyzing the text on a site that sells cars, tickets, shoes… patio umbrellas is not the same as analyzing a scholarly article site. Maybe they can do it in a few years, but not now. So they gave up, and the rich get richer. And Google’s job is easier once they’ve picked who owns the niche. For a while, anyway.

  31. >>> There’s more to Panda than just removing “bad pages.”

    That’s true, but does it make sense for the rankings of the remaining pages to go down if you remove, say, 600 out of your 1000 pages? What got worse from a ‘user’s perspective’?

    So you add more relevant text to your /nikexl500.html etc. and remove many ‘thin/shallow’ pages, but traffic keeps going down and down. Removing and improving was the only advice given right after Panda, so unless Google changed the rules drastically, people should see results. Not 100% of pre-Panda traffic, or 150%, but at least no further drop, no? Or a marginal improvement. Immediately after Panda many pages ranked quite high even on pandalized sites; now sitewide ranking (+ or -) has way more power. I see it in my SERPs.

  32. Hi Mike (Google’s TOP 40 Countdown)

    You need to become a ‘brand’ or call it quits since almost all of the traffic is going to a few, or way fewer sites.

    Google is definitely engaging in associating entities (specific people, places, and things – including brands) with specific websites if they think it provides better results to searchers. That’s something that was going on before Panda, though Panda may include some additional elements that take it much further.

    Did you have a chance to read through the Planet whitepaper that I wrote about in the post I mentioned at the end of this post (Searching Google for Big Panda and Finding Decision Trees)? If so, do you think it’s likely that Google used a method like that to do the kind of classification that Amit and Matt described in the Wired article?

  33. Hi Bill,

    How much of a role do you think user feedback will play in search engines in the future?

    And assuming Google and Bing are using Google+ and Facebook Likes, respectively, as user feedback to improve search, which of the two services do you think will have the more positive effect?

  34. Hi Max,

    User feedback will always have a strong role in what we see from the search engines. It’s much like any other site owner looking at his or her analytics to see how well their site is doing, and trying to figure out what they might change to help make it do even better.

    Google would definitely favor a social service that they had both more control over, and more access to data associated with social signals. A recent blog post from Christopher Penn, Google+ and Search Signals: Tinfoil Hat Edition points out how Google is tracking and recording every click and action on Google Plus. Google doesn’t have the same level of access to information about Facebook users, and what they do on Facebook.

  35. I strongly believe that this is the way in which search engines will begin to operate, especially since Google decided that they were going to create the Google +1 button.

    Only time will tell whether it will last, as there will be a number of people looking to abuse the fact that they can move themselves up the rankings. But I believe that, if used correctly, results generated from user feedback would give a better user experience.

  36. Hi Laura,

    I think I recently read a few articles about Bing incorporating more user feedback into how they rank pages.

    You might find this interview interesting:

    How Bing Uses CTR in Ranking, and more with Duane Forrester

    I’m not sure that the Google +1 button by itself is going to be used by Google as a ranking signal. It’s also possible that if they do, the value of a +1 from one person might be different than the value of a +1 from another person.
