Microsoft Exploring Popularity Data for Ranking Search Results

I haven’t written too much about Microsoft’s search recently, and a friend wrote an email earlier today asking me a little about what they try to do to rank pages. It’s a timely question, with Microsoft’s Chairman Bill Gates announcing during a keynote address yesterday that he will refocus his time at the company on online services, including search.

Microsoft has been writing about some interesting-sounding work with RankNet (as described in Learning to Rank using Gradient Descent (pdf)) and fRank (pdf), a feature-based ranking approach that works with RankNet. I don’t know if that is what they are using.

Some more recent papers on search, which cover even more intriguing ground, have been showing up on the home page of Susan Dumais, who is one of the chief researchers at Microsoft Research. She has a number of papers listed and linked to from 2007 and 2006 that focus a lot on user behavior.

A patent application from Microsoft, published last week, explores incorporating user statistics on page visits into their machine learning based ranking systems.

[0044] In practice, imagine this scenario: Sam is browsing the web. PQR Browser Plug-in, for those users who have opted in, sends back to QWE Ranking System a list of what URLs the user visited, what time he visited them, etc. This data is stored on QWE servers.

I like how they emphasize a few times in this patent application that the collection of user data is something people have opted into, and that such collection of data is through a voluntary system, using a browser plug-in. This sounds a little similar to Google’s Web History news from not too long ago, doesn’t it?

QWE can go through that list and count how many times each page has been viewed by a user, how many times a given domain has been viewed by a user, how many times a domain+toplevel (e.g., www.qwerank.com/ie) has been viewed, etc.

These statistics can then be used to improve the query-independent ranking of web pages (their static rank). For instance, QWE may take a weighted sum of the logs of these counts, where the weights for each count are learned using machine learning.
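To make that a little more concrete, here is a minimal sketch in Python of the counting and the weighted sum of logs. It is only an illustration under stated assumptions, not the method from the patent filing: the function names (`url_keys`, `popularity_counts`, `static_rank`) are hypothetical, the weights are hand-picked placeholders where the patent describes weights learned through machine learning, and I use log(1 + count) rather than a plain log so that a page with no visits scores zero instead of being undefined.

```python
import math
from collections import Counter
from urllib.parse import urlsplit


def url_keys(url):
    """Return the (page, domain, domain+toplevel) keys for a URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Keep only the first path segment, e.g. "ie" from "/ie/help".
    segments = [s for s in parts.path.split("/") if s]
    top = host + "/" + segments[0] if segments else host
    return url, host, top


def popularity_counts(visited_urls):
    """Count visits per page, per domain, and per domain+toplevel."""
    page, domain, domain_top = Counter(), Counter(), Counter()
    for url in visited_urls:
        page_key, host_key, top_key = url_keys(url)
        page[page_key] += 1
        domain[host_key] += 1
        domain_top[top_key] += 1
    return page, domain, domain_top


def static_rank(url, counts, weights):
    """Weighted sum of log(1 + count) over the three counts for a URL."""
    page, domain, domain_top = counts
    page_key, host_key, top_key = url_keys(url)
    features = (page[page_key], domain[host_key], domain_top[top_key])
    # log1p is an assumption made here to keep unseen pages at zero.
    return sum(w * math.log1p(c) for w, c in zip(weights, features))


# Hypothetical opt-in browsing log and placeholder (not learned) weights.
visits = [
    "http://www.qwerank.com/ie",
    "http://www.qwerank.com/ie/help",
    "http://www.qwerank.com/ie",
    "http://example.org/",
]
counts = popularity_counts(visits)
score = static_rank("http://www.qwerank.com/ie", counts, (1.0, 0.5, 0.25))
```

In a real system the three weights would come out of training against relevance judgments, and the score would be just one input among many into the static rank.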

The patent document reports on the effectiveness of this process in experiments, stating that 50% of the observed performance gain was due to the browser tracking count.

[0045] The resulting ranking helps the search engine provide more relevant results to the people who are searching the web, since it is more likely to return pages which many people have visited. According to actual experiments performed, the accuracy of search results for a given search query using the popularity based system described hereinabove increased over the conventional rank system.

In fact, it was determined that 50% of the performance gain observed by the testers is due to the browser tracking count. Thus, popularity based rankings can and do improve the quality of search results. Such rankings also help the search engine order its index so that it can retrieve good pages more efficiently.

Finally, they can help the search engine determine which pages to crawl and/or re-crawl since it is more useful to re-crawl pages that are highly relevant and good rather than re-crawling poor or fraudulent pages.
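The re-crawl idea can be sketched in a few lines of Python as well. This is an assumption-laden illustration rather than anything spelled out in the filing: `recrawl_order` is a hypothetical helper, the scores are made-up popularity values, and a real crawler would weigh many other signals, such as how often a page changes.

```python
import heapq


def recrawl_order(scores, budget):
    """Return up to `budget` URLs, highest popularity score first."""
    return heapq.nlargest(budget, scores, key=scores.get)


# Made-up popularity scores for three pages.
scores = {
    "http://www.qwerank.com/ie": 2.1,
    "http://spam.example/": 0.0,
    "http://example.org/": 0.7,
}
to_crawl = recrawl_order(scores, budget=2)
```

With a budget of two pages, the unvisited page is simply left out of this crawl pass.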

The patent application is:

Using popularity data for ranking
Inventors: Matthew R. Richardson, Eric D. Brill, Robert J. Ragno, and Robert L. Rounthwaite
Assigned to Microsoft
Filed: November 3, 2005
US Patent Application 20070100824
Published May 3, 2007

Abstract

A unique ranking system and method that facilitates improving the ranking and ordering of objects to further enhance the quality, accuracy, and delivery of search results in response to a search query. The system and method involve monitoring and tracking an object in terms of the number of times it’s been accessed and optionally by whom, when, for how long, and an access rate. The user’s interaction with the object can be tracked as well.

By tracking the objects, a popularity measure can be determined. Popularity based rankings can be computed based on the popularity measure or some function thereof. The popularity measure can be affected by the access time, who accessed it, access duration or the user’s interaction with the object upon access. The popularity based rankings can be utilized by a search component to improve the quality and retrieval of search results.

Since the patent application was filed in 2005, I suspect that Microsoft has conducted a lot more research since then on how user behavior can influence the ranking of results. As I noted above, visit the home page of Dr. Dumais for some more recent papers on the topic.


8 thoughts on “Microsoft Exploring Popularity Data for Ranking Search Results”

  1. With Bill Gates’s announcement, this seems a logical step for Microsoft. What else does Bill Gates have left to do except go after Google?

  2. Hi Navneet,

    It’s not just going after Google, though.

    I think his statement reflects a knowledge that platforms are moving to online usage, and paying attention to feedback elicited through interactions with users can provide information and services that they want to see and use. That seems to be the next logical evolution for a company like Microsoft.

  3. Thanks Bill. I hadn’t realized Microsoft was as far along in tracking user data, though it makes sense they would be. I agree this is the next step and natural evolution in search. How long till Live Search and Yahoo! for that matter offer their own versions of personalized search?

    I appreciate all the mentions of ‘opt in’ too, though I wonder how long this will be the case. Assuming the search engines see that user data produces better results and leads to more people using their engine, it’s likely they’ll be looking for ways to collect more data regardless of how we feel about it.

  4. I’m looking forward to their implementations of personalized search, too, Steven.

    I think that the concern each search engine shows in how our personal data is used and protected is going to be a key aspect of how successful personalized search will be from different sources.

  5. I have a dream.

    I dream that one day Google will have some stiff competition and the playing field will level once again. Microsoft has the resources and the talent to bring Google to its knees.

    Now don’t get me wrong. I am not a Google hater. Google has made me a pretty penny. However, where there is little or no competition, there is eventually the big head that comes with power. The paid link debate is the perfect example of “big brother” trying to manipulate the players. It is OUR websites. Without our websites, what would Google have to show?

  6. Hi Pyke Tin,

    There are some very interesting ideas circulating about how to improve the quality of search results. It looks like you have been working on some very interesting topics. I would really like to read your newest paper once it is published.

    Thank you for stopping by, and leaving a comment.

    Bill

  7. I had a chance to read this article only last week.
    It may be too late to respond, but I like the idea and approach.
    I am working on this topic myself.
    I found that there are many ways and methods to approach this kind of problem.
    I have developed one method myself, to be published soon.
    In any case, I admire the inventors, Matthew R. Richardson, Eric D. Brill, Robert J. Ragno, and Robert L. Rounthwaite of Microsoft, for their breakthroughs.
