Microsoft Exploring Popularity Data for Ranking Search Results
I haven’t written too much about Microsoft’s search recently, and a friend wrote an email earlier today asking me a little about what they try to do to rank pages. It’s a timely question with Microsoft’s Chairman Bill Gates announcing during a keynote address yesterday a renewed focus of his time at the company upon online services, including search.
Microsoft has been writing about some interesting-sounding work with RankNet (as described in Learning to Rank using Gradient Descent (pdf)) and fRank (pdf), a feature-based ranking approach that works with RankNet. I don't know if that is what they are actually using.
Some more recent papers on search, which cover even more intriguing ground, have been showing up on the home page of Susan Dumais, who is one of the chief researchers at Microsoft Research. She has a number of papers listed and linked to from 2007 and 2006 that focus a lot on user behavior.
A patent application from Microsoft, published last week, explores incorporating user statistics on page visits into its machine learning based ranking systems.
 In practice, imagine this scenario: Sam is browsing the web. PQR Browser Plug-in, for those users who have opted in, sends back to QWE Ranking System a list of what URLs the user visited, what time he visited them, etc. This data is stored on QWE servers.
I like how they emphasize a few times in this patent application that the collection of user data is something people have opted into, and that such collection of data is through a voluntary system, using a browser plug-in. This sounds a little similar to Google’s Web History news from not too long ago, doesn’t it?
QWE can go through that list and count how many times each page has been viewed by a user, how many times a given domain has been viewed by a user, how many times a domain+toplevel (e.g., www.qwerank.com/ie) has been viewed, etc.
These statistics can then be used to improve the query-independent ranking of web pages (their static rank). For instance, QWE may take a weighted sum of the logs of these counts, where the weights for each count are learned using machine learning.
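To make the counting and scoring steps in that passage a little more concrete, here is a minimal Python sketch. It is not taken from the patent application: the function names, the use of log(1 + count) so that unvisited pages score zero, and the idea of passing the weights in by hand are all my own assumptions. In the system described, the weights would be learned by machine learning rather than supplied directly.

```python
import math
from collections import Counter
from urllib.parse import urlsplit


def url_keys(url):
    """Return (page, domain, domain+toplevel) keys for a URL.

    "domain+toplevel" is taken here to mean the host plus the first path
    segment, e.g. "www.qwerank.com/ie" (an assumption for illustration).
    """
    parts = urlsplit(url)
    domain = parts.netloc.lower()
    segments = [s for s in parts.path.split("/") if s]
    top = "/" + segments[0] if segments else "/"
    return url, domain, domain + top


def count_visits(visited_urls):
    """Tally views per page, per domain, and per domain+toplevel."""
    page_counts, domain_counts, toplevel_counts = Counter(), Counter(), Counter()
    for url in visited_urls:
        page, domain, toplevel = url_keys(url)
        page_counts[page] += 1
        domain_counts[domain] += 1
        toplevel_counts[toplevel] += 1
    return page_counts, domain_counts, toplevel_counts


def static_rank(url, counts, weights):
    """Weighted sum of the logs of the three counts for one URL.

    `weights` is a hypothetical (page, domain, toplevel) tuple; log1p is
    used instead of log so that a count of zero contributes zero.
    """
    keys = url_keys(url)
    return sum(w * math.log1p(c[k]) for w, c, k in zip(weights, counts, keys))
```

Calling `count_visits` on a list of opted-in browsing records and then `static_rank` on any URL gives a query-independent score that rises with page, directory, and domain traffic.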
The patent document provides some numbers on the effectiveness of this process from experiments that were conducted on it, attributing 50% of the observed performance gain to the browser tracking count.
 The resulting ranking helps the search engine provide more relevant results to the people who are searching the web, since it is more likely to return pages which many people have visited. According to actual experiments performed, the accuracy of search results for a given search query using the popularity based system described hereinabove increased over the conventional rank system.
In fact, it was determined that 50% of the performance gain observed by the testers is due to the browser tracking count. Thus, popularity based rankings can and do improve the quality of search results. Such rankings also help the search engine order its index so that it can retrieve good pages more efficiently.
Finally, they can help the search engine determine which pages to crawl and/or re-crawl since it is more useful to re-crawl pages that are highly relevant and good rather than re-crawling poor or fraudulent pages.
The patent application is:
Using popularity data for ranking
Inventors: Matthew R. Richardson, Eric D. Brill, Robert J. Ragno, and Robert L. Rounthwaite
Assigned to Microsoft
Filed: November 3, 2005
US Patent Application 20070100824
Published May 3, 2007
A unique ranking system and method that facilitates improving the ranking and ordering of objects to further enhance the quality, accuracy, and delivery of search results in response to a search query. The system and method involve monitoring and tracking an object in terms of the number of times it’s been accessed and optionally by whom, when, for how long, and an access rate. The user’s interaction with the object can be tracked as well.
By tracking the objects, a popularity measure can be determined. Popularity based rankings can be computed based on the popularity measure or some function thereof. The popularity measure can be affected by the access time, who accessed it, access duration or the user’s interaction with the object upon access. The popularity based rankings can be utilized by a search component to improve the quality and retrieval of search results.
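The abstract says the popularity measure can be affected by when an object was accessed and for how long, but it doesn't give a formula. Purely as an illustration, the sketch below weights each visit by how recent it was and how long the visitor stayed. The `Access` record, the exponential half-life decay, and the cap on dwell time are all hypothetical choices of mine, not details from the patent application.

```python
from dataclasses import dataclass


@dataclass
class Access:
    age_days: float       # how long ago the visit happened
    dwell_seconds: float  # how long the visitor stayed on the page


def popularity(accesses, half_life_days=30.0, dwell_cap_seconds=60.0):
    """Sum of per-visit weights, each in the range 0..1.

    A visit counts for less the older it is (halving every
    `half_life_days`) and for less the shorter the stay (scaled
    linearly up to `dwell_cap_seconds`). Both knobs are assumptions.
    """
    total = 0.0
    for a in accesses:
        recency = 0.5 ** (a.age_days / half_life_days)
        dwell = min(a.dwell_seconds, dwell_cap_seconds) / dwell_cap_seconds
        total += recency * dwell
    return total
```

Under these assumed settings, a visit made today that lasted a full minute counts as one full visit, while the same visit made a month ago counts as half of one.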
Since the patent application was filed back in 2005, I suspect that Microsoft has conducted a lot more research since then on how user behavior can influence the rankings of results. As I noted above, visit the home page of Dr. Dumais for some more recent papers on the topic.