User Popularity Data for Ranking Search Results


User Popularity Data and Machine Learning

I haven’t written much about Microsoft’s search recently, and a friend emailed me earlier today asking a little about what they try to do to rank pages. It’s a timely question, since Microsoft Chairman Bill Gates announced in a keynote address yesterday that he would renew the focus of his time at the company on online services, including search.

Microsoft has been writing about some interesting-sounding work on RankNet (as described in Learning to Rank using Gradient Descent) and fRank, a feature-based ranking that works with RankNet. I don’t know if either is what they are actually using.
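The heart of RankNet is a pairwise cost: for two documents i and j with model scores s_i and s_j, the model's probability that i should rank above j is the logistic function of s_i - s_j, and training minimizes cross-entropy against a target probability using gradient descent. The RankNet paper uses a neural network as the scorer; this minimal sketch swaps in a plain linear scorer to stay short, and the feature vectors, learning rate, and epoch count are made-up illustrations rather than anything from Microsoft.

```python
import math

def sigmoid(o):
    # Numerically stable logistic function.
    if o >= 0:
        return 1.0 / (1.0 + math.exp(-o))
    z = math.exp(o)
    return z / (1.0 + z)

def pair_cost(s_i, s_j, target):
    """RankNet-style cross-entropy for one pair.
    target is the desired probability that item i ranks above item j."""
    p = sigmoid(s_i - s_j)
    eps = 1e-12  # guard against log(0)
    cost = -target * math.log(max(p, eps)) - (1.0 - target) * math.log(max(1.0 - p, eps))
    return cost, p - target  # second value is dCost/d(s_i - s_j)

def score(w, x):
    # Linear scorer standing in for RankNet's neural network (an assumption).
    return sum(wk * xk for wk, xk in zip(w, x))

def train(pairs, n_features, lr=0.1, epochs=200):
    """pairs: list of (x_i, x_j, target) where x_i and x_j are feature lists."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for x_i, x_j, target in pairs:
            _, g = pair_cost(score(w, x_i), score(w, x_j), target)
            # s_i - s_j = w . (x_i - x_j), so the gradient w.r.t. w is g * (x_i - x_j).
            for k in range(n_features):
                w[k] -= lr * g * (x_i[k] - x_j[k])
    return w
```

With two hypothetical training pairs, each saying the first item should rank above the second, the learned weights end up scoring the preferred item higher in both pairs:

```python
pairs = [([1.0, 0.2], [0.1, 0.9], 1.0), ([0.8, 0.1], [0.3, 0.5], 1.0)]
w = train(pairs, 2)
```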

Some more recent papers on search, which cover even more intriguing ground, have been showing up on the home page of Susan Dumais, one of the chief researchers at Microsoft Research. She links to many papers from 2006 and 2007 that focus on user behavior.

A patent application from Microsoft, published last week, explores incorporating user statistics on page visits into their machine learning-based ranking systems.

[0044] In practice, imagine this scenario: Sam is browsing the web. PQR Browser Plug-in, for users who have opted in, sends back to QWE Ranking System a list of what URLs the user visited, what time he visited them, etc. This data is stored on QWE servers.

User Popularity Data Has Been Opted Into

I like how they emphasize several times in this patent application that collecting user data is something people have opted into. The information is collected through a voluntary system, using a browser plug-in. This User Popularity Data sounds a little like Google’s recent Web History announcement.

QWE can go through that list and count how many times a user has viewed each page, how many times a user has viewed a given domain, how many times a domain+toplevel (e.g., www.qwerank.com/i.e.) has been viewed, etc.
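Counting views at those three levels from a visit log is straightforward. This is a minimal sketch, assuming the plug-in reports (url, timestamp) records and reading "domain+toplevel" as the host plus the first path segment; that interpretation, and the example URLs, are my assumptions, not definitions from the patent.

```python
from collections import Counter
from urllib.parse import urlsplit

def top_level_key(url):
    """Host plus first path segment, e.g. 'www.example.com/docs'.
    This reading of 'domain+toplevel' is an assumption."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    first = segments[0] if segments else ""
    return f"{parts.netloc.lower()}/{first}"

def count_views(visits):
    """visits: iterable of (url, timestamp) records from opted-in users."""
    page_counts = Counter()
    domain_counts = Counter()
    toplevel_counts = Counter()
    for url, _timestamp in visits:
        page_counts[url] += 1
        domain_counts[urlsplit(url).netloc.lower()] += 1
        toplevel_counts[top_level_key(url)] += 1
    return page_counts, domain_counts, toplevel_counts
```

For example, three visits to pages under www.example.com/docs and one under www.example.com/blog give a domain count of 4 and top-level counts of 3 and 1.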

These statistics can then improve the query-independent ranking of web pages (their static rank). For instance, QWE may take a weighted sum of the logs of these counts, where the weights for each count are learned using machine learning.
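A weighted sum of log counts can be sketched in a few lines. The weights here are hypothetical placeholders (the patent says they are learned with machine learning), and log(1 + count) is used instead of a bare log so that a count of zero is defined; that adjustment is an assumption on my part.

```python
import math

# Hypothetical weights; in the patent these would be learned, not hand-set.
WEIGHTS = {"page": 0.6, "domain": 0.3, "toplevel": 0.1}

def popularity_static_rank(counts, weights=WEIGHTS):
    """counts: dict such as {'page': 12, 'domain': 340, 'toplevel': 55}.
    Returns sum over features of weight * log(1 + count)."""
    return sum(w * math.log1p(counts.get(name, 0)) for name, w in weights.items())

# Order two hypothetical pages by their query-independent score.
pages = {
    "a": {"page": 100, "domain": 1000, "toplevel": 200},
    "b": {"page": 3, "domain": 1000, "toplevel": 10},
}
ranked = sorted(pages, key=lambda u: popularity_static_rank(pages[u]), reverse=True)
```

Taking logs compresses heavy-tailed view counts, so a page with a million views does not outweigh one with ten thousand by a factor of a hundred.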

Browser Tracking Counts Accounted for Half of the Performance Gain

The patent document reports results from experiments on this process: the popularity-based system improved the accuracy of search results over the conventional ranking system, and 50% of the observed performance gain was attributed to the browser tracking count.

[0045] The resulting ranking helps the search engine provide more relevant results to the people searching the web since it is more likely to return pages many people have visited. According to actual experiments performed, the accuracy of search results for a given search query using the popularity-based system described hereinabove increased over the conventional rank system.

It was determined that 50% of the performance gain observed by the testers is due to the browser tracking count. Thus, popularity-based rankings can improve the quality of search results, and such rankings also help the search engine order its index to retrieve good pages more efficiently.
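One way a static rank can help "order its index" is to keep each posting list sorted by static rank, so a query can stop scanning once it has enough matches. This is a minimal sketch of that idea under my own assumptions (tiny in-memory index, conjunctive queries, hypothetical document IDs); the patent does not describe this data structure.

```python
def build_index(docs, static_rank):
    """docs: {doc_id: set_of_terms}; static_rank: {doc_id: score}.
    Each posting list is kept sorted by descending static rank."""
    index = {}
    for doc_id, terms in docs.items():
        for term in terms:
            index.setdefault(term, []).append(doc_id)
    for postings in index.values():
        postings.sort(key=lambda d: static_rank[d], reverse=True)
    return index

def top_k(index, query_terms, k):
    """Up to k docs containing every query term, highest static rank first."""
    lists = [index.get(t, []) for t in query_terms]
    if not lists or any(not postings for postings in lists):
        return []
    lists.sort(key=len)  # walk the shortest list
    others = [set(postings) for postings in lists[1:]]
    results = []
    for doc_id in lists[0]:
        if all(doc_id in s for s in others):
            results.append(doc_id)
            if len(results) == k:
                break  # early exit: remaining docs have lower static rank
    return results
```

The sets keep the sketch short; a production engine would more likely assign document IDs in static-rank order so ordinary sorted-list intersection gives the same early-exit behavior.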

Finally, they can help the search engine determine which pages to crawl and re-crawl since it is more useful to re-crawl pages that are highly relevant and good rather than re-crawling poor or fraudulent pages.
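Crawl prioritization by popularity can be sketched as picking the most popular URLs within a crawl budget while skipping pages already judged poor or fraudulent. The popularity scores, the flagged set, and the budget are hypothetical inputs for illustration.

```python
import heapq

def crawl_order(popularity, flagged, budget):
    """popularity: {url: score}; flagged: set of URLs judged poor or fraudulent.
    Returns up to `budget` URLs to (re-)crawl, most popular first."""
    candidates = ((url, score) for url, score in popularity.items() if url not in flagged)
    top = heapq.nlargest(budget, candidates, key=lambda item: item[1])
    return [url for url, _score in top]
```

heapq.nlargest avoids sorting the whole URL list when the budget is much smaller than the number of known pages.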


The User Popularity Data Patent application is:

Using popularity data for ranking
Inventors: Matthew R. Richardson, Eric D. Brill, Robert J. Ragno, and Robert L. Rounthwaite
Assigned to Microsoft
Filed: November 3, 2005
US Patent Application 20070100824
Published May 3, 2007

Abstract

A unique ranking system and method facilitates improving the ranking and ordering of objects to enhance further the quality, accuracy, and delivery of search results in response to a search query. The system and method involve monitoring and tracking an object regarding the number of times it’s been accessed and optionally by whom, when, for how long, and an access rate. The user’s interaction with the object can be tracked as well.
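Tracking an object's accesses "by whom, when, for how long, and an access rate" maps naturally onto a per-object log of access records. This is a minimal sketch with hypothetical class and method names; the record layout (user, timestamp in seconds, duration in seconds) is my assumption.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class AccessLog:
    """Per-object access records: (user_id, timestamp_seconds, duration_seconds)."""
    records: list = field(default_factory=list)

    def record(self, user_id, timestamp, duration):
        self.records.append((user_id, timestamp, duration))

    def count(self):
        return len(self.records)

    def distinct_users(self):
        return len({user for user, _, _ in self.records})

    def total_duration(self):
        return sum(d for _, _, d in self.records)

    def access_rate(self, now, window):
        """Accesses per second over the last `window` seconds."""
        recent = sum(1 for _, t, _ in self.records if now - window <= t <= now)
        return recent / window

class Tracker:
    """Keeps one AccessLog per object (page, document, etc.)."""
    def __init__(self):
        self.logs = defaultdict(AccessLog)

    def access(self, object_id, user_id, timestamp, duration):
        self.logs[object_id].record(user_id, timestamp, duration)
```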

By tracking the objects, a popularity measure can be determined. Popularity-based rankings can be computed based on the popularity measure or some function thereof. The popularity measure can be affected by the access time, who accessed the object, and the access duration of the user’s interaction with the object. A search component can utilize the popularity-based rankings to improve the quality and retrieval of search results.
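To show how access time, who accessed an object, and access duration could all feed one popularity number, here is a hedged sketch: recent visits count more (exponential decay with a half-life), longer visits count more (capped), and each user's total contribution is capped so one person cannot inflate a page. Every constant and the overall formula are illustrative assumptions, not the patent's method.

```python
def popularity_measure(records, now, half_life=7 * 24 * 3600,
                       max_duration=300, per_user_cap=3):
    """records: iterable of (user_id, timestamp, duration), all in seconds.
    Each access contributes recency_weight * dwell_weight; each user's
    total is capped at per_user_cap. Constants are hypothetical."""
    per_user = {}
    for user, ts, duration in records:
        age = max(0.0, now - ts)
        recency = 0.5 ** (age / half_life)                   # halves every half_life seconds
        dwell = min(duration, max_duration) / max_duration   # scaled to 0..1
        per_user[user] = per_user.get(user, 0.0) + recency * dwell
    return sum(min(v, per_user_cap) for v in per_user.values())
```

A visit exactly one half-life old with a full-length dwell contributes 0.5, and five fresh full-length visits from the same user are capped at 3.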


8 thoughts on “User Popularity Data for Ranking Search Results”

  1. With Bill Gates’s announcement, this seems a logical step for Microsoft. What else is left for Bill Gates to do except go after Google?

  2. Hi Navneet,

    It’s not just going after Google, though.

    I think his statement reflects an awareness that platforms are moving online, and that paying attention to feedback elicited through interactions with users can help provide the information and services they want to see and use. That seems to be the next logical evolution for a company like Microsoft.

  3. I’m looking forward to their implementations of personalized search, too, Steven.

    I think that the concern each search engine shows for how our personal data is used and protected is going to be a key factor in how successful personalized search will be from different sources.

  4. Thanks Bill. I hadn’t realized Microsoft was as far along in tracking user data, though it makes sense they would be. I agree this is the next step and natural evolution in search. How long till Live Search and Yahoo! for that matter offer their own versions of personalized search?

    I appreciate all the mentions of ‘opt in’ too, though I wonder how long that will remain the case. Assuming the search engines see that user data produces better results and leads to more people using their engine, it’s likely they’ll be looking for ways to collect more data regardless of how we feel about it.

  5. I have a dream.

    I dream that one day Google will have some stiff competition and the playing field will level once again. Microsoft has the resources and the talent to bring Google to its knees.

    Now don’t get me wrong. I am not a Google hater. Google has made me a pretty penny. However, where there is little or no competition, there is eventually the big head that comes with power. The paid link debate is the perfect example of “big brother” trying to manipulate the players. It is OUR websites. Without our websites, what would Google have to show?

  6. Pingback: This Week In SEO - 5/11/07 - TheVanBlog
  7. I only had a chance to read this article last week.
    It may be too late to respond, but I like the idea and approach.
    I am working on this topic myself.
    I have found that there are many ways and methods to approach this kind of problem.
    I have developed one method myself, which will be published soon.
    But in any case, I admire the inventors (Matthew R. Richardson, Eric D. Brill, Robert J. Ragno, and Robert L. Rounthwaite)
    and Microsoft for their breakthroughs.

  8. Hi Pyke Tin,

    There are some very interesting ideas circulating about how to improve the quality of search results. It looks like you have been working on some very interesting topics. I would really like to read your newest paper once it is published.

    Thank you for stopping by, and leaving a comment.

    Bill
