Using Page Quality to Overcome Bias in Ranking Newer Sites

PageRank is a measure of the popularity of a page, yet according to some researchers it has a flaw: newer pages haven’t had the chance to be viewed, and linked to, as often as older pages that have already accumulated links.

How would a problem like this be overcome? One way might be to estimate how likely it is that someone who viewed the newer page would link to it, and use that “future” measure of PageRank to return results to searchers.

In a paper by Junghoo Cho, Sourashis Roy, and Robert E. Adams, Page Quality: In Search of an Unbiased Web Ranking (pdf), the researchers try to address this problem. Here’s how they describe it:

In a number of recent studies researchers have found that because search engines repeatedly return currently popular pages at the top of search results, popular pages tend to get even more popular, while unpopular pages get ignored by an average user. This “rich-get-richer” phenomenon is particularly problematic for new and high-quality pages because they may never get a chance to get users’ attention, decreasing the overall quality of search results in the long run.

The solution to this problem, according to them, would be to define a new ranking function, called page quality, which could overcome this popularity bias. A patent application was published this morning, with Junghoo Cho named as the inventor, which explores a definition for page quality, and describes how such a system might work.

Unbiased page ranking
Invented by Junghoo Cho
US Patent Application 20060294124
Published December 28, 2006
Filed January 12, 2005

Abstract

The pages in a network of linked pages are ranked based on the quality of the pages.

Page quality is obtained by determining the change over time of the link structure of the page, which is obtained by determining the link structure of the page at different periods of time by taking multiple snapshots of the link structure of the network.

The link structures are approximated by their PageRanks, page quality being determined by the formula:

$$Q(p) \approx D \cdot \frac{\Delta PR(p)}{PR(p)} + PR(p)$$

where Q(p) is the quality of the page, PR(p) is the current PageRank of the page, ΔPR(p) is the change over time in the PageRank of the page, and D is a constant that determines the relative weight of the terms ΔPR(p)/PR(p) and PR(p).
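To make the abstract's formula concrete, here is a minimal Python sketch that estimates quality from two PageRank snapshots: D times the relative change in PageRank, plus the current PageRank. The function name, the page names, the snapshot values, and the choice of D = 1.0 are all assumptions made for illustration; none of them come from the patent application or the paper.

```python
def estimate_quality(pr_old, pr_new, d=1.0):
    """Approximate Q(p) = D * (change in PR(p)) / PR(p) + PR(p).

    pr_old and pr_new are PageRank values from an earlier and a current
    snapshot. d stands in for the constant D; 1.0 is an assumed value.
    """
    if pr_new <= 0:
        raise ValueError("current PageRank must be positive")
    delta = pr_new - pr_old  # change in PageRank between the snapshots
    return d * delta / pr_new + pr_new


# Hypothetical snapshots: {page: (earlier PageRank, current PageRank)}
snapshots = {
    "established.example": (0.50, 0.50),  # popular, but no longer growing
    "newcomer.example": (0.10, 0.20),     # less popular, growing quickly
}

quality = {
    page: estimate_quality(old, new) for page, (old, new) in snapshots.items()
}

# Order pages by current PageRank and by the quality estimate.
by_pagerank = sorted(snapshots, key=lambda page: snapshots[page][1], reverse=True)
by_quality = sorted(quality, key=quality.get, reverse=True)
```

With these made-up numbers the established page leads when sorted by current PageRank, while the fast-growing newcomer leads when sorted by the quality estimate. How far that reordering goes depends entirely on the value chosen for D.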

There is a lot of overlap between the patent application and the paper. Rather than summarize the process defined in the patent, I’m going to recommend that, if you are interested in finding out more about how this works, you take a look at the paper before tackling the patent application.

There is some interesting material in these documents, such as the observation that a web site passes through three different stages after it is created: the infant stage, the expansion stage, and the maturity stage.

The infant stage is defined by a period of time in which the page is barely noticed by Web users and has almost no popularity at all. The second expansion stage is where the popularity of a page suddenly increases. The maturity stage is where the popularity of the page appears to stabilize at some certain value.
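As a rough illustration of these three stages, the sketch below labels a page from its two most recent popularity measurements. The function name and both thresholds (`floor` and `growth`) are hypothetical values picked for the example; the documents describe the stages qualitatively and this is not their method.

```python
def classify_stage(history, floor=0.01, growth=0.10):
    """Label a page as 'infant', 'expansion', or 'maturity'.

    history is a list of popularity measurements, oldest first.
    floor and growth are hypothetical thresholds chosen for illustration.
    """
    if len(history) < 2:
        raise ValueError("need at least two measurements")
    previous, current = history[-2], history[-1]
    if current < floor:
        return "infant"  # barely noticed, almost no popularity
    relative_change = (current - previous) / current
    if relative_change > growth:
        return "expansion"  # popularity is rising quickly
    return "maturity"  # popularity has roughly stabilized


stages = [
    classify_stage([0.001, 0.002]),
    classify_stage([0.02, 0.08]),
    classify_stage([0.30, 0.31]),
]
```

This simple version also labels a page whose popularity is falling as mature, since the three-stage description above does not cover decline.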

Measuring and comparing PageRank changes along the path to maturity might provide a “future” PageRank, which would show how popular a page might be if people actually knew about it and had the choice of whether or not to link to it. Ranking pages based upon that future PageRank instead of the present one may help overcome the bias search engines have towards popular pages at the expense of new ones.

The conclusion of the patent application, which describes how search engines’ ranking of web pages has differed over the past decade or so, was interesting, too:

At a very high level, we may consider the quality estimator as a third-generation ranking metric. The first-generation ranking metric (before PageRank) judged the relevance and quality of a page mainly based on the content of a page without much consideration of Web link structure. Then researchers [12, 16] proposed a second-generation ranking metrics that exploited the link structure of the Web. The present invention further improves the ranking metrics by considering not just the current link structure, but also the evolution and change in the link structure. Since we are taking one more information into account when we judge page quality, it is reasonable to expect that the ranking metric performs better than existing ones.

Conclusion

Is this process presently being used? It’s difficult to tell, but I noticed that the UCLA Office of Intellectual Property and Industry Sponsored Research included a page about the algorithm being developed, which is no longer available.

Regardless, it’s interesting to see an example of how a search engine might use measures of time, and the frequency of changes in the rankings of pages, to influence where those pages show up in search results.


3 thoughts on “Using Page Quality to Overcome Bias in Ranking Newer Sites”

  1. Thus, at the core of this problem lies the question of page quality, but what is meant by the quality of a page? Without a good definition of page quality, it is difficult to measure how much bias PageRank induces in its ranking and how well other ranking algorithms capture the quality of pages.

    Very interesting patent Bill, and I’ve only read bits and pieces of it, but I think they’re climbing up the wrong tree. The ultimate solution is to judge a page’s quality based on what’s written on the page. Links will always be a popularity metric that may or may not always reflect page quality.

    If Google has or is thinking of launching an AI team, I won’t hesitate to send them a job application :)

  2. I think that you may be right. There’s a lot of value to looking at what is on a page when considering page quality.

    The premise of page quality being defined as something people would link to if they were aware of it isn’t necessarily a bad idea, and I think it may improve upon the basic concepts of PageRank.

    But there are other approaches, and I’ll likely be writing about one of those later today.

  3. I am glad to hear of people pursuing a better and fairer way of ranking webpages. I based my website design on maximizing quality through content and professionally organized business advertising galleries. I published it in 2010, but am struggling to grow my business website. I hope one day that the quality of a website takes precedence, so quality websites can achieve more visibility, and hopefully more sales. I worked hard to design and publish this website, and it has been through several revisions to reach the end result you see today. I’m a bit frustrated, and even use Blast4traffic to try to generate more hits, but with little result. Thank you for letting me bend your ears. Manager@logolinkads.com
