Calculating PageRank takes a lot of time and computing power. If there was a way to speed up the process, it might make a significant difference to the amount of time that it takes to assign PageRank values to pages.
The original patents involving PageRank, Method for node ranking in a linked database (updated) and Method for scoring documents in a linked database, have been joined by a newly granted patent that explores a faster method of calculating ranks for pages.
On the Web, most links between pages are between pages in the same domain. That’s an observation that the process in this new patent uses to its advantage.
Methods for ranking nodes in large directed graphs
Invented by Sepandar D. Kamvar, Taher H. Haveliwala, Glen Jeh, and Gene Golub
Assigned to Board of Trustees of the LeLand Stanford Junior University
US Patent 7,216,123
Granted May 8, 2007
Filed: August 22, 2003
Techniques for assigning ranks to nodes in a large linked database, such as world wide web or any other hypermedia database, partition the nodes so that the link matrix has a predominantly block-diagonal form.
Within each block, a local rank is computed for nodes in the block, possibly by different computer in a distributed computing environment. A block rank is then estimated for each block as a whole, and may optionally include block-level weights to implement customized ranking.
The local ranks and block ranks are then combined to form a global rank, which may be used to rank the nodes. Alternatively, a global rank vector for the database may be used as an initial vector in an iterative link-based ranking scheme to obtain more accurate global ranks for the nodes. The global rank vector may be divided to provide local rank vectors for use in subsequent applications of the method.
The process described isn’t new, with the patent having been filed in 2003, and there seems to be a good deal of overlap between the process describe in the patent and a paper from researchers from Stanford published in 2003 – Exploiting the Block Structure of the Web for Computing PageRank, by Sepandar D. Kamvar, Taher H. Haveliwala, Christopher D. Manning, and Gene H. Golub.
Here’s the abstract from the paper:
The web link graph has a nested block structure: the vast majority of hyperlinks link pages on a host to other pages on the same host, and many of those that do not link pages within the same domain. We show how to exploit this structure to speed up the computation of PageRank by a 3-stage algorithm whereby
(1) the local PageRanks of pages for each host are computed independently using the link structure of that host,
(2) these local PageRanks are then weighted by the â€œimportanceâ€ of the corresponding host, and
(3) the standard PageRank algorithm is then run using as its starting vector the weighted concatenation of the local PageRanks.
Empirically, this algorithm speeds up the computation of PageRank by a factor of 2 in realistic scenarios. Further, we develop a variant of this algorithm that efficiently computes many different â€œpersonalizedâ€ PageRanks, and a variant that efficiently recomputes PageRank after node updates.
There have been a number of other papers from Stanford which explore ways to increase the speed with which PageRank can be calculated.
Perhaps the biggest takeaway from the issuance of this patent is that the original PageRank, described in the late 1990s in The Anatomy of a Large-Scale Hypertextual Web Search Engine and The PageRank Citation Ranking: Bringing Order to the Web, has changed and evolved over time.
The processes that the major search engines follow to calculate a query independent score for pages (an indication of how important a page may be rather than how relevant it may be for a specific query) may not have followed along the same path as the academic research on the subject. But if you’re interested in how search engines rank pages, reading through the paper and patent (the paper is easier reading) may provide some interesting insights into possibilities along the evolutionary of processes involved in ranking Web pages.