Recalculating PageRank

A Google patent titled Producing a ranking for pages using distances in a Web-link graph was granted on October 20, 2015. It presents some changes to Google’s original PageRank.

I wrote about the very first PageRank patent in my post The First PageRank Patent and the Newest, where I posted a link to the original provisional copy of Lawrence Page’s Improved Text Searching in Hypertext Systems (pdf, 1.7 MB).

Under this new patent, Google adds a diversified set of trusted pages to act as seed sites. When calculating rankings for pages, Google would calculate a distance from the seed pages to the pages being ranked. The use of a trusted set of seed sites may sound a little like the TrustRank approach developed by Stanford and Yahoo a few years ago, as described in Combating Web Spam with TrustRank (pdf). I don’t know what role, if any, the Yahoo paper had in the development of the approach in this patent, but there seem to be some similarities.
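As a rough illustration of measuring distance from a set of seeds, here is a minimal Python sketch under a simplifying assumption: every link counts as one step, so a page’s distance is just the fewest clicks from the nearest seed (the patent itself gives links varying lengths). The graph and page names are hypothetical. Because each page’s distance is the minimum over all seeds, adding seeds, such as seeds on other topics, can only shorten paths or leave them unchanged, never lengthen them.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to.
EXAMPLE_GRAPH = {
    "news_seed": ["news1"],
    "news1": ["news2"],
    "news2": ["hub"],
    "hub": ["garden1"],
    "garden1": ["garden2"],
    "garden_seed": ["garden2"],
    "garden2": [],
}


def click_distances(graph, seeds):
    """Multi-source breadth-first search: fewest links from any seed to each page.

    Pages that no seed can reach are left out of the result.
    """
    dist = {seed: 0 for seed in seeds}
    queue = deque(dist)  # start from every (deduplicated) seed at once
    while queue:
        page = queue.popleft()
        for target in graph.get(page, []):
            if target not in dist:  # in BFS, the first visit is the shortest path
                dist[target] = dist[page] + 1
                queue.append(target)
    return dist


narrow = click_distances(EXAMPLE_GRAPH, ["news_seed"])
diverse = click_distances(EXAMPLE_GRAPH, ["news_seed", "garden_seed"])
print(narrow["garden2"], diverse["garden2"])  # 5 1
```

With only the news seed, the gardening page sits five clicks away; adding a gardening seed brings it to one click, which is the intuition behind choosing a diverse set of seeds.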

Links from seed pages
Rankings would be based in part upon the link distances of pages from seed pages.

The new patent is:

Producing a ranking for pages using distances in a Web-link graph
Inventor: Nissan Hajaj
Application Date: October 12, 2006
Publication Number: 9,165,040
Publication Date: October 20, 2015
Granted: October 20, 2015
Abstract:

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for producing a ranking for pages on the web. In one aspect, a system receives a set of pages to be ranked, wherein the set of pages are interconnected with links.

The system also receives a set of seed pages which include outgoing links to the set of pages. The system then assigns lengths to the links based on properties of the links and properties of the pages attached to the links. The system next computes shortest distances from the set of seed pages to each page in the set of pages based on the lengths of the links between the pages.

Next, the system determines a ranking score for each page in the set of pages based on the computed shortest distances. The system then produces a ranking for the set of pages based on the ranking scores for the set of pages.
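The abstract’s steps (assign lengths to links, compute shortest distances from the seeds, turn distances into scores, sort) map naturally onto a multi-source shortest-path search. Below is a minimal Python sketch using Dijkstra’s algorithm, which fits because the link lengths here are non-negative. The link-length formula and the distance-to-score function are hypothetical placeholders: the abstract says only that lengths depend on properties of the links and pages, and that scores are based on the computed shortest distances.

```python
import heapq
import math


def link_length(source_outdegree):
    """HYPOTHETICAL length for a link, based on a property of its source page.

    The patent only says lengths depend on properties of the links and pages;
    here a link is assumed to get longer as its source page links out to more pages.
    """
    return 1.0 + math.log(max(source_outdegree, 1))


def shortest_distances(graph, seeds):
    """Multi-source Dijkstra: shortest total link length from the nearest seed."""
    dist = {seed: 0.0 for seed in seeds}
    heap = [(0.0, seed) for seed in dist]
    heapq.heapify(heap)
    while heap:
        here, page = heapq.heappop(heap)
        if here > dist[page]:
            continue  # stale entry; a shorter path was already found
        outlinks = graph.get(page, [])
        step = link_length(len(outlinks))
        for target in outlinks:
            candidate = here + step
            if candidate < dist.get(target, math.inf):
                dist[target] = candidate
                heapq.heappush(heap, (candidate, target))
    return dist


def rank_pages(graph, seeds, pages):
    """Score pages by closeness to the seeds and sort best-first."""
    dist = shortest_distances(graph, seeds)
    # HYPOTHETICAL score: decays with distance; unreachable pages score 0.0.
    scores = {p: math.exp(-dist.get(p, math.inf)) for p in pages}
    ranking = sorted(pages, key=scores.get, reverse=True)
    return ranking, scores


graph = {"seed": ["a", "b"], "a": ["c"], "b": ["c", "d", "e"], "spam": ["a"]}
ranking, scores = rank_pages(graph, ["seed"], ["a", "b", "c", "d", "e", "spam"])
print(ranking)  # ['a', 'b', 'c', 'd', 'e', 'spam']
```

In this toy graph, “spam” links into the graph but no seed links to it, so it gets a score of zero, while “c” ranks above “d” and “e” because it is reached over a shorter path through a page with few outgoing links.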

The “trusted pages” in this process appear to follow the same assumption as the seed pages in the Yahoo TrustRank approach: that “good pages seldom point to bad ones.”

The inventor of this patent has been at Google for a while. Back in 2008, he was one of the co-authors of a Google Blog post that told us that Google had achieved a milestone after indexing over a trillion pages, in the post We knew the web was big… According to his LinkedIn Profile, Nissan Hajaj has been a Sr. Staff Engineer at Google since August of 2004, as an “Algorithms Developer and multi-disciplined team leader/member for innovative projects in a variety fields of engineering.”

If you want to drill down into the details of this new ranking algorithm, the best place to start may be with the claims from the patent.

It’s difficult to say whether Google has adopted this new ranking approach and made it live.

73 thoughts on “Recalculating PageRank”

  1. Another great find here Bill. Thanks for sharing. It would be interesting if Google made it public that PageRank would make a comeback, but I fear that it could raise the PageRank spamming issue again. Anyway, it is all done to improve the web, so it will be interesting to see what is unlocked. Thanks for sharing!

  2. Hi Dave,

    This approach seems geared towards helping more trusted pages rank better. I don’t know how well it might work, but it’s probably worth trying.

  3. Definitely very interesting. This implies that Google still uses pagerank but does not show it externally to the public. I still get the spam emails of paid linking and stuff to increase pagerank but it has decreased quite a bit after Google stopped updating PR on its toolbar.

  4. Nice article Bill.

    I knew Google would have some kind of rebirth of PageRank, in one form or another, under one name or another. It looks like they are trying to build on some things they have learned over the years.

    I do wonder if Google will make any kind of announcement on this. As Dave said above, it would be interesting to see if Google goes public on this as well as how it’s taken and handled.

  5. Hi Dave,

    PageRank never went away. Google still uses PageRank; it just no longer updates the public version of PageRank, so we can never see what our PageRank actually is. We can only infer it from products such as Moz DA score.

    PageRank is still a key ranking factor.

    Thanks

    Andy

  6. This sounds very familiar….oh yeah, here is the definition of Trust Flow from Majestic.com:
    “Trust Flow, a trademark of Majestic, is a score based on quality, on a scale between 0-100. Majestic collated many trusted seed sites based on a manual review of the web. This process forms the foundation of Majestic Trust Flow. Sites closely linked to a trusted seed site can see higher scores, whereas sites that may have some questionable links would see a much lower score”

    Interesting indeed…..

  7. Hi Raphael,

    The patent doesn’t define those much beyond statements such as “seeds are specially selected high-quality pages which provide good web connectivity to other non-seed pages.” I think it’s on purpose that they don’t.

  8. Hi Stephen,

    The concept or idea of seed sites has been out on the Web for a while, going back at least as far as the Yahoo/Stanford paper that I mentioned.

  9. Hi Andrew,

    Interesting, yes. The Hilltop approach targets pages that are said to be from experts, so the focus is a little different from the one here, which is aimed at pages that are chosen because they are “trusted.”

    The patent also points to finding a diverse set of trusted pages, so that a wide range of topics is covered, but doesn’t say that this is being done to find expert pages in topics the way Hilltop would:

    One approach for choosing seeds involves selecting a diverse set of trusted seeds. Choosing a more diverse set of seeds can shorten the paths from the seeds to a given page. Hence, it would be desirable to have a largest possible set of seeds that include as many different types of seeds as possible. However, because selecting the seeds involves a human manually identifying these high-quality pages, the total number of the seeds is typically limited. Moreover, having too many seeds can make the selected seeds vulnerable to manipulation. Consequently, the actual number of the selected set of seeds is limited.

  10. Hi Claude,

    I don’t think Google ever came out and told us that they would stop using PageRank to rank pages. They did tell us that they would no longer be updating the PageRank toolbar, which makes PageRank scores invisible to us.

  11. Hi Andy,

    Thank you. That’s how I see Google using PageRank these days, too. The Moz DA score is supposed to be a measure of the strength of an entire domain, rather than an individual page. Moz uses MozRank to indicate the strength of individual pages.

  12. Hi Ulterio,

    Thank you.

    This does seem like a mix of things that Google may have learned over the years. Hard to say if Google will make an announcement about this; I guess we wait and see.

  13. Yes, seed sites have been around for a while; it seems like a chore to figure out who used this methodology first. I do find it somewhat affirming to see similar lines of thought between Google and other services like Majestic. To me, it seems to lend added credibility to the third-party services as having somewhat valid information, at least in some ways. Keep the great content coming, friend!

  14. Hi Usman,

    I would guess it’s possible that Google still uses PageRank. It’s also likely that Google is taking steps with efforts like this patent to avoid link manipulation and spamming via things like link buying.

  15. Hi Stephen,

    Thank you. It is good seeing the approach used in a number of places, in different ways. I like the chance to see what kinds of things are taking place behind the curtains, and this seemed like a good idea when Yahoo and Stanford came out with TrustRank. Google did come out with a different type of Trust Rank a few years ago, which I wrote about in http://www.seobythesea.com/2009/10/google-trust-rank-patent-granted/, but it’s good seeing that Google also found a way to use a trusted seed set of sites like this.

  16. Hi Bill,

    It looks like the patent application was made nine years ago. It’s not common for it to take that long for a patent to be granted, is it?

    In any case, I would guess that if Google decided to make use of this in the algo, it’s probably been in place for some time now.

  17. Hi Bob,

    I took a look at the USPTO PAIR database and the documents that were filed by the applicants and the patent examiner during the long period that the patent was being reviewed. The examiner did bring the Yahoo TrustRank patent up against it, but appears to have let Google’s patent proceed because it implemented things differently. Some patents take a long time to be granted. Google may have put this process in place, or they could have held off on doing so until they felt safe in moving forward.

  18. Looks like it has more intense criteria for PR. I’m excited to see the update, so I can check my new site’s status in terms of PR. ^_^

    Thanks for the info Sir Bill.

  19. Hi Bryan,

    I’m not sure that you are going to be able to see if this has impacted your site, nor check upon your site’s status in terms of PR, since Google is no longer updating the PageRank Toolbar.

  20. It seems that PageRank is going to evolve to include greater trust signals. At long last. It really depends on how they choose the seed sites.

  21. One thing is puzzling me, does the patent refer to a physical distance, or a topical one? Or both? – “The system next computes shortest distances from the set of seed pages to each page in the set of pages based on the lengths of the links between the pages.”
    I guess my question regarding distance would be – will I struggle running a site about the Caribbean if I was based in Europe? – For example.
    Pretend for a minute you were Google and pick a seed site. (Please don’t let it be Wikipedia though).

  22. Hi Neil,

    I believe that “distance” is a measure of the number of clicks from a seed page to the page in question.

  23. PageRank never went away. Google still uses PageRank; it just no longer updates the public version of PageRank, so we can never see what our PageRank actually is. We can only infer it from products such as Moz DA score.

    PageRank is still a key ranking factor.

    Thanks

  24. Hi Bill,

    I came back to your site to learn more about “recalculating PageRank.” Now I understand your point!
    Thanks

  25. Yes, I believe it is a natural tendency. With the evolution of the web, PageRank itself becomes increasingly obsolete in the face of countless factors that can show whether a page or site is relevant.
    Backlinks are still king, and will be for at least the next year. But I think they will gradually lose power relative to other factors.

  26. Classic PageRank feels more and more irrelevant, as I notice how PR1 sites rank higher than PR5 websites. Google made a good point releasing this important update.

  27. Hi Jack,

    Classic PageRank has always given off that feel, and PageRank 1 websites have always had the ability to outrank much less relevant PageRank 5 websites, if they’ve been relevant enough. I have a feeling that we might not see PageRank lasting too much longer.

  28. Hi Bill,

    Great reading, just one question: I thought Google was finished with PageRank?! Am I wrong?

  29. Hello Bill
    Thanks for sharing valuable information. Would you please explain the phrase “good pages seldom point to bad ones”?
    Thanks once again…

  30. Hi Sunny,

    That phrase describes the assumption that the Yahoo TrustRank paper is based upon: that trustworthy web pages tend not to link to bad websites.

  31. Thanks for sharing valuable information. I found the writing and blog links very informative. Certainly very interesting. Google still uses PageRank, but it no longer shows it to the public. Thanks once again…

  32. Hi Rizwan,

    You’re welcome. I’m happy to hear that you liked my site. It’s difficult to tell how much Google is using PageRank these days.

  33. Interesting read, Bill. The idea of trusted ‘seed’ sites makes sense. A lot of the patents I read around crawling, etc., also refer to ‘page importance’ and say that this may include PageRank, so it seems to still be in the mix somewhere. A selection of very trusted sites that rarely link to bad sites can surely provide some level of trust indication, as you say.

  34. Hi Dawn,

    It was a surprise running into this patent, because the Yahoo TrustRank approach had been out there for a while, and it seemed like Google was purposefully staying away from it. I checked the USPTO PAIR database, where you can look at documents that were filed in the prosecution of a patent. Those show that Google struggled with the patent examiner over the Yahoo TrustRank patent, but the examiner ended up deciding that Google’s use of seed sites was different enough from Yahoo’s to grant this patent.

    A move that seems to point towards a different approach to crawling web pages is Yahoo’s development and open sourcing of a method that looks for data in markup on pages, rather than at page importance. That is described on this page:

    http://venturebeat.com/2015/12/14/yahoo-open-sources-anthelion-web-crawler-for-parsing-structured-data-on-html-pages/

  35. Hi Michael,

    I think Google would prefer to see people spending less time doing things like building private blog networks, and spending more time creating quality content.

  36. Recalculating PageRank methods are very important to increase website traffic. This article explains how Google calculates PR and how to optimize a website to achieve a high PR.

  37. Thanks, Bill, for sharing this.
    But I want to know how long Google takes to update PageRank, or whether it is updated automatically after some fixed time.

  38. Hi Edward

    Around 12 years ago, Google went from updating PageRank monthly to updating PageRank much more quickly, providing an “everflux” of updates to PageRank. It’s impossible to tell how frequently Google might be updating PageRank nowadays.

  39. Hi Bill,
    This is a nice article, and it seems to have spawned some good discussion. I agree with you that PageRank never really went away, just the public reference to it.

    Makes sense, though, if you think about it. How do you determine if information is of any quality? Mathematically, I mean. It would need to be highly dependent on “votes” of some sort (links from sites), with links from more respected sites getting more weight and passing that weight on to the target site. In essence, saying “hey, this is good info here.” The farther a page is from the highly weighted, respected sites, the less weight is carried.

    I am sure it has a partial agenda to weed out weak PBNs too, because Google doesn’t like people trying to game the system. But I also think this is a secondary objective (or even just a byproduct), and that they are trying to determine value of the page’s content in order to bring the best info to the searcher. (I’m not implying that you said otherwise, just adding some thoughts here.)

    Thanks!

  40. Hi ChrisV

    Thank you. So much of what Google has achieved since their start in the late 90s has been because of how they ranked search results based upon a link citation analysis. So, when Google publicly makes a change to how they might be doing such an analysis, it’s worth spending time considering what they may have changed. I looked at the PAIR database at the USPTO to see if they discussed Yahoo’s TrustRank approach, and the patent examiner did raise the patent behind that, but said that Google’s approach was different enough that it shouldn’t block Google from using the method described in this patent. Placing value on highly trusted sites linking to other high-quality sites seems like a valuable assumption. PBNs are probably less likely to be linked to from the seed sites mentioned in this patent. It was interesting to see Google pursue this approach and amend PageRank in a method like this.

  41. I think we might have seen these changes take place over the last couple of weeks. Some major shifts are taking place from where I’m sitting, and I think it will be a few weeks before the dust settles. Thanks for the post, Bill.

  42. Hi Alan

    It’s difficult to tell what Google has changed when we see large fluctuations in search results like we did recently, especially when all Google tells us is that the changes were made to its core ranking algorithms. This patent does signal what could be tremendous changes, but the patent itself was filed a number of years ago, and if Google implemented this change, we may not have gotten any feedback from them when it happened. Being vigilant for visible signs that something specific has changed is worth doing. 🙂

  43. Hi Bill
    I was looking at the old Moz post you probably know, “All Links are Not Created Equal” by Rand Fishkin.
    Point #4 was: Links from Sites Closer to a Trusted Seed Set Pass More Value.
    I wonder, is this some evidence that Google used (or uses) distance to seed pages to measure sites
    in its algorithms, so what’s new or changed this time?

    Thanks, James.

  44. Hi Bill,

    Many of the crawl scheduling patents refer to ‘page importance’ (which may include PageRank), implying that there are many other factors involved. I also notice Googlers talking about ‘important pages’ and that they try to crawl these more often when they recognise them.

  45. Hi Dawn

    Google seems to look at different importance metrics to decide when to crawl a page, and those importance metrics may include things such as PageRank and how frequently a page tends to be updated. Another importance metric might be closeness to the root directory on a site: chances are that Google would much prefer to crawl and index 1,000,000 home pages of 1,000,000 sites than 1,000,000 pages of one site, since it would have a much more diverse search index that way. In the 1990s, Lawrence Page co-authored a paper that I found referenced on the Stanford website a few years ago, in a list of research used to provide results to the Google search engine. The paper includes 5 different importance metrics, including PageRank, and should be read by anyone who is interested in how Google works, because chances are that it describes how Google crawled pages on the Web for at least a while. The paper is:

    Efficient Crawling Through URL Ordering

  46. Thanks Bill.

    Yes, it definitely corresponds with a few things Googlers have said, too (around how often the page changes, e.g.). Also, if we consider the often-cited ‘crawl budget’ interview with Matt Cutts, it corresponds with pages closest to the root getting more budget, with budget then flowing down from those pages to the pages below them in the hierarchy. I’ve seen some evidence that when you send Googlebot into areas of a site where there is little strength, the crawl stats show a big drop-off, potentially indicating that there is no budget assigned as such. Obviously this applies to large sites, where the budget likely wouldn’t be enough to crawl the whole site in one visit, rather than to very small sites, which can be whipped around in a few milliseconds.

  47. I remember discussing this topic with a few friends some time ago, and I was telling them that soon enough, PageRank would be irrelevant. Finally, I can prove to them that I was right! Great article, btw.

  48. Hi Dawn,

    It’s part of the reason why it’s been a good idea to point links from the home page of a site to different parts of that site, as “featured sections” of the site, from time to time, to introduce a pathway into those sections for a search engine crawler to use to visit. Those shouldn’t be random links, but should remain for a long enough period so that they are seen to be more than just transient links, like an ad might be perceived as.

  49. Hi Andrei,

    The approach that I was describing with coverage of this patent wasn’t telling us that PageRank was gone; just different. Thanks.

  50. This was good. There are some interesting points made on Wikipedia on how page rank is calculated as well. I really do like your post here though. It is very helpful.

  51. Hi Bill,

    How often is the PageRank of a website updated? For a long time, I have not seen any change in the PageRank of any website I have been working on over the past 1.5 years.

  52. Hi Shilpi,

    Google representatives told us last year that they wouldn’t be updating Toolbar PageRank, so you won’t see it change for any websites. They repeated this around a month ago or so, but added that they still use PageRank when ranking pages in search results.
