PageRank 2020

If you have been an SEO for any amount of time, or responsible for marketing a website, you may have heard of PageRank. It was developed by the founders of Google in the earliest days of the search engine. I learned some new information about PageRank this morning that I wanted to share.

Recently revealed news about the PageRank algorithm is that Google stopped using at least the original version of PageRank back in 2006, and may have started using a different version (still referred to as PageRank) at that time. At least two other versions developed by Google were available in 2006.

The original patent behind PageRank was assigned to Stanford University, where Lawrence Page and Sergey Brin were both students working on a search engine as a diversion while their Ph.D. supervisor went on sabbatical to Japan for a year.

A provisional patent from Lawrence Page was the first official document describing how the search engine worked using the PageRank algorithm. I found a copy of the provisional patent behind PageRank on the USPTO website, which I blogged about in 2011. That provisional patent was Improved Text Searching in Hypertext Systems (pdf – 1.7mb). In that version, Page referred to PageRank as “An Approximating to ‘Importance'”. In other words, PageRank is an “approximation of how well-cited or important” matching documents for a query might be.
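To make the "how well-cited" idea concrete, here is a minimal sketch of the textbook power-iteration form of PageRank. It is not taken from the provisional patent, and it is not what Google or Yahoo run in production. The function name `pagerank`, the damping value of 0.85, the iteration count, and the four-page `toy_web` graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank (illustrative sketch only).

    links maps each page to the list of pages it links to.
    Returns a dict of page -> score, with scores summing to 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly over all pages.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by pages with no outlinks is shared evenly by all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}

        # Each page passes an equal share of its rank to every page it links to.
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share

        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by three pages, "d" by none.
toy_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
scores = pagerank(toy_web)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(page, round(score, 4))
```

In this toy graph the most-cited page, "c", ends up with the highest score, and "d", which nothing links to, ends up with the lowest. A page's score depends not only on how many links point to it but also on the scores of the pages doing the linking, which is why "a" scores well with only a single link from "c".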

Google first filed a patent on updating PageRank on October 12, 2006 (it has been updated at least once since then). Researchers at Google also described a version considered more efficient in the paper Efficient Computation of PageRank. Other versions of PageRank exist as well.

The original patent behind PageRank, which was assigned to Stanford University and only licensed to Google, likely expired in 2018. That means search engines other than Google can now use PageRank. Chances are that the PageRank described in the early patents from Stanford, and even in the later patents and papers from Google, has changed as it was implemented to rank pages on the Web.

I looked at the Google Research Publications from 2020 this morning and came across a paper titled Scaling PageRank to 100 Billion Pages, which was written while the author was an employee at Yahoo. His name is Stergios Stergiou, and he is at Google now.

He tells us in his LinkedIn profile that he has:

Architected and implemented many massively distributed systems, including:

  • A Word2Vec algorithm that learns from a 1 trillion words corpus in 2 hours per epoch
  • A PageRank algorithm that executes 35″ iterations on a 3 trillion edges web graph
  • A Set Cover algorithm capable of processing 1 trillion elements in 20 billion sets
  • A Connected Components algorithm able to process a 5.9 trillion edge graph in 3808

That list item about PageRank matches up with the paper that he authored while at Yahoo!, and he may have experimented with that while at Yahoo, which his profile says that he left in October of 2017. He now works as a Software Engineer at Google.

We do not know whether he has worked on PageRank since leaving Yahoo and joining Google, but it was interesting to see the paper in the Google Research publications section.

It wouldn’t be there if he hadn’t joined Google, and we may never learn if the approaches behind PageRank described in that paper are in place at Google.

We also don’t know if the PageRank that he was writing about in that paper was like the one in use at Google when it was written.

However, the paper is included among the papers to be presented at WWW ’20, April 20–24, 2020, Taipei, Taiwan. According to the conference website, it will still be held but will be online only.

I am not going to make any assumptions about the use of the processes described in the paper. The crawl data listed in it is cited as being from 2016, and some newer information in the footnotes of the paper is from 2020, such as a page on the Google site about how crawling and indexing work at Google.

We have been told by Google Spokespeople that Google still uses PageRank. We don’t know if that version of PageRank is like the Scaling PageRank to 100 Billion Pages Version that will be presented in 2-3 weeks at the online WWW conference.

20 thoughts on “PageRank 2020”

  1. Hi Panagiotis,

    The author of the paper wrote it while he was at Yahoo, and he has since left there to go to work at Google. I know from someone who works at Yahoo that the approach described in the paper is one that is in use. It is not PageRank, but it is a graph-based algorithm that is modeled after PageRank. One of the amazing things about it is how well it scales to cover a very large number of pages. Back when Google used to run the older version of PageRank, Google used to update rankings of pages using a batch processing approach every 5 weeks or so. Google moved on from that to an approach that updates rankings in a more real-time way. But it is possible that Google may explore a graph-based approach like the one described in this paper that the author wrote when he was at Yahoo. I took a look to see if he had any patents in his name, and according to LinkedIn he has worked on at least 62. I think it’s possible that we will see his name appearing on patents from Google in the future, and I will be watching out for them. Some of the earlier ones he has worked on for other companies looked very interesting. So, he wrote about PageRank, but it is a very different PageRank than the one that Lawrence Page invented in the late 90s.

  2. Just a quick note, your webpage is a great insight into the history of ranking mechanisms. Especially the evolution and growth and continual growth, even today.
    Thanks

  3. Hi James,

    Thanks for your note. There’s a lot happening all of the time in Search, and it pays to be diligent, and listen, and follow leads about what might be changing, because there is a lot of potential for change.

  4. Fascinating as always Bill.
    What do you feel are the practical outcomes of scaling the existing PR algo?

    I might be off the mark here but I’m reading a potentially more dynamic and efficient algo at this scale.

  5. Hi Tom,

    This is Yahoo’s Algorithm, rather than Google’s. It should be helpful in handling a lot more data about websites and may be able to do it more quickly. I seem to recall reading that it was capable of doing a lot really quickly. I remember the days of Google doing batch updates every 5 weeks or so, and it is good to see that there isn’t any intent of going back to those days.

  6. Hi Bill,

    Yes, PageRank was one of the most important factors for marketers for years. It has disappeared nowadays since Google stopped using and updating it. Great information, and I am looking forward to more posts like this.

  7. I really liked the idea of Page Rank and I still remember it would update every 6 months or so back in the day. I would take the much coveted Page Rank over the DA and PA ranks any day of the week.

  8. Nice article Bill.

    It was a good ranking factor. Yes it is only from 0 to 10 but it gave you an idea of where you were in the overall scheme of things.

    Have to admit we tend to use PR (Page Rank) and DA (Domain Authority) to track sites as it does give a quick easy way to do it.

  9. There was one great guy at Google, one of the best at algorithms in the world. He now works at SpaceX. His name is Tomek Czajka, he is one of the best Polish programmers, but he is not only one in our country for sure. Guys like him can create this sorta things and make it happen. Godbless.

  10. Hi Miron,

    I am not familiar with Tomek Czajka, but I looked him up and saw that he has been busy at places like Google and SpaceX. I do write a lot about the patents that Google comes out with, and it looks like Tomek has been involved in areas like commercial search quality at Google. It’s good to hear about his successes. Thanks for sharing about him.

  11. Hi Katie,

    The original PageRank that Google started using to rank sites didn’t use a scale from 1-10. That scale was used when Google started showing the PageRank of pages to people browsing the web who were using a Google Toolbar that showed PageRank. It was developed for use with Internet Explorer and would show off what Google referred to as a “proxy” of PageRank. We have been told by spokespeople from Google that they still use a version of PageRank in ranking pages. The metrics you mention from Moz aren’t Google’s and possibly don’t work quite the same way, but it may be convenient to use them now that the toolbar PageRank isn’t available to see where pages rank. I remember using that all of the time.

  12. Hi Robin,

    I believe that PageRank was being updated approximately every 5 weeks or so at one point. During that time there were a lot of fluctuations in the rankings of pages, and people would refer to all of that movement as “The Google Dance.” Google started having yearly events at their Mountain View campus, when an SMX event was held nearby, that they would also refer to as “The Google Dance.” I enjoyed using the PageRank toolbar metric to learn about pages, too.

  13. Hi Declan,

    The Toolbar PageRank has disappeared. Google stopped supporting the Internet Explorer PageRank Toolbar by discontinuing the data supplied to it. However, it does appear that Google is continuing to use a link analysis score in ranking pages. It is likely combined with an information retrieval score to determine rankings. They aren’t sharing those PageRank metrics with the world anymore though, and it is possible that many types of results may not use a PageRank score to determine where they rank in search results, such as news results, realtime results, and possibly others.

  14. It will be super interesting to watch how this further develops. I think he is definitely at Google now for a reason. Thank you for another informative post!

  15. Hi Paul,

    I checked the USPTO website to see if he had filed any patents, and he had been an inventor on many. I expect to see a number from him while he is at Google, too.

  16. Hi Alka,

    Using PageRank to decide which pages were good ones to get links from was misleading, because newer pages that hadn’t had a chance to accrue much PageRank yet may have been good pages to get links from if they ended up gaining more links and higher PageRank, or if they ended up bringing visitors who might become customers. So choosing where you get links from based solely on PageRank was misleading.
