If you have worked in SEO for any length of time, or have been responsible for marketing a website, you have probably heard of PageRank, which was developed by the founders of Google in the earliest days of the search engine. I learned some news about PageRank this morning that I wanted to share.
You may have heard a little about the history of the PageRank algorithm, and that Google stopped using at least the original version of PageRank back in 2006. They may have started using a different version of PageRank (still referred to as PageRank) at that time. At least two other versions developed by Google were available in 2006.
The patent behind PageRank was originally assigned to Stanford University, where Lawrence Page and Sergey Brin were both students. They started working on a search engine as a diversion while their Ph.D. supervisor was on sabbatical in Japan for a year.
A provisional patent from Lawrence Page was the first official document that described how the search engine worked using the PageRank algorithm. I found a copy of the provisional patent behind PageRank on the USPTO website, which I blogged about in 2011. That provisional patent was Improved Text Searching in Hypertext Systems (pdf – 1.7mb). In that version of the patent, Page referred to PageRank as “An Approximation to ‘Importance.’” In other words, PageRank is an “approximation of how well-cited or important” matching documents for a query might be.
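To make the idea of "well-cited" pages a little more concrete, here is a minimal sketch of the textbook power-iteration form of PageRank. It is not the version described in the provisional patent, in the later Google patents, or in the Yahoo paper discussed below. The function name `pagerank`, the `toy_web` graph, the damping value of 0.85, and the iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank (illustrative sketch only).

    `links` maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal importance

    for _ in range(iterations):
        # Rank held by pages with no outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes an equal share of its rank to the pages it links to.
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by every other page.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

In this toy graph the page with the most inbound links ("c") ends up with the highest score, and the page nobody links to ("d") ends up with the lowest, which is the "approximation of importance" idea in miniature.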
Google filed a patent updating PageRank for the first time on October 12, 2006 (it has been updated at least once since). Researchers at Google also wrote about another version of PageRank, one considered more efficient, in the paper Efficient Computation of PageRank. Other papers have been written about PageRank as well.
The original patent behind PageRank, which was assigned to Stanford University and exclusively licensed to Google, likely expired in 2018, which means that search engines other than Google could now use PageRank. Chances are that the PageRank described in the early patents from Stanford, and even in the later patents and papers from Google, has changed as it was used to rank pages on the Web.
I took a look at the Google Research publications from 2020 this morning, and came across a paper titled Scaling PageRank to 100 Billion Pages, written while the author was an employee at Yahoo. His name is Stergios Stergiou, and he is at Google now.
He tells us in his LinkedIn profile that he has:
Architected and implemented many massively distributed systems, including:
- A Word2Vec algorithm that learns from a 1 trillion words corpus in 2 hours per epoch
- A PageRank algorithm that executes 35″ iterations on a 3 trillion edges web graph
- A Set Cover algorithm capable of processing 1 trillion elements in 20 billion sets
- A Connected Components algorithm able to process a 5.9 trillion edge graph in 3808
That list item about PageRank matches up with the paper that he authored while at Yahoo, and he may have experimented with that approach there. His profile says that he left Yahoo in October of 2017. He now works as a Software Engineer at Google.
We do not know whether he has worked on PageRank since leaving Yahoo and joining Google, but it was interesting to see the paper in the Google Research publications section.
It wouldn’t be there if he hadn’t joined Google, and we may never learn if the approaches behind PageRank described in that paper have been put into place at Google.
We also don’t know if the PageRank that he was writing about in that paper was like the one in use at Google when it was written.
However, the paper is included among the papers to be presented at WWW ’20, April 20–24, 2020, Taipei, Taiwan. According to the conference website, the conference will still be held but will be online only.
I am not going to make any assumptions about the use of the processes described in the paper. The crawl data listed in it is cited as being from 2016, and some newer information in the footnotes of the paper is from 2020, such as a page on the Google site about how crawling and indexing work at Google.
Google spokespeople have told us that Google still uses PageRank. We don’t know if that version of PageRank is like the version in Scaling PageRank to 100 Billion Pages, which will be presented in 2-3 weeks at the online WWW conference.