If you have been an SEO for any amount of time, or have been responsible for marketing a website, you may have heard of PageRank. The founders of Google developed it in the earliest days of the search engine. I learned some new information about PageRank this morning that I wanted to share.
The recently revealed news about the PageRank algorithm was that Google stopped using at least the original version of PageRank back in 2006, or that they started using a different version (still referred to as PageRank) at that time. At least two other versions developed by Google were available in 2006.
The original patent behind PageRank was assigned to Stanford University, where Lawrence Page and Sergey Brin were both students, working on a search engine as a diversion while their Ph.D. supervisor went on sabbatical to Japan for a year.
A provisional patent from Lawrence Page was the first official document describing how the search engine worked using the PageRank algorithm. I found a copy of the provisional patent behind PageRank on the USPTO website, which I blogged about in 2011. That provisional patent was Improved Text Searching in Hypertext Systems (pdf – 1.7mb). In that version, Page referred to PageRank as “An Approximation to ‘Importance’.” In other words, PageRank is an “approximation of how well-cited or important” matching documents for a query might be.
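To make the idea of an “approximation of how well-cited or important” a page is more concrete, here is a minimal sketch of the textbook power-iteration form of PageRank in Python. It is not taken from the provisional patent, and it is not Google's implementation. The function name, the tiny four-page link graph, the damping factor of 0.85, and the iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank (illustrative sketch only).

    links: dict mapping each page to the list of pages it links to.
    Every link target must also appear as a key in the dict.
    """
    pages = list(links)
    n = len(pages)
    # Start with rank spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its damped rank evenly among the pages it cites.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page site: every other page links to "home".
toy_web = {
    "home": ["about"],
    "about": ["home"],
    "blog": ["home", "about"],
    "contact": ["home"],
}

scores = pagerank(toy_web)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.4f}")
```

In this toy graph, "home" is linked to by every other page and ends up with the highest score, while "blog" and "contact", which nothing links to, keep only the baseline share. That is the sense in which the score approximates how well-cited a page is.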
Google first filed a patent on updating PageRank on October 12, 2006 (it has been updated at least once since then). Researchers at Google also produced a version considered more efficient, described in the paper Efficient Computation of PageRank. Other patents and papers about PageRank exist as well.
The original patent behind PageRank, which was assigned to Stanford University and only licensed to Google, likely expired in 2018. That has meant that search engines other than Google could use PageRank. The chances are that the PageRank described in the early patents from Stanford, and even in the later patents and papers from Google, has changed as it was implemented to rank pages on the Web.
I looked at the Google Research publications from 2020 this morning and came across a paper titled Scaling PageRank to 100 Billion Pages, written while the author was an employee at Yahoo. He is at Google now, and his name is Stergios Stergiou.
He tells us in his LinkedIn profile that he has:
Architected and implemented many massively distributed systems, including:
- A Word2Vec algorithm that learns from a 1 trillion words corpus in 2 hours per epoch
- A PageRank algorithm that executes 35″ iterations on a 3 trillion edges web graph
- A Set Cover algorithm capable of processing 1 trillion elements in 20 billion sets
- A Connected Components algorithm able to process a 5.9 trillion edge graph in 3808
That list item about PageRank matches up with the paper that he authored, and he may have experimented with that approach while at Yahoo, which his profile says he left in October of 2017. He now works as a Software Engineer at Google.
We do not know whether he has worked on PageRank since leaving Yahoo and joining Google, but seeing the paper in the Google Research publications section was interesting.
It wouldn’t be there if he hadn’t joined Google, and we may never learn if the approaches behind PageRank described in that paper are in place at Google.
We also don’t know if the PageRank he was writing about in that paper was like the one in Google when it was written.
However, the paper is included among the papers to be presented at WWW ’20, April 20–24, 2020, Taipei, Taiwan. According to the conference website, it will still be held but will be online only.
I am not going to make any assumptions about the use of the processes described in the paper. The crawl data listed in it is cited as being from 2016, and some newer information in the footnotes of the paper is from 2020, such as a page on the Google site about how Crawling and Indexing work at Google.
We have been told by Google spokespeople that Google still uses PageRank. We don’t know if that version of PageRank is like the version in Scaling PageRank to 100 Billion Pages, which will be presented in 2-3 weeks at the online WWW conference.