Added July 16, 2019: A former Google search engineer on a thread at Hacker News told the world that Google stopped using the Stanford version of PageRank back in 2006. Barry Schwartz reported on this at Search Engine Roundtable in the post “Former Google Engineer: Google Hasn’t Used PageRank Since 2006.” That search engineer was Jonathan Tang, who has been an inventor on at least one Google patent in the past. Tang stated the following in a longer post:
The comments here that PageRank is Google’s secret sauce also aren’t really true – Google hasn’t used PageRank since 2006. The ones about the search & clickthrough data being important are closer, but I suspect that if you made that public, you still wouldn’t have an effective Google competitor.
He told us this about the change away from that version of PageRank:
They replaced it in 2006 with an algorithm that gives approximately similar results but is significantly faster to compute. The replacement algorithm is the number reported in the toolbar and what Google claims as PageRank (it even has a similar name, and so Google’s claim isn’t technically incorrect). Both algorithms are O(N log N), but the replacement has a much smaller constant on the log N factor because it does away with the need to iterate until the algorithm converges. That’s fairly important as the web grew from ~1-10M pages to 150B+.
Google originally filed the patent for the newer version of PageRank that this post is about with the USPTO in 2006. The patent describes PageRank as a link analysis approach and does not call its own method PageRank. Still, after reading the patent, it is easy to refer to it as a new version of PageRank.
I was asked what parameters seed sites in the trusted seed sets might contain, and the patent (both the original and the continuation version) tells us that information:
The section of the patent description labeled “Link Graphs and Seed Sets” gives some examples, based on this: “In one embodiment of the present invention, seeds 102 are specially selected high-quality pages which provide good web connectivity to other non-seed pages.” The patent provides two examples: the Google Directory (it was still around when the patent was first filed) and the New York Times. We are also told: “Seed sets need to be reliable, diverse enough to cover a wide range of fields of public interests & well connected to other sites. In addition, they should have large numbers of useful outgoing links to facilitate identifying other useful & high-quality pages, acting as ‘hubs’ on the web.”
Under the PageRank patent, ranking scores are given to pages based upon how far away they might be from those seed sets and based upon other features of those pages.
PageRank Update by Google
The original PageRank patent, assigned to Stanford University, has expired. Google had an exclusive license to use PageRank. Google later filed a patent for a PageRank update with a different algorithm behind it, and that patent has now been updated. It clearly covers PageRank, as the patent’s description tells us:
A popular search engine developed by Google Inc. of Mountain View, Calif. uses PageRank® as a page-quality metric for efficiently guiding the processes of web crawling, index selection, and web page ranking. Generally, the PageRank technique computes and assigns a PageRank score to each web page it encounters on the web. The PageRank score serves as a measure of the relative quality of a given web page compared to other web pages. PageRank generally ensures that important and high-quality web pages receive high PageRank scores, which enables a search engine to efficiently rank the search results based on their associated PageRank scores.
~ Producing a ranking for pages using distances in a web-link graph
A continuation patent showing a PageRank update was granted today. The original version of this PageRank patent was filed in 2006. It reminded me a lot of Yahoo’s TrustRank (which the patent’s applicants cite as one of a large number of documents that this new version of the patent is based upon).
I first wrote about this new version of PageRank in the post titled, Recalculating PageRank. It was originally filed in 2006, and the first claim in the patent read like this (note the mention of “Seed Pages”):
What is claimed is:
1. A method for producing a ranking for pages on the web, comprising: receiving a plurality of web pages, wherein the plurality of web pages are interlinked with page links; receiving n seed pages, each seed page including at least one outgoing link to a respective web page in the plurality of web pages, wherein n is an integer greater than one; assigning, by one or more computers, a respective length to each page link and each outgoing link; identifying, by the one or more computers and from among the n seed pages, a kth-closest seed page to a first web page in the plurality of web pages according to the lengths of the links, wherein k is greater than one and less than n; determining a ranking score for the first web page from the shortest distance from the kth-closest seed page to the first web page and producing a ranking for the first web page from the ranking score.
The first claim in the newer version of this continuation PageRank patent is:
What is claimed is:
1. A method, comprising: obtaining data identifying a set of pages to be ranked, wherein each page in the set of pages is connected to at least one other page in the set of pages by a page link; obtaining data identifying a set of n seed pages that each include at least one outgoing link to a page in the set of pages, wherein n is greater than one; accessing respective lengths assigned to one or more of the page links and one or more of the outgoing links; and for each page in the set of pages: identifying a kth-closest seed page to the page according to the respective lengths, wherein k is greater than one and less than n, determining the shortest distance from the kth-closest seed page to the page; and determining a ranking score for the page based on the determined shortest distance, wherein the ranking score is a measure of the relative quality of the page relative to other pages in the set of pages.
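To make the “kth-closest seed page” language in the claims more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not Google’s implementation: the toy graph, the page names, the link lengths, and the function names are all hypothetical, and it simply runs a textbook Dijkstra search from each seed and takes the kth smallest distance.

```python
import heapq

def shortest_distances(graph, source):
    """Dijkstra's algorithm: shortest distance from `source` to each reachable page.

    `graph` maps a page to a list of (target_page, link_length) pairs.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist.get(page, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for target, length in graph.get(page, []):
            candidate = d + length
            if candidate < dist.get(target, float("inf")):
                dist[target] = candidate
                heapq.heappush(heap, (candidate, target))
    return dist

def kth_closest_seed_distance(graph, seeds, page, k):
    """Shortest distance to `page` from its kth-closest seed (k is 1-based)."""
    per_seed = sorted(
        shortest_distances(graph, seed).get(page, float("inf")) for seed in seeds
    )
    return per_seed[k - 1]

# Hypothetical toy link graph with made-up link lengths.
graph = {
    "seed_a": [("page_x", 1.0)],
    "seed_b": [("page_x", 3.0), ("page_y", 1.0)],
    "seed_c": [("page_y", 2.0)],
    "page_y": [("page_x", 1.0)],
}
seeds = ["seed_a", "seed_b", "seed_c"]

# The claims require 1 < k < n; with n = 3 seeds, k = 2 is the only choice.
second_closest = kth_closest_seed_distance(graph, seeds, "page_x", 2)
```

Using the kth-closest seed rather than the closest one means a page cannot score well just by sitting near a single seed. How the resulting distance is turned into a ranking score is not spelled out in the claims, so the sketch stops at the distance.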
The Updated PageRank patent is:
Producing a ranking for pages using distances in a web-link graph
Inventors: Nissan Hajaj
Assignee: Google LLC
US Patent: 9,953,049
Granted: April 24, 2018
Filed: October 19, 2015
Abstract
One embodiment of the present invention provides a system that produces a ranking for web pages. During operation, the system receives a set of pages to be ranked, wherein the set of pages are interconnected with links. The system also receives a set of seed pages which include outgoing links to the set of pages. The system then assigns lengths to the links based on the properties of the links and properties of the pages attached to the links. Next, the system computes the shortest distances from the set of seed pages to each page in the set of pages based on the lengths of the links between the pages. Next, the system determines a ranking score for each page in the pages based on the computed shortest distances. The system then produces a ranking for the pages based on the ranking scores for the set of pages.
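The steps in the abstract (assign lengths, compute shortest distances from the seed set, score, rank) can be sketched in a few lines of Python. This is a rough illustration only: the graph, the lengths, and the function names are hypothetical, and ordering pages by ascending distance is my own stand-in for the ranking score, which the abstract does not define.

```python
import heapq

def distances_from_seed_set(graph, seeds):
    """Shortest distance from the nearest seed to every reachable page.

    Starts Dijkstra's algorithm from all seeds at once (distance 0 each).
    `graph` maps a page to a list of (target_page, link_length) pairs.
    """
    dist = {seed: 0.0 for seed in seeds}
    heap = [(0.0, seed) for seed in seeds]
    heapq.heapify(heap)
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist[page]:
            continue  # stale heap entry
        for target, length in graph.get(page, []):
            candidate = d + length
            if candidate < dist.get(target, float("inf")):
                dist[target] = candidate
                heapq.heappush(heap, (candidate, target))
    return dist

def rank_pages(graph, seeds):
    """Order non-seed pages by distance from the seed set (assumed: closer is better)."""
    dist = distances_from_seed_set(graph, seeds)
    pages = [page for page in dist if page not in seeds]
    return sorted(pages, key=lambda page: (dist[page], page))

# Hypothetical link graph; the lengths are invented for the example.
link_graph = {
    "seed_1": [("a", 1.0), ("b", 4.0)],
    "seed_2": [("c", 2.0)],
    "a": [("b", 1.5)],
    "c": [("d", 3.0)],
}
ranking = rank_pages(link_graph, ["seed_1", "seed_2"])
```

Because all seeds start in the same priority queue, the graph is traversed once no matter how many seeds there are, which fits the patent’s stated goal of using many diversified seed pages.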
Under the PageRank patent, we see how it might avoid manipulation by building trust into a link graph like this:
One possible variation of PageRank that would reduce the effect of these techniques is to select a few “trusted” pages (also referred to as the seed pages) and discovers other pages that are likely to be good by following the links from the trusted pages. For example, the technique can use a set of high-quality seed pages $(s_1, s_2, \ldots, s_n)$, and for each seed page $i = 1, 2, \ldots, n$, the system can iteratively compute the PageRank scores for the set of the web pages $P$ using the formulae:
$$\forall p \neq s_i \in P:\quad R_i(p) = \sum_{q \to p} R_i(q)\, w(q \to p)$$
where $R_i(s_i) = 1$, and $w(q \to p)$ is an optional weight given to the link $q \to p$ based on its properties (with the default weight of 1).
Generally, it is desirable to use many seed pages to accommodate the different languages and a wide range of fields in the fast-growing web content. Unfortunately, this variation of PageRank requires solving the entire system for each seed separately. Hence, as the number of seed pages increases, the complexity of computation increases linearly, limiting the number of seeds that can be practically used.
Hence, what is needed is a method and an apparatus for producing a ranking for pages on the web using many diversified seed pages without the problems of the above-described techniques.
The summary of the PageRank patent describes it like this:
One embodiment of the present invention provides a system that ranks pages on the web based on distances between the pages, wherein the pages are interconnected with links to form a link graph. More specifically, a set of high-quality seed pages is chosen as references for ranking the pages in the link graph. The shortest distances from the set of seed pages to each given page in the link graph are computed. Each of the shortest distances is obtained by summing lengths of a set of links that follows the shortest path from a seed page to a given page, wherein the length of a given link is assigned to the link based on properties of the link and properties of the page attached to the link. The computed shortest distances are then used to determine the ranking scores of the associated pages.
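The summary says a link’s length depends on properties of the link and of the page it sits on, but it gives no formula. A reader comment further down quotes the patent as listing things like the link’s position, the link’s font, and the source page’s out-degree. The Python sketch below shows one way such a length could be put together and summed along a path. Everything in it is an assumption for illustration: the `Link` fields, the logarithmic out-degree term, and the fixed penalty are invented and are not taken from the patent.

```python
import math
from dataclasses import dataclass

@dataclass
class Link:
    source: str
    target: str
    source_out_degree: int  # number of outgoing links on the source page (>= 1)
    in_main_content: bool   # hypothetical stand-in for "link position"

def link_length(link):
    """Hypothetical length: longer for pages with many outlinks or poorly placed links."""
    length = 1.0 + math.log(link.source_out_degree)
    if not link.in_main_content:
        length += 1.0  # assumed penalty for footer or sidebar links
    return length

def path_distance(path):
    """Distance of a path: the sum of the lengths of the links it follows."""
    return sum(link_length(link) for link in path)

# A made-up two-hop path from a seed page to a target page.
path = [
    Link("seed", "hub", source_out_degree=1, in_main_content=True),
    Link("hub", "target", source_out_degree=1, in_main_content=False),
]
total = path_distance(path)
```

With lengths like these, a link from a page with thousands of outlinks, or one tucked away in a footer, would move the target further from the seed than a prominent link on a focused page.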
The PageRank patent discusses the importance of a diversity of topics covered by seed sites and the value of a large set of seed sites. It also gives us a summary of crawling and ranking and searching like this:
Crawling Ranking and Searching Processes
FIG. 3 illustrates the crawling, ranking, and searching processes following an embodiment of the present invention. During the crawling process, web crawler crawls or otherwise searches through websites on the web to select web pages to be stored in the indexed form in a data center. In particular, the web crawler can prioritize the crawling process by using page rank scores. The selected web pages are then compressed, indexed, and ranked in (using the ranking process described above) before being stored in a data center.
A search engine receives a query from a user through a web browser during a subsequent search process. This query specifies the number of terms to be searched for in the set of documents. In response to a query, the search engine uses the ranking information to identify highly-ranked documents that satisfy the query. The search engine then returns a response through the web browser, wherein the response contains matching pages along with ranking information and references to the identified documents.
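The crawling passage above says the web crawler can prioritize crawling by using page rank scores. A simple way to picture that is a crawl frontier backed by a priority queue. The Python sketch below is only an illustration: the `CrawlFrontier` class, the URLs, and the scores are hypothetical and are not described in the patent.

```python
import heapq

class CrawlFrontier:
    """Hypothetical crawl queue that hands out the highest-scored URL first."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def add(self, url, score):
        """Queue a URL once; later attempts to add the same URL are ignored."""
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so store the negated score to pop the largest first.
        heapq.heappush(self._heap, (-score, url))

    def pop(self):
        """Return the queued URL with the highest score."""
        _, url = heapq.heappop(self._heap)
        return url

    def __len__(self):
        return len(self._heap)

frontier = CrawlFrontier()
frontier.add("https://example.com/a", 0.2)
frontier.add("https://example.com/b", 0.9)
frontier.add("https://example.com/c", 0.5)
frontier.add("https://example.com/a", 0.99)  # duplicate, ignored

crawl_order = [frontier.pop() for _ in range(len(frontier))]
```

In this toy version the crawler visits the pages with the highest scores first and leaves low-scored pages for later.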
I’m thinking about looking up the many articles cited in the patent and providing links to them because they seem to be tremendous resources about the Web. So I’ll likely publish those soon.
I’ve written a few posts about links. These were ones that I found interesting:
5/30/2006 – Web Decay and Broken Links Can be Bad for Your Site
12/11/2007 – Google Patent on Anchor Text Indexing and Crawl Rates
1/10/2009 – What is a Reciprocal Link?
5/11/2010 – Google’s Reasonable Surfer: How the Value of a Link May Differ Based upon Link and Document Features and User Data
8/24/2010 – Google’s Affiliated Page Link Patent
7/13/2011 – Google Patent Granted on PageRank Sculpting and Opinion Passing Links
11/12/2013 – How Google Might Use the Context of Links to Identify Link Spam
12-10-2014 – A Replacement for PageRank?
4/24/2018 – PageRank Update
Last Updated July 16, 2019.
Interesting content. How will they decide which Seed pages to use, or will they be hand picked?
Incredible post Bill. Much needed during these times of fluctuations and uncertainty. It really helps. Thank you 😀
Thanks, David.
I knew when I saw this one at the USPTO that I had to write about it, and give people a chance to see it, and read it. When I see a continuation patent, it’s an indication to me that the patent holder had a reason to update the claims to reflect a change in how they are being applied. I didn’t compare how they changed from the earlier version to the later version, but it’s probably worth spending some time on, and I will likely return to it to compare them.
But we’ve always known the closer you are to a trusted source – the better. I wonder why they only applied for the patent in 2015.
Thanks for the updated information on PageRank, Bill!
Perfect Post.. Keep Share such posts. I have bookmarked this post..
Really appreciate the work that you do on your site Bill!
Interesting, just makes me wonder how much of the old code from 1999 still running today.
But Bill, they turned off Page Rank years ago, didn’t ya hear?!
– Yours, Uninformed SEO
Your posts never cease to amaze me Bill, your dedication is unparalleled.
I’d like to see those changes in versions too.
Great stuff, thanks for posting.
They didn’t stop using PageRank, Uninformed SEO. They just stopped publishing and updating the number your domain name currently has.
Bill you continue to be a great source for Google news with your patent tracking, thanks for sharing this with us all!
I also hope May grey isn’t so bad this year.
Because they Patent it, does not mean they use the patent.
Kudos Bill! Fantastic post!
Yes, that is true. But they had people spend time researching and creating the patent. And this was a continuation patent, which means they filed it before, and it was granted before, and they updated the claims to reflect changes in a process that may have been in place, or at least that they want to protect legally. So, even if they aren’t using the patented process, they are taking action to protect that process. It does appear that they are continuing to use PageRank, and the original PageRank patent from Stanford has expired – that doesn’t mean that they can’t use PageRank. They seem to have developed their own version with this original and updated patent.
Thank you, Andrew.
I hope the weather turns out fine here. So far it has been pretty nice.
The PageRank toolbar was for Internet Explorer, and there wasn’t a version for newer browsers. When they didn’t update the toolbar for newer versions of Internet Explorer, that wasn’t a surprise. It was convenient being able to see the proxy numbers for PageRank for your site. But it definitely still exists, and is likely still being used.
Thank you, Graham.
Hi Colin,
I believe I have seen a few people making that claim.
Hi Peter,
Matt Cutts wrote a post about PageRank, where he said that it really changed a lot from when they first thought of it, to when they implemented it. I suspect most of the older code has been updated. The Web is a lot larger now than when Google first started out.
This was a continuation patent, which means that it updates an earlier one, that was originally filed in 2006. As a continuation patent it takes the original filing date of the patent it updates (the one from 2006.)
Hi Marco,
They handpick the seed pages. The patent explains more about what they look for in those, and it provides examples, such as The New York Times, and the Google Directory (which is now closed).
Awesome! Thanks for getting this one Bill. Very interesting to see mentions of PageRank once again and definitely makes you wonder how Google has been using it behind the scenes…
Bill, you gave me in-depth knowledge about updated Page Rank.
amazing stuff you explained. thank you 🙂
“All pages are equal, but some pages are more equal than others.”
Cheers Bill
Hi Kieran,
It’s good seeing some concepts get mentioned and addressed in patents that are just granted so recently. The mention of how the patent is aimed at issues like “link farms” was also interesting.
Hi Grant,
It is interesting to see that Google appears to have taken ownership and control over PageRank, now that Stanford’s patent on it (and Google’s exclusive license to use it) has expired. Seeing that they were issued this patent and then updated it with a continuation patent is interesting, too.
Fantastic article Bill, and seems like huge news. (I was alerted to this by Roger’s excellent take on it in SEJ). I’ve been trying to read thru the patent, side-stepping the math equations (which are beyond me I’m afraid) but some of the details listed are fascinating. It mentions “assigning lengths to the links based on properties of the links”. I note that elsewhere in the paper it expands on these link “properties” as things such as “the link’s position, the link’s font, and the source page’s out-degree”. (I’m assuming out-degree refers to number of outgoing links, whereby the more outgoing links on the page, the less value is passed to any destination page receiving the link ? ). But it also refers to “properties of the pages attached to the links”. Are these “properties” to be considered as indicators of quality/relevancy both on-page and offsite (inbound links ), in which case are they using existing signals to determine this or is this something different (or new ?) I’d love to get your take on these points, Bill. Thanks!
Hi Jim,
Thanks. I noticed some of that language, and my thought was that the patent may be referring to the link properties described in the Google patent on their reasonable surfer model. I wrote about the latest version of that patent here: https://www.seobythesea.com/2016/04/googles-reasonable-surfer-patent-updated/ It would make a lot of sense if this patent was referring to those, since that one determines how much weight might be passed along by a link considering very similar properties (but not the lengths mentioned in this patent).
I am not sure the place you are getting your information, however good topic. I needs to spend some time studying more or understanding more. Thank you for wonderful information I was in search of this info for my mission.
Hi renumongia
I linked to the Patent I wrote about. It is at the United States Patent and Trademark Office. 🙂
Google files a number of patent applications with them every week, and every week some of the filings from Google get granted. I try to keep an eye out for granted patents and applications I think are interesting.
Thanks for the reply, Bill. Yes I remember now reading that Reasonable Surfer patent a while ago. Makes a lot of sense to think about it in terms of how clickable the links are on a page, and of course the anchor text will play a big role there.
Hey Bill,
Actually, read this the other day and per the usual content you put out, very useful and I personally believe a patent worth considering.
Mainly wanted to say congratulations on the clear mention of your superior standings in the SEO community as mentioned by Barry Schwartz in his weekly video update at Search Engine Roundtable.
We consider your site almost as much as the Google Guidelines.
Everyone here at VGI is sending a very big thanks for all you shared and hopefully will for years to come.
CHEERS BILL!
Thank you for the update. All of us freelance SEOs appreciate the work you put into helping us understand these topics.
Hi Garrett,
I learn best when I’m trying to teach others. It is really nice to hear that you are getting value from my posts. Thanks for letting me know. 🙂
Hi David,
Thank you. I appreciate hearing what you are saying about the Google guidelines, and your thanks. That is very good to hear. 🙂
Hi Jim,
it can be interesting seeing how some of the ideas in these patents can fit together. The big picture is often made up of a lot of tiny pieces.
Nice article but I think Page Rank is dead or kept hidden by the Google, these days domain authority by Moz is a more popular ranking criteria apart from Alexa’s page rank.
New invention published in 2018, but actually filed in 2015, so has it been applied for a long time?
I think it’s just a plus point, and the old PageRank algorithm has a higher coefficient, do not you think so?
Hi Laevis,
It takes a while for patents to be granted by the patent office, which can be a period of three years. That isn’t unusual.
The old PageRank patent has expired, and can no longer be used to exclude other people from using PageRank. It was owned by Stanford University, and Google had an exclusive license to use PageRank. Not sure what you mean by a “higher coefficient”, but it is possible and likely that Google has moved on from a 20-year-old algorithm.
Hi Cathy,
Just because Google does not show a PageRank toolbar proxy that only worked in Internet Explorer does not mean that Google does not use PageRank. Google definitely does not use the Domain Authority metric from Moz nor the Alexa ranking metric (which is not PageRank at all). The Alexa ranking has been shown to draw on a self-selected group of users, and is often extremely biased.
After exploring a number of the blog posts on your web page,
I really appreciate your technique of writing a blog. I saved it to my bookmark site list and will be checking back soon. Take a look at my website as well and let me know what you think.
Good to know that links are STILL important. All those apocalypSEOs saying that links will lose importance in ranking are thankfully off the mark. Thanks for this!
Great post, Bill. Your posts are always so informative and worth reading.
Keep the posts coming.
Nice blog .u r providing good information about commenting websities.
Social media is very important for improving the branding of any product. In this article given the information is very good and currently I am working on it..
Incredible, mind blowing and fantastic post, Bill. I always got, your post is new with old or new update and this is one of them. On PageRank I read post a lot but i is special due to information. Thanks for sharing.
Hi, I work on digital marketing platform and i do off page seo for my website. and this article helps me a lot to learn more abot how to improve page rank of website on search engine. So thank you very much for sharing this article.
I’ve been looking for info on this topic for a while. I’m happy this one is so great. Keep up the excellent work.
Very helpful blogging. Already I’ve visited several links you provided in your list. All the links are very effective. I’d like to see more such beautiful links from you by this posting.
This was really a very interesting & informative read. Back linking is indeed great for SEO and what can be a better way than doing it at well-known blog sites! And not only for SEO purposes, it also helps to build a connection between people belonging to the same industry. Thanks for the article!
Hey Bill,
Just wondering what your 3-5 most important ranking factors would be at the moment? I am more ecommerce focussed but (if you have time of course) it would also be great to know about your thoughts on local ranking factors as it is probably the most volatile in my experience anyway.
I’ve had a great time reading your posts as well!
Best Regards
Mike
Bill,
I just found your blog today. And I just spent an hour reading & learning. What a fantastic resource!
Hope this comment isn’t too far out of place: would you write a post about your favorite SEO tools? I think knowing which tools you prefer, and why, would be useful for me and, hopefully, other blog readers.
Hi Stefan,
My favorite SEO Tools are probably Screaming Frog Web Crawler, and Xenu Link Sleuth. I like both because they provide ways to crawl and understand the structure of a website. I’ve been enjoying the GSC and GA APIs built into Screaming Frog, to help make decisions about updates and changes to content on pages. Ultimately, those tools aren’t making decisions for you, but allow you to make good decisions regarding a site.
Thank you bill,
I am the beginner of SEO, in this blog, I learned about the PageRank really helpful post for who are learning SEO.
Good post bill, just wondering what is the shortest distance? how do we know that ?
Hi Anvesh,
The patent tells us about the length between pages and the seed sets of pages like this:
They don’t spell that out for us exactly, but there does seem to be a method behind the approach they state.
Thanks for sharing this valuable information with us
Very interesting read. Quite technical but I can see the close similarities between the original patent of 2006 and the recent 2018 one. Just a brief insight into the technicalities and challenges these web crawlers face as the internet grows larger and larger! But with their global dominance in English speaking countries Google surely has a good solution for ranking selections.
Thanks for sharing this valuable information with us. It helps me a lot, and my manager really impresses by doing this.
The information you have given is very clear and knowledgeable. It is very helpful for the freelance SEO’s . Keep going Bill. Great work!!
Hi, Thanks for such a useful information really helps that how keywords ranking is important in search result since im doing SEO from last 2yrs to my website by reading your article ill implement this strategist to my website.
Hi Bill
Thanks You very much sharing great article.
I really enjoyed it your article PageRank Update.
Thanks for your updated information.
Waiting for your next posting.
yours
Great!!
All SEO Work is not totally depend on Search engine Algo, it depend various possible reason like page speed , mobile responsive , users behavior on website, bounce rate of the website etc. So not to totally depend on update.
Hi Gyandhan,
Changes in rankings of web pages can happen for a number of reasons, including changes you (or your host) may make to your site, changes that your competitors may make to their sites, changes in how searchers may behave, including the terms that they search for, and changes that a search engine may make. So, if you see changes in rankings at your site, it is possible that a change at the search engine may not be the cause.
Also, Google has many algorithms that it uses when ranking pages, and some of those involve things you list such as page speed, mobile responsiveness, and user behavior. Google updates are also based upon the additions of algorithms. There are times when some algorithms are updated, like PageRank, as I described in this post.