PageRank Updated

Link graph structure of web pages

PageRank Updated by Google

A popular search engine developed by Google Inc. of Mountain View, Calif. uses PageRank® as a page-quality metric for efficiently guiding the processes of web crawling, index selection, and web page ranking. Generally, the PageRank technique computes and assigns a PageRank score to each web page it encounters on the web, wherein the PageRank score serves as a measure of the relative quality of a given web page with respect to other web pages. PageRank generally ensures that important and high-quality web pages receive high PageRank scores, which enables a search engine to efficiently rank the search results based on their associated PageRank scores.

~ Producing a ranking for pages using distances in a web-link graph

A continuation patent that updates PageRank was granted today. The original version of this PageRank patent was filed in 2006, and it reminded me a lot of Yahoo’s TrustRank (which the patent’s applicants cite as one of the many documents that this new version of the patent is based upon).

I first wrote about this version of PageRank in a post titled Recalculating PageRank. The patent was originally filed in 2006, and its first claim read like this (note the mention of “Seed Pages”):

What is claimed is:

1. A method for producing a ranking for pages on the web, comprising: receiving a plurality of web pages, wherein the plurality of web pages are inter-linked with page links; receiving n seed pages, each seed page including at least one outgoing link to a respective web page in the plurality of web pages, wherein n is an integer greater than one; assigning, by one or more computers, a respective length to each page link and each outgoing link; identifying, by the one or more computers and from among the n seed pages, a kth-closest seed page to a first web page in the plurality of web pages according to the lengths of the links, wherein k is greater than one and less than n; determining a ranking score for the first web page from a shortest distance from the kth-closest seed page to the first web page; and producing a ranking for the first web page from the ranking score.

The first claim in the newer version of this continuation patent is:

What is claimed is:

1. A method, comprising: obtaining data identifying a set of pages to be ranked, wherein each page in the set of pages is connected to at least one other page in the set of pages by a page link; obtaining data identifying a set of n seed pages that each include at least one outgoing link to a page in the set of pages, wherein n is greater than one; accessing respective lengths assigned to one or more of the page links and one or more of the outgoing links; and for each page in the set of pages: identifying a kth-closest seed page to the page according to the respective lengths, wherein k is greater than one and less than n, determining a shortest distance from the kth-closest seed page to the page; and determining a ranking score for the page based on the determined shortest distance, wherein the ranking score is a measure of a relative quality of the page relative to other pages in the set of pages.
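
To make the claim language a little more concrete, here is a minimal Python sketch of how I read the “kth-closest seed page” step. It is not code from the patent: the graph format (a dict mapping each page to a list of `(target, length)` pairs), the function names, the made-up link lengths, and the choice to run Dijkstra's algorithm once per seed are all my own assumptions.

```python
import heapq


def shortest_distances(graph, source):
    """Dijkstra's algorithm from one seed; returns {page: distance}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist[page]:
            continue  # stale queue entry, a shorter path was already found
        for target, length in graph.get(page, []):
            nd = d + length
            if nd < dist.get(target, float("inf")):
                dist[target] = nd
                heapq.heappush(heap, (nd, target))
    return dist


def kth_closest_seed_distance(graph, seeds, k):
    """For every page, the shortest distance from its kth-closest seed."""
    per_seed = [shortest_distances(graph, seed) for seed in seeds]
    pages = set(graph) | {t for links in graph.values() for t, _ in links}
    result = {}
    for page in pages:
        reachable = sorted(d[page] for d in per_seed if page in d)
        # Pages reached by fewer than k seeds get an infinite distance here.
        result[page] = reachable[k - 1] if len(reachable) >= k else float("inf")
    return result


# Hypothetical toy graph: three seeds link to page "a", which links to "b".
graph = {
    "seed1": [("a", 1.0)],
    "seed2": [("a", 2.0)],
    "seed3": [("a", 5.0)],
    "a": [("b", 1.0)],
}
distances = kth_closest_seed_distance(graph, ["seed1", "seed2", "seed3"], k=2)
print(distances["a"], distances["b"])  # 2.0 3.0
```

With k=2, page "a" is scored from its second-closest seed (distance 2.0) rather than its closest one, which is presumably what makes a single friendly seed link less decisive. This naive version runs one search per seed, so I would not expect it to scale to a large seed set.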

The PageRank Updated patent is:

Producing a ranking for pages using distances in a web-link graph
Inventors: Nissan Hajaj
Assignee: Google LLC
US Patent: 9,953,049
Granted: April 24, 2018
Filed: October 19, 2015

Abstract

One embodiment of the present invention provides a system that produces a ranking for web pages. During operation, the system receives a set of pages to be ranked, wherein the set of pages are interconnected with links. The system also receives a set of seed pages which include outgoing links to the set of pages. The system then assigns lengths to the links based on properties of the links and properties of the pages attached to the links. The system next computes shortest distances from the set of seed pages to each page in the set of pages based on the lengths of the links between the pages. Next, the system determines a ranking score for each page in the set of pages based on the computed shortest distances. The system then produces a ranking for the set of pages based on the ranking scores for the set of pages.
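
The steps in the abstract can be sketched roughly as follows. This is only my illustration under stated assumptions: the link lengths are taken as given, distances are measured to the nearest seed with a single multi-source Dijkstra pass, and the conversion from distance to score (`exp(-distance)`) is a placeholder I picked, since the abstract only says the score is "based on" the distance.

```python
import heapq
import math


def distance_to_nearest_seed(graph, seeds):
    """Multi-source Dijkstra: every seed starts at distance 0."""
    dist = {seed: 0.0 for seed in seeds}
    heap = [(0.0, seed) for seed in seeds]
    heapq.heapify(heap)
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist[page]:
            continue  # stale queue entry
        for target, length in graph.get(page, []):
            nd = d + length
            if nd < dist.get(target, float("inf")):
                dist[target] = nd
                heapq.heappush(heap, (nd, target))
    return dist


def rank_pages(graph, seeds):
    """Return (pages ordered best first, {page: score})."""
    dist = distance_to_nearest_seed(graph, seeds)
    # Assumed scoring function: shorter distance means a higher score.
    scores = {page: math.exp(-d) for page, d in dist.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


# Hypothetical graph: page -> list of (linked page, link length).
graph = {
    "seed1": [("a", 1.0)],
    "seed2": [("b", 3.0)],
    "a": [("b", 1.0)],
}
ranking, scores = rank_pages(graph, ["seed1", "seed2"])
print(ranking)  # ['seed1', 'seed2', 'a', 'b']
```

Page "b" is reached more cheaply through "a" (total length 2.0) than directly from "seed2" (length 3.0), so it is scored from the shorter path. Pages that no seed can reach never enter the ranking in this sketch.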

Under this newer version of PageRank, we see how it might avoid manipulation by building trust into a link graph like this:

One possible variation of PageRank that would reduce the effect of these techniques is to select a few “trusted” pages (also referred to as the seed pages) and discovers other pages which are likely to be good by following the links from the trusted pages. For example, the technique can use a set of high quality seed pages ($s_1, s_2, \ldots, s_n$), and for each seed page $i = 1, 2, \ldots, n$, the system can iteratively compute the PageRank scores for the set of the web pages $P$ using the formulae:

$$\forall p \neq s_i \in P: \quad R_i(p) = 0.85 \times \sum_{q \to p} \frac{R_i(q)}{|q|} \times w(q \to p)$$

where $R_i(s_i) = 1$, and $w(q \to p)$ is an optional weight given to the link $q \to p$ based on its properties (with the default weight of 1).

Generally, it is desirable to use a large number of seed pages to accommodate the different languages and a wide range of fields which are contained in the fast-growing web contents. Unfortunately, this variation of PageRank requires solving the entire system for each seed separately. Hence, as the number of seed pages increases, the complexity of computation increases linearly, thereby limiting the number of seeds that can be practically used.
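
A small sketch may help show why the cost grows with the number of seeds. The update rule below is my assumption of what the per-seed computation looks like: the classic PageRank damping factor of 0.85, each page's score divided by its out-degree, an optional link weight defaulting to 1, and the seed's own score pinned at 1. The function names and the toy graph are hypothetical.

```python
def seeded_scores(links, seed, damping=0.85, iterations=50):
    """Iteratively compute scores relative to one seed page.

    links maps each page to a list of (target, weight) pairs.
    """
    pages = set(links) | {t for outs in links.values() for t, _ in outs}
    scores = {page: 0.0 for page in pages}
    scores[seed] = 1.0
    for _ in range(iterations):
        new_scores = {page: 0.0 for page in pages}
        for q, outs in links.items():
            if not outs:
                continue  # a page without outgoing links passes nothing on
            share = scores[q] / len(outs)  # score split over the out-degree
            for p, weight in outs:
                new_scores[p] += damping * share * weight
        new_scores[seed] = 1.0  # the seed's score stays fixed at 1
        scores = new_scores
    return scores


def scores_for_all_seeds(links, seeds):
    """One full iterative solve per seed, so work grows linearly with seeds."""
    return {seed: seeded_scores(links, seed) for seed in seeds}


# Hypothetical graph with two seeds.
links = {
    "seed1": [("a", 1.0), ("b", 1.0)],
    "seed2": [("b", 1.0)],
    "a": [("b", 1.0)],
    "b": [],
}
all_scores = scores_for_all_seeds(links, ["seed1", "seed2"])
print(all_scores["seed1"]["b"], all_scores["seed2"]["b"])
```

Every entry in `all_scores` is the result of a separate pass over the whole graph, which is the bottleneck the patent describes when it talks about the number of seeds being limited in practice.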

Hence, what is needed is a method and an apparatus for producing a ranking for pages on the web using a large number of diversified seed pages without the problems of the above-described techniques.

The summary of the patent describes it like this:

One embodiment of the present invention provides a system that ranks pages on the web based on distances between the pages, wherein the pages are interconnected with links to form a link-graph. More specifically, a set of high-quality seed pages are chosen as references for ranking the pages in the link-graph, and shortest distances from the set of seed pages to each given page in the link-graph are computed. Each of the shortest distances is obtained by summing lengths of a set of links which follows the shortest path from a seed page to a given page, wherein the length of a given link is assigned to the link based on properties of the link and properties of the page attached to the link. The computed shortest distances are then used to determine the ranking scores of the associated pages.
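
The patent mentions link properties such as the link's position, the link's font, and the source page's out-degree, but it does not give a formula for turning them into a length. The sketch below is purely hypothetical: the function name, the inputs, and every constant are invented just to show the idea that less prominent links on busier pages could be "longer," and that a path's distance is the sum of its link lengths.

```python
import math


def link_length(position, font_size, out_degree):
    """Hypothetical link length; none of these constants come from the patent.

    position:   0 for the first link on the source page, 1 for the next, ...
    font_size:  size of the link text in points
    out_degree: number of outgoing links on the source page
    """
    length = math.log2(1 + out_degree)  # busier pages give longer links
    length += 0.1 * position            # links lower on the page are longer
    if font_size < 10:
        length += 0.5                   # small print is penalized
    return length


def path_distance(link_lengths):
    """Distance along a path is the sum of the lengths of its links."""
    return sum(link_lengths)


# A two-link path from a seed page to some target page.
first = link_length(position=0, font_size=12, out_degree=1)
second = link_length(position=2, font_size=8, out_degree=3)
total = path_distance([first, second])
print(round(total, 2))  # 3.7
```

Whatever the real function looks like, the point from the summary is the same: a page several short, prominent links away from a seed can end up closer than a page one long, buried link away.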

The patent discusses the importance of a diversity of topics covered by seed sites, and the value of a large set of seed sites. It also gives us a summary of crawling and ranking and searching like this:

Crawling Ranking and Searching Processes

FIG. 3 illustrates the crawling, ranking and searching processes in accordance with an embodiment of the present invention. During the crawling process, a web crawler crawls or otherwise searches through websites on the web to select web pages to be stored in indexed form in a data center. In particular, the web crawler can prioritize the crawling process by using the page rank scores. The selected web pages are then compressed, indexed and ranked (using the ranking process described above) before being stored in the data center.

During a subsequent search process, a search engine receives a query from a user through a web browser. This query specifies a number of terms to be searched for in the set of documents. In response to the query, the search engine uses the ranking information to identify highly-ranked documents that satisfy the query. The search engine then returns a response through the web browser, wherein the response contains matching pages along with ranking information and references to the identified documents.
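
Here is a toy sketch of the two places where that passage says ranking scores get used: ordering the crawl, and ordering the results for a query. Everything in it is assumed for illustration, including the function names, the example URLs, the scores, and the very simple "every query term must appear in the page text" matching rule.

```python
import heapq


def crawl_order(frontier_scores):
    """Yield URLs from highest to lowest score (max-heap via negation)."""
    heap = [(-score, url) for url, score in frontier_scores.items()]
    heapq.heapify(heap)
    while heap:
        _, url = heapq.heappop(heap)
        yield url


def search(query, documents, scores):
    """Return documents containing every query term, best score first."""
    terms = query.lower().split()
    matches = [
        url
        for url, text in documents.items()
        if all(term in text.lower().split() for term in terms)
    ]
    return sorted(matches, key=lambda url: scores.get(url, 0.0), reverse=True)


# Hypothetical index and ranking scores.
documents = {
    "a.example": "seed pages and link graphs",
    "b.example": "link farms",
    "c.example": "link graphs explained",
}
scores = {"a.example": 0.4, "b.example": 0.9, "c.example": 0.7}

print(list(crawl_order(scores)))             # ['b.example', 'c.example', 'a.example']
print(search("link graphs", documents, scores))  # ['c.example', 'a.example']
```

Note that "b.example" has the highest score but does not show up for the query, because the score only orders pages that already satisfy the query.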

I’m thinking about looking up the many articles cited in the patent, and providing links to them because they seem to be tremendous resources about the Web. I’ll likely publish those soon.

65 thoughts on “PageRank Updated”

  1. Incredible post Bill. Much needed during these times of fluctuations and uncertainty. It really helps. Thank you 😀

  2. Thanks, David.

    I knew when I saw this one at the USPTO that I had to write about it, and give people a chance to see it, and read it. When I see a continuation patent, it’s an indication to me that the patent holder had a reason to update the claims to reflect a change in how they are being applied. I didn’t compare how they changed from the earlier version to the later version, but it’s probably worth spending some time on, and I will likely return to it to compare them.

  3. Your posts never cease to amaze me Bill, your dedication is unparalleled.

    I’d like to see those changes in versions too.

    Great stuff, thanks for posting.

  4. They didn’t stop using PageRank, Uninformed SEO. They just stopped publishing and updating the number your domain name currently has.

  5. Bill you continue to be a great source for Google news with your patent tracking, thanks for sharing this with us all!

    I also hope May grey isn’t so bad this year.

  6. Yes, that is true. But they had people spend time researching and creating the patent. And this was a continuation patent, which means they filed it before, and it was granted before, and they updated the claims to reflect changes in a process that may have been in place, or at least that they want to protect legally. So, even if they aren’t using the patented process, they are taking action to protect that process. It does appear that they are continuing to use PageRank, and the original PageRank patent from Stanford has expired – that doesn’t mean that they can’t use PageRank. They seem to have developed their own version with this original and updated patent.

  7. The PageRank toolbar was for Internet Explorer, and there wasn’t a version for newer browsers. When they didn’t update the toolbar for newer versions of Internet Explorer, that wasn’t a surprise. It was convenient being able to see the proxy numbers for PageRank for your site. But it definitely still exists, and is likely still being used.

  8. Hi Peter,

    Matt Cutts wrote a post about PageRank, where he said that it really changed a lot from when they first thought of it, to when they implemented it. I suspect most of the older code has been updated. The Web is a lot larger now than when Google first started out.

  9. This was a continuation patent, which means that it updates an earlier one, that was originally filed in 2006. As a continuation patent it takes the original filing date of the patent it updates (the one from 2006.)

  10. Hi Marco,

    They handpick the seed pages. The patent explains more about what they look for in those, and it provides examples, such as The New York Times, and the Google Directory (which is now closed.)

  11. Awesome! Thanks for getting this one Bill. Very interesting to see mentions of PageRank once again and definitely makes you wonder how Google has been using it behind the scenes…

  12. Hi Kieran,

    It’s good seeing some concepts get mentioned and addressed in patents that are just granted so recently. The mention of how the patent is aimed at issues like “link farms” was also interesting.

  13. Hi Grant,

    It is interesting to see that Google appears to have taken ownership and control over PageRank, now that Stanford’s patent on it has expired (and Google’s exclusive license to use that.) Seeing that they issued this patent and then have updated it with a continuation patent is interesting, too.

  14. Fantastic article Bill, and seems like huge news. (I was alerted to this by Roger’s excellent take on it in SEJ). I’ve been trying to read thru the patent, side-stepping the math equations (which are beyond me I’m afraid) but some of the details listed are fascinating. It mentions “assigning lengths to the links based on properties of the links”. I note that elsewhere in the paper it expands on these link “properties” as things such as “the link’s position, the link’s font, and the source page’s out-degree”. (I’m assuming out-degree refers to number of outgoing links, whereby the more outgoing links on the page, the less value is passed to any destination page receiving the link ? ). But it also refers to “properties of the pages attached to the links”. Are these “properties” to be considered as indicators of quality/relevancy both on-page and offsite (inbound links ), in which case are they using existing signals to determine this or is this something different (or new ?) I’d love to get your take on these points, Bill. Thanks!

  15. Hi Jim,

    Thanks. I noticed some of that language, and my thought was that the patent may be referring to properties about links described in the Google patent on their reasonable surfer model. I wrote about the latest version of that patent here: http://www.seobythesea.com/2016/04/googles-reasonable-surfer-patent-updated/ It would make a lot of sense if this patent was referring to those, since that one determines how much weight might be passed along from a link considering very similar properties (but not the lengths mentioned in this patent.)

  16. I am not sure where you are getting your information, but it is a good topic. I need to spend some time studying or understanding more. Thank you for the wonderful information; I was in search of this info for my mission.

  17. Hi renumongia

    I linked to the patent I wrote about. It is at the United States Patent and Trademark Office. 🙂

    Google files a number of patent applications with them every week, and every week some of the filings from Google get granted. I try to keep an eye out for granted patents and applications that I think are interesting.

  18. Thanks for the reply, Bill. Yes I remember now reading that Reasonable Surfer patent a while ago. Makes a lot of sense to think about it in terms of how clickable the links are on a page, and of course the anchor text will play a big role there.

  19. Hey Bill,
    Actually, read this the other day and per the usual content you put out, very useful and I personally believe a patent worth considering.

    Mainly wanted to say congratulations on the clear mention of your superior standings in the SEO community as mentioned by Barry Schwartz in his weekly video update at Search Engine Roundtable.

    We consider your site almost as much as the Google Guidelines.

    Everyone here at VGI is sending a very big thanks for all you shared and hopefully will for years to come.

    CHEERS BILL!

  20. Hi Garrett,

    I learn best when I’m trying to teach others. It is really nice to hear that you are getting value from my posts. Thanks for letting me know. 🙂

  21. Hi David,

    Thank you. I appreciate hearing what you are saying about the Google guidelines, and your thanks. That is very good to hear. 🙂

  22. Hi Jim,

    It can be interesting seeing how some of the ideas in these patents can fit together. The big picture is often made up of a lot of tiny pieces.

  23. Nice article, but I think PageRank is dead or kept hidden by Google. These days Domain Authority by Moz is a more popular ranking criterion, apart from Alexa’s page rank.

  24. New invention published in 2018, but actually filed in 2015, so has it been applied for a long time?
    I think it’s just a plus point, and the old PageRank algorithm has a higher coefficient, do not you think so?

  25. Hi Laevis,

    It takes a while for patents to be granted by the patent office, which can be a period of three years. That isn’t unusual.

    The old PageRank patent has expired, and can no longer be used to exclude other people from using PageRank. It was owned by Stanford University, and Google had an exclusive license to use PageRank. I’m not sure what you mean by a “higher coefficient”, but it is possible and likely that Google has moved on from a 20-year-old algorithm.

  26. Hi Cathy,

    Just because Google does not show a PageRank toolbar proxy that only worked in Internet Explorer does not mean that Google does not use PageRank. Google definitely does not use the Domain Authority metric from Moz, nor the Alexa ranking metric (which is not PageRank at all.) The Alexa ranking has been shown to be based on a self-selected group, and is often extremely biased.

  27. After exploring a number of the blog posts on your web page,
    I really appreciate your technique of writing a blog. I saved it to my bookmark site list and will be checking back soon. Take a look at my website as well and let me know what you think.

  28. Good to know that links are STILL important. All those apocalypSEOs saying that links will lose importance in ranking are thankfully off the mark. Thanks for this!

  29. Incredible post Bill. Much needed during these times of fluctuations and uncertainty. It really helps. Thank you 😀

  30. Nice blog. You are providing good information about commenting websites.
    Social media is very important for improving the branding of any product. The information given in this article is very good, and I am currently working on it.

  31. Incredible, mind blowing and fantastic post, Bill. I always got, your post is new with old or new update and this is one of them. On PageRank I read post a lot but i is special due to information. Thanks for sharing.

  32. Hi, I work on a digital marketing platform and I do off-page SEO for my website. This article helps me a lot to learn more about how to improve the page rank of a website on a search engine. So thank you very much for sharing this article.

  33. I’ve been looking for info on this topic for a while. I’m happy this one is so great. Keep up the excellent work.

  34. Hi Laevis,

    It takes a while for patents to be granted by the patent office, which can be a period of three years. That isn’t unusual. Incredible, mind blowing and fantastic post, Bill. I always got, your post is new with old or new update and this is one of them. On PageRank I read post a lot but i is special due to information. Thanks for sharing.

  35. Very helpful blogging. Already I’ve visited several links you provided in your list. All the links are very effective. I’d like to see more such beautiful links from you by this posting.

  36. This was really a very interesting & informative read. Back linking is indeed great for SEO and what can be a better way than doing it at well-known blog sites! And not only for SEO purposes, it also helps to build a connection between people belonging to the same industry. Thanks for the article!

  37. Hey Bill,

    Just wondering what your 3-5 most important ranking factors would be at the moment? I am more ecommerce focussed but (if you have time of course) it would also be great to know about your thoughts on local ranking factors as it is probably the most volatile in my experience anyway.

    I’ve had a great time reading your posts as well!

    Best Regards

    Mike

  38. Bill,

    I just found your blog today. And I just spent an hour reading & learning. What a fantastic resource!

    Hope this comment isn’t too far out of place: would you write a post about your favorite SEO tools? I think knowing which tools you prefer, and why, would be useful for me and, hopefully, other blog readers.

  39. Hi Stefan,

    My favorite SEO Tools are probably Screaming Frog Web Crawler, and Xenu Link Sleuth. I like both because they provide ways to crawl and understand the structure of a website. I’ve been enjoying the GSC and GA APIs built into Screaming Frog, to help make decisions about updates and changes to content on pages. Ultimately, those tools aren’t making decisions for you, but allow you to make good decisions regarding a site.

  40. Thank you Bill,
    I am a beginner at SEO. In this blog, I learned about PageRank. A really helpful post for those who are learning SEO.

  41. Good post Bill, just wondering what is the shortest distance? How do we know that?

  42. Hi Anvesh,

    The patent tells us about the length between pages and the seed sets of pages like this:

    To compute distances between pages in link-graph 100, we need to assign a “length” to every link. The length of a link can be a function of any set of properties of the link and the source of the link. These properties can include, but are not limited to, the link’s position, the link’s font, and the source page’s out-degree.

    They don’t spell that out for us exactly, but there does seem to be a method behind the approach they state.

  43. Very interesting read. Quite technical but I can see the close similarities between the original patent of 2006 and the recent 2018 one. Just a brief insight into the technicalities and challenges these web crawlers face as the internet grows larger and larger! But with their global dominance in English speaking countries Google surely has a good solution for ranking selections.

  44. The information you have given is very clear and knowledgeable. It is very helpful for freelance SEOs. Keep going Bill. Great work!!

  45. Hi, thanks for such useful information. It really helps show how important keyword ranking is in search results. I have been doing SEO for my website for the last 2 years, and after reading your article I’ll implement this strategy on my website.
