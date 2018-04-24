A popular search engine developed by Google Inc. of Mountain View, Calif. uses PageRank® as a page-quality metric for efficiently guiding the processes of web crawling, index selection, and web page ranking. Generally, the PageRank technique computes and assigns a PageRank score to each web page it encounters on the web, wherein the PageRank score serves as a measure of the relative quality of a given web page with respect to other web pages. PageRank generally ensures that important and high-quality web pages receive high PageRank scores, which enables a search engine to efficiently rank the search results based on their associated PageRank scores.
A continuation patent for an updated version of PageRank was granted today. The original patent was filed in 2006, and it reminded me a lot of Yahoo’s TrustRank (which the patent’s applicants cite as one of a large number of documents that this new version of the patent is based upon).
I first wrote about this patent in a post titled Recalculating PageRank. The first claim in that original 2006 filing read like this (note the mention of “Seed Pages”):
What is claimed is:
1. A method for producing a ranking for pages on the web, comprising: receiving a plurality of web pages, wherein the plurality of web pages are inter-linked with page links; receiving n seed pages, each seed page including at least one outgoing link to a respective web page in the plurality of web pages, wherein n is an integer greater than one; assigning, by one or more computers, a respective length to each page link and each outgoing link; identifying, by the one or more computers and from among the n seed pages, a kth-closest seed page to a first web page in the plurality of web pages according to the lengths of the links, wherein k is greater than one and less than n; determining a ranking score for the first web page from a shortest distance from the kth-closest seed page to the first web page; and producing a ranking for the first web page from the ranking score.
The first claim in the newer version of this continuation patent is:
What is claimed is:
1. A method, comprising: obtaining data identifying a set of pages to be ranked, wherein each page in the set of pages is connected to at least one other page in the set of pages by a page link; obtaining data identifying a set of n seed pages that each include at least one outgoing link to a page in the set of pages, wherein n is greater than one; accessing respective lengths assigned to one or more of the page links and one or more of the outgoing links; and for each page in the set of pages: identifying a kth-closest seed page to the page according to the respective lengths, wherein k is greater than one and less than n, determining a shortest distance from the kth-closest seed page to the page; and determining a ranking score for the page based on the determined shortest distance, wherein the ranking score is a measure of a relative quality of the page relative to other pages in the set of pages.
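The core step in both versions of the claim, finding the kth-closest seed page and using its shortest distance to the page, can be sketched in a few lines. This is only a minimal illustration of my reading of the claim language, not Google's implementation: the toy graph, the link lengths, and the function names are all invented, and it naively runs one shortest-path search per seed.

```python
import heapq

def shortest_distances(graph, source):
    """Dijkstra: shortest distance from `source` to every reachable page."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist.get(page, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for target, length in graph.get(page, {}).items():
            nd = d + length
            if nd < dist.get(target, float("inf")):
                dist[target] = nd
                heapq.heappush(heap, (nd, target))
    return dist

def kth_closest_seed_distance(graph, seeds, page, k):
    """Shortest distance to `page` from its k-th closest seed (k is 1-based)."""
    per_seed = sorted(
        shortest_distances(graph, seed).get(page, float("inf")) for seed in seeds
    )
    return per_seed[k - 1]

# Toy link graph: graph[source][target] = link length (all values invented).
graph = {
    "seed_a": {"p1": 1.0},
    "seed_b": {"p1": 2.0, "p2": 1.0},
    "seed_c": {"p2": 5.0},
    "p1": {"p2": 1.5},
    "p2": {},
}
seeds = ["seed_a", "seed_b", "seed_c"]

# n = 3 seeds and k = 2, so 1 < k < n as the claim requires.
print(kth_closest_seed_distance(graph, seeds, "p2", 2))
```

For page p2 the three seed distances are 1.0 (seed_b), 2.5 (seed_a, by way of p1) and 5.0 (seed_c), so the second-closest seed is seed_a. How that distance is then turned into a ranking score is not spelled out in the claim, so the sketch stops at the distance.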
Producing a ranking for pages using distances in a web-link graph
Inventors: Nissan Hajaj
Assignee: Google LLC
US Patent: 9,953,049
Granted: April 24, 2018
Filed: October 19, 2015
Abstract
One embodiment of the present invention provides a system that produces a ranking for web pages. During operation, the system receives a set of pages to be ranked, wherein the set of pages are interconnected with links. The system also receives a set of seed pages which include outgoing links to the set of pages. The system then assigns lengths to the links based on properties of the links and properties of the pages attached to the links. The system next computes shortest distances from the set of seed pages to each page in the set of pages based on the lengths of the links between the pages. Next, the system determines a ranking score for each page in the set of pages based on the computed shortest distances. The system then produces a ranking for the set of pages based on the ranking scores for the set of pages.
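The steps in the abstract (assign lengths, compute shortest distances from the seed set, rank by those distances) might look something like the sketch below. The length rule is a stand-in of my own: the abstract only says lengths depend on properties of the links and pages, so using the source page's out-degree as the length is an assumption, as is treating a shorter distance as a better rank. The graph is made up.

```python
import heapq

def assign_lengths(links):
    """Hypothetical rule: a link's length equals its source page's out-degree."""
    return {
        src: {dst: float(len(targets)) for dst in targets}
        for src, targets in links.items()
    }

def distances_from_seeds(graph, seeds):
    """Multi-source Dijkstra: distance from the nearest seed to each page."""
    dist = {seed: 0.0 for seed in seeds}
    heap = [(0.0, seed) for seed in seeds]
    heapq.heapify(heap)
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist.get(page, float("inf")):
            continue  # stale heap entry
        for target, length in graph.get(page, {}).items():
            nd = d + length
            if nd < dist.get(target, float("inf")):
                dist[target] = nd
                heapq.heappush(heap, (nd, target))
    return dist

# Toy link structure: page -> pages it links to (invented for illustration).
links = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d", "e"],
    "c": [],
    "d": [],
    "e": [],
}
seeds = ["seed"]

graph = assign_lengths(links)
dist = distances_from_seeds(graph, seeds)

# Assumption: pages closer to a seed rank higher; ties break alphabetically.
ranking = sorted((p for p in dist if p not in seeds), key=lambda p: (dist[p], p))
print(ranking)
```

Starting every seed at distance zero means one pass over the graph covers the whole seed set, however many seeds there are.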
The patent describes how this newer version of PageRank might resist manipulation by building trust into the link graph:
One possible variation of PageRank that would reduce the effect of these techniques is to select a few “trusted” pages (also referred to as the seed pages) and discovers other pages which are likely to be good by following the links from the trusted pages. For example, the technique can use a set of high quality seed pages ($s_1, s_2, \ldots, s_n$), and for each seed page $i = 1, 2, \ldots, n$, the system can iteratively compute the PageRank scores for the set of the web pages $P$ using the formulae:
$$\forall p \neq s_i \in P, \quad R_i(p) = d \sum_{q \to p} \frac{R_i(q)}{|q|} \, w(q \to p)$$
where $R_i(s_i) = 1$, and $w(q \to p)$ is an optional weight given to the link $q \to p$ based on its properties (with the default weight of 1).
Generally, it is desirable to use a large number of seed pages to accommodate the different languages and a wide range of fields which are contained in the fast growing web contents. Unfortunately, this variation of PageRank requires solving the entire system for each seed separately. Hence, as the number of seed pages increases, the complexity of computation increases linearly, thereby limiting the number of seeds that can be practically used.
Hence, what is needed is a method and an apparatus for producing a ranking for pages on the web using a large number of diversified seed pages without the problems of the above-described techniques.
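To see why the seed-by-seed variation quoted above gets expensive, here is a rough sketch of it. It assumes the usual PageRank shape: a damping factor `d` (0.85 is just a common textbook value), each page's score split across its outgoing links, the default link weight of 1, and the seed's own score pinned at 1. The graph and function name are invented.

```python
def seed_pagerank(links, seed, d=0.85, iterations=50):
    """Iterate seed-anchored scores: R(seed) = 1, other pages inherit from in-links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    scores = {page: 0.0 for page in pages}
    scores[seed] = 1.0
    for _ in range(iterations):
        new_scores = {page: 0.0 for page in pages}
        for q, targets in links.items():
            if not targets:
                continue  # pages without outgoing links pass nothing on
            share = d * scores[q] / len(targets)  # default weight w = 1
            for p in targets:
                new_scores[p] += share
        new_scores[seed] = 1.0  # the seed's score is fixed, not computed
        scores = new_scores
    return scores

# Toy link structure (invented): two seeds, two ordinary pages.
links = {
    "s1": ["a", "b"],
    "s2": ["b"],
    "a": ["b"],
    "b": [],
}
seeds = ["s1", "s2"]

# One full iterative solve per seed: the cost grows linearly with len(seeds).
per_seed = {seed: seed_pagerank(links, seed) for seed in seeds}
print(per_seed["s1"]["b"], per_seed["s2"]["b"])
```

The last dictionary comprehension is the whole problem in miniature: every extra seed means another complete pass over the link graph, which is the scaling issue the distance-based approach is meant to avoid.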
The summary of the patent describes it like this:
One embodiment of the present invention provides a system that ranks pages on the web based on distances between the pages, wherein the pages are interconnected with links to form a link-graph. More specifically, a set of high-quality seed pages are chosen as references for ranking the pages in the link-graph, and shortest distances from the set of seed pages to each given page in the link-graph are computed. Each of the shortest distances is obtained by summing lengths of a set of links which follows the shortest path from a seed page to a given page, wherein the length of a given link is assigned to the link based on properties of the link and properties of the page attached to the link. The computed shortest distances are then used to determine the ranking scores of the associated pages.
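Written out in symbols, the distance described in that summary is an ordinary shortest-path sum. The notation here is my own shorthand rather than the patent's: $\ell(u \to v)$ is the length assigned to the link from $u$ to $v$, and $\Pi(s, p)$ is the set of link paths from seed $s$ to page $p$.

```latex
D(s, p) = \min_{\pi \in \Pi(s, p)} \; \sum_{(u \to v) \in \pi} \ell(u \to v)
```

As a made-up example, if a seed links to page $a$ with length 2, $a$ links to $p$ with length 1.5, and the seed also links directly to $p$ with length 4, then $D(s, p) = \min(2 + 1.5,\ 4) = 3.5$.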
The patent discusses the importance of a diversity of topics covered by seed sites, and the value of a large set of seed sites. It also gives us a summary of crawling, ranking, and searching like this:
Crawling Ranking and Searching Processes
FIG. 3 illustrates the crawling, ranking and searching processes in accordance with an embodiment of the present invention. During the crawling process, web crawler 304 crawls or otherwise searches through websites on web 302 to select web pages to be stored in indexed form in data center 308. In particular, web crawler 304 can prioritize the crawling process by using the page rank scores. The selected web pages are then compressed, indexed and ranked in 305 (using the ranking process described above) before being stored in data center 308.
During a subsequent search process, a search engine 312 receives a query 313 from a user 311 through a web browser 314. This query 313 specifies a number of terms to be searched for in the set of documents. In response to query 313, search engine 312 uses the ranking information to identify highly-ranked documents that satisfy the query. Search engine 312 then returns a response 315 through web browser 314, wherein the response 315 contains matching pages along with ranking information and references to the identified documents.
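Both uses of the ranking scores described above, prioritizing the crawl and ordering search results, can be sketched briefly. Everything here is assumed for illustration: the URLs, the scores, the tiny term index, and the idea that a page "satisfies" a query when it contains every query term.

```python
import heapq

def crawl_order(frontier_scores):
    """Visit URLs highest score first, using a max-heap built from negated scores."""
    heap = [(-score, url) for url, score in frontier_scores.items()]
    heapq.heapify(heap)
    order = []
    while heap:
        _, url = heapq.heappop(heap)
        order.append(url)
    return order

def search(index, scores, query_terms):
    """Return pages containing every query term, best ranking score first."""
    terms = {term.lower() for term in query_terms}
    matches = [url for url, words in index.items() if terms <= words]
    return sorted(matches, key=lambda url: scores[url], reverse=True)

# Invented ranking scores and term index for three pages.
scores = {"a.example": 0.9, "b.example": 0.4, "c.example": 0.7}
index = {
    "a.example": {"pagerank", "patent"},
    "b.example": {"pagerank", "seed", "pages"},
    "c.example": {"seed", "pages", "patent"},
}

print(crawl_order(scores))
print(search(index, scores, ["seed", "pages"]))
```

The same score table drives both functions, which mirrors how the passage has the crawler and the search engine both leaning on the ranking information.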
I’m thinking about looking up the many articles cited in the patent, and providing links to them, because they seem to be tremendous resources about the Web. I’ll likely publish those soon.
Comments
Interesting content. How will they decide which Seed pages to use, or will they be hand picked?
Incredible post Bill. Much needed during these times of fluctuations and uncertainty.
Bill says
Thanks, David.
I knew when I saw this one at the USPTO that I had to write about it, and give people a chance to see it, and read it. When I see a continuation patent, it’s an indication to me that the patent holder had a reason to update the claims to reflect a change in how they are being applied. I didn’t compare how they changed from the earlier version to the later version, but it’s probably worth spending some time on, and I will likely return to it to compare them.
But we've always known the closer you are to a trusted source – the better. I wonder why they only applied for the patent in 2015.
Thanks for the updated information on PageRank, Bill!
Perfect post. Keep sharing such posts.
Really appreciate the work that you do on your site Bill!
Interesting, just makes me wonder how much of the old code from 1999 is still running today.
But Bill, they turned off Page Rank years ago, didn't ya hear?!

– Yours, Uninformed SEO
Your posts never cease to amaze me Bill, your dedication is unparalleled.
I’d like to see those changes in versions too.
Great stuff, thanks for posting.
They didn't stop using PageRank, Uninformed SEO. They just stopped publishing and updating the number your domain name currently has.
Bill you continue to be a great source for Google news with your patent tracking, thanks for sharing this with us all!
I also hope May grey isn’t so bad this year.
Because they Patent it, does not mean they use the patent.
Kudos Bill! Fantastic post!
Bill Slawski says
Yes, that is true. But they had people spend time researching and creating the patent. And this was a continuation patent, which means they filed it before, and it was granted before, and they updated the claims to reflect changes in a process that may have been in place, or at least that they want to protect legally. So, even if they aren’t using the patented process, they are taking action to protect that process. It does appear that they are continuing to use PageRank, and although the original PageRank patent from Stanford has expired, that doesn’t mean that they can’t use PageRank. They seem to have developed their own version with this original and updated patent.
Bill Slawski says
Thank you, Andrew.
I hope the weather turns out fine here. So far it has been pretty nice.
Bill Slawski says
The PageRank toolbar was for Internet Explorer, and there wasn’t a version for newer browsers. When they didn’t update the toolbar for newer versions of Internet Explorer, that wasn’t a surprise. It was convenient being able to see the proxy numbers for PageRank for your site. But PageRank itself definitely still exists, and is likely still being used.
Bill Slawski says
Thank you, Graham.
Bill Slawski says
Hi Colin,
I believe I have seen a few people making that claim.
Bill Slawski says
Hi Peter,
Matt Cutts wrote a post about PageRank, where he said that it really changed a lot from when they first thought of it, to when they implemented it. I suspect most of the older code has been updated. The Web is a lot larger now than when Google first started out.
Bill Slawski says
This was a continuation patent, which means that it updates an earlier one that was originally filed in 2006. As a continuation patent, it takes the original filing date of the patent it updates (the one from 2006).
Bill Slawski says
Hi Marco,
They handpick the seed pages. The patent explains more about what they look for in those, and it provides examples, such as The New York Times and the Google Directory (which is now closed).
Awesome! Thanks for getting this one Bill. Very interesting to see mentions of PageRank once again and definitely makes you wonder how Google has been using it behind the scenes…
Bill, you gave me in-depth knowledge about updated Page Rank.
amazing stuff you explained. thank you 🙂
"All pages are equal, but some pages are more equal than others."
Cheers Bill
Bill Slawski says
Hi Kieran,
It’s good to see these concepts mentioned and addressed in a patent that was granted so recently. The mention of how the patent is aimed at issues like “link farms” was also interesting.
Bill Slawski says
Hi Grant,
It is interesting to see that Google appears to have taken ownership and control over PageRank, now that Stanford’s patent on it (and Google’s exclusive license to use it) has expired. Seeing that they were granted this patent and then updated it with a continuation patent is interesting, too.
Fantastic article Bill, and seems like huge news. (I was alerted to this by Roger’s excellent take on it in SEJ). I’ve been trying to read thru the patent, side-stepping the math equations (which are beyond me I’m afraid) but some of the details listed are fascinating. It mentions “assigning lengths to the links based on properties of the links”. I note that elsewhere in the paper it expands on these link “properties” as things such as “the link’s position, the link’s font, and the source page’s out-degree”. (I’m assuming out-degree refers to number of outgoing links, whereby the more outgoing links on the page, the less value is passed to any destination page receiving the link ? ). But it also refers to “properties of the pages attached to the links”. Are these “properties” to be considered as indicators of quality/relevancy both on-page and offsite (inbound links ), in which case are they using existing signals to determine this or is this something different (or new ?) I’d love to get your take on these points, Bill. Thanks!
Bill Slawski says
Hi Jim,
Thanks. I noticed some of that language, and my thought was that the patent may be referring to the link properties described in Google’s reasonable surfer patent. I wrote about the latest version of that patent here: http://www.seobythesea.com/2016/04/googles-reasonable-surfer-patent-updated/ It would make a lot of sense if this patent was referring to those, since the reasonable surfer model determines how much weight might be passed along by a link based on very similar properties (but not the lengths mentioned in this patent).
I am not sure where you are getting your information, but it is a good topic. I need to spend some time studying it more to understand it better.
Bill Slawski says
Hi renumongia
I linked to the Patent I wrote about. It is at the United States Patent and Trademark Office. 🙂
Google files a number of patent applications with them every week, and every week some of the filings from Google get granted. I try to keep an eye out for granted patents and applications I think are interesting.