If you were asked to point out the patent that describes PageRank, and you went searching at the US Patent and Trademark Office (USPTO), you might quickly get confused. A little more confusion comes today with the granting of a new patent on PageRank to Stanford University. However, I’ve also located the very first PageRank patent, which I haven’t seen anywhere other than in the USPTO information retrieval system.
Many related patents by Lawrence Page addressing different aspects of PageRank were filed in the late 90s, followed by a stream of continuation patents that updated the originals. Many of the patents either claim priority over earlier patents or state that they are continuations of some of the earlier ones.
The First PageRank Patent
The earliest filing was for a provisional patent (application number 60035205), which was filed on January 10, 1997, but never officially assigned or published. The patent office information retrieval system contains a document it describes as “Miscellaneous Incoming Letter,” which holds the provisional patent filing, titled Improved Text Searching in Hypertext Systems (pdf – 1.7MB), and an appendix describing the processes being applied for. It is highly recommended reading if you’re interested in the history of PageRank and Google.
Here’s a snippet from the First PageRank Patent filing:
Existing search engines on the web produce very poor results when the query matches many documents. Yet, these simple queries are very frequently issued by users.
Described here is a system that yields radically improved results for these queries using the additional information available from a large web link database. This web citation database is used to determine a citation importance ranking for every web page, which is then used to sort the query results.
This system has been implemented and yields excellent results, even on a relatively small database of four million web pages. Not only does the system yield better results, but it does so at a significantly reduced computational cost, which can be a huge expense for web search engines.
Demonstrating the improvement is as easy as picking a general query. For example, choose “weather” and compare the results to the results from a traditional web search engine, like AltaVista (the results section shows some sample queries).
After the Provisional First PageRank Patent
On January 9, 1998, a new patent was filed, Method for node ranking in a linked database (US Patent 6,285,999). It claimed priority over that provisional patent filing.
That patent filing was updated with Method for node ranking in a linked database (US Patent 7,058,628), originally filed on July 2, 2001.
Another patent filing a few days later, on July 6th, 2001, Method for scoring documents in a linked database (US Patent 6,799,176) isn’t mentioned in the newest patent to be granted. Still, it is related and notes that it is a continuation of the first provisional patent and U.S. Pat. No. 6,285,999.
Scoring documents in a linked database (US Patent 7,269,587), filed on December 1, 2004, claims to be a continuation of US Patent 7,058,628.
Finally, Stanford University was granted a patent today titled Annotating links in a document based on the ranks of documents pointed to by the links, which is a continuation of this line of PageRank patents.
The claims in the patent are written very differently than in some of the earlier patent filings. However, they cover substantially the same ground as many of the earlier versions. A difference involves the annotation of links being pointed to in the document.
This is the part of the claims that describes how it might annotate a link on a page:
24. The method of claim 18, where annotating the one or more links includes: associating an icon or text indicative of the one or more determined ranks with the one or more links.
25. A method performed by a computer, the method comprising: determining, by the computer, a rank for each of a plurality of documents in a database, the documents including linking documents and linked documents, one of the linking documents including a link to one of the linked documents; annotating, by the computer, the link in one of the linking documents, based on the determined rank of one of the linked documents, to form a modified document; and providing, by the computer, the modified document to a user.*
26. The method of claim 23, where annotating the link includes: associating an indicator of the determined rank with the link within one of the linking documents
* My emphasis.
I’m not sure if we will start seeing Google annotating links with the PageRank for the pages those links point to any time in the future, but that seems to be the major difference between this patent and the earlier ones.
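To make the claim language a little more concrete, here is a minimal Python sketch of the annotation step: look up a rank for each linked document and attach a text indicator to the link, producing a modified document. The rank table, the `annotate_links` function, the example URLs, and the `[rank N]` indicator format are all assumptions made for illustration. The patent doesn't specify any of them, and the regular expression only handles very simple anchor tags.

```python
import re

# Hypothetical rank table: URL -> rank on a 0-10 scale (made-up values).
RANKS = {
    "http://example.com/a": 7,
    "http://example.com/b": 2,
}

# Matches only simple anchors of the form <a href="...">text</a>.
LINK_RE = re.compile(r'<a\s+href="([^"]+)">(.*?)</a>', re.IGNORECASE | re.DOTALL)


def annotate_links(html, ranks):
    """Return a modified copy of html with a rank indicator after each known link."""

    def add_indicator(match):
        url = match.group(1)
        rank = ranks.get(url)
        if rank is None:
            # Links whose target has no known rank are left untouched.
            return match.group(0)
        return f'{match.group(0)} <span class="rank">[rank {rank}]</span>'

    return LINK_RE.sub(add_indicator, html)


page = (
    '<p>See <a href="http://example.com/a">page A</a> and '
    '<a href="http://example.com/c">page C</a>.</p>'
)
modified = annotate_links(page, RANKS)
print(modified)
```

An icon, a color, or bolder text could stand in for the bracketed number just as easily; the claims only call for some indicator of the determined rank.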
The Newer PageRank Patent
The claims in the newest version of Stanford’s PageRank patent seem to explain PageRank in plainer and more understandable language than the earlier patents. Still, I’m not sure if we will see the annotation system that it describes in action.
The most interesting discovery for me in researching this newest patent was the letter that I linked to, which contains the very first PageRank provisional patent, Improved Text Searching in Hypertext Systems. I hadn’t seen that before. I’m not sure if it’s available anywhere else yet other than here and in the USPTO information retrieval system.
Google always says that Page Rank doesn’t matter but I still believe it clearly does matter. They might display a different PR in their toolbar but behind the scenes they must use it. Thanks for digging this stuff up, Bill.
Bill, awesome finding as usual.
I do NOT believe they will annotate the actual “pagerank number” but anything related to it in a graphical format, be it printing stronger ones also bolder, or the weak ones in a lighter grey on white? This could be some part of a “google browser” (think Chrome) that does reformat pages based on their importance.
CU in NYC?
Best, Christoph
With the granting of the new patent to Stanford, it will be interesting to discover if Google returns to a more consistently updating toolbar PageRank. If Google goes back to updating every three months, next update would be right around the Federal Income Tax filing extended deadline of April 18.
Nice article. I love looking back in history at various passions of mine including technology. It is so easy to forget how fast this field has moved. The field of math (another passion) moves like a snail compared to technology. I agree with Nathan that PageRank must have some value. Google would probably eliminate the whole exercise if the value was zero.
Pagerank still exists but IMO meanwhile PR is just a sum of a lot of calculations that include the number of incoming and outgoing links on different weighted pages, that partly influence rankings (less than before).
Even more calculations are going on behind google’s closed doors that are not visible in the visible PR, but still influence rankings.
Hi Nathan
You’re welcome. I’m not sure if I can remember a direct statement from anyone at Google that said that PageRank didn’t matter anymore, but I’ve seen many that said that it’s often overrated by the public. It is only one of a large number of signals that Google is using these days.
Chances are that it does play a role in things like decisions of which URLs to crawl first when a search crawler ventures out on the Web. It also likely continues to have a role in ranking pages, along with a lot of other signals. The toolbar representation of PageRank is often dated and something that Google seems slow to update, but I’m guessing that a lot of people who see the toolbar pagerank make value judgments about pages they are on based upon the toolbar pagerank.
Hi Christoph,
Thank you. I’m wondering if the addition of the annotations in the claims section of the new patent was intended to make the patent different enough from the previous ones so that it would be granted, but it was an addition that likely won’t be implemented. The claims in the new patent are written in a manner that is much clearer than in the previous patents, and I think that may have been the best reason for publishing this “continuation” patent.
No NYC for me this year, but if you’re going I hope that you have a great time and learn a lot while you’re there.
Hi Randy,
I’ve always been at a loss to understand why Google seems to update Toolbar pagerank as infrequently as they do. If the pagerank in the toolbar is intended as an “annotation” showing the value of a page, then it’s updated so infrequently that it doesn’t hold as much value as it could.
I’ve also been trying to uncover what effect, if any, the publication of this patent might have on Google’s exclusive right to use pagerank. It’s possible that there may have been some further negotiation on that point that we just don’t know anything about.
Hi Allen,
The history of search and technology is pretty fascinating, and I was pretty excited to have found and read that first PageRank patent. Larry Page and Google have come a long way since 1997.
As for the toolbar, I can’t help but wonder if it has more value as a marketing tool these days than as an annotation system. I know people can get pretty obsessive over the pagerank that shows for their pages in the toolbar, and it can sometimes be useful as a signal that pages have been viewed by the search engine and indexed, but since it’s updated so infrequently, it doesn’t seem to be a very accurate indication of the value of a page. Since it’s also likely only concerned with one aspect of Google’s rankings of pages – the strength of links pointing to those pages – it’s also often a confusing signal. A high pagerank for a page isn’t, and has never been, an indication of how well that page might rank for different terms because it doesn’t take into account a large number of other ranking signals.
Hi Andreas
Very good points. I agree with you that much of what goes on at Google to rank pages for specific queries includes many more calculations than just PageRank. I think that it still has value in the way that Google ranks pages, but it’s only the tip of the iceberg (to use a slightly different analogy).
Great find Bill, that provisional filing is right up there with Linus Torvalds’s Usenet posting way back then:
https://groups.google.com/group/comp.os.minix/browse_thread/thread/76536d1fb451ac60/b813d52cbc5a044b?hl=en
Definitely one for the history books!
– Ted
Hi Ted,
That’s a great thread. Thanks for pointing it out.
I’m not sure how I’ve missed Larry Page’s provisional patent for PageRank all these years. Guess I didn’t do enough digging around in the patent office’s information system, but then again, no one else seems to have done so either. It’s free of the legalese of most patents, and does a great job of letting us see the person behind the invention in his own words. Doesn’t sound like a lawyer had any part in drafting the document either.
Good find, although I do find this particular patent a little bit confusing. Lines like the following don’t make for a very intuitive read: “A method may identify a document that includes a link that points to a linked document, determine a score for the link in the identified document based on a score of the linked document, modify the identified document based on the determined score, and provide the modified document.”
From my superficial understanding, it would seem that these additions are a bit superfluous. Who would these annotations serve? And for what purpose? PageRank is very pertinent to those of us who’re involved in the digital industry (although it’s probably a system we’ve got a disproportionate awareness of) but why would anyone else need this kind of information?
@ Christoph C. Cemper
I’m pretty sure any kind of note that is used to clarify meaning for a reader in their process of reading a text is still considered annotation (technically highlighting and underlining are considered annotation). I don’t think they’ll display any kind of actual textual commentary either, but I think graphical annotation could be equally powerful for users and webmasters if some kind of key is provided.
I love seeing these patents. It always gets me excited for these new features that are set to come out…and with the language of a patent you never quite know what form the feature will actually take.
I am not so sure about what Andreas said, that PR is just a sum of a lot of calculations.
Just look at all .gov and .edu sites.
Try to imagine that large and very old network (we know how Google adores old pages) with a lot of pages linking to each other, all with high PR, and tell me again that PR isn’t important.
And for Randy Pickard:
“If Google goes back to updating every three months, next update would be right around the Federal Income Tax filing extended deadline of April 18.”
My new site and one page went from PR n/a to PR zero yesterday, so I think that PR updates are counted in real time and there are no updates every 3 months.
I’m sure having old pages and lots of visitors is what Google loves most, but who can be sure since they have chosen to stay pretty quiet on that front.
I believe PageRank remains a central factor in the algorithm. It is, after all, THE *quantitative* measurement Google uses to ascertain the importance or value of links. And we know links are critical if a site is to rank well in the SERPs. Google’s spam team has spent a lot of time trying to downplay the importance of PageRank, saying, repeatedly, it’s 1 of more than 200 factors. Google opened this can of worms when they released a toolbar that revealed PR; they’ve been backpedaling ever since. It may be one of 200 factors, but it is in the top 5, IMO. Slawski, as always, you rock.
Toolbar pagerank is a complete red herring for SEOs to argue endlessly over. Pagerank is central to Google’s algorithm, and is clearly updated on a continuous basis.
Bill – you are a great researcher!
In the debate over the importance of toolbar pagerank, here’s my take. Toolbar rank isn’t everything, but it isn’t nothing. I’m sure that the actual mathematical formulas Google uses to rank pages are much more complicated than I could compute. What toolbar pagerank tells us is how strong a website’s inbound link network is. I think it’s valuable to know that.
Hi, Bill,
I do believe Cutts has said there is a direct correlation between PageRank and crawl rate, which you mention, but I wonder – some lower toolbar PR blog sites certainly get crawled frequently, but for static sites, it would seem there is a relationship. Does that jibe with what you believe or know?
Hi Twosteps
The claims sections of patents can make a native speaker of the language they are written in wonder if it is indeed the same language.
The language in the newest pagerank patent is a lot clearer than most of the earlier pagerank patents (except for the very first one). Instead of referring to web pages as “nodes,” it actually calls them web pages. Instead of referring to links as “edges,” it calls them links. There’s definitely a reason why patents include description sections, to fill out and provide examples of what is written in the claims. Unfortunately, the descriptions can sometimes be pretty cryptic as well.
Annotations would be for the benefit of people finding links to pages in the search results, and if the right kind of information is included in those, they would have the potential to be pretty useful. One of my more recent posts, How a Search Engine Might Visualize and Rerank Web Pages Based Upon Credibility, covers how a Microsoft Research team explores what kind of information they might collect to provide annotations in search results based upon credibility signals.
Would those be helpful? I’m not sure. I guess it depends upon what kinds of signals they might use.
Hi FinallyFast,
Very good point on how things like bolding and highlighting in search results can be considered annotation. To some degree, the snippet that a search engine shows searchers is also an annotation.
I love seeing these kinds of patents, too. I really like getting a chance to see the possibility of some new addition to how search might work, and some hints and ideas of why these changes might take place as well.
Hi Peter,
Agree with your points about PageRank calculations. One of the original ideas behind PageRank was that if you start at one page on the Web, what is the probability that you would end up at another specific page on the Web. While it is just a series of calculations, the intent behind it is what makes it interesting.
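The probability idea above can be sketched as a small power-iteration loop in Python. This is only an illustration under stated assumptions: the three-page graph is made up, the `pagerank` function name is mine, and the damping value of 0.85 is the figure commonly cited for the original algorithm rather than anything taken from the patents discussed here.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank for a small graph.

    links maps each page to the list of pages it links to. The result maps
    each page to the probability that a random surfer is on that page.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with the surfer equally likely to be on any page.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps to a random page.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Otherwise the surfer follows one of the page's links at random.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Made-up graph: A links to B and C, B links to C, C links back to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
scores = pagerank(graph)
for page, score in sorted(scores.items()):
    print(page, round(score, 3))
```

In this toy graph, C ends up with a higher score than B because it receives links from both A and B, while B is only linked to from A.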
It’s hard to say exactly what the schedule is behind the updates of the toolbar PageRank. Members of forums like Webmaster World tend to keep pretty good track of large-scale changes to the PageRank that shows up on toolbars, and if you follow their reporting of changes to the toolbar, it does seem like the toolbar pagerank is updated very infrequently.
In a January 2007 blog post, Matt Cutts notes that a number of different features from Google tend to be updated on a quarterly basis:
We know that Google was updating actual PageRank for pages on a much more frequent basis, and updates to information about web pages in Google’s index have been happening much more quickly. From the same 2007 blog post:
I’ve seen some toolbar pagerank changes happen at times when no one else is reporting changes like those, but I suspect that the majority of pages only have their Toolbar PageRank updated 3-4 times a year.
Hi Craig,
Then again, sometimes Google really loves fresh new content that describes something very recent that people are doing a lot of searches for. 🙂
Hi M.J.,
It’s possible that PageRank is one of the most important signals that Google uses, even if they are looking at many more signals than that. There’s no denying that one of the differences between most electronic databases and the Web is that the Web enables people to link to other pages, and that those linking patterns can influence which pages people visit, and what they see on the Web.
I’m not sure that Google could have grown the way that it had without making PageRank public. It was one of the things that set it apart from search engines like AltaVista and Excite and Lycos. If Google hadn’t included PageRank in the toolbar, chances are they would have included it in search results, like they’ve done in the past with listings in the Google Directory.
Hi Ryan,
Unfortunately, I’ve seen a good number of blog and forum posts and articles about actual pagerank devolve into arguments over the worthlessness of toolbar pagerank in followup comments and forum responses. Not an avenue that I want to see followed here. 🙂
Actual PageRank may influence more than rankings, including which pages get crawled on a website and how frequently, and which pages are filtered out of search results when there are two or more pages that may contain near duplicate content.
Hi Juliemarg,
Thank you. There’s definitely a difference between what’s shown in the toolbar for PageRank, and what the actual PageRank for a page might be. What’s interesting about both the first PageRank patent and the newest one is that both mention how it might be used as an annotation in search results, much like the toolbar pagerank can be seen as an annotation regarding pages it shows pagerank for. Unfortunately, I don’t think it’s very good as an annotation because it’s not updated on the same schedule as actual PageRank. I also question its use as an annotation because so many other factors may play a role in how good a page might be for the query used to find it.
Surely high traffic and good content is what counts the most. It stands to reason that pages which have had longer to establish themselves will have had more time to increase PR, but it’s not guaranteed. I have always found that useful content and high traffic has been the biggest helper.
Hi M.J.,
Crawl rate is most likely influenced by a combination of factors, which includes things like PageRank and how frequently pages might be updated. Some sites might get crawled more frequently because they are linked to from other sites that tend to get crawled frequently as well.
A static site might not get crawled as frequently, even if it has a high PageRank if nothing much changes on the pages of the site. But, if the site does possess some content that might change on a regular basis – every four hours, daily, weekly, etc., even though it’s not a blog, it might get crawled more frequently. Those changes might include updated testimonials, news, featured products, products on sale, and other “changes.” So frequency of change is another metric that plays a role alongside pagerank in determining how often a page might be recrawled.
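As a rough illustration of how two signals like these might be combined, here is a minimal Python sketch that orders pages for recrawling. Everything in it is an assumption made for the example: the equal weights, the 0-to-1 PageRank scale, the cap on change frequency, the function names, and the sample sites. Nothing here reflects how Google actually schedules crawls.

```python
import heapq


def crawl_priority(pagerank, changes_per_day, rank_weight=0.5, change_weight=0.5):
    """Blend a 0-1 PageRank-like score with how often a page changes.

    The weights are hypothetical. Change frequency is capped at one change
    per day so that a very busy page cannot swamp the link-based signal.
    """
    capped_changes = min(changes_per_day, 1.0)
    return rank_weight * pagerank + change_weight * capped_changes


def crawl_order(pages):
    """Return URLs from highest to lowest crawl priority.

    pages is a list of (url, pagerank, changes_per_day) tuples.
    """
    # heapq is a min-heap, so priorities are negated to pop the largest first.
    heap = [(-crawl_priority(rank, changes), url) for url, rank, changes in pages]
    heapq.heapify(heap)
    order = []
    while heap:
        _, url = heapq.heappop(heap)
        order.append(url)
    return order


# Made-up sites: a static page with strong links, a frequently updated blog
# with weaker links, and a shop that changes its featured products every other day.
pages = [
    ("static-high-pr.example", 0.9, 0.0),
    ("blog-low-pr.example", 0.3, 1.0),
    ("shop-mid-pr.example", 0.6, 0.5),
]
order = crawl_order(pages)
print(order)
```

With these made-up numbers the frequently updated blog is scheduled ahead of the static page despite its lower PageRank, which is the behavior described above.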
Hi Scott
Google tells us that there are over 200 signals that they look at that determine how highly pages might rank for query terms. We can say that things like “good content” and “high traffic” might be part of those, but how does a computer program determine what “good content” is, or “high traffic”? It can’t, without breaking those down into smaller parts that it can measure and compare those measurements with other sites.
I think Google is not only about relevancy but also fast search results; adding more information next to their listings might not be what they’re looking for, at least not this year (in my opinion)
P.S. I salute the Firefox initiative to make their browser load faster with each upgrade. Are you guys using the latest version 4? 🙂
Google now goes on quality; however, PR is still important. I’m searching for a tool which can calculate all your IBLs and give you some clue how to proceed with building PR. If someone finds one, I’d like to be informed.
Bill, awesome work. Given that I search a lot over the net for high quality articles and that most are just selling you ****, I can say that you are part of the 5% of quality posts made.
I can hardly wait for your next article.
That’s a great point Bill thanks for the heads up. I will certainly continue to add new pages all the time
Hi Codrut,
The speed of returning search results seems to have always been an important part of what Google offers. What’s really amazing is the number of computers that information about a search will go through before you have results for a query. I’ve seen it reported that 700-800 computers are involved in answering a query.
I’ve been using FireFox 4 for the last week or so. It is much faster than before.
Hi Robert
Thanks for your kind words. It’s hard to build the kind of tool that you’re suggesting using inbound links because we can’t be sure of the links that Google knows about, and how much value they give to each of them.
Hi Craig,
I’m not sure that adding new pages on a regular basis by itself is helpful. It may be more helpful to spend more time creating useful/helpful/informative pages that attract readers and keep them on your pages longer than it is to just grind out something new more frequently.
Hi Bill,
Are backlinks the only factor that decides PageRank? Do design and content quality also help Google in determining the PageRank of a particular page?
Hi Max,
PageRank as it was originally conceived on paper focused upon just links, though it’s likely evolved in a number of ways, possibly even when it was first implemented. I wrote about the reasonable surfer model last year, and how features associated with links might potentially influence how much PageRank each might pass along. Those features can include things like what part of a page a link might appear within (heading of page, sidebar, footer, etc.), the size and color and style of the font used in the link (and if it differs from the fonts around it), whether the anchor text used in a link is a good match for the topic expressed on the page it appears upon or the page that it points to, and more.
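One way to picture that idea is a page dividing the PageRank it passes along in proportion to a weight on each link, instead of splitting it evenly. The short Python sketch below assumes such weights already exist. The `distribute_pagerank` function, the weight values, and the example link names are hypothetical, and how a search engine would actually derive weights from link features isn't something this sketch tries to model.

```python
def distribute_pagerank(page_rank, link_weights):
    """Split the rank a page passes along in proportion to per-link weights.

    link_weights maps each outgoing link to a made-up weight standing in for
    how likely a reader is to follow that link.
    """
    total = sum(link_weights.values())
    if total == 0:
        # No link is expected to be followed, so nothing is passed along.
        return {url: 0.0 for url in link_weights}
    return {url: page_rank * weight / total for url, weight in link_weights.items()}


# Hypothetical weights: a prominent link in the main content counts for more
# than a sidebar link, which counts for more than a footer link.
link_weights = {
    "main-content-link.example": 3.0,
    "sidebar-link.example": 1.0,
    "footer-link.example": 0.5,
}
passed = distribute_pagerank(0.9, link_weights)
for url, amount in passed.items():
    print(url, round(amount, 3))
```

Setting every weight to the same value reduces this to the even split of the original PageRank description.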
Thanks for this post. I find it funny that Google states PR doesn’t matter. I mean, sure, a PR1 site may be able to outrank a PR3 site but that’s really difficult to do unless the PR3’s content is not as relevant. To outrank a PR3 one will definitely need more backlinks, preferably of high quality, thus nominating for a higher PR with the next update. So PR may not *directly* count but it does seem to me that it’s a very strong indicator no matter what Google Says. In the long term that is.
Hi George,
I’m not sure that I’ve seen anyone from Google say that PageRank doesn’t matter, but I have seen more than a couple of Google Employees state that too many people place too much emphasis upon it. It still has its uses, but it’s only one of a large number of ranking signals, and if you pay attention only to PageRank, without being concerned about the others, that might not help you much.
Bill, you haven’t heard anyone from Google say it doesn’t matter, but the propaganda is strong enough that many people believe it no longer matters. I see it all the time on the forums, and I spend a lot of time trying to explain that it matters still. Yes, it’s one of 200, but I think it’s a very important one of 200. As George says, a PR1 page will not often outrank a higher PR page unless that higher PR page is not as relevant. I’m just saying …
Hi MJ,
That’s one of the reasons why I try to treat most propaganda with some skepticism. For years, people have been obsessing over PageRank to the point where they ignore other aspects of ranking pages, and representatives from Google have been stating publicly that they should consider those other aspects of how Google ranks pages as well.
Here are some of those statements:
Google Research Head Peter Norvig:
http://www.theregister.co.uk/2010/03/03/google_research_head_norvig_on_pagerank/
Matt Cutts on Peter Norvig’s statement:
http://www.webpronews.com/google-may-start-calling-pagerank-something-else-2010-03
Matt Cutts on PageRank, in a post on PageRank Sculpting:
http://www.mattcutts.com/blog/pagerank-sculpting/
Google Technology Overview
http://www.google.com/about/corporate/company/tech.html
Google’s Susan Moskwa
http://googlewebmastercentral.blogspot.com/2011/06/beyond-pagerank-graduating-to.html
PageRank is still used as a ranking signal by Google, and it’s still significant, but there are also many other signals that Google uses as well. Google may also look at PageRank to determine a crawling priority for pages, and possibly for other uses as well.
To draw an analogy to legal evidence, a piece of evidence that a lawyer is trying to enter into a case might be relevant but not material, like the character testimony of a kindergarten teacher for someone on trial for assault on the day of their 40th birthday. The character testimony is definitely relevant, but it’s just not that important since it’s so old. PageRank is similar to how material something is – how much weight it should carry. A link from a PageRank 1 page probably doesn’t convey as much weight or materiality (or importance) as a link from a PageRank 10 page.
Wonderful references, Bill. I must add this to the posts and articles I have on PageRank. Thank you.
One of the ways I like to explain it is to say that PageRank is simply a *quantitative* calculation of the importance of a page as opposed to a qualitative factor such as anchor text.
Was PageRank the be-all, end all? It did seem to be the central concept upon which Google’s original algorithm was based – and not much more than a year ago, the page http://www.google.com/corporate/tech.html read:
“PageRank Technology: PageRank reflects our view of the importance of web pages by considering more than 500 million variables and 2 billion terms. Pages that we believe are important pages receive a higher PageRank and are more likely to appear at the top of the search results.”
They have changed that now and it reads:
“When Google was founded, one key innovation was PageRank, a technology that determined the “importance” of a webpage by looking at what other pages link to it, as well as other data. Today we use more than 200 signals, including PageRank, to order websites, and we update these algorithms on a weekly basis. For example, we offer personalized search results based on your web history and location.”
Is PageRank less important because there are 200 more factors? Or because they say it isn’t quite so central anymore?
I don’t think so, Bill. I think Google protests too much. If PageRank weren’t important, they wouldn’t strip it manually from pages they find selling links. They wouldn’t be so terribly concerned about links sales, period.
Here’s my bottom line: you can have the most relevant content, but without links, what visibility do you have?
Yes, they have added more than 200+
I agree with Bill, the main thing about Page Rank is the links on the page. Page Rank should not be used as a highly important metric for ranking, but it should be used to find links with more weight than others. To me, I only think about page rank when finding link prospects, and not as an on-page metric on my own site for ranking.
Now, excuse me for not being a SEO history buff (I haven’t been in the industry as long as some of you guys), but has Page Rank ever had that be-all and end-all characteristic? I’m curious to know.
Hi Jonathan,
One of the reasons that I published this post is because the patent that I’ve copied and linked to really isn’t given the credit that it deserves as the first patent specifically about PageRank. It does seem pretty optimistic about PageRank being almost everything. But it wasn’t – relevance for the content on the page or at least in titles in the very early days, and for anchor text in links pointing to the pages not too long after were still just as important.
I’m not saying that PageRank is unimportant. It is a highly important metric for page ranking. But it’s just one aspect of ranking, sort of like a pyramid has a width aspect and a height aspect and a depth aspect.
It’s just that the other signals the search engines use are important too. People have been obsessing over PageRank over the years while not giving other signals credit as well.
Hi MJ,
Thanks. Another way to explain PageRank is as a “query independent” ranking factor, where a page has the same PageRank score regardless of the query being searched for. Matching terms on a page, or having anchor text pointing to that page is “query dependent” because it does matter what the query is. The query independent and query dependent scores are used together when ranking a page.
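Here is a minimal Python sketch of that split. The page's PageRank value stays the same whatever the query is, while the term-matching score changes with each query, and the two are blended into one ranking score. The equal weighting, the simple term-overlap measure, the function names, and the two sample pages are all assumptions for illustration, not a description of how Google combines its signals.

```python
def query_dependent_score(query, text):
    """Fraction of query terms that appear in the page text (a toy relevance measure)."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set(text.lower().split())
    matched = sum(1 for term in terms if term in words)
    return matched / len(terms)


def combined_score(query, page, weight=0.5):
    """Blend the query-independent score with the query-dependent one.

    page["pagerank"] is the same for every query; the text match is not.
    The 50/50 weight is a made-up value.
    """
    relevance = query_dependent_score(query, page["text"])
    return weight * page["pagerank"] + (1.0 - weight) * relevance


def rank_pages(query, pages):
    """Return pages sorted from best to worst combined score for the query."""
    return sorted(pages, key=lambda page: combined_score(query, page), reverse=True)


# Two made-up pages with fixed, query-independent PageRank values.
pages = [
    {"url": "a.example", "pagerank": 0.8, "text": "local weather forecast and radar"},
    {"url": "b.example", "pagerank": 0.9, "text": "patent filings and search history"},
]
weather_results = [page["url"] for page in rank_pages("weather", pages)]
patent_results = [page["url"] for page in rank_pages("patent", pages)]
print(weather_results)
print(patent_results)
```

The page with the higher PageRank only comes first for the query it is actually relevant to, which is why a high PageRank alone says little about how a page ranks for a given term.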
@Bill – I struggle a bit with these patents.
Do you mean a link would get a score based on where it links to?
A = 1 -> B=3
A= NEW SCORE
25. ……..one of the linking documents including a link to one of the linked documents; annotating, by the computer, the link in the one of the linking documents, based on the determined rank of the one of the linked documents, to form a modified document; and providing, by the computer, the modified document to a user.*
Hi JC,
All of those patents have parts that are somewhat obtuse and difficult to understand.
There are some parts of the newest patent that do say something like that, such as the possibility that if a link is from one page on a domain to another page on the same domain, the link might not pass along as much PageRank. That’s where the patent says the following: