10 Most Important SEO Patents: Part 1 – The Original PageRank Patent Application

I like looking at patents and whitepapers and other primary sources from search engines to help me in my practice of SEO. I’ve been writing about them for more than 5 years now, and I’m putting together this series of the 10 Most Important SEO Patents to share some of what I’ve learned during that time. These aren’t patents about SEO, but rather ones that I would recommend to anyone interested in learning more about SEO by looking at patents from sources like Google or Microsoft or Yahoo.

The first PageRank patent application was never published by the United States Patent and Trademark Office (USPTO), never assigned to a particular company or organization, and never granted. It avoids the dense legal language and mathematics that can make patents difficult to read, and it captures the excitement of a Ph.D. candidate, Larry Page, who had just come up with a breakthrough in indexing web pages that had the potential to be a vast improvement over the other search engines of the time.

The top of the cover letter for the provisional patent filing for PageRank.

The patent application is Improved Text Searching in Hypertext Systems (PDF, 1.7 MB), Provisional Patent Application number 60/035,205, filed on January 10, 1997. I was digging through the USPTO’s Patent Information Retrieval Database this past March when I came across it, and I wrote about it in a blog post titled The First PageRank Patent and the Newest. I hadn’t seen it referred to anywhere else on the Web, which I think is a shame.

This provisional patent may not have the weight or legal value of the continuation patents that followed it, but it captures the excitement and personality of its inventor, Larry Page, in a manner that those patents missed. It also provides head-to-head examples of search results from both Google and AltaVista for specific queries to illustrate how the link analysis involved in what Page was doing with PageRank made a difference.

One of the other interesting aspects of the patent is its display of “PageRank” next to individual pages in the search results themselves. This isn’t the Toolbar PageRank that we see these days, but actual PageRank numbers. It also shows the number of backlinks for each page in the results. For example, on a search for [University], the top-ranked page is the homepage of the University of Illinois at Urbana-Champaign, with a PageRank of 694.687 and 8,460 backlinks. At the time, the search engine returning these results was called “Backrub” rather than Google.

The introduction and summary in the patent tell us, in part, about the invention:

Described here is a system which yields radically improved results for these queries using the additional information available from a large database of web links. This database of Web citations is used to determine a citation importance ranking for every web page, which is then used to sort the query results.

This system has been implemented, and yields excellent results, even on a relatively small database of four million web pages.

Not only does the system yield better results, but it does so at significantly reduced computational cost, which can be a very large expense for web search engines.

Demonstrating the improvement is as easy as picking a general query, for example “weather”, and comparing the results to the results from a traditional web search engine, like AltaVista (the results section shows some sample queries).
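To make the idea in that passage concrete, here is a minimal Python sketch, not taken from the patent, of how a precomputed citation importance score might be used to sort the pages that match a query, with a backlink count shown next to each result the way the Backrub results displayed one. The page names, links, page text, and scores are all hypothetical toy inputs, and the naive word match stands in for whatever text matching a real engine would use.

```python
from collections import Counter

# Hypothetical toy link graph: (linking page, linked page) pairs.
LINKS = [
    ("a.example", "uni.example"),
    ("b.example", "uni.example"),
    ("c.example", "uni.example"),
    ("a.example", "weather-blog.example"),
    ("uni.example", "weather-blog.example"),
]

# Hypothetical precomputed citation importance scores (e.g. from PageRank).
SCORES = {
    "uni.example": 0.42,
    "weather-blog.example": 0.31,
    "a.example": 0.09,
    "b.example": 0.09,
    "c.example": 0.09,
}

# Hypothetical page text used for naive term matching.
TEXT = {
    "uni.example": "university weather station and campus news",
    "weather-blog.example": "daily weather reports",
    "a.example": "links about weather",
    "b.example": "a personal page",
    "c.example": "another personal page",
}


def backlink_counts(links):
    """Count how many links point at each page (missing pages count as 0)."""
    return Counter(target for _source, target in links)


def search(query, text, scores, links):
    """Return (page, score, backlinks) for pages containing every query word,
    sorted by citation importance score, highest first."""
    terms = query.lower().split()
    backlinks = backlink_counts(links)
    matches = [page for page, body in text.items()
               if all(term in body.lower().split() for term in terms)]
    matches.sort(key=lambda page: scores.get(page, 0.0), reverse=True)
    return [(page, scores.get(page, 0.0), backlinks[page]) for page in matches]


if __name__ == "__main__":
    for page, score, count in search("weather", TEXT, SCORES, LINKS):
        print(f"{page}  rank={score:.2f}  backlinks={count}")
```

Sorting purely by the link-based score is a simplification for illustration; a real engine would likely combine it with text relevance and other signals.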

PageRank turned out to be a significant improvement over the algorithms used by other search engines, and Google was granted an exclusive license from Stanford University to use the technology until 2011. I don’t know whether that license was ever extended, but Microsoft, Yahoo, and other organizations have filed, and been granted, patents that build upon PageRank in a number of ways.

It’s also very likely that the original PageRank algorithm was altered and improved upon almost immediately after implementation, and I’ll be pointing to at least one set of improvements later in this series of the 10 Most Important SEO Patents.

In addition to reading patents about PageRank, it’s also worth looking at some of the early papers about it, such as The Anatomy of a Large-Scale Hypertextual Web Search Engine, penned by Google founders Page and Brin, and The PageRank Citation Ranking: Bringing Order to the Web.

As with many patents, it’s not unusual to see other patents spawn from a single filing as continuation or divisional patents that either include the original, supersede it, or take one aspect of it and build upon it. Larry Page authored a number of PageRank patents, assigned to Stanford University, that built upon and added to this original application, as follows:

Method for node ranking in a linked database
Invented by Lawrence Page
Assigned to The Board of Trustees of the Leland Stanford Junior University
US Patent 6,285,999
Granted September 4, 2001
Filed: January 9, 1998

Abstract

A method assigns importance ranks to nodes in a linked database, such as any database of documents containing citations, the world wide web or any other hypermedia database. The rank assigned to a document is calculated from the ranks of documents citing it. In addition, the rank of a document is calculated from a constant representing the probability that a browser through the database will randomly jump to the document. The method is particularly useful in enhancing the performance of search engine results for hypermedia databases, such as the world wide web, whose documents have a large variation in quality.
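The abstract describes a rank built from two parts: rank flowing in from citing documents, and a constant for the chance that a surfer jumps to a page at random. One way to write that, where B1 … Bn are the pages linking to A, |Bi| is the number of links on page Bi, N is the total number of pages, and α is the random-jump probability, is r(A) = α/N + (1 − α) × (r(B1)/|B1| + … + r(Bn)/|Bn|). The Python sketch below computes this by repeated iteration. The function name, the default α of 0.15, and the choice to spread the rank of pages with no outgoing links evenly across all pages are my assumptions for illustration, not details from the abstract.

```python
def pagerank(links, alpha=0.15, iterations=200, tol=1e-12):
    """Iteratively compute r(A) = alpha/N + (1 - alpha) * sum(r(B) / outdegree(B))
    over all pages B that link to A.

    links: dict mapping each page to a list of the pages it links to.
    alpha: probability of a random jump (0.15 is an assumed, commonly used value).
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    if n == 0:
        return {}
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page gets its share of the random-jump probability.
        new_rank = {p: alpha / n for p in pages}
        # Assumption: rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        if dangling:
            for p in pages:
                new_rank[p] += (1 - alpha) * dangling / n
        # Each page passes the rest of its rank, split evenly, to the pages it cites.
        for source, targets in links.items():
            if targets:
                share = (1 - alpha) * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        converged = sum(abs(new_rank[p] - rank[p]) for p in pages) < tol
        rank = new_rank
        if converged:
            break
    return rank


if __name__ == "__main__":
    # A hypothetical four-page web: C is cited by A, B, and D; nothing links to D.
    web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.4f}")
```

In this toy graph, page C is cited by three of the other pages and ends up with the highest rank, while page D, which nothing links to, keeps only its random-jump share of α/N.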

Continuation of 6,285,999:

Method for scoring documents in a linked database
Invented by Lawrence Page
Assigned to The Board of Trustees of the Leland Stanford Junior University
US Patent 6,799,176
Granted September 28, 2004
Filed: July 6, 2001

Abstract

A method is presented for scoring documents stored in a network. The method includes identifying links from linking documents to linked documents in the network and determining an importance of the identified links. The method further includes weighting the identified links based on the determined importance and scoring the linked documents based on the weighted links.
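This continuation shifts the emphasis to weighting individual links by their importance. The abstract doesn’t say how that importance is determined, so in this minimal Python sketch the link weights are hypothetical inputs (they might reflect something like a link’s position on a page, but that is only an assumption). Each document passes its score to the documents it links to in proportion to those weights, combined with the random-jump constant described in the earlier abstract. The jump probability of 0.15, the requirement that weights be non-negative, and the handling of documents without outgoing links are also assumptions.

```python
def weighted_scores(weighted_links, alpha=0.15, iterations=200):
    """Score documents from weighted links.

    weighted_links: dict mapping each linking document to a dict of
    {linked document: non-negative link weight}. The weights are
    hypothetical inputs; how they are determined is left open here.
    alpha: assumed random-jump probability.
    """
    docs = set(weighted_links)
    for targets in weighted_links.values():
        docs.update(targets)
    docs = sorted(docs)
    n = len(docs)
    if n == 0:
        return {}
    score = {d: 1.0 / n for d in docs}

    for _ in range(iterations):
        new = {d: alpha / n for d in docs}
        # Assumption: documents with no positively weighted outgoing links
        # spread their score evenly over all documents.
        dangling = sum(score[d] for d in docs
                       if sum(weighted_links.get(d, {}).values()) <= 0)
        if dangling:
            for d in docs:
                new[d] += (1 - alpha) * dangling / n
        for source, targets in weighted_links.items():
            total = sum(targets.values())
            if total <= 0:
                continue
            for target, weight in targets.items():
                # Each link carries a share of the source's score proportional to its weight.
                new[target] += (1 - alpha) * score[source] * weight / total
        score = new
    return score


if __name__ == "__main__":
    # Hypothetical weights: the hub's link to "main" counts three times as much as its link to "footer".
    graph = {"hub": {"main": 3.0, "footer": 1.0},
             "main": {"hub": 1.0},
             "footer": {"hub": 1.0}}
    for doc, s in sorted(weighted_scores(graph).items(), key=lambda kv: -kv[1]):
        print(f"{doc}: {s:.3f}")
```

With equal weights on every link, this reduces to the unweighted calculation from the original patent, since each link then carries 1/outdegree of its source’s score.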

Continuation of 6,285,999:

Method for node ranking in a linked database
Invented by Lawrence Page
Assigned to The Board of Trustees of the Leland Stanford Junior University
US Patent 7,058,628
Granted June 6, 2006
Filed: July 2, 2001

Abstract

A method assigns importance ranks to nodes in a linked database, such as any database of documents containing citations, the world wide web or any other hypermedia database. The rank assigned to a document is calculated from the ranks of documents citing it. In addition, the rank of a document is calculated from a constant representing the probability that a browser through the database will randomly jump to the document. The method is particularly useful in enhancing the performance of search engine results for hypermedia databases, such as the world wide web, whose documents have a large variation in quality.

Continuation of 7,058,628:

Scoring documents in a linked database
Invented by Lawrence Page
Assigned to The Board of Trustees of the Leland Stanford Junior University
US Patent 7,269,587
Granted September 11, 2007
Filed: December 1, 2004

Abstract

A method assigns importance ranks to nodes in a linked database, such as any database of documents containing citations, the world wide web or any other hypermedia database. The rank assigned to a document is calculated from the ranks of documents citing it. In addition, the rank of a document is calculated from a constant representing the probability that a browser through the database will randomly jump to the document.

Continuation of 7,058,628:

Annotating links in a document based on the ranks of documents pointed to by the links
Invented by Lawrence Page
Assigned to The Board of Trustees of the Leland Stanford Junior University
US Patent 7,908,277
Granted March 15, 2011
Filed: February 5, 2007

Abstract

A method may identify a document that includes a link that points to a linked document, determine a score for the link in the identified document based on a score of the linked document, modify the identified document based on the determined score, and provide the modified document.
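The abstract describes four steps: identify a link in a document, score the link from the score of the document it points to, modify the document based on that score, and provide the modified document. Here is a minimal Python sketch of those steps under a few assumptions of my own: the link’s score is simply the linked document’s score, the modification is a hypothetical data-score attribute added to the link, and the scores come from a plain dictionary. The regular expression handles only simple anchor tags with double-quoted href values; it is not a full HTML parser.

```python
import re

# Matches an opening anchor tag with a double-quoted href (simplified, not a full HTML parser).
ANCHOR = re.compile(r'<a\s+([^>]*?)href="([^"]+)"([^>]*)>', re.IGNORECASE)


def annotate_links(html, scores):
    """Return a copy of html in which each link to a scored document carries
    a data-score attribute holding that document's score.

    scores: dict mapping URLs to scores (hypothetical input).
    Links to unscored documents are left unchanged.
    """
    def add_score(match):
        before, url, after = match.groups()
        if url not in scores:
            return match.group(0)
        # The tag is rebuilt in lowercase with the score appended as a new attribute.
        return f'<a {before}href="{url}"{after} data-score="{scores[url]:.2f}">'

    return ANCHOR.sub(add_score, html)


if __name__ == "__main__":
    page = '<p>See <a href="https://a.example/">A</a> and <a href="https://b.example/">B</a>.</p>'
    # Hypothetical scores for the linked documents.
    print(annotate_links(page, {"https://a.example/": 0.42, "https://b.example/": 0.07}))
```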

These weren’t the only patents based in part upon the PageRank algorithm. Yahoo’s Pavel Berkhin provides an excellent overview of some of the patents and papers that followed it in his paper A Survey on PageRank Computing. He was also a co-inventor on a Yahoo patent that I wrote about in Yahoo Replaces PageRank Assumptions with User Data, which includes a number of thoughtful criticisms of PageRank. The patent at the heart of that post is User-sensitive pagerank.

If you want to dig even more deeply into PageRank, and the approaches that followed it, the book Google’s PageRank and Beyond: The Science of Search Engine Rankings by Amy N. Langville and Carl D. Meyer is worth spending some time on.

Conclusion

A few notes on this series. I called this the “10 Most Important SEO Patents” rather than the “10 Most Important Search Patents” because for a few years people have been asking me to point out the patents they should read that might be most helpful in their practice of SEO. I’ve made a number of lists over that time, and found that it was easy to come up with the first 5 or so, but the last 5 proved considerably more elusive.

I’ve now nailed down at least the top 7 that I would recommend, and I’m hoping that by the time I reach number 8, I’ll have some idea of the last three that I want to include in this list. Of course, I’m open to suggestions and to hearing from readers of this series which patents they would recommend, as well as questions about these patents themselves.

There are other ways to learn about SEO, and nothing rivals hands-on experience building and optimizing web pages, but looking at patents and papers from the people who build search engines provides a window into the challenges they’ve faced, the assumptions they’ve made, and the ambitions they hold. Gaining the perspective of search engineers on how they intended search engines to work is invaluable to those who practice search engine optimization.

I also want to note that it’s important not to place too much faith in any one patent, or in the methods it describes, as an account of how search engines actually work today. A patent describes only an approach that a search engine might take, and what the search engine actually builds may change in practice.

Like PageRank, an algorithm may transform over time as it is implemented and tweaked by a search engine. Look at these patents for the assumptions search engineers have made about the Web, about search, and about searchers. As you read them, come up with questions that you can ask yourself and others. Look for ways to experiment with the ideas within them as well.

Note that PageRank is only one of many signals that Google uses at this time, and that the search engine is exploring the use of many other signals on a regular basis. It was first introduced back in 1997, almost 15 years ago. But it had a tremendous impact in its day, and likely continues to be an important part of how Google ranks Web pages.

All parts of the 10 Most Important SEO Patents series:

Part 1 – The Original PageRank Patent Application
Part 2 – The Original Historical Data Patent Filing and its Children
Part 3 – Classifying Web Blocks with Linguistic Features
Part 4 – PageRank Meets the Reasonable Surfer
Part 5 – Phrase Based Indexing
Part 6 – Named Entity Detection in Queries
Part 7 – Sets, Semantic Closeness, Segmentation, and Webtables
Part 8 – Assigning Geographic Relevance to Web Pages
Part 9 – From Ten Blue Links to Blended and Universal Search
Part 10 – Just the Beginning


29 thoughts on “10 Most Important SEO Patents: Part 1 – The Original PageRank Patent Application”

  1. Most people still have the conception that the pagerank of 1997 is still the pagerank of today, and therefore isn’t relevant. I can’t imagine how many iterations they have made to this algorithm over the years, and the hundreds (possibly thousands) of patents that have come with it. Look forward to the rest of this series Bill, and glad you finally got around to it.

  2. Awesome! Bill, I will be watching this series very closely. Especially if you write it.

    With respect to this post and PageRank, it is interesting to note (if I understand this correctly) that what originally started as a “linked database” heat-map of sorts was eventually scaled up to include the entire web itself.

    Pretty smart I think, but a metric that will be losing steam in the future as Social Rank has more and more of a say.

    Hey, that could be the next PageRank, SocialRank, and it could be a little blue bar next to the green one… ;)

    Mark

  4. I still think PageRank is one of the best search patents ever. It revolutionized the way search engines operate. Though PageRank can be manipulated to a certain extent (for instance my personal site has a PR 7) it is still a pillar of search engine evolution.

  5. Bill, one thing I would love to see from you is something explaining the patent process. I know they prevent lawsuits, but it seems really strange and not as set in stone as I imagine it. What makes a company want to patent something? When should a company go about it? How does the process work/why does it take so long? What are the implications of these patents on a SE such as Bing – does it cripple them from developing an even-field search engine or am I misinterpreting what they do? Would love to read a post (perhaps you’ve done one?) from that perspective. Thanks!

  6. Great! PageRank revolutionized the world of search engines and the whole web.
    And even if other signals come up, I think PageRank will stay important.

  7. This patent is really the one that every SEO should read. It is clear and explains in great detail the PageRank algorithm, its amazing simplicity, and how difficult it is to calculate. I had never read the patent, only the associated academic paper, so thank you for this wonderful (re)discovery!

  8. Thanks for sharing, Bill. I think PR is still very important for ranking.

  9. @Nicolas definitely agree, most advanced SEOs have a tendency to dog PR and other dated metrics, but it’s still very useful today.

  10. Hi Keith,

    I’m not sure I can even imagine how many changes PageRank went through in its first year of implementation, but I imagine there were more than a few. The Stanford patents after the first provisional patent even suggest doing things like giving different weights to links based upon different features associated with them.

    You have me curious:

    A search for granted patents that include the word PageRank = 213
    A search for published pending patent applications with the word PageRank = 543

    Thanks for your kind words about this series. It’s been fun so far. :)

  11. Hi Mark,

    Thank you. I’m not sure that I’ve heard the linked database heatmap story, but it’s possible that Page was looking at a number of different ideas. Back then the Web was nowhere near as large as it is now. I believe he refers in the provisional patent to having about 4 million pages indexed. There are websites larger than that now.

    Google has been telling us over the years that they are looking at more and more signals to rank pages, and the number that I’ve heard most frequently from them lately has been “more than 200”. It’s possible that they could be looking at many more. Microsoft claimed in one paper from around 2005 that they were using almost 600 different signals in their RankNet algorithm.

    I do expect some social/credibility/author rank to be added at some point to Google rankings. It’s possible that I may be discussing some aspect of that in this series. :)

  12. Hi Ryan,

    PageRank added an intelligent element to how pages were ranked. It may not have been perfect, but in many ways it was an improvement over how other search engines were ranking pages. I’m not sure that it’s going away anytime soon, but we are definitely going to see Google add more signals in how they rank pages.

    I’ve been getting a kick out of the following paper for a few years:

    Online Reputation Systems: The Cost of Attack of PageRank

    I think the author overestimates how effective PageRank is as a ranking signal that’s difficult to manipulate, but the thesis that search engines will evolve in ways that do make them harder for people to manipulate is a good one.

  13. Hi Jessica,

    That sounds like a good topic for a blog post. I’ve answered a lot of questions like the ones that you’re asking in comments, but it would be nice to have a page to point people towards when I do get questions about the patent process.

    One thing that I do see with search engines and patents is that while the search engines often share common objectives, they often find different ways to do very similar things. Part 3 of this series, on how search engines might segment content on web pages, includes links to patents from Microsoft, Google, and Yahoo that describe different approaches with similar results.

  14. Hi Claude,

    I do think we will see PageRank around for a while, and it’s likely that Google is also using it in other ways, such as using it as an importance metric when they decide which pages to crawl and recrawl, but Google is definitely working on other ranking signals as well.

  15. Hi Nicolas,

    I agree completely. Anyone just starting to learn about SEO should learn as much about PageRank as possible. The provisional patent application is much more readable and easier to understand than the granted patents, and possibly even the early papers on PageRank.

  16. Hi Keith,

    I think it’s helpful for people learning about SEO to study PageRank, not only because it’s still being used, but also because it’s a great example of an algorithm that had a real impact on the way pages have been ranked for years. If you have a sense of how PageRank works, and even a little bit about how it has evolved over the years, it can give you some ideas about how to understand other ranking algorithms.

  17. Hi Nicomp,

    Yes, I posted a link to that paper in the post, but at the Stanford page:

    In addition to reading patents about PageRank, it’s also worth looking at some of the early papers about it as well, such as the one penned by Google Founders Page and Brin, The Anatomy of a Large-Scale Hypertextual Web Search Engine, and The PageRank Citation Ranking: Bringing Order to the Web.

    The PageRank Citation paper is definitely worth a good look as well.


Comments are closed.