Google Granted Patent on Brin Rank

Note: This is an April Fools’ Day post. It plays on the fact that Google’s PageRank algorithm was named after Google founder Larry Page, and that there isn’t an equivalent Google algorithm named after co-founder Sergey Brin. With the exception of the link to the “Brin Rank” patent below, all of the links in this post are legitimate. The post speculates about what a ranking algorithm from Sergey Brin might look like, based upon his history of research and upon the increasing use of user-behavior data that Google appears to be making, judging from the whitepapers and patents it has published in the past few years.

Google finds itself in an interesting predicament with one of the core aspects of its search technology, PageRank, falling out of exclusive control later this year. Fortunately, Google was granted a new patent this week that looks like it contains a substitute that overcomes some of the weaknesses of Lawrence Page’s search innovation of the 90s.

PageRank was predicated on the assumption that the existence of links between pages on the Web was a signal that could be used to sort and rank pages on the Web, scoring pages on an importance scale based upon the links they received from “important” pages. An “important” page is one that has links to it from other important pages. As inventor Larry Page noted in the first PageRank Patent, Improved Text Searching in Hypertext Systems:

The reasons why my system works so well, is that it decides which documents to return, and in what order, by using an approximation to how well cited, or “important” the matching documents are. I will call this approximation to importance PageRank from now on. Web pages get a higher PageRank from being mentioned on other pages. But, the PageRank a page gains from a citation is based on the PageRank of the page that cites it. This definition may sound circular because it is in fact circular.

No More Random Surfers

The original PageRank algorithm was based upon the movements of a “Random Surfer” who might visit a page and randomly follow any link on that page to another, with a 15% chance that he or she might instead type a new address into the browser and go somewhere else. The PageRank for a page is the probability that someone who starts anywhere on the Web and follows links in random surfer style ends up at that specific page.
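The random surfer description can be sketched in a few lines of Python. This is only a minimal illustration of the textbook model, not Google’s implementation: the toy link graph, the function name, and the choice to spread the rank of pages without outgoing links evenly across all pages are assumptions made for the example. The 15% jump chance appears as a damping factor of 0.85.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate random-surfer probabilities by repeated redistribution of rank.

    links maps each page to the list of pages it links to (hypothetical data).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # The 15% "type in a new address" chance, shared equally by all pages.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # The surfer follows each link on the page with equal probability.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no links sends the surfer anywhere.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Toy web, invented for illustration only.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 4))
```

In this toy graph, page C collects links from every other page and ends up with the highest score, while page D, which nothing links to, keeps only its share of the random jumps.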

But as Google’s Webspam head, Matt Cutts, noted in a blog post on PageRank Sculpting, that Random Surfer Model was in jeopardy even in the early days of the company:

Disclaimer: Even when I joined the company in 2000, Google was doing more sophisticated link computation than you would observe from the classic PageRank papers. If you believe that Google stopped innovating in link analysis, that’s a flawed assumption. Although we still refer to it as PageRank, Google’s ability to compute reputation based on links has advanced considerably over the years.

By 2004, the Random Surfer was likely unemployed, with a Reasonable Surfer taking his place, as described in a Google patent titled Ranking documents based on user behavior and/or feature data. The patent was granted in May of 2010, and I wrote about it in Google’s Reasonable Surfer: How the Value of a Link May Differ Based upon Link and Document Features and User Data.

As we’re told in that patent:

Systems and methods consistent with the principles of the invention may provide a reasonable surfer model that indicates that when a surfer accesses a document with a set of links, the surfer will follow some of the links with higher probability than others.

This reasonable surfer model reflects the fact that not all of the links associated with a document are equally likely to be followed. Examples of unlikely followed links may include “Terms of Service” links, banner advertisements, and links unrelated to the document.
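One way to picture the difference is to give each link a weight and let the surfer follow links in proportion to those weights. The sketch below is an assumption-laden illustration rather than the method from the patent: the weights, the toy site, and the function name are all hypothetical, and the patent does not publish specific numbers.

```python
def reasonable_surfer_rank(weighted_links, damping=0.85, iterations=50):
    """Like the random surfer, but links are followed in proportion to a weight.

    weighted_links maps each page to {target: weight}; the weights are
    hypothetical stand-ins for whatever link features a real system might use.
    """
    pages = sorted(set(weighted_links)
                   | {t for out in weighted_links.values() for t in out})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            out = weighted_links.get(page, {})
            total = sum(out.values())
            if total > 0:
                for target, weight in out.items():
                    # A heavier link receives a larger share of the page's rank.
                    new_rank[target] += damping * rank[page] * weight / total
            else:
                # Assumption: pages without links spread their rank evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Toy site, invented for illustration: the "terms" link is rarely followed.
toy_site = {
    "home": {"article": 9.0, "terms": 1.0},
    "article": {"home": 1.0},
    "terms": {"home": 1.0},
}
weighted = reasonable_surfer_rank(toy_site)
# The same site with every link treated equally, as a random surfer would.
uniform = reasonable_surfer_rank(
    {page: {target: 1.0 for target in out} for page, out in toy_site.items()}
)
print(weighted)
print(uniform)
```

With equal weights, the article and the Terms of Service page receive the same score. With the invented weights, most of the home page’s value flows to the article.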

Kicking the Surfers Out for Good

Sergey Brin, somewhat uncomfortable with the surfer metaphor from the start, began pursuing topics that focused upon data mining and information extraction. His 1999 work, Extracting Patterns and Relations from the World Wide Web, looked to extract structured data from unstructured web pages, draw useful relationships from those pages, and find meaningful data in the chaos of the Web. The process he developed is known as DIPRE (Dual Iterative Pattern Relation Expansion), and it is one of the seeds from which his new Brin Rank algorithm grew.

Another key element of Brin Rank was exposed in a work that Brin co-authored, Beyond market baskets: generalizing association rules to correlations, which defined rules about data, with a footnote example providing a pretty clear encapsulation of the concept:

A Classic Example is the rule that people who buy diapers in the afternoon are particularly likely to buy beer at the same time.
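A rule like that is usually judged by a few simple counts. The sketch below computes three common measures for a rule “antecedent implies consequent”: support (how often both items appear together), confidence (how often the consequent appears when the antecedent does), and lift (how much more often they co-occur than independence would predict). The baskets are invented for illustration, and the function name is hypothetical; this is not code from the paper.

```python
def rule_stats(baskets, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(baskets)
    has_a = sum(1 for b in baskets if antecedent in b)
    has_c = sum(1 for b in baskets if consequent in b)
    has_both = sum(1 for b in baskets if antecedent in b and consequent in b)
    support = has_both / n
    confidence = has_both / has_a if has_a else 0.0
    # Lift above 1 means the two items co-occur more often than chance predicts.
    lift = confidence / (has_c / n) if has_c else 0.0
    return support, confidence, lift


# Made-up shopping baskets, for illustration only.
baskets = [
    {"diapers", "beer"},
    {"diapers", "beer", "chips"},
    {"diapers", "milk"},
    {"beer"},
    {"milk", "bread"},
    {"bread"},
]
support, confidence, lift = rule_stats(baskets, "diapers", "beer")
print(support, confidence, lift)
```

Confidence alone can mislead when the consequent is popular with everyone, which is why a measure that compares against independence, such as lift, is reported alongside it here.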

Brin’s patent, Information extraction from a database (US 6,678,684), filed on March 9, 2000 and granted on January 13, 2004, points to research that further led to his Brin Rank innovation. The abstract tells us:

Techniques for extracting information from a database are provided. A database such as the Web is searched for occurrences of tuples of information. The occurrences of the tuples of information that were found in the database are analyzed to identify a pattern in which the tuples of information were stored. Additional tuples of information can then be extracted from the database utilizing the pattern. This process can be repeated with the additional tuples of information, if desired.
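The loop in that abstract (start with known tuples, find where they occur, learn the pattern around them, use the pattern to pull out more tuples, repeat) can be sketched as follows. This is a heavily simplified illustration, not the patented method: the toy corpus, the fixed four-character context window, and all function names are assumptions, and a real system would need to guard against overly general patterns.

```python
import re


def occurrences_to_patterns(corpus, tuples, context=4):
    """Learn (order, prefix, middle, suffix) patterns around known tuples."""
    patterns = set()
    for author, title in tuples:
        for text in corpus:
            a, t = text.find(author), text.find(title)
            if a == -1 or t == -1:
                continue
            author_first = a < t
            first, second = (author, title) if author_first else (title, author)
            start, second_start = min(a, t), max(a, t)
            first_end = start + len(first)
            if second_start < first_end:
                continue  # the two strings overlap, so no usable pattern
            second_end = second_start + len(second)
            prefix = text[max(0, start - context):start]
            middle = text[first_end:second_start]
            suffix = text[second_end:second_end + context]
            # Assumption: require context on every side to keep matches bounded.
            if prefix and middle and suffix:
                patterns.add((author_first, prefix, middle, suffix))
    return patterns


def patterns_to_tuples(corpus, patterns):
    """Apply each learned pattern to the corpus to extract (author, title) tuples."""
    found = set()
    for author_first, prefix, middle, suffix in patterns:
        regex = re.compile(
            re.escape(prefix) + "(.+?)" + re.escape(middle) + "(.+?)" + re.escape(suffix)
        )
        for text in corpus:
            for first, second in regex.findall(text):
                found.add((first, second) if author_first else (second, first))
    return found


def extract_tuples(corpus, seeds, rounds=2):
    """Alternate between learning patterns and extracting tuples."""
    tuples = set(seeds)
    for _ in range(rounds):
        patterns = occurrences_to_patterns(corpus, tuples)
        tuples |= patterns_to_tuples(corpus, patterns)
    return tuples


# Toy "web" of five snippets, invented for illustration.
corpus = [
    "<li><b>Foundation</b> by Isaac Asimov</li>",
    "<li><b>Dune</b> by Frank Herbert</li>",
    "<li><b>Neuromancer</b> by William Gibson</li>",
    "<p>Frank Herbert wrote <i>Dune</i> in 1965.</p>",
    "<p>Ursula K. Le Guin wrote <i>The Dispossessed</i> in 1974.</p>",
]
seeds = {("Isaac Asimov", "Foundation")}
after_one_round = extract_tuples(corpus, seeds, rounds=1)
after_two_rounds = extract_tuples(corpus, seeds, rounds=2)
print(sorted(after_two_rounds))
```

Starting from a single seed, the first round learns the list-item pattern and picks up two more books. One of those books also appears in a differently worded snippet, so the second round learns that new pattern and uses it to find a fourth book that the first pattern could never have reached.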

The turning point, where this type of data mining seems to have become useful, came from combining two things: information mined from pages on the web, along with association rules created about that data, and information collected about how people actually use that data, drawn from query logs, query sessions, and users’ web browsing histories.

Brin’s research noted, for instance, that people searching for science fiction novels in the morning also tend to look up information related to their stock portfolios around the same time.

The patent is:

Extracting and Associating Informational Nodes in a Large Scale Database
Invented by Sergey Brin
Assigned to Google
US Patent 75,008,681
Granted April 1, 2011
Filed: April 1, 2007

Abstract

Techniques for extracting and associating information from a large scale database are provided. Occurrences of tuples of information are explored in an index of the Web, and within searching and browsing patterns of web users to identify associative rules and identify authoritative pages on the web for that information. The strength of relationships identified by these rules can be used to develop a score for pages in response to a query, referred to hereinafter as Brin Rank.

Conclusion

The patent provides a very detailed look at how Brin Rank is calculated, and at how it improves the ranking of documents on the web by applying associative rules about how people behave when searching for different types of information.

It has the benefit of being owned completely by Google, and not subject to an expiring exclusive license like PageRank.

No surfers were involved in the conceptualization of Brin Rank.

Added 3:18, 2011/4/1 If you clicked on the link above to the patent, you’ve probably noticed that you arrive back at this page. While there isn’t an actual Brin Rank patent, I’ve been wondering for a while what Sergey Brin may have come up with if it were his algorithm at the heart of Google rather than PageRank.

PageRank was an algorithm that provided much more relevant results than its competitors back when it was introduced, and it has likely evolved considerably since those early days. Google has been looking at ways to rerank results and at adding signals beyond PageRank to its ranking algorithm, and at present PageRank is only one amongst hundreds of signals that the search engine uses.

It’s clear that Google has been spending significantly more time looking at user-behavior signals, and some of the things that I alluded to in Brin Rank quite possibly play a role in how Google ranks pages today.

I hope you enjoyed this April 1st post, and I thank everyone who passed it along.

13 thoughts on “Google Granted Patent on Brin Rank”

  1. Thanks for the clear explanation. Also it is amazing the pagerank patent is running out. Time flies. While it does seem like an innovation to be smarter about the value of the different links on the page it hardly seems like a huge innovation. Wouldn’t it make more sense to say the impact of the original patent expiring is muted by the hundreds/thousands of small modifications in the ranking algorithm than just BrinRank (which is a cool name – and if it could be adopted could remind people what Page in PageRank stands for).

  2. Hi John,

    Thanks for your comment. More Universities seem to be following the model of engaging in public/private partnerships with professors and students to research and develop an actual business behind their research. Google’s exclusive license may be running out, but the PageRank of today is likely vastly different than the one developed in the late 90s. They have made a considerable number of changes.

    Google has definitely expanded beyond PageRank in a number of ways, and has a few patents out there that aren’t that different from what I described as a “Brin Rank” patent. I suspect that the amount of user-based search and browsing data Google has collected over the years dwarfs the amount of data that they’ve collected about pages they’ve found on the Web. To a degree, the kinds of “associations” that I wrote about can impact the search refinement suggestions that you see, customizations that can happen based upon previous queries, and possibly a number of other things as well.

  3. Hi Santa Cruz Seo,

    Yes, this post is an April Fools Joke. I don’t trust much of what I read today either. That’s part of the reason why I like looking at patents and whitepapers from the search engines, themselves.

  4. Hi Stephen,

    I believe both Amit Singhal and Matt Cutts from Google have noted a few times publicly over the past few years that Google updates their core search algorithm roughly 400 times a year, so it’s something they do regularly. Sometimes those changes are small ones that might only influence a very small percentage of searches, and sometimes they are much larger ones, like the recent Panda update, which we were told by Google would impact roughly 12% of all queries.

    I do write a lot about patents and whitepapers that describe how Google might change (or might have changed in the past), and those point to a lot of different choices that the search engine may make in determining what searchers see in response to queries. If you look through my archives, you’ll see a lot of posts on changes to Google’s algorithms.

  5. Hi Nicolas,

    I wonder sometimes how many people know that PageRank is named after Lawrence Page, rather than the ranking of web “pages.” I suspect that it’s one of the most well known computer algorithms in the world, even if many people don’t know exactly what it does. I also suspect that most people don’t realize that it isn’t owned by Google, but rather by Stanford University.

  6. Given that it is already past April 1, it is nice that you posted the “Note: This is an April Fools Day post.” at the top of the post. You would have duped me if it wasn’t there, given how my a-store was affected by Google’s algorithm changes these past months ;)

  7. Hilarious Bill, you crack me up. Sergei finally gets his props. Surfer never did it for me either – doubtless he’d prefer to characterize online manoeuvres as something a little less jock sounding.

  8. Hi Sonoran,

    Chances are that some of the algorithms that have been put into place by Google use information extraction, user behavior data, and data association approaches like I’ve briefly described above.

  9. Hi Matthew,

    Sergey is actually somewhat athletic, from what I’ve read. He does a lot of swimming and diving. But surfing is a somewhat odd analogy for browsing pages on the Web. I’m guessing that it was around before the early PageRank patents and papers, but I’m wondering how that term first originated.