Systems and methods consistent with the principles of the invention may provide a reasonable surfer model that indicates that when a surfer accesses a document with a set of links, the surfer will follow some of the links with higher probability than others. This reasonable surfer model reflects the fact that not all of the links associated with a document are equally likely to be followed. Examples of unlikely followed links may include “Terms of Service” links, banner advertisements, and links unrelated to the document.
Google’s original PageRank algorithm is based upon what its inventor referred to as the Random Surfer model, where it ranked pages on the Web based upon a probability that a person following links at random on the Web might end up upon a particular page:
The rank of a page can be interpreted as the probability that a surfer will be at the page after following a large number of forward links. The constant α in the formula is interpreted as the probability that the web surfer will jump randomly to any web page instead of following a forward link.
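To make the Random Surfer idea concrete, here is a minimal sketch of it in Python. It is not Google's code. The function name, the sample link graph, the value of 0.15 for α, and the way pages with no outgoing links are handled are all my own assumptions for illustration. The point it shows is that under this model every link on a page gets an equal share of that page's rank.

```python
def random_surfer_pagerank(links, alpha=0.15, iterations=50):
    """Estimate the probability that a random surfer is on each page.

    links: dict mapping a page to the list of pages it links to.
    alpha: probability of jumping to a random page instead of following
           a link (0.15 is an assumed value, not one given in this post).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {page: alpha / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Random surfer: each link is equally likely to be followed.
                share = (1 - alpha) * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no links sends the surfer anywhere.
                share = (1 - alpha) * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
example_ranks = random_surfer_pagerank(example_links)
```

In this small graph, page "c" ends up with a higher score than page "b", because it is linked to from both "a" and "b", while "b" only receives half of the rank flowing out of "a".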
Years later, search engineers at Google filed a newer patent built upon something referred to as the Reasonable Surfer model. It looks at how likely a person is to click upon particular links on a page, and uses those probabilities to estimate how likely it is that someone will end up at the pages those links point to.
I wrote about this patent in a post from 2010, which I titled Google’s Reasonable Surfer: How The Value Of A Link May Differ Based Upon Link And Document Features And User Data.
Patents do sometimes get updated by the people who originally file them. These updates often take the shape of changes to the claims within the patents.
These changes may reflect a change in the way that the processes described within the patent operate.
It’s the claims section that changes when one of these continuation patents is filed. Patent examiners from the patent office look at the claims and compare them to the claims in other patents, to make sure that the new claims don’t copy, and couldn’t be said to infringe, patents that have already been granted.
A continuation patent is called that because it “continues” the protection given by the original version of the patent and is given a date of coverage that begins with the original filing date of the original version of the patent.
The continuation patent is:
Ranking documents based on user behavior and/or feature data
Inventors: Jeffrey A. Dean, Corin Anderson, and Alexis Battle
Assigned to: Google
US Patent 9,305,099
Granted April 5, 2016
Filed: January 10, 2012
A system generates a model based on feature data relating to different features of a link from a linking document to a linked document and user behavior data relating to navigational actions associated with the link. The system also assigns a rank to a document based on the model.
As I pointed out in my original post about the Reasonable Surfer patent, it changes the amount of PageRank that might flow through a link based upon different features associated with that link. If a link is in the main content area of a page, uses a font and color that make it stand out, and uses text that makes it likely that someone will click upon it, then it could pass along a fair amount of PageRank. On the other hand, if it combines features that make it less likely to be clicked upon, such as being in the footer of a page, using the same color and font as the rest of the text on that page, and using anchor text that doesn’t interest people, it may not pass along a lot of PageRank.
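Here is a rough sketch of how that kind of uneven flow could work. The feature names and the multipliers are numbers I made up for illustration, and nothing in the patent gives values like these. It also assumes each link on the page points to a different target. Instead of dividing a page's PageRank evenly among its links, each link gets a share in proportion to a weight built from its features.

```python
def link_weight(link):
    """Hypothetical weight: higher means more likely to be clicked."""
    weight = 1.0
    if link.get("in_main_content"):
        weight *= 3.0   # assumed boost for links in the main content area
    if link.get("in_footer"):
        weight *= 0.2   # assumed penalty for footer links
    if link.get("stands_out"):
        weight *= 1.5   # assumed boost for a distinct font or color
    if link.get("engaging_anchor"):
        weight *= 2.0   # assumed boost for anchor text people want to click
    return weight


def split_rank(page_rank, links):
    """Divide a page's rank among its links in proportion to their weights."""
    weights = {link["target"]: link_weight(link) for link in links}
    total = sum(weights.values())
    return {target: page_rank * w / total for target, w in weights.items()}


page_links = [
    {"target": "article", "in_main_content": True,
     "stands_out": True, "engaging_anchor": True},
    {"target": "terms", "in_footer": True},
]
shares = split_rank(1.0, page_links)
```

With these made-up numbers, the prominent link to "article" receives nearly all of the rank the page passes along, and the footer link to "terms" receives only a small fraction, where a Random Surfer split would have given each link half.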
So, how have the claims for this patent changed, and how does that change the Reasonable Surfer model?
I’m seeing it refer to anchor text in those claims more frequently, and how much weight might be passed along based upon the probability that people might click upon a link. Here is some language that stands out to me, from the first new claim in the patent:
… a rank for a particular document, generating the rank including: determining particular feature data associated with a link to the particular document, the particular feature data identifying one or more attributes of the link, determining a weight indicating a probability of the link being selected, the weight being determined based on the particular feature data and selection data, the selection data identifying user behavior relating to links to other documents …the weight indicating a higher probability of the link being selected when the particular feature data corresponds to feature data associated with the one or more links than when the particular feature data corresponds to feature data associated with the one or more other links…words in anchor text associated with the links, and a quantity of the words in the anchor text
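One way to picture what that claim language describes is the sketch below. This is my own simplified reading, not a procedure disclosed in the patent. The record fields ("position", "impressions", "clicks"), the three-word cutoff for anchor text length, and the fallback value are all hypothetical. It groups previously observed links by their features, including the quantity of words in the anchor text, and uses the click rate of each group as the weight for a new link with matching features.

```python
from collections import defaultdict


def feature_key(link):
    """Reduce a link to a small set of features (hypothetical choice)."""
    word_count = len(link["anchor_text"].split())
    length_bucket = "short" if word_count <= 3 else "long"
    return (link["position"], length_bucket)


def build_model(selection_data):
    """Click rate per feature group, from user behavior on other links."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for record in selection_data:
        key = feature_key(record)
        shown[key] += record["impressions"]
        clicked[key] += record["clicks"]
    return {key: clicked[key] / shown[key] for key in shown if shown[key] > 0}


def selection_weight(link, model, default=0.05):
    """Estimated probability of the link being selected.

    Falls back to an assumed default when no similar link was observed.
    """
    return model.get(feature_key(link), default)


observed = [
    {"anchor_text": "read the full guide", "position": "main",
     "impressions": 100, "clicks": 30},
    {"anchor_text": "terms of service", "position": "footer",
     "impressions": 200, "clicks": 2},
]
model = build_model(observed)
main_weight = selection_weight(
    {"anchor_text": "download the free checklist", "position": "main"}, model)
footer_weight = selection_weight(
    {"anchor_text": "privacy policy", "position": "footer"}, model)
```

In this toy data, a new main-content link with longer anchor text inherits the higher click rate of the similar link people actually clicked, while a new footer link inherits the low click rate of the "terms of service" link.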
The claims in the original version of Ranking documents based on user behavior and/or feature data are different. These newer claims put more emphasis on the idea that the weight passed along by a link is based upon the probability that people will click upon that link on a page.
It’s no longer a “random” probability, and it now seems to be even more “reasonable” than it was in the first version of the Reasonable Surfer patent.