10 Most Important SEO Patents: Part 4 – PageRank Meets the Reasonable Surfer

PageRank is a measure of the probability that someone who starts out on any page on the Web, and randomly clicks on links they find on pages, or gets bored every so often and teleports (yes, that is official technical search engineer jargon) to a random page, will eventually end up at a specific page.
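For readers who like to see the idea in code, here is a minimal sketch of that random surfer on a three-page toy web. The function name `pagerank`, the `toy_web` graph, and the damping value of 0.85 (a commonly cited figure for how often the surfer follows a link rather than teleporting) are my assumptions for illustration, not anything taken from Google's implementation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Classic random-surfer PageRank by repeated redistribution.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Teleportation: a bored surfer jumps to any page with equal chance.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Random surfer: every link on the page gets an equal share.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no links: treat it as linking to every page.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items()):
    print(page, round(score, 3))
```

The scores add up to 1, since each one is the chance of finding the surfer on that page. In this toy web, page C comes out on top because both A and B link to it.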

Larry Page referred to this person clicking on links as a “random surfer.” Thing is, most people aren’t so random. It’s not like we’re standing at some street corner somewhere, and just randomly set off in some direction. (OK, I confess that I do sometimes do just that, especially when faced with a sign like that below.)

A street corner in The Plains, Virginia, with a sign showing distances to many other cities near and far.

Imagine someone from Google waking up in the middle of the night, with the thought, “Hmmmm. Maybe we’re not doing PageRank quite right. Maybe we should be doing things like paying attention to where links appear on a page, and other things as well.”

That’s the scenario I envisioned when reading the Google patent “Ranking documents based on user behavior and/or feature data,” which took away some of the randomness, and introduced us to a different model of surfer – the reasonable surfer.

Back in 2008, when Yahoo had their own search engine, Yahoo’s Priyank Garg told Eric Enge in an interview how Yahoo treated some links:

The irrelevant links at the bottom of a page, which will not be as valuable for a user, don’t add to the quality of the user experience, so we don’t account for those in our ranking. All of those links might still be useful for crawl discovery, but they won’t support the ranking.

Was Google doing the same thing?

In a 2009 blog post on PageRank Sculpting, Google’s Matt Cutts added the following Disclaimer:

Disclaimer: Even when I joined the company in 2000, Google was doing more sophisticated link computation than you would observe from the classic PageRank papers. If you believe that Google stopped innovating in link analysis, that’s a flawed assumption.

Although we still refer to it as PageRank, Google’s ability to compute reputation based on links has advanced considerably over the years. I’ll do the rest of my blog post in the framework of “classic PageRank” but bear in mind that it’s not a perfect analogy.

So imagine that instead of Google giving every link it found on a page the same amount of PageRank to distribute, it gave different amounts of PageRank through each link after a detailed analysis, looking at a range of features associated with each link.
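Here is a small sketch of the difference between the two ways of dividing PageRank among a page's links. The helper `split_pagerank`, the link names, and the weights 6, 3, and 1 are made up purely for illustration; the only point is that an uneven set of weights produces uneven shares.

```python
def split_pagerank(page_rank, link_weights):
    """Divide a page's PageRank among its links in proportion to their weights."""
    total = sum(link_weights.values())
    if total == 0:
        return {link: 0.0 for link in link_weights}
    return {link: page_rank * weight / total for link, weight in link_weights.items()}


# Classic approach: every link on the page counts the same.
even = split_pagerank(1.0, {"main_content": 1, "sidebar": 1, "footer": 1})

# Weighted approach: hypothetical weights favoring the main content link.
weighted = split_pagerank(1.0, {"main_content": 6, "sidebar": 3, "footer": 1})

print(even)
print(weighted)
```

With equal weights each link passes along a third of the page's PageRank. With the invented weights, the main content link passes along six tenths, the sidebar link three tenths, and the footer link one tenth.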

The patent behind the Reasonable Surfer model, which I wrote a detailed post about in Google’s Reasonable Surfer: How the Value of a Link May Differ Based upon Link and Document Features and User Data, doesn’t just look at the location of a link on a page to gauge how much PageRank to pass along.

The Reasonable Surfer model also doesn’t just look at how emphasized the text of a link might be in relation to the text around it (whether the link is in a different color or font family, or is larger, bolder, underlined, or decorated in some other way) to determine whether to boost the amount of PageRank passed through that link.

The Reasonable Surfer model might also look at how many words there are associated with a link, what those words are themselves, how commercial the words might be, and many other features as well.

So if a link appears near the top of a main content area on a page about a pie eating contest at the local county fair, and it uses the anchor text “cheap nfl jerseys” in bold letters, the algorithm behind the Reasonable Surfer model might determine that even though the link is prominently placed and stands out from the rest of the text in an important part of a page, the text of the link has nothing to do with the content of the rest of the page, and that text evidences a very commercial intent.

And it’s reasonable that most people who visited the page to learn about things to do at the county fair aren’t going to click upon that link. Therefore, it really shouldn’t pass along very much PageRank.
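The county fair example can be sketched as a toy scoring function. Everything here is hypothetical: the `Link` fields, the `click_likelihood` function, and each multiplier are hand-picked by me to mirror the reasoning above, and none of them are values from the patent or from Google.

```python
from dataclasses import dataclass


@dataclass
class Link:
    anchor: str
    in_main_content: bool      # placement on the page
    visually_emphasized: bool  # bold, larger, different color, and so on
    topical_match: float       # 0.0 (unrelated to the page) to 1.0 (on topic)
    commercial: bool           # anchor text suggests commercial intent


def click_likelihood(link):
    """Combine several link features into one rough relative score."""
    score = 1.0
    score *= 2.0 if link.in_main_content else 0.5
    score *= 1.5 if link.visually_emphasized else 1.0
    score *= 0.1 + link.topical_match  # an off-topic link is heavily discounted
    score *= 0.5 if link.commercial else 1.0
    return score


# Hypothetical links on the pie eating contest page.
links = [
    Link("cheap nfl jerseys", True, True, 0.0, True),
    Link("pie contest schedule", True, False, 0.9, False),
    Link("county fair home", False, False, 0.6, False),
]

scores = {link.anchor: click_likelihood(link) for link in links}
for anchor, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(anchor, round(score, 2))
```

Even though the jerseys link is prominently placed and bold, its off-topic, commercial anchor text drags its score below both of the other links, including the plain one in the footer. No single feature decides the outcome; the mix does.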

Why did I choose this particular patent as one of the 10 most important SEO patents?

One of the reasons is that it’s a great illustration of how an algorithm might be modified when the assumptions and models that support it are changed over time, with the experience and hindsight that running a search engine might bring.

Another is that it’s been fairly obvious for a few years that Google hadn’t been passing along the same amount of PageRank for different links on the same page, and that we had statements like the one above from Matt Cutts that PageRank had evolved, even in its early days, but we didn’t have anything we could point to from Google itself about how the search engine might have been calculating PageRank differently.

This Reasonable Surfer patent was filed way back in 2004, but it didn’t become publicly accessible until it was granted in 2010. While I was reading it, I kept on saying to myself, “Yeah, that makes sense. It explains a lot of things.”

There are a number of valid criticisms that could be made of the original PageRank algorithm, including the Random Surfer model not being an example of how people actually use the Web.

To put it as succinctly as possible, the Reasonable Surfer model changed that by looking at a combination of factors that might help determine which link or links on a page someone was most likely to follow, and passing along the most PageRank through those links. Again, it’s likely that Google has continued to evolve how PageRank works, but it seems much more reasonable now than it did before.

All parts of the 10 Most Important SEO Patents series:

Part 1 – The Original PageRank Patent Application
Part 2 – The Original Historical Data Patent Filing and its Children
Part 3 – Classifying Web Blocks with Linguistic Features
Part 4 – PageRank Meets the Reasonable Surfer
Part 5 – Phrase Based Indexing
Part 6 – Named Entity Detection in Queries
Part 7 – Sets, Semantic Closeness, Segmentation, and Webtables
Part 8 – Assigning Geographic Relevance to Web Pages
Part 9 – From Ten Blue Links to Blended and Universal Search
Part 10 – Just the Beginning

37 thoughts on “10 Most Important SEO Patents: Part 4 – PageRank Meets the Reasonable Surfer”

  1. Good post. It would also be interesting to see what Google will do as their PageRank patent expires soon.

  2. I often wonder if they’ll ever (or already have) move on from a reasonable-surfer model to an informed-surfer one. The ability to get rick-rolled shows that it’s easy to trick a reasonable surfer. But an informed surfer, who knew metrics about the page being linked to, would tend to get to what they want far more quickly as they would rarely end up on irrelevant pages.

  3. I think this also seems to be a trend in where links are placed on a site. It only makes sense that the higher a link is placed on the site the more weight it holds. That’s why we are told to put the most important keywords at the top of the page to show what’s really important about that specific page. Having the most important links at the top only makes sense.

  4. Once SEOs understood the classification algorithm used by Google, and the weight of quality links in this ranking, they started to add links to important sites, which biased the algo. This is actually one of the first adaptations to counter the “spam,” I think.

  5. Great post! Although I can’t keep up much on the recent stuff. But I’m amazed that you could gather this much information. So far you’re the best information source I have on the recent things. Thank you Bill.

  6. I think link placement does carry weight, but then again a link that is informative and relevant, that is what a user is looking for and is related to the page itself, can’t be ignored irrespective of its lower placement on the page. Eventually every surfer on the web is looking for the most accurate results for their search.
    Keep in mind there can be decisive elements that signify the value of any link.

  7. There is no doubt for me that Google treats links in different positions on the page differently. I am sure it is very complex. Also I know that the first few links are more important too. We can see this even with internal links. The first few flow more PR. The basic and simple PR algo is long gone :).

  8. Very interesting post Bill. Explaining page rank to people has never been the easiest of tasks however, this post along with other posts provided by Google and other people in the know have made it easier over the last few years. One thing that’s always been key is relevance.

  9. Hi Wais.

    The interesting thing is that Google isn’t the owner of the PageRank patent. It’s owned by Stanford University, though Google had an exclusive license to use the technology in the patent that was set to expire in 2011. I don’t know if Google and Stanford extended that in any way, but it’s possible that they may have. The patent itself still has a few more years left to it before it does expire.

  10. Hi Malcolm,

    Interesting point. Under the Reasonable Surfer model, one of the features that the patent looks at is how well the anchor text pointing to a page matches the actual content of that page. So a link with the text “Newest Android Features” that points to a page about “fishing in Costa Rica” is going to fail under that analysis. So there is something of an “informed” surfer aspect to the patent.

    But I understand what you mean about Google having access to a lot more information and computing capacity now than they did when this patent was written, even to the point where they could track how often people clicking on a particular link on a page immediately left the page they landed on. But rather than just relying upon user behavior signals like that, they could even aggregate information about features found on pages from links like that where someone usually leaves very quickly, and determine a linking weight based upon those features (sort of like a Panda approach to link weights).

  11. Hi Sam,

    I understand how it’s tempting to think that, and it’s something that people have been saying for years. But, with the page segmentation patent and this reasonable surfer patent, that’s not necessarily true.

    This patent is telling us that there are many more things that Google will look at than just the location of a link on a page, and that sometimes when a link is highly placed on a page, other features might make it so that it passes along less weight than other links on the same page. For instance, if the anchor text has little or nothing to do with the content of the page that it is on, and the content of the page that it points to, that might not be good. If the link is virtually indistinguishable from the text around it, such as being the same color, font style, and font emphasis with no text decoration such as an underline, so that it’s almost hidden, that isn’t sending a great signal to the search engine either, and it might be devalued because of that. It’s the total mix of the different possible features that is going to determine how much weight a link might pass along, and not just one.

  12. Hi Nicolas,

    I agree. This was an intelligent adaptation to a number of problems involving some possible abuses of the PageRank algorithm. Hard to say it was one of the first, because we just don’t have that much access to what goes on behind the scenes at Google. But you can see some of the abuses that they’ve addressed with it if you look at the kinds of features it considers, such as giving less weight to links in text the same color as the background of the segment they are in, links placed in footers of pages to try to boost the PageRank of the pages they point to, and many others.

  13. Hi Parvesh,

    Exactly, which is why the patent doesn’t rely just upon any one of the features, but considers them all together. Some of the features though can be decisive on their own, such as a link being the same color as the background it’s on. That seems to be a clear sign that someone is using the link just to try to boost the PageRank of the page it is pointed towards.

  14. Hi Nikolay,

    The fairly simple rule of thumb I try to keep in mind when thinking about this patent is that the link that people are most likely to click upon when they look at a page stands a good chance of being the link that passes along the most PageRank on that page. Maybe not true all the time, but maybe most of the time.

  15. Hi Gavin,

    One of the trickiest parts though is that PageRank is really independent of “relevance” or at least relevance to the query that someone might find a page for. It’s calculated independently of a query. Now the relevance of anchor text used in a link to the text on the rest of the page, or on the text of the page being linked to is another matter, and that could play a role in how much weight a link could pass along. :)

  16. Interesting stuff Bill. I had no idea that Yahoo disregarded links at the bottom of the page but that makes sense since usually those links are garbage.

    Also I have no doubt that Google is on top of things regarding the reasonable surfer. Matt Cutts is an SEO wizard of Oz. It makes sense that a link at the top of content placed prominently would not pass on as much page rank as other links on the page if it was a commercial link not relating to the content.

  17. Bill, this is great stuff, it does seem like something that many people calling themselves “SEO Specialists” aren’t all too aware of.

    It makes me wonder how Bing are doing with this sort of thing because whenever I search in Bing the results seem to be poorly related in comparison to the results I get with Google, I wonder if you or anyone else has experienced the same.

  18. Very interesting as usual Bill. Not applying the same value to all links on a page does indeed make sense.

    Personally, this is why I feel that link building via guest posting is so beneficial as opposed to outright link spamming which seems to be what many SEOs do.

    The best links, therefore, will always be contextual and related to the theme of the page.

    Makes sense to me.

    Mark

  19. @Bill Slawski
    Yes, thanks for that different point of view. I am not sure if making it bigger with CSS will make it give more value, but generally you are right. In the future I guess only links that are being clicked will matter. Of course Google will need to track that somehow.

  20. I don’t think the location of the link should matter as much as the relevancy. I really hate those posts that have such completely obvious backlinks like “NFL jerseys” on a blog about travel or dieting. Why do people still do this?

  21. Hi Tim,

    It’s great when someone from one of the search engines points out things like how they might treat links at the bottoms of pages. Those can often be useful to people who just finished reading a page in providing them with a place to go once they’ve found the information they may have been looking for, but it does make sense for a search engine to often not give those links much weight, if any at all.

    The reasonable surfer model does seem to fit its name well, as being “reasonable.” :)

  22. Hi Adam,

    Thank you. There do seem to be a lot of people working at SEO who may be doing what they are doing without an awareness of how a reasonable surfer approach might impact it.

    I suspect that Bing does do something similar as well, based upon the papers that they’ve been releasing over the years like their whitepaper on block level link analysis (pdf). It’s likely that they are at least paying attention to where links appear on a page, and likely that they are looking at some other similar signals described in the reasonable surfer patent.

  23. Hi Mark,

    There are a lot of people who do try to use comment link spam to manipulate search results when they could instead be getting much better benefits out of creating their own content on their own pages that provides value to people.

    I like the idea of guest blog posts, but have seen a number of people abuse that too much. I’ve received offers for free “guest” posts from people I’ve never heard of, who haven’t even tried to build relationships with me in any way, and who seem intent to use the platform I’ve built to create link spam to clients who are paying them for those links. I really don’t want anything to do with that. :(

  24. Hi Nikolay,

    Google has made it increasingly easier for themselves to track information like that with their own browser, with the Google Toolbar, with logged in personalized search, their own bookmarking service, people sharing pages on Google Plus, and more. :)

  25. Hi Julie,

    Google considers a wide range of factors in the reasonable surfer approach, so page location plus relevance plus features about the link itself and other factors are considered all together. A very relevant link at the very bottom of the page might be less likely to be clicked upon by a visitor than a link with much more neutral anchor text near the top of a page, for instance.

    Not sure why people do things like add a link to “nfl jerseys” or “casino adventures” to an article or blog post that has nothing to do with those, except maybe they think they can get away with it, at least for a short period of time.

  26. Hi Pavel,

    Back in the day, PageRank would only change about once every 4-5 weeks, but since then Google calculates it and recalculates it much much faster. I probably need to point out here that the toolbar pagerank that you see for pages isn’t really a very good indicator of the PageRank of a page at any one point in time, because it’s only updated a few times a year and only tells you of the PageRank that a page might have been at during some point of time in the past.

    As for “the sandbox,” there is no such thing.

    A number of people on SEO and Webmaster forums hypothesized about a “sandbox” which new sites would be put into until such time as they were trusted by Google, and then those sites would start to rank in search results. Sometimes these pages initially started out ranking well, and then disappeared only to re-emerge many months later.

    Matt Cutts of Google did concede the point that Google was doing something algorithmically that could look like the existence of a “sandbox,” but that Google never set out to actually create a sandbox like the one described in the forums that I referred to.

    The truth is that many internet websites begin like many bricks and mortar businesses with a “cold start,” in that there aren’t any links to them, there isn’t any word of mouth about them, and they are virtually unknown. Sites that contain topically relevant information, and that build and attract links on a steady and regular basis can avoid any type of “sandbox” type effects.

  27. Super bit of research. I certainly would not have the time to go into such detail although I do try to keep up to speed with any SEO issues through blogs and forums.

    The one thing I would say, surely to anyone genuinely interested in SEO the most important things to concentrate on are content and relevant links.

    If you get links from relevant sites then the position and font of anchor text should not be a worry. You will still get good link juice if the site from which the link is passed is relevant and not stuffed with outbound links.

    Would be interested to hear what anyone thinks of my comment

    CC

  28. Hi Colin,

    Thank you. I’ve found over the years that if I try to learn about something, and I attempt to put it into words that other people might understand, I tend to grasp a lot more of the concepts behind it more quickly and it stays with me longer. I also have something that I can return to in order to refresh my memory as well. :)

    I’m not convinced that you absolutely need links from “relevant” sites or what that means exactly. If I get a link from a blog that writes about small business issues, I don’t think that will count less than if I get a link from a site that writes about SEO. Yet the small business site may be “less” relevant to my site.

    If the links from either are in their footers and look exactly the same as the non-link text around them, I wouldn’t expect as much PageRank to flow through those links as through some other links on the same pages that might, for instance, be in the main content area of those pages, and that stand out positively in some way.

  29. Hi Bill

    What I mean by links from relevant pages is links from a site on the same or related topic.

    For example, if you had a site about remote control helicopters, an inbound link from a model shop would carry greater weight than a link from a site about BBQs.

    I have no scientific backup to my statement just what I think I have read on various forums.

    Colin

  30. Hi Colin,

    Ok, so if a blog that writes about small business posts about promoting a business website, and links to a post I’ve written about marketing online for small businesses, that would be the kind of thing that you’re writing about.

    I think it might help, and there’s some language in the reasonable surfer patent that sounds similar to that. But I do think that PageRank will still pass along if the anchor text is somewhat neutral and there might not be a tremendous amount of similarity between topics of either page.

  31. Hey Bill,

    This is a very interesting post about page rank and links. I wonder how this is translated to a page that is specifically set up in content blocks (more like a wiki). It would seem to me that according to this logic, the first prominent link in each block, related to the topic, would be the “most relevant” link – or would it be the header of the content block itself that would get most of the glory?

    In any case, using unrelated anchor text has been “bad practice” for a while now, it’s interesting to see this translated into page rank.

    ~Duru

  32. Hi Shira,

    I think the analysis would be more complex than just that, but it’s worth spending some time thinking about.

    There are a number of whitepapers from Microsoft on visual segmentation of blocks on pages, and some of those do things like analyze what the most important segment or block of a page might be based on a few different factors. For instance, a newspaper web site front page might cover a number of different topics, and each of those might be considered independently, but Microsoft might still consider one block on that page more important than others.

    Google might have an approach that is unique for wiki pages as well, but it might not be as simple as just determining that the first link in each paragraph or section of a wiki post is the most important. The reasonable surfer approach includes a number of features beyond just the location of a link on a page.

  33. Bill and all that have contributed to the excellent write ups and commentaries.

    This is great stuff. Yes I believe Google is collecting more user data through Chrome and I would believe they are implementing most of the features defined in the “reasonable surfer” SEO model making searches more relevant compared to other search engines such as Bing and Yahoo!.

    It is noticed that Chrome renders search results faster than running Google search from other browsers. Would this be because Chrome is now doing a sufficient amount of search processing on the Client-side instead of the Server-side?

    From the perspective of Google’s whole business model: back when Google didn’t have its own browser, all of the processing would have been done server-side in Google’s data centers. Moving some of it to the client would free up processing time on Google’s servers, as well as pass some of the fixed cost of the electricity that fuels that processing power back to users of Google’s services such as Search, YouTube, Email, etc. Could this be true as well?

Comments are closed.