How Google Might Index Link Behavior Information

Under a conventional approach to indexing links, a search engine's index might include the target address that a link points towards, the anchor text displayed within the link, and possibly some text near the link itself. The Google Reasonable Surfer model points to the possibility of other information being collected about a link as well, which could be taken together as a whole to calculate how much value or weight the link might pass along to another page under a PageRank link analysis model, or how much weight the anchor text used in the link might carry.

The question "Just How Smart are Search Engine Robots?" has been asked with more frequency lately, and a pending patent application published by Google shows how the search engine might be collecting a whole different type of behavior information about links found on the Web. Given Google's move towards building its own Chrome browser and providing access to web pages via alternative screens, such as those on smartphones, other handheld devices, and televisions, it makes sense for the search engine to capture this kind of information as well. The image from the patent filing below shows sections of links, including target and onclick attributes, that the search engine might now be indexing.

A screenshot from Google Maps showing an information box over the map that appears after clicking upon a link in the column to the left.

When we think about how search engines index content on the Web, it’s usually in the form of a search engine program crawling and collecting information about pages that seem somewhat static, and not the kinds of changes that might happen on those pages if and when links are clicked upon.

Link behavior information can include such things as:

  • How a link is displayed
  • The location of the link on the page, as a “link placeholder”
  • Whether the selection of a link might launch a new application and/or a new browser window
  • Whether alert messages are generated upon selection of a link
  • Whether a web page (or additional information) associated with a link is opened within an existing browser window, such as in a section or a tab of that window, rather than in multiple browser windows
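Most of these behaviors can be read straight off a link's markup. As a rough illustration (a hypothetical helper, not anything described in the patent's claims), a crawler that already has an anchor's attributes as a plain object might record behavior flags like these:

```javascript
// Hypothetical sketch: derive link behavior flags from an anchor's
// attributes, roughly the kind of information the patent filing lists.
function linkBehavior(attrs) {
  const onclick = attrs.onclick || "";
  return {
    // target="_blank" or a window.open() call asks for a new browser window
    opensNewWindow: attrs.target === "_blank" || /window\.open/.test(onclick),
    // any onclick handler means script runs when the link is selected
    runsScriptOnClick: onclick.length > 0,
    // an alert() call in the handler would generate an alert message
    showsAlert: /alert\(/.test(onclick),
  };
}

// A plain HTML link versus one that opens a popup via script:
console.log(linkBehavior({ href: "page.html" }));
console.log(linkBehavior({
  href: "page.html",
  onclick: "window.open(this.href); return false",
}));
```

A real crawler would presumably look at much more than these three attributes, but the sketch shows how different markup for the "same" link can imply very different behavior when it is clicked.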

It’s possible that the types of link behaviors supported by a mobile device might be different from those supported by a laptop computer. For example, if you perform a search at Google Maps on a laptop for a particular type of business, with a map displayed on the right and a choice of businesses to click upon on the left, clicking on one of the business links on the left might result in an information box displayed over the map, showing more information about the business, including its location, address, and a link to its home page. A phone might not be able to display that information box.

A screenshot from the patent showing sections of HTML anchor links, including both target and onclick attributes that Google might now be indexing.

This link behavior information might also be collected in real time, capturing context information associated with links, such as:

  • Type of computing device that requested the web page
  • A target address associated with at least one of the one or more link placeholders
  • A placement of at least one of the one or more link placeholders in a graphical user interface associated with the web page
  • A display mode associated with the web page
  • Parsing of the request to generate the context information associated with the computing device
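The last item in that list suggests the context information comes from parsing the request itself. A minimal sketch of what that might look like, assuming the context is derived mostly from the User-Agent header (the function and the field names here are hypothetical, chosen only for illustration):

```javascript
// Hypothetical sketch: parse an incoming request to generate context
// information about the requesting computing device, along the lines
// the patent filing describes.
function contextInfo(request) {
  const ua = request.headers["user-agent"] || "";
  const isMobile = /Mobile|Android|iPhone/.test(ua);
  return {
    // type of computing device that requested the web page
    deviceType: isMobile ? "mobile" : "desktop",
    // a display mode a server might choose based on that device
    displayMode: isMobile ? "single-window" : "multi-window",
  };
}

console.log(contextInfo({
  headers: { "user-agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X)" },
}));
```

The patent doesn't spell out how the parsing is done, so this is only one plausible reading of it.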

The patent doesn’t tell us whether this real-time collection of link behavior information is captured through something like a browser or a browser add-on, but it might be.

So the purpose of collecting this kind of link behavior information is for Google to understand how a link should be displayed, how the content targeted by a link should be displayed, and what kinds of events might be associated with a link.

The patent application is:

Generating Behavior Information For a Link
Invented by Lori D. Meiskey and Jana S. Urban
US Patent Application 20120084630
Published April 5, 2012
Filed: September 30, 2010

Abstract

A computer-implemented method includes receiving a request for a web page; retrieving information associated with the web page, wherein the information comprises a link and one or more link placeholders associated with the link; determining context information associated with the computing device; generating, based on the context information, behavior information for the link; and populating at least one of the one or more link placeholders with the behavior information.
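The flow in the abstract can be sketched end to end: use context information about the requesting device to decide what behavior information to put into a link placeholder. This is only an interpretation of the abstract, with made-up names, not the patent's actual implementation:

```javascript
// Hypothetical sketch of the abstract's last two steps: generate behavior
// information for a link based on context, and populate a link placeholder
// with it.
function populatePlaceholder(link, context) {
  if (context.deviceType === "mobile") {
    // a single-window mobile browser: navigate in place, no popup script
    return { href: link.href, target: "_self", onclick: "" };
  }
  // a desktop browser can open the target in a new window via script
  return {
    href: link.href,
    target: "_blank",
    onclick: "window.open(this.href); return false",
  };
}

console.log(populatePlaceholder({ href: "business.html" }, { deviceType: "mobile" }));
console.log(populatePlaceholder({ href: "business.html" }, { deviceType: "desktop" }));
```

The interesting part is that the same link in the page source would end up behaving differently depending on who requested the page.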

When you click upon a link, sometimes the result is that you see a popup containing some additional information on that page, or some text being highlighted, or geographical information might be displayed. The patent filing tells us that Google might capture behavior information that “includes java script instructions that are executed upon a selection of a link.”

Takeaways

The inventors on this patent appear to have been members of the Google Place Page Team, based on a post they co-authored, Make Google Place Pages Your Business Megaphone, and the patent filing includes a few different screenshots from Google Maps, including a Google Place page. It appears that they might have been tasked with finding a way to show Google Maps enhancements differently based upon the kind of display device used to show them, whether a desktop computer with one type of browser, or a smartphone with a different version of a browser that might have a more limited means of displaying the results of clicking upon a link.

It’s possible that the approach they came up with may have been put into place to understand link behaviors on sites outside of Google as well, and possibly help to adjust how information from links might be displayed by Google browsers on devices that might not otherwise support and display such information.

The Google Webmaster Guidelines have long included at least one section warning webmasters about the difficulties that the search engine might have in crawling links other than HTML text-based links, including JavaScript-based links, like the most recent version from those guidelines:

Use a text browser such as Lynx to examine your site, because most search engine spiders see your site much as Lynx would. If fancy features such as JavaScript, cookies, session IDs, frames, DHTML, or Flash keep you from seeing all of your site in a text browser, then search engine spiders may have trouble crawling your site.

There’s been some discussion over the years from Google and other sources about the search engine being able to find and crawl some JavaScript links, and even surface content that’s behind AJAX links as well, but this is the first patent filing from Google that I can recall seeing where they explicitly indicate that they might be tracking and indexing this type of behavior for links on pages.

As I noted at the start of this post, given Google’s movement into providing a browser, and increasingly working upon providing mobile access to the Web, it makes sense for them to pay more attention to the linking behaviors that they see upon webpages, and understand how those links might be displayed, especially in services that they offer, such as the information overlays on Maps in Google Maps search.

It’s not completely clear how this more sophisticated analysis of links might impact search rankings and results, except possibly to make Google’s index more aware of information that might surface upon a page only when a link is clicked, but I think it helps to be aware of this deeper approach from Google of understanding how links work, and the behaviors associated with them.


38 thoughts on “How Google Might Index Link Behavior Information”

  1. Excellent post Bill. Questions: Do you think this might also lead to Google “peering into” analytics accounts? Also, do you think this explains why a Google engineer I contacted within the last week on G+, Pierre Far, said, “..Actually we have 3 Googlebot-Mobile (2 for feature phones and one smartphone) and we don’t mind the URL structure (m subdomain or same URL or /m/ directory or separate domain). Hosting both desktop and mobile content on the same URL is the easiest for your users (e.g. no extra latency due to redirects) and easiest for us to understand.”

  2. I always wondered about the various link properties (_blank, etc.) nearly each time I create a link; surely they use it for something, but I don’t think it’s that dramatic in terms of signal weight. Typically I add _blank to all external links as I don’t want to lose my audience. I guess Google understands this type of link placement behavior, similar to rel external.

  3. I’ve been wondering, if a link is on the same page as “spammy links” – is it possible for that link/site to be associated with the other links/sites on the page, so that off-page SEO is affected?

    I’ve read that Google now provides messages in Google Webmaster Tools if a webmaster is participating in link schemes in which Google doesn’t approve.

  4. Excellent post! Google has almost every SEO and marketing person around the world watching their websites to see how the new update is going to affect their website and link-building strategies. You’ve mentioned a couple of key points which I will be sure to do some extra research on, but at this stage I have read a lot of blog posts and I must admit, mostly it’s all speculation. In the end we’ll just have to wait for the update to launch and see the effects. Great post – I enjoyed the reading experience. Thanks for sharing.

  5. Some days back, Google announced that it is going to update the algorithm of link evaluation, but didn’t disclose how it does that. This article can be helpful in understanding the link evaluation method of Google, as Google left us in the dark by not providing the actual way of link evaluation.

  6. Impressive and thought-provoking, Bill. I gather that what we’re talking about here is link “relevance”, or maybe better said, “link context”? If so, it sounds like the most-affected content would be low-quality and/or random in nature. Not sure that would be a bad thing, but I plan to monitor this for future reference. Thanks for all the great data posted here.

  7. Phil has a point, as you guys probably know, Google really concentrates on getting the websites with “high quality content” higher in the search results. What Phil says might be the case…
    But if it hits random content, that wouldn’t be so good.
    So they really need to keep track of what they are doing.

  8. Really good information. We have noticed a few of our sites (clients’) that we worked on, not using much anchor text, are doing surprisingly well. Maybe it’s the link context concept? “Click here” is surprisingly strong.

    Time will tell (we hope)

  9. If you thought that anchor text is the only relevant link information you have always been wrong. I wonder why there are still so many SEOs taking that much care about the anchor text even when the link is embedded in highly relevant content. But this one could change the game a bit. I think there will always be a trade-off between a link that should carry more weight and a simple link that wants to sell something. For readers a very relevant link that provides more information is often hidden with non-descriptive anchor text (like “here” or “for example” etc.) while a simple ad is placed prominently. What link should get more “juice” and weight? To me it’s definitely not the ad link most of the time…

  10. Really nice post, Bill. Even though I’ve been doing this for a while, I’m still amazed about how much data Google is collecting and processing. All the data you describe being collected for a single link – and there are how many billion links? Maybe this is a little bit of a tangent, but that’s what I was thinking while reading this post.

  11. Great post Bill. It makes a lot of sense that Google would now take these signals into account. Since the shift towards more “natural” looking links and in particular, anchor texts, it would seem that these types of signals are the ideal successors.

  12. Actually, there have been statements made that they do crawl JavaScript links. Also, there’s talk about them using people browsing with Chrome to crawl and index.

    There is also a Portuguese ex-Googler who recently said that by the end of the year Google will be changing link evaluations significantly. I think Google Places is already playing a significant role in ranking local websites.

  13. I agree with Mike above; a lot of my clients are ranking high with very few backlinks compared to their competition. In some of our tests we’ve noticed that just the mention of a domain; mysite.com (not a live link) seems to be helping a great deal.

  14. “Whether the selection of a link might launch a new application and/or a new browser window”
    I wonder if this will be considered the same
    <a href="somewhere.html" target="_blank"></a>
    and
    <a href="somewhere.html" onClick="window.open(this.href); return false"></a>

    From the screenshots they look different

  15. Hi Stephen,

    Thanks. I don’t think that this is an indication that Google might be using data from individual Google Analytics accounts. They have their own logs of activities of users regarding different queries and browsing activity and websites.

    But yes, it’s possible that a crawling program focused upon mobile might find more interest in link features like these, especially if it means that people visiting some pages might not be able to access information through some URLs if their mobile browsers don’t support features like onclick events.

  16. Hi Dan,

    One of the things that I found really interesting about this particular patent wasn’t so much that Google is working to index these particular features, but rather that they are indexing things like this that we might not expect at all. If a page is filled with links that might not work well on a mobile browser, might it not rank as highly on a mobile search, or might a site (or mobile browser) find alternative ways to display that information? I’m not sure.

  17. Hi Nikki,

    I wouldn’t think that just because a site linked to two different places, and one of those might be a “bad neighborhood” type link, that the other page being linked to would be harmed in some way. That by itself is a very limited set of signals. It would probably take much more than that for that kind of association to be made.

    But Google does index the links it finds on pages, and builds a link graph that describes how links are related to each other on the Web – that’s part of the process behind calculating PageRank, too. And that graphing of links between pages might uncover patterns that might indicate whether some kind of unusual link scheme might be happening, that could possibly be seen as a ring of links between pages that exists only to attempt to manipulate PageRank.

    Google did send out a very large number of warning messages about “unnatural linking patterns” to a lot of webmasters. Fortunately, I didn’t see any of those messages for any of the sites that I’m monitoring in Webmaster Tools.

  18. Hi Anton,

    Thanks. There is a lot of uncertainty now as to how Google might change rankings after a few messages from them that they are going to make changes to the algorithms they use regarding their webmaster guidelines. Exactly what they might be doing, we can’t be too sure of, but some sites have definitely been affected, and there’s a lot of discussion over at the Google Webmaster Central help forums involving many of the sites.

  19. Hi Richard,

    My post only really describes one small piece of something larger. Google did announce last month that they would be no longer using a link analysis method that they had been using for a few years. Presumably because it was no longer needed or useful or perhaps because they came up with an alternative. A few days ago, they made another algorithm change that has had a fair amount of impact in terms of rankings for a number of sites.

    If you get anything out of this post, hopefully it’s that Google may be looking at more things related to links than we might have been aware of in the past.

  20. Hi Phil,

    Thanks. The thing that perhaps interests me the most about this is the fact that sometimes when you click upon a link or a few links on a page, you aren’t brought to another page, but instead you see more content on that page itself. Is Google concerned about that for indexing purposes? Maybe.

    That kind of content isn’t necessarily always low quality in nature either.

  21. Hi Michael,

    I suspect that if Google goes to index content that might only surface when a link is clicked upon, on the same page as the link, and it changes every time that link is clicked on, Google might not be so keen to index that content. It might consider it to be too transient in nature.

  22. Hi Mike,

    It’s just really hard to draw a conclusion like that without knowing too much about the other signals and features that might be causing those pages to rank well, and without knowing much about the competition that they have for those queries. But it is definitely worth trying to consider everything that might be going on to influence things like that.

  23. Hi Tom,

    Definitely. If the Google Reasonable Surfer patent taught us anything, it’s that there are a lot of different features and aspects of a link that might play a role in how much PageRank and hypertext relevance it might pass along, from font size and color and style to location on a page to how relevant it might be to the page around it, and much more.

    What this patent is telling us is that there might be some other features related to links that may also play some role as well, that we might not have been paying much attention to in the past, and that maybe we should.

  24. Hi John,

    Thanks. It is a little staggering how much data Google does collect, and even more amazing to me is that they can find ways to use it in meaningful ways. I remember a number of years back at a “meet the crawlers” session, someone asking a Google Rep how they were able to identify paid links, and the response he received was “we have lots and lots of computers.” They have a lot more than that now.

  25. Hi Sean,

    In part, I think Google needed to start paying more attention to these types of links because some of them might cause problems on mobile browsers. Also, some of them trigger the appearance of content on the very same page as well, like when you click on the name of a business in a Google Maps search results page, and a balloon appears over the map showing more about the business whose listing you clicked. Those onclick-type additions to content on a page (including many pages outside of Google Maps) seem like information that Google might want to know about, both for people using mobile browsers who might not see that information, and to possibly index the additional information that might be triggered by such a click.

  26. Hi Marcos,

    Very good point. Though if we go to the Google Webmaster Guidelines, we still see the following warning:

    Use a text browser such as Lynx to examine your site, because most search engine spiders see your site much as Lynx would. If fancy features such as JavaScript, cookies, session IDs, frames, DHTML, or Flash keep you from seeing all of your site in a text browser, then search engine spiders may have trouble crawling your site.

    We know that Google can see when a piece of JavaScript contains an href="URL" and is willing to crawl that link, and they’ve been doing that for at least a few years now.

    But it’s also likely that Google is triggering javascript events at this point to see what actually happens when they do.

    The statement about Google changing link evaluations significantly before the end of the year is interesting, and I saw that as well. Not quite sure what the implications of that might be.

  27. Hi Vince,

    I’m convinced that this grew out of the issues and problems that people working on Google Maps had when people viewed them through different devices that had different capabilities. But it can apply to a lot of other sites as well, and the patent is written in a way that expresses being able to understand link behavior on the Web wherever it might be found. So I’m not sure that we should limit what we think about it based upon its origin, alone.

  28. Hi Jeremy,

    One of the things that skews that observation though is that every link may carry different amounts of weight in terms of both PageRank and Hypertext relevance. I’ve seen pages rank amazingly well on the strength of single links from very high PageRank pages, for instance. I’d imagine in that instance that links from that particular page (pagerank 9) were probably worth more than many thousands of links from lower PageRank pages. Given that, it’s hard to tell what kind of impact anchor text might have.

  29. Hi CMSbuffet,

    The results of both of those are the same, but there are differences in each approach that might have implications for something like a mobile browser, or a desktop browser in some cases even. The first one might cause a problem for a mobile browser that can only have one window open at a time. It might ignore the “target” and go to the page being linked to. The second one might be treated differently by a mobile browser that doesn’t use JavaScript, and nothing might happen when the link is clicked upon.

    Might that be a sign to Google that a page like that isn’t mobile friendly, and that the page should be ranked lower in search results on mobile sites? I don’t know, but it might be.

  30. Bill, what have you noticed since the Penguin update? I know there are a lot of rumors and speculation out there, but I seem to get a general consensus that Google has dramatically changed link ranking factors in this latest algo change. Thoughts?

  31. Hi Tom,

    I really haven’t noticed anything that might be related to what’s described in this patent application and the penguin update. The most unusual thing that I could potentially think of that this patent filing might impact is that content that might only appear after an onclick event might be included in search results for the same page, if the actual URL for the page doesn’t change.
