Just What User Behavior Data Does Google Use to Influence Search Rankings?

In a blog post at the Official Google Webmaster Central Blog on Monday, High-quality sites algorithm goes global, incorporates user feedback, Google Fellow Amit Singhal announced some changes to the way that Google ranks web pages. Those include extending the Panda update to all English-language Google users, and incorporating data about sites that users have blocked from their search results.

The announcement also noted that “we’ve also incorporated new user feedback signals to help people find better search results,” but it didn’t provide details on which actual user-behavior signals those might be.

I’ve seen a number of references in the past to information about user behavior data in Google patents, and some descriptions about how that information might be used by Google when they rank pages in search results. I thought I would look through some of them and see what they had to say about how Google might incorporate user behavior data into search. I have no doubt that this list is very incomplete, but I thought it was worth sharing.

In Methods and apparatus for determining equivalent descriptions for an information need, filed in 2002, we learn about an early approach that Google may have followed to try to learn about synonyms for queries. A good part of the process involved looking through query log files from the search engine, and collecting information about people performing those searches:

In a preferred implementation, the query log contains, for each query, information about the user who submitted the query (i.e., a UserID), when the query was submitted (i.e., date and time), and the query itself.

In addition to the foregoing, the query log may also include a list of information that was provided to the user in response, a record of any action taken by the user on the search results (e.g., whether the user clicked on any of the results), as well as other data concerning the query and user behavior.
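
To make that description a little more concrete, here is a minimal sketch of what one record in such a query log might look like. The class and field names are my own assumptions for illustration; the patent only lists the kinds of information that may be stored, not a schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class QueryLogEntry:
    """One hypothetical query log record (field names are assumptions)."""
    user_id: str                 # who submitted the query
    submitted_at: datetime       # when the query was submitted
    query: str                   # the query itself
    results_shown: List[str] = field(default_factory=list)    # what was returned
    clicked_results: List[str] = field(default_factory=list)  # actions taken

    def had_click(self) -> bool:
        """True if the user clicked on at least one result."""
        return bool(self.clicked_results)


entry = QueryLogEntry(
    user_id="u123",
    submitted_at=datetime(2011, 4, 11, 9, 30),
    query="riverview swimming schedule",
    results_shown=["example.com/a", "example.com/b"],
    clicked_results=["example.com/b"],
)
```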

Google’s 2003 filing, Information retrieval based on historical data, reads like a brain dump of ways to learn about web pages by looking at history-based data:

If a document is returned for a certain query and over time, or within a given time window, users spend either more or less time on average on the document given the same or similar query, then this may be used as an indication that the document is fresh or stale, respectively.

For example, assume that the query “Riverview swimming schedule” returns a document with the title “Riverview Swimming Schedule.” Assume further that users used to spend 30 seconds accessing it, but now every user that selects the document only spends a few seconds accessing it. Search engine 125 may use this information to determine that the document is stale (i.e., contains an outdated swimming schedule) and score the document accordingly.
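
The swimming schedule example could be sketched roughly as follows. This is only an illustration under my own assumptions: the function name, the use of a simple average, and the 50 percent drop ratio are all hypothetical, and the patent doesn't say how a change in time spent would actually be measured or scored.

```python
from statistics import mean


def staleness_signal(past_dwell_seconds, recent_dwell_seconds, drop_ratio=0.5):
    """Compare average time spent on a document then and now.

    drop_ratio is an assumed threshold: if the recent average falls to
    half of the past average (or less) the document is treated as stale,
    and if it doubles (or more) it is treated as fresh.
    """
    if not past_dwell_seconds or not recent_dwell_seconds:
        return "unknown"
    past = mean(past_dwell_seconds)
    recent = mean(recent_dwell_seconds)
    if past == 0:
        return "unknown"
    if recent <= past * drop_ratio:
        return "stale"
    if recent >= past / drop_ratio:
        return "fresh"
    return "unchanged"


# Users used to spend about 30 seconds, and now only a few seconds.
signal = staleness_signal([30, 32, 28], [3, 4, 2])
```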

Google’s local search may benefit from user behavior data as well, as described in the 2003 Google patent, Methods and systems for improving a search ranking using location awareness:

Still referring to FIG. 1, the location component 138 may determine a relationship between a topic and its location sensitivity. For example, location component 138 may analyze the query to determine a keyword, or a query topic.

Furthermore, it may determine the amount or extent to which geographically-based search results are relevant to the topic and a relevant geographic range for the topic, for example, by examining user behavior (e.g., user selection behavior, such as mouseover or click through) of search results 132 presented to the user.

Google’s 2004 patent, Accelerating user interfaces by predicting user actions, aimed at predicting which pages people might select when browsing pages, to help those pages load faster, in part by seeing which pages people tended to hover over with their mouse pointers.

While this doesn’t directly influence search results, I’m including it because another patent below describes how mouse pointer tracking could be used to reorder search results.

In a third embodiment, the predefined criteria for initiating a document request is that the mouse pointer is positioned over and either hovers over a hyperlink for at least a threshold period of time (e.g., a period of at least 100 milliseconds), or a mouse-down on the hyperlink occurs, which ever is first.

This embodiment takes advantage of a common user behavior, which is to do a mouse hover over a hyperlink before clicking on it. In yet other embodiments, other predefined criteria may be used. For instance, the predefined criteria may require a mouse hover, but the hover may be over any region within a predefined proximity of a hyperlink. Further, the predefined criteria may include multiple criteria.
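
The criteria in that third embodiment are simple enough to write down directly. In this sketch the 100 millisecond threshold comes from the passage quoted above, while the function and constant names are my own.

```python
HOVER_THRESHOLD_MS = 100  # example threshold given in the patent text


def should_request_document(hover_ms: float, mouse_down: bool) -> bool:
    """Request the linked document early if the pointer has hovered long
    enough, or if a mouse-down on the link happens first."""
    return mouse_down or hover_ms >= HOVER_THRESHOLD_MS
```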

In Google’s 2004 Reasonable Surfer patent, Ranking documents based on user behavior and/or feature data, a considerable amount of data about the ways that users behave on pages may influence how much weight each link on a page may carry.

Repository 430 may also store user behavior data associated with documents. The user behavior data may include, for example, information concerning users who accessed the documents, such as navigational actions (e.g., what links the users selected, addresses entered by the users, forms completed by the users, etc.), the language of the users, interests of the users, query terms entered by the users, etc.

Google may also pay attention to user behavior involving images shown in search results, as described in the 2004 patent, System and methods for detecting images distracting to a user:

In an embodiment, the behavior analyzer 334 of the search engine 218 monitors the behavior of users in relation to the group of images that are sent to the user in response to a query.

The user behavior can include determining which images users select for further viewing, determining how many selections users make for a particular image for a particular query, determining how many different queries a particular image is displayed for, and determining how many selections a particular image receives over a large number of different queries that are unrelated to each other.
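
One of the signals mentioned there, an image being selected across many unrelated queries, might be approximated like this. It's a rough sketch under a big assumption: I'm treating each distinct query string as a stand-in for an "unrelated" query, which the patent does not say, and the threshold of three queries is arbitrary.

```python
from collections import defaultdict


def flag_possibly_distracting(click_log, min_distinct_queries=3):
    """Return image ids that were selected for many different queries.

    click_log is a list of (query, image_id) pairs, one per selection.
    Distinct query strings are used as a crude proxy for unrelated queries.
    """
    queries_per_image = defaultdict(set)
    for query, image_id in click_log:
        queries_per_image[image_id].add(query.lower().strip())
    return {
        image_id
        for image_id, queries in queries_per_image.items()
        if len(queries) >= min_distinct_queries
    }


log = [
    ("red car", "img1"),
    ("tax forms", "img1"),
    ("banana bread", "img1"),
    ("red car", "img2"),
    ("red car", "img2"),
]
flagged = flag_possibly_distracting(log)
```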

Google’s 2005 mouse pointer tracking patent, System and method for modulating search relevancy using pointer activity monitoring, explains how Google could look at where people place their mouse pointers on search results pages. Hovering over search results could potentially be used to re-order those results, and hovering over onebox results could help determine whether those are relevant and helpful to searchers.

In particular, a client assistant residing in a client computer monitors movements of a user controlled pointer in a web browser, e.g., when the pointer moves into a predefined region and when it moves out of the predefined region. A server then determines a relevancy value between an informational item associated with the predefined region and a search query according to the pointer hover period.

When preparing a new search result responsive to a search query, the server re-orders identified informational items in accordance with their respective relevancy values such that more relevant items appear before less relevant ones. The server also uses the relevancy values to determine and/or adjust the content of an one-box result associated with a search query.
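
Here is a minimal sketch of that re-ordering step. I'm assuming that the relevancy value for a result is just the average hover period recorded for it, which is a simplification of my own; the patent only says the value is determined "according to the pointer hover period."

```python
from statistics import mean


def relevancy_values(hover_samples_ms):
    """Map each result to its mean observed hover period in milliseconds."""
    return {
        item: mean(samples)
        for item, samples in hover_samples_ms.items()
        if samples
    }


def reorder_results(results, hover_samples_ms):
    """Put results with higher relevancy values first.

    Results with no hover data score 0 and keep their original relative
    order, because Python's sort is stable.
    """
    scores = relevancy_values(hover_samples_ms)
    return sorted(results, key=lambda item: scores.get(item, 0.0), reverse=True)


original = ["page-a", "page-b", "page-c", "page-d"]
hovers = {"page-c": [900, 1100], "page-a": [200, 300]}
reordered = reorder_results(original, hovers)
```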

The 2005 patent from Google, Determination of a Desired Repository, looks at “triples” of data about users (u), queries (q), and different data repositories (r) to determine what kinds of results to show searchers (aka, Universal Search).

That information can determine whether Google shows searchers web pages, images, news results, local results, and other kinds of results. User information taken from query logs could include such things as IP addresses, cookie information, languages used, prior queries and the time of day or day of week that those queries were provided to the search engine.

System 200 may include one or more devices 210 and a store of log data 220. Store 220 may include one or more logical or physical memory devices that may store a large data set (e.g., millions of instances and hundreds of thousands of features) that may be used, as described in more detail below, to create and train a model.

The data may include log data concerning prior searches, such as user information, query information, and repository information, that may be used to create a model that may be used to identify one or more repositories that may be desired by a user. In one implementation, the model may predict whether a user desires information from a particular repository when the user provides a certain query.

In the 2007 Google patent, Systems and methods for demoting personalized search results based on personal information, we see some other user-behaviors that might result in the reordering of search results during Google’s personalized search.

In some embodiments, information use for profiling a user may include the number of “clicks” or visits by the user to a particular website, webpage, or set of websites during a particular window in time.

Other characteristics of user behavior that can be used for user profiling include one or more of the following: the length of time that a user interacts with the website, the proportion of the website viewed by the user, actions (in addition to clicks) taken by a user while visiting the website (e.g., printing, bookmarking, cutting and pasting, annotating), and a user’s activity subsequent to the interaction with the website.

In the pending 2007 Google patent application, Presentation of Local Results, results shown to the users of mobile devices might be skewed to the presentation of local search results before web results, but other types of search results might be shown to searchers based upon user-behavior signals and the particular query involved.

Various techniques are described for ordering the result sets. For example, result sets may be ordered by a determined correlation between a particular query (including all of a query or part of a query) and a particular group, including by aggregated observations of user behavior in response to receiving query results.

For example, it may be observed that most users who query on “Marilyn Monroe” click an “images” control even if the initial results are provided as web results. Such user behavior may indicate that users associate the query closely with images and thus prefer to have images displayed first. The correlations between search terms (or, for example, portions of search terms) and search results (or portions or other attributes of search results) may be computed by a machine learning system, as described in more detail below.
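
The Marilyn Monroe example could be sketched with nothing more than click counts. The patent application talks about a machine learning system computing these correlations; what follows swaps in a plain frequency count instead, and the group names and default order are my own assumptions.

```python
from collections import Counter

DEFAULT_ORDER = ("web", "images", "news", "local")  # assumed default


def order_result_groups(click_log, query, default_order=DEFAULT_ORDER):
    """Order result groups for a query by how often users chose each one.

    click_log is a list of (query, group) pairs. Groups with equal counts
    stay in the default order.
    """
    counts = Counter(group for q, group in click_log if q == query)
    return sorted(default_order, key=lambda group: -counts[group])


log = (
    [("marilyn monroe", "images")] * 3
    + [("marilyn monroe", "web")]
    + [("tax forms", "web")]
)
groups = order_result_groups(log, "marilyn monroe")
```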

When you’re provided with query revision suggestions, including spelling corrections, clicks on those suggestions may influence how Google provides information about those revisions, as described in 2010’s pending patent application, Query Revision Using Known Highly-Ranked Queries:

In another embodiment, user satisfaction is defined by the quality of the query. In one embodiment, a quality score for a query is estimated from user click behavior data estimating the length of clicks on search results.
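
A quality score based on the length of clicks might look something like this. It's a sketch under my own assumptions: I'm scoring a query by the share of its clicks that count as "long," and the 30 second cutoff is hypothetical, since the patent application doesn't give a formula.

```python
def query_quality_score(click_durations_s, long_click_s=30.0):
    """Return the fraction of clicks lasting at least long_click_s seconds.

    click_durations_s holds, for each click on a result for the query,
    how long the user stayed before returning (in seconds).
    """
    if not click_durations_s:
        return 0.0
    long_clicks = sum(1 for d in click_durations_s if d >= long_click_s)
    return long_clicks / len(click_durations_s)


score = query_quality_score([5, 45, 60, 10])
```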

Conclusion

There are other patent filings that I could have included in this post on how Google may incorporate user-behavior data into what they do, such as:

  • Advertising based patents that focus upon personalizing ads and targeting specific users
  • Recommendation based patents that could help in recommending products
  • Other local search based patents that determine whether searchers see local results for certain queries and specifics about the results that they are shown
  • Google’s sitelinks patent

Chances are that the amount of information that Google has collected about how people browse the web and how they search for information dwarfs most of the other information collected by the search engine, including the index it has of the Web itself.

Much of this information is collected quietly in the background rather than through explicit actions, like users creating profiles that show off their interests or clicking on things like a +1 button.

There are likely checks and balances in place for these user behavior signals. I wrote about one, query breadth, in my last post; it attempts to mitigate some popularity-based user behavior signals when there are potentially a large number of pages that may be very relevant for certain queries.

It’s possible that some user behavior signals may carry different weights based upon reputation scores associated with the Google Accounts of specific users. Some signals might not be counted if they seem to follow patterns that indicate they come from automated programs, or are part of a conspiracy formed solely to manipulate search results.

Other user behavior signals might not influence rankings if they don’t meet a certain threshold of activity.

The patent filings that I’ve included above point out a number of possible user behavior signals that Google may be using to influence search results, and it’s likely that a good number of other undocumented signals have been tested as well.

In an API article from earlier today, Google hones search edge to stay sharp, we’re told that Google’s search evaluation team “tested ‘many more than’ 6,000 changes to its search engine in 2010, with 500 of them passing the grade to become permanent.”

If you own a website, hopefully at this point you’re asking yourself, “What am I doing for the users of my pages?”

42 thoughts on “Just What User Behavior Data Does Google Use to Influence Search Rankings?”

  1. Wow, thank you for your summary, Bill. It’s quite difficult to imagine which signals really matter and which don’t, but I think that designing websites that are easy to use and filled with good content is always a primary goal to achieve, for users and for search engines too, since user behavior is harder to manipulate than other signals.
    I have to read all the resources you have linked to, as soon as I can…

    Great article as always :)

    Giuseppe

  2. Warning, this is a rant:

    The more I think about Google moving toward social ranking signals, the more I wonder about the future of their search strategy. I am concerned here because:

    1.) Asking for a +1 recommendation before a user has visited a site is strange. What exactly is the user recommending from the search results, the title tag and meta description?

    2.) The +1 button has been added to paid search ads meaning that sites with Google PPC can gain +1 recommendations more quickly than sites that don’t advertise with Google. To me, muddying the waters between organic and paid listings so that one can now influence the other is a serious change in the ethos of the company. Don’t want to create compelling content for SEO? That’s fine, inflate your social signals with paid search and rank organically as a result. Hmmm.

    3.) Social signals and Adsense don’t mix. Is it just me, or do sites with loads of Adsense ads tend not to be the greatest for garnering user recommendations? I worry about adding Adsense to my sites because of the spam perception, even though the ads are often really well targeted; too many spam sites have abused the public perception of Google ads! I wish that small publishers could have more ad customisation options to make ads feel more unique and integrated to avoid the perception of spam, but only content farms get those premium options (ha!).

    Rant over.

  3. I think that they have, for a very long time, collected far more data than they currently use, through their Google Toolbar and all their little free services like Google Reader, Analytics, etc. Nobody knows when they will use all that data, but one day it will become reality!

  4. I believe that as the algorithm gets more advanced, it gets better at telling apart content that is there just to be there, or simply to cover the landscape of words relating to the service/product (serving no greater purpose to users and contributing nothing to the body of human knowledge), from content that is novel, engaging and naturally link worthy. I think that the role of the SEO, in light of this growing ability to tell the difference, will more and more involve a stewardship of content quality and/or direction for copy/text content suppliers.

  5. I see clickthrough as being a definite factor – especially given recent experience. I’ve watched a very sensationalized search result with absolutely nothing going for it in terms of on site optimization or linking stay near the top for a term that has a lot of SEO’d content around it – and it just sits near the top unfettered. I’m sure it draws clicks due to its content and title, and this has finally made me a believer in clickthroughs for organic search.

  6. Hi Bill,

    This is just brilliant! Exactly what I need to build up the case of what I call PersonRank.
    We have to do our planned video soon. I’ll shoot an email.

  7. very interested to see the evolution of this discussion, my main concern is personal search, if a user’s search is essentially ringfenced by their previous searches it makes it very difficult for new companies to break through

  8. Cool! I wonder if you could outsource the clicking of the +1 button for your own website in organic search results by cheap labor in third world countries using a giant network of varying C-Class IP addresses (or something similar that would work) to synthetically manipulate organic rankings? I say we package it and slap a $97 price tag on it and sell it over at WarriorForum. They would love it!

    Mark

  9. @MJ – well put regarding the ringfence concept; it already feels frustrating. There needs to be a clear option to turn off all personalisation, including when logged in – beyond &pws=0 and the current personalisation option.

  11. Hi Giuseppe,

    Thanks. Quality and usability are ideal goals to aim at with the construction of the pages of a site. Since the search engines are likely incorporating user behaviors into how they rank pages, any indications that users are having good experiences on pages can potentially help with rankings, and stand a good chance of helping a site owner meet the goals that they have with their pages as well.

  12. Hi Chris,

    Asking for a rating before someone even visits a page is odd. I don’t know if there’s an expectation that people would do that anyway, especially before they looked at another site or two and compared them and how helpful those might be. But if they do that, chances are that they are returning to the same search results.

    I would expect that Google would be able to track when someone added a +1 – before or after they visited a page. Would Google count the +1 after the visit more than the +1 before the visit? We don’t know, but that’s a possibility.

    It bothered me as well that paid advertisements would have +1 buttons on them. Does a vote for an advertisement for a page mean a +1 for the page or just for the ad? I don’t know, but that’s worth checking out.

    I don’t have adsense on this site because I don’t have any control over what they point to and where they lead people. But I also agree with you that they can help create the perception of spam pages, created solely to attract people to click upon ads.

  13. Hi Matthew,

    I’ve always believed that SEO involves building quality content, and helping create the best experience possible for the visitors to a page.

  14. Hi Charlotte,

    There are a lot of signals that the search engines use to determine rankings, and many of them are less obvious than we might believe. Click through rates may be influencing the high ranking that you are seeing for the particular result that you’re writing about but I’ve seen pages rank exceedingly well on the basis of a single link from a very high PageRank page with the right anchor text in the link. It isn’t always easy to see or isolate a signal like that.

  15. Hi MJ,

    The computer programmer who uses Java in their applications looking for a cup of Java while visiting the island of Java might have a tough time finding a cup of coffee. Most searchers really don’t have the breadth of previous searches to impact most of their searches, and the value of their previous search results probably diminishes as they get stale.

    New companies on the web share the same cold start issues that new companies offline do – they aren’t well known and they have much more established companies competing with them. A new company needs to be nimble, to find market gaps where they can compete, and think about ways to overcome things like personalization. It may be challenging, but it’s not impossible, and I’d venture to say that it’s easier to do on the web than in the bricks and mortar world.

  16. Hi Mark,

    With the +1 tied to a Google Account, it might be easy to discount the value of +1s when it comes to having them count as ranking signals, when those +1 votes come from what appears to be a conspiracy to get something to rank well from very thin Google Accounts. The cost of building many reputable looking Google accounts for each new time you want to vote something up might be more work than building a site that people would want to vote up anyway.

  17. Hi Bill,

    I mean this sincerely – your blog and the quality of its content is evidence of this ethos.

    What I meant to touch on though – is are we moving to an era of SEO/Copy/Content Generation overlap? Surely once the uptake of technical best practices becomes near universal, the role of the SEO, or of an SEO wishing to be viable in the longer term, will need to involve direct oversight of content quality? If not – maybe – generation too?

  18. You’ve really caught my attention with this. I think that the role of the SEO in light of this trend or increase of ability to tell the difference will involve a stewardship of content. Thanks for the share.

  19. Great article, thanks Bill. It would sure be great to have a better understanding of the ‘User Behavior’ signals that Google uses (and intends to use in the future). As you make clear above, at present we can speculate about the various signals but don’t have a firm understanding of them.

  20. Quote: “Hi Mark,

    With the +1 tied to a Google Account, it might be easy to discount the value of +1s when it comes to having them count as ranking signals, when those +1 votes come from what appears to be a conspiracy to get something to rank well from very thin Google Accounts. The cost of building many reputable looking Google accounts for each new time you want to vote something up might be more work than building a site that people would want to vote up anyway.”

    If that is really the case, then I would be just fine with that, Bill. That means that your competitors would be discouraged from using it to one up you. Excellent post, as always Mr. Slawski…:)

    mark

  21. Hi Matthew,

    Thank you very much.

    I wonder if you gathered together a room full of people who knew a great amount about search and HTML and usability and conversions and analytics if you could get them to agree on “technical best practices.” I suspect that there might be agreement on a number of topics, but still disagreement over many others. I also think it’s important to try new things and innovate some, and a strict adherence to “standards” might limit that kind of effort.

    As for an SEO being involved in content creation, I hope that most already are in some manner. If keyword research is done correctly, it should involve some level of market research, and some amount of finding marketing gaps within an industry that others aren’t taking advantage of for one reason or another. It should point out some possibilities for the creation of new content. SEO should help point out underserved audience members for a site, and help in the creation of new content areas that might address those audience members. SEO should help generate ideas for new pages, new applications, new ways to interact with potential consumers of a site.

  22. Hi Andrew,

    An SEO should at the very least be an important and helpful advisor when it comes to the creation of content for the pages of a site, and in some instances stewardship of the content of the pages of that site might be appropriate. There are a number of things that an SEO can do to help improve the usability and quality of the pages of a site, including helping to teach the owners of that site how to use analytics to make positive changes to the pages of that site.

  23. Hi Gary,

    Thank you, Gary. We may not know exactly what the search engines may be looking at when it comes to user behavior data, but many of the things that we can do on a site to improve the experience of visitors on the pages can have positive payouts regardless of which signals the search engines may be looking at.

    So, if we can convince a visitor to view more pages, print some, bookmark others, fill out some forms, and so on, those things have some merit on their own in spite of what the search engines are doing. If those efforts benefit us in search rankings as well, then that’s all for the better.

  24. Hi Mark,

    I might be on the optimistic side there, thinking about the kinds of things that I would do if I introduced a potential ranking signal like the +1 button. I know that things that are introduced at Google, like the +1 button, go through experimentation and usability testing.

    I also know that anytime a potential ranking signal is introduced, that a lot of thought goes into how it might potentially be manipulated and abused. If that kind of analysis isn’t going on, then search results at Google would deteriorate quickly.

  25. It is no surprise user behavior has more influence in Google results. The recent Panda update seems to support that. With the +1 that google is promoting and all the other social media sites where people interact it just seems it would be a much better measure of how people view or like a site as opposed to spam links and search engine manipulation.
    One thing is clear: it seems google places and business directories for real businesses are having more weight in search results.

  26. Hi Mark,

    There’s a good chance that Panda uses little to no user information directly when it does what it does in identifying features from known high quality seed sites to use to rank other sites in an automated fashion.

    The Panda algorithm would use many different instances of observational data about the pages that it ranks to build a prediction model of how people might use a website, how relevant they might find particular pages, and so on. It’s possible that Google may use user information data to compare to the predictions from the Panda process, as a feedback mechanism.

  27. Hi!
    I realize that this article is a few months old but I wanna ask you if you have seen any signs lately that Google has indeed implemented this into their algorithm? I think it seems like it. Would appreciate your take on it!

  28. Hi Thomas,

    I listed a good number of different papers and patents in my post, and chances are that Google has tried one or more of them out. The new “Google Plus Your World” that just came out introduces the social information into Google Plus, when you’re logged into your Google Account, but many of the processes I described above in my post aren’t that transparent, and many of them could be implemented without really seeing much on how they might or might not be impacting search results.

  29. Wow. Great blog post on Google’s user behaviour influencing search rankings. It was a hard read. You lost me at points but your chronological format clearly demonstrated the evolution of search ranking and how algorithm changes are impacting the way we do organic white hat SEO.

    Could you expand and explain the Google sitelinks patent? I still do not understand how this relates to search rankings.

    Daniel Tetreault.
    Victoria, BC
