What Effect Might User Behavior Information Have on Page Ranking
Historically, search engines have ranked web pages in search results based upon a combination of an information retrieval (IR) score, derived from matching the terms in a query against the terms in a document, and a link-based score that measures the quantity and quality of links pointing to a page, using a method like PageRank.
A new patent filing from Google explains some shortcomings of these approaches. It explains how a score based upon user behavior information might be used either in combination with those approaches or in place of them. For example, the patent tells us that term-based methods can be biased towards pages where the content or display of those pages has been manipulated to focus upon those terms. We’re also told that link-based approaches are limited in that relatively new pages have usually had fewer links pointing to them than older pages, so they often have a lower link-based score.
Instead, pages that are returned as being responsive to a particular query might be assigned a score based upon user behavior information and ranked based upon those scores or combined with IR and link-based scores.
The patent application includes examples of two types of usage data: frequent visits to a page or site, and many unique visitors to a page or site. It also tells us that other user behavior information might be included as well.
Interestingly, the patent application was filed on February 24th of this year, the same date that the first Google Panda update went into effect. There may be a connection, but the information that we’ve been provided about Panda seems to go beyond what is included in this patent filing:
Methods and Apparatus for Employing Usage Statistics in Document Retrieval
Invented by Jeffrey A. Dean, Benedict A. Gomes, Krishna Bharat, Georges Harik, and Monika H. Henzinger
Assigned to Google
US Patent Application 20110179023
Published July 21, 2011
Filed: February 24, 2011
Methods and apparatus consistent with the invention provide the improved organization of documents responsive to a search query. In one embodiment, a search query is received, and a list of responsive documents is identified. The responsive documents are organized based in whole or in part on usage statistics.
How User Behavior Information Scores Might be Calculated
The patent description is fairly simple and provides an example of how user-based data might be used to rank search results, providing some details of how usage information scores might be calculated. Here’s how it works:
- Someone performs a query
- The search engine returns a list of results responsive to the query, likely based upon IR and link-based scores
- Once those pages are returned, they may be organized based upon usage statistics, in whole or in part
- Those usage scores could be assigned to pages at the time the query is performed or beforehand and may be based upon a variety of useful information
- Examples of usage information in the patent include (a) frequency of visit information and (b) unique visitor information
- The rankings of those pages could be combined with link information such as PageRank and/or query information involving how well the query matches up with terms and phrases on the page
- Other information, such as the length of the path of a document, could also be used in ranking pages
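The steps above can be sketched as a simple re-ranking function. This is only an illustration: the patent leaves the combination of usage and IR/link scores open, so the linear blend and the `alpha` weight here are assumptions, not the patented method.

```python
def rerank(results, usage_scores, alpha=0.5):
    """Re-rank documents by blending a base IR/link score with a usage score.

    results: dict mapping doc id -> base (IR and/or link) score
    usage_scores: dict mapping doc id -> usage score (may be sparse)
    alpha: illustrative blending weight; alpha=0 keeps the base ranking,
           alpha=1 ranks on usage alone ("in whole or in part").
    """
    def blended(doc):
        return (1 - alpha) * results[doc] + alpha * usage_scores.get(doc, 0.0)
    return sorted(results, key=blended, reverse=True)
```

For example, `rerank({"a": 0.9, "b": 0.8}, {"b": 1.0, "a": 0.0})` would promote document "b" above "a" even though "a" has the higher base score.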
The patent filing includes a detailed example of how a total score for a page might be calculated based upon the usage score and an IR score:
In one particular version, the documents might be organized by a total score equal to the usage score multiplied by the square root of a query-term-based information retrieval (IR) score. The usage score, in turn, might be the product of a frequency of visit score, a unique user score, and a path length score.
The frequency of visit score equals log2(1 + log(VF)/log(MAXVF)), where VF is the number of times that the document was visited (or accessed) in one month, and MAXVF is set to 2000. A small value is used when VF is unknown. The unique user score equals 0.5*UU/10 when UU is less than 10; otherwise, it equals 0.5*(1 + UU/MAXUU). UU is the number of unique hosts/IPs that accessed the document in one month, and MAXUU is set to 400. A small value is used when UU is unknown. The path length score equals log(K - PL)/log(K), where PL is the number of characters in the document's path and K is set to 20.
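Those formulas can be written out directly. The constants below are the patent's own example values; the "small value" defaults for unknown VF/UU and the clamp on path length are assumptions, since the patent does not specify them.

```python
import math

# Constants from the patent's worked example (illustrative, not necessarily
# what any production system uses).
MAXVF = 2000   # cap for monthly visit frequency
MAXUU = 400    # cap for monthly unique users
K = 20         # path-length constant

def frequency_of_visit_score(vf):
    """log2(1 + log(VF)/log(MAXVF)); assumes VF >= 1 when known."""
    if vf is None:
        return 0.1  # "small value" placeholder; the patent doesn't specify it
    return math.log2(1 + math.log(vf) / math.log(MAXVF))

def unique_user_score(uu):
    """0.5*UU/10 when UU < 10, else 0.5*(1 + UU/MAXUU)."""
    if uu is None:
        return 0.05  # "small value" placeholder
    if uu < 10:
        return 0.5 * uu / 10
    return 0.5 * (1 + uu / MAXUU)

def path_length_score(pl):
    """log(K - PL)/log(K); shorter paths score higher."""
    pl = min(pl, K - 2)  # clamp so the log argument stays positive (assumption)
    return math.log(K - pl) / math.log(K)

def total_score(ir_score, vf, uu, pl):
    """Usage score (product of the three sub-scores) times sqrt(IR score)."""
    usage = (frequency_of_visit_score(vf)
             * unique_user_score(uu)
             * path_length_score(pl))
    return usage * math.sqrt(ir_score)
```

Note how the caps work: a page visited MAXVF times in a month by MAXUU unique users, with an empty path, gets a usage score of exactly 1.0, so the total score reduces to the square root of the IR score.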
Frequency of Visits
The frequency of visits to a page could be calculated a few different ways:
- The number of times that a page has been visited
- The number of times that a page has been visited over a certain period of time (such as 100 visits in the past week)
- The change in the number of times a page has been visited over a given period of time (e.g., 20% increase during this week compared to the last week), or
- Any number of different ways to measure how frequently a document has been visited
These counts might be filtered to remove visits from robots or automated agents, or from people affiliated with a page, since those visits might not represent objective usage of the page. (That would likely rule out visitors that someone might send from a service like Mechanical Turk as well.)
In addition, some factors involving the nature of the visits might also be considered, such as assigning a weighting factor based upon the geographic source for a visit. So, for example, a visit from Germany might count twice as much as a visit from Antarctica. Other types of information about the visit might also provide different weights to the visitor frequency score, such as the browser used to visit the site or other information about a user. Unfortunately, the patent doesn’t provide many details about these other types of user information.
Number of Users
Very similar to the frequency of visits above, but focusing upon the number of visitors rather than the number of visits; this could be calculated in several ways:
- The number of users that have visited a document in a given period of time (e.g., 30 users over the past week)
- The change in the number of users that have visited the document in a given period of time (e.g., 20% increase during this week compared to the last week)
- Any number of different ways to measure how many users have visited a document
Users might be identified based upon information such as:
- A user’s Internet Protocol (IP) address
- Their hostname
- Cookie information, or
- Other user or machine identification information
Again, the number of visitors may be filtered to remove automated agents or robots and people affiliated with a page.
And again, there may be additional weightings of the number of users based upon the nature of the user. For example, a user from Germany might count twice as much as a visitor from Antarctica for a certain page. Other information, such as browsing history and bookmarked items, might also impact the score based on the number of users.
Other User Behavior Information
While the patent filing points to frequency of visits or number of visitors as user statistics that could be used to rank documents, we’re told that “those skilled in the art” will recognize that there are other types of user behavior information and techniques that are “consistent” with the invention.
We’re also told that instead of maintaining this kind of user behavior information for individual pages, it might be done on a site-by-site basis. The site usage information is associated with some or all of the pages on that site.
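A minimal sketch of that site-by-site alternative, assuming visits are simply summed per host and every page inherits its site's total. The aggregation rule is an assumption; the patent only says that site usage information is associated with some or all of the pages on the site.

```python
from collections import defaultdict
from urllib.parse import urlparse

def site_usage(page_visits):
    """Roll page-level visit counts up to their host, then assign each page
    its site's total (an illustrative reading of the site-by-site option)."""
    totals = defaultdict(int)
    for url, visits in page_visits.items():
        totals[urlparse(url).netloc] += visits
    return {url: totals[urlparse(url).netloc] for url in page_visits}
```

Under this sketch, a rarely visited page on a heavily visited site would score on the strength of the whole site's traffic rather than its own.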
User Behavior Information Example
Three pages are returned on a search for [weather]:
- The first page was visited 40 times over the past month, with 15 of those 40 visits being by automated agents
- The second page, which is linked to from the first page, was visited 30 times over the past month, with 10 of those 30 visits coming from Germany
- The third page, which is linked to by the first two pages, was visited 4 times over the past month
Under a term-based search ranking approach, the pages might be organized based on how frequently the search query term (“weather”) shows up on each page. The second page includes three occurrences of the term “weather,” so it would show up first. The third page has “weather” in it twice, so it would be ranked second. The first page only has one occurrence of “weather,” so it would be listed last.
Under a link-based search ranking method, the pages might be listed based upon the number of other documents that link to them. Since the third page is linked to by the other two pages, it would be first. The second page is linked to by only one other page, so it would be listed second. Our first page would be listed last because it has no links pointing to it.
Under a usage information-based ranking approach, the pages might be ranked differently.
Looking just at a raw visit frequency, the pages might be organized into the following order: first page (40 visits), second page (30 visits), and third page (4 visits).
If those raw visit frequency numbers are refined to filter out automated agents and to assign double weight to visits from Germany, the order of the pages might change to the second page (effectively 40 visits, since the 10 from Germany count double), first page (effectively 25 visits after filtering out the 15 visits from automated agents), and the third page (effectively 4 visits).
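The refined ordering above follows mechanically from the two adjustments the patent describes. This sketch applies them (the function name and structure are illustrative):

```python
def effective_visits(total, bot_visits=0, german_visits=0):
    """Filter out automated-agent visits, then double-weight German visits
    (adding them once more after they're already in the total)."""
    human = total - bot_visits
    return human + german_visits

# The three [weather] pages from the patent's example:
pages = {
    "page1": effective_visits(40, bot_visits=15),     # 40 - 15      = 25
    "page2": effective_visits(30, german_visits=10),  # 30 + 10      = 40
    "page3": effective_visits(4),                     #                 4
}
ranking = sorted(pages, key=pages.get, reverse=True)
# ranking == ["page2", "page1", "page3"]
```

The raw-count leader (page1, with 40 visits) drops to second once bot traffic is stripped out, which is exactly the reordering the patent's example illustrates.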
The user behavior information might be combined with either or both the IR scores and the link scores.
Interestingly, this patent application was filed on the same day that Google launched the first Panda update, and it’s possible that some aspects of how the Panda approach works might be based upon user behavior information.
It’s also interesting that the list of questions relating to Panda that Amit Singhal pointed to as “questions that one could use to assess the ‘quality’ of a page or an article” focuses upon the quality of a user experience on a site. Panda seems to focus on creating a ranking score based upon features found on a page or site rather than upon actual user behavior data, as this patent does.
We don’t know if Google is now incorporating user behavior data such as visitors’ frequency or numbers into how they rank pages. Still, it wouldn’t be surprising if they are (as part of Panda, or independently of it) or will be in the future.
120 thoughts on “How Google Might Rank Pages Based upon User Behavior Information”
Nice post, Bill!
My sense is that Google already includes some usage information in its SERP algorithm. Has SEOmoz not done some correlation studies of usage metrics?
Keep up the great work!
I know that if “user activity” metrics aren’t a large part of the algorithm now, eventually they will have to be.
That is one of the only metrics that can’t be manipulated, at least not easily.
I remember in one interview Matt Cutts said that bounce rate wasn’t a ranking factor. I am sure things have changed since then…
I’m sure they’re using a form of bounce rate- as best as they can calculate it by users returning with [back]. That one signal alone would tell Google if people think a site isn’t deserving of its spot.
As opposed to assigning weight for geo areas, do you think Google would be using Analytics Benchmarks data to assign metrics relative to others in the industry?
Another great find Bill! The date suggests some sort of link to Panda, and the part about assigning user data on a site-by-site basis is another similarity.
However I do wonder how active the use of user-data is in the actual ranking algorithm. Certainly the way it is discussed here is as a layer on top of the basics, possibly influenced by personalisation. But I also think that much of Panda is about machine-learning – those questions outlined in the post aren’t so much ranking factors as a guide that could have been used to identify ‘good’ sites against ‘bad’ sites that then went on to build models used in machine-learning. In this role, user data may not necessarily be used to change the results, but more as an influence on the results – i.e. if the algorithm is beginning to suggest sites that aren’t popular with the visitors then this data would subtly nudge the algorithm back towards the sites that were.
What you have discussed doesn’t really seem to suggest this use, but I think we do have to keep our minds open about just how much influence various factors have and at what stages of the process those factors are an influence.
How do we test this? Any ideas?
As long as Google doesn’t have Google Analytics on every website, it’s almost impossible to use that kind of data to improve search results…
I believe that SEOMoz may have looked at Alexa scores or similar, but correlation studies would be very, very poor because of the direction issue (do you rank because you have traffic or do you have traffic because you rank).
I think the strongest evidence of this type of data being used is the delayed, iterative nature of the Google Panda updates. The only reason why Panda updates would need to be spaced apart is that something other than computing time has become the bottleneck. My guess is that it is statistically significant traffic data.
Think about it this way. Let’s assume Google has now added click-through rates from SERPs and bounce rates from SERPs as two new ranking metrics. The only way they get access to this data is if users actually search for that term on Google and click on a result. Google cannot speed up the collection of this data; they have to wait for the natural flow of searchers to provide it. Thus, they can run Panda updates only as frequently as they collect a large enough set of data to make statistically significant decisions.
Great post Bill. I agree with Chris (and with SEOmoz) that the Panda update was a more direct introduction of machine learning into the ranking system, and I would be shocked if there was not some correlation between this and the Panda update.
log2(1 + log(VF)/log(MAXVF)) reminds me of the original PageRank algorithm that evolved over time, which makes me believe that this will start to evolve (and probably has already) to take into account many other user metrics.
Do you think that this will ever be the primary metric, like links have been for many years?
The usage metrics discussed here, i.e. frequency of visits and number of users, look basic, and I believe that Google is now using something beyond them. Moreover, won’t these usage metrics favor sites that already rank higher in the SERPs?
However, I believe that CTR and bounce rate are already used by the Panda algos. But I think Google is computing (predicting) these metrics from the actual content, rather than by tracking real user data. There are some research papers from Google on this.
That’s one of the questions that I usually ask myself when I run across something like this, “how do we go about trying to see if it is actually in use right now?”
Not completely sure right now, but it’s something that I’ll be thinking about.
I suspect that we will see more and more user behavior data flowing into how pages are ranked in search results. To a degree, even PageRank is supposed to stand in for user behavior and how someone might visit a site and browse to other pages by following links. It is interesting that the patent points out how it will filter out automated traffic and traffic from people “affiliated” with a page or site, so they are (and likely have been for a long time) considering how people would attempt to manipulate a system like this.
Google does have a number of different ways to collect user data information, from people using a Google Toolbar to people searching while logged into their Google Account, and Google collecting Web History. Google likely has an incredible amount of both search and browsing based user behavior data.
Google also has their search query logs as well, that they can explore to learn about how people search, and what types of queries might appear together in query sessions. Google Plus is providing a whole new set of signals involving people using the service, including their quality of interactions with others, categories related to what they post about or respond to, and more.
In effect, Google is able to create profiles of people and their interests based upon what they search for, what they browse and bookmark and set alerts for, what they link to, what they plus, and what they write about on Google Plus and on places where they’ve linked to with authorship markup.
Bill, please explain it in layman terms:
Sites getting more traffic get more traffic from Google or what?
We know for certain that Google does incorporate some user behavior data into personalized search results and customizations that they might show based upon recent search history for location and topic. There are a number of patent filings from Google that explore other areas where aggregated user data might influence the rankings of search results.
SEOmoz has done some correlation studies involving a number of different potential ranking signals, though I do like having primary resources like a patent filing directly from a search engine as a starting point to explore how different signals might be used by that search engine, which is one of the reasons why I’m happy to have run into this patent application.
“For example, a page providing relief assistance for a Tsunami that was created in response to the disaster might find itself with little PageRank but a lot of visitors, and may rank well based upon usage information associated with the site.”
Unlikely. Say bill.com/tsunami.html is very relevant for the 200x tsunami in …. All of a sudden it gets 10,000 visits. How did it get those visits? Almost all types of sharing leave traces: links, YouTube mentions, FB shares, Twitter, LinkedIn, other shares…and of course LINKS, which are the result of even a TV mention.
If the tsunami page got popular because Bill linked from the homepage then it should already be popular with search engines. Any site that can drive so much traffic with a homepage link has to be known to SEs as a quality site and as such its pages have some credo.
You may argue that Twitter and some other links aren’t followed. True, but the page will get enough followed links to show up on the radar as people write about it.
Am I missing something?
Instead of the term “bounce rate,” think of the term “long click,” which indicates the amount of time that someone might have spent between clicking on a search result and returning to the search engine to perform another search.
If someone arrives at a page, and doesn’t spend much time there, or doesn’t go to another page on the site, that may indicate that they didn’t find something of value on the page, or it might indicate that they found what they were looking for immediately – an answer to a question, a phone number to call, or something else.
The geographically related example in the patent (visitors from Germany count twice as much as visitors from Antarctica) is just one possibility of something that the search engine could use to give different weights to visits or visitors. Another one not stated in the patent filing might be how far down a page someone might have scrolled. I’d imagine that Google could experiment with a number of different user actions to see how much they might help improve the quality of results based upon a signal like this.
While the delay in calculating the Panda updates might rely upon collecting and updating user information, it may also have to do with updating and redefining training data to use in something like a decision tree process to see if the features selected during that process are effective in predicting actual user data collected for pages. For example, Google might want to collect data such as how many people blocked sites from their search results, and see if predictions from a decision tree process accurately matched up with those sites, to see how effective the training data and decisions made by the Panda algorithm matched that user information.
I agree that Panda seems more like a process that takes a seed set of sites that are considered high quality, and uses a machine learning process to identify different features on those sites, to classify other pages that aren’t manually reviewed. See: Searching Google for Big Panda and Finding Decision Trees.
Incorporating this kind of user information mentioned in this patent application into how pages might be ranked could definitely be a process that is independent of Panda. And both Panda and the process involving usage information both appear to be filters or rerankings of results like personalization or customization of results.
Yes, I’ve been approaching Panda as a machine learning process as well. See my link in the comment above, from about a week after the first Panda update.
I do suspect that other user metrics will be experimented with, and possibly were even at the time that this patent application was filed. The visitor frequency and visits are examples for purpose of the patent, but Google may be looking at other usage information as well.
I’m not sure that the use of link information between pages will ever go away completely, but I could see its role continue to diminish over time.
In some cases, pages that already rank well may benefit from any metric that is based upon some level of popularity. There are things that the search engine could try that might lead to having such signals carry less weight. See my post Time to Add Query Breadth to Your SEO Glossary?
In other circumstances, such as the introduction of new products or services, events such as natural disasters, or terms that suddenly gain a lot of searches out of nowhere, a new page or site might get as many visits as an old page that isn’t as relevant. For example, a page providing relief assistance for a tsunami that was created in response to the disaster might find itself with little PageRank but a lot of visitors, and may rank well based upon usage information associated with the site.
The Planet paper that I wrote about (see link two comments above) which was co-authored by Biswanath Panda does seem to indicate that Google could use a decision tree ensemble process to explore features on manually selected high quality sites to predict things like visits to a page, rather than incorporating that user information directly into rankings. In effect, between Panda and this patent application, it seems like Google may be trying to come at the same problem (identifying high quality sites) with two different approaches.
Talking to yourself 🙂 ?
OK I get it, but still it isn’t the breakthrough we might think, correct? PR is supposedly updated live already, or maybe at the end of the day. Google will see their page and most of their backlinks within minutes/hours given Caffeine, so they will rank quite fast. There might be some freshness/trending boost as well, and their page will be linked as tsunami (let’s use that) or tsunami help, [country] tsunami page, etc., the same fashion that anchor text is used.
And would Google see and especially process visits faster than it sees links /text on page and other conventional ranking criteria?
These signals would not reveal anything about the quality of a user experience on a Website but they DO reveal the quality of the Website’s relationship with the searcher community.
People need to stop obsessing over Google Analytics. We have had clear proofs and documentation and disclosure from Google for years that they are tracking click data from their own SERPs pages. All the signals this patent discusses would be more reliably drawn from that data than any other source.
The Panda algorithm could certainly be looking at these signals. In all likelihood it is probably seeking correlations across all available signal profiles for indexed Websites with the hand rated/classified Websites in their training set(s). The significance of these and other signals would be determined on the basis of their statistical profiles.
Great points, and I agree completely.
No, the other way around.
Sites that wouldn’t otherwise get much traffic because they are pretty new might get more traffic, despite a lack of links to them, if they are optimized for newly popular terms (trending terms like a natural disaster, or a new term for something that’s been around for a while, etc.) and start attracting visitors, with that user information then helping them rank.
It’s an extreme example, but http://tsunamihelp.blogspot.com/ came from completely out of nowhere in 2004 because it was mentioned on every TV news program in the country as a site to go to if you want to donate money or help after a devastating Tsunami. It ranked pretty highly pretty quickly, but under an algorithm like the one described in this patent filing, it might start to rank well for a term like [tsunami donations] (or even [tsunami]) even faster because of user information associated with it. Given the ability of social network sites these days to potentially drive a lot of traffic to fairly new sites, usage data could potentially boost sites higher than they would without many links.
This sounds interesting, but I don’t see Google putting too much value in usage information. I feel as if Google might turn into a social media search engine if they did this, because only the popular and most visited sites would get listed first. That would take away from Google’s SEO system, where hard work pays off (to some extent at least). Interesting stuff nonetheless, I’d be curious to see this patent put to the test as a separate search engine based off of Google’s index data, just so we could see what the SERPs would look like.
Personally I have a strong suspicion that Google uses bounce rate in its web search algorithm and I think it has been doing this since early spring of 2010.
One thing I’d like to mention is that Google has publicly acknowledged that it uses clickthrough data for its News search algorithm. I’m surprised that isn’t noted more often in this discussion. Given that Google uses clickthrough data as a trust signal in News search, it seems quite easy to make the leap and imagine that Google is using it for web search as well.
This is interesting and something I would have imagined would have already been incorporated into the SERP’s.
With regards to the point that “Do you rank because you have traffic or do you have traffic because you rank”: there is a way of increasing the amount of traffic you receive through Google without having to optimise anything… If the increase in traffic changes the SERPs, then it’s a pretty strong argument for using AdWords…
@Daniel – I imagine they don’t need Analytics at all to score this data…obviously I could be way wrong but they must have the computing power.
@Bill This was a great find and I’d tend to think Google is going hard in this direction mainly because of the link spam gaming..it’s getting out of control.
@Jon I hear you about the hard work paying off, but as I mentioned above, it’s easy to game Google’s algo with just a blast of links. The hard work route is starting to get frustrating at times!
This is a totally NEW direction that decreases the importance of the original pagerank or “reference” theory and instead relies on crowd sourcing.
This becomes much harder to game in the long run. Also, our jobs as SEO strategists are going to change significantly. We’re no longer link and anchor text machines, but social dynamics experts on how people behave and interact on the web.
I suspect, though, that the weight of the user metrics will increase VERY slowly over time, but PageRank and links will always continue to be a factor.
If google is using chrome and toolbar metrics, do you think it would be possible to beef up your metrics through something like a contest or giveaway? Say leave a comment or visit this page for 30 seconds then enter code that pops up, or something like that? Or if you had readers retweet for a contest entry, something along those lines?
Google has been playing with PR lately. Two of my sites are going up and down weekly. In fact, they just did a reverse both ways that is pissing people off big time.
As the algorithm becomes more and more sophisticated, nuanced & sensitive to real world trends or online user behavior so it becomes harder and harder to substitute for content that is of a high utility & can serve as a resource for visitors. Google are the leaders in search because they place the user’s interpretation or idea of relevance at the fore – so it makes sense that proxy metrics for usage data will more and more become incorporated into the algorithm.
I do though wonder – and please, if anyone has any practical information/insights on this one – say in a content niche where Facebook & Twitter share-ability dominates Google Plus more than most and seemingly will do for some time; do you see Google offering undue weight to “their” share-ability metric over the others? For example I have a site where fully 55% of my traffic stems from share-ability sources; but Facebook was outgunning our Google Plus by a landslide; would it still be worthwhile to keep the Google+ share-ability functionality to safeguard against Google using the +1 metrics (as per Webmaster Tools) as a significant/potentially significant in future ranking signal?
Thanks for the post Bill. As always a pleasure to read.
Is SEO different for each search engine? For example, I have optimized my keyword to the first page on Google, but the same keyword is on the 9th page of Bing. What type of SEO will work on Bing? For Google, I have built backlinks to my site.
This brings up another issue. If Google starts looking at usage, will they start requiring sites to have some type of cookie or code snippet in every website that wants Google to index them? Is that doable?
Good points. One of the potential problems with focusing too much upon user information is that sites listed highly in search rankings tend to get visited more frequently than sites that might be further back in search results. I suspect that Google’s Panda updates focus more upon features related to pages and sites that have little to directly do with user information, but might look at user information to provide feedback on those rankings.
It would definitely be interesting to see any data that Google might collect if they decided to experiment, and chose a data center to use this algorithm upon (possibly showing a certain percentage of searchers results influenced by this particular approach.)
Hope I’m not talking to myself. 🙂
PageRank is supposedly updated in a fairly quick manner these days, but I don’t think that it’s instantaneous. There are a lot of webpages out there to consider when assigning PageRank.
Google supposedly makes at least one change to its ranking algorithm on average every day. Some of them might have a dramatic impact, like the Panda updates, and some may be so slight as to only impact a small number of results. From our perspective, it’s hard to tell how much of a change an algorithm like this might have. The patent hints that it’s supposed to help pages where the main query term might not appear as frequently as it does on other pages, or pages that might not have quite the backlink profile of more established sites.
As far as we know, the process described in this patent might be aimed at helping fresh/trending pages rank higher for a while, at least when they are trending.
Google might process visit type data faster than crawling data, especially if it’s collecting that information from toolbars or browsers. Crawling can involve things like politeness protocols that might keep search spiders from requesting pages on a site too quickly.
It’s difficult to confirm or deny Google’s use of things like bounce rate. I’ve heard that speculation, but I believe that it’s something that Matt Cutts has stated that Google doesn’t use. A Search Engine Land interview from 2010 brought this response from Matt about the use of bounce rate as a ranking signal:
One thing that I wonder about when it comes to ranking signals for News Search and for Web search is that there are a limited number of sites that are included in News Search, and I know there are a number of ranking signals that may play a stronger role in News results. Are news sites in Google News more trusted than other sites that aren’t included? Is there a greater expectation for the reliability of news sites? Is a click through on majornewspaper.com a more trusted signal than a click through on somesiteyouneverheardof.com? Is that a possible assumption that a search engineer might make? I’m not sure.
I did write a post about a Google patent that explored some of the signals behind Google News rankings in Google News Rankings and Quality Scores for News Sources, and while it didn’t address that question, it did point out that Google may look at the collection of news sources it uses in Google News very differently than it does web pages.
There are ways to increase the traffic to a page that don’t involve directly optimizing a specific page or building links to a page. Sometimes those are intentional, and sometimes not. A mention in print or television or radio could have that effect. A newsworthy event could as well. A burst of interest in a topic that was little mentioned or discussed previously, but comes out of nowhere, or through a viral video, or through attention on a social network might create some buzz and traffic.
Would visits through sponsored search impact ratings if it brings visitors to pages? That’s a good question. No mention of that in the patent, and I wonder if the people listed as inventors even contemplated the idea.
It’s hard to tell if Google would incorporate this into their ranking algorithms, but I could imagine them thinking about it seriously enough to experiment with it. And if they spent some effort doing that, I could see them taking the next step of publishing a patent to try to get some value out of that research. Of course, that could be true of a lot of the patent filings that we see. 🙂
To a degree, linking has always been a social activity on the Web. Tools like WordPress and Blogger have made it easier for more people to self-publish on the Web and link. Social networks make it even easier, though many social networks have built nofollow into their services. However, those mentions still may make their way to increased visibility by the search engines (and will do so even more if Google returns real time search to us).
I’m wondering if user metrics might have a limited shelf life, in terms of only usage data from the last day or week or month might be used as part of the ranking algorithm. Would it make sense to accumulate visits or visitors for a page over a long period of time, or would that signal be more useful for a much shorter time period?
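One simple way to model that shelf-life question is to decay each visit's contribution by its age, so that a page's usage score is dominated by recent activity. The half-life function below is purely an invented illustration; the patent doesn't specify any decay function:

```python
def decayed_visit_score(visit_ages_days, half_life_days=7.0):
    """Toy time-decayed usage score: each visit contributes
    0.5 ** (age / half_life), so with a 7-day half-life a week-old
    visit counts half as much as a fresh one. All numbers here are
    assumptions for illustration, not anything from the patent."""
    return sum(0.5 ** (age / half_life_days) for age in visit_ages_days)

# Ten visits today versus ten visits a month ago:
fresh = decayed_visit_score([0.0] * 10)    # 10.0
stale = decayed_visit_score([30.0] * 10)   # roughly 0.5
```

Under a model like this, a page that trended a month ago quietly loses its usage boost without the search engine having to draw a hard cutoff at a day, week, or month.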
I do suspect that PageRank and link text will still hold a place in how Google ranks pages for a good while longer, but they might not carry as much weight as Google looks at additional signals as well.
It’s quite possible that Chrome and the toolbar could be used to measure this kind of user information.
I remember when Blekko gave away a bunch of t-shirts to draw attention to their search engine. They didn’t require any links back, but it did bring them some nice attention for a while. I don’t think that you need to impose conditions on visits if you do some kind of giveaway like that, and those types of things might actually sabotage those efforts.
Thanks. I’m not sure that Google would be concerned with whether a visitor arrives at one of your pages via Twitter or Facebook or through a blog post or a news article. It’s possible that they might count links or mentions or “sharing” from those sites and from Google Plus differently if those are incorporated into a ranking algorithm, especially since Google has more data about the people using Google Plus, but if they incorporate user data based upon visits or visitors, the source might not be as significant.
Not sure how much value you should give the toolbar PageRank that we are shown by Google. It’s only a snapshot of the PageRank of a page, and in the past, it has usually been updated only 3-4 times a year. It’s possible that the first rollout of the two recent updates might have had some problems that were addressed with the second update. Then again, it’s possible that Google didn’t like having a PR 9 showing for their homepage, like it was for the first of those two recent updates.
Google and Microsoft definitely use different ranking algorithms, and there are some differences in how they rank pages. Microsoft had been using an algorithm called Ranknet for a number of years, though it’s possible that they’ve since moved to another. Chances are that many of the things that Google looks at when ranking pages, Microsoft also looks at. It’s also possible that Microsoft may not know about some of the backlinks that Google does as well.
Google has the ability to collect user information data without requiring sites to insert some kind of cookie or code snippet into their pages. Google can collect that kind of information from people who use the Google Chrome browser or who have the Google toolbar installed. They can also look at places like their query log file to see what people searched for and what links they clicked upon during those searches.
This is so interesting, and once again it confirms what I have been trying to tell my co-workers when it comes to user signals and how they interact with pages.
Cookies enable Google to track this, and of course it helps if Google Analytics is also installed on the visited website.
Great article, Bill. I’m going to be wild here and throw out a few assumptions. I guess the real questions here are:
1. What will be the sources of “usage data”?
2. Where will they use this data?
Usage from organic clicks won’t play a part
Why – If you rank #1 you’ll get more visits than the guy ranking #10. Does that mean you have a more valuable site? Not necessarily. One might argue that Google could give a score based on CTR in a certain position as an indicator compared to previous sites in that position. That is wrong. Just because your site had a better Title tag or snippet doesn’t mean you offered more value.
Toolbar, Chrome, Google Account history, etc will play a part
Why – Google can use this data to find what’s hot at the moment. Their index historically has consisted of what is hot over a period of time (it used to take a while to get ranked organically in Google). They needed to find a way to find out what is “popular” on the web.
In the primary SERPs
Why – Usage will play a small part in the primary SERPs.
In a new section of the primary SERPs
Why – Google is all about innovation. My wild guess is they’ll have a feature like “What’s Hot Now” or “Popular on the Web,” or they’ll combine this with +1 to give users an option to find out what’s been popular based on their query.
My wild speculation is the following: They need to find a way to offer the best results without relying on links. Google’s been using non-link citations as a ranking factor. They’ve realized this is not enough and need something more. Non-link citations + usage of a document = slightly better rankings in the primary SERPs OR top placement in a NEW section Google will reveal in the near future (or addition to existing feature).
Thanks. Google does collect user information in a number of ways, including toolbar usage, which can follow where you go on the Web, and when you’re signed into your Google Account, Google can collect a Web History for you. Google also has a query log file that collects a large amount of information about what people search for on Google and which pages they visit from those search results.
Google’s Matt Cutts has repeatedly stated that Google does not use Google Analytics data. See this video:
Is Google Analytics data a factor in a page’s ranking?
The short answer is, “No.”
I don’t think that Google Analytics information will ever be used in ranking web pages. Google collects so much information about how people search and browse and interact with others on the Web that they have a hard time figuring out what to do with all that data.
Good analysis. I agree with some of the points and do believe that Google will be looking to incorporate more user signals so that they can rely less on links. Right now the links have so much weight that it pretty much depends on what ad budget you have in order to get ranked high and become highly visible. Using the user signal approach, Google will be able to rank content based on what users want, rather than what links the content has.
Very good points about the sources of usage data.
I’m not sure if we will see an addition to search results to point out new and popular results. Google has definitely bumped up some results from time to time under what they referred to as a “query deserves freshness” approach, and what’s described in this patent filing could potentially fit into that approach. From a pure usability standpoint, would having two different sets of search results make things better for searchers or more confusing?
Non-link citations seem to have been a big part of local search rankings for a while, and I believe that they also play a role in entity association (which some people refer to, and by doing so sell short, as a “brand” enhancement since it can impact all kinds of named entities).
I think Google is striving to rely less upon links, though they’ve been working on that for a while now, from Google’s Peter Norvig telling us that “PageRank is overrated,” to Matt Cutts pointing out every so often, when asked about PageRank, that “Google has more than 200 ranking signals that it uses.”
I don’t think that advertising on Google has any role in how highly you are ranked organically, and I don’t think it ever will.
Long time no speak! Was on the hunt for more info ref page ranking and came across your website, once again! 🙂 Lots of useful detail as always, thanks again for taking the time to post. As an aside, as you kinda helped me out with some good advice all those months ago, I’m finally a 2/10 page rank with google myself, took a bit of time and effort but am happy with the result – this far.
All the best
Fascinating Bill! Ever since Webmaster Central started offering impression, click, and CTR data we’ve been going back rigorously every month and optimizing on-page factors. The next step is to include author and schema data to make SERP listings stand out even more. According to your geo-visitor factors (the Germany vs Antarctica example) do you believe local rankings are influenced by a higher volume and frequency of local visitors?
I think all of these “changes” are just smoke and mirrors. As long as you create unique, high quality content and get good backlinks, you’ll rank; it’s as simple as that. It seems like every day new “theories” are posted online, but of all those people who get affected by Google updates, I bet more than half of them are not doing everything they can be doing to get their site ranked.
Thanks for posting this article Bill. This should be a wake up call to the thousands of webmasters displaying worthless information on their websites. There are a lot of websites sitting on the first page of Google that have useless content. Like my mentor taught me, “Good Content is King.” Always write content with your reader in mind, not the search engines!! Great post Bill!!
These guys think they can rule the web and own tiny websites.. common you guys suck… Google + is shit… Gimme a break
Cool post, just found it via SEO United/Germany. We as a company always think SEO in the long run. Our question has always been “what is the fundamental constant for search engines?” Of course the answer is 1) specific … 2) smart content concerning the subject. Therefore the race for top positions is decided editorially.
Thanks again for your thoughts
It’s good to hear from you. Good to hear that you’re seeing some improvement as well. Keep it up, and I suspect that you’ll grow even more.
It’s easy to get caught up in that data from Webmaster Tools, but I think it’s pretty useful as well.
I went through the steps for authorship markup when it was first announced, and took all the recommended steps, validating it in Google’s rich snippet tool as well, but have yet to see my profile image appearing next to my posts in search results. I may have to try a few more things to see if I can get that to happen, like try a higher resolution profile image.
I suspect that the locations of visitors could play a role in local rankings, as well as having visitors from a wide variety of locations helping your site be perceived by the search engine as a global destination as well. See:
Changing Google Rankings in Different Countries for Different Searchers
I do see a lot of theories that get posted online as well that there might not always be a lot of support for, which is one of the reasons why I like to look at sources like the patents that the search engines publish.
Most sites that I look at every day could probably be doing much more than they are to rank better for the things that they cover. Creating high quality content (however you might define that), and getting quality backlinks is a good start, but they aren’t the only things that you should be doing. Another essential step is to make sure that your site is set up to be as search friendly as you can possibly make it, so that you avoid more than one URL for the same page, make sure the links to all the pages you want indexed can be followed by the search engines, and so on.
If you create engaging and usable pages and content, a goal that most webmasters who want their pages to be visited and used should have, you may also benefit if the search engines do consider usage information in rankings.
I’ve always amended that “content is king” approach to “context is king.” The right content at the right place and the right time helps meet a visitor’s expectations, and increases the likelihood that they will order something, or call, or bookmark and return.
Some people don’t like reading a lot of information about a topic, others want highly informational pages, others just want some clearly defined specifications or examples or instructions, others might like a lot of pictures. Can you please everyone? In some cases you can. That’s part of the challenge, and part of the fun of creating content on the web for visitors who may have differing needs and different preferences in how those are addressed.
I work with lots of people with tiny websites all the time, to try to make it possible for them to be found as easily as some of the huge businesses that are online. Tools like Google and Bing do make that possible.
Thank you. I think that anytime you create content for the web that you have to think about who your audience is, and how they are going to find you. Knowing some of the limitations of search engines as indexing programs helps. You need to use language on your site that your audience will use to find your pages and expect to see upon them. You also need to meet and fulfill their expectations with what you provide if you expect them to return or to order something or to refer your pages to others. SEO is more than providing information – it’s also about creating a positive user experience that meets the objectives of both site owners and site visitors.
Thanks Bill, will let you know when I’m up to 3/10! 🙂 take care and keep up the good work!
You’re welcome, Ross.
Hopefully that will be sooner rather than later. 🙂
Bill, yours is truly one of the best SEO sites on the net. Seriously – I’m surprised G hasn’t tried to pick you up just so you stop doing the in-depth reporting on their patents.
I’ve been thinking for a few weeks now that perhaps the BEST panda metric is some type of query-by-query measurement of the “user satisfaction index” of a user visiting a web page. I think two very simple metrics could be in play to make this determination:
1) How often does a user visit a page, bounce back to the SERP result, and visit a subsequent result. (I’m going to call this the “Re-Search Rate”)
2) How much time elapses before the Re-Search occurs.
Key to making a judgment about the quality of a result relative to other results is to look at similar Re-Search rates for pages listed in the SAME results page. For example, let’s say I search for “green widgets” and a site ranks between #1 and #3 on a regular basis because of traditional on-page and off-page factors. The page’s Re-Search rate is 95%. Other pages lower in the SERPs have a Re-Search rate around 50% – that’s a clear indicator that my page satisfies the user less than the other pages.
I might also consider how long a user takes before assigning a weight to a particular Re-Search action. For example, if the user takes 15 minutes before going back to the SERPs and hitting another page, I might assume that the visit to the initial page was valuable and discount that negative action.
G could then look at this data on a site-wide basis… Do most of the pages on a site have a high Re-Search rate relative to their peers in the SERPs? If the answer is YES, one could make the judgment that the site as a whole is lower quality and assign some demotion factor to it.
It seems like the key to making this work is:
(a) normalizing the data across a wide variety of searches, which is why pages need to be compared to other pages in the same search results listings
(b) normalizing the data to account for differences in search ranking position. For example, it could be that the 6th result in any SERP page has a better chance of receiving a lower Re-Search rate just because users give up, while the 1st has a higher Re-Search rate on average just because users want to see what else is out there, even after finding a good answer.
Assuming that this type of user behavior is a good indicator, it would solve a lot of other more difficult problems for G… The key is to make sure users derive value from the page and that they don’t go back to Google to find a better answer.
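The proposal above can be sketched in a few lines of code. The "Re-Search Rate" name and the normalization against peers on the same results page come from the comment; everything else (function names, the data shapes, the example numbers) is an invented illustration, not anything Google has described:

```python
def re_search_rate(returned_flags):
    """Fraction of clicks on a result where the user came back to the
    SERP and clicked a different result (the 'Re-Search Rate' above).
    returned_flags: one bool per observed click on this result."""
    return sum(returned_flags) / len(returned_flags)

def relative_re_search(rates_by_position, position):
    """Compare one result's Re-Search rate to the mean of its peers on
    the SAME results page, per normalization idea (a) above. A value
    above 1.0 means users re-search after this result more than average."""
    peers = [r for p, r in rates_by_position.items() if p != position]
    mean_peer = sum(peers) / len(peers)
    return rates_by_position[position] / mean_peer

# The "green widgets" example: result #1 at 95%, peers around 50%.
rates = {1: 0.95, 2: 0.50, 3: 0.50, 4: 0.50}
print(round(relative_re_search(rates, 1), 2))  # 1.9 -- far worse than peers
```

Normalizing within a single SERP, as here, sidesteps the problem that raw re-search rates differ wildly between navigational and informational queries; position normalization (idea (b)) would need a second baseline learned per rank slot.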
Thank you. Who says that Google hasn’t tried. 🙂
It’s possible that the Panda algorithm incorporates usage information in some manner, but it’s possible that they don’t use it directly. The algorithm seems to focus upon features associated with a site to predict usage, so it’s possible that usage information might be used to gauge how effective Panda is at predicting how relevant people might believe pages or sites are. The two different metrics that you describe could potentially be very useful in measuring that.
The time lapse period measure may be a little trickier. A very short visit – a few seconds, might mean that someone didn’t find anything of value. A very long visit (especially for a fairly short page) might indicate that a searcher was distracted during their visit, opened the page in a different tab or window and took a while to get to it, or something else unrelated to the relevance of the page. In many ways, this could potentially be noisy data. But, if there was a reasonable period of time that someone took on a page before they returned to search, and that usage data might be used in conjunction with other information, such as whether the page was saved as a bookmark or printed, or how far down the page someone scrolled, or other information, the mix of usage features becomes much less noisy.
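To make that "mix of usage features" point concrete, here is a deliberately toy scoring function. Every threshold and weight below is invented for illustration; the patent specifies nothing like this:

```python
def click_quality(dwell_seconds, bookmarked=False, printed=False,
                  scroll_depth=0.0):
    """Hypothetical combination of usage features, reflecting the idea
    that dwell time alone is noisy but mixing in bookmarks, printing,
    and scroll depth makes the signal cleaner. All numbers are made up."""
    score = 0.0
    if dwell_seconds >= 30:      # a "reasonable period of time" on the page
        score += 1.0
    elif dwell_seconds < 5:      # very short visit: possible pogo-sticking
        score -= 1.0
    score += 0.5 * scroll_depth  # 0.0..1.0, how far down the page they read
    if bookmarked:
        score += 1.0             # strong explicit signal of usefulness
    if printed:
        score += 1.0
    return score
```

With auxiliary features like these, a long dwell caused by a distracted user contributes little on its own, while a short visit followed by a bookmark still scores positively, which is the de-noising effect described above.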
If Google were to base “relevance” on the bounce rate metric, then there is no way they could differentiate between organic clicks and clicks coming from sponsored ads – therefore people could improve their “relevance” or ranks with a good AdWords campaign and decent landing pages.
I don’t believe that Google will use bounce rate as a relevance signal.
But if they were to do so, they would definitely track the path of that visit to see where the person came from, and chances are that they wouldn’t include clicks from sponsored advertisements.
Great article! I get so confused with how Google ranks sites. I get told by so many different sources how they do it, and what changes Google makes every day and how it affects your PR. It’s a minefield, and a lot of guesswork, but this article proved helpful. Thanks.
You don’t really have to waste time trying to focus on or “get more” PR. It reports the truth about your relevance to a certain topic really late. So if you lost or gained some PageRank, that can be a result of a series of changes that happened recently or maybe ages ago.
Hi Bill, I still believe I have seen evidence that Google DOES use bounce rate info for the bid pricing in AdWords, so why should it not also use a high bounce rate to “understand” that if 1000 people searched for “cheese cake recipe” and landed on a site with NO cheese cake recipe, just because it has this phrase in the URL or even meta title from old content? A 94% bounce rate will tell Google that they didn’t find a “cheese cake recipe,” so this result must be rubbish and must go down the SERP. I would love to be proven wrong to sleep better at night 🙂
Thank you. One of the reasons why I like to spend so much time looking at patents and whitepapers directly from Google is that I see so much guesswork out on the Web about SEO and Search Engines, and how they work and operate to index and rank pages.
Good points on how changes to rankings can come from many different directions, and can be the result of something that may have happened recently or something that happened a while ago.
As for bounce rate info for bid pricing in AdWords, there have been a lot of recent whitepapers from Google about new approaches that they are trying to take to identify click fraud and malicious advertising. This one was pretty interesting:
Incremental Clicks Impact Of Search Advertising (pdf)
I also feel Google does rank pages based on usage information; it might be one of the factors for its rankings, and it might give more value to it in the near future, because the usage info itself reflects how good the site is. But won’t it make it a little difficult for new sites to rank, as they won’t be having much traffic initially?
It’s possible that this approach may benefit newer sites rather than harm them, because it might help Google discover new sites that people are using that haven’t had the chance to start ranking well because they are new.
If a lot of people start using a new site, and it doesn’t have many links to it yet, it might climb in rankings because of that usage. For example, someone writes an article that a lot of people start pointing to in Twitter, Facebook, and other places where the links are nofollowed. It gets mentioned in newspapers, on television, and other places where there are no links for Google to index. But, because it’s being visited and used a lot, it might climb in rankings, at least temporarily, because of that usage information.
Yes I think I agree with you on this. It would be very interesting to see how much weight does Google put on this in the future.
Great stuff as always, Bill. I love the idea of reverse engineering Google’s patents. I always find myself fascinated by posts like this that break down Google’s intent and try to draw actionable conclusions based on the direction Google seems to be headed with their search algorithm.
The challenge is that there are often so many other things that might account for the rankings of pages. Take a soup pot, and add a couple of hundred ingredients to it, and you may just lose the flavor of some of them when you sit down to taste it. :)
I love the chance to see Google explaining something that they’ve invented in their own words, the vocabulary they use, the “problems” that they set out to solve, the assumptions that they may make about how people search or how they build websites. We may not always get definite steps to take while building our sites or creating content, but even if we start asking a new question or two, or looking at things a little differently, I think it helps. 🙂
I totally agree with @Donnie Cooper; this algorithm must be similar to something more or less related to bounce rate. A lower bounce rate means a good rank, and vice versa.
Hi Geek Revealed,
I’m not really sure that it’s a good idea to use bounce rate quite in that fashion for all sites.
For pages that are more informational, or that are aimed at getting visitors to do things like place a phone call, a bounce rate really doesn’t measure the success or failure of a page. For pages that are built primarily to get you to click upon a link on a page, then a bounce rate may be more meaningful.
“I don’t believe that Google will use bounce rate as a relevance signal.” Hi Bill, do you want to expand on your comment about bounce rate? I am a firm believer that it is a strong signal, although if you have data to change my opinion I would love to hear it.
I know that Google’s Matt Cutts has referred to bounce rate as a noisy and spammy signal in the past. There’s some discussion of that in this WebmasterWorld thread:
There are many reasons why someone might visit a page and not spend very long there and still find what they are looking for, such as a phone number, or a tidbit of information that they might be searching for. In those cases, those visits are very successful, and shouldn’t be interpreted as a negative for a page in search results.
Where a site owner might consider a bounce to be bad is when the purpose behind a page is to bring someone to a call to action, such as clicking upon an ordering page, or signing up for a newsletter, or going somewhere else that they might want them to. But, if a page is informational, or educational, or leads to some kind of action that can be done offline, that bounce can be very misleading as a signal about the success or failure of a search.
I do believe that bounce rate is factored into the Google algorithm, but with common sense. There are some pages that are just created to be bounced off of. For example, if I’m looking for a particular recipe, I’m likely to bounce off the page once I have found it. I don’t think such a page will be penalized for that.
But, if there is one page in that list of SERPs that consistently seems to hold on to readers so that they don’t bounce, then I think that page may be given a boost.
In the book “In the Plex”, rather than bounces they talked about “short clicks” vs “long clicks”. My goal is to make my content good enough that readers stay and enjoy!
In the Plex was pretty good, and I liked the distinction made between long clicks and short clicks. I agree that some pages are great resources and provide just the right information that people need, without them going to another page, and that shouldn’t be seen as negative that they might view one page and leave.
This makes me wonder if the Chrome login and Google Plus are going to be factored in as well, getting user’s data and determining page rank based on each user and their preferences and needs.
nice post, thanks for sharing !
I arrived at your site through looking for information about page ranking.
It is not easy to keep up with all the changes and updates from Google.
Obviously, it is not just about always writing “high quality content”; we must pay attention to many other basic SEO rules.
How much organic traffic in percentage do you think it will be “suitable”?
What is the average?
If you’re logged into your Google Account, the results that you see are often personalized in some manner, and that personalization is in part based upon user data.
I’m not quite sure that I understand your question about “How much organic traffic in percentage do you think it will be ‘suitable’?”
The patent describes how Google might use user-based data regarding visits to pages on a site, but it’s written so broadly that I don’t think one could really come up with specific types of percentages. A percentage of organic visits compared to what?
Thanks for your reply, and sorry for not being so clear
(unfortunately English is not my first language).
I meant “A percentage of organic visits compared to other sources of traffic, like social, referral and direct visits”.
What is in your opinion the average organic traffic that a good quality site should have?
(respecting all the SEO Google rules).
I think that different sites, with different audiences will vary in what might be the optimum types of traffic that arrives at their pages.
A site, for example, that has a really well known brand, such as ESPN or Sears should get a lot of direct traffic. That doesn’t mean that they should ignore attracting referral traffic or organic search traffic, but the percentage of traffic is going to be different.
A site that consistently publishes content that is picked up upon and linked to by others is going to have a very different footprint when it comes to the diversity of types of traffic to its pages, but that shouldn’t be an influence in how its pages are ranked, much as the direct traffic for the sites I mentioned above shouldn’t either.
The patent I wrote about really isn’t looking at these different types of signals, and using them in a quality analysis.
Some interesting research here, Bill; came across it looking for some information on query freshness and how pages rank. Factors such as QDF and bounce rate do come into play with some specific ranking elements overall.
It’s hard to say how Google fits signals like this into their present day algorithm. They have developed a number of approaches over the years that could work together or as alternatives of each other.
For instance, it’s likely that the Panda updates are an attempt to predict things like long clicks on pages, where people choose certain pages from search results and some spend time on those pages. But it’s not the actual user behavior that impacts rankings, but whether or not pages have “features” in common with a seed set of other pages that tend to rank well and attract those types of long clicks.
I have been trying to work out the correlation between visitors and rankings for a while and am glad I found this article. From my own experience I am pretty certain that there is more focus on bounce rate in relation to volume of visitors as opposed to actual numbers of unique visitors.
Interesting article, thanks Bill. It goes to show that while Panda is basically a good thing IMHO, the wheel’s still in spin and we still don’t know all the implications.
It sounds like that patent could be an important thing for SEOs to have in mind in the future. It will be interesting to follow the evolving of the update. 🙂
Matt Cutts announced at SMX in his keynote yesterday that Google doesn’t consider bounce rate as a ranking signal. Regardless of what he said, and whether or not you might believe him, bounce rate is really a terrible signal to base rankings upon. It’s both a noisy signal and just not very useful.
There are too many reasons why someone might not choose to follow a link on a page they land upon. The most common one might be that they found what they were looking for upon the page that they landed upon. That could be a phone number or email address or a tidbit of information that the page addressed, which may have completely satisfied the informational or situational need that brought them to the page in the first place.
Bounce rate also varies in importance based upon the purpose of pages as well. On a page which acts as a landing page to lead you to a specific action, like buying something or finding out more information about a product or service, or signing up for a newsletter, bounce rate has more meaning than on a page with an objective of educating someone or building a stronger brand or influencing someone to take action elsewhere on the Web or offline.
And there are many other signals that are potentially better ones that could be used to determine how well a page should potentially rank for a specific query term.
Thanks. We don’t know how much of a connection this patent may or may not have to Panda, but it does show us that Google is exploring many different possible ways of deciding upon the quality of web pages, including looking at user data related to pages.
If we attempted to make a list of the more than 200 signals Google might use to rank pages, it’s possible that some user behavior data might be included within it, but the Panda update seems to be more focused upon features found on websites that might predict user behavior rather than an approach that includes user behavior. Those types of features, from Amit Singhal’s post about questions to ask yourself about your web site, seem to be things that improve the user experience on sites.
Thank you. I’m not sure how much of a role this particular patent might have in what Google does, but I’d definitely recommend keeping in mind that some aspect of user behavior may be playing a role in Google rankings.
Thank you so much for clarifying this for me.
It’s a good point you raise, Bill, about bounce rate being a “noisy signal”, although I think that over time, and with enough data, you can start to gauge trends and make adjustments.
Great article and I’ll be checking back here for more info!
Interesting post, and I completely agree that Google (and the other search engines) are going to start looking more and more towards usage information to serve content. I think this is already very much in action on mobile and regular search, considering page load speed and its inclusion in the Webmaster Tools suite last year. Some interesting points included in your article that I hadn’t thought about before, though. Thanks for sharing!
You’re welcome. I don’t think that Google would rely exclusively upon usage information, but having multiple kinds of signals that can reinforce each other sounds like a good plan to me.
As for PageSpeed, helping people track site speed in Google Analytics and providing tools like the PageSpeed tools makes a lot of sense, since faster pages are easier for Google to crawl and index. Giving people an incentive by telling us that the speed of pages may be a ranking signal for some sites probably influenced a lot of people to take action as well.