Many years back, I remember being humbled by a crayon drawing, a homework assignment by a friend's son, which listed what he was thankful for that Thanksgiving: his parents, his sister, and his shoes. We take so much for granted that we should be thankful for. A few friends and I had gathered at my friend's house, and we were all knocked somewhat silent by the picture when he proudly showed it off to his father. Thank you to everyone who stops by here to read, to learn, to share, and to add to the discussion. Thank you, too, for the chance to share the things I find and the things that I learn from you all.
On Monday, I wrote about a recently granted patent from Google that described How Human Evaluators Might Help Decide Upon Rankings for Search Results at Google. Interestingly, this week Google was granted a patent that describes an automated method they might use to check the quality of specific sets of search results.
When Google responds to a searcher's query, it presents a list of pages and other kinds of documents, such as images, news, or videos. The patent was filed before Google's universal search, but it probably does a good job of describing something Google might do with web-page-based search results.
The results that searchers see are ranked in order so that the most relevant and/or highest quality pages should be listed at the tops of the results. Google may monitor the quality of those results to try to identify general trends of “improving” and “declining” quality amongst them and to identify specific problems with them. As this patent notes, “Manual evaluation of search quality can be laborious and time-consuming, typically allowing for a small number of searches and search results to be evaluated to determine the overall quality of a search engine.”
The solution described in this patent is to learn a baseline of searcher behavior by using an initial period of time as a training set, recording user behavior relating to each of the search results, and then comparing future user interactions against that baseline.
The patent is:
Systems and methods for determining a quality of provided items
Invented by Alexander Mark Franz and Monika H. Henzinger
Assigned to Google
US Patent 8,065,296
Granted November 22, 2011
Filed September 29, 2004
A system may provide items during a period and determine the quality of the items provided during the period using a time series model.
Google may take user data for the results over a specific period, such as 100 days, and choose one or more metrics related to those results to monitor for that period. It may then look for changes in how people interact with those results to come up with a prediction of how well those results might fulfill the needs of searchers.
These measures might include something like the percentage of searches for which the highest-ranking result (or one of the highest-ranking results) is selected by searchers.
If user interactions meet that prediction, then the set of results might be said to meet expectations. If not, then they might be considered to be below expectations, and these determinations can be done without manual intervention.
The description provided in this patent focuses upon identifying the quality of search results, but it could also be used with other items, such as the quality of advertisements.
User Metrics for Automated Evaluations
The time used, such as the 100 days I mentioned above, would ideally be a period where no major quality problems exist. The metric or user behavior measurement that would be used to reflect the quality of the search result could be many things or combinations of them. The patent provides some examples, such as:
- The percentage of searches in which the user selected the first result (or one of the top results) in the list of search results
- The average first click position (i.e., the numerical position within the list of results)
- The percentage of searches that had long clicks (i.e., the percentage of times that a user selects a link to go to a result page and stays on that page for a long time, such as more than 3 minutes)
- The percentage of searches that did not have another search within a short time
- The percentage of searches that did not have a reformulated search (i.e., a search where one or more search terms in the original search are added, deleted, or changed) within a short period
- A combination of different metrics, and/or the like
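As a rough sketch, metrics like the ones listed above could be computed from a log of search sessions. The record fields and the 3-minute long-click threshold below are my own illustrative assumptions, not details taken from the patent:

```python
# Hypothetical sketch: computing per-period quality metrics from a click log.
# The session fields and thresholds are illustrative assumptions.

def session_metrics(sessions):
    """sessions: list of dicts with keys 'first_click_pos' (rank of the first
    result clicked, or None if nothing was clicked), 'dwell_seconds' (time
    spent on that result page), and 'reformulated' (whether the query was
    edited and re-run shortly afterwards)."""
    n = len(sessions)
    clicked = [s for s in sessions if s['first_click_pos'] is not None]
    return {
        # share of searches where the top-ranked result got the first click
        'top_result_rate': sum(1 for s in clicked if s['first_click_pos'] == 1) / n,
        # average rank of the first click, among searches that had a click
        'avg_first_click_pos': (sum(s['first_click_pos'] for s in clicked) / len(clicked)
                                if clicked else None),
        # share of searches with a "long click" (stayed more than 3 minutes)
        'long_click_rate': sum(1 for s in clicked if s['dwell_seconds'] > 180) / n,
        # share of searches that were not reformulated soon afterwards
        'no_reformulation_rate': sum(1 for s in sessions if not s['reformulated']) / n,
    }
```

Each value is a simple proportion (or average) over the period's sessions, which makes the metrics easy to track over time and combine.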
The model of that time period, referred to as a “time series model” in the patent, may reflect things like trends and seasonality during the period in which data is collected, including patterns such as:
- User behavior on weekdays compared to user behavior on weekends
- User behavior at night compared to user behavior during the day
- User behavior on Mondays compared to user behavior on Tuesdays
- User behavior on fixed or moving holidays, etc.
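A minimal sketch of such a seasonality-aware baseline, in the spirit of the patent's time series model: expected metric values are learned per coarse time bucket over the training window, so that Saturday-night behavior is compared with past Saturday nights rather than with weekday afternoons. The two-way bucketing scheme here is my own simplification, not the patent's actual model:

```python
from collections import defaultdict

def bucket(ts):
    """ts: a datetime.datetime. Returns a coarse seasonality bucket
    (weekend vs. weekday, night vs. day) -- an illustrative simplification."""
    is_weekend = ts.weekday() >= 5
    is_night = ts.hour < 6 or ts.hour >= 22
    return (is_weekend, is_night)

def fit_baseline(observations):
    """observations: list of (datetime, metric_value) pairs from the
    training period. Returns the mean metric value per bucket."""
    sums = defaultdict(lambda: [0.0, 0])
    for ts, value in observations:
        acc = sums[bucket(ts)]
        acc[0] += value
        acc[1] += 1
    return {b: total / count for b, (total, count) in sums.items()}

def expected_value(baseline, ts):
    """Predicted metric value for a new timestamp."""
    return baseline[bucket(ts)]
```

A real system would use finer buckets (day of week, holidays) and model trends, but the idea is the same: the prediction for "now" comes from comparable past periods.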
If this system indicates that the user behavior involved falls outside of expectations based upon the previously recorded training period data, then some kind of remedial actions might take place.
That can include a notification being sent to a system administrator. It could mean that the last change or a recent change to a server might be automatically rolled back, or manually reviewed and then rolled back. Those types of changes might include a change in data or a change in programming code or other changes.
The changes that would trigger a remedial action would have to be statistically significant, rather than ordinary fluctuations in user behavior.
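One simple way to make that significance check concrete, assuming the monitored metric is a proportion (like the top-result click rate), is a two-sided z-test against the baseline rate. The roughly 99% confidence threshold below is my own illustrative choice; the patent only requires that the deviation be statistically significant:

```python
import math

def significant_deviation(baseline_rate, observed_successes, n, z_threshold=2.576):
    """True if the observed proportion differs from the baseline rate by more
    than sampling noise would explain (two-sided z-test, ~99% confidence).
    The threshold value is an illustrative assumption."""
    observed_rate = observed_successes / n
    std_err = math.sqrt(baseline_rate * (1 - baseline_rate) / n)
    return abs(observed_rate - baseline_rate) / std_err > z_threshold
```

Only when a check like this fires would a remedial action, such as an alert to an administrator or a rollback of a recent change, be worth considering.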
The patent was originally filed back in 2004, but it’s hard to imagine that Google hasn’t had some kind of process like this in place since at least then.
Google’s search results have changed in a good number of ways since then, including the introduction of additional types of data from different repositories, such as Google Maps, image and video results, and news, books, and music results, of the kinds that I described in my post How Google Universal Search and Blended Results May Work. These additional results aren’t separate properties that Google is artificially promoting in search results, but rather information of different types that Google has decided is relevant to a query someone types into a search box.
These blended results do make the process described within this patent more complicated than could be handled by a system that might use a period of 100 days or so to monitor user behavior to assess the quality of search results.
The Google Caffeine update also introduced a new infrastructure for how Google indexes and processes content found on the Web, making it likely that search results will change much more rapidly than in the days when this patent was originally filed. Again, a period of 100 days sounds unrealistic as a window for collecting data about specific search results to compare against present-day user behavior after the Caffeine update.
Google has also been working on more personalization within search results, as described in a post today at the Google Inside Search blog, titled Some thoughts on personalization. Google will also customize search results for different searchers based upon their recent search history and their location. We’ve also been seeing a push towards displaying fresher search results, and hints in Google patents of things like the possibility that they might demote some search results when those results appear again for very related queries within the same query session.
With all of these possible changes to the search results people see, and others, can an automated system like this which monitors specific user behavior and predicts future behavior for specific queries work well?
With the large number of experiments that Google supposedly performs on search results every year, cited by Google’s Peter Norvig recently as numbering in the tens of thousands, you might expect that a way of continuing to predict user behavior when changes are made would be something that Google would pay attention to.
The Google Panda update, focusing upon improving the quality of pages that show up in search results, may have been partially inspired by the rapid changes to search results that Google experienced after the Caffeine update since Caffeine drastically increased the number of changes that could happen to search results. Panda also seems to be a system aimed at predicting user interactions such as clickthroughs and long clicks to pages.
What kinds of automated measures do you think Google might be using to monitor the quality of search results these days?
Thanks for reading, everyone.
18 thoughts on “How Automated Evaluations Might Help Decide Upon Rankings for Search Results at Google”
Thanks for the article, I did enjoy reading this.
But I am wondering why all the posts seem to be about Google Patents now? The patents are interesting in seeing what the search companies are thinking about and what they think is important as a competitive advantage, but I rather liked reading about the research papers published by Microsoft, Yahoo, and Google that you used to post. The research papers seemed to be more current (since the patents are filed many years ago) and the major thing is they had EXCITING things like experiments and data and results; they had charts and statistics and prototypes!
I would love to see you bring back the research papers and even better would be posts that tie together the research and the patents. Maybe something like “this is the patent they filed in 2007, and the recent published research by them was a prototype that used the patented technology and they discovered a 5% improvement in nDCG.”
Thanks for your kind words, and for your feedback.
I do like looking at whitepapers from the search engines as well, and I have been keeping an eye out for interesting ones to write about. Unfortunately, I only have a limited time to blog every week, and I do tend to spend a lot of time on the posts that I write. Going through a patent and trying to capture what it’s about, what it might mean, and how the search world might have changed since it was written can be a pretty time-intensive activity.
The research papers are often more current than the patents, but I have written about patent applications and granted patents as they are either published for the first time, or granted, and even though some patents may be from a few years ago, sometimes the things they describe are things that are just happening now. For example, a post I wrote in 2006 about a Google patent described the possibility of Google Instant, which we didn’t see until last year.
Let’s take the patent that I wrote about in this post. The process that it describes is likely very close to something that Google may have been using at one point, but it’s also very likely that Google has changed the way they have automated the evaluation of search results. In my conclusion, I point to a number of those reasons why the system described in the patent is probably less effective than it would be in the past. It’s not something that Google has written a whitepaper on, and it would be great if they did, but I’m not expecting them to any time soon. I’m also not expecting them to write a white paper on the topic of my previous post about how they might manually evaluate search results.
SEO doesn’t have an official manual, nor whitepapers that describe how best to do it step-by-step, so when something like some of the patents I’ve been writing about recently such as the following show up, it’s hard not to write about them:
– How Google might be doing automated and manual evaluations,
– How Google might impose limitations on the value of exact match domains when those contain commercial terms,
– How acquired patents and technology from acquisitions from Katango and Apture might influence what we see at Google in the future,
– How Google analyzes the hierarchy of a site and may decide to show breadcrumb navigation for a page,
– How Google might attempt to understand when the same pages are being displayed with and without a “www” in their URL,
– How Google might identify when a site has been acquired by someone else and transformed into a doorway page
Not only are there no whitepapers on these topics, there’s very little information about them anywhere else on the Web.
When I write about topics like these, I do try to see if the inventors listed in those patents have written more on the topic, or if others have written something relevant. It’s great when I find something from someone that expands upon a topic that I’m writing about. Unfortunately, I only see whitepapers related to patents in a small percentage of cases. I’m more than happy to see someone point to one in a comment for a post, so if you do see a post I’ve written about a patent here and you know of a whitepaper that covers some of the same topic, please feel free to mention it. I’m a big fan of collaborative learning, and I do hope that my posts inspire others to join in and discuss what I’m writing about, which is part of the reason why I spend so much time responding to comments left here.
I promise, I will continue to search for and try to find whitepapers that are interesting, that give us some idea about what search engineers are looking into, and that give us current information about what the search engines are doing. I love seeing those and reading them and trying to put them into context myself.
Thank you for sharing that emotional story about your friend’s son and his drawing.
I know that Google already has some kind of algorithm that determines the quality of the search results. For example, if you click on the first result, are unhappy with the site, hit the back button, and click the next website in the list, the first website gets a “thumbs down”.
I think the new updates will bring even more accurate results for searches.
I’m not sure how the “time series” is going to work, though.
One thing I’m not happy about is the “manual” intervention.
“What kinds of automated measures do you think Google might be using to monitor the quality of search results these days?”
I think some of the things we discuss here have at least already been tested. Google is always changing its algorithms and we know for sure that they are after personalized searches (as you said depending on location, browser activity etc)
Thank you. Google does collect an incredible amount of information about how people interact with search results, and they may use that data in a number of ways that could influence the rankings of pages shown within search results.
I do believe that they wouldn’t rely too much upon any one single user-behavior metric if possible, since it might be hard to determine sometimes why someone behaves the way that they do. For example, if someone clicks on a link in search results to a particular page, and then returns quickly that isn’t necessarily an indication that the page isn’t a good response to their query. Instead, it could mean: (1) they’ve seen that page before, (2) they want to see some other pages to compare against that first one, (3) they wrote down a phone number they found on the page, (4) they are the owner of the page and they are just checking to see that it’s in the search results and working right, (5) they are a competitor or interested in competing in that space and they are checking to see what else is there, (6) they were interrupted during their search by a phone call or something else, and ended the search, (7) they wanted a service or supplier that might be closer geographically, etc.
Chances are good that Google would try to look at other signals as well to get a better picture of what is going on with search results. For example, if someone does click on a result and return almost immediately, what do they do next? Do they click on another result from the same query? Do they search for a very related term?
A long time period like 100 days just doesn’t seem like it fits, but I wonder how necessary it might be with all the searches people perform at Google these days.
I’m wondering what the likelihood is of automated evaluations becoming a greater part of the AdWords Quality Score system? Or does your research suggest it’s only being used for organic listings right now?
I enjoyed the information in this article…but the width of the column makes it very difficult to read. Merely from a readability perspective, your readers will find it easier to read your advice if your template could make the text area narrower.
If that isn’t possible, shorter paragraphs make it easier for the eye to ‘return’ to the next line of text.
Keep up the good work.
Chances are really good that Google does a lot of automated evaluations of advertisements and of landing pages, and has for a few years. Here’s a post that I wrote in 2007 about a patent they published at that time, describing the kinds of things they might be looking for when they do that:
How Google Rejects Annoying Advertisements and Pages.
In some ways, there’s a similarity there to what Google seems to be doing with their Panda algorithm as well.
Thanks for the feedback.
The template is fairly fluid, and it does make the text area narrower as you make your browser window smaller. I did just tweak the template so that it can get even a bit narrower, and hopefully that will help your experience.
I do also try hard to write with shorter paragraphs. I try not to write too many paragraphs here longer than 2-3 sentences.
Google is really doing a lot of new stuff lately with their search patterns. I’ve tried putting a Google +1 on a website that ranked very far from the first page for a keyword. After it received my +1, when I searched using that same keyword again, it went up many places and landed on page 1. That was so crazy.
I’m seeing a lot of changes to search results on a regular basis as well.
Curious as to whether you were logged into your Google Account when you saw the page you +1’ed move up in rankings, or had repeated that search since then without being logged in. Did Google move that result up only for you, and only when logged in, as a form of personalized search? Or did the +1 affect everyone else who is seeing the results for that search. 🙂
Oops, yeah, I forgot to say that I was logged in to my Gmail account at that time, so it’s most likely a personalized search. When I searched using a different browser, it was back in its normal place, as it should be. So it’s likely that any site you Google +1 will move near page 1 of your results while you’re logged in to the account that did the +1.
OK, Good to hear. I think it’s good that a single +1 doesn’t have the power to switch around search rankings that drastically for everyone.
I think it’s possible that a number of +1s may potentially help boost a result in rankings for everyone though, but it’s less likely that one or a handful may.
@Gil, Bill: I don’t understand why Google would boost a result that has a lot of +1s. If that is true, then everyone will tell their friends to give a “+1”, so it will be a contest of who has more friends, like those contests on Facebook.
I tested it (told some friends to give a “+1”) and, fortunately, I didn’t see any effect. I hope that Google won’t change that.
Since you wrote your comment, Google has now launched their Search Plus Your World update. One of the things that I’ve thought about since they have is that if I “plus” something, or one of my connections on Google Plus does, those things are going to show up in the search results I see.
I’m hoping that people choose to plus things that they like because those things are high quality, rather than because they were asked to by friends. But, I’m guessing that a lot of people will plus something from their friends regardless of the quality.