Search Experiment Papers from Google’s Mad Scientists

We’re not often given much direct insight into how a search engine like Google might check the quality of its search results and of the algorithms that produce those results. When we are, it can be interesting to look at some of the processes its researchers might use, the assumptions they follow, and the conclusions they reach.

What kinds of experiments would you perform if you worked for one of the major search engines and wanted to compare two different algorithms that provided search results of similar quality? Or if you wanted to learn more about how people use the search engine, and whether small changes might affect that use?

A couple of recent papers from Google describe experiments that the search engine performed.

Search Task Time and Searcher Satisfaction

A paper presented by a couple of Google researchers at this year’s SIGIR conference (SIGIR 2009) describes an experiment in which Google assigned search-related tasks to a number of paid participants and timed how long it took those participants to complete the tasks using two different search algorithms.

The paper, Evaluating Web Search Using Task Completion Time (pdf), describes how participants were asked to look for answers to questions such as the following:

Example Task #1:

I once heard a song in the ending credits from a movie about a group of young lawyers or college students from back in the 80’s. Jami Gertz and Kirk Cameron were stars in this movie. I think the song is called “Forever Young” but I want to know what the movie is called and who sings the song.

Example Task #2:

I’m trying to find out what Washington State governor served the shortest term in the past hundred years.

These questions, or tasks, were compiled by asking 150 people to describe difficult information-finding tasks they had recently attempted online. From those descriptions, the researchers came up with 100 tasks, which were then assigned to 200 other paid participants. The 200 were randomly split into two groups of 100 each: one group used Search Algorithm A, and the other used Search Algorithm B.
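Random assignment is what lets a difference between the two groups be attributed to the algorithm rather than to who happened to land in each group. The paper’s exact assignment procedure isn’t described here, so the following is only a minimal Python sketch, assuming an even number of participants; the helper name `split_participants` and the seed value are hypothetical.

```python
import random


def split_participants(participant_ids, seed=None):
    """Randomly split participants into two equal-sized groups, "A" and "B".

    Hypothetical helper: assumes an even number of participants.
    """
    ids = list(participant_ids)
    if len(ids) % 2 != 0:
        raise ValueError("expected an even number of participants")
    rng = random.Random(seed)  # seeded so the split can be reproduced
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"A": ids[:half], "B": ids[half:]}


# 200 hypothetical participant ids, split 100 / 100 as in the study design
groups = split_participants(range(200), seed=42)
print(len(groups["A"]), len(groups["B"]))  # 100 100
```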

Each of the 200 users was to work through all of the tasks, one at a time, and keep searching until they either felt they had found the answer or reached the point where they believed a typical searcher would have given up. At the start of each task, they clicked a “start searching” button, and when they finished, they clicked a “finish searching” button.

In addition to recording how long each person took to complete a task, the researchers asked searchers how satisfied they were with their search experience on each task, using the following scale:

1) Very Dissatisfied
2) Dissatisfied
3) Neutral
4) Satisfied
5) Very Satisfied

If you guessed that the longer it took someone to complete a task, the less satisfied they were with their search experience, you would be correct.
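One common way to check a relationship like this, where satisfaction is an ordinal 1 to 5 rating, is a Spearman rank correlation, which only asks whether longer times tend to go with lower ratings. This is not necessarily the analysis the researchers used; the sketch below, with hypothetical function names and made-up toy data, just illustrates the idea in plain Python.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither list is constant."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


times = [60, 120, 180, 300, 600]  # hypothetical completion times, seconds
satisfaction = [5, 4, 4, 2, 1]     # hypothetical ratings on the 1 to 5 scale
print(spearman(times, satisfaction))  # negative: longer tasks, lower ratings
```

A rank correlation fits here because the satisfaction scale is ordered but its steps aren’t necessarily evenly spaced.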

We aren’t told much about the differences between Search Algorithm A and Search Algorithm B, but we are told that it took searchers between 1 percent and 17 percent longer to complete the same tasks with Algorithm B than with Algorithm A, a difference that could be considered significant.
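The post doesn’t say how the 1 to 17 percent range was calculated. Assuming it was reported as an interval estimate of the relative difference in completion times, one simple way to produce such an interval is a percentile bootstrap, sketched below. This is a swapped-in technique, not necessarily the paper’s method, and it treats each group’s times as independent samples, ignoring the per-task and per-participant structure a careful analysis would account for; the function names and data are hypothetical.

```python
import random
from statistics import mean


def relative_slowdown(times_a, times_b):
    """Percent by which the mean time under B exceeds the mean time under A."""
    return (mean(times_b) / mean(times_a) - 1) * 100


def bootstrap_interval(times_a, times_b, n_resamples=2000, level=0.95, seed=0):
    """Percentile-bootstrap interval for relative_slowdown.

    Resamples each group independently with replacement (an assumption:
    the A and B groups are treated as two independent samples).
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_resamples):
        sample_a = rng.choices(times_a, k=len(times_a))
        sample_b = rng.choices(times_b, k=len(times_b))
        stats.append(relative_slowdown(sample_a, sample_b))
    stats.sort()
    alpha = (1 - level) / 2
    lo = stats[round(alpha * n_resamples)]
    hi = stats[round((1 - alpha) * n_resamples) - 1]
    return lo, hi


times_a = [310, 295, 402, 350, 280, 330]  # hypothetical seconds, algorithm A
times_b = [340, 300, 450, 390, 290, 360]  # hypothetical seconds, algorithm B
point = relative_slowdown(times_a, times_b)
lo, hi = bootstrap_interval(times_a, times_b)
print(f"B slower by {point:.1f}% (95% interval {lo:.1f}% to {hi:.1f}%)")
```

If an interval like this stays entirely above zero, as a 1 to 17 percent range does, that is the usual reason a difference gets called significant.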

Google Speed Experiment

Google also recently reported on an experiment in which they introduced slight delays in how long it took search results to load, to see whether those delays changed how people searched. The paper is Speed Matters for Google Web Search (pdf).

We’re not told how many people were involved in this experiment, though it seems likely that the people who participated weren’t aware that they were participants. Users were divided into two groups: one affected by the delays, and a control group. The experiment seems to have been run in two parts, with different groups of users. In each part, searchers were subjected to a delay of either 200 milliseconds (ms) or 400 ms over a six-week period, followed by a five-week period after the delay was removed.
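A delay experiment like this needs two pieces: a stable way to put each user into the delayed group or the control group, and a hook that adds the delay when results are served. The sketch below assumes hash-based bucketing and a server-side sleep; neither is confirmed by the paper, and `assign_group`, `serve_results`, and `search_fn` are hypothetical names.

```python
import hashlib
import time


def assign_group(user_id: str, experiment: str, treatment_fraction: float = 0.5) -> str:
    """Deterministically assign a user to "treatment" (delayed) or "control".

    Hashing the experiment name together with the user id keeps a user in
    the same group on every visit, without storing an assignment table.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "treatment" if bucket < treatment_fraction else "control"


def serve_results(user_id, query, search_fn, delay_seconds=0.2, experiment="delay-200ms"):
    """Run the (hypothetical) search_fn, adding an artificial delay for treatment users."""
    results = search_fn(query)
    if assign_group(user_id, experiment) == "treatment":
        time.sleep(delay_seconds)  # e.g. 0.2 s or 0.4 s, the two delay levels
    return results
```

Hashing rather than re-randomizing on each request means a user stays in the same group across visits, which matters for an experiment that runs for weeks and keeps measuring after the delay is removed.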

Not surprisingly, people who faced delays started searching less. People subjected to the longer delay (400 ms) continued to search less during the five weeks after the delay was removed.

Conclusion

I’ve heard of other experiments in which Google makes slight changes to what they present to users to see what impact those changes might have, such as increasing the font sizes shown to viewers on Google Maps, or moving the map shown in Google Maps results from one side of the page to the other.

Chances are that if you use one of the major search engines, you may have been subjected to one experiment or another at some point without even knowing it.

Testing and measuring the impact of those tests is something that every web site owner should consider doing.

If you run a site, what have you learned about your web site, and the way that people use it recently?


22 thoughts on “Search Experiment Papers from Google’s Mad Scientists”

  1. Let me get this straight, subjects (who did not know that they were subjects and we do not know the nature of the control group) started searching less because their search results were .2 to .4 seconds delayed? Now that is attention deficit if I’ve ever heard it. I maintain that speed of results display is a metric contrived by computer science because they cannot really measure the actual relevance of the results. I’m just sayin’.

  2. Nice comment Marianne, seems ridiculous in the extreme – could you really even tell me the difference between 2 searches with a time to display results 0.2 seconds apart?

  3. Well…I know 200 ms is an eternity in some of my favorite twitch-games, but it doesn’t seem significant in terms of retrieving and reviewing web content. Then again, what use is my subjective opinion against raw data? And while the data looks a little noisy, the trend-lines are obvious and almost perfectly proportionate to the level of delay.

  4. Aloha Bill,
    That’s the beauty of having access to the actual algorithms: you can combine grey box testing with ad hoc user acceptance testing or just about anything else. Looking back on the days when I wrote test plans and test cases for software, the one sticky area was that when you gained performance in one area, you usually lost it in another. Add to that the fact that the outcomes of the tests can easily be dictated by manipulating the engine powering the data.

    Realizing this may be a bit like walking on thin ice for some companies, I would rather see more data on performance benchmarking bounced against other competing search engines.

    Charles

  5. Even better would be for the Search engine companies to do something like MIT’s Autonomous Robot Design Competition. Pit the various engineers in a public contest for speed and accuracy/relevance. That would be way more fun than some dull seo expo.

  6. Hi Marianne,

    The speed test wasn’t tied to the relevancy of search results as far as I know. We aren’t told very much about that experiment though, such as how many participants were included, so it’s difficult to critique the conclusions.

  7. Hi Steven,

    I remember back when we would see search results from sources like Altavista, where the amount of time it took to see those results was measured in seconds rather than milliseconds. I guess we expect more of search engines today. I find it interesting that the major search engines continue to tell us how long it took for them to perform a search.

  8. Hi Charles,

    I’d love to see some comparisons from the different search engines. I don’t know how helpful those might be, since we don’t know what processes are actually going on behind the scenes, but it would still be interesting.

    As for measuring speed and relevancy, there’s often a trade-off between precision and recall in results as well, and that likely differs from one search engine to another. Add things like duplicate content filtering, possible filtering of other things such as spam, caching of search results for queries, and other factors, and it might be very difficult to make those comparisons.

  9. They’ve done it to me today with some bigger search text! Erm, not sure if it makes me happier with the search process or not. For about 20 minutes I thought it was an error in the CSS, or that the new FF was doing funny things with my view zoom.

    I’ve noticed the GA text keeps getting larger and smaller as well. I thought this was probably another CSS or zoom error, but maybe they’ve been testing that on me, as no one else seems to get the same issue?

  10. Hi Matt,

    Google does have a good number of people in their usability testing area. I’m not sure how often they might pull experiments like making changes in font sizes or doing things like moving the map in Google Maps from the left side to the right side, but those kinds of experiments do happen. It’s interesting that Google will sometimes bring these experiments live to the public, without any announcement or fanfare, but I guess with the search volume that they have, they can gather a lot of data about those changes fairly quickly.

    When Google recently moved their ads from the right column a little closer to the search results, I thought something was wrong with their CSS, or with my browser as well.

  11. Hi Bill,

    Seems like I was part of the speed testing group earlier. Search queries were pretty slow, I must say, and now it’s back to normal. It really does affect one’s motivation to keep searching.

    And recently, I’ve noticed that the Google query search bar seems longer than usual. Another test done by Google, I suppose?

  12. Hi Deric,

    I don’t know if I was one of the “participants” in this study, but I have started paying a lot more attention to the messages the search engines show about how long results for my queries have taken since I read that paper. Sorry to hear that you were in the test group.

    I have seen a number of reports on a longer query search bar. I’ve also noticed larger fonts being used in the predictive search results that appear under the search bar, over the past couple of days.

  13. I have noticed larger text experiments, and as we are now aware, Google ‘Caffeine’ is so quick it is mind blowing. I guess some of this testing was designed for ‘Caffeine’. Experimentation is the main key to the success of a website online. We all know a ‘call to action’ should be above the fold, thus enticing the reader/client to do something that you want them to. Why do we know this? It is a proven technique. Google are just taking experimentation further: “it’s all about speed or Caffeine!”.

  14. Hi Lee,

    It’s interesting to see Google perform some of their experiments live. I know that they have a good sized usability testing lab, and they seem to take advantage of the expertise of the people who man it. I don’t think that it’s all about speed, but that does seem to be an issue with Google – the simple front page of their site has changed over time, but it’s still very simple compared to many other search engines.

  15. Good article Bill, very interesting to see what Google might be planning to implement. I’m almost certain I have fallen into one of the tests on more than one occasion.

  16. Hi SJL Web Design,

    It always pays to keep your eyes open when you search. You never know when you might notice one of the search engines doing something different.

  17. I think this study was directly related to their new roll-out of the appearing homepage. I had heard that they reviewed speed tests and, with their new homepage, were hoping to decrease the time between getting to Google.com and finding the answer to your question.

  18. Hi Juggle,

    You may be right about the speed study being related to the new homepage. Shaving even just a little bit of time off a site that gets so many visits has to make a difference.
