Search Experiment Papers from Google’s Mad Scientists
We’re not often given too much insight directly into how a search engine like Google might check on the quality of their search results, and the algorithms that achieve those results. When we are, it can be interesting to look at some of the processes that their researchers might use, the assumptions that they follow, and the conclusions that they find.
What kinds of experiments would you perform if you were from one of the major search engines, and you wanted to compare two different algorithms that provided similar quality search results? Or you wanted to learn more about how people use the search engine, and if small changes might impact that use?
A couple of recent papers from Google describe experiments that the search engine performed.
Search Task Time and Searcher Satisfaction
A paper presented by a couple of Google researchers at this year’s SIGIR 2009 conference describes an experiment in which Google assigned search-related tasks to a number of paid participants and timed how long it took those participants to complete the tasks, using two different search algorithms.
The paper, Evaluating Web Search Using Task Completion Time (pdf), asked users to look for answers to questions such as the following:
Example Task #1:
I once heard a song in the ending credits from a movie about a group of young lawyers or college students from back in the 80’s. Jami Gertz and Kirk Cameron were stars in this movie. I think the song is called “Forever Young” but I want to know what the movie is called and who sings the song.
Example Task #2:
I’m trying to find out what Washington State governor served the shortest term in the past hundred years.
These questions, or tasks, were compiled by asking 150 people to describe difficult information-finding tasks that they had recently attempted online. From those descriptions, the researchers came up with 100 tasks, which were then assigned to 200 other paid participants. The 200 were randomly split into two groups of 100 each; one group was assigned Search Algorithm A, and the other group used Search Algorithm B.
The 200 users each were to go through all the tasks, one at a time, and keep searching until they either felt that they had found the answer, or until they reached a point where they believed that a typical searcher would have given up. At the start of each task, they would click on a “start searching button,” and when they completed, they would click on a “finish searching” button.
In addition to recording the time it took each person to complete a task, the searchers were asked to indicate how satisfied they were with their search experience on each task, using a five-point scale anchored by the following choices:
1) Very Dissatisfied
5) Very Satisfied
If you were to guess that the longer it took someone to complete a task, the less satisfied they were with the experience they had in searching for answers, you would be correct.
We aren’t told much in the paper about the differences between Search Algorithm A and Search Algorithm B, but we are told that it took searchers between 1 percent and 17 percent longer to complete the same tasks with Algorithm B than with Algorithm A, which could be considered significant.
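The basic comparison behind a finding like that can be sketched in a few lines. The numbers below are made up for illustration — the paper's raw timing data isn't public — but the calculation (relative slowdown of one algorithm's mean completion time versus the other's) is the same shape:

```python
import statistics

# Hypothetical task-completion times in seconds; illustrative only,
# not data from the Google paper.
times_a = [95, 120, 87, 140, 110]   # participants using Algorithm A
times_b = [105, 133, 96, 150, 118]  # participants using Algorithm B

mean_a = statistics.mean(times_a)
mean_b = statistics.mean(times_b)

# Relative slowdown of Algorithm B versus Algorithm A, as a percentage.
slowdown = (mean_b - mean_a) / mean_a * 100
print(f"Algorithm B took {slowdown:.1f}% longer on average")
```

In practice, researchers would also run a significance test (for example, a t-test) on the two sets of times rather than compare means alone, since per-task variation between searchers is large.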
Google Speed Experiment
Google also recently reported on an experiment where they introduced slight delays in how long it took search results to load, to see whether those delays influenced how people used the search engine. The paper is Speed Matters for Google Web Search (pdf).
We’re not told how many people were involved in this experiment, though it seems likely that the participants weren’t aware they were taking part. Two groups were chosen – one affected by delays and a control group. The experiment appears to have been run in two parts (with different groups of users), each consisting of a 6-week period in which searchers were subjected to a delay of either 200 milliseconds (ms) or 400 ms, followed by a 5-week period after the delay was removed.
Not surprisingly, people who faced delays started searching less. People who were subjected to the longer delays (400 ms) continued to search less during the five weeks after the delays were removed.
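A control group is what lets an experiment like this separate the delay's effect from ordinary drift in how much people search. One common way to express that, sketched below with made-up numbers (the paper doesn't publish per-user figures), is a difference-in-differences: the change in the delayed group minus the change in the control group.

```python
# Hypothetical average daily searches per user, before and after the
# delay was introduced; illustrative numbers only, not from the paper.
control_before, control_after = 10.0, 10.0
delayed_before, delayed_after = 10.0, 9.4   # group that saw the 400 ms delay

# Difference-in-differences: the delayed group's relative change,
# net of whatever change the control group saw over the same period.
control_change = (control_after - control_before) / control_before
delayed_change = (delayed_after - delayed_before) / delayed_before
net_effect = (delayed_change - control_change) * 100
print(f"Net change in searches per user: {net_effect:+.1f}%")
```

With these invented numbers the delayed group ends up searching about 6 percent less than the control group; the paper's actual reported effects were much smaller, but measured the same way over very large populations.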
I’ve heard of other experiments from Google in which they made slight changes to what they present to users to see what kind of impact those changes might have, such as increasing the font sizes shown to viewers in Google Maps, or moving the display of the map in Google Maps results from one side of the page to the other.
Chances are that if you use one of the major search engines, you have been subjected to one experiment or another at some point, without even knowing it.
Testing and measuring the impact of those tests is something that every web site owner should consider doing.
If you run a site, what have you learned recently about your web site and the way that people use it?