Last Wednesday, I pointed to a paper jointly written by researchers from Stanford and Cornell on How Task Types and Gender May Influence How People Google. That paper looked at how a small group of people interacted with the search engine. Collecting data about how people search on a larger scale can be problematic.
We saw that with the controversy over a release of user queries from AOL a couple of months ago, which raised many concerns over the privacy of the people who submitted those searches. A few reports about the release of that data noted that user query data from some other search engines had been shared with researchers on a smaller scale in the past, including data from Excite.
One of the researchers from that paper, Bing Pan, co-authored a paper presented this year to the Travel and Tourism Research Association. This research examined user query data from the Excite search engine, drawn from searches conducted in 2001.
The paper is Real Users, Real Trips, and Real Queries: An Analysis of Destination Search on a Search Engine (pdf), and it comes up with some interesting information and conclusions.
The authors manually identified the travel-related queries in a random sample of Excite queries, looking in total at “539 unique user sessions related to travel search, and collectively comprising 1,788 unique queries.” They compared their travel-related sessions and queries against figures from a previous study (pdf) (Spink et al.) of the Excite queries.
Here are some of their findings:
1. Travel searchers used an average of 3 terms per query when searching for destination-related information, as opposed to 2.6 terms in the Spink research on general queries.
2. The average search query session involved 3 queries, as opposed to an average of 2.3 for general queries in the Spink data.
3. The travel searchers clicked through an average of 2.1 pages per query, as opposed to 1.7 reported in the Spink research.
4. The content of queries was looked at carefully, and segmented into different types of information that searchers might be interested in – “six aspects of trip planning: destinations; hotels; restaurants; transportation; attractions; and activities.” A flowchart and a table show percentages for those different types of content.
5. The query content was also looked at based upon levels of generalization, so that a state name would be considered more general than a city name, and the word “restaurant” would be more general than a specific name of a restaurant or a restaurant chain. One of the conclusions reached from this analysis was that the name of a city was the most commonly searched term, noted in 46.4% of searches.
6. Query sessions were reviewed to see how travel searchers might change what they search for within a single session. A common approach appears to be that they “frequently switch their search to seek information about multiple levels of geographical areas.”
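To make the first two measures above concrete, here is a minimal sketch of how average terms per query and average queries per session could be computed from a query log. This is not the authors' method. The log format, the `session_metrics` function, and the sample queries are all hypothetical, and splitting on whitespace is an assumed way of counting terms.

```python
from collections import defaultdict


def session_metrics(log):
    """Return (average terms per query, average queries per session).

    `log` is an iterable of (session_id, query_text) pairs. This format is
    an assumption made for illustration, not the Excite data layout.
    """
    # Group queries by the session they belong to.
    sessions = defaultdict(list)
    for session_id, query in log:
        sessions[session_id].append(query)

    queries = [q for qs in sessions.values() for q in qs]
    if not queries:
        raise ValueError("log is empty")

    # Count terms by splitting on whitespace (an assumed definition of "term").
    terms_per_query = sum(len(q.split()) for q in queries) / len(queries)
    queries_per_session = len(queries) / len(sessions)
    return terms_per_query, queries_per_session


# Made-up sample data, not taken from the paper.
sample_log = [
    ("s1", "orlando hotels"),
    ("s1", "florida theme parks"),
    ("s1", "disney world tickets"),
    ("s2", "chicago restaurants"),
]

terms, per_session = session_metrics(sample_log)
print(terms, per_session)
```

A real analysis would also need rules for deciding where one session ends and the next begins, which this sketch sidesteps by assuming session identifiers are already present.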
The authors note in their conclusion that they would like to use more current data, from a more sophisticated search engine like Google.
Related: following a link from a footnote in this paper, I arrived at a page that examined query data from AltaVista collected from 1998 to 2002 – A Temporal Comparison of AltaVista Web Searching, which compared changes in queries submitted to AltaVista over time. One of the researchers involved was Jan Pedersen, presently with Yahoo. There is some interesting data in that paper, too. For example, they tell us that the percentage of three-term queries increased from nearly 28% in 1998 to 49% in 2002.