One of the short posters at the recent WWW 2007 Conference in Banff, Alberta, Canada, provides an in-depth look at classifying search queries, based on a sample of more than 5 million queries taken from the transaction logs of three different search engines.
The authors used that data to develop a classification algorithm, which they then applied to a “separate Web search engine transaction log of over a million queries submitted by several hundred thousand users.” The results are interesting.
The article is Determining the User Intent of Web Search Engine Queries, from Bernard J. Jansen and Danielle L. Booth of Pennsylvania State University, and Amanda Spink of the Queensland University of Technology.
Their findings indicated that approximately 80 percent of the queries classified were informational in nature, with the remaining queries being split almost equally between navigational and transactional queries.
As a follow-up, they manually coded 400 more queries to compare against those results, and note that the accuracy of their classification was about 74 percent. They tell us that within the remaining queries, “the user intent is generally vague or multi-faceted, pointing to the need to for probabilistic classification.”
As part of this process, they defined characteristics for the different types of queries: informational, transactional, and navigational. For example, here are a few of the characteristics that they noticed for informational queries:
- Queries that use question words (e.g., “ways to,” “how to,” “what is”)
- Queries containing informational terms (e.g., list, playlist)
- Queries where the searcher viewed multiple results pages
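To make those three characteristics concrete, here is a minimal sketch of how they might be checked in code. It is not the authors' algorithm. The cue lists contain only the examples quoted above, the function names are hypothetical, and treating any single signal as sufficient is an assumption made for illustration.

```python
import re

# Cue lists limited to the examples quoted in the post; the paper's own lists
# are presumably longer. All names here are hypothetical.
QUESTION_PHRASES = ("ways to", "how to", "what is")
INFORMATIONAL_TERMS = {"list", "playlist"}


def informational_signals(query, result_pages_viewed=1):
    """Return which of the three listed characteristics a query exhibits."""
    text = query.lower()
    words = set(re.findall(r"[a-z0-9]+", text))
    signals = []

    # 1. Question words, matched as whole phrases ("shows to" is not "how to").
    if any(re.search(rf"\b{re.escape(p)}\b", text) for p in QUESTION_PHRASES):
        signals.append("question words")

    # 2. Informational terms, matched as whole words ("realistic" is not "list").
    if words & INFORMATIONAL_TERMS:
        signals.append("informational terms")

    # 3. The searcher viewed more than one results page for the query.
    if result_pages_viewed > 1:
        signals.append("multiple results pages")

    return signals


def looks_informational(query, result_pages_viewed=1):
    # Assumption: one matching signal is enough to flag the query.
    return bool(informational_signals(query, result_pages_viewed))
```

Calling `informational_signals("how to bake bread")` reports the question-word signal, while a bare brand name viewed on a single results page matches none. Because the third signal depends on behavior recorded in the log and not on the query text, a classifier along these lines needs session data as well as the query string.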
The “separate Web search engine transaction log” that they reviewed was from Dogpile, and they point to another longer paper that describes the study of that transaction log, which goes beyond identifying classifications for search queries. The cited paper is:
Jansen, B. J., Spink, A., Blakely, C. and Koshman, S. Forthcoming. Web Searcher Interaction with the Dogpile.com Meta-Search Engine. Journal of the American Society for Information Science and Technology.
They compare the results of this study of Dogpile queries to studies of non-meta search engines. Some interesting statistics from that study are shown in a table within the paper. Here’s a glimpse at some of them:
1 query – 288,231 – 53.9%
2 queries – 88,875 – 16.6%
3 queries – 157,401 – 29.4%
Results Pages Viewed Per Query
1 page – 1,052,554 – 69.07%
2 pages – 253,718 – 16.6%
3 pages – 217,521 – 14.2%
The rest are worth a close look.
Understanding user intent during a search can be an important aspect of delivering relevant results to searchers. The percentage of informational search queries from this report is higher than in previous studies I’ve seen on the subject. We aren’t told whether that is because the logs used were from a metasearch engine, but it’s still a result worth considering.
Other papers cited as references in the WWW 2007 document:
- Baeza-Yates, R., Calderon-Benavides, L. and Gonzalez-Caro, C. 2006. The Intention Behind Web Queries. In Proceedings of String Processing and Information Retrieval (Spire 2006). Glasgow, Scotland, 98-109.
- Broder, A. 2002. A Taxonomy of Web Search. SIGIR Forum. 36, 2, 3-10.
- Jansen, B. J. and Spink, A. 2005. How are we searching the World Wide Web? A comparison of nine search engine transaction logs. Information Processing & Management. 42, 1, 248-263.
- Jansen, B. J., Spink, A., and Saracevic, T. 2000. Real life, real users, and real needs: A study and analysis of user queries on the web. Information Processing & Management. 36, 2, 207-227.
- Lee, U., Liu, Z. and Cho, J. 2005. Automatic Identification of User Goals in Web Search. In Proceedings of The World Wide Web Conference. Chiba, Japan, 391-401.