A Taxonomy of Rewriting Search Terms
When I’m looking for information on a topic, I’ll rarely stop at one search regardless of how good or poor the information I find on the topic might be.
I’ll look at some of the results that I receive from my search, and possibly change the words I use in my search based upon what I see in those search results. Sometimes I’ll ignore those results and try out other terms. I might add a word or two to better focus my search, or remove some words to better target what I’m looking for. I might use an advanced search operator, such as a minus sign immediately in front of a word, to try to filter out some results that aren’t relevant to what I’m trying to find.
A couple of researchers from the University of Washington have published a paper to be presented at The 18th ACM Conference on Information and Knowledge Management (CIKM 2009) in November 2009, that takes a close look at how people search on the Web, and how those searchers might reshape and rewrite the query terms they use when trying to find information on a subject.
If you’re a searcher, knowing some of these strategies might help you find information on topics that you’re having trouble with. If you’re a site owner, having some knowledge about how people search might help you think about how people might find your pages through search engines.
The authors, Jeff Huang and Efthimis N. Efthimiadis, looked closely at query logs from AOL released a couple of years ago, to capture information about search sessions from individual searchers, to come up with classifications on how people might change the words that they use when going from one search to another at a search engine. Those query logs contained records of 36,389,567 queries, and the classification method that the researchers used identified 3,411,706 of those as reformulations of previous queries in the log files.
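For a sense of scale, the share of logged queries that were flagged as reformulations can be worked out from the two counts above. This is only arithmetic on the numbers quoted from the paper; the variable names are mine.

```python
# Counts quoted above from the AOL query log study.
total_queries = 36_389_567
reformulations = 3_411_706

# Fraction of all logged queries that were classified as reformulations.
share = reformulations / total_queries
print(f"{share:.1%} of logged queries were classified as reformulations")
```

That works out to a little over 9 percent of the queries in the logs.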
These classifications are presented as a list of “reformulations” or re-writings of query terms, though it’s not a complete list. For example, the authors tell us in the paper that they didn’t try to differentiate between query terms where searchers may have added or included geographical information in their searches. I also don’t see included within the list attempts by searchers to refine a query by including temporal information – such as adding a “2005” to a search for “world series.” The paper also doesn’t discuss the use of advanced search operators, such as a minus sign to filter out some search results.
We are told in the paper that some of the ways that people re-wrote their queries took place more often when searchers didn’t find much in search results that were helpful to them, and that other ways of reformulating query terms were used after it appeared that they did find useful information during their search.
Here are the classifications that the authors of the paper came up with for reformulations of search queries that they saw happen commonly in the query log data that they studied:
Reordering Words
Does the order of the words you type in a query matter for a search? People searching for [seattle pizza palace] might change their query to [pizza seattle palace] to find results that they might not have seen with the first search.
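A check for this kind of change might look something like the sketch below. It is my own illustration rather than the code the researchers used, and the function name is made up: it treats two queries as a reordering when they contain exactly the same words in a different order.

```python
def is_word_reorder(first_query: str, second_query: str) -> bool:
    """Return True when both queries use the same words in a different order."""
    first_words = first_query.lower().split()
    second_words = second_query.lower().split()
    # Same multiset of words, but not the same sequence.
    return first_words != second_words and sorted(first_words) == sorted(second_words)


print(is_word_reorder("seattle pizza palace", "pizza seattle palace"))
```

A query that adds or drops a word would not count here, since the sorted word lists would no longer match.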
Changing Whitespace and Punctuation
Might changing the whitespace and punctuation in your search show you different results? If you search for [ice-cream new york] or [icecream new york] or [ice cream new york], will you see different search results?
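One simple way to spot this kind of rewrite is to throw away all whitespace and punctuation and see whether the two queries become identical. The sketch below assumes ASCII punctuation only, and the helper names are hypothetical, not taken from the paper.

```python
import string


def squash(query: str) -> str:
    """Lowercase the query and drop all whitespace and ASCII punctuation."""
    drop = set(string.whitespace) | set(string.punctuation)
    return "".join(ch for ch in query.lower() if ch not in drop)


def is_whitespace_punctuation_change(first_query: str, second_query: str) -> bool:
    """True when the queries differ only in whitespace, punctuation, or case."""
    return first_query != second_query and squash(first_query) == squash(second_query)


print(is_whitespace_punctuation_change("ice-cream new york", "icecream new york"))
print(is_whitespace_punctuation_change("ice-cream new york", "ice cream new york"))
```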
Removing Words
If you type in a query that is three or four words long and, after looking at the results, perform the search again with one or two words removed, you might see a broader range of results. For example, if I search for [cincinnati bengals ohio], I might miss out on a good number of results that I would see if I just searched for [cincinnati bengals].
Adding Words
Sometimes search queries can be too broad, and it might be helpful to add a word or more to focus a search better. A search for [virginia mortgage] probably isn’t as focused as [virginia mortgage rates], and would give me too broad a set of results if it were mortgage rates that I was interested in exploring.
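Removing and adding words are mirror images of each other, so one small function can tell them apart by comparing word counts. This is a sketch under my own assumptions (whitespace-separated words, case ignored), with labels I picked for illustration.

```python
from collections import Counter


def word_change(first_query: str, second_query: str) -> str:
    """Label a query pair as removing words, adding words, or neither."""
    before = Counter(first_query.lower().split())
    after = Counter(second_query.lower().split())
    if before == after:
        return "same words"
    if not (after - before):
        # The second query has no words the first one lacked, so words were only removed.
        return "remove words"
    if not (before - after):
        # The first query has no words the second one lacks, so words were only added.
        return "add words"
    return "other"


print(word_change("cincinnati bengals ohio", "cincinnati bengals"))
print(word_change("virginia mortgage", "virginia mortgage rates"))
```

A pair that both drops and adds words falls into "other", which is where changes like word substitution would have to be sorted out.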
Stripping URLs
Sometimes people type or copy the URL, or web address, of a page into a search box rather than their browser address bar. They may then remove things such as “.com”, “www.”, and “http” from their original query. If you do this on Google these days, it will usually deliver you to the page for the URL that you’ve typed in rather than showing you search results. If you want to actually search for the URL, you need to put quotation marks around it.
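Here is a rough sketch of how a URL-style query could be reduced to its core so that a stripped-down followup query can be recognized. The list of domain endings is a small assumption on my part, the function names are hypothetical, and example.com is just a placeholder domain.

```python
import re


def strip_url(query: str) -> str:
    """Remove a leading scheme and "www.", plus a trailing slash and common domain ending."""
    stripped = query.strip().lower()
    stripped = re.sub(r"^https?://", "", stripped)
    stripped = re.sub(r"^www\.", "", stripped)
    stripped = stripped.rstrip("/")
    # Only a few endings are handled here; a real list would be much longer.
    stripped = re.sub(r"\.(com|org|net)$", "", stripped)
    return stripped


def is_url_stripping(first_query: str, second_query: str) -> bool:
    """True when the second query is a shorter, stripped-down form of the first."""
    first, second = first_query.strip().lower(), second_query.strip().lower()
    return len(second) < len(first) and strip_url(first) == strip_url(second)


print(strip_url("http://www.example.com/"))
print(is_url_stripping("www.example.com", "example"))
```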
Stemming
Stemming means stripping a word down to its root. For example, a search for [fishing over bridges] might be rewritten as [fish over bridges] or [fish over bridge].
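To make the idea concrete, here is a deliberately tiny stemmer that only knows about "-ing" and a plural "-s". It is a toy of my own for illustration, far cruder than the stemmers real search systems use, and it will get plenty of English words wrong.

```python
def toy_stem(word: str) -> str:
    """Strip a trailing "ing" or plural "s" from a word (toy rules only)."""
    word = word.lower()
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def is_stemming_change(first_query: str, second_query: str) -> bool:
    """True when the queries differ, but match word for word after stemming."""
    first_words = first_query.lower().split()
    second_words = second_query.lower().split()
    if first_words == second_words or len(first_words) != len(second_words):
        return False
    return all(toy_stem(a) == toy_stem(b) for a, b in zip(first_words, second_words))


print(is_stemming_change("fishing over bridges", "fish over bridges"))
print(is_stemming_change("fishing over bridges", "fish over bridge"))
```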
Forming Acronyms
Someone searching for the [National Aeronautics and Space Administration] might decide to use the acronym for the organization (NASA) in their next search.
Expanding Acronyms
Someone searching for [NASA] might decide to expand that acronym into the full name of the organization, [National Aeronautics and Space Administration], in their followup search.
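Both acronym cases can be checked with the same helper, which builds an acronym from the first letters of a phrase. The short list of skipped words ("and", "of", and so on) is my own assumption, as are the function names and labels.

```python
from typing import Optional

# Small words that are usually left out of acronyms (an assumed, incomplete list).
SKIPPED_WORDS = {"and", "of", "the", "for"}


def acronym_of(phrase: str) -> str:
    """Build a lowercase acronym from the first letter of each significant word."""
    return "".join(w[0] for w in phrase.lower().split() if w not in SKIPPED_WORDS)


def classify_acronym_change(first_query: str, second_query: str) -> Optional[str]:
    """Return "form acronym", "expand acronym", or None for a pair of queries."""
    first, second = first_query.strip().lower(), second_query.strip().lower()
    if len(first.split()) > 1 and acronym_of(first) == second:
        return "form acronym"
    if len(second.split()) > 1 and acronym_of(second) == first:
        return "expand acronym"
    return None


print(classify_acronym_change("National Aeronautics and Space Administration", "NASA"))
print(classify_acronym_change("NASA", "National Aeronautics and Space Administration"))
```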
Taking Substrings
Where something may be removed from the front or back of a search, so that only a prefix or suffix of the original query remains. For example, the query [is there spyware on my computer] might be reduced to a smaller string such as [is there spyware].
Creating Superstrings
Where something is added to the front or back of a search phrase as a prefix or suffix, such as expanding a query for [nevada police rec] to [nevada police records 2008].
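Because these two cases work on the raw characters of the query rather than on whole words, a plain string comparison is enough to sketch them. The function name and labels below are hypothetical.

```python
from typing import Optional


def classify_substring_change(first_query: str, second_query: str) -> Optional[str]:
    """Return "substring", "superstring", or None based on shared prefixes and suffixes."""
    first, second = first_query.strip().lower(), second_query.strip().lower()
    if not first or not second or first == second:
        return None
    if first.startswith(second) or first.endswith(second):
        # The second query is what is left after trimming the front or back of the first.
        return "substring"
    if second.startswith(first) or second.endswith(first):
        # The second query is the first with something attached to the front or back.
        return "superstring"
    return None


print(classify_substring_change("is there spyware on my computer", "is there spyware"))
print(classify_substring_change("nevada police rec", "nevada police records 2008"))
```

Note that the first example would also pass as a case of removing whole words, so a full classifier would need some rule for which category wins when more than one matches.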
Using or Expanding Abbreviations
Where words within queries may be lengthened or shortened, such as changing a query of [shortened dict] to [short dictionary].
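One way to picture this is to line the two queries up word by word and check that each word is either a shortened or a lengthened form of its partner. The sketch below assumes that "shortened" simply means "is a prefix of", which is my simplification and not necessarily how the paper defines it.

```python
def is_abbreviation_change(first_query: str, second_query: str) -> bool:
    """True when each word pair matches as a prefix of one another, and something changed."""
    first_words = first_query.lower().split()
    second_words = second_query.lower().split()
    if first_words == second_words or len(first_words) != len(second_words):
        return False
    # Every word must be a prefix of its counterpart, in either direction.
    return all(a.startswith(b) or b.startswith(a)
               for a, b in zip(first_words, second_words))


print(is_abbreviation_change("shortened dict", "short dictionary"))
```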
Substituting Words
Words within a query might be substituted with semantically related words. Those relationships might be synonyms, hyponyms, hypernyms, meronyms, or holonyms. Synonyms are words that have the same meaning, such as “car” and “automobile.” A hyponym is a word that is a more specific instance of an original word (or query term), such as the word “scarlet” instead of “red.” A hypernym goes the other way, replacing a narrower term, such as “scarlet,” with the broader related term, “red.” A meronym is a word that names part of some larger whole, such as “finger” for “hand.” A holonym is a word that names a larger whole, rather than the smaller part, such as “hand” for “finger.”
Correcting Spelling
While this category seems self-explanatory, the researchers only counted a change as a spelling correction when the amount of editing between the two spellings was fairly small.
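A common way to measure "the amount of editing" between two strings is Levenshtein edit distance, which counts single-character insertions, deletions, and substitutions. The sketch below uses that measure with a cutoff of two edits; the cutoff and the function names are my assumptions for illustration, not figures taken from this post.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def is_spelling_correction(first_query: str, second_query: str, max_edits: int = 2) -> bool:
    """True when the queries differ by a small number of character edits."""
    first, second = first_query.lower(), second_query.lower()
    return first != second and edit_distance(first, second) <= max_edits


print(is_spelling_correction("cincinati bengals", "cincinnati bengals"))
```

A small edit distance can also come from other kinds of changes, such as dropping a hyphen, so a check like this would need to be combined with the other classifications rather than used alone.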
In addition to the classifications above, the researchers also noted that sometimes searchers will change more than just one of the things listed above at a time, such as adding new words, changing the order of words, and others. Some reformulations of queries can be too difficult for a computer algorithm to capture as well, and may require more context or knowledge of popular culture. They give the example of a query reformulation from [how to calculate nutritional values] to [weight watchers calculator].
The patterns in query reformulations that are seen in these classifications might help searchers, site owners, and even search engines to find or to provide better results.
If you find yourself searching for information about FEMA, you might want to try a followup search for [Federal Emergency Management Agency] to see if you can find some results you otherwise might have missed. Adding words to a query can help better focus a search. Removing words from a query can make an original search that might be too narrow become broader and possibly more useful.
If you’re a site owner, you probably try to use words on your site that you think your visitors will expect to see, or may use to search for and find your site. Understanding that searchers may rewrite their query terms in the ways described above may give you some ideas while you’re writing or editing the content for your pages. For example, if I’m writing about NASA, I’m going to make sure that I include the full name of the agency as well as the acronym.
I mentioned earlier that these classifications don’t include the addition of geographic terms, or terms that might add some sense of time to a query. I like to use the minus sign advanced search operator in front of some words to filter out some search results, and that kind of query reformulation isn’t included either. (I’d love to see a study from one of the major search engines on how often people use advanced search operators in their searches, such as a minus sign or quotation marks around a phrase.)
What strategies do you use when you search that might not have been included in the classifications above?