How a Search Engine Might Handle Singular and Plural Queries

When you’re searching for something on the Web, does it matter whether you use the singular or plural version of a word in your search?

For example, let’s say that you are looking for a new pair of sneakers to go jogging in, and you want to find the right combination of comfort and support, so you decide to look into the best sneakers for running. Does it make a difference in search results when you type in running shoes or running shoe in a search box?

If a search engine just returned results to you based upon your choice of a singular or plural search term, would you get the best results? Should a search engine explore both versions, and try to provide you with a mix of results based upon what it believes are the best results, after looking at results from the singular and plural versions?

A quick look at the top ten results at Yahoo and at Google for both “running shoe” and “running shoes” (both searched without the quotation marks) showed some overlap between the pages returned for the singular and plural versions at each search engine, but the vast majority of results seemed to be aimed at the plural version of the word rather than the singular version.

So it does seem that Yahoo and Google both look at the singular and plural versions of a query term when someone enters either one.

Are the search engines performing queries for both the singular and plural versions and showing a mix of the most relevant results? Are they giving more weight to one set of results than the other when they present those results? Are they looking at singular and plural versions of every word in a query (running or runnings, and shoe or shoes), or just picking out certain words whose plural and non-plural forms they consider when presenting search results?

I searched at Google and Yahoo using both the singular and plural versions of shoe, and also ran separate searches with quotation marks around each phrase. According to the advanced search pages at Google and Yahoo, quotation marks should return matches for the exact phrase searched for.

Using the quotation marks provides an “exact” search result at Google or Yahoo. Without the quotation marks, the search engine returns a “findall” or “find all” set of results. The difference between exact and findall search results is that in a findall search the query terms can appear anywhere on a page rather than together as a phrase (for instance, “I went running for my shoes”). Still, it’s interesting to compare exact and findall results, as well as singulars and plurals.
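To make the distinction concrete, here is a minimal Python sketch of the two matching modes. It is a toy model, not how Google or Yahoo actually match documents: it assumes simple lowercase tokenization, and the function names `findall_match` and `exact_match` are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def findall_match(query: str, document: str) -> bool:
    """True if every query term appears somewhere in the document, in any order."""
    doc_terms = set(tokenize(document))
    return all(term in doc_terms for term in tokenize(query))


def exact_match(query: str, document: str) -> bool:
    """True if the query terms appear consecutively and in order, as a phrase."""
    q = tokenize(query)
    d = tokenize(document)
    n = len(q)
    # Slide a window of the query's length across the document's tokens.
    return any(d[i:i + n] == q for i in range(len(d) - n + 1))


page = "I went running for my shoes"
print(findall_match("running shoes", page))  # True: both words are on the page
print(exact_match("running shoes", page))    # False: they never appear as a phrase
```

Note that under this kind of literal matching, an exact search for “running shoe” would not match a page that only says “running shoes”, which is one reason a search engine might want to consider both forms of a word.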

Here are the URLs that I received in my searches for running shoe, “running shoe”, running shoes and “running shoes” at Yahoo and Google.

Yahoo – running shoe,7122,s6-240-400-0-0,00.html,7122,s6-240-319-0-0,00.html

Yahoo – “running shoe”,7122,s6-240-400-0-0,00.html,7154,s6-240-325-329-0-0-0-0-0,00.html

Yahoo – running shoes,7122,s6-240-319-0-0,00.html,7122,s6-240-400-0-0,00.html

Yahoo – “running shoes”,7122,s6-240-319-0-0,00.html,7122,s6-240-400-0-0,00.html

Google – running shoe,7122,s6-240-400-0-0,00.html,7120,s6-240-400–12623-1-1X2X3X4X5X6-6,00.html

Google – “running shoe”,7122,s6-240-400-0-0,00.html

Google – running shoes,7122,s6-240-400-0-0,00.html

Google – “running shoes”,7122,s6-240-400-0-0,00.html

A new patent application from Yahoo explores how a search engine might handle the singular and plural versions of words entered as search queries, and convert those query terms to plural or non-plural forms to provide the most relevant results while also limiting how much computation a search engine has to do to return those results.

Word pluralization handling in query for web search
Invented by Fuchun Peng, Nawaaz Ahmed, Xin Li, and Yumao Lu
Assigned to Yahoo
US Patent Application 20080189262
Published August 7, 2008
Filed: February 1, 2007


Techniques for determining when and how to transform words in a query to its plural or non-plural form in order to provide the most relevant search results while minimizing computational overhead are provided.

A dictionary is generated based upon the words used in a specified number of previous most frequent search queries and comprises lists of transformations from plural to singular and singular to plural.

Unnecessary transformations are removed from the dictionary based upon language modeling. The word to transform is determined by finding the last non-stop re-writable word of the query.

The context of the transformed word is confirmed in the search documents and a version of the query is executed using both the original form of the word and the transformation of the word.
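The dictionary-building and pruning steps in the abstract can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not Yahoo’s method: the filing’s abstract doesn’t say how candidate singular/plural pairs are found, so `naive_singular()` uses crude English suffix rules, and in place of the patent’s language-model pruning it uses a simple frequency threshold (`min_count`). The function names, parameters, and query log are all hypothetical.

```python
from collections import Counter
from typing import Optional


def naive_singular(word: str) -> Optional[str]:
    """Guess a singular form with crude English suffix rules (hypothetical stand-in)."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"          # batteries -> battery
    if word.endswith("es") and word[:-2].endswith(("s", "x", "z", "ch", "sh")):
        return word[:-2]                # boxes -> box, glasses -> glass
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]                # shoes -> shoe
    return None                         # no plural suffix recognized


def build_dictionary(queries, top_n=1000, min_count=2):
    """Build plural->singular and singular->plural maps from a query log."""
    # Weight each word by how often the top_n most frequent queries were issued.
    vocab = Counter()
    for query, count in Counter(queries).most_common(top_n):
        for word in query.lower().split():
            vocab[word] += count
    plural_to_singular, singular_to_plural = {}, {}
    for word in vocab:
        singular = naive_singular(word)
        # Crude pruning in place of the patent's language modeling:
        # keep a pair only if both forms were searched at least min_count times.
        if singular and vocab[word] >= min_count and vocab[singular] >= min_count:
            plural_to_singular[word] = singular
            singular_to_plural[singular] = word
    return plural_to_singular, singular_to_plural


# A tiny, made-up query log for illustration.
log = (["running shoes"] * 5 + ["running shoe"] * 3 + ["nike shoes"] * 2
       + ["new york times"] * 4 + ["news"])
p2s, s2p = build_dictionary(log)
print(p2s)  # {'shoes': 'shoe'}
print(s2p)  # {'shoe': 'shoes'}
```

In this toy log, “shoes/shoe” survives because both forms are searched often, while “times/time” and “news/new” are dropped because one form of each pair is rarely or never searched. A real system would need something smarter, since a frequency test alone can’t tell that “news” isn’t the plural of “new”.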

The authors of the patent filing tell us that:

Up to 50% of queries directed to web search engines possess at least one term in the search query that may be transformed either from singular to plural form or plural to singular form.

However, among these 50% of queries, only 25% would benefit from pluralization or de-pluralization.

Taken together, those figures suggest that transforming a term helps at most roughly one query in eight overall (25% of up to 50%, or up to about 12.5%). For those queries, though, returning results for the singular or plural form of a word can be more relevant than returning results only for the version the searcher typed into the search box.

Determining when and how to transform an original query term to its plural or singular form is important to obtain the most relevant search results with minimal overhead.

1) First, a dictionary is generated, based upon the most frequent previous search queries.

2) When a query is received from the user (in this example, “running shoes”), the search engine determines which word in it to transform.

3) That determination is made by finding the head word, the last non-stop, rewritable word in the query. In this example, the head word is “shoes”.

4) The selected head word is looked up in the dictionary to find its transformed, non-plural form (“shoe”). The dictionary may not contain a transformation for every word, because transformations found not to be relevant may have been removed.

5) Finally, a version of the query is created that uses both the original form of the word and its transformation. This rewriting is invisible to the user, who sees only the query they originally submitted.
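Steps 2 through 5 can be sketched as a small query-rewriting function. This is a hypothetical Python illustration, not code from the patent filing: it assumes a word counts as “rewritable” when it has an entry in the transformation dictionary, uses a tiny made-up stop-word list and dictionary, expresses the combined query with an invented `(word OR alternate)` syntax, and skips the patent’s step of confirming the transformed word’s context in the search documents.

```python
# Hypothetical stop-word list; a real engine would use a much larger one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

# Hypothetical transformation dictionary, e.g. output of a dictionary-building step.
TRANSFORMS = {"shoes": "shoe", "shoe": "shoes", "boots": "boot", "boot": "boots"}


def find_head_word(terms, transforms):
    """Return the index of the last non-stop word with a dictionary entry, or None."""
    for i in range(len(terms) - 1, -1, -1):
        word = terms[i]
        # Assumption: "rewritable" means the word appears in the dictionary.
        if word not in STOP_WORDS and word in transforms:
            return i
    return None


def rewrite_query(query, transforms=TRANSFORMS):
    """Rewrite the head word so either form can match (invented OR syntax)."""
    terms = query.lower().split()
    i = find_head_word(terms, transforms)
    if i is not None:
        original, alternate = terms[i], transforms[terms[i]]
        # Search for either form; the user still sees only the query they typed.
        terms[i] = f"({original} OR {alternate})"
    return " ".join(terms)


print(rewrite_query("running shoes"))           # running (shoes OR shoe)
print(rewrite_query("shoes for running"))       # (shoes OR shoe) for running
print(rewrite_query("marathon training plan"))  # marathon training plan
```

Scanning backward from the end of the query mirrors the filing’s choice of the last non-stop rewritable word; in English noun phrases like “running shoes”, that last word is usually the noun the searcher cares about.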

The authors also collaborated on a paper titled Context Sensitive Stemming for Web Search (pdf), which takes a slightly different look at pluralization and other variations of words.


Author: Bill Slawski
