When you’re searching for something on the Web, does it matter whether you use the singular or plural version of a word in your search?
For example, let’s say that you are looking for a new pair of sneakers to go jogging in, and you want to find the right combination of comfort and support, so you decide to look into the best sneakers for running. Does it make a difference in search results when you type in running shoes or running shoe in a search box?
If a search engine just returned results to you based upon your choice of a singular or plural query, would you get the best results? Should a search engine explore both versions, and try to provide you with a mix of results based upon what it believes are the best results, after looking at results from the singular and plural queries?
A quick look at the top ten results at Yahoo and Google for both “running shoe” and “running shoes” (both searches without the quotation marks) showed some overlap in pages returned for singular and plural versions at each search engine, but the vast majority of search results seem to focus upon returning results for the plural version of the word, instead of the singular version.
So it does seem like both Yahoo and Google are looking at both singular and plural queries when someone enters one or the other.
Are the search engines performing queries for both the singular and plural versions, and showing a mix of the most relevant results? Are they adding more weight from one set of results over the other when they present those results? Are they looking at singular and plural versions of all words in a query (running or runnings and shoe or shoes), or somehow just picking out certain words to look at the plural and non-plural forms of when presenting search results?
I performed searches at Google and Yahoo with the singular and plural versions of shoe, and also separate searches for those terms with quotation marks around them, which the advanced search pages at Google and Yahoo tell us should return matches for the exact phrase searched for.
Using the quotation marks provides an “exact” search result at Google or Yahoo. Without the quotation marks, the search engine returns a “findall” or “find all” set of results. The difference between exact and findall search results is that the query terms in a findall search might appear on a page, but not as a phrase (for instance, “I went running for my shoes”). However, it’s still interesting to compare exact and findall results, as well as singulars and plurals.
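To make the exact versus findall distinction concrete, here is a minimal Python sketch. The function names (tokenize, matches_exact, matches_findall) are made up for illustration, and the matching is deliberately naive; it is not how Google or Yahoo actually match pages.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_exact(query, page_text):
    """True when the query terms appear in the page as one contiguous phrase."""
    terms, words = tokenize(query), tokenize(page_text)
    n = len(terms)
    return n > 0 and any(words[i:i + n] == terms for i in range(len(words) - n + 1))

def matches_findall(query, page_text):
    """True when every query term appears somewhere in the page, in any order."""
    words = set(tokenize(page_text))
    terms = tokenize(query)
    return bool(terms) and all(term in words for term in terms)

page = "I went running for my shoes"
print(matches_findall("running shoes", page))  # True: both terms are on the page
print(matches_exact("running shoes", page))    # False: the terms are not adjacent
```

A page like the example sentence satisfies the findall test but fails the exact test, which is why the two kinds of search can return different sets of URLs.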
Here are the URLs that I received in my searches for running shoe, “running shoe”, running shoes, and “running shoes” at Yahoo and Google.
Yahoo – running shoe
Yahoo – “running shoe”
Yahoo – running shoes
Yahoo – “running shoes”
Google – running shoe
Google – “running shoe”
Google – running shoes
Google – “running shoes”
A new patent application from Yahoo explores how a search engine might handle the singular and plural queries, and convert those query terms to plural or non-plural forms to provide the most relevant results while also limiting how much computation a search engine has to do to return those results.
Word pluralization handling in query for web search
Invented by Fuchun Peng, Nawaaz Ahmed, Xin Li, and Yumao Lu
Assigned to Yahoo
US Patent Application 20080189262
Published August 7, 2008
Filed: February 1, 2007
Techniques for determining when and how to transform words in a query to its plural or non-plural form to provide the most relevant search results while minimizing computational overhead are provided.
A dictionary is generated based upon the words used in a specified number of previous most frequent search queries and comprises lists of transformations from plural queries to singular queries and singular to plural.
Unnecessary transformations are removed from the dictionary based upon language modeling. The word to transform is determined by finding the last non-stop re-writable word of the query.
The context of the transformed word is confirmed in the search documents and a version of the query is executed using both the original form of the word and the transformation of the word.
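The dictionary step in that abstract might be sketched roughly as follows. This is only an illustration under loud assumptions: plurals are formed by appending an “s”, a pair is recorded only when both forms show up in the most frequent queries, and the keep argument is a hypothetical stand-in for the language-model pruning, which the abstract does not spell out. None of the names below come from the patent filing.

```python
from collections import Counter

def build_transform_dictionary(query_log, top_n, keep=lambda word, variant: True):
    """Build singular<->plural mappings from the top_n most frequent queries.

    Assumption: a plural is the singular plus "s", and a pair is recorded only
    when both forms occur in the frequent queries. `keep` is a hypothetical
    placeholder for the language-model pruning of unnecessary transformations.
    """
    frequent = [query for query, _ in Counter(query_log).most_common(top_n)]
    vocabulary = {word for query in frequent for word in query.lower().split()}
    to_plural, to_singular = {}, {}
    for word in sorted(vocabulary):
        plural = word + "s"
        if plural in vocabulary and keep(word, plural):
            to_plural[word] = plural
            to_singular[plural] = word
    return to_plural, to_singular

# A tiny made-up query log, for illustration only.
log = ["running shoes", "running shoes", "running shoe",
       "golf club", "golf clubs", "golf clubs"]
to_plural, to_singular = build_transform_dictionary(log, top_n=4)
print(to_plural)    # {'club': 'clubs', 'shoe': 'shoes'}
print(to_singular)  # {'clubs': 'club', 'shoes': 'shoe'}
```

Passing a stricter keep function (for example, one that rejects club/clubs because the two forms carry different senses) shows how a transformation could be dropped from the dictionary.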
The authors of the patent filing tell us that:
Up to 50% of queries directed to web search engines possess at least one term in the search query that may be transformed either from singular to plural form or plural to singular form.
However, among this 50% of queries, only 25% would benefit from pluralization or de-pluralization.
So, it seems that sometimes providing results that are singular or plural will provide more relevant results for a searcher than if the search engine had just returned results for the version that a searcher entered into a search box.
Determining when and how to transform an original query term to its plural or singular form is important to obtain the most relevant search results with minimal overhead.
1) First, a dictionary is generated, based upon the most frequent previous search queries.
2) Once a query is received from the user, in this example “running shoes”, a determination is made to find the particular word to transform.
3) That determination is made by finding the headword, and in this example, the headword is “shoes”.
4) The selected headword is examined in the dictionary to find the transformed non-plural form of the word. The dictionary may or may not contain the transformation because transformations may be removed if they are found not to be relevant.
5) Finally, a version of the query is created using the transformed word and the original form of the word. To the user, this transformation is not visible and only the original submitted query is observed.
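The steps above can be sketched in a few lines of Python. This is my own reading of the process, not code from the patent filing: the stop word list, the tiny TRANSFORMS dictionary, and the function names are all hypothetical, and the headword is taken to be the last word in the query that is not a stop word and has an entry in the dictionary.

```python
# Illustrative stop words only; a real list would be much longer.
STOP_WORDS = {"a", "an", "the", "for", "in", "of", "on", "to", "with"}

# Hypothetical dictionary of transformations that survived pruning.
TRANSFORMS = {"shoes": "shoe", "shoe": "shoes"}

def find_headword(query, transforms, stop_words=STOP_WORDS):
    """Return (index, word) for the last non-stop word with a transformation, or None."""
    words = query.lower().split()
    for index in range(len(words) - 1, -1, -1):
        word = words[index]
        if word not in stop_words and word in transforms:
            return index, word
    return None

def expand_query(query, transforms=TRANSFORMS):
    """Return the original query, plus a rewritten version when a transformation exists."""
    words = query.lower().split()
    head = find_headword(query, transforms)
    if head is None:
        return [" ".join(words)]  # nothing in the dictionary, so only the original runs
    index, word = head
    rewritten = words[:index] + [transforms[word]] + words[index + 1:]
    return [" ".join(words), " ".join(rewritten)]

print(expand_query("running shoes"))      # ['running shoes', 'running shoe']
print(expand_query("shoes for running"))  # ['shoes for running', 'shoe for running']
print(expand_query("trousers"))           # ['trousers']
```

Only one word per query is transformed in this sketch, which fits the goal of limiting how much extra computation the search engine has to do.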
The authors also collaborated on a paper titled Context-Sensitive Stemming for Web Search (pdf), and it provides a slightly different look at issues involving pluralization, and other variations of words.
48 thoughts on “How a Search Engine Might Handle Singular and Plural Queries”
I think that the natural singular/plural version may come into play in this instance. Shoe and shoes are very difficult to diagnose because you very rarely purchase or hear of a single shoe; it is more likely that someone would be referring to a pair. So the singular/plural version results for such a query might be (or possibly should be) weighted more on the action of the user after the query results are visible.
However, another singular/plural that might be easier to diagnose might be a plural version where the root word is adjusted when plural, like company and companies or index and indices. When these plural versions are used in queries, they more readily show the user’s intent. Company could be used in a search within a categorized data set, whereas companies could imply that the user is looking for a list of specific companies within a categorized data set.
From my personal experience, I have seen that the blending of singular and plural results appears to be returned on less competitive search phrases. When you submit a heavily fought after term, your results are likely to be different.
Thanks for the research, insight and the great find from Yahoo!
Thanks, Mr spoton
It is interesting that if you write something with a singular version of a word in a phrase in mind, and the search engine decides that the version with the plural usage of the word is more likely what a searcher wants to see, then it may include results with the plural version ahead of the singular version, even if the search query used the singular version.
Or, the same could happen where you use a singular version of a word in a phrase, and the search engine decides that results with the plural are better.
How much do those estimates or numbers of results for a query term tell us, when the search engine might show plural results for a singular search, or singular results for a plural version of the search?
You’re welcome, and thanks for sharing your observations and thoughts on singulars, plurals, and the blending of search results.
I chose the shoe/shoes example because it was used in the patent filing as an example. You’re right that using a word that changed more in its singular and plural forms may have been easier to diagnose, and see blending in action.
I thought that it was important to show that the simple addition or subtraction of an “s” in a query at a search engine could mean that you might receive results that were blended instead of based upon the version of the keyword that you decided to use in your search. Where the singular or plural version might more clearly indicate a searcher’s intent, I would hope that there would be less blending of results.
Interesting stuff though – compare results on a search for “matrix” and “matrices.” A search for matrix pulls up a lot of results for the movie series. A search for “matrices” shows a top result being the Wikipedia article on “Matrix.” Is it the most relevant result for “matrices,” or a blended result? It’s difficult to determine, but it’s helpful to know that a search engine might be performing that kind of blending.
I have looked into this quite a bit and your findings are spot-on. As an SEO it usually makes sense to optimize for the plurals as you should be able to get both the single and plural phrases to rank well. Typically, plural phrases will be much less competitive. For example, “cancer treatments” has 2,520,000 search results as opposed to 27,400,000 for “cancer treatment.”
I seem to be spending a lot of time lurking around this interesting site 🙂
Stemming has always been useful basically because of easier pattern matching, and a smaller index. And you know what? This crude approach works well for IR systems overall. There are exceptions to the stemming rules, like “scissors” for example, which shouldn’t be stemmed. But lo and behold... Google picked up on the band “the scissor sisters” (good work). “Trousers” however seems to have been left alone (good too).
This reminds me of the “meaningful stopwords” post because it’s the same idea. What to leave in and what to leave out. There was a time when everyone stemmed stuff because that’s the way it worked, same way everyone strips stopwords. Crude…but efficient. Maybe we’re being forced into a little more sophistication these days and rightly so.
I had to write a stemmer that stemmed to the correct spelling of the word, because it wasn’t an IR system but a conversational system. The way I built it, it couldn’t formulate an answer properly because, well, it had to be used as a full word in a response. Although for the IR side the old way worked fine.
I like your “matrix” example. With Matrix you get basically: Film-Film-Maths and with “Matrices” you get Maths-Maths-Maths-Maths….. Though notice that the Wikipedia result with “Matrix” in the title comes up before the Wikipedia result with “Matrice” in the title if you search for “Matrice”. And that the 2nd result for “matrice” is actually an article about a town in Italy (the first being Maths).
This is going to work nicely in a personalised system though, no? Anything that can determine context to your query is going to help. The keywords alone, as a unit of meaning, are no longer the most important thing, because they give way to a whole load of other concepts which are far more important, units spanning entire constructions even.
Google have been very fortunate (I find the harder you work the luckier you get) to have a large large large database and many many many users to collect data from and to test with. They can use pattern matching techniques to pre-empt what the user may be looking for, which would influence results.
I maintain that there would be a lot less of this kind of hassle if we simply used natural language systems because of the additional content, however they come with a whole other set of problems.
Thanks for your thoughtful comment. I had the “meaningful stopwords” post in mind when I was reading this patent, and saw what I thought were some similarities, too.
Determining the context of a query does seem like it’s more important than a strict matching of keywords. It’s interesting to see some of the approaches that are being taken.
Important information to know. Thanks
I always use singular to search, and always make an analysis of what people are searching for more in my keywords.
Love your posts 😀 very interesting
Interesting post on a subject that has interested me for some time. I agree with the comment that natural language systems would be interesting to investigate –
Another great post, Bill. I’ve always found the different result sets to be fairly amusing, especially when it comes to plurals where the spelling varies even more, such as company vs companies. Most people target the singular, whereas the plural drives more traffic; however, fitting in the word companies can end up being difficult when talking about an individual company without making the content look spammy.
Thanks for the analysis. I’ve always been curious about this issue.
In the past, I’ve taken Mr. spoton’s approach and gone for the plural when optimizing web pages’ title tags for the same reasons he cited in his comment above. Then to cover the bases, I’ve included the singular version of the word somewhere within the text content of the page.
If Google and Yahoo! haven’t already integrated searcher behavior into the mix, they will be soon. The search engines should be able to match the user’s query with the links they follow in the SERPs and use the data to better predict the majority of users’ “intent” when entering similar phrases.
Once they solve that, they need to address the issue of listing results for “Eric Cunningham” in the SERPs when I type in: Erik Cunningham.
Hi Daryl (SEO Canada),
I have seen writing get a little twisted and convoluted when someone tries to write content for a web page using a singular version where a plural version might fit better, or a plural version where a single version is more appropriate.
Looking at the differences between singular and plural search results can be pretty interesting and kind of fun. I think I’ve seen some of the same thing with some amusing looking results when the spellings of the singular and plural versions vary more than just the addition or subtraction of an “s.”
Thank you for your comment, and for a couple of interesting side issues.
I do think that searcher behavior is playing more and more of a role in how search engines consider and respond to the queries that they see people enter into a search box.
That kind of feedback does have the potential to provide context and information about intent to a search engine, but with caveats – not every searcher entering a term into a search box is looking for the same thing as other searchers, even if they have had a history of sharing similar interests and choosing similar search results with others.
It does look like Google is blending the “Erik Cunningham” results with the “Eric Cunningham” results. At some point, you may see that change, especially if a lot more “Erik Cunningham” results are indexed by the search engine.
Hi Happy Camper
We don’t know how much the researchers at Google might be doing with natural language processing, but they definitely are doing something with it. Google’s Director of Research, Peter Norvig, stated earlier this year in an interview:
Thank you. I’m not sure that I can say with any certainty that I search in only singular or plural. I know that I rarely will search both singular and plural versions of words in my search phrases. When doing research on words to use when writing, I do try to look at both singular and plural versions of words when appropriate.
I have always been interested in finding out just what the plural does do to results and these have really helped – I too have noticed differences but also many overlapping sites which show that plural keyword targeting will work better than singular.
Assuming this statement is true: “However, among these 50% of queries, only 25% would benefit from pluralization or de-pluralization.”
I think that not treating variations differently would be a mistake by the search engines.
That statistic is also repeated in the paper that I linked to at the end of the post, located at:
It appears to be from a log of 25 million queries, so that seems to be a meaningful amount of results to draw from.
I agree – the search engines ignoring different variations would be a mistake.
Thanks. It is an interesting topic. It’s funny how the context of a word can change considerably from singular version to plural, or based upon past, present, or future tenses, or in other ways that a word might be transformed.
I think it helps actually looking at the search results for each word, to see what kinds of pages and meanings behind the pages appear, before just choosing to target the singular or plural version of a word in what you write and create.
This is a quandary that we encounter all the time as an SEO services company. Which form do you go after from an optimization standpoint when you are building targeted anchor text within backlinks for SEO? With high variations in word types, this is critical to get right!
Honestly, I think that variation is as much a component of SERP penetration as plural planning. Using “Customized Wedding Invitations” and “Elegant Wedding Invitations” as much as “Wedding Invitations” as backlinks.
Conducting the research within the sector, and targeting long tail phrase themes, is much better in my mind than just the singular and plural versions of your main target.
Thank you. Some very good points.
Paying attention to long tail phrases in the right circumstances can be extremely helpful. I agree completely. Another area that can supply some odd blending of search results are compound words, that might be written as one or two words (icecream vs ice cream, for instance).
This is quite an interesting discussion. But of late we are hearing about concepts such as semantic search. Can someone explain how this plural issue fits into a semantic search? What happens to plurals that don’t end with an s when they are converted to a singular? I was made to believe that semantic search is something more than plain vanilla algorithms. If so, is this plural issue of any relevance? Sorry for my ignorance.
Thanks. I think asking about semantic search and plurals is a really good question.
There’s a nice article about semantic search here:
Instead of matching words in queries with pages on the Web, a semantic search attempts to match the meaning of a query with pages where the concepts behind the query are discussed.
The singular or plural versions of a word may have meanings associated with each which are very different from the other.
It is also interesting to consider what happens with “putative plurals”. Words that change sense when you obey normal pluralisation rules. I’ve got a few examples:
* car cover ( insurance cover) versus car covers (weatherproofing)
* golf club (social environment) versus golf clubs (instruments of evil or divine revenge)
Optimising for each form is important, and stemming would cause mismatched intent problems…
I note that AdWords matching rules, using Broad Match, seem somewhat weaker. The latest announcement about Quality Score changes may affect that.
Thanks very much for bringing in the paid search perspective. It’s going to be interesting to see what impact the quality score changes might bring.
Different word senses, like those you might find at WordNet, are worth keeping in mind when creating content for web pages. The meaning of a word or phrase can sometimes change substantially when you make one or more of the words plural, as your example shows so well.
Making sense of singulars and plurals as query terms involves a lot more than just adding, changing or removing a few letters from a word. That is the kind of challenge that really makes this all interesting, though.
There are both pros and cons to those algos
Some plurals will turn up more directory listings than their singular counterparts.
This may be information overload for some searchers.
So, for example, doing a search for a service and adding Services, Firms, Companies etc to the query, may lead the searcher to a listings directory which would be more time consuming than the organic SERPs to view.
One saving grace is that if a city or state is added to the search term – it may bring up the local listing before the SERPs.
It is a tough call, since many searchers searching in the singular do in fact mean plural.
Hi New York,
Those are excellent points. The difficulty is that the search engines are trying to identify context of a search, and a searcher’s intent from a very small number of words in a query.
They can try to do that by looking at data involving previous searches and search sessions for the same words, and seeing how people might have refined their searches, or selected pages from search results, whether or not they stayed at those pages selected, and for how long, and other user behavior information. The search engine can try to blend together singular and plural versions of words from a query in the search results that they show, and change the results shown based upon what pages people actually select, and seem to spend time viewing and using.
A search for plurals of terms like services, firms, and companies does seem like it might evidence a desire on the part of a searcher to see a site that provides the ability to see a number of choices – but it’s difficult to just make an assumption like that – it’s better to test, and see if that is what people actually select.
I suppose it depends on what you’re searching for, whether it will be done in the singular or plural form.
But I have to say that having results for both shown would be the best.
Who searches for both the singular and plural?
I’m not sure that it’s always easy to guess as a searcher which version might provide the better results. Having the search engine blend them, when it seems appropriate, might be a good idea.
It’s not unusual for people to change their query terms when they are searching and are dissatisfied with the search results that they see, but like you, I don’t know if they would consider changing a singular to a plural in their search phrases.
IME, from both PPC and web analytics perspectives, searchers tend to be conservative about search terms. One thing that paid search excels at is seeing impressions. That means we can track repeat use of low frequency search queries. A common pattern is to have a keyword that attracts few impressions – averaging less than one per day. When it does get impressions, they tend to batch. Typically up to eight searches, in a period of less than an hour, and then it all goes quiet for days or even weeks. Google/Yahoo/etc don’t pass on the stats so I can’t be absolutely 100% positive that this is all the same person. But it is suggestive.
Additionally, web server log file analysis for some clients has shown that some searchers use the exact same search string over a period of hours, returning to the site again and again. The record being one person who used a $9/click advert to revisit the clients’ site seven times over five hours, before deciding to make a web inquiry, all using exactly the same search. We’ve picked up similar patterns in organic – but there was real interest in finding out whether this behaviour was a symptom of PPC wickedness (click fraud) 😉
Oh, and one other thing that paid search can do – you can tell whether it was a fresh search, or a page that the user keeps returning to. Google’s advert can be given a unique ID for each impression – the “gclid”. So you can see whether it is the same advert, or a new impression. 😉 In the “extreme repeat” sequences we investigated, users were performing searches again, rather than reusing existing results pages.
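For anyone who wants to try this kind of check on their own log files, here is a rough Python sketch. It assumes the gclid shows up as a query parameter on the landing page URLs recorded in the server log, and that each distinct value marks a separate impression as described above. The function names, the example.com URLs, and the gclid values are made up for illustration.

```python
from urllib.parse import urlparse, parse_qs

def gclid_of(url):
    """Return the gclid query parameter of a landing page URL, or None if absent."""
    values = parse_qs(urlparse(url).query).get("gclid")
    return values[0] if values else None

def count_fresh_searches(landing_urls):
    """Count distinct gclid values, each taken here to mark a separate impression."""
    return len({gclid for gclid in map(gclid_of, landing_urls) if gclid})

# Made-up log entries for a single visitor.
visits = [
    "https://www.example.com/inquiry?gclid=AAA111",
    "https://www.example.com/inquiry?gclid=AAA111",  # same results page reused
    "https://www.example.com/inquiry?gclid=BBB222",  # search performed again
    "https://www.example.com/inquiry",               # not from an advert
]
print(count_fresh_searches(visits))  # 2
```

If the count is close to the number of visits, the visitor was repeating the search rather than going back to an old results page.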
In the context of this thread, it shows that users appear less likely to switch between singular and plural forms *intentionally*. If they do switch, it is probably because they forgot what they originally searched for. So search engine query stemming is probably useful.
Some very interesting experiences and observations. Thanks for sharing them. The intent behind finding out why some people will perform the same search and return to a page can be hard to gauge. I found this Microsoft paper interesting, which explains some reasons why people might revisit pages:
Large Scale Analysis of Web Revisitation Patterns
Some of the examples they cite early on in the paper on why people might return to pages, either over a short period of time, or over longer time periods:
1. To monitor changing content contained on a page
2. To use a site as a hub to navigate to linked pages
3. To check for sales at shopping sites
4. To find a new product, such as a book or DVD/CD
They go into more detail on different patterns for revisiting pages, and provide some interesting results.
The study covered five weeks of Web interaction collected from user logs taken from the Microsoft toolbar, with the permission of the people browsing pages, and it included 612,000 users. I wonder if they have considered looking at search behavior and advertising traffic in the context of revisits, with attention paid to the queries used over that time.
This is something I’ve dreamed of for a long time, many other people too I am sure. The problem is that the relationship of singular/plural varies widely by keyword. For instance, “digital cameras” might be more likely to imply a search for retailers, vs. “digital camera” which might capture that but probably skew a bit to informational resources, reviews, etc. as well.
Ditto something like “office supplies” which is pretty obvious, but “office supply” might imply companies specialized in office supply, and reflect a different intent.
Of course, personalized search could probably make a dent in all this. Surely a new day is coming.
Given the observations for exact and findall in the original article, and the way that Google uses Broad match (versus the way that Yahoo uses Advanced Match), I think this patent may have more application to paid search than to organic.
Part of what sustains the value of paid search on Google, other than the sheer mass of advertisers, is that Broad Match is the default. It brings more advertisers to the auction if you can stretch to more search queries for each broad matched keyword. Yahoo certainly used to do stemming (the car cover/car covers problem was developed as a publicly accessible test for a client observed problem where I didn’t want to give away the details, before Panama) – but I haven’t spent much time investigating other pluralization inclusion and exclusion mechanisms on Yahoo. Since Yahoo took some time to convert to the generalized second price auction, this patent may mark the next refinement; realization that the new auction requires new revenue maximization techniques, such as recruiting more advertisers to each auction. Effective pluralisation techniques do that.
It is interesting how making a word singular or plural may indicate that a query might be more informational or transactional in nature. Intent is difficult to gauge on the basis of a few words. Will personalization make a difference? It might, but it’s making a future prediction on the basis of past actions, which may not be helpful in many cases.
The patent itself is geared towards web search, but the topic is one that is just as important to paid search.
The exact and findall language is something that I added to the post to get readers to think about ways to look at and think about how a search engine works.
It definitely does hold some interesting implications for paid search.
With your info in mind, it still seems to me that you might be better off focusing the search engine optimization of your site on singular terms (assuming that based on good keyword research they are searched for more frequently) so that you are highly ranked by the search engines for singular terms, and allow their blending techniques to take care of getting you in front of people who use the plural keywords. I would think that optimizing your site for both singular and plural keywords might water down your optimization for both sets, thereby weakening your overall optimization. Of course, for non-competitive search terms, it might not matter, but for terms like “plastic surgeon” vs “plastic surgeons”, couldn’t it make or break you? I guess, ultimately, the jury’s still out on this one.
Hi El Juano,
Thanks for your thoughts on this topic. Seeing search engines blend results like this means that we are moving away from a strict matching of keywords in search results for the terms in queries.
I’d definitely recommend looking at the search results that show up for both singular and plural versions of terms to see how differently a search engine might be treating the results for both, and to see how much blending might be going on, as well as trying to understand which version people might actually be searching for.
I suspect that Google’s new query suggestions will override almost everything that has been said here.
Do you all think that Google Suggest will change the face of SEO and how we go about doing SEO?
Good question. While this patent filing is from Yahoo, the idea that Google might present results that could blend together singular and plural versions of words is something that we can’t rule out, even if they do it in Google Suggest.
The possibility does exist that Google might perform a query expansion before returning results. I wrote a brief post last year about how Google might perform multi-staged query processing which might have them return results for variations of terms based upon their stems, as well as terms that might be synonyms and other terms that might commonly appear with the original query terms on many pages on the Web. We don’t know if the patent filing behind that kind of multi-stage query processing is something Google is presently using, but it is a possibility.
The predictive search results that we see in dropdowns under the search box have been around for at least a couple of years, and have been used and tested in the Google toolbar. A Google patent application from 2005 details how some of the terms that appear in the dropdowns end up there. I do think that it’s worth looking at and paying attention to what appears in that dropdown list as you start typing in a query term, and it has the potential to bring change to the way that SEO is performed.
Are returned results based upon the statistic that the users input a singular or plural search term?
The results that are shown would be based upon a statistical language model. Take a look at the document that I linked to at the end of my post – “Context Sensitive Stemming for Web Search.” It explains how that language model might work in much more depth.
It has been a while since I’ve visited this site. I generally find that a plural that simply adds an “s” to the word being entered into the search engine (primarily Google in my instance – they really dominate the South African market, to the tune of 95%+ !!!!) will return similar(ish) results, whilst plurals that significantly change the spelling of the main keyword being searched for will progressively return less relevant results.
That’s my ten cents worth from the African wilderness!
Catch you later!
Good to see you.
That’s an interesting observation.
I can understand how some plurals could, and possibly should, change the results that you see. For instance, if you are searching for a “lawyer,” then the results should show individual lawyer pages. If you search for “lawyers,” then I wouldn’t be surprised if you see more legal directory type results. But that is an example where just an “s” has been added. I’ll have to do some testing of what you’ve seen.
Excellent post. When using the Google AdWords keyword tool I’ve found that the words they pluralize have the same search statistics (versions with and without “s”). Words Google does not pluralize have different search statistics.
Thank you. That’s a very interesting observation. I appreciate your sharing it. It’s definitely something I want to check out.
Mr. Slawski I thought this old post needs some love so here are my 2 cents.
A point that escapes many people is cultural bias. People from different cultures do their search differently. Some of my colleagues were from South East Asian countries like China, Thailand, Malaysia and Indonesia. I was teaching them basic on-page SEO and when it would come to entering keywords, they would enter them as singular. For example, “motherboard with ISA slot” whereas I would be expecting them to enter “motherboards with ISA slots”. First I thought maybe it was an anomaly with one guy but it was the same with all of them. Their thought process worked in their language first and produced singular results. They simply translated it into English as singular. Over time, I started to understand and respect that.
It opened up a whole new issue though as to how our target audience is going to conduct its search. If we are selling merchandise to South East Asian communities, our pages should be heavy on singular keywords. If we are targeting people whose first language is English, then we better shift our weight to plural.
Those are good points, and it is important to understand how cultural bias might impact both search and web design. There’s a great case study called Metaphors and Website Design: A Cross-Cultural Case Study of the Tide.com Stain Detective (pdf) that gets that point across very well. Tide decided that they should try a metaphor used in a very popular US version of their website in India. The website showed different rooms of a house, and the typical stains associated with that room. So, the stains from a kitchen might be very different than the stains from a nursery or a home office. From the paper:
As for singular and plural terms, Google will often include results for both in a query that includes one version or the other, when it thinks that it is appropriate. I would guess that they might not take the cultural difference that you point out into consideration though, and I wouldn’t rely upon them to do that.