Some words you might search for at a search engine have more than one meaning and are known as polysemous words. For example, the word fencing can mean a sport involving swords, an artificial barrier enclosing an area, or the activity of profiting from illegally obtained goods. Phrases that can have two or more meanings are sometimes called polysemous as well.
Polysemous words can pose challenges for:
- Search engines – trying to identify the intent behind searches.
- Searchers – seeing results unrelated to what they were trying to find.
- Site owners – finding their pages in search results surrounded by sites offering something very different from what they offer.
- Advertisers – bidding on certain words or phrases for sponsored results that may be shown to searchers who have absolutely no interest in those ads.
If someone enters the word [fencing] into a search engine, the search results they see will likely be filled with pages related to all of the different meanings of the word such as electric fences, local search maps for fencing companies, Olympic moments relived at the United States Fencing Association web site, the Wikipedia entry on Fence (criminal), and others.
The chances are that the person searching was interested in only one of those meanings: the sport, the barrier, or the criminal activity.
How would you solve this problem if you were creating the algorithms behind how a search engine worked?
The Problem with Query Refinement Suggestions
One solution might be for a search engine to show query suggestions along with search results, adding more words to the original query based upon previous queries from other searchers.
A search engine could look at its query logs to see how often searchers made changes to their queries and associate those follow-up queries with the original ones. For example, many searchers, faced with a mix of results like those for fencing, might change their original query by adding a word or words that make it more likely that the search results will be relevant to what they intended to find.
So, someone searching for information about the sport of fencing might add a word like “epee” to the search. Epee is a kind of sword that a fencer would use when they compete against someone else. If enough people refine their searches that way, then a search engine might start showing [fencing epee] as a link within search results for a search for [fencing].
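The query log idea above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `sessions` data, the `suggest_refinements` function, and the `min_count` threshold are all hypothetical, and a real search engine would work with far more data and far more signals.

```python
from collections import Counter


def suggest_refinements(sessions, query, min_count=2, top_n=5):
    """Return the most common follow-up queries that extend `query`.

    `sessions` is a list of query lists, each in the order one searcher
    typed them (a hypothetical stand-in for a real query log).
    """
    counts = Counter()
    for queries in sessions:
        # Look at each query and the one typed immediately after it.
        for first, second in zip(queries, queries[1:]):
            # Count only follow-ups that keep the original word and add more.
            if first == query and second != query and query in second.split():
                counts[second] += 1
    # Drop refinements that too few searchers made (assumed threshold).
    return [q for q, n in counts.most_common(top_n) if n >= min_count]


# Made-up sessions for illustration only.
sessions = [
    ["fencing", "fencing epee"],
    ["fencing", "fencing epee"],
    ["fencing", "wire fencing"],
    ["fencing", "wire fencing"],
    ["fencing", "wire fencing"],
    ["fencing", "stolen goods"],
]
print(suggest_refinements(sessions, "fencing"))
```

The `min_count` threshold is where the data problem shows up: a refinement only appears once enough searchers have made it.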
So, what’s the problem? It could take months for a search engine to collect enough data to decide which additional query refinements to show. And it’s possible that a search engine might only show query refinements for one type of meaning for a word. For example, the suggested query refinements presently at Google on a search for [fencing] are all related to the barrier meaning except for the first one listed:
- history of fencing
- wire fencing
- aluminum fencing
- wood fence
- yard fence
- privacy fence
- chain link fence
- building a fence
Using Related Words to Understand Meaning
Another approach might be to look at documents on the Web where polysemous words appear and look at the other words near them to find words or phrases related to the different meanings of those polysemous words.
For example, one meaning of the word “Saturn” is the name of a planet, and it may be possible for a search engine to figure out that is how the word is being used if it appears near words such as “Earth,” and “Mars,” and “Jupiter” and “Solar System.”
Another meaning for the word “Saturn” is the make of a car, and a search engine might glean that meaning if it sees words such as “Ford,” or “Lincoln,” or “Mercury,” or “automobile,” or “engine.”
Distinguishing between the different contexts of a word with more than one meaning is a significant undertaking, and it matters because a search for Saturn, the car, is a different search than one for Saturn, the planet.
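To make the Saturn example concrete, here is a minimal Python sketch that guesses a meaning from the words that appear nearby. The cue word lists and the `guess_sense` function are my own hypothetical choices, not anything from a search engine. I left “Mercury” out of both lists on purpose, since it is a planet and a make of car at the same time.

```python
import re

# Hypothetical cue words for each meaning; a real system would learn
# these from documents rather than use a hand-written list.
SENSE_CUES = {
    "planet": {"earth", "mars", "jupiter", "solar", "orbit"},
    "car": {"ford", "lincoln", "automobile", "engine", "dealer"},
}


def guess_sense(text, cues=SENSE_CUES):
    """Pick the meaning whose cue words overlap most with `text`."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {sense: len(words & vocab) for sense, vocab in cues.items()}
    best = max(scores, key=scores.get)
    # With no cue words at all, the context gives no evidence either way.
    return best if scores[best] > 0 else None


print(guess_sense("Saturn is farther from the Sun than Jupiter and Mars"))
print(guess_sense("The Saturn dealer replaced the engine"))
```

A bare query of [Saturn] returns `None` here, which is exactly the situation a search engine faces when a polysemous word arrives with no context.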
A patent application from Yahoo describes a process that could be used to determine if a word or phrase has more than one meaning based upon the words that tend to co-occur in documents with that word.
Method and Apparatus for Discovering and Classifying Polysemous Word Instances in Web Documents
Invented by Richard Michael King
US Patent Application 20090157648
Assigned to Yahoo
Published June 18, 2009
Filed December 14, 2007
Abstract
A method and apparatus for discovering polysemous words and classifying polysemous words found in web documents. All document corpora in any natural language have multiple usage contexts or words with multiple meanings.
Semantic analysis is not feasible for classifying all word occurrences in all documents on the web, containing trillions of words in total. Besides, semantic analysis typically cannot distinguish multiple usages of a given meaning of a given word.
In one embodiment of this invention, polysemous words in natural languages can be discovered by analyzing the co-occurrence of other words with the polysemous word in web documents. In one embodiment, the multiple meanings and usages of a polysemous word can be determined by analyzing the co-occurrences of other words with the polysemous word. In one embodiment, counting over-correlations is achieved probabilistically to minimize the use of network bandwidth.
The patent filing provides details on how they might count the frequency of words that appear in different documents on the Web, how often those words appear with other words, and how infrequently they may appear.
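As a rough illustration of that kind of counting, the Python sketch below tallies how many documents contain each word, how many contain it together with a target word, and then compares the two rates. The ratio used here (often called lift) is a common association measure that I am swapping in for illustration. It is not the formula from the patent filing, and the `docs` list and function names are made up.

```python
from collections import Counter
import re


def tokenize(text):
    """Return the set of lowercase words in a document."""
    return set(re.findall(r"[a-z]+", text.lower()))


def cooccurrence_lift(docs, target):
    """Compare how often each word appears alongside `target` with how
    often it appears overall. Values above 1 mean the word shows up with
    the target more often than its overall frequency would suggest."""
    doc_sets = [tokenize(d) for d in docs]
    n_docs = len(doc_sets)
    word_df = Counter()      # number of documents containing each word
    with_target = Counter()  # documents containing the word and the target
    n_target = 0
    for words in doc_sets:
        word_df.update(words)
        if target in words:
            n_target += 1
            with_target.update(words - {target})
    if n_target == 0:
        return {}
    return {
        w: (c / n_target) / (word_df[w] / n_docs)
        for w, c in with_target.items()
    }


# A tiny made-up corpus for illustration only.
docs = [
    "Saturn and Jupiter are planets",
    "Saturn has rings and orbits the Sun",
    "The Saturn dealer sold a car with a new engine",
    "Jupiter is the largest planet",
    "Ford builds a car with a large engine",
    "The weather was warm and sunny",
]
lift = cooccurrence_lift(docs, "saturn")
for word in sorted(lift, key=lift.get, reverse=True)[:5]:
    print(word, round(lift[word], 2))
```

With only six documents the numbers mean very little. The point is just that words tied to one meaning, such as “rings” or “dealer,” stand out from words like “the” that appear everywhere.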
There are some interesting twists to this process. One of them involves breaking down the words found within documents into smaller blocks. An example given in the patent application is of blocks of 200 words. There are a few reasons for breaking down a document into smaller, overlapping blocks.
One is that some pages may be very long, and relating all of the words found on those long pages may create relationships between words that aren’t actually related.
Another is that a page from a blog may contain many different entries that aren’t related, and if a whole page of different blog entries was analyzed, that might provide results that aren’t very useful.
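Splitting a page into overlapping blocks might look something like the Python sketch below. The 200-word block size comes from the example in the patent application. The 100-word step, which makes each block overlap half of the previous one, is purely my assumption, as is the `overlapping_blocks` name.

```python
def overlapping_blocks(text, block_size=200, step=100):
    """Split `text` into word blocks of up to `block_size` words, starting
    a new block every `step` words so that neighboring blocks overlap.

    `step` should be no larger than `block_size`, or words get skipped.
    """
    words = text.split()
    if len(words) <= block_size:
        return [words] if words else []
    blocks = []
    for start in range(0, len(words), step):
        blocks.append(words[start:start + block_size])
        # Stop once a block has reached the end of the document.
        if start + block_size >= len(words):
            break
    return blocks


# A made-up 450-word page for illustration.
page = " ".join(f"word{i}" for i in range(450))
print([len(block) for block in overlapping_blocks(page)])
```

Co-occurrence counts could then be taken per block rather than per page, so that two unrelated blog entries on the same long page don’t lend each other their vocabulary.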
Conclusion
I’ve kept my look at the process behind the patent to a fairly simple overview. The patent application goes into much more depth on how it might distinguish between different meanings of words based upon how frequently those words appear near related words on pages upon pages upon pages of the Web.
Searchers, web publishers, and advertisers should keep in mind that there are many polysemous words, words with more than one meaning, when they perform searches, create content for web pages, or choose keywords to advertise with.
It can be helpful to think about words that might be related to those multi-meaning or polysemous words when searching or writing content for pages or using a keyword-based advertising system. Yahoo shows us with this patent filing that they may start taking advantage of such relationships between words.
Last updated June 6, 2019
I’m curious to see if a single search term DOES evolve into a “Did you mean?” with a list of search phrases appropriate for the polysemous meanings. It’s too difficult and even irrelevant to rank for just fencing. But if all of the fencing traffic related to the sport is altered by the search engine to “sword fencing,” does that mean that the top ranking site for the term “sword fencing” stands to gain a significant boost in traffic?
The rich will end up getting richer, so to speak, if the short tail search results are compounded by the root polysemous term. Does that make any sense?
Hi jlbraaten,
Some very good questions. The patent filing described how the search engine might identify different meanings of polysemous words and phrases, but didn’t take the next step and describe how that understanding might be used to present search results to searchers who might be interested in the different meanings.
Like you, I don’t think that the ideal solution is to present a set of “did you mean” query refinements that boost results by adding related keywords, but I wouldn’t mind seeing results that may still be limited to the original query term, but focusing upon the different meanings.
Another way to show results for words with different meanings might be to segment the results, with each segment providing a broad category, a couple of example results, and a link to “more results like this.” So a search for fencing might show a sport category segment, a building materials segment, and a criminal acts segment.
Providing links to different categories across the tops of search results for polysemous words might be another approach.
If Yahoo did develop this process, just what would be the ideal way for them to use it in their search results?
Oddly I just found myself searching for ‘fencing’ on Google. Looking for some nice pre-made stuff and the first result was for the ‘waving pointy sticks’ around type and then found this site for an utterly unrelated reason a few minutes after.
Understanding the different contexts of a word in a document is achievable, but understanding the intent of the searcher is another matter – in any case, surely this is already happening (at least on Google) where a polysemous (good word!) term is used with no other context and the SERP is subdivided by the different meanings (although I can’t find a query that shows that right now).
Dealing with polysemy is always a difficult question. In professional patent searching, this issue is often resolved when skilled searchers limit results with a patent classification, or search in a database with a controlled vocabulary index. Of course, that’s just too much to ask of a casual search engine user.
It’s too bad that website owners can’t use some kind of widely used industry classification code, such as NAICS (although it’s limited to North America!), to tell search engines what subject area they deal with. Either that or they could specify codes which would be appropriate for advertisers to have. That might help reduce the number of unrelated advertisements showing up on the page. (Maybe a system like this is already in place – I’m not totally sure!)
Hi Bob,
Funny coincidence. 🙂 Understanding the intent of a searcher is a difficult task. Google seems to be trying a number of approaches to handle that better, including a number of different ways to personalize search. I’m not sure that I’ve seen Google subdivide or segment search results into different meanings, but I do think it’s possible that they may try to show some diversity in results, so that if we have a term like “fencing” as a query, they might try to show results in the top ten that include the three different meanings.
Hi Kristin,
I think one of the difficulties behind using some kind of classification system on the web that allows website owners to use a classification code is that there are so many different sites, with different owners who may not quite grasp the classification scheme, or whose sites may bridge more than one classification, making it difficult to choose which to use. Some web site owners may also provide classifications that just don’t match their pages, for one reason or another. Classification can be a lot easier when it’s limited to a single database, with someone who is knowledgeable about the use of classification systems.
Hi Pat,
That’s a great example – thank you. It really can be a challenge when one of the most important terms for your site shows up in search engines under a different meaning.
Looking at the search query suggestions on Google, it doesn’t seem like they account for different meanings for the term very well, or different intents behind the search, such as looking for a manufacturer of salsa or recipes. It might be more helpful if they did.
We have worked on a recent project that dealt with this problem. The company is a salsa manufacturer, yet the vast majority of searches for ‘salsa’ were dancing related.
We were very liberal with our modifiers: gourmet, fresh, homemade etc. While we were not worried that the site may show in salsa dancing serps, we wanted to make sure that the ‘theme’ was evident.
From the user perspective, I would love to see a quality “Did You Mean?” drill down.
Thank you for coming up with this post. It’s easy to get lost with the evolution of words these days. And for keywords that you think have more than one meaning, it’s always best to pair them with something that would lead directly to your site.
Hi Vince,
You’re welcome. I agree with you. Attempting to rank well for a single word can be difficult, and if the word has multiple meanings, it’s possible that when people search for it, a good percentage may not be interested in your page anyway even if you do rank well for that word.
makes me wonder … is this how google changed the rankings for brand names?
Hi Michiel,
That’s an interesting question. I believe that Matt Cutts responded in a video to a question a couple of months ago about changes to search results that seemed to favor brands more in those results with a statement that if they did, it wasn’t intentional. He said that they had made another change (one of three or four hundred a year), and if brands now tended to rank better, that was a totally unintended side effect. He mentioned that they look for things like trust, authority, PageRank, high quality. The change that they made could be the result of changes like the one that I described in this post:
Boosting Brands, Businesses, and Other Entities: How a Search Engine Might Assume a Query Implies a Site Search
Though it’s possible that some other change created the “brand” effect that you’re writing about. The only clue that Matt really gave us about that specific change was that it was referred to as “Vince’s change” named after a person at Google nicknamed “Vince” who did a lot of work on it.
Tell you what, I live and work in a small town called Bath here in the UK, and getting to find any decent information on the web in certain circumstances is a nightmare. Try ‘designer bath’, ‘house for sale bath’, or ‘hotel bath’. Sometimes it works better than others.
Great article, thanks, Matt.
Although it does present a problem – everyone is in the same boat – it might be irritating for your company, but it’ll affect the competition in exactly the same way.
If a user searches for “surfing” then they might only find references to “surfing the web” rather than the actual sport – however as none of the results were relevant, they’ll narrow down the search until they see the results they need – as long as you use a few keyword tools to find relevant and related searches based upon a keyword, then you can’t go far wrong IMO.
Hi Matt,
That does sound like it has the potential to be pretty confusing. Place names potentially have their own sets of problems when it comes to showing relevant search results, but with a place name that has a number of other meanings, I imagine searches could be tough – even local searches, when the “where” of the search can easily be confused with what you are searching for.
Hi Techdesigns,
Good points. I do wonder about searches by people who may not know too much about the topic they are searching for information about, so that it can be difficult for them to refine their queries in a meaningful manner. Some kind of segmentation of results, or intelligent query refinements that take into account different potential meanings for polysemous words may be helpful in that case.
A very interesting topic I’ve never thought about before, but I don’t think it’s as big a problem as you think.
The research shows that people who are searching will continually refine their search terms until they find what they are looking for.
So if I typed ‘fencing’ and got a whole bunch of sites not related to the fencing I was looking for, I will go back and refine my terms. I will keep refining my terms until the first few results are relevant to what I want.
The users of search engines themselves have evolved sophistication to make this problem not so major. I see it only of impact to people new to using search.
That being said, I expect to see a ‘Did You Mean’ function on major search engines to distinguish between words that possess different meanings within 2 years.
Regards,
OZ
Hi Oz,
Thanks. Awareness of a potential problem, and of the approaches that a search engine might try to use to address that problem, is worth having.
I’ve seen a fair amount of research on how people search, and refine the queries that they use. I think it can be frustrating for searchers who might not know much about a topic, and aren’t sure of the right terms to try to use to refine their queries. Rather than spending time trying to find the right combination of magical words to return the results that they are hoping to find, many may just give up.
We are seeing more “did you mean” type query suggestions and refinements showing up in search results – when I look at those, I’m seeing that they often don’t capture a variety of different meanings that a searcher might be interested in. A very recent couple of patent filings that I haven’t had a chance to write about yet cover some of that area.
Great post Bill. This is a question I’ve been asking myself before like how search engines can know what’s the user real intention when making a search. It does apply to certain keywords which can mean different things.
An example is I have a website about “weight training”. I wrote an article about “building arms” and it can happen that a person is making a search for “building arms” but in reality he wants to know how to create weapons, so he won’t get too many relevant results. Why I say this is because if you do a search for “building arms” on Google, you will see more results about building arms(related to body) plus few results for arms as weapons in the front page. So the user needs to browse for the relevant listings.
Hi ZineGuru,
Thank you. Good question. I look at it this way – knowing that a word or phrase might contain multiple meanings is something that it’s helpful to consider before making an effort to optimize a page for that term or phrase. It should be part of the keyword analysis that you perform to begin with. It is essential to look and see what kinds of pages and topics associated with your keywords show up in search results, in addition to looking at things like search volumes and how competitive a term might be.
It doesn’t hurt to go places online where your audience or audiences might hold conversations, such as forums and blogs, and get an idea of the language that they actually use when they discuss topics like weight training and building stronger, better defined, and more toned arms. Some choices might be things like “building bigger biceps,” or “developing triceps” or “bigger arms.”
Your term “building arms” might be a good choice if it’s something that your audience would use when they discuss weight training. Since most of the results that you see tend to involve weight training, that’s not a bad sign. I’d still look around to see if that’s something that people discussing that particular topic choose to use, and how good of a choice it might be compared to phrases that might be a little less ambiguous such as “bigger biceps.”
It isn’t necessarily a bad idea to compete for terms that might have more than one meaning. But it doesn’t hurt to know that if you end up ranking well for that term that some of the searchers who may search using those phrases are going to be looking for something that has nothing to do with what you offer.
Also, when I look in Google’s keyword suggestion tool for “building arms” and I see a very large search volume, I don’t know if the people searching for “building arms” are looking for pages about weight training or about guns and munitions. So, if you’re relying on a keyword suggestion tool like Google’s, keep that in mind when you do your keyword research – the numbers associated with that search volume may not be giving you a good indication of how many people are actually searching for the words as you mean them, in the weight training sense. It’s more likely that a search volume number associated with “bigger biceps” is a better indication of people who are interested in weight training. That understanding should weigh into your choice of keywords to optimize for.
Thanks for your insight Bill. I know what you mean and it totally makes sense. Indeed double meaning keywords need more analysis. I have written an article about arms related to arms muscles but I also wrote others including biceps and triceps separately to be sure I get all of the most relevant traffic. Thanks for pointing it out though.
I believe the arms article can get two types of traffic, although right now I see only one type, which is people looking to build muscles in the arms. The title of the article also includes muscles just after arms, and I noticed that people who are looking for this information include muscles as part of the keyword phrases, so I know it’s targeted traffic. But in case someone reaches my site by a keyword phrase which has “arms” and “build” or “building” in it, then I will still have a question mark in my mind as to whether that person was looking for muscle building or building weapons.
I think we will never know for double meaning keywords because this is in the user’s mind but like you said, it does not hurt to compete for double meaning terms although more emphasis should be made on targeted terms which you are guaranteed to get the traffic that you really want.
It has been great too to discuss this with you Bill and sharing ideas and opinions. Yep indeed, the title and the description should clearly point out what this page is all about, this way, you won’t get the non targeted visitors because I’ve seen before a listing on SERPs which does not really tell you what this page is really about.
You would think you might get what you’re looking for but when you land on the page, it’s something different. The webmaster didn’t put meaningful titles and descriptions.
True for the content part, it’s wise to add sentences to give the double meaning keywords their real meaning so that there is no ambiguity. Search engines sometimes don’t take your meta descriptions and use some part of your content as search result snippets.
Who knows I might get some weapon builders as subscribers haha 🙂
Hi ZineGuru,
You’re welcome. It’s good having some examples to discuss in more depth about a topic like this, so your questions and your thoughts are much appreciated.
I agree with you on emphasizing targeted terms that are more likely to get the traffic you want, and then considering terms that might have multiple meanings. Good to hear that you are focusing upon some of the other terms that are less ambiguous.
Another issue with using keywords that have more than one meaning is making sure that page titles and meta descriptions being used make it clear which meaning those pages cover. I also try to do that in sentences within the content of a page where that keyword might appear, in case the search engine might decide to use content from the page for a search result snippet. I’m guessing most site owners probably do as well, but I think it’s a point worth making. To keep on using your articles as an example, it’s possible that at least some people interested in weapons might find a page titled “Building Arms” about weight training interesting, but if it’s really not what they wanted, chances are they might look quickly and leave. 🙂
Thank you, ZineGuru
Yes indeed. Traffic to your site is great, but you want visitors who are interested in what you offer coming to your site – making the context of your use of keywords clear in search results is worth the effort. And those weapon builders who are also interested in weight training may visit knowing they aren’t going to a site about weapons, anyway.
This is an amazing post. With the evolution of the internet, every now and then we see new words being added to the Internet, so it has really become difficult to know each of their meanings.
Hi
Interesting question. Thank you. Languages evolve over time, and the meanings of words can change as well. It might make sense for a search engine to update what they know about the meanings of words on a somewhat regular basis, and look for trends in the usage of words. A system like the one described in this patent filing should be able to account for new meanings and usages of words.
I think this is also a good reminder to use more than one search term when optimizing a page. For example if you have a page about Florence….are you talking about Florence Italy, or Florence Alabama? When we go broad with our SEO, we can often fall into the trap of over-generalization and hence miss our target audience.
Hi guyfromseattle,
I agree – if the possibility of confusion exists, it’s not a bad idea at all to make it clear, so that we do reach the audience we want to connect with.
You describe this fact very clearly. There sure are so many keywords with ambiguous meanings. Sometimes, I have also encountered some weird keywords that have different meanings in different languages. As you say, the right keyword used with some other related keywords can help bring the targeted traffic.
Hi Arvind,
Good point. There are words that have more than one meaning when used in different languages. Words can also have different meanings in the same language in different regions of the same country. For example, if I travel to the midwestern part of the United States and go into a store and ask for a “pop,” I’m not looking for someone’s father, but rather some soda.
Related words on the same page as an ambiguous term may help search engines learn which meaning is relevant.
This is a really helpful post. I totally agree that words with ambiguous meanings make a difference in search. Sometimes it is difficult to rank for some uncommon keyword in your language just because it is a popular one in some other language.
Hi Bidhan,
Thank you. It can be difficult when just dealing with one language regarding words that have more than one meaning. Considering Google is aiming at being a multi-language search engine, where you can find relevant results in other languages in your searches, that can make it even more difficult, especially when searchers might not understand that while a word might be spelled the same way, it may mean something very different.