When you search for something at a search engine, the search engine might not just try to find pages on the web which match the keywords that you searched with, but may first try to expand upon those keywords by finding similar or related terms, and may substitute search terms for the ones you have chosen.
This kind of substitution is most visible when one of your query terms is a misspelling: a search engine might display results for the correctly spelled word if it is fairly confident that the term is misspelled.
How does a search engine know that a term is misspelled, or that there might be related phrases that might provide better and more helpful results to a searcher?
One way is for the search engine to look at its query logs to see if previous searchers might have corrected or rewritten their queries after doing an initial search for the original search terms.
Another might be for the search engine to look at an outside source of information – such as a dictionary that defines different senses of words and terms that are meaningfully related in some manner.
This kind of query term reformation or substitute search terms may not only affect the terms that you see in search results, but also the advertisements that are shown along with those search results. If a possible substitution for a query is related in a meaningful enough way, ads matching the substitute search terms might be shown in response to the original query.
A recently published Yahoo patent application explores query reformation and substitution, and it shouldn’t be a surprise if other search engines are practising similar processes.
Missing “Similar” Search Results
Most people who use a search engine understand that when they perform a search, the search engine will look for documents on the web that contain the keywords used in that searcher’s query.
But a strict matching of keywords by a search engine may mean that pages which are relevant to the search may be missed because search terms similar to the ones used may provide better results.
To solve this problem, a search engine might consider displaying search results or suggestions for searches “of search terms that are similar or related in meaning to the search terms that a user provides to a search engine.”
Generating Related or Suggested Queries
Previous searchers who may have used the same search term may reformulate their search queries to find better results – and keeping track of those reformations may help the search engine identify related or similar search terms.
A search engine might also look at statistics showing which other phrases tend to appear in documents that contain the original query, or at dictionaries such as WordNet that identify the different senses of words and phrases, to identify related or similar query terms.
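To make the document statistics idea a little more concrete, here is a minimal sketch in Python. It is not taken from the patent filing: the function name, the plain substring matching, and the idea of scoring a candidate by the share of matching documents are all assumptions made for illustration.

```python
from collections import Counter

def cooccurring_phrases(documents, query, candidates):
    """Share of documents containing `query` that also contain each candidate.

    Hypothetical helper: uses simple case-insensitive substring matching,
    which a real search engine would replace with proper tokenization.
    """
    query = query.lower()
    counts = Counter()
    docs_with_query = 0
    for text in documents:
        lowered = text.lower()
        if query not in lowered:
            continue  # only documents that mention the original query count
        docs_with_query += 1
        for phrase in candidates:
            if phrase.lower() in lowered:
                counts[phrase] += 1
    if docs_with_query == 0:
        return {}
    return {phrase: counts[phrase] / docs_with_query for phrase in candidates}
```

A candidate phrase that shows up in a large share of the documents mentioning the original query would be a plausible related term under this rough measure.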
The Yahoo patent filing is:
System and method for generating substitutable queries on the basis of one or more features
Invented by Rosie Jones, Benjamin Rey, Marco Zagha
US Patent Application 20080114721
Published May 15, 2008
Filed November 15, 2006
Some Examples
Let’s say that a large number of people who search for the term intellectual property then go on to search for the term patent attorney with their very next search, or within the same search session.
The search engine log files would uncover that such an association exists, and the search engine might explore how common it is for searchers to search for that second phrase. If it happens frequently enough, the search engine may start suggesting patent attorney as a suggested search to searchers along with a display of search results for the term intellectual property.
Searchers who look for cellular phones at a search engine may commonly perform searches for wireless technology within a short period afterwards (within 20 minutes or within an hour), and that may suggest to the search engine that the query wireless technology is a candidate reformulation of a query term with respect to the query cellular phones.
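A rough sketch of how that kind of log analysis could work is below, in Python. The log format (user, timestamp in seconds, query), the function name, the 20 minute default window, and the minimum share threshold are assumptions for illustration, not details from the patent filing.

```python
from collections import Counter, defaultdict

def candidate_reformulations(log, window_seconds=1200, min_share=0.2):
    """Find follow-up queries that often come right after a given query.

    `log` is an iterable of (user_id, timestamp_seconds, query) tuples.
    Returns {query: [(follow_up, share), ...]}, highest share first.
    """
    by_user = defaultdict(list)
    for user, ts, query in log:
        by_user[user].append((ts, query.strip().lower()))

    query_counts = Counter()  # how often each query was issued
    pair_counts = Counter()   # how often query A was directly followed by B
    for events in by_user.values():
        events.sort()  # chronological order per user
        for i, (ts, query) in enumerate(events):
            query_counts[query] += 1
            if i + 1 < len(events):
                next_ts, next_query = events[i + 1]
                if next_query != query and next_ts - ts <= window_seconds:
                    pair_counts[(query, next_query)] += 1

    suggestions = defaultdict(list)
    for (query, follow_up), n in pair_counts.items():
        share = n / query_counts[query]
        if share >= min_share:  # keep only frequent enough follow-ups
            suggestions[query].append((follow_up, share))
    for ranked in suggestions.values():
        ranked.sort(key=lambda item: (-item[1], item[0]))
    return dict(suggestions)
```

In this sketch, a follow-up query only becomes a candidate when it appears soon enough after the original query and for a large enough share of the searches for that query.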
It’s possible that for some query reformations, instead of offering a query term as a suggested search, a search engine might show results for the related query mixed in with search results for the original query term.
When I perform a search at Yahoo for the phrase note book computers, the following text appears above my search:
We have included notebook computers results – Show only note book computers
So, pages for both note book computers and notebook computers are mixed together in the Yahoo search results on a search for just note book computers.
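How Yahoo actually mixes the two result sets isn't something I can see from the outside. Purely as an illustration, one simple way to blend two ranked lists is to interleave them and drop duplicates. The function name and the interleaving strategy in this Python sketch are assumptions, not Yahoo's method.

```python
from itertools import zip_longest

def blend_results(original, substitute, limit=10):
    """Interleave two ranked result lists, keeping the first copy of each URL."""
    blended = []
    seen = set()
    for pair in zip_longest(original, substitute):
        for url in pair:
            # zip_longest pads the shorter list with None
            if url is not None and url not in seen:
                seen.add(url)
                blended.append(url)
    return blended[:limit]
```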
If you look at the Yahoo search results for notebook computers, you will see a couple of search suggestions or reformations offered at the top of the search results that haven’t been incorporated into the actual search results:
Also try: cheap new notebook computers, best notebook computers, More…
Some Implications
The patent application goes into some significant detail involving how reformations might be identified and ranked, including queries that involve geographic locations, and how those might be reformed and substituted for, or offered as search query suggestions.
Why might it be helpful to spend some time on understanding how this reformation of queries works?
My search for notebook computers tells me that it is likely that many people go on to search for cheap new notebook computers after searching for the first term. If I take a look at the search results for cheap new notebook computers, I see some more suggestions for query terms at the top of the search results, including one for refurbished laptops.
If I am a site owner who sells notebook computers, it may be helpful to me to know that Yahoo is suggesting a search for cheap new notebook computers, and a further suggestion for refurbished laptops on the query cheap new notebook computers.
In some instances, instead of suggestions for different queries being offered by the search engine, results from the new query term or terms might be incorporated directly into the results for the original term.
Keep in mind that Yahoo is getting those suggestions from user data that it has collected, from an alternative source such as collected statistics about phrases that co-occur within the same documents, or from a trusted source of information about the meanings of words, like WordNet.
Added (2008-05-18, 8:19pm est): There’s a discussion on this post at Cre8asite Forums: Top Three Query Returns Increase Value, As Exact Term Positions Decrease
This is really very interesting and timely as well. I was just this weekend looking at some terms I didn’t think belonged together but after reading this I can see how the suggestions would be made. Time to check where I rank for those “related” terms.
I love these “smart” suggestions. My research shows that they will help older (insert baby boomers here) internet users the most. While most younger people and savvy net users know how to find what they are looking for, older users and occasional users will benefit greatly from the search suggestions.
I will however be watching them closely. They have potential for commercial value for the search provider. I certainly won’t be using any search provider who spams me with poorly disguised advertisements in their link suggestions. While certain searches will certainly be appropriate for commercial results, if they are paid results, the search will lose value for me.
this is a very insightful article
closely related terms are def something that I will be paying more attention to! thanks
I’d definitely recommend to anyone doing keyword research, from experienced SEOs to small business owners trying to do their own keyword research, that they actually perform searches with the terms they are considering optimizing for, and see how a search engine may display results for those searches, from offering suggestions or blending in results from more popular related queries, to whether or not images and videos, and other non-web page results are being shown. It’s worth paying attention to.
Hi Michael,
It’s fascinating how some terms end up being related to others, especially when they are discovered by a search engine tracking the way that people search. It can provide a glimpse into what searchers may be thinking about when they use certain queries. It certainly doesn’t hurt to look and see what a search engine might show in their results, when the possibility exists that they will display additional search suggestions, or blend the results of searches together.
Hi Steven,
That’s an excellent point about how search suggestions can benefit older users of a search engine. 🙂
There’s another class of searchers who benefit from “smart search suggestions” and my next post is going to go into that topic.
Hi Dazzlecat,
You’re welcome.
Hi Bill,
This is interesting. But i guess, this is mostly restricted to “english language phrases” (correct me if i am wrong).
There are times, when i know a certain word is spelt in a different manner in a particular country/industry, but i may not be knowing the exact spelling.
during such times, most of these “related” phrases are either totally wrong or way out of mark.
since i do not know the exact meaning, i naturally depend on the “suggested” word. but if the suggested word is also wrong, then because of the “click count”, i think the same word would keep repeating even though that is also wrong 🙁
If only that is also taken care of, this would be really nice.
Hi Praveen,
Nothing within the patent filing from Yahoo suggests that it would be limited to English only, or to dialects specific to different regional spellings.
But I think that you identify an issue that is a problem for the search engines, and one that shows that search is still a pretty young field.
Part of the difficulty that a search engine may face is in identifying the language (or dialect) that a query might be written in.
I wrote about four patent filings from Google at the end of last December that discussed looking at queries in different languages and also handling non-latin characters in searches (or the lack of such characters on some keyboards). The post is:
How Does a Search Engine Know the Language of A Query? Google Explores Character Mapping
They don’t quite address your concerns head on, but may provide part of a foundation for steps that a search engine can take while attempting to offer search suggestions matching the language that the original query was written within.
Thanks Bill.
I missed that. Will take a look at it tonite.
This happens every time I search my name. “Do you mean Stephen Miller?” They still mix the Stephen’s in with it. I never get any respect.
Hi Stephan,
I use the nickname “Bragadocchio” at Cre8asite forums.
It took a couple of years and thousands and thousands of forum posts before Google stopped asking me if I meant a different spelling of that nickname.
Keep at it. Maybe someday they’ll figure it out. 🙂
This kind of thing will make keyword research difficult. I am already dubious about the results in Wordtracker when compared to my own research on PPC. I guess this kind of thing will just add to the general confusion.
Hi Pete,
It will likely add an additional element of difficulty to keyword research, but it’s something that Google, Yahoo, and Microsoft are already doing in one form or another. And it may provide some benefits, too.
I have found that it really helps to pay attention to how a search engine might treat a keyword in search results before attempting to use that phrase in optimizing pages. Looking at those results may provide some ideas for other phrases to look at when conducting keyword research, because the terms shown are ones that people actually use in query sessions when they are searching.
hey bill
as usual great article… you know i found myself here about 3 days in a row just reading everything you have to say… I mean i have been in the seo field for a pretty long time (and i think i know everything lol 🙂) but you really have a lot of interesting stuff to say.
thanks again for the great article…
also please get that spam out from the comment over top of me…
Thanks!
Appreciate your kind words. 🙂
I’m not sure if you are referring to the Van SEO Design trackback as spam, but it’s actually a nice roundup of blog posts from the past month on a variety of topics.
PubMed has been doing this for years. In Google, I would like to see different tenses of a search term show up in searches. But, beyond that I think that Google acting like it knows more about what I am trying to find than I do is absurd, and not appreciated.
Hi John,
Thanks for stopping by, and commenting. I’m not familiar with PubMed’s approach to providing alternative query suggestions to searchers. Are those based upon an analysis of user log data and query revisions during query sessions?
The Yahoo patent filing that I link to above sounds much like what I’ve read about Google’s approach to providing query refinement suggestions, too.
I do think that performing query expansions to include different tenses is something that Google is capable of, and could be doing. Google described how they might do that in a patent application titled Systems and methods for improving search quality. The section starting at [0045] on “Inflections” describes how queries might be expanded in the search engines. Here’s how they define inflections:
I understand completely how it can be frustrating when Google makes unwarranted assumptions about what you might want to see based upon your query. Where that bothers me the most is in Google’s “patent search.”
I’d ideally just want a list of patents that include the keywords that I want to see, and some other sorting options. Instead, they supply the patents that they think are “most relevant” for my query, without telling me how they determined relevance. That’s a dangerous approach for something like patents, where due diligence might require a more complete set of search results than one based upon a mysterious and undefined ranking of patents for keyword phrases.