When you search for something at a search engine, the search engine might not just try to find pages on the web which match the keywords that you searched with, but may first try to expand upon those keywords by finding similar or related terms.
This kind of expansion of search terms is most visible when one of the query terms that you use is misspelled; a search engine might display results for the correctly spelled word if it is fairly confident that the term was a misspelling.
How does a search engine know that a term is misspelled, or that there might be related phrases that might provide better and more helpful results to a searcher?
One way is for the search engine to look at its query logs to see if previous searchers might have corrected or rewritten their queries after doing an initial search for the original search terms.
Another might be for the search engine to look at an outside source of information – such as a dictionary that defines different senses of words and terms that are meaningfully related in some manner.
This kind of query term reformation or substitution may not only affect the search terms that you see in search results, but also the advertisements that are shown along with those search results. If a possible substitution for a query is related in a meaningful enough way, ads matching the substitution might be shown in response to the original query.
A recently published Yahoo patent application explores query reformation and substitution, and it shouldn’t be a surprise if other search engines are practising similar processes.
Missing “Similar” Search Results
Most people who use a search engine understand that when they perform a search, the search engine will look for documents on the web that contain the keywords used in that searcher’s query.
But strict matching of keywords by a search engine can mean that relevant pages are missed, because search terms similar to the ones used might provide better results.
To solve this problem, a search engine might consider displaying search results or suggestions for searches “of search terms that are similar or related in meaning to the search terms that a user provides to a search engine.”
Generating Related or Suggested Queries
Previous searchers who used the same search term may have reformulated their queries to find better results, and keeping track of those reformulations may help the search engine identify related or similar search terms.
A search engine might also look at statistics which show other phrases that tend to show up in documents along with the original query, or at dictionaries that identify the different senses of words and phrases, like WordNet, to identify related or similar query terms.
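As a rough illustration of the co-occurrence idea, here is a minimal Python sketch. It is not taken from the patent filing: the function name `cooccurrence_scores`, the toy documents, and the simple substring matching are all assumptions made for the example. For each candidate phrase, it measures the share of documents containing the original query that also contain that phrase.

```python
def cooccurrence_scores(documents, query, candidates):
    """For each candidate phrase, return the fraction of documents that
    contain `query` and also contain the candidate (hypothetical helper)."""
    query = query.lower()
    # Keep only documents that mention the original query (case-insensitive).
    with_query = [d.lower() for d in documents if query in d.lower()]
    if not with_query:
        return {}
    return {
        c: sum(c.lower() in d for d in with_query) / len(with_query)
        for c in candidates
    }

# Toy documents, invented for illustration only.
documents = [
    "Notebook computers and laptops on sale",
    "Compare laptops and notebook computers by battery life",
    "Notebook computers for students",
    "Garden furniture on sale",
]
scores = cooccurrence_scores(documents, "notebook computers",
                             ["laptops", "garden furniture"])
print(scores)
```

In this toy data, "laptops" appears in two of the three documents that mention "notebook computers", while "garden furniture" appears in none, so only the first looks like a plausible related term. A real system would work with far larger collections and more careful phrase matching.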
The Yahoo patent filing is:
System and method for generating substitutable queries on the basis of one or more features
Invented by Rosie Jones, Benjamin Rey, Marco Zagha
US Patent Application 20080114721
Published May 15, 2008
Filed November 15, 2006
Let’s say that a large number of people who search for the term intellectual property then go on to search for the term patent attorney with their very next search, or within the same search session.
The search engine log files would uncover that such an association exists, and the search engine might explore how common it is for searchers to search for that second phrase. If it happens frequently enough, the search engine may start suggesting patent attorney as a suggested search to searchers along with a display of search results for the term intellectual property.
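The log-based idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method described in the patent filing: the session format (a list of queries per session), the function name `suggest_followups`, and the `min_share` threshold are all made up for the example.

```python
from collections import Counter, defaultdict

def suggest_followups(sessions, min_share=0.3):
    """Map each query to the queries that immediately follow it in at
    least `min_share` of the cases where it was followed by anything."""
    followups = defaultdict(Counter)  # query -> counts of next queries
    totals = Counter()                # query -> times it had a next query
    for queries in sessions:
        for first, second in zip(queries, queries[1:]):
            if first == second:
                continue  # a repeated query is not a reformulation
            followups[first][second] += 1
            totals[first] += 1
    suggestions = {}
    for query, counts in followups.items():
        picked = [q for q, n in counts.most_common()
                  if n / totals[query] >= min_share]
        if picked:
            suggestions[query] = picked
    return suggestions

# Toy sessions, invented for illustration only.
sessions = [
    ["intellectual property", "patent attorney"],
    ["intellectual property", "patent attorney", "patent search"],
    ["intellectual property", "copyright law"],
    ["cellular phones", "wireless technology"],
]
result = suggest_followups(sessions, min_share=0.5)
print(result["intellectual property"])
```

Here "patent attorney" follows "intellectual property" in two of three cases, which clears the 50 percent threshold, while "copyright law" (one of three) does not. The threshold stands in for "frequently enough"; the filing itself describes more involved ways of scoring candidates.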
Searchers who look for cellular phones at a search engine may commonly perform searches for wireless technology within a short period afterwards (within 20 minutes or within an hour), and that may suggest to the search engine that the query wireless technology is a candidate reformulation of a query term with respect to the query cellular phones.
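The time-window version of this can be sketched as follows. Again, this is a hypothetical illustration rather than Yahoo's implementation: the log format of (user, timestamp, query) tuples and the function name `candidate_reformulations` are assumptions, and the 20-minute and one-hour windows simply echo the examples above.

```python
from collections import Counter
from datetime import datetime, timedelta

def candidate_reformulations(log, window=timedelta(minutes=20)):
    """Count (earlier query, later query) pairs issued by the same user
    no more than `window` apart."""
    by_user = {}
    for user, when, query in sorted(log, key=lambda r: (r[0], r[1])):
        by_user.setdefault(user, []).append((when, query))
    pairs = Counter()
    for events in by_user.values():
        for i, (t1, q1) in enumerate(events):
            for t2, q2 in events[i + 1:]:
                if t2 - t1 > window:
                    break  # events are time-sorted; the rest are later still
                if q1 != q2:
                    pairs[(q1, q2)] += 1
    return pairs

# Toy log entries, invented for illustration only.
log = [
    ("u1", datetime(2008, 5, 15, 10, 0), "cellular phones"),
    ("u1", datetime(2008, 5, 15, 10, 12), "wireless technology"),
    ("u2", datetime(2008, 5, 15, 9, 0), "cellular phones"),
    ("u2", datetime(2008, 5, 15, 9, 45), "wireless technology"),
]
pair = ("cellular phones", "wireless technology")
within_20_min = candidate_reformulations(log)[pair]
within_hour = candidate_reformulations(log, timedelta(hours=1))[pair]
print(within_20_min, within_hour)
```

With the 20-minute window only the first user's pair is counted; widening the window to an hour also picks up the second user's search 45 minutes later. The choice of window trades precision (tightly related queries) against coverage.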
It’s possible that for some query reformations, instead of offering a query term as a suggested search, a search engine might show results for the related query mixed in with search results for the original query term.
When I perform a search at Yahoo for the phrase note book computers, the following text appears above my search:
So, pages for both note book computers and notebook computers are mixed together in the Yahoo search results on a search for just note book computers.
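How Yahoo actually combines the two result sets isn't something the search results reveal, but one simple way to mix results for an original query and a substituted query is to interleave the two ranked lists and drop duplicates. The sketch below assumes exactly that; the function name `blend_results` and the example URLs are hypothetical.

```python
from itertools import zip_longest

def blend_results(original, substitute, limit=10):
    """Interleave two ranked result lists, keeping the first occurrence
    of each URL and returning at most `limit` results."""
    blended, seen = [], set()
    for pair in zip_longest(original, substitute):
        for url in pair:
            # zip_longest pads the shorter list with None.
            if url is not None and url not in seen:
                seen.add(url)
                blended.append(url)
    return blended[:limit]

# Placeholder URLs, invented for illustration only.
for_note_book = ["a.example/note-book", "b.example/laptops"]
for_notebook = ["b.example/laptops", "c.example/notebook"]
blended = blend_results(for_note_book, for_notebook)
print(blended)
```

A page that ranks for both queries appears only once, at the position where it first turns up.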
If you look at the Yahoo search results for notebook computers, you will see a couple of search suggestions or reformations offered at the top of the search results that haven’t been incorporated into the actual search results:
The patent application goes into significant detail on how reformations might be identified and ranked, including for queries that involve geographic locations, and how they might be substituted for the original query or offered as search query suggestions.
Why might it be helpful to spend some time on understanding how this reformation of queries works?
My search for notebook computers tells me that it is likely that many people then go on to search for cheap new notebook computers after searching for the first term. If I take a look at the search results for cheap new notebook computers, I see some more suggestions for query terms at the top of the search results, including one for refurbished laptops.
If I am a site owner who sells notebook computers, it may be helpful to me to know that Yahoo is suggesting a search for cheap new notebook computers, and a further suggestion for refurbished laptops on the query cheap new notebook computers.
In some instances, instead of suggestions for different queries being offered by the search engine, results from the new query term or terms might be incorporated directly into the results for the original term.
Keep in mind that Yahoo is getting those suggestions from user data that it has collected, from an alternative source such as collected statistics about phrases that co-occur within the same documents, or from a trusted source of information about the meanings of words, like WordNet.
Added (2008-05-18, 8:19pm est): There’s a discussion on this post at Cre8asite Forums: Top Three Query Returns Increase Value, As Exact Term Positions Decrease