One way to help in that process of organizing the Web is to use what people do on the Web.
– Ricardo Baeza-Yates, from a presentation on Extracting Semantic Relations from Query Logs
How closely related might different search queries be when they share several pages in their search results, and searchers tend to click on those shared results more than on other results?
If you go to Yahoo’s search and perform a search for the term [wcca], the first result that you see in the search results is a page titled “Wisconsin Circuit Court Access.” If you search for [wisconsin circuit court], you’ll see the same page at the top of the search results. If many people searching for each of those terms tend to mostly click on the link for that page and no other pages, Yahoo might start considering those query terms to be very closely related.
Because of such semantic relations, the search engine might start offering searchers a query suggestion for a related term at the top of the search results for an original query.
A recent Yahoo patent application explores these types of semantic relations and tells us that it might learn a great deal from comparing which search results searchers click upon. It describes three semantic relations for query terms, based upon click data found in its query logs, where it keeps track of which results searchers choose for specific queries.
Synonyms (close relationship) – Query terms that share a substantially equivalent set of clicked search results.
Lesser but included (inclusive relationship) – Where the set of clicked results for one query term is smaller in size than another, and those clicked URLs are substantially included in the clicked URLs for the second query.
Related (lesser relationship) – Where the clicked search results between two queries overlap, but not quite to the same level as the two relationships above – synonyms and lesser but included.
In my example above, if people searching for [wcca] and [Wisconsin circuit court] mostly click upon that first search result for “Wisconsin Circuit Court Access,” the search engine might consider those query terms to be synonyms.
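To make the three relationships a little more concrete, here is a minimal Python sketch of how clicked-result sets for two queries might be compared. It is not the method from the patent filing: the function name `classify_relation`, the Jaccard-overlap measure, the 0.8 thresholds, and the example URLs are all assumptions chosen for illustration, since the filing only speaks of sets being "substantially" equivalent or included.

```python
from typing import Set


def classify_relation(clicks_a: Set[str], clicks_b: Set[str],
                      synonym_threshold: float = 0.8,
                      inclusion_threshold: float = 0.8) -> str:
    """Label the relation between two queries from their clicked URLs.

    Both thresholds are illustrative assumptions, not values from the patent.
    """
    shared = clicks_a & clicks_b
    if not shared:
        return "unrelated"

    # Synonyms: the two clicked sets are substantially the same.
    jaccard = len(shared) / len(clicks_a | clicks_b)
    if jaccard >= synonym_threshold:
        return "synonyms"

    # Lesser but included: the smaller set sits mostly inside the larger one.
    smaller = min(clicks_a, clicks_b, key=len)
    if (len(clicks_a) != len(clicks_b)
            and len(shared) / len(smaller) >= inclusion_threshold):
        return "lesser but included"

    # Related: some overlap, but short of the two relations above.
    return "related"


# Hypothetical click data for the two example queries.
wcca_clicks = {"example.org/wcca"}
court_clicks = {"example.org/wcca"}
print(classify_relation(wcca_clicks, court_clicks))
```

With identical clicked sets, as in the [wcca] example, the sketch labels the two queries as synonyms. A real system would likely weight URLs by how often they are clicked rather than treating every click equally.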
The choices of which pages searchers click upon are viewed as implicit user feedback – searchers aren’t explicitly stating that these queries are related somehow. Still, when they choose shared pages in search results for those queries, it’s assumed that the terms are related.
What would a search engine do with this information?
It might offer query suggestions at the top of search results for a related query or reformulate or expand search results to include results that are also relevant for the other query term. The search engine might also use these relationships to match queries with advertisements and in other possible ways. We’re told about this process, that:
Embodiments can detect the slang of the Web (e.g., a taxonomy used by users to perform searches on the Web).
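The query suggestion use mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions: the log format (pairs of query and clicked URL), the helper names `build_click_sets` and `suggest_queries`, and the ranking by Jaccard overlap are all hypothetical and are not taken from the patent filing.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_click_sets(log: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group clicked URLs by query from (query, clicked_url) log rows."""
    click_sets: Dict[str, Set[str]] = defaultdict(set)
    for query, url in log:
        click_sets[query.lower()].add(url)
    return dict(click_sets)


def suggest_queries(query: str, click_sets: Dict[str, Set[str]],
                    limit: int = 3) -> List[str]:
    """Return other queries whose clicked URLs overlap with this query's."""
    key = query.lower()
    clicked = click_sets.get(key, set())
    scored = []
    for other, other_clicked in click_sets.items():
        if other == key:
            continue
        shared = clicked & other_clicked
        if shared:
            # Jaccard overlap of clicked URLs (an assumed similarity measure).
            score = len(shared) / len(clicked | other_clicked)
            scored.append((score, other))
    # Highest overlap first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:limit]]


# Hypothetical query log rows: (query, clicked URL).
log = [
    ("wcca", "example.org/wcca"),
    ("wisconsin circuit court", "example.org/wcca"),
    ("wisconsin circuit court", "example.org/courts"),
    ("wisconsin courts", "example.org/courts"),
    ("pizza", "example.org/pizza"),
]
click_sets = build_click_sets(log)
print(suggest_queries("wcca", click_sets))
```

In this toy log, a search for [wcca] would surface [wisconsin circuit court] as a suggestion because searchers for both queries clicked the same page, while [pizza] shares no clicks and is never suggested.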
The semantic relations patent application is:
Extracting Semantic Relations from Query Logs
Invented by Ricardo Baeza-Yates and Alessandro Tiberi
Assigned to Yahoo
US Patent Application 20090164895
Published June 25, 2009
Filed: December 19, 2007
There is a white paper on this topic from the listed inventors on the patent filing available to subscribers of the ACM portal at Extracting semantic relations from query logs. If you’re not a subscriber, there is a video presentation on it from Ricardo Baeza-Yates, which I also linked to at the start of this post.
There are three Yahoo Research papers co-authored by Ricardo Baeza-Yates that cite that paper and are worth looking at if you’re interested in how search engines might use query logs:
- Search, Web 2.0, and the Semantic Web The importance of search (pdf)
- Clique analysis of query log graphs (pdf)
- The anatomy of a large query graph (pdf)
I’ve written a few posts about synonyms in search. Here are some of those:
- 2/19/2006 – Multi-Stage Query Processing at Google
- 5/25/2007 – Refining Queries Using a Local Category Synonym
- 12/29/2008 – How a Search Engine Might Use Synonyms to Rewrite Search Queries
- 1/23/2009 – Google to Expand Language Search and Shrink Our World?
- 6/29/2009 – Semantic Relations from Query Logs
- 12/22/2009 – Google Search Synonyms Are Found in Queries
- 1/19/2010 – Google Synonyms Update
- 1/27/2010 – Paid Search Results and Query Expansion using Synonyms and Related Concepts
- 2/16/2011 – More Ways Search Engine Synonyms Might be Used to Rewrite Queries
- 8/12/2013 – How Google May Substitute Query Terms with Co-Occurrence
- 9/27/2013 – The Google Hummingbird Patent?
- 12/8/2013 – How Google May Rewrite Queries
- 9/9/2013 – How Google May Reform Queries Based on Co-Occurrence in Query Sessions
- 10/15/2013 – Google’s Hummingbird Algorithm Ten Years Ago
- 12/21/2015 – How Google Might Make Better Synonym Substitutions Using Knowledge Base Categories
Last Updated July 4, 2019.
This is an interesting search insight from Yahoo that could really make their search engine “smarter”.
Hi People Finder,
I agree with you that it is an approach that could make Yahoo’s search smarter. I do like the idea that they are exploring ways to use data about how people search to make those searches better. I’ve been seeing them do that in some other areas as well.
Wow, interesting synopsis on this patent. I think this is a smart move on Yahoo’s part. Instead of simply correcting spelling, improve the search relevance or offer alternate suggestions that are commonly effective for a specific term. Thanks for the update Bill.
You can see why Bing (Microsoft) wanted to group with Yahoo. Google offers this type of association with their ‘suggestions links’ at the top or bottom of a search results page, dependent on how it feels. Again Bill, a quality article. When do you get time to sleep?
Hi Joel,
You’re welcome. It is interesting to see Yahoo actually looking at user interactions with search results to draw conclusions about how related those results might be, so that they can do things like come up with query suggestions. I like the idea in theory, and am wondering how effective it is in practice.
Hi Lee,
Thanks. There may be some rhyme or reason why Google presents the query suggestions that it does, where it does.
I think the idea behind their placement of suggestions is that suggestions that are more likely to lead a searcher off in a different direction are placed at the top of results, and the suggestions that provide good “additional” or “alternative” information to the results that are shown are placed at the bottom of results.
Sleep? :)