Results Clustering Patent Application from Microsoft

A new patent application from Microsoft considers ways to present search results to searchers in clusters, with meaningful names. This kind of results clustering is also sometimes referred to as domain clustering, because it normally involves grouping together search results from the same domain.

Published on February 2, 2006, the results clustering patent application was originally filed on July 13, 2004, and is assigned to Microsoft Corporation.

Query-based snippet clustering for search result grouping
Inventors: Hua-Jun Zeng, Qicai He, Guimei Liu, Zheng Chen, Benyu Zhang, and Wei-Ying Ma
US Patent Application 20060026152


A clustering architecture that dynamically groups the search result documents into clusters labeled by phrases extracted from the search result snippets. Documents related to the same topic usually share a common vocabulary. The words are first clustered based on their co-occurrences and each cluster forms a potentially interesting topic. Keywords are chosen and then clustered by counting co-occurrences of pairs of keywords. Documents are assigned to relevant topics based on the feature vectors of the clusters.
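
To make the abstract a little more concrete, here is a minimal sketch of the general idea in Python: count how often pairs of keywords appear together, group keywords that co-occur often enough, and assign each document to the group it overlaps with most. The function names, the `min_count` threshold, the connected-components grouping, and the simple overlap measure (standing in for the "feature vectors" the abstract mentions) are all assumptions made for illustration, not details taken from the patent application.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count how many documents each pair of keywords appears in together."""
    counts = Counter()
    for words in docs:
        for pair in combinations(sorted(set(words)), 2):
            counts[pair] += 1
    return counts


def cluster_keywords(docs, min_count=2):
    """Group keywords linked by at least `min_count` co-occurrences.

    Uses a small union-find to build connected components; this is an
    illustrative choice, not the method described in the filing.
    """
    parent = {}

    def find(word):
        parent.setdefault(word, word)
        while parent[word] != word:
            parent[word] = parent[parent[word]]  # path halving
            word = parent[word]
        return word

    for (a, b), n in cooccurrence_counts(docs).items():
        if n >= min_count:
            parent[find(a)] = find(b)

    groups = {}
    for word in list(parent):
        groups.setdefault(find(word), set()).add(word)
    # Sort for a stable, predictable cluster order.
    return sorted(groups.values(), key=lambda g: sorted(g))


def assign(docs, clusters):
    """Give each document the index of the cluster it shares the most
    keywords with, or None if it shares none."""
    result = []
    for words in docs:
        overlaps = [len(set(words) & cluster) for cluster in clusters]
        best = max(range(len(clusters)), key=lambda i: overlaps[i], default=None)
        result.append(best if best is not None and overlaps[best] > 0 else None)
    return result
```

Each document here is just a list of keywords pulled from a title or snippet, with the query term itself left out. Leaving it out matters in this sketch, because a word that appears in every result would otherwise link all of the keywords into a single cluster.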

This results clustering process is intended to make it easier for searchers to find what they are looking for by grouping results into clusters, so that a searcher can quickly determine whether a given cluster matches what they want. In addition, clusters that may be more relevant to the searcher's query are ranked higher than ones that aren’t.

Some search engines already display groupings alongside the results they show searchers, but those differ in that their clusters are assigned based upon content similarity.

These groupings are created dynamically, from an analysis of the results originally returned for a query, looking at common language used in titles and snippets. Interestingly, the results clustering patent application notes that words and phrases from titles will likely be given more weight than those from snippets.
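
As a rough illustration of what weighting titles more heavily than snippets could look like, here is a small Python sketch that scores terms across a set of results. The weights of 2.0 and 1.0, the `score_terms` name, and the whitespace tokenization are all assumptions made for this example; the patent application is not the source of those specifics.

```python
from collections import defaultdict

TITLE_WEIGHT = 2.0    # assumed value, chosen only for illustration
SNIPPET_WEIGHT = 1.0  # assumed value, chosen only for illustration


def score_terms(results):
    """Sum a weighted count for each term across (title, snippet) pairs.

    A term seen in a title adds TITLE_WEIGHT to its score; a term seen
    in a snippet adds SNIPPET_WEIGHT.
    """
    scores = defaultdict(float)
    for title, snippet in results:
        for term in title.lower().split():
            scores[term] += TITLE_WEIGHT
        for term in snippet.lower().split():
            scores[term] += SNIPPET_WEIGHT
    return dict(scores)
```

With weights like these, a term that shows up once in a title counts as much as one that shows up twice in snippets, so title language has more influence on which candidate cluster labels rise to the top.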
