Here are some new Microsoft patent applications from last week about the web and indexing pages.
Keep in mind that these are just patent applications and have not been granted as patents. They may describe approaches that are never developed, or they may be the direction Microsoft takes. They might be challenged by claims of prior art, or they may be unique approaches to processes that could improve the way we find information on the web.
Regardless, they are all interesting as pieces of insight into how one search provider might address search in the future.
More efficient rankings
The first one takes the PageRank concept discussed in The PageRank Citation Ranking: Bringing Order to the Web, reviews some efficiencies gained in an approach described in a paper titled Adaptive Methods for the Computation of PageRank (pdf), and builds upon those concepts to try to make the computation of page rankings more efficient.
Efficient computation of web page rankings
United States Patent Application 20060004811
Inventor: Frank David McSherry
Assigned to Microsoft Corporation
Published January 5, 2006
Filed: July 1, 2004
Abstract:
Methods and systems are provided for efficiently computing page rankings of web pages or other interconnected objects.
The rankings are produced by efficiently computing a principal eigenvector of a page ranking transition matrix.
The methods and systems provided herein can be used to produce page rankings in a distributed and/or incremental manner and can be used to allocate computing resources to processing page rankings for those pages that most demand them.
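To make the "principal eigenvector" idea in the abstract above concrete, here is a minimal sketch of the plain power-iteration method from the original PageRank paper. It is not the distributed or incremental method the patent application claims; the function name `pagerank`, the damping value of 0.85, and the tolerance are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Approximate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    All names and defaults here are illustrative assumptions.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages without outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
        # Stop once the rank vector has effectively stopped changing.
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank
```

Calling `pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})` returns a dictionary of scores that sum to 1, with page "c" scoring higher than page "b" because it receives links from both other pages.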
Fighting search engine spam
This next patent application looks at using external data to prevent pages from ranking highly when their creators deliberately attempt to manipulate a search engine. It borrows ideas developed for fighting email spam and applies them to detecting pages that attempt to manipulate search engine rankings.
Search engine spam detection using external data
United States Patent Application 20060004748
Inventors: Bama Ramarathnam, Eric B. Watson, and Janine Ruth Crumb
Assigned to Microsoft Corporation
Published January 5, 2006
Filed: May 21, 2004
Abstract:
Evaluating an electronic document in connection with a search.
An external source provides data for use in evaluating an electronic document retrieved by a search engine.
A first confidence level of the electronic document is determined based on the externally provided data. The first confidence level indicates a likelihood that the electronic document is undesirable.
A second confidence level of the electronic document is determined based on attributes of the electronic document. The second confidence level indicates a likelihood that the electronic document is unsatisfactory concerning a search.
A rating for the electronic document generated as a function of the determined first confidence level and the determined second confidence level is used to categorize the electronic document as unsatisfactory in connection with a received search request.
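The abstract above says only that the rating is "a function of" the two confidence levels, without naming the function. As one possible reading, the sketch below uses a weighted average and a cutoff; the function names, the equal weighting, and the 0.7 threshold are all assumptions rather than details from the filing.

```python
def spam_rating(external_confidence, attribute_confidence, external_weight=0.5):
    """Combine two confidence levels (each in [0, 1]) into one rating.

    A weighted average is an assumed stand-in for the unspecified function.
    """
    for value in (external_confidence, attribute_confidence, external_weight):
        if not 0.0 <= value <= 1.0:
            raise ValueError("confidences and weight must be in [0, 1]")
    return (external_weight * external_confidence
            + (1.0 - external_weight) * attribute_confidence)


def is_unsatisfactory(external_confidence, attribute_confidence, threshold=0.7):
    """Categorize a document as unsatisfactory when its rating meets the threshold."""
    return spam_rating(external_confidence, attribute_confidence) >= threshold
```

Under these assumptions, a page that an external source flags strongly (0.9) and whose own attributes also look suspicious (0.7) would be categorized as unsatisfactory, while a page scoring 0.1 and 0.2 would not.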
Assigning categories to search results
This patent application discusses presenting search results in categories so that users of a search engine can more easily find what they are looking for.
Dispersing search engine results by using page category information
United States Patent Application 20060004717
Inventors: Bama Ramarathnam, Gregory N. Hullender, Darren A. Shakib, and Nicole A. Hamilton
Assigned to Microsoft Corporation
Published January 5, 2006
Filed: July 1, 2004
Abstract:
Systems and methods for dispersing search engine results by category.
A search engine application queries a searchable index of document data associated with a plurality of electronic documents in response to a search request to identify one or more electronic documents having document data matching data included in the search request.
The search engine application disperses identified electronic documents according to category data included in the document data for display to a user.
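One simple way to picture "dispersing" results by category is a round-robin interleave, so that no single category fills the top of the page. This is an illustrative guess at the behavior, not the method claimed in the application; the function name and the `(url, category)` input shape are assumptions.

```python
def disperse_by_category(results):
    """Interleave ranked results so that categories take turns.

    results is a list of (url, category) pairs in ranked order.
    Round-robin interleaving is an assumed interpretation of "dispersing".
    """
    buckets = {}  # category -> urls in original rank order
    for url, category in results:
        buckets.setdefault(category, []).append(url)

    dispersed = []
    position = 0
    # Take the next-best result from each category in turn.
    while len(dispersed) < len(results):
        for urls in buckets.values():
            if position < len(urls):
                dispersed.append(urls[position])
        position += 1
    return dispersed
```

With three "animal" results ranked above two "car" results, the interleave alternates between the two categories until the "car" results run out, then lists the remaining "animal" result.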
Identifying and organizing topics of documents by sentence patterns
This document looks at how documents might be identified under different topics by identifying general sentence patterns related to a topic sentence. It then describes a way to cluster those documents together to create a directory of clusters.
Method and system for clustering using generalized sentence patterns
United States Patent Application 20060004561
Inventors: Benyu Zhang, Wei-Ying Ma, Zheng Chen, and Hua-Jun Zeng
Assigned to Microsoft Corporation
Published January 5, 2006
Filed: June 30, 2004
Abstract:
A method and system for clustering documents based on generalized sentence patterns of the topics of the documents are provided.
A generalized sentence patterns (“GSP”) system identifies a “sentence” that describes the topic of a document.
To cluster documents, the GSP system generates a “generalized sentence” form of the sentence describing each document’s topic.
The generalized sentence is an abstraction of the words of the sentence.
The GSP system identifies clusters of documents based on the patterns of their generalized sentences.
The GSP system clusters documents when the generalized sentence representations of their topics have a similar pattern.
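As a rough sketch of the idea in the abstract above, the code below reduces each topic sentence to an abstracted set of words and groups documents whose sets overlap enough. The stopword list, the `<num>` placeholder, the Jaccard similarity measure, and the 0.5 threshold are all assumptions; the application's actual generalization and clustering steps may differ considerably.

```python
import re

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"a", "an", "the", "of", "for", "to", "in", "on", "and", "is"}


def generalize(sentence):
    """Abstract a sentence into a set of content words, with numbers collapsed."""
    tokens = re.findall(r"[a-z]+|\d+", sentence.lower())
    return frozenset("<num>" if t.isdigit() else t
                     for t in tokens if t not in STOPWORDS)


def jaccard(a, b):
    """Overlap between two word sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster(topic_sentences, threshold=0.5):
    """Greedily group document ids whose generalized sentences are similar."""
    clusters = []  # list of (representative pattern, [document ids])
    for doc_id, sentence in topic_sentences.items():
        pattern = generalize(sentence)
        for representative, members in clusters:
            if jaccard(pattern, representative) >= threshold:
                members.append(doc_id)
                break
        else:
            clusters.append((pattern, [doc_id]))
    return [members for _, members in clusters]
```

Here "Meeting agenda for 2005" and "Meeting agenda for 2006" generalize to the same pattern and land in one cluster, while "Recipe for apple pie" forms its own.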
Ranking results based upon tracking user preferences
This document looks at the past use of a search engine by a user to personalize future searches. For example, if a searcher typically chooses .edu sites in many or most of their searches, sites with .edu top-level domains might be given more weight in results served to that user.
System and method for ranking search results based on tracked user preferences
United States Patent Application 20060004711
Inventor: Ramez Naam
Assigned to Microsoft Corporation
Published January 5, 2006
Filed: June 30, 2004
Abstract:
A method and system are provided for ranking search results based on user preferences.
The method includes monitoring user selections in response to user receipt of search results and tracking metadata related to user selections for user selections that exhibit a threshold satisfaction level.
The method additionally includes storing the tracked metadata as user preferences and adjusting a ranking mechanism to increase the weight of user preferences to increase a ranking for search results that exhibit user preferences.
The method also includes storing the user selections and the keyword search to determine that the user selections exceed a threshold satisfaction level.
The method may utilize the stored user selections and keyword search upon receiving a repeat search to alter new search results to the user.
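The .edu example mentioned earlier can be sketched as follows. This is a minimal illustration, assuming that "threshold satisfaction level" can be approximated by how long a searcher stays on a clicked page and that the tracked metadata is the top-level domain; the class name, the 30-second dwell cutoff, and the boost formula are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def top_level_domain(url):
    """Return the last label of the URL's hostname, such as 'edu'."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1]


class PreferenceTracker:
    """Track which top-level domains a user selects and seems satisfied with."""

    def __init__(self, min_dwell_seconds=30):
        # Dwell time is an assumed proxy for the "threshold satisfaction level".
        self.min_dwell_seconds = min_dwell_seconds
        self.tld_counts = Counter()

    def record_click(self, url, dwell_seconds):
        """Store the selection's metadata only if the user appeared satisfied."""
        if dwell_seconds >= self.min_dwell_seconds:
            self.tld_counts[top_level_domain(url)] += 1

    def rerank(self, scored_results, boost=0.1):
        """Reorder (url, score) pairs, boosting domains the user prefers."""
        total = sum(self.tld_counts.values())

        def adjusted(item):
            url, score = item
            share = self.tld_counts[top_level_domain(url)] / total if total else 0.0
            return score * (1.0 + boost * share)

        return sorted(scored_results, key=adjusted, reverse=True)
```

After a user has spent time on a couple of .edu pages, a .edu result with a slightly lower base score can move ahead of a .com result, which is the kind of adjustment the abstract describes.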
Filtering the presentation of search results to prevent undesirable content
While most search engines can filter the pages they present to searchers for undesirable content (using a “SafeSearch” mode), the titles and snippets shown for the remaining results may still contain information about such things as pornography, drugs, or violence. This patent application discusses also filtering that presentation information when someone is using the search engine in a filtered mode.
Presentation-level content filtering for a search result
United States Patent Application 20060004716
Inventors: Oliver Hurst-Hiller and Jamie Paul Buckley
Assigned to Microsoft Corporation
Published January 5, 2006
Filed: July 1, 2004
Abstract:
Presenting a search result to a user.
One or more electronic documents are identified based on a search query received from a user.
A search result is generated in response to identifying one or more electronic documents.
The search result includes presentation data regarding each of the identified electronic documents.
An undesirable content of each of the presentation data of each of the identified electronic documents of the search result is identified.
A format attribute of the presentation data of the identified undesirable content is modified.
The search result, including any modifications, is then provided to the user.
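The abstract above does not say which format attribute is modified, so the sketch below uses masking with asterisks as a stand-in. The blocked-term list, the function names, and the `title`/`snippet`/`url` field names are assumptions for illustration only.

```python
import re

# Hypothetical list of terms to hide in filtered mode.
BLOCKED_TERMS = ["casino", "violence"]


def mask_undesirable(text, blocked_terms=BLOCKED_TERMS):
    """Replace each blocked word in the text with asterisks of the same length."""
    if not blocked_terms:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in blocked_terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda match: "*" * len(match.group(0)), text)


def filter_presentation(result):
    """Return a copy of a search result with its title and snippet masked."""
    return {
        **result,
        "title": mask_undesirable(result["title"]),
        "snippet": mask_undesirable(result["snippet"]),
    }
```

The page itself still appears in the results with its URL untouched; only the text shown to the searcher changes, which is the distinction the application draws between filtering results and filtering their presentation.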
Building personalized portal pages
System and methods for constructing personalized context-sensitive portal pages or views by analyzing patterns of users’ information access activities
United States Patent Application 20060004705
Inventors: Eric Horvitz and Corin Ross Anderson
Assigned to Microsoft Corporation
Published January 5, 2006
Filed: July 27, 2005
Second version
This patent application describes creating a portal page that can use some predictive approaches to bring a viewer information that they might want to see based upon a combination of user selection and past viewing habits.
Abstract:
The present invention relates to a system and methodology to assist users with data access activities, and that includes such activities as routine web browsing and/or data access applications.
A coalesced display or montage of aggregated information is provided that is focused on a plurality of sources to achieve substantially one-button access to the user’s desired web or data source information/destinations to mitigate efforts in retrieving and viewing such information.
Past web or other type data access patterns can be mined to predict future browsing sites or desired access locations.
A system is provided that builds personalized web portals for associated users based on models mined from past data access patterns.
The portals can provide links to web resources and embed content from distal (remote) pages or sites, producing a montage of web or other type data content.
Automated topic classification is employed to create multiple topic-centric views that a user can invoke.
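A very small sketch of mining past access patterns to predict what a user might want: count which sites were visited in a given context, then surface the most frequent ones as portal links. Using the hour of day as the context, along with the function names and log format, is an assumption; the application describes richer models than simple counting.

```python
from collections import Counter, defaultdict


def build_model(access_log):
    """Count visits per context from (hour_of_day, url) pairs.

    Hour of day is an assumed stand-in for the context the portal adapts to.
    """
    model = defaultdict(Counter)
    for hour, url in access_log:
        model[hour][url] += 1
    return model


def portal_links(model, hour, k=3):
    """Return up to k of the most frequently visited urls for this hour."""
    if hour not in model:
        return []
    return [url for url, _ in model[hour].most_common(k)]
```

Given a log in which a user opens a news site twice and a mail site once at 9 in the morning, the 9 a.m. portal view would list the news site first and the mail site second.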