Determining how a term or phrase may be used in the context of a page can be helpful in deciding how relevant that page is in responding to a query from a searcher.
A patent application from Google, published this week, looks at possible ways of considering the context of those words, and describes a multiple-stage process for determining relevancy and finding results for a search.
The document is fairly complex, but some possible actions that can be taken during the different stages described are:
a) Deletion of stop words.
b) Term stemming.
c) Expansion of queries to use things like synonyms and related terms that commonly co-occur with them.
d) Relevancy scores between the query and each document are computed using one or more scoring algorithms, which may consider factors such as:
….the presence or absence of query term(s), term frequency, Boolean logic fulfillment, query term weights, popularity of the documents (e.g., a query independent score of the document’s importance or popularity or interconnectedness), proximity of the query terms to each other, context, attributes, etc.
e) Adjacency and proximity of terms are used to rank documents.
f) Term attributes, such as whether terms are in titles, headings, or metadata, and whether they have certain font characteristics, are reviewed.
g) Generation of snippets to return with results.
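To make some of these actions more concrete, here is a minimal Python sketch of stop-word deletion, stemming, a simple relevancy score, and snippet generation. It is only an illustration under my own assumptions, not the method described in the patent application: the stop-word list, the crude suffix-stripping stemmer, the scoring weights, and every function name are hypothetical.

```python
import re

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token):
    """Crude suffix stripping, standing in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def prepare_query(query):
    """Delete stop words, then stem the terms that remain."""
    return [stem(t) for t in tokenize(query) if t not in STOP_WORDS]


def score_document(query_terms, doc_text, title=""):
    """Combine term frequency, a title bonus, and a proximity bonus.

    The weights (2.0 for a title match, 1/gap for proximity) are arbitrary
    assumptions chosen only to keep the example readable.
    """
    doc_terms = [stem(t) for t in tokenize(doc_text)]
    title_terms = {stem(t) for t in tokenize(title)}
    positions = {
        q: [i for i, t in enumerate(doc_terms) if t == q] for q in query_terms
    }
    score = 0.0
    for q in query_terms:
        score += len(positions[q])  # term frequency
        if q in title_terms:
            score += 2.0  # term attribute: the term appears in the title
    # Proximity: reward the closest occurrence of each adjacent query-term pair.
    for q1, q2 in zip(query_terms, query_terms[1:]):
        gaps = [abs(i - j) for i in positions[q1] for j in positions[q2] if i != j]
        if gaps:
            score += 1.0 / min(gaps)
    return score


def make_snippet(query_terms, doc_text, width=5):
    """Return the words surrounding the first query-term match."""
    words = doc_text.split()
    for i, word in enumerate(words):
        tokens = tokenize(word)
        if tokens and stem(tokens[0]) in query_terms:
            start = max(0, i - width)
            return " ".join(words[start : i + width + 1])
    # No match: fall back to the opening words of the document.
    return " ".join(words[: 2 * width + 1])
```

Calling `prepare_query("the ranking of pages")` drops "the" and "of" and reduces the remaining words to their stems, and `score_document` then gives a page more credit when those stems are frequent, appear in the title, and sit close together.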
Other relevance feedback algorithms might be used, such as:
….pseudo-relevance feedback algorithms based on a full document approach (pseudo relevance feedback based on a whole web page), Document Object Model (DOM) segmentation, Vision-based Page Segmentation (VIPS), conceptual relevance feedback using concept lattices, etc.
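As a rough illustration of the whole-page flavor of pseudo-relevance feedback, the sketch below assumes the top few results of a first ranking are relevant, counts the terms on those pages, and adds the most common new ones to the query. The function names, the `top_k` and `num_new_terms` parameters, and the plain term-counting approach are my own assumptions for the example; the patent application does not spell out this code.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_query(query_terms, ranked_docs, top_k=2, num_new_terms=2,
                 stop_words=frozenset()):
    """Expand a query with frequent terms from the top-ranked documents.

    ranked_docs is assumed to be a list of page texts, best result first.
    """
    counts = Counter()
    for doc in ranked_docs[:top_k]:
        counts.update(
            t for t in tokenize(doc)
            if t not in stop_words and t not in query_terms
        )
    new_terms = [term for term, _ in counts.most_common(num_new_terms)]
    return list(query_terms) + new_terms
```

For a query of `["jaguar"]` whose top two results are mostly about cars, the expanded query would pick up "car", which nudges a second pass toward the automotive sense of the word.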
The patent application
Inventors: Jeffrey Adgate Dean, Paul G. Haahr, Olcan Sercinoglu, and Amitabh K. Singhal
US Patent Application 20060036593
Filed: August 13, 2004
Published: February 16, 2006
A multi-stage query processing system and method enables multi-stage query scoring, including “snippet” generation, through incremental document reconstruction facilitated by a multi-tiered mapping scheme. At one or more stages of a multi-stage query processing system a set of relevancy scores are used to select a subset of documents for presentation as an ordered list to a user. The set of relevancy scores can be derived in part from one or more sets of relevancy scores determined in prior stages of the multi-stage query processing system. In some embodiments, the multi-stage query processing system is capable of executing one or more passes on a user query, and using information from each pass to expand the user query for use in a subsequent pass to improve the relevancy of documents in the ordered list.
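The abstract's idea of multiple passes can be sketched in a few lines of Python. In this toy version, a first pass scores every document and keeps a subset, terms from that subset expand the query, and a second pass rescores the subset with a score derived in part from the first-pass score. The term-frequency scoring, the `keep` and `prior_weight` parameters, and the function names are all assumptions made for illustration, and the sketch leaves out the multi-tiered mapping scheme and document reconstruction that the abstract mentions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_frequency_score(query_terms, doc):
    """Count how often the query terms occur in the document."""
    counts = Counter(tokenize(doc))
    return float(sum(counts[t] for t in query_terms))


def multi_stage_search(query_terms, docs, keep=2, prior_weight=0.5):
    """Run two scoring passes, expanding the query between them."""
    # Stage 1: score every document and keep only the best few.
    stage1 = {i: term_frequency_score(query_terms, d) for i, d in enumerate(docs)}
    survivors = sorted(stage1, key=lambda i: stage1[i], reverse=True)[:keep]

    # Use the surviving documents to expand the query with one new term.
    counts = Counter(
        t for i in survivors for t in tokenize(docs[i]) if t not in query_terms
    )
    expanded = list(query_terms) + [t for t, _ in counts.most_common(1)]

    # Stage 2: rescore the survivors; the new score is derived in part
    # from the stage 1 score (prior_weight is an arbitrary assumption).
    stage2 = {
        i: prior_weight * stage1[i] + term_frequency_score(expanded, docs[i])
        for i in survivors
    }
    order = sorted(survivors, key=lambda i: stage2[i], reverse=True)
    return expanded, [(i, stage2[i]) for i in order]
```

The second pass only looks at the documents that survived the first, which is the practical appeal of a staged design: the more expensive scoring runs over a much smaller set.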