Google looks at multi-stage query processing

Determining how a term or phrase may be used in the context of a page can be helpful in deciding how relevant that page is in responding to a query from a searcher.

A Google patent application published this week looks at possible ways of considering the context of those words, and describes a multi-stage process for determining relevancy and finding results for a search.

[Figure: Multi-Stage Query Processing Description Flowchart]

The document is fairly complex, but some possible actions that can be taken during the different stages described are:

Stage 1:

a) Deletion of stop words.
b) Term stemming.
c) Expansion of queries to include things like synonyms and related terms that commonly co-occur with the query terms.
d) Computation of relevancy scores between the query and each document, using one or more scoring algorithms that look at factors such as:

….the presence or absence of query term(s), term frequency, Boolean logic fulfillment, query term weights, popularity of the documents (e.g., a query independent score of the document’s importance or popularity or interconnectedness), proximity of the query terms to each other, context, attributes, etc.
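To make the Stage 1 steps a little more concrete, here is a minimal Python sketch of stop word removal, stemming, query expansion, and a simple term frequency score. It is only an illustration under my own assumptions: the stop word list, the synonym table, the crude suffix-stripping stemmer, and the function names are all hypothetical, and the patent application does not say Google does it this way.

```python
import re

# Hypothetical stop list and synonym table, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "in", "to", "for", "and"}
SYNONYMS = {"car": ["auto"], "cheap": ["inexpensive"]}


def stem(term):
    """Very crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def prepare_query(query):
    """Apply steps a) to c): drop stop words, stem, then expand."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    terms = [t for t in terms if t not in STOP_WORDS]
    terms = [stem(t) for t in terms]
    expanded = list(terms)
    for t in terms:
        for synonym in SYNONYMS.get(t, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


def score(query_terms, document):
    """Step d): a plain term frequency score over stemmed document terms."""
    doc_terms = [stem(t) for t in re.findall(r"[a-z0-9]+", document.lower())]
    return sum(doc_terms.count(t) for t in query_terms)
```

A real system would combine many more signals than term frequency, such as the popularity and proximity factors listed in the quote above.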

Stage 2:

The adjacency and proximity of query terms within each document are used to rank documents.
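One way to picture a proximity signal is to reward documents where consecutive query terms appear close together. The sketch below is a simplified assumption of mine, not the scoring described in the patent application: it uses whitespace tokenization, and the inverse-distance formula and function names are hypothetical.

```python
def min_span(positions_a, positions_b):
    """Smallest distance between any occurrence of two terms."""
    return min(abs(a - b) for a in positions_a for b in positions_b)


def proximity_score(query_terms, document):
    """Sum 1/distance for each pair of consecutive query terms."""
    tokens = document.lower().split()
    positions = {
        term: [i for i, tok in enumerate(tokens) if tok == term]
        for term in query_terms
    }
    # A document missing any query term gets no proximity credit.
    if any(not found for found in positions.values()):
        return 0.0
    total = 0.0
    for first, second in zip(query_terms, query_terms[1:]):
        span = min_span(positions[first], positions[second])
        total += 1.0 / max(1, span)  # guard against repeated query terms
    return total
```

With this scoring, two query terms that sit right next to each other contribute 1.0, and the contribution shrinks as they drift apart.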

Stage 3:

Term attributes, such as whether terms are in titles, headings, metadata, and have certain font characteristics, are reviewed.
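A simple way to think about term attributes is as weights on where a term appears. In the sketch below, the field names and weight values are hypothetical placeholders of mine; the patent application does not give numbers, and font characteristics are left out for brevity.

```python
# Hypothetical weights; the patent application does not give values.
FIELD_WEIGHTS = {"title": 3.0, "heading": 2.0, "metadata": 1.5, "body": 1.0}


def attribute_score(query_terms, fields):
    """Score a document whose text is split into named fields.

    `fields` maps a field name (such as "title") to that field's text.
    """
    total = 0.0
    for name, text in fields.items():
        tokens = text.lower().split()
        weight = FIELD_WEIGHTS.get(name, 1.0)
        total += weight * sum(tokens.count(t) for t in query_terms)
    return total
```

Under these assumed weights, a match in a title counts three times as much as the same match in body text.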

Stage 4:

Generation of snippets to return with results.
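Snippet generation can be sketched as picking the stretch of a document that contains the most query terms. This is a rough illustration under my own assumptions (a fixed-width window of whitespace-separated tokens, and a hypothetical `make_snippet` function); the patent application describes snippets as being built through incremental document reconstruction, which this does not attempt to model.

```python
def make_snippet(query_terms, document, width=8):
    """Return the first `width`-token window with the most query term hits."""
    tokens = document.split()
    wanted = {t.lower() for t in query_terms}
    best_start, best_hits = 0, -1
    for start in range(max(1, len(tokens) - width + 1)):
        window = tokens[start:start + width]
        hits = sum(
            1 for tok in window if tok.lower().strip(".,;:!?") in wanted
        )
        if hits > best_hits:
            best_start, best_hits = start, hits
    return " ".join(tokens[best_start:best_start + width])
```

Ties go to the earliest window, so a document with no matching terms simply yields its opening words.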

Other relevance feedback algorithms might be used, such as:

….pseudo-relevance feedback algorithms based on a full document approach (pseudo relevance feedback based on a whole web page), Document Object Model (DOM) segmentation, Vision-based Page Segmentation (VIPS), conceptual relevance feedback using concept lattices, etc.
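The whole-document flavor of pseudo-relevance feedback can be sketched in a few lines: assume the top results from a first pass are relevant, and add their most frequent new terms to the query for the next pass. The function name, parameters, and default values below are my own assumptions for illustration.

```python
from collections import Counter


def expand_with_feedback(query_terms, ranked_docs, top_k=2, new_terms=2):
    """Add the most frequent non-query terms from the top-ranked documents."""
    counts = Counter()
    for doc in ranked_docs[:top_k]:
        counts.update(t for t in doc.lower().split() if t not in query_terms)
    return list(query_terms) + [t for t, _ in counts.most_common(new_terms)]
```

In practice, stop words would need to be filtered out first, or they would dominate the added terms.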

The patent application:

Multi-stage query processing system and method for use with tokenspace repository

Inventors: Jeffrey Adgate Dean, Paul G. Haahr, Olcan Sercinoglu, and Amitabh K. Singhal
US Patent Application 20060036593
Filed: August 13, 2004
Published: February 16, 2006

Abstract:

A multi-stage query processing system and method enables multi-stage query scoring, including “snippet” generation, through incremental document reconstruction facilitated by a multi-tiered mapping scheme. At one or more stages of a multi-stage query processing system a set of relevancy scores are used to select a subset of documents for presentation as an ordered list to a user. The set of relevancy scores can be derived in part from one or more sets of relevancy scores determined in prior stages of the multi-stage query processing system. In some embodiments, the multi-stage query processing system is capable of executing one or more passes on a user query, and using information from each pass to expand the user query for use in a subsequent pass to improve the relevancy of documents in the ordered list.
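The core idea in the abstract, that each stage rescores a smaller set of documents using scores carried over from earlier stages, might look something like the sketch below. The additive way of combining scores, the `keep` cutoffs, and the function name are assumptions of mine, not details from the patent application.

```python
def run_stages(documents, stages, keep=(100, 10)):
    """Run scoring stages in order, keeping fewer documents each time.

    `stages` is a list of functions mapping a document to a score, and
    `keep` says how many documents survive each stage.
    """
    scored = [(doc, 0.0) for doc in documents]
    for stage, limit in zip(stages, keep):
        # Carry the prior score forward and add this stage's score to it.
        scored = [(doc, prior + stage(doc)) for doc, prior in scored]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        scored = scored[:limit]
    return [doc for doc, _ in scored]
```

The appeal of this kind of design is that cheap scoring can run over many documents, while more expensive scoring only runs over the few that survive.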


4 thoughts on “Google looks at multi-stage query processing”

  1. The approach is quite interesting. But, I guess, Google must have modified and updated its algorithms to determine relevancy quite a bit since this patent application was published back in 2006.

  2. Hi Robert,

    Thanks.

    Google has changed a number of the things they do, and their approaches, since this patent application was published, but it’s quite possible that Google is using a multi-stage approach to processing queries that’s still similar to what is described in that document. I know they do treat stop words differently, and there may be some other stages as well now, but the basic idea, that queries can be expanded in a number of ways, such as finding appropriate synonyms, and so on, is probably still on point.
