If a large percentage of people searching for “NY travel” in a search engine choose a result titled “Airfare to New York City”, should the search engine start defining “NY” and “New York” to be synonyms?
Can a search engine learn from results that it provides to searchers? Can it make inferences about the relationships between different queries based upon the similarity of results that it returns, and the choices that people make when faced with those results?
How could a search engine be set up to take advantage of the histories of different queries that return similar results and yield similar choices from the searchers who enter them? A new Yahoo patent application explores one method for doing so.
Using matrix representations of search engine operations to make inferences about documents in a search engine corpus
Invented by Shyam Kapur
Assigned to Yahoo!
US Patent Application 20070094250
Published April 26, 2007
Filed: October 20, 2005
In a computer system including a search engine that receives queries and returns search results comprising zero or more hits from a document index, a method of post-processing queries and results comprising
collecting search sets, wherein a search set comprises a query and at least some set of the search results provided by the search engine in response to the query from a corpus,
storing the plurality of search sets in retrievable storage,
identifying an analysis set comprising at least two documents in the corpus to comparatively analyze,
retrieving from the retrievable storage search sets containing at least one document of the analysis set, thus obtaining a group of one or more search sets,
generating an inference between the documents in the analysis set based on which search sets occur in the group.
This process allows a study of a large number of queries and a large number of search results for those queries, and can help fine tune a search engine and the results that it returns to searchers.
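The steps in the claim can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not the method from the patent application: the queries, the document names, the function names, and the use of Jaccard overlap as the "inference" are all made up for the example.

```python
from collections import defaultdict

# Hypothetical search sets: each pairs a query with the documents returned for it.
search_sets = [
    ("ny travel", {"doc_airfare_nyc", "doc_ny_hotels"}),
    ("new york travel", {"doc_airfare_nyc", "doc_ny_hotels", "doc_nyc_guide"}),
    ("new york hotels", {"doc_ny_hotels"}),
    ("boston travel", {"doc_boston_guide"}),
]


def build_storage(sets):
    """Store search sets in a retrievable form, indexed by document."""
    storage = defaultdict(list)
    for query, docs in sets:
        for doc in docs:
            storage[doc].append((query, docs))
    return storage


def retrieve_group(storage, analysis_set):
    """Fetch every search set containing at least one analysis-set document."""
    group = {}
    for doc in analysis_set:
        for query, docs in storage.get(doc, []):
            group[query] = docs
    return group


def infer_relatedness(group, doc_a, doc_b):
    """Assumed inference: Jaccard overlap of the queries returning each document."""
    queries_a = {q for q, docs in group.items() if doc_a in docs}
    queries_b = {q for q, docs in group.items() if doc_b in docs}
    union = queries_a | queries_b
    return len(queries_a & queries_b) / len(union) if union else 0.0


storage = build_storage(search_sets)
analysis_set = {"doc_airfare_nyc", "doc_ny_hotels"}
group = retrieve_group(storage, analysis_set)
score = infer_relatedness(group, "doc_airfare_nyc", "doc_ny_hotels")
print(sorted(group), round(score, 3))
```

A higher score here only means the two documents tend to show up for the same queries. What a search engine would actually do with such a signal is not something this sketch tries to answer.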
The patent application itself is highly recommended reading. There’s an odd paragraph within it that offers advice to search engine optimizers on how they could use a method like the one described here to “reverse engineer” a search engine:
 Search Engine Optimizers (SEOs) are organizations that advise clients on having their pages more highly ranked in search engines. Some advice is legitimate (“become a respected source”, “keep each page focussed on a topic”) and some is not so legitimate (“add piles of keywords in hidden text”, “insert your competitor’s trademarks”), where legitimacy might relate to how much the searching public would up-rank a page if the advice was followed.
In either case, by performing post-search analysis using arrays of search inputs and outputs, SEOs can “reverse engineer” a search engine. Notably, even if the SEO does not have access to all of the millions of queries that pass through the search engine, it can generate a representative set of queries, apply those queries to the search engine and build a matrix of queries and results.
Studying and comparing search results for queries that could be related is an SEO best practice regardless of whether or not it’s suggested in a patent application like this one.
It’s not so much a question of “reverse engineering” a search engine as it is understanding the landscape on the web as it relates to terms that you might target for optimization efforts. Regardless, it’s an interesting thing to see in a patent application from one of the major search engines.
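For anyone curious what the "matrix of queries and results" from the quoted passage might look like in practice, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `results_by_query` data is invented rather than collected from any search engine, and comparing rows with cosine similarity is just one simple way to spot queries that return similar results.

```python
import math

# Hypothetical results gathered by running a representative set of queries.
results_by_query = {
    "ny travel": ["airfare-nyc", "ny-hotels"],
    "new york travel": ["airfare-nyc", "ny-hotels", "nyc-guide"],
    "boston travel": ["boston-guide"],
}


def build_matrix(results):
    """Build a binary matrix: one row per query, one column per document."""
    queries = sorted(results)
    documents = sorted({doc for docs in results.values() for doc in docs})
    matrix = [
        [1 if doc in results[query] else 0 for doc in documents]
        for query in queries
    ]
    return queries, documents, matrix


def cosine(u, v):
    """Cosine similarity between two rows; 0.0 if either row is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


queries, documents, matrix = build_matrix(results_by_query)
rows = dict(zip(queries, matrix))
ny_vs_new_york = cosine(rows["ny travel"], rows["new york travel"])
ny_vs_boston = cosine(rows["ny travel"], rows["boston travel"])
print(round(ny_vs_new_york, 3), round(ny_vs_boston, 3))
```

In this toy data, "ny travel" and "new york travel" share most of their results and so score close to 1, while "ny travel" and "boston travel" share none and score 0. That is the kind of overlap that might hint that "NY" and "New York" are being treated as related terms.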