Yahoo to Use Query Histories to Improve Search Results?


If a large percentage of people searching for “NY travel” in a search engine choose a result titled “Airfare to New York City”, should the search engine start defining “NY” and “New York” to be synonyms?

Can a search engine learn from results that it provides to searchers? Can it make inferences about the relationships between different queries based upon the similarity of results that it returns, and the choices that people make when faced with those results?

How could a search engine be set up so that it can take advantage of the histories of different queries that return similar results, and yield similar choices from the searchers who enter those queries? A new Yahoo patent application explores a method for approaching that result.

Using matrix representations of search engine operations to make inferences about documents in a search engine corpus
Invented by Shyam Kapur
Assigned to Yahoo!
US Patent Application 20070094250
Published April 26, 2007
Filed: October 20, 2005


In a computer system including a search engine that receives queries and returns search results comprising zero or more hits from a document index, a method of post-processing queries and results comprising

collecting search sets, wherein a search set comprises a query and at least some set of the search results provided by the search engine in response to the query from a corpus,

storing the plurality of search sets in retrievable storage,

identifying an analysis set comprising at least two documents in the corpus to comparatively analyze,

retrieving from the retrievable storage search sets containing at least one document of the analysis set, thus obtaining a group of one or more search sets,

generating an inference between the documents in the analysis set based on which search sets occur in the group.

This process allows a study of a large number of queries and a large number of search results for those queries, and can help fine tune a search engine and the results that it returns to searchers.
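The claimed steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method from the patent application: the sample queries, the document names, the function names `build_store` and `infer_relatedness`, and the choice of "share of retrieved search sets that contain every document in the analysis set" as the inference are all hypothetical.

```python
from collections import defaultdict

# Hypothetical search sets: each pairs a query with the documents returned for it.
search_sets = [
    ("ny travel", ["doc_airfare_nyc", "doc_ny_hotels"]),
    ("new york travel", ["doc_airfare_nyc", "doc_ny_hotels", "doc_nyc_guide"]),
    ("pizza recipes", ["doc_pizza_dough"]),
]


def build_store(search_sets):
    """Store search sets so they can be retrieved by any document they contain."""
    store = defaultdict(list)
    for query, results in search_sets:
        for doc in results:
            store[doc].append((query, results))
    return store


def infer_relatedness(store, analysis_set):
    """Retrieve the group of search sets that contain at least one document
    of the analysis set, then score how often the whole set appears together.

    The score is an assumed stand-in for the "inference" in the claim.
    Search sets are keyed by query, so repeated queries collapse into one.
    """
    group = {}
    for doc in analysis_set:
        for query, results in store.get(doc, []):
            group[query] = set(results)
    if not group:
        return 0.0, []
    wanted = set(analysis_set)
    shared = sorted(q for q, docs in group.items() if wanted <= docs)
    return len(shared) / len(group), shared


store = build_store(search_sets)
score, queries = infer_relatedness(store, ["doc_airfare_nyc", "doc_ny_hotels"])
print(score, queries)
```

In this toy data the two travel documents appear together in every search set that mentions either of them, which is the kind of signal that might suggest the queries "ny travel" and "new york travel" are related.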

The patent application itself is highly recommended reading. There’s an odd paragraph within it that offers advice to search engine optimizers on how they could use a method like the one described here to “reverse engineer” a search engine:

Reverse Engineering

[0051] Search Engine Optimizers (SEOs) are organizations that advise clients on having their pages more highly ranked in search engines. Some advice is legitimate (“become a respected source”, “keep each page focussed on a topic”) and some is not so legitimate (“add piles of keywords in hidden text”, “insert your competitor’s trademarks”), where legitimacy might relate to how much the searching public would up-rank a page if the advice was followed.

In either case, by performing post-search analysis using arrays of search inputs and outputs, SEOs can “reverse engineer” a search engine. Notably, even if the SEO does not have access to all of the millions of queries that pass through the search engine, it can generate a representative set of queries, apply those queries to the search engine and build a matrix of queries and results.
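As a rough idea of what "a matrix of queries and results" could look like, here is a minimal Python sketch. The queries, the `*.example` result names, and the helper names `build_matrix` and `row_overlap` are made up for illustration, and the Jaccard overlap used to compare rows is an assumption; the patent application does not prescribe a particular measure.

```python
# Hypothetical result lists gathered by running a representative set of queries.
results_by_query = {
    "ny pizza": ["a.example", "b.example", "c.example"],
    "new york pizza": ["a.example", "b.example", "d.example"],
    "chicago pizza": ["e.example"],
}


def build_matrix(results_by_query):
    """Build a binary matrix with one row per query and one column per document."""
    docs = sorted({d for results in results_by_query.values() for d in results})
    queries = sorted(results_by_query)
    matrix = [
        [1 if d in results_by_query[q] else 0 for d in docs]
        for q in queries
    ]
    return queries, docs, matrix


def row_overlap(row_a, row_b):
    """Jaccard overlap between two query rows: shared results / all results."""
    both = sum(1 for a, b in zip(row_a, row_b) if a and b)
    either = sum(1 for a, b in zip(row_a, row_b) if a or b)
    return both / either if either else 0.0


queries, docs, matrix = build_matrix(results_by_query)
ny = matrix[queries.index("ny pizza")]
new_york = matrix[queries.index("new york pizza")]
print(row_overlap(ny, new_york))
```

Rows with a high overlap point to queries that the search engine treats as closely related, while columns show which pages turn up across many related queries.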

Studying and comparing search results for queries that could be related is an SEO best practice regardless of whether or not it’s suggested in a patent application like this one.

It’s not so much a question of “reverse engineering” a search engine as it is understanding the landscape on the web as it relates to terms that you might target for optimization efforts. Regardless, it’s an interesting thing to see in a patent application from one of the major search engines.


11 thoughts on “Yahoo to Use Query Histories to Improve Search Results?”

  1. It is an interesting application, and one of the things I looked at while reading through it was some of the past patent applications that Shyam Kapur was involved in. In a number of ways this is related to some of them, such as Reranking and increasing the relevance of the results of Internet searches.

    One of the main differences is that the “Reranking” one tries to understand units or concepts in queries first, and then look at relationships between those queries with search results from them, with other units or concepts and search results. This one doesn’t necessarily focus upon finding the units or queries first, which I think is a good and helpful alternative way of exploring relatedness of queries with their associated search results.

    I am trying to understand the reason for including that paragraph in the patent application, and I’m wondering if it’s there in an attempt to exclude people from developing a commercial process that takes advantage of the processes described within the patent application from the perspective of providing SEO services.

  2. I like the ideas in the patent when it comes to helping search engines understand relationships between words, acronyms, etc. They’re obviously moving this way and I think it will only be a good thing the closer they can get to truly understanding how words, pages, sites are related to one another.

    You’re right though that it’s odd to see the reverse engineering part in a patent. It’s not exactly earth shattering news, but still strange to see a search engine mention it, let alone in a patent.

  3. One way to look at it might be as a way for a search engine to compare the results of similar queries, such as “New York Pizza” and “NY Pizza”.

    Should there be a lot of overlap for those results? Probably, since they likely mean the same thing. Clicks on certain results for each search might suggest that the search engine do a comparison, but it isn’t necessarily just the clicked upon result that might see a benefit.

  4. So sites would be ranked not just by links to the site and keywords on the site, but also by clicks from the queries. This is an interesting addition to search engine algorithms, but it will still only be useful if you happen to be one of the sites that ranks in the top few pages for a search term. Again, I see this as being somewhat limited.

  5. It’s definitely time for the search engines to start thinking of new ways to improve search results. I believe that personalized search results and other methods will be much more common in the future. It should also make things easier for beginners doing SEO, since it should open up more niches.

    Anyway, it will be interesting to see how this works out.

  6. I think this is great for business. Competition is great for business, and I’m glad to see Yahoo trying to do something about it. Search engines need to get better, and the next decade will be very interesting on the web. We’ve seen a lot of movement in the last year alone, and it’s interesting to see other companies (i.e. not Google) try things to claim back some market share.

  7. I agree with Mr. William Slawski’s answer:

    One way to look at it might be as a way for a search engine to compare the results of similar queries, such as “New York Pizza” and “NY Pizza”.

    In other words, Yahoo! saves and compares the queries, and saves and compares the search results.
    AI (artificial intelligence) is integrated by errors and results.
    (Excuse my bad English.)

  8. Hm, I see this was posted in 2007. I was wondering if you knew back then about Google and their behavioral collection? They analyze history like crazy too… who knows, maybe it’s one of their ranking factors.

  9. Hi Mark,

    Good question. There were pretty clear indications in patent filings going back a number of years that Google was considering the use of user behavior signals for ranking pages, including Google’s patent filing on information retrieval and historical data from March, 2005.

    Many of Google’s patent filings over the past few years that involve rankings for pages do seem to include some aspect that involves looking at searching and browsing behavior.

    There’s also some discussion, and a mention of the possibility of using actual web usage data, in one of the first papers on PageRank, The PageRank Citation Ranking: Bringing Order to the Web, in section 7.1, Estimating Web Traffic.
