
Why Sometimes Best Search Results aren't Always Top Search Results

When we talk about the results that show up in search engines, we usually do so in terms of how relevant and important those results are.

Sometimes the results we see, and the ones we don’t see, are influenced by other factors, such as steps that search engines take to reduce the amount of work needed to return results to searchers.

Using Two Tiers of Search Results

If a search potentially returns thousands of results, and people only look at the first few pages of those results, it would make sense for a search engine to serve results in batches, and perhaps to initially use only a modified (and much smaller) version of its database to answer search queries.

A first index tier may have a number of potential results pruned from it, so that only the documents most likely to be returned as top answers to searches are kept. The first batch of results returned to searchers may be taken from this pruned index.

While this approach allows a search engine to return results for a search quickly, the first pages of results may miss some documents that should have been included, because those documents weren’t in the top tier of the index. They end up appearing behind the pages that are returned first.
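To make the trade-off concrete, here is a minimal sketch of a two-tier setup. It is not the method from any particular search engine: the toy scoring (raw term counts), the pruning rule (keep the best `keep` postings per term), and every name in it are assumptions chosen only for illustration.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to (doc_id, score) postings, highest score first."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        words = text.lower().split()
        for term in set(words):
            # Toy relevance score: raw term frequency (an assumption).
            index[term].append((doc_id, words.count(term)))
    for postings in index.values():
        postings.sort(key=lambda p: (-p[1], p[0]))
    return dict(index)

def prune_index(full_index, keep):
    """Hypothetical first tier: keep only the `keep` best postings per term."""
    return {term: postings[:keep] for term, postings in full_index.items()}

def top_k(index, query, k):
    """Sum per-term scores for each document and return the k best doc ids."""
    totals = defaultdict(int)
    for term in query.lower().split():
        for doc_id, score in index.get(term, []):
            totals[doc_id] += score
    return sorted(totals, key=lambda d: (-totals[d], d))[:k]

# Made-up documents: "c" is the best overall match for "pruned index",
# but it is never the single best match for either term on its own.
docs = {
    "a": "pruned pruned pruned search",
    "b": "index index index search",
    "c": "pruned pruned index index",
}
full_index = build_index(docs)
pruned_index = prune_index(full_index, keep=1)

print(top_k(full_index, "pruned index", 1))    # ['c']
print(top_k(pruned_index, "pruned index", 1))  # ['a']
```

In this toy data the pruned tier answers with document "a" while the full index would have answered with "c", which is the kind of missed result described above.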

A new paper from Alexandros Ntoulas of Microsoft and Junghoo Cho of UCLA, Pruning Policies for Two-Tiered Inverted Index with Correctness Guarantee, looks at avoiding “any degradation of result quality due to the pruning-based performance optimization, while still realizing most of its benefit.”

Adding a Correctness Guarantee

The paper provides suggestions for search engines on how they could use a “correctness guarantee” to make sure that top results are included in the pruned index:

How can we avoid the potential degradation of search quality under the two-tier architecture? Our basic idea is straightforward: We use the top-k result from the p-index only if we know for sure that the result is the same as the top-k result from the full index.

The problem with that approach is that calculating the top results from both the pruned index and the full index is more work than just calculating the top results from the full index. Of course, that correctness guarantee doesn’t need to be run every time someone searches for a particular query, and that’s where there are potential savings in computational resources.
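As a rough illustration of that last point, the sketch below checks a query against both tiers once, remembers whether they agreed, and afterwards answers from the pruned tier only when the check passed. This is a simplification based on the description above, not the paper's actual pruning or verification policies; the class, the lookup tables, and the document ids are all hypothetical.

```python
# Hypothetical precomputed rankings standing in for the two index tiers.
PRUNED = {"seo": ["d1", "d2"], "index": ["d7"]}
FULL = {"seo": ["d1", "d2", "d3"], "index": ["d4", "d7"]}

def search_pruned(query, k):
    return PRUNED.get(query, [])[:k]

def search_full(query, k):
    return FULL.get(query, [])[:k]

class TwoTierSearch:
    def __init__(self, pruned_fn, full_fn):
        self.pruned_fn = pruned_fn
        self.full_fn = full_fn
        self.verified = {}     # (query, k) -> did both tiers agree?
        self.full_lookups = 0  # how often a search needed the full index

    def verify(self, query, k):
        """Expensive step, run occasionally: compare both tiers once."""
        agree = self.pruned_fn(query, k) == self.full_fn(query, k)
        self.verified[(query, k)] = agree
        return agree

    def search(self, query, k):
        """Use the pruned tier only when it is known to match the full one."""
        if self.verified.get((query, k), False):
            return self.pruned_fn(query, k)
        self.full_lookups += 1
        return self.full_fn(query, k)

engine = TwoTierSearch(search_pruned, search_full)
engine.verify("seo", 2)    # tiers agree on the top 2
engine.verify("index", 1)  # tiers disagree on the top 1

print(engine.search("seo", 2))    # ['d1', 'd2'], served from the pruned tier
print(engine.search("index", 1))  # ['d4'], falls back to the full index
print(engine.full_lookups)        # 1
```

The cost of `verify` is paid once per query rather than on every search, so the savings depend on how often the same verified query is repeated.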

The paper delves into how often a correctness guarantee should be run for different queries, and policies for pruning certain keywords and documents.

It’s a nice discussion of how a search engine’s inverted index may be managed and optimized. It also covers the assumptions that the authors make concerning how modern commercial search engines rank documents.


3 comments to Why Sometimes Best Search Results aren’t Always Top Search Results

  • Todd Harrison

    I’ve gone down the whole list of search results and have found myself hitting a dead end when I get to about a thousand results. So are you talking along the lines of the first 200 results, or are you thinking of the first 3 pages?

  • Hi Todd,

    A very good question. This paper really is about “search engine optimization” where the thing being optimized is a search engine itself.

    It’s about efficient use of the resources available to the search engine, and the likelihood that people will look at search results past the first page.

    There is a statement in the paper that surprised me a little:

    A recent study [16] indicated that approximately 80% of the users examine at most the first 3 batches of the results. That is, 80% of the users typically view at most 30 to 60 results for every query that they issue to a search engine.

    If most people (80%) performing searches are looking at the first three pages of results, it might make sense to try to come up with this first tier of database results for the first 30-60 results, where there are 30-60 results that are relevant for specific queries. It might also make sense to only do something like this for the most popular of results, and only include them within such a first tier if people are actually searching for them.

    So, whether there’s a first tier of results may depend upon how often certain queries are searched for. It might depend upon whether there are even 30-60 results for some queries (sometimes there aren’t). The number of results that might be included may depend upon the query itself. If more results are needed, it is always possible to dip into the second tier to retrieve those.

  • [...] For a more technical look into these types of issues, Bill Slawski covers a lot of search patents on his blog. If you’re interested, have a read of his post – Why Sometimes Best Search Results aren’t Always Top Search Results. [...]
