Making Search More Efficient: A little about caching and prefetching

I hate doing the very same task over, and over, and over again. I’d bet search engines do, too.

Increasing the efficiency of search

Chances are that your search engine of choice, whether it’s Google, Yahoo!, MSN, or another, uses some methods to make what it does a little more efficient, and a little less costly to run.

Popular searches are, by definition, the ones that lots of folks run. If a search engine processed every search request as if it were a new one, grabbing results from its index or indexes even for searches that are repeated often, it would be reinventing the wheel frequently. But what if there were a way to speed that up?

What if that method created a significant savings in terms of time and processing power? What if it didn’t do a full search for the most popular terms over and over?

Caching results pages from popular searches

One method might involve using a cache file, like the cache file that your browser uses to store temporary internet pages. The idea behind the browser cache is that if you return to a page, your computer doesn’t have to ask for the page all over again. It can instead just grab the local copy of the page that is in the cache file.

So, imagine a search engine doing the same thing. It has a separate computer, or set of computers, holding cache files. When someone enters a word or phrase into the search engine, a look around in those cache files may be the first stop, before the search engine does a lookup in its index.

If a set of results for the phrase or word searched for is there, the search engine could serve it to you. If it isn’t, the search engine might then perform a full search. That could save some processing power, and some time.
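To make that lookup order concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular search engine does it: the `ResultCache` class, the `fake_full_search` function standing in for the index lookup, and the idea of normalizing case and spacing before checking the cache are all my own assumptions.

```python
from typing import Callable, Dict, List


class ResultCache:
    """Check a cache first; fall back to a full search on a miss."""

    def __init__(self, full_search: Callable[[str], List[str]]) -> None:
        self._full_search = full_search  # hypothetical stand-in for the index lookup
        self._cache: Dict[str, List[str]] = {}
        self.hits = 0
        self.misses = 0

    def search(self, query: str) -> List[str]:
        # Assumption: queries differing only in case or spacing share one entry.
        key = " ".join(query.lower().split())
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        # Cache miss: do the expensive search once, then remember the results.
        self.misses += 1
        results = self._full_search(key)
        self._cache[key] = results
        return results


def fake_full_search(query: str) -> List[str]:
    # Placeholder for the expensive lookup in the real index.
    return [f"result {i} for {query}" for i in range(1, 4)]


engine = ResultCache(fake_full_search)
engine.search("search engines")    # miss: runs the full search
engine.search("Search  Engines")   # hit: served from the cache
print(engine.hits, engine.misses)  # 1 1
```

A real cache would also need some way to expire entries when the index changes, which this sketch leaves out.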

Now keep in mind that the most popular searches can account for a large percentage of the requests that a search engine gets. If you look somewhere like the Google Zeitgeist pages, you can see some of the terms that the search engine might be keeping in a cache.
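If you had a log of queries, you could estimate how much of the traffic a cache of the most popular searches might cover. The sketch below is one hedged way to do that in Python; the `top_k_coverage` function name and the tiny query log are made up for illustration and are not real search data.

```python
from collections import Counter
from typing import Iterable


def top_k_coverage(queries: Iterable[str], k: int) -> float:
    """Return the share of requests accounted for by the k most common queries."""
    counts = Counter(queries)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(count for _, count in counts.most_common(k))
    return covered / total


# Made-up log: two popular queries and two rare ones.
log = ["weather"] * 5 + ["news"] * 3 + ["rare query a", "rare query b"]
print(top_k_coverage(log, 2))  # 0.8
```

In this toy log, caching just two queries would cover eight of the ten requests.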

Deciding how many results to cache

This is a tricky area. The efficiency that is gained by caching some results pages for popular searches could be lost by not caching enough results, or by caching too many. So how do you figure out how many results to cache? Well, you could look at how people use the search engines. If people typically only look at the first page of results (ten links, with titles and snippets) then you might only want to keep track of those ten results for a certain query.
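Here is a rough sketch of that tradeoff in Python, assuming ten results per page. The `FirstPagesCache` class, its `pages_to_cache` setting, and the `fake_index_search` function are hypothetical names of my own; the point is only that requests past the cached depth fall through to a full search.

```python
from typing import Callable, Dict, List

PAGE_SIZE = 10  # assumption: ten results per results page


class FirstPagesCache:
    """Cache only the first few results pages of each query."""

    def __init__(self, full_search: Callable[[str], List[str]],
                 pages_to_cache: int = 1) -> None:
        self._full_search = full_search  # hypothetical stand-in for the index lookup
        self._limit = pages_to_cache * PAGE_SIZE
        self._cache: Dict[str, List[str]] = {}
        self.full_searches = 0

    def get_page(self, query: str, page: int) -> List[str]:
        start = (page - 1) * PAGE_SIZE
        end = start + PAGE_SIZE
        # Serve from the cache only if the whole page lies within the cached depth.
        if end <= self._limit and query in self._cache:
            return self._cache[query][start:end]
        # Otherwise run the full search and keep just the first pages.
        self.full_searches += 1
        results = self._full_search(query)
        self._cache.setdefault(query, results[: self._limit])
        return results[start:end]


def fake_index_search(query: str) -> List[str]:
    # Placeholder for the expensive lookup in the real index.
    return [f"{query} result {i}" for i in range(1, 31)]


cache = FirstPagesCache(fake_index_search)
cache.get_page("weather", 1)  # miss: full search, first page cached
cache.get_page("weather", 1)  # hit: served from the cache
cache.get_page("weather", 2)  # beyond the cached depth: full search again
print(cache.full_searches)    # 2
```

Raising `pages_to_cache` saves more full searches for people who click through to later pages, at the cost of storing more results per query.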

A paper that looks like it was written in 2004 describes how to make a search engine run more efficiently. One of the authors’ pages notes that it will be published in 2006 in the ACM Transactions on Information Systems, Vol. 24, n. 1, January 2006.

It’s impossible to tell, without seeing that issue, whether the version that appears there will be the same as the one that appears here: