


  1. Hi Danny,

    Thanks for your comment. This post was meant mostly as a simple restatement of what search engines do when responding to queries, something that might not be obvious to everyone who uses them.

    When I wrote it, I didn’t think I was covering any groundbreaking new territory. Rather, I saw it as possibly laying the foundation for some future posts about the mechanics behind what we see when a search engine is in action.

    With this type of caching and prefetching, things start to get interesting with the replacement policies that determine when results in a cache (or caches) are refreshed, and when new documents replace old ones in results pages.

    I’ve written a couple of recent posts about patent applications describing predictive queries, and how they are used in places like Google Suggest, and possibly on mobile devices to assist in the entry of data on smaller or constrained keyboards, or by the use of a stylus.

    When I first looked at Google Suggest, I remember being impressed by the Ajax interface that updated the suggestions shown, without giving much thought to where those suggestions came from. The patent application describing how it might work also gave us some insight into the source of those results, and it covered caching and prefetching of results pretty well.

    I’ve trained a few people who had extensive web design backgrounds but little experience with how search engines work, and I’ve seen them slap their foreheads when I mentioned something like this, saying it was so obvious that they should have realized it was going on.

  2. Bill, to my knowledge every major search engine already caches results to some degree and has long done so. My understanding is that they tend not to hit disk unless they have a good reason to. As I understand it, the popular queries that come up over and over are served from memory, and they’ve got ways to ensure those cached results are refreshed when needed.

  3. Hi folks,

    I’m one of the authors of the paper…

    I’d just like to point out a few items that will, I hope, clarify the picture.

    Our SDC policy is a policy for deciding whether an entry in the cache should be evicted. The key point of SDC is that it uses usage statistics to fill a static section of the cache with the pages most likely to be requested in the future. Unlike previous policies, this section doesn’t change, which allows us to keep up with queries that are submitted frequently but not at high rates… The dynamic section, which is a classic cache, instead handles queries that are submitted at a high rate.

    About the refresh rate of the static set: in the version of the paper that will appear in the journal (which is a little different from the tech report linked in the article), we also measured how often the static set should be refreshed, and we have nice results on that. Anyone curious can e-mail me and I’ll send a copy of the camera-ready paper.

  4. Thank you, Dr. Silvestri.

    The static section of the cache that the paper describes did sound like something unique to me. I appreciate your taking the time to point that out here. I’ll likely be in touch soon to hear more about your results.
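Comment 3 describes SDC as splitting the cache into a static section, filled once from usage statistics and never evicted, and a dynamic section run as a classic cache. The Python below is a minimal sketch of that split, not the paper's implementation. It assumes that the "usage statistics" are plain query frequencies from a past log and that the dynamic section uses LRU eviction. The names `StaticDynamicCache`, `static_fraction`, and `backend` are hypothetical, and refreshing the static set is not modeled.

```python
from collections import Counter, OrderedDict
from typing import Callable, Dict, Iterable, List

Results = List[str]  # hypothetical: a results page as a list of document ids


class StaticDynamicCache:
    """Sketch of a static + dynamic query-results cache.

    Assumptions (not from the paper): static entries are the most frequent
    queries in a past log, and the dynamic section evicts with LRU.
    """

    def __init__(
        self,
        capacity: int,
        static_fraction: float,
        training_log: Iterable[str],
        backend: Callable[[str], Results],
    ) -> None:
        if capacity < 0 or not 0.0 <= static_fraction <= 1.0:
            raise ValueError("need capacity >= 0 and 0 <= static_fraction <= 1")
        n_static = int(capacity * static_fraction)
        self.dynamic_capacity = capacity - n_static
        self.backend = backend
        # Static section: filled once from usage statistics and never evicted.
        top_queries = Counter(training_log).most_common(n_static)
        self.static: Dict[str, Results] = {q: backend(q) for q, _ in top_queries}
        # Dynamic section: a classic cache, here LRU (oldest entry first).
        self.dynamic: "OrderedDict[str, Results]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, query: str) -> Results:
        if query in self.static:
            self.hits += 1
            return self.static[query]
        if query in self.dynamic:
            self.hits += 1
            self.dynamic.move_to_end(query)  # mark as most recently used
            return self.dynamic[query]
        # Miss: compute the results page and admit it to the dynamic section.
        self.misses += 1
        results = self.backend(query)
        if self.dynamic_capacity > 0:
            self.dynamic[query] = results
            if len(self.dynamic) > self.dynamic_capacity:
                self.dynamic.popitem(last=False)  # evict least recently used
        return results
```

Because static entries are never evicted, a query that recurs steadily in the log stays cached even when a burst of one-off queries churns through the LRU section, which matches the division of labor described in the comment.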