Can looking at how many times rare words appear in a search engine’s index give us an idea of the size of that search engine’s database?
About a week ago, I wrote about some of the most common English words in the indexes for Google, Yahoo, Bing, Ask, and Google Caffeine. I took a look at 50 words that are amongst the most frequently appearing words in English, and estimates from those search engines about the number of times that those words showed up.
Comparing the number of results across the different search engines for those common words really didn’t tell us anything about the relative sizes of their indexes, for a number of reasons.
One is that the numbers of results shown are only rough estimates. It’s also possible that the way those estimates are calculated differs considerably from one search engine to another. Some of the pages listed among those results are likely duplicate pages at different URLs, or may contain misspellings of the words. Some of the words may also be abbreviations or acronyms (such as “it” standing for information technology).