A Peek into Google’s Desktop Search Indexing Algorithms

Google’s Desktop Search is probably better known for its mix of features and gadgets than for its ability to index content on a computer or in network directories. There’s also an Enterprise edition that lets a company share the use of desktop search.

Most of what I’ve seen written about this Google search focuses more on the add-ons and the way the program looks than on how it indexes. The official Inside Google Desktop blog is also a gadget-heavy look at Google Desktop Search.

If you’d like a little peek under the hood, at how the program may go about indexing your content, three new patent applications from Google provide some details.

These patent filings are closely related to each other, which means that there’s a considerable amount of overlap in the content of their detailed descriptions and backgrounds.

The first one describes a ranking algorithm which shows results based not only on relevance to a search, but also frequency of use – on the assumption that documents and information that are viewed frequently tend to be what someone will be looking for.

Temporal ranking scheme for desktop searching
US Patent Application 20070043704
Published February 22, 2007
Invented by Susannah Raub, Adam Dingle, and Daisy Stanton
Filed: August 19, 2005

Abstract

A system for searching an object environment includes harvesting and indexing applications to create a search database and one or more indexes into the database. A scoring application determines the relevance of the objects, and a querying application locates objects in the database according to a search term. One or more of the indexes may be implemented by a hash table or other suitable data structure, where algorithms provide for adding objects to the indexes and searching for objects in the indexes. A ranking scheme sorts searchable items according to an estimate of the frequency that the items will be used in the future. Multiple indexes enable a combined prefix title and full-text content search of the database, accessible from a single search interface.
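To make the idea of ranking by expected future use a little more concrete, here is a minimal sketch in Python. The abstract doesn't say how the estimate is calculated, so the exponential decay of past accesses used here is purely my assumption, and the names (estimated_future_use, rank_matches, half_life_days) are hypothetical rather than anything taken from the filing.

```python
import math


def estimated_future_use(access_days, today, half_life_days=7.0):
    """Guess at future use: each past access counts for less as it ages.

    ASSUMPTION: exponential decay is an illustrative choice, not the
    formula from the patent application. Times are measured in days.
    """
    decay = math.log(2) / half_life_days
    return sum(math.exp(-decay * (today - day))
               for day in access_days if day <= today)


def rank_matches(matches, access_log, today):
    """Sort objects that already match the query, most likely to be used first."""
    return sorted(
        matches,
        key=lambda name: estimated_future_use(access_log.get(name, []), today),
        reverse=True,
    )


# Hypothetical access history: the day numbers on which each file was opened.
access_log = {
    "budget.xls": [1, 2, 29, 30],
    "old_notes.txt": [1, 2, 3, 4, 5],
    "todo.txt": [30],
}
ranked = rank_matches(list(access_log), access_log, today=30)
```

In this example a file opened once today outranks one opened five times a month ago, which is the kind of behavior a ranking based on frequency and recency of use would aim for.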

The second discusses how the search is an incremental one, in that as you begin typing a query, it provides results. As you continue to type your search term, the list of objects that match the query terms gets smaller, and you may see the search result you are looking for before you’ve completely typed your search terms.

Data structure for incremental search
US Patent Application 20070043750
Published February 22, 2007
Invented by Adam Dingle
Filed: August 19, 2005
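Here is a rough sketch of how an incremental search like this could work, using a hash table keyed on word prefixes. It is only an illustration under my own assumptions: the PrefixIndex class and its methods are hypothetical, and the data structure described in the patent application may well differ.

```python
from collections import defaultdict


class PrefixIndex:
    """Hash table from every prefix of every title word to matching titles.

    ASSUMPTION: a hypothetical structure for illustration only.
    """

    def __init__(self):
        self._table = defaultdict(set)

    def add(self, title):
        # Store the title under each prefix of each of its words,
        # so "report" is reachable from "r", "re", "rep", and so on.
        for word in title.lower().split():
            for end in range(1, len(word) + 1):
                self._table[word[:end]].add(title)

    def search(self, typed):
        # Every typed term must be a prefix of some word in the title.
        terms = typed.lower().split()
        if not terms:
            return set()
        results = None
        for term in terms:
            hits = self._table.get(term, set())
            results = set(hits) if results is None else results & hits
        return results


index = PrefixIndex()
for title in ["Quarterly Report", "Quarantine Notes", "Query Log"]:
    index.add(title)

# The result list shrinks as more of the query is typed.
for typed in ["qu", "qua", "quart"]:
    print(typed, sorted(index.search(typed)))
```

Each lookup is a single hash table access per typed term, which is what keeps results coming back quickly with every keystroke. The price is a larger index, since each word is stored once per prefix.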

The third talks about having two separate indexable databases – one for titles alone, and another for the full text of documents/objects – and how both of these indexes are searched and results from them are merged together.

One reason for the two separate databases is that when a search system returns incremental results, and a searcher has only typed in partial words so far, it might make it easier for a searcher to find what they are looking for if only titles were displayed in results for those partial queries.

Displaying results from the full text index for partial words might present too many results, and negate the speed advantage that the incremental search provides.

Combined title prefix and full-word content searching
US Patent Application 20070043714
Published February 22, 2007
Invented by Susannah Raub, Adam Dingle, and Daisy Stanton
Filed: August 19, 2005
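The two-index idea can be sketched in a few lines of Python. This is an illustration under my own assumptions, not the method from the filing: the function names are hypothetical, and listing title matches ahead of content matches is my choice for the merge step, since the description above only says that the results are merged.

```python
def build_indexes(docs):
    """Build a title index keyed on word prefixes and a content index
    keyed on whole words. `docs` maps a document id to (title, body)."""
    title_prefixes = {}
    content_words = {}
    for doc_id, (title, body) in docs.items():
        for word in title.lower().split():
            for end in range(1, len(word) + 1):
                title_prefixes.setdefault(word[:end], set()).add(doc_id)
        for word in body.lower().split():
            content_words.setdefault(word, set()).add(doc_id)
    return title_prefixes, content_words


def combined_search(term, title_prefixes, content_words):
    """Merge hits from both indexes without listing a document twice.

    ASSUMPTION: title matches are listed before content matches.
    """
    term = term.lower()
    title_hits = title_prefixes.get(term, set())
    content_hits = content_words.get(term, set()) - title_hits
    return sorted(title_hits) + sorted(content_hits)


# Hypothetical documents: id -> (title, body text)
docs = {
    1: ("Invoice March", "payment due for hosting"),
    2: ("Hosting plan", "invoice attached"),
    3: ("Notes", "remember the hosting renewal"),
}
titles, contents = build_indexes(docs)
partial = combined_search("host", titles, contents)
full = combined_search("hosting", titles, contents)
```

With the partial query "host", only the document with a matching title comes back, because the content index holds whole words only. Once the full word "hosting" is typed, the documents that mention it in their text are added after the title match.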

Keep in mind that, as with any patent filing, what was ultimately developed may differ somewhat from the descriptions in the patent applications. But looking at patents can be a good way to understand some of the goals, underlying assumptions, and philosophies behind the development of a technology.


8 thoughts on “A Peek into Google’s Desktop Search Indexing Algorithms”

  1. I love Google desktop search. I use it every day, mainly for finding serial numbers, phone numbers and passwords embedded in e-mails.

    It is far better than the search function in Microsoft Outlook.

  2. Thanks, Brisbane SEO

    I haven’t tried Google’s desktop search yet, but I agree with you on the search function in Microsoft Outlook. It’s one of the clunkiest applications that I think I’ve ever used.

  3. Yeah I too prefer google’s desktop search indexing over microsoft’s counterpart “windows desktop search” cuzz the cpu usage isnt as high when its actively indexing. Some times I feel like microsoft just pushes stuff out like that for the sake of having a competing product. *shrugs*

  4. Hi Adrian,

    I gave Google’s Desktop Search a try for a few days, and ended up removing it. It felt too intrusive for me – that may have been my own skepticism or paranoia influencing the decision, but I couldn’t keep it on my computer, thinking about all of the potential ways that Google has of collecting data, and wondering if I had given them one more portal into data about me.

    I also tried Microsoft’s desktop search, and it never quite installed correctly. :(

  5. I tend to use everything Google, both for personal and business use, but the Desktop Search is one of the rare Google products that I’m really hesitant about. It’s one thing to use Google online, it’s another thing to give them unrestricted access to my computer.

    On a related note, I’m also resisting Google Chrome. Don’t get me wrong, Google is good, but I also believe that this is a perfect example of “too much of a good thing”

  6. Hi Joe,

    I tried out Google’s desktop search, but didn’t keep it for very long. I also tried out Google Chrome when it first came out, but ran into some memory management issues, where it seemed that Chrome was using up too much memory. I don’t know if I’ll try out Desktop search again (I felt it was pretty invasive as well), but I may try Chrome again at some point in the future.
