Google on Desktop Search and Personal Information Management

You sit down at your computer, start working on a document, and visit the Web to find information.

A program on your computer considers the way that you move your mouse and the speed at which you type, and recognizes you as one of the people who use that computer. It then looks through your past computing sessions to see what kinds of things you are interested in, which web pages you may have visited, which documents you’ve printed, and whether you prefer HTML or PDF documents when given a choice.

It keeps track of what you are writing, what you are reading, where you are visiting online, whom you are emailing and messaging, what you are bookmarking, and it creates keywords based upon those contexts, which it uses to conduct searches. It shows the results of those searches in a query box, and pays attention to what you select to see and what you ignore.

If you choose to view some of the search results produced from your activities and ignore others, that may influence how results from keywords created from those activities are ranked.

Searches on Implicit Queries from Your Activities

The kinds of events that might trigger the creation of keywords, and the display of search results from those keywords, could include:

  • Words you have recently typed,
  • Words that appear in a document you are reading or writing,
  • Words you have bolded, highlighted, italicized, changed the font color of, used as a heading, or otherwise selected,
  • Words that you’ve placed into the clipboard,
  • Words in your documents identified as proper names,
  • Click-throughs in search results shown in this query box,
  • Words that you’ve actually searched for in a search engine, or
  • Any other types of keyword that the system is able to identify.

These types of use and selection, as well as other factors, may determine how highly such implicitly created keywords and results are ranked compared to others. Some of those other factors might be:

  • Term frequencies,
  • Frequency of document use,
  • Keyword ranking scores,
  • Recently typed words,
  • Highly used terms in an entire document,
  • Highly used terms in a selected portion of a document,
  • Words surrounding a cursor,
  • People’s names, including those who appear in email and IM contacts,
  • Capitalization of certain words,
  • How frequently words appear in different documents accessed or created or both.
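
To make the idea concrete, here is a minimal Python sketch of how signals like these might be combined into a single score for each candidate keyword. The signal names, the weights, and the `score_keywords` function are my own assumptions for illustration; the patent filings do not spell out a formula.

```python
from collections import Counter

# Hypothetical weights for a few of the signals listed above.
# These numbers are illustrative assumptions, not values from the patent filings.
SIGNAL_WEIGHTS = {
    "recently_typed": 2.0,
    "near_cursor": 1.5,
    "proper_name": 1.0,
    "capitalized": 0.5,
}


def score_keywords(document_words, signals):
    """Score candidate keywords by term frequency plus signal bonuses.

    document_words: list of words in the active document.
    signals: dict mapping a signal name to the set of lowercase words it applies to.
    Returns a list of (word, score) pairs, highest score first.
    """
    term_counts = Counter(word.lower() for word in document_words)
    scores = {}
    for word, count in term_counts.items():
        bonus = sum(
            weight
            for name, weight in SIGNAL_WEIGHTS.items()
            if word in signals.get(name, set())
        )
        scores[word] = count + bonus
    # Sort by descending score, then alphabetically so ties are stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


words = ["Patent", "search", "desktop", "search", "Google", "patent"]
signals = {
    "recently_typed": {"google"},
    "proper_name": {"google"},
    "near_cursor": {"desktop"},
}
ranked = score_keywords(words, signals)
```

In this toy example "google" appears only once, but it outranks the more frequent "patent" and "search" because it was recently typed and is a proper name. A simple additive score is only one possible choice; the filings leave the weighting open.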

Back in September, I wrote a post on Google Patent on Desktop Search and Implicit Queries Based Upon Active Documents. The following two patent applications, published this week, indicate that they are related to that patent.

Systems and methods for ranking implicit search results
Invented by Niniane Wang and Stephen R. Lawrence
US Patent Application 20070276829
Published November 29, 2007
Filed March 31, 2004

Abstract

Systems and methods for ranking implicit search queries are described. In one embodiment a method comprising receiving an event, the event comprising user interaction with an article on a client device, wherein the article is capable of being associated with at least one of a plurality of client applications, extracting at least one keyword from the event, generating a query based at least in part on the at least one keyword, performing a search based at least in part on the query to determine a result set, wherein the result set comprises one or more article identifiers associated with articles comprising the at least one keyword, and determining a ranking for each of the one or more article identifiers comprising the result set is described.
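
The steps in that abstract (receive an event, extract keywords, generate a query, search, rank) can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions: the `Event` fields, the stopword list, the in-memory `index`, and ranking by number of matching keywords are all hypothetical and not taken from the patent application.

```python
from dataclasses import dataclass

# A tiny stopword list, chosen only for this example.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for"}


@dataclass
class Event:
    """A user interaction with an article (all field names are hypothetical)."""
    application: str
    article_id: str
    text: str


def extract_keywords(event):
    """Lowercase the event text, strip punctuation, and drop stopwords."""
    words = [w.strip(".,;:!?").lower() for w in event.text.split()]
    return [w for w in words if w and w not in STOPWORDS]


def generate_query(keywords):
    """Build a query as a de-duplicated keyword list, keeping first-seen order."""
    return list(dict.fromkeys(keywords))


def search(query, index):
    """Map each matching article identifier to its number of matched keywords."""
    results = {}
    for article_id, text in index.items():
        article_words = set(text.lower().split())
        matches = sum(1 for keyword in query if keyword in article_words)
        if matches:
            results[article_id] = matches
    return results


def rank(results):
    """Order article identifiers by match count, then by identifier."""
    return sorted(results, key=lambda article_id: (-results[article_id], article_id))


# A stand-in for a desktop index: article identifier -> article text.
index = {
    "notes.txt": "desktop search patent notes",
    "mail-42": "lunch on friday",
    "draft.doc": "patent application draft",
}
event = Event("word_processor", "draft.doc", "The patent on desktop search.")
ranked_ids = rank(search(generate_query(extract_keywords(event)), index))
```

Here `ranked_ids` lists "notes.txt" ahead of "draft.doc" because it matches more of the extracted keywords, and "mail-42" is left out because it matches none.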

Systems and methods for constructing and using a user profile
Invented by Niniane Wang and Stephen R. Lawrence
US Patent Application 20070276801
Published November 29, 2007
Filed March 31, 2004

Abstract

Systems and computer-readable mediums constructing and using a user profile are described. In one described system, a query system receives an implicit query comprising a first search term, receives a user search attribute from a user profile derives a second search term from the user search attribute, and processes the search query based on the second search term and/or the user search attribute. The query system may add the first search term to the user profile for use in modifying a subsequent query.
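
As a rough sketch of that flow, the Python below takes an implicit query's first search term, reads one attribute from a user profile, derives a second term from it, and records the first term in the profile for later use. The `UserProfile` fields, the `process_implicit_query` function, and the `filetype:` term format are assumptions of mine for illustration, not details from the filing.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """A minimal, hypothetical user profile."""
    preferred_file_type: str = ""  # a user search attribute, e.g. "pdf"
    past_terms: list = field(default_factory=list)


def process_implicit_query(first_term, profile):
    """Expand an implicit query using a profile attribute.

    The second search term is derived from the profile's preferred file
    type. The first term is then stored in the profile so that it could
    be used to modify a subsequent query.
    """
    terms = [first_term]
    if profile.preferred_file_type:
        # Illustrative term format only; not a documented operator.
        terms.append("filetype:" + profile.preferred_file_type)
    profile.past_terms.append(first_term)
    return " ".join(terms)


profile = UserProfile(preferred_file_type="pdf")
query = process_implicit_query("desktop search", profile)
```

After this runs, `query` carries both the original term and the term derived from the profile, and `profile.past_terms` holds the first term.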

User Profiles

In addition to the implicit query creation that I’ve written about above, we are also told how profiles can be created for different users of a computer. Those profiles consider this implicit information, and also look at explicit information that someone gives the system about any preferences that they might have.

They might provide explicit information about:

  • Particular file types,
  • File sizes,
  • Files associated with particular applications,
  • Favorites and bookmarks,
  • Geographical location, or
  • Other categories of data.

Activity data could also be collected in a profile, such as:

  • The user’s click-through data on displayed results,
  • The user’s previous explicit queries,
  • Most frequently viewed files,
  • Most recent files,
  • A list of senders and recipients of email messages,
  • A list of instant messenger “buddy” names, and
  • Other information that the user has interacted with or typed.

You might not even need to log in to have your computer recognize you, and pull up your profile:

The query system determines which user profile is associated with the current user through user actions or events. In one embodiment, the user’s logon information is used to select the appropriate user profile. In another embodiment, the user’s typing patterns, mouse movement, and other activity and speed are used to identify the user and choose the appropriate user profile.
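
The patent application doesn’t say how typing patterns would be matched to a profile. As a toy illustration only, the sketch below picks the known user whose average gap between keystrokes is closest to the gap observed in the current session. The function names, the single timing feature, and the sample numbers are all my own assumptions; a real system would presumably use much richer signals.

```python
from statistics import mean


def mean_interval(timestamps):
    """Average gap in seconds between consecutive keystrokes.

    Needs at least two timestamps; statistics.mean raises
    StatisticsError when there are no gaps to average.
    """
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return mean(gaps)


def identify_user(timestamps, known_profiles):
    """Return the profile name whose stored average gap is nearest the observed one."""
    observed = mean_interval(timestamps)
    return min(known_profiles, key=lambda name: abs(known_profiles[name] - observed))


# Hypothetical stored averages (seconds between keystrokes) for two users.
known_profiles = {"alice": 0.12, "bob": 0.30}

# Keystroke times observed in the current session, in seconds.
keystrokes = [0.0, 0.25, 0.5, 1.0]
user = identify_user(keystrokes, known_profiles)
```

With these sample numbers the observed average gap is about a third of a second, so the session is matched to "bob".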

Stuff I’ve Seen

These patent applications were filed in 2004, and I searched through a few other patent applications that were filed at approximately the same time. They reminded me of some work done by Microsoft, specifically the personal information management work described by Susan Dumais in Microsoft’s “Stuff I’ve Seen” project.

Many Unpublished Related Patent Applications

There were 10 patent applications listed as “related” in each of these patent filings. I searched through the US Patent and Trademark Office database, and then the World Intellectual Property Organization database, to see if I could find some of the others.

One patent application that I looked at on the WIPO pages was Methods and Systems for Information Capture. It lists both the “Implicit Queries” and “User Profile” patent applications as “related applications,” along with 49 other patent filings.

A handful of those have been published, but the majority are unavailable. It will be interesting to see what some of those contain if they are published.

Implicit Queries and Privacy

It will also be interesting to see how much of what is described in these patent applications is used. They appear to involve tracking almost every activity that could happen on a computer, and a lot of information being sent to the search engine.

I’d have some serious concerns about the kind of information that might be collected by the search engine, and how it might be used in other ways.

3 thoughts on “Google on Desktop Search and Personal Information Management”

  1. That’s quite interesting Bill. I wonder if they actually apply it to their Google Apps collection and not just on the desktop. I bet it’s more for Apps.

    Imagine this scenario: user is typing in G Docs (requiring a login). User gets stuck, so goes on to search. Google figures out that the last thing the user typed that Google knows about are a bunch of words in G Docs. The Docs text becomes the context of the search as explained in this portfolio of patents.

    Very clever, and more tracking. I wonder if they’ll actually implement it.

    Pierre

  2. Hi Pierre,

    I have to confess that when I started reading these patent applications, and some of the titles to those related patents that hadn’t been published yet, it struck me that they might be working on something more ambitious than just a desktop search.

    Not quite a full blown operating system, but a lot more than a desktop search.

    It’s going to be interesting to see where these all lead.
