The Google Advanced Search that Could Have Been

If you could limit the results of a search at Google to a specific point of view, would you? Depends upon what I mean by point of view, doesn’t it? I’ll get to that below.

A Google patent granted this week shows a screen shot of an advanced search that could have been:

An Alternative Google Advanced Search

There are a number of interesting features in this advanced search that would enable searchers to filter or expand search results in response to their queries.

These would require a searcher to make some choices about which URLs (marked as “on-topic” or “off-topic”), categories, or keywords to use, adding some and rejecting others.

The patent refers to searches built on these choices as “point of view searches,” and they might also work by looking at a searcher’s past searching and browsing history, browser favorites, or other information that the searcher doesn’t expressly define. (See the “include stored data?” checkbox in the image.)

The patent was originally filed in March 2003, and it’s difficult to tell whether aspects of it will be implemented, or whether parts of it already have been in some manner, such as in Google’s personalized search.

Systems and methods for performing point-of-view searching
Invented by Martin Farach-Colton, Monika H. Henzinger, and Bay-Wei Chang
Assigned to Google
US Patent 7,296,016
Granted November 13, 2007
Filed: March 12, 2003

Abstract

A system provides search results relating to a point-of-view (POV). The system obtains a search query and POV data. The system generates a list of documents based on the search query and filters the list of documents based on the POV data. Alternatively, the system may perform a search based on the search query and the POV data to generate the list of documents. In either case, the system then presents the list of documents as the result of the search.
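To make the first of those two flows (search first, then filter with POV data) a little more concrete, here’s a minimal Python sketch. It assumes a toy in-memory index mapping URLs to page text, and it represents POV data as sets of on-topic and off-topic words. The names `POVData`, `search`, `filter_by_pov`, and `pov_search` are hypothetical illustrations, not anything described in the patent.

```python
from dataclasses import dataclass, field


@dataclass
class POVData:
    # Hypothetical representation: words that mark on-topic and off-topic documents.
    on_topic: set[str] = field(default_factory=set)
    off_topic: set[str] = field(default_factory=set)


def search(index: dict[str, str], query: str) -> list[str]:
    """Toy keyword search: URLs whose text contains every query term."""
    terms = query.lower().split()
    return [url for url, text in index.items()
            if all(t in text.lower().split() for t in terms)]


def filter_by_pov(index: dict[str, str], urls: list[str], pov: POVData) -> list[str]:
    """Drop off-topic documents, then move on-topic documents to the front."""
    def words(url: str) -> set[str]:
        return set(index[url].lower().split())

    kept = [u for u in urls if not (words(u) & pov.off_topic)]
    # sorted() is stable, so each group keeps the search's original order.
    return sorted(kept, key=lambda u: not (words(u) & pov.on_topic))


def pov_search(index: dict[str, str], query: str, pov: POVData) -> list[str]:
    # Filter flow: generate the result list first, then filter it with POV data.
    return filter_by_pov(index, search(index, query), pov)


index = {
    "c.example/zoo": "jaguar exhibit at the zoo",
    "b.example/cars": "jaguar luxury car dealer",
    "a.example/cats": "jaguar big cat of the americas",
}
pov = POVData(on_topic={"cat"}, off_topic={"car"})
print(pov_search(index, "jaguar", pov))  # ['a.example/cats', 'c.example/zoo']
```

The car dealer page is filtered out, and the page about the big cat moves ahead of the zoo page. The patent’s alternative flow would fold the POV data into the search itself rather than filtering afterward.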

A point-of-view (POV) might be defined by the key words that a searcher chooses, URLs for pages that they select or reject, or other information such as:

  1. A user’s browsing history (including information such as sites or documents visited and amount of time spent),
  2. Categories specified by a user or otherwise derived from user interactions such as search history, browsing history, etc.,
  3. Word vectors derived from sites or documents visited and various well-known information retrieval (IR) techniques,
  4. etc.
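The patent doesn’t say which IR techniques would be used to build word vectors, so here’s a sketch using one common choice: plain term-frequency vectors compared with cosine similarity. The function names and the sample “visited” pages are hypothetical.

```python
import math
from collections import Counter


def word_vector(texts: list[str]) -> Counter:
    """Sum term frequencies over a set of documents (e.g. pages a user visited)."""
    vec = Counter()
    for text in texts:
        vec.update(text.lower().split())
    return vec


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


visited = ["jaguar habitat rainforest", "big cat conservation rainforest"]
profile = word_vector(visited)
print(round(cosine(profile, word_vector(["rainforest cat species"])), 3))  # 0.577
print(cosine(profile, word_vector(["used car prices"])))  # 0.0
```

A real system would likely weight terms (for example with tf-idf) and strip stop words, but the comparison step looks much the same.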

The patent tells us that POV searching may also be thought of as personalized searching, and POV data may correspond to personalization data such as the types of information described above or many other types of data.

POV data could also come from the user, from the browser software, or from other sources.

In one version, POV data may include URLs that act as examples of on-topic or off-topic documents. Here are places where URLs could come from:

  1. Specifying positive and negative URLs — A user of this system might specify URLs that are related to jaguars, but include negative URLs related to Jaguar brand cars, so that those results aren’t shown.
  2. Present Page URLs — Using a toolbar search, the URL of the page presently being displayed by the browser might be seen as an example of an on-topic URL.
  3. A user’s “bookmarks” or “favorites” saved in a browser could also be used as examples of on-topic URLs. The browser software could provide these to the search engine automatically, or present them to the searcher to select one or more URLs to use in a search.
  4. Previously selected URLs — Pages that a searcher has previously clicked on within a certain number of top results for a previous query, pages that the searcher has visited within some recent time period, or URLs that can somehow be determined to be related to these URLs.
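Here’s a rough sketch of how those sources might be merged into one list of example on-topic URLs. It assumes each source is already available as a list of URL strings, and that a URL the searcher explicitly rejected is dropped even if another source suggests it. That precedence rule and the function name `collect_on_topic_urls` are my own assumptions for illustration.

```python
from typing import Optional


def collect_on_topic_urls(
    positive: list[str],
    negative: list[str],
    present_page: Optional[str] = None,
    bookmarks: Optional[list[str]] = None,
    previously_selected: Optional[list[str]] = None,
) -> list[str]:
    """Merge example-URL sources, dropping duplicates and rejected URLs.

    Assumed order: explicit positives, the page being viewed, bookmarks,
    then URLs the searcher selected earlier. Negative URLs always win.
    """
    candidates = list(positive)
    if present_page:
        candidates.append(present_page)
    candidates += bookmarks or []
    candidates += previously_selected or []

    rejected = set(negative)
    seen = set()
    result = []
    for url in candidates:
        if url in rejected or url in seen:
            continue
        seen.add(url)
        result.append(url)
    return result


urls = collect_on_topic_urls(
    positive=["bigcats.example/jaguar"],
    negative=["cars.example/jaguar-xk"],
    present_page="zoo.example/jaguar",
    bookmarks=["bigcats.example/jaguar", "cars.example/jaguar-xk"],
    previously_selected=["wildlife.example/panthera-onca"],
)
print(urls)
```

The bookmarked car page is excluded because the searcher listed it as a negative URL, and the duplicate bookmark of the big cat page only appears once.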

The term “recently” might be defined a few different ways, like within a certain period of time based upon a user’s browsing habits and browsing history, or it could refer to a certain number of the last documents viewed by the searcher.
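Those two definitions of “recently” are easy to express side by side. In this sketch, browsing history is assumed to be a chronological list of (URL, visit time) pairs, and the `window` and `n` values are hypothetical parameters; the patent suggests the period might be tuned to a user’s browsing habits, which isn’t modeled here.

```python
from datetime import datetime, timedelta

Visit = tuple[str, datetime]  # (url, time of visit)


def recent_by_time(history: list[Visit], now: datetime, window: timedelta) -> list[str]:
    """URLs visited no earlier than `window` before `now`."""
    return [url for url, seen in history if now - seen <= window]


def recent_by_count(history: list[Visit], n: int) -> list[str]:
    """The last `n` URLs viewed; assumes `history` is in chronological order."""
    return [url for url, _ in history[-n:]] if n > 0 else []


now = datetime(2007, 11, 13, 12, 0)
history = [
    ("a.example", now - timedelta(days=10)),
    ("b.example", now - timedelta(days=2)),
    ("c.example", now - timedelta(hours=1)),
]
print(recent_by_time(history, now, timedelta(days=7)))  # ['b.example', 'c.example']
print(recent_by_count(history, 1))  # ['c.example']
```

The `n > 0` guard matters in Python, since `history[-0:]` would return the whole list rather than an empty one.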

It’s possible that these “recent” documents might be filtered to determine if they are on-topic, or even clustered to see what clusters of URLs are relevant to a query so that those can be used.
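The patent doesn’t spell out a clustering method, so this sketch swaps in a simple one: keep recent documents that share at least one word with the query, then group them greedily by Jaccard similarity of their word sets. Each document is assumed to be represented by its set of words, and the 0.3 threshold is an arbitrary, hypothetical value.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two word sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_recent(docs: dict[str, set[str]], query: set[str],
                   threshold: float = 0.3) -> list[list[str]]:
    """Keep on-topic docs (sharing a query word), then cluster them greedily.

    Each doc joins the first cluster whose seed document is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters = []  # list of (seed word set, member URLs)
    for url, words in docs.items():
        if not (words & query):
            continue  # off-topic for this query
        for seed, members in clusters:
            if jaccard(seed, words) >= threshold:
                members.append(url)
                break
        else:
            clusters.append((words, [url]))
    return [members for _, members in clusters]


docs = {
    "bigcats.example/1": {"jaguar", "cat", "rainforest"},
    "bigcats.example/2": {"jaguar", "cat", "habitat"},
    "cars.example/1": {"jaguar", "car", "engine"},
    "recipes.example/1": {"recipe", "pasta"},
}
print(cluster_recent(docs, {"jaguar"}))
```

The two big cat pages land in one cluster, the car page in another, and the recipe page is dropped as off-topic. A search engine could then decide which cluster best fits the current query.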

How the set of on-topic URLs might be expanded

The search engine might attempt to expand the set of on-topic URLs to provide more results to a searcher by:

  1. Obtaining a few key words, either from the examples of on-topic URLs or from the user.
  2. Performing a search to obtain a collection of additional on-topic URLs.
  3. Identifying URLs that are co-cited with the set of on-topic URLs, and adding the co-cited URLs to the set of on-topic URLs.
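Step 3 is the most mechanical of the three, so here’s a sketch of it alone. It assumes a link graph stored as a mapping from each citing page to the set of URLs it links to; two URLs count as co-cited when the same page links to both. The `min_cocitations` cutoff is a hypothetical parameter I added so that a single stray link doesn’t pull in an unrelated page.

```python
from collections import Counter


def expand_by_cocitation(links: dict[str, set[str]], on_topic: set[str],
                         min_cocitations: int = 2) -> set[str]:
    """Add URLs cited alongside on-topic URLs by at least `min_cocitations` pages.

    `links` maps each citing page to the set of URLs it links to.
    """
    counts = Counter()
    for outlinks in links.values():
        if outlinks & on_topic:
            # Every other URL this page cites is co-cited with an on-topic URL.
            counts.update(outlinks - on_topic)
    return on_topic | {url for url, c in counts.items() if c >= min_cocitations}


links = {
    "p1.example": {"jaguar-facts", "ocelot-facts", "cars.example"},
    "p2.example": {"jaguar-facts", "ocelot-facts"},
    "p3.example": {"cars.example", "jaguar-xk"},
    "p4.example": {"jaguar-facts", "puma-facts"},
}
print(sorted(expand_by_cocitation(links, {"jaguar-facts"})))  # ['jaguar-facts', 'ocelot-facts']
```

With the cutoff at 2, only the ocelot page, which is cited next to the jaguar page twice, gets added.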

Expanding POV data with words that identify on-topic or off-topic documents

The search engine may obtain additional words from an example set of URLs, from the user (either explicitly or implicitly), or from personalization data. A user then could specify positive and/or negative key words that identify URLs that are on-topic or off-topic.
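As a simple illustration of positive and negative key words at work, this sketch labels a document by counting how many of its words appear in each list. The scoring rule and the three labels are assumptions for the example; the patent doesn’t specify how the words would be weighed.

```python
def classify(text: str, positive: set[str], negative: set[str]) -> str:
    """Label a document by counting positive versus negative key words."""
    words = text.lower().split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    if score > 0:
        return "on-topic"
    if score < 0:
        return "off-topic"
    return "undecided"


positive = {"cat", "wildlife"}
negative = {"car", "dealer"}
print(classify("jaguar wildlife cat facts", positive, negative))  # on-topic
print(classify("jaguar car dealer", positive, negative))  # off-topic
print(classify("jaguar", positive, negative))  # undecided
```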

Expanding POV data with categories

These categories may be similar to those seen in hierarchical directories such as the Open Directory, and the directory could be maintained by the search engine. For each category in that hierarchical directory, the search engine may record a list of URLs and other information that correspond to that category. These categories could be presented to a searcher so that they could select the ones that are relevant.
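Here’s a sketch of how selected categories might turn into a set of on-topic URLs. It assumes the directory is stored as a mapping from slash-separated category paths to the URLs filed under them, and that choosing a category also pulls in its subcategories. The category names and URLs are made up, loosely in the style of the Open Directory.

```python
def urls_for_categories(directory: dict[str, set[str]], selected: list[str]) -> set[str]:
    """Collect URLs filed under each selected category or any of its subcategories.

    Paths use "/" separators, so "Science/Biology" includes
    "Science/Biology/Zoology" but not "Science/Biochemistry".
    """
    result = set()
    for path, urls in directory.items():
        if any(path == cat or path.startswith(cat + "/") for cat in selected):
            result |= urls
    return result


directory = {
    "Science/Biology/Zoology/Mammals": {"bigcats.example"},
    "Science/Biology": {"biology.example"},
    "Recreation/Autos/Makes/Jaguar": {"jaguar-club.example"},
}
print(sorted(urls_for_categories(directory, ["Science/Biology"])))
```

Checking for the trailing slash keeps a category like “Science/Bio” from accidentally matching everything under “Science/Biology.”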

Expanding POV data with user information

Point of view data could be expanded with URLs taken from previous searches, and with keywords taken from documents (or from anchor text pointing to those documents) that a user clicked on in search results, or that appeared among the top results for the user’s searches.

These URLs and words could be stored in a database, so that when a user performs a POV search, the key words can be used to automatically augment the query.

For example, if someone had previously performed a search for “tigers” and “elephants,” the term “mammals” might be one of the key words extracted based on their searches. When they later search for “jaguar,” then “mammals” might be automatically added to their search query.
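The “tigers” and “elephants” example can be sketched in a few lines. The patent would extract key words from clicked documents and anchor text; here a small, hypothetical lookup table stands in for that extraction step, so only the storing and augmenting parts are illustrated.

```python
from collections import Counter


def stored_keywords(past_queries: list[str], related: dict[str, list[str]],
                    top_k: int = 1) -> list[str]:
    """Pick the key words most often associated with a user's past queries."""
    counts = Counter()
    for q in past_queries:
        counts.update(related.get(q, []))
    return [word for word, _ in counts.most_common(top_k)]


def augment_query(query: str, keywords: list[str]) -> str:
    """Append stored key words that are not already in the query."""
    terms = query.split()
    return " ".join(terms + [k for k in keywords if k not in terms])


# Hypothetical stand-in for key words extracted from past search activity.
related = {"tigers": ["mammals", "cats"], "elephants": ["mammals", "africa"]}
keywords = stored_keywords(["tigers", "elephants"], related)
print(augment_query("jaguar", keywords))  # jaguar mammals
```

“Mammals” wins because it’s associated with both earlier searches, while “cats” and “africa” each show up only once.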

Conclusion

Will we see additions to Google’s advanced search such as the “On Topic” and “Off Topic” URL fields shown in the screenshot? I’m not sure that we will.

The checkbox in the image labeled “include stored data” really doesn’t make it clear that they will be using personalized information for a searcher. I think that having someone sign in to personalized search is a better alternative.

Using categories in a search might be difficult, especially since many pages can fall under more than one category.

The patent goes into more detail on how pages might be ranked in these searches and has an interesting discussion of PageRank and how on-topic pages could be ranked higher than off-topic pages.
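I won’t try to reproduce the patent’s own ranking discussion here, but one well-known way to tilt PageRank toward on-topic pages is to have the “random surfer” jump only to on-topic pages (often called personalized or topic-sensitive PageRank). This is a minimal power-iteration sketch of that technique, not necessarily what the patent describes. The damping factor of 0.85, the fixed 50 iterations, and the tiny link graph are all assumptions for illustration.

```python
def personalized_pagerank(links: dict[str, list[str]], on_topic: set[str],
                          damping: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """PageRank whose random jumps land only on on-topic pages.

    `links` maps every page to its outlinks (all targets must be keys, and
    `on_topic` must be a non-empty subset of them). Pages with no outlinks
    send their rank to the on-topic jump set.
    """
    pages = list(links)
    teleport = {p: (1.0 / len(on_topic) if p in on_topic else 0.0) for p in pages}
    rank = dict(teleport)
    for _ in range(iterations):
        new = {p: (1 - damping) * teleport[p] for p in pages}
        for p in pages:
            out = links[p]
            if out:
                share = damping * rank[p] / len(out)
                for q in out:
                    new[q] += share
            else:
                for q in pages:
                    new[q] += damping * rank[p] * teleport[q]
        rank = new
    return rank


# A symmetric graph: the cat pages and car pages mirror each other.
links = {
    "cat-facts": ["cat-care"],
    "cat-care": ["cat-facts", "car-news"],
    "car-news": ["car-care"],
    "car-care": ["car-news", "cat-facts"],
}
topical = personalized_pagerank(links, {"cat-facts", "cat-care"})
uniform = personalized_pagerank(links, set(links))  # ordinary PageRank
print(topical["cat-facts"] > topical["car-news"])  # True
```

Because the cat and car pages are mirror images of each other, ordinary PageRank scores them equally; restricting the random jumps to the cat pages is what pushes them ahead.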


17 thoughts on “The Google Advanced Search that Could Have Been”

  1. There’s always that fine line between “Offering me stuff based on what you know about me though I didn’t know you knew” and “Here’s my intraweb and hard drive and firstborn, do what you will cause I trust you.”

    I’d like to think that the ultimate goal is always getting the information to the person, but that’s my naive and innocent part. The rest of me knows that it’s All About the Benjamins.

  2. That’s definitely one of the big issues around personalized search.

    When I read in the patent that they would consider grabbing bookmarks from your browser, it made me pause a little. I’m not sure if that struck me as innocence or arrogance in the way that it was presented, but I really wouldn’t want Google or any search engine accessing my browser’s bookmarks.

  3. Really interesting patent. One thing that sticks out is this bookmarking issue. Unless people are using a really low-end web browser that allows bookmarks to be grabbed, how do they get this info? Is it through the Google toolbar? I know that bookmarks can be accessed in Firefox through the chrome, but I’m not sure what the situation with IE is. Good browsers are pretty much locked down against nasty JS, such as cross-site AJAX (which you could use to purchase stuff for free if people are using an AJAX and session-based cart) or scripts reading browsing history and bookmarks. That said, the new beta of Firefox 3 supports cross-site AJAX, but I guess there is some security in place.

    On another note, what are your thoughts regarding the hardware requirements of the possible use of the personalisation mentioned in this patent? It is my understanding that a huge amount of search results are just pre-generated HTML (correct me if I’m wrong). I think the reason why we have seen so little in regards to personalisation is partly due to this and partly due to the fact that search engines don’t want to be embarrassed by having irrelevant search results. I liked the example you posted a month or so back about a programming professor who did a lot of searches about Java, and on going to Java he searched for buses in Java, but was returned a load of programming guff. Also, let’s say Google has sussed out that I’m interested in SEO; how are they going to deduce that I think SEO By The Sea should be on page one rather than page five?

  4. There’s some detail in the patent on bookmarks, but not much about actual implementation:

    In yet another implementation, the browser software may provide the URLs included in the user’s “bookmarks” or “favorites” list to the search engine 125 for use as examples of on-topic URLs. In this case, the browser software may automatically provide the list of URLs to the search engine 125 or may present the list to the user for selection of one or more of the URLs prior to providing the list to the search engine 125.

    The rate of growth of Google’s data centers probably shouldn’t come as a surprise – I would suspect that the popularity of Google’s present day personalized search (according to Marissa Mayer, it’s one of the fastest growing services they have), would mean that they are looking at a lot of data.

    A lot of search results that we see are likely cached copies of results pages, but I have no idea how often they might be refreshed. The idea behind having a supplemental index is so that it doesn’t need to be searched unless there aren’t enough results for a query in the main index.

    A presentation that Marissa Mayer gave earlier this year noted that a query could go through 300-700 different computers at Google from the input of a search to the serving of results. That’s pretty quick. Keeping processing requirements down, and results fast is likely a very strong factor in determining what we see, and how much of it might be preprocessed.

    One of the recent patent filings I wrote about from Google discussed how they might create “group” profiles for searchers (a searcher can belong to different groups within different topic areas), and doing some grouping of searchers like that may both do something to help protect privacy of a searcher, and to make the process of delivering results a little faster since there would be less need to individualize every result for every search.

    So, folks interested in SEO who tend to pick similar pages for similar queries might be grouped together. It means more processing, and less caching of results, but could deliver better results (or at least more personalized ones.)

  5. There is, however, a kind of custom search if you use search operators like this:

    allinurl: “keyword+keyword” site:.com

    But not many people know about these, and they’re a bit more for computer-savvy users.

  6. Thanks, Dollar.

    Using search operators like that can be really useful in filtering and finding information. The combination that you suggest helps people find results when the keyword terms are in the address of the website. Using another one like this will find pages where the terms are in titles of pages:

    allintitle: “keyword+keyword” site:.com

    I will sometimes use a similar approach to only find results from educational sites (.edu), or “.org” sites, too.

    It’s worth exploring how helpful some of these alternative searches can be.

  7. Grabbing bookmarks from the browser…shocking, but I sure wouldn’t put it past them. I do things a bit oddly. I hate the browser storing bookmarks, so I view pages in IE via my Firefox plugin, right click, and click “create shortcut”. This puts an icon shortcut on my desktop that I keep in an organized folder. It’s just easier for me that way. I wonder if there’s anything out there that tracks those sorts of actions other than CTRL-D. Any ideas?

  8. Hi Jordan,

    That’s a pretty creative way to save links to pages. I don’t think that I would put it past one of the search engines to grab bookmarks from the browser either.

    I can recall a few lists of browsing activities in patent filings that the search engines might be tracking that I didn’t expect, such as printing pages or saving them, but no mentions of creating shortcuts to the pages.

    I found myself surprised that a search engine might be able to tell how far down a web page a person might scroll, yet I’ve seen that referred to a few times, too. I’m guessing that the question isn’t so much could they track that kind of behavior, but rather, would they think to measure it.

  9. Pingback: SiteMost’s Blog Recap 28/11/07 at Brisbane SEO Blog
  10. Well, here we are in 2008, and Google has found a more advanced search for the sites.

    I really wouldn’t want Google or any search engine accessing my browser’s bookmarks.

  11. Thanks.

    A lot of people do allow Google to provide them personalized search, which looks at a lot of information involving past search and web browsing history. Bookmarks are only a small piece of that whole puzzle.

  12. Interesting patent.

    I agree, William. Google offers a web history tool which tracks your every search, so a lot of your information, including bookmarks, is readily available to Google.

  13. Hi Firebubble Design,

    It can be difficult to get searchers to go to an advanced search, and make explicit choices to possibly filter results. Recently, I have been seeing Google tell me that they have “personalized” my search results based upon some past searches that I’ve performed, and that I can see results without the personalization by clicking upon a link.

    Instead of having me explicitly choose to search with personalization turned on, they are making me opt out of the personalized results. I’m not certain that they are including bookmark and search history information in those searches. But it is likely included if I log into Google’s personalized search.

  14. POV searching sounds interesting. I think in the next 5 years we will see a radical change to the way SERPs are given to us. At the moment it just feels like the top 2 pages of links (which, after all, are all we ever see) are just a list of who works hardest at SEO rather than naturally good sites.

    So they must feel something has to change at some point.

  15. Hi JPink,

    I agree that we may start seeing some radical changes to search results. I’m not convinced that the top 2 pages of links in search results are often a result of who works hardest at SEO for many of the queries I use, but then again I’m all for most sites being created with some knowledge of how search engines work, and think of “naturally good” sites being ones that are search engine friendly and intelligently structured and created. See my post on good SEO.

  16. This is really interesting. Imagine if it had been like this, but who knows? Google changes the way it indexes and searches every day, so maybe this was one of many ideas that were drawn up that day. Interesting post, thanks.

  17. Hi GaryR,

    Google experiments with so many different things, and makes so many changes that I wonder if we still might see an advanced search from them that could be used to include or exclude certain categories. I think I would like it if they did.
