Google Archives to Appear Soon?

Garrett Rogers, at Googling Google, writes of new domain names registered to Google, including a number that hint at a Google Archive service, perhaps similar to that offered by the Internet Archive’s Wayback Machine.

Some comments in a post over at ResourceShelf take up the idea, and offer some additional commentary in Yet Another Day and More Google Domain Names.

A patent application titled Multiple index based information retrieval system, which was assigned to Google in June and published this May, describes an archive system that Google could offer. Here’s a snippet:

[0157] c) Indexing Instances of Documents for Archival Retrieval

[0158] Another embodiment of the present invention allows the capability to store and maintain historical documents in the indices, and thereby enable archival retrieval of date specific instances (versions) of individual documents or pages. This capability has various beneficial uses, including enabling a user may search for documents within a specific range of dates, enabling the search system 120 to use date or version related relevance information in evaluating documents in response to a search query, and in organizing search results.
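To make the idea in that paragraph a little more concrete, here is a minimal sketch in Python of what keeping date-specific instances of a page might look like. It is not taken from the patent filing, and it says nothing about how Google would actually build such a thing. The class name `ArchiveIndex`, its methods, and the sample URL and page text are all hypothetical, and a plain substring match stands in for a real index.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date


class ArchiveIndex:
    """Toy index (hypothetical) that keeps every dated instance of each document."""

    def __init__(self):
        # url -> list of (capture_date, text), kept sorted by capture_date
        self._versions = defaultdict(list)

    def add(self, url, captured, text):
        """Store one instance (version) of a document with its capture date."""
        self._versions[url].append((captured, text))
        self._versions[url].sort(key=lambda v: v[0])

    def as_of(self, url, when):
        """Return the latest instance captured on or before `when`, or None."""
        versions = self._versions.get(url, [])
        dates = [d for d, _ in versions]
        i = bisect_right(dates, when)
        return versions[i - 1] if i else None

    def search(self, term, start, end):
        """Return (url, capture_date) pairs for instances that contain `term`
        and were captured within the inclusive range [start, end]."""
        term = term.lower()
        hits = []
        for url, versions in self._versions.items():
            for captured, text in versions:
                if start <= captured <= end and term in text.lower():
                    hits.append((url, captured))
        return sorted(hits, key=lambda h: h[1])


# Made-up sample data, for illustration only.
index = ArchiveIndex()
index.add("example.com/news", date(2005, 1, 10), "Search engine launches beta")
index.add("example.com/news", date(2006, 3, 2), "Search engine adds archive feature")

print(index.as_of("example.com/news", date(2005, 12, 31)))
print(index.search("archive", date(2006, 1, 1), date(2006, 12, 31)))
```

The first call asks what the page looked like at the end of 2005, and the second limits a search to instances captured during 2006. Those are the two uses the snippet mentions: retrieving a date-specific version, and searching within a range of dates.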

I wrote some about this document the day after it was published, in a blog post titled Google Aiming at 100 Billion Pages? As I noted there, the inventor listed in the patent filing is Anna Patterson, who developed a beta search engine for the Internet Archive. That search engine was removed from the site sometime around when she went to work for Google. There’s a possibility that the technology she developed accompanied her in the move to Mountain View, according to a news article from Stefanie Olsen:

Stanford continues in its role as a breeding ground for search projects. Since 2003, Google has purchased at least two projects hatched at Stanford–personalization search tool Kaltix and a project from Anna Patterson, a Stanford computer science research associate.

So, Google has an employee with experience in building a search system for such an archive, they have intellectual property assigned to them that describes such a system, and it’s possible that they licensed or purchased a search system that successfully worked as a beta in performing searches on the Internet Archive.

The newly registered domain names may be in reference to some other type of service that offers historical records of things like newspapers, magazines, and other periodicals. But Google seems to have a lot of pieces in place to offer an archive system like the Internet Archive’s.

Anna Patterson is the listed inventor on a number of patent applications that appear to be related. The first two below are included in the USPTO Assignment Database, with assignments recorded to Google, while the remainder aren’t.

In addition to an archive system, these patent applications also include such things as a description of a supplemental index, personalization methods, presentation of search results, classification of documents to topics, and an annotation method that could lessen the impact of Google bombing.

Added 9/6/2006 - Google has added a news archive search. Garrett Rogers writes more about it in Google News Archive Search released today. I’ve tried it out – News Archive Search – and it is a nice addition to the searches that Google provides. It has more than news sources. For instance, Ancestry.com told me that my grandfather’s WWI draft registration card is available for me to see if I sign up with them.

I’m not seeing many results in the News Archive Search that don’t require a subscription or a pay-per-view fee.


8 thoughts on “Google Archives to Appear Soon?”

  1. That’s such a natural thing for them to do. Presumably they would do a more robust job than archive.org. I can see how they can easily get all the future data. However, how would they get all the back data? Does this mean they’re forced to buy the archive.org database? I imagine it would create another superb advertising vehicle for Google, which is what they’re all about really. Interesting conjectures.

  2. Some excellent questions, Barry.

    I wonder how much that database would cost. The about page from the Internet Archive notes that they have been receiving donations of data from Alexa and other unnamed sources. No telling who those other sources are. Has Google been one of them?

    What are the chances that Google has been holding on to cached copies of pages for years? It’s a possibility.

  3. Every time people start putting domain names and patent applications together, the SEO world starts buzzing about some great new Google service that, when it eventually rolls out, proves to be drastically less broad than originally imagined.

    Look at Personalized Search. All the conspiracy fairies had people thinking that all queries would be tracked and used to determine generic search results.

    Look at how the so-called Google Office has failed to materialize (much less to knock Microsoft Office back to the stone age where it belongs).

    Where is the Google Browser that the conspiracy fairies promised us?

    Google Checkout turned out to be anything but a PayPal killer.

    On first glance, I’ll chalk up this latest combo as an indication that Google may offer a versioning service to the Creative Commons, Open Source, and business project development communities. The conspiracy fairies may not appreciate my bucking the trend, but the last few disappointments lead me to conclude that they probably don’t inspire us to reach the right conclusions anyway.

  4. Hi Michael,

    I’m pretty impressed with the patent documents from Anna Patterson, and the fact that she had a working beta model of such a system in place at the Internet Archive.

    Of the things that you mention, the one that seems most feasible would be an archive system.

    I agree with you that the buzz does get hard to take when the final results don’t live up to expectations.

    The Google Coupons setup ended up being much less grand than the patent application that described such a system. Of course, there’s room for it to expand and blossom into something like the patent filing suggests, and maybe we will see the micropayment system that another patent suggests rolled into Google Checkout.

    I guess there’s a difference between what you could possibly do, which is sometimes captured in the patents and seemingly captured in pointers to the development of those systems, like domain name registrations, and what actually gets released to the public.

    The realm of possibilities meets up with difficulties of realistic business models, competition, and things like compromises based upon alliances – Google and eBay working together may be better than Google and eBay competing with each other, for instance.

  5. Google Archive looks like a nice idea, especially when they are using the Wayback Machine instead of using their own servers.

    The Wayback Machine can earn through Google, and they will also get media attention and popularity by partnering with the search giant.

  6. It could be, Ed.

    It’s difficult to cite web sites when creating a research paper, too. It’s fine to provide a “last accessed date” for something like that, but it’s almost meaningless if you can’t see the quoted content in context after a page has changed.
