Google’s Quiet Acquisition of Transformic, Inc.

Update (2008-4-13) – It’s nice when an old post gets pointed to because of recent changes and developments. The official Google Webmaster Central Blog has a new post co-written by Jayant Madhavan and Alon Halevy about technology that they’ve “recently been experimenting with,” in Crawling through HTML forms. It looks like the promise of Transformic’s technology is living on at Google.

Anand Rajaraman, a friend, co-author, and former intern of Alon Halevy, provides more details about this development from Google in his Datawocky post, The story behind Google’s crawler upgrade. Google Operating System adds some additional thoughts in Google Starts to Index the Invisible Web.

*****

Reading a Washington Post article, Google Goes to Market, I noticed a company name in a list of Google acquisitions that I didn’t recognize, and hadn’t included in my list of Google Acquisitions.

The company is Transformic, Inc., which ran Everyclassified.com. The aim of Transformic was to build search engines for the deep web – that part of the web that search engines had problems crawling and indexing. Everyclassified.com was an example of the ability of Transformic to gather information from other web sites that search engines had difficulties with, offering access to hundreds of classified sites found on the web.

A blog post from January of 2006, Google Base to Classifieds, from My RantSpot, discusses the acquisition. The blog appears to be written by a student who may have taken a class with the man behind Transformic – Dr. Alon Halevy.

Dr. Halevy has a long history of working with databases and the web. He is a co-inventor on a number of patents from his time at AT&T Bell Laboratories and then at AT&T Laboratories, and has written a very large number of research papers (a few of them here). He began teaching at the University of Washington in 1998, where he developed a company named Nimble Technology, which he sold to Actuate in August 2003. In 2004, he started Transformic, Inc.

The Acquisition

The Washington Post lists the acquisition date as September 15, 2006 – but that’s probably a typo, and the year is more likely 2005. It appears that Dr. Halevy has been working for Google since before September, 2006, including working on a number of papers and presentations for them:

Answering Structured Queries on Unstructured Data (pdf)
WebDB ’06, June, 2006, Chicago, Illinois USA

Principles of Dataspace Systems (pdf)
PODS’06, June 26 – 28, 2006, Chicago, Illinois, USA.

Data Integration: The Teenage Years
VLDB ’06, September, 2006, Seoul, Korea.

When Semi-structured data meets the web (ppt no longer available)

Dataspaces: Co-Existence with Heterogeneity (ppt)

You can get a sense of why the folks at Google might have been interested in working with Dr. Halevy from his paper in ACM Queue, Why Your Data Won’t Mix, published in October, 2005:

The need for flexible data-sharing systems, within and across enterprises, is only in its infancy. The tools we have today lag far behind customer needs. The problem is only exacerbated by the fact that much more of the data we need to manage is semi-structured and is often the result of trying to extract structure from unstructured data. Hence, we need to manage data where the values, attributes names, and semantics are often uncertain.

Going forward, there are two major challenge areas: dealing with drastically larger schemas and dealing with vastly more complex data-sharing environments. In both of these areas, we may have to change the way we think.

From the archived Transformic Company Mission page:

The mission of Transformic is to lead the data management market to its next natural step: easy and large-scale data sharing and integration. Transformic offers the technology needed to produce the semantic glue among data sources. The Transformic Tools may be embedded in any data sharing and integration context, including but not limited to Enterprise Information Integration, online retailing, XML messaging, and enterprise meta-data management.

It’s difficult to tell what role Transformic and Dr. Halevy have had in the development of Google, and systems like Google Base and Google Coop, but it seems likely that they have played a part.

Patents Co-invented by Dr. Halevy

Alon Halevy is listed as one of the inventors on the following patents. I’m including them here to provide a glimpse of his knowledge and expertise. Google didn’t acquire these patents with the purchase of Transformic and the hiring of Dr. Halevy, but they display an impressive range of scholarship on database technology.

Method and apparatus for optimizing database queries involving aggregation predicates (6,088,524)
Granted July 11, 2000
Assigned to Lucent Technologies, Inc.

Information manifold for query processing (5,995,961)
Granted November 30, 1999
Assigned to Lucent Technologies Inc.

System and method for obtaining complete and correct answers from incomplete and/or incorrect databases (5,987,450)
Granted November 16, 1999
Assigned to AT&T

Method and apparatus for web site management (5,956,720)
Granted September 21, 1999
Assigned to AT&T Corp

Method and system for using materialized views to evaluate queries involving aggregation (5,897,632)
Granted April 27, 1999
Assigned to AT&T Corp

Integration of an information server database schema by generating a translation map from exemplary files (5,778,373)
Granted July 7, 1998
Assigned to AT&T Corp

User interface for information retrieval system (5,768,578)
Granted June 16, 1998
Assigned to Lucent Technologies Inc.

Query optimization by predicate move-around (5,659,725)
Granted August 19, 1997
Assigned to Lucent Technologies Inc.

Apparatus and methods for retrieving information (5,655,116)
Granted August 5, 1997
Assigned to Lucent Technologies Inc.

Apparatus and methods for retrieving information by modifying query plan based on description of information sources (5,600,831)
Granted February 4, 1997
Assigned to Lucent Technologies Inc.


14 thoughts on “Google’s Quiet Acquisition of Transformic, Inc.”

  1. This is VERY interesting…they are smarty pants…trying to fix the broken search features….

  2. Hi monks,

    I’m not sure that acquiring Transformic and employing Dr. Halevy is an effort to “fix” search features as much as it is to give Google expertise from someone who has been working on deep web search, databases, and adding organization to unstructured or semi-structured data for more than a decade.

    The expertise that Dr. Halevy appears to bring to Google seems to be more related to extracting information from other sites and putting them into a more structured format – like you might find in Google Local, or Google Base.

    It’s a little different approach than trying to index and serve pages on the web – focusing more on extracting and serving information in a meaningful manner.

  3. Good article.

    Wanted to point out that the Washington Post does list the acquisition as Sept 15 2005. It just so happens to be the final acquisition listed for 2005, so 2006 appears to be part of the date when it’s really just starting the next year’s listings. Notice the difference in font. :)

    But yeah thanks for the article.

  4. Thanks, Jason.

    It looks like I did read the table incorrectly back when I wrote this post in 2006.

    I will be writing a followup post now that Google appears to be using the technology involved. A patent application had been filed by Google which describes something similar to what was mentioned in the Google Webmaster Central Blog, so I want to discuss that a little.

  5. Pingback: Google, Kosmix, and The Deep Web – A Love Triangle « AltSearchEngines
