Google's Quiet Acquisition of Transformic, Inc.

Update (2008-4-13) – It’s nice when an old post gets pointed to because of recent changes and developments. The official Google Webmaster Central Blog has a new post co-written by Jayant Madhavan and Alon Halevy about technology that they’ve “recently been experimenting with,” in Crawling through HTML forms. It looks like the promise of Transformic’s technology is living on at Google.

Anand Rajaraman, a friend, co-author, and former intern of Alon Halevy, provides more details about this development from Google in his Datawocky post, The story behind Google’s crawler upgrade. Google Operating System adds some additional thoughts in Google Starts to Index the Invisible Web.

*****

Reading a Washington Post article, Google Goes to Market, I noticed a company name in a list of Google acquisitions that I didn’t recognize and hadn’t included in my own list of Google Acquisitions.

The company is Transformic, Inc., which ran Everyclassified.com. The aim of Transformic was to build search engines for the deep web – that part of the web that search engines had problems crawling and indexing. Everyclassified.com was an example of the ability of Transformic to gather information from other web sites that search engines had difficulties with, offering access to hundreds of classified sites found on the web.

A blog post from January of 2006, Google Base to Classifieds, from My RantSpot, discusses the acquisition. The blog appears to be written by a student who may have taken a class with the man behind Transformic – Dr. Alon Halevy.

Dr. Halevy has a long history of working with databases and the web. He developed a number of patents while working at AT&T Bell Laboratories, and then at AT&T Laboratories, and has written a very large number of research papers (a few of them here). He began teaching at the University of Washington in 1998, where he developed a company named Nimble Technology, which he sold to Actuate in August, 2003. In 2004, he started Transformic, Inc.

The Acquisition

The Washington Post lists the acquisition date as September 15, 2006 – but that’s probably a typo, and the year is more likely 2005. It appears that Dr. Halevy has been working for Google since before September, 2006, including working on a number of papers and presentations for them:

Answering Structured Queries on Unstructured Data (pdf)
WebDB ’06, June, 2006, Chicago, Illinois, USA.

Principles of Dataspace Systems (pdf)
PODS’06, June 26 – 28, 2006, Chicago, Illinois, USA.

Data Integration: The Teenage Years
VLDB ’06, September, 2006, Seoul, Korea.

When Semi-structured data meets the web (ppt no longer available)

Dataspaces: Co-Existence with Heterogeneity (ppt)

You can get a sense of why the folks at Google might have been interested in working with Dr. Halevy from his paper in ACM Queue, Why Your Data Won’t Mix, published in October, 2005:

The need for flexible data-sharing systems, within and across enterprises, is only in its infancy. The tools we have today lag far behind customer needs. The problem is only exacerbated by the fact that much more of the data we need to manage is semi-structured and is often the result of trying to extract structure from unstructured data. Hence, we need to manage data where the values, attributes names, and semantics are often uncertain.

Going forward, there are two major challenge areas: dealing with drastically larger schemas and dealing with vastly more complex data-sharing environments. In both of these areas, we may have to change the way we think.

From the archived Transformic Company Mission page:

The mission of Transformic is to lead the data management market to its next natural step: easy and large-scale data sharing and integration. Transformic offers the technology needed to produce the semantic glue among data sources. The Transformic Tools may be embedded in any data sharing and integration context, including but not limited to Enterprise Information Integration, online retailing, XML messaging, and enterprise meta-data management.

It’s difficult to tell what role Transformic and Dr. Halevy have had in the development of Google, and systems like Google Base and Google Coop, but it seems likely that they have played a part.

Patents Co-invented by Dr. Halevy

Alon Halevy is listed as one of the inventors on the following patents. I’m including them here to provide a glimpse of the knowledge and expertise of Alon Halevy. Google didn’t acquire these patents with the purchase of Transformic, and the hiring of Dr. Halevy, but it’s an impressive range of scholarship on database technology displayed in these patents.

Method and apparatus for optimizing database queries involving aggregation predicates (6,088,524)
Granted July 11, 2000
Assigned to Lucent Technologies, Inc.

Information manifold for query processing (5,995,961)
Granted November 30, 1999
Assigned to Lucent Technologies Inc.

System and method for obtaining complete and correct answers from incomplete and/or incorrect databases (5,987,450)
Granted November 16, 1999
Assigned to AT&T

Method and apparatus for web site management (5,956,720)
Granted September 21, 1999
Assigned to AT&T Corp

Method and system for using materialized views to evaluate queries involving aggregation (5,897,632)
Granted April 27, 1999
Assigned to AT&T Corp

Integration of an information server database schema by generating a translation map from exemplary files (5,778,373)
Granted July 7, 1998
Assigned to AT&T Corp

User interface for information retrieval system (5,768,578)
Granted June 16, 1998
Assigned to Lucent Technologies Inc.

Query optimization by predicate move-around (5,659,725)
Granted August 19, 1997
Assigned to Lucent Technologies Inc.

Apparatus and methods for retrieving information (5,655,116)
Granted August 5, 1997
Assigned to Lucent Technologies Inc.

Apparatus and methods for retrieving information by modifying query plan based on description of information sources (5,600,831)
Granted February 4, 1997
Assigned to Lucent Technologies Inc.

14 comments to Google’s Quiet Acquisition of Transformic, Inc.

  • [...] Transformic, and what this company can and might bring to Google. From Bill’s post, Google’s Quiet Acquisition of Transformic, Inc.. Reading a Washington Post article, Googl [...]

  • [...] Google Acquires Transformic Inc. As William points out, Google has quietly acquired Transformic Inc. Earlier this year, it purchased the rights to [...]

  • monks

    This is VERY interesting…they are smarty pants…trying to fix the broken search features….

  • Hi monks,

    I’m not sure that acquiring Transformic and employing Dr. Halevy is an effort to “fix” search features as much as it is to give Google expertise from someone who has been working on deep web search, databases, and adding organization to unstructured or semi-structured data for more than a decade.

    The expertise that Dr. Halevy appears to bring to Google seems to be more related to extracting information from other sites and putting them into a more structured format – like you might find in Google Local, or Google Base.

    It’s a little different approach than trying to index and serve pages on the web – focusing more on extracting and serving information in a meaningful manner.

  • [...] forms into common categories of classified ads. His company, Transformic Inc., which was acquired by Google probably in September of 2005, was the creator of everyclassified.com. Email author | [...]

  • [...] web forms into common categories of classified ads. His company, Transformic Inc., which was acquired by Google probably in September of 2005, was the creator of everyclassified.com. [...]

  • [...] I’ve written a full blog post about this acquisition – Google’s Quiet Acquisition of Transformic, Inc. [...]

  • Jason

    Good article.

    Wanted to point out that the Washington Post does list the acquisition as Sept 15 2005; it just so happens to be the final acquisition listed for 2005, so 2006 appears to be a part of the date when it’s just starting the next year’s listings. Notice the difference in font. !:)

    But yeah thanks for the article.

  • Thanks, Jason.

    It looks like I did read the table incorrectly back when I wrote this post in 2006.

    I will be writing a followup post now that Google appears to be using the technology involved. A patent application had been filed by Google which describes something similar to what was mentioned in the Google Webmaster Central Blog, so I want to discuss that a little.

  • [...] Google announced on its Webmaster Central blog that its crawler is indexing, in test mode, pages that can be reached only through HTML forms. [...]

  • [...] back. Years ago, Anand’s VC firm, Cambrian Ventures, funded a company that Alon founded called Transformic Inc. Transformic, which built technology to crawl HTML forms, was later acquired by Google. Alon joined [...]

  • Google, Kosmix, and The Deep Web – A Love Triangle « AltSearchEngines

    [...] back. Years ago, Anand’s VC firm, Cambrian Ventures, funded a company that Alon founded called Transformic Inc. Transformic, which built technology to crawl HTML forms, was later acquired by Google. Alon joined [...]

  • [...] and hence got handful patents too  which were owned by Neven Vision. Google also brought Transformics in 2006 which enabled it to index the pages its Google crawlers were not able to – basically [...]

  • [...] of the inventors listed on the patent is Alon Halevy, who came to Google with the acquisition of Transformic, and has worked on projects involving Google’s efforts to extract and organize data about [...]
