Google Gets Smarter with Named Entities: Acquires MetaWeb

You may know him by a number of names or titles – Governor of California, Terminator, Governator, Conan the Barbarian, Kindergarten Cop, Mr. Universe, Mr. Olympia, Arnold Strong, Arnie, The Austrian Oak.

To Metaweb, Arnold Schwarzenegger is 9202a8c04000641f8000000000006567.

Who is Metaweb?

Metaweb is a company recently acquired by Google that has created a system for indexing named entities, one that allows you to search for information in a new way. The idea sounds a little like a library’s Dewey Decimal System, but for named entities. Why is this important, and what is a named entity?

A named entity is a specific person, place, or thing. For example, named entities can include Barack Obama, or the Commonwealth of Virginia, or the Great American Ball Park in Cincinnati. Associating unique identification numbers with named entities can make it easier to index them, and to find information about them even when they are referred to by different names, as in my example above about Arnold Schwarzenegger. Unique identifiers can also help with local search, by giving specific places, businesses, or landmarks identification numbers of their own.
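As a rough illustration of why one identifier per entity helps, here is a minimal Python sketch of an alias index. The class name EntityIndex, its methods, and the normalization rule are my own assumptions for illustration, not anything taken from Metaweb's system; only the ID string comes from the patent filing.

```python
from typing import Dict, Optional

# The only value taken from the source; everything else here is hypothetical.
ARNOLD_ID = "9202a8c04000641f8000000000006567"


class EntityIndex:
    """Maps many surface names (aliases) to a single entity ID."""

    def __init__(self) -> None:
        self._alias_to_id: Dict[str, str] = {}

    @staticmethod
    def _normalize(name: str) -> str:
        # Assumed normalization: ignore case and extra whitespace.
        return " ".join(name.lower().split())

    def add_alias(self, entity_id: str, alias: str) -> None:
        """Register one more name under which the entity may appear."""
        self._alias_to_id[self._normalize(alias)] = entity_id

    def resolve(self, name: str) -> Optional[str]:
        """Return the entity ID for a name, or None if the name is unknown."""
        return self._alias_to_id.get(self._normalize(name))


index = EntityIndex()
for alias in ("Arnold Schwarzenegger", "Governator", "The Austrian Oak", "Arnold Strong"):
    index.add_alias(ARNOLD_ID, alias)
```

With an index like this, a search for "governator" and a search for "The Austrian Oak" both land on the same identifier, so anything stored under that identifier can be found no matter which name the searcher used.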

How often do named entities appear in Web searches? A recent paper from Microsoft, Building Taxonomy of Web Search Intents for Name Entity Queries (pdf), tells us that they are pretty common:

According to an internal study of Microsoft, at least 20-30% of queries submitted to Bing search are simply name entities, and it is reported 71% of queries contain name entities.

Google announced their acquisition of Metaweb in an Official Google Blog post, Deeper understanding with Metaweb. Metaweb also announced the acquisition in their post, Metaweb joins Google.

Metaweb has a number of patent applications assigned to them at the United States Patent and Trademark Office, and they are worth diving into if you want to learn a little about some of the technology behind the company.

I’ve just started looking at them myself, beginning with the one below on “Query Optimization,” which is where I found the Metaweb ID number of Arnold Schwarzenegger. The patent filing describes how an ID number can be used to collect and store data about named entities, and information associated with them, and how queries can be performed based on that collected information.
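To make that idea a little more concrete, here is a small sketch of storing facts against an entity ID and querying them. It is only an illustration under my own assumptions: the FactStore class, the property names, and the lookup methods are hypothetical and are not the design described in the patent filing.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

ARNOLD_ID = "9202a8c04000641f8000000000006567"


class FactStore:
    """Hypothetical store of (property, value) facts keyed by entity ID."""

    def __init__(self) -> None:
        # entity ID -> facts recorded about that entity
        self._by_entity: DefaultDict[str, Set[Tuple[str, str]]] = defaultdict(set)
        # (property, value) -> entity IDs that carry that fact
        self._by_fact: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add(self, entity_id: str, prop: str, value: str) -> None:
        """Record one fact and index it in both directions."""
        self._by_entity[entity_id].add((prop, value))
        self._by_fact[(prop, value)].add(entity_id)

    def facts_about(self, entity_id: str) -> List[Tuple[str, str]]:
        """Everything collected about one entity, in sorted order."""
        return sorted(self._by_entity.get(entity_id, set()))

    def find(self, prop: str, value: str) -> List[str]:
        """IDs of all entities that have the given property value."""
        return sorted(self._by_fact.get((prop, value), set()))


store = FactStore()
store.add(ARNOLD_ID, "name", "Arnold Schwarzenegger")
store.add(ARNOLD_ID, "alias", "Governator")
store.add(ARNOLD_ID, "office", "Governor of California")
```

Because every fact hangs off the ID rather than off a name, a query such as store.find("alias", "Governator") returns the identifier, and store.facts_about(...) on that identifier then returns everything else collected about the same person.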

Here are the patent filings assigned to Metaweb:

Automated online purchasing system
Invented by W. Daniel Hillis, Bran Ferren
US Patent Application 20030195834
Published October 16, 2003
Filed: September 18, 2002

Meta-Web
Invented by W. Daniel Hillis, Bran Ferren
US Patent Application 20040210602
Published October 21, 2004
Filed: December 15, 2003

Personalized profile for evaluating content
Invented by W. Daniel Hillis and Bran Ferren
US Patent Application 20050131918
Published June 16, 2005
Filed: May 24, 2004

Delegated authority evaluation system
Invented by W. Daniel Hillis and Bran Ferren
US Patent Application 20050131722
Published June 16, 2005
Filed: May 25, 2004

System and method to facilitate importation of user profile data over a network
Invented by W. Daniel Hillis and Bran Ferren
US Patent Application 20060095780
Published May 4, 2006
Filed: October 28, 2004

User Contributed Knowledge Database
Invented by Timothy Sturge, Kurt Bollacker, Robert Cook, John Giannandrea, Nicholas Thompson, Edwin Taylor
US Patent Application 20090024590
Published January 22, 2009
Filed: April 22, 2008

Graph Store
Invented by Scott Meyer, Jutta Degener, Barak Michener, John Giannandrea
US Patent Application 20100174692
Published July 8, 2010
Filed: January 20, 2010

Database Replication
Invented by Scott Meyer, Jutta Degener, Barak Michener, John Giannandrea
US Patent Application 20100121817
Published May 13, 2010
Filed: January 20, 2010

Query Optimization
Invented by Scott Meyer, Jutta Degener, Barak Michener, John Giannandrea
US Patent Application 20100121839
Published May 13, 2010
Filed: January 20, 2010

Knowledge Web
Invented by W. Daniel Hillis and Bran Ferren
Assigned to Metaweb Technologies, Inc.
US Patent 7,502,770
Granted March 10, 2009
Filed April 10, 2002

Conclusion

Metaweb operates Freebase, a community-based source of data about different people, places, and things. For a great example of how they collect and display data, see their page on George Washington.

What will Metaweb bring to Google?

That remains to be seen, but it’s possible that Metaweb’s technology might help make it easier for Google to associate information with named entities. As the Microsoft paper I mentioned above noted, searches for named entities make up a good percentage of searches on their search engine. Chances are that searches for named entities are fairly popular on Google as well. So the impact of the Metaweb acquisition could potentially be a large one.

27 thoughts on “Google Gets Smarter with Named Entities: Acquires MetaWeb”

  1. Never came across Metaweb before – but it seems Google is getting smarter by the day and is acquiring anything that might pose a question mark on its supremacy even before those companies/properties become serious players.

  2. Hi John,

    I hadn’t heard of Metaweb before this acquisition either. Spending some time reading through some of their patent filings, I think they have some pretty interesting ideas. It’s hard to tell if Google acquired the company to use its technology, or to “hire” the people working for it, or both.

  3. Hi Andrew,

    I think the potential is there for the acquisition to help improve what Google is doing. It sounds like Google isn’t going to make any changes to the Freebase site that Metaweb runs, so whatever happens with the acquisition is more likely to impact Google’s search results. We may have to wait a while to see the impact of this purchase.

  4. Hi Bill,

    Well, I agree with John (the first commenter). Perhaps Google just see MetaWeb as a threat to their dominance. This is what these monster companies do, right? Buy up the competition the moment they pose any kind of threat? Or am I just being a little cynical? ;-)

    Greetings from Spain.

    Rob

  5. Hi Rob,

    Thanks. It’s nice to meet you.

    There’s a possibility of that, though Google does have a significant head start over Metaweb in many aspects of search, and I’m not sure that they really could have been perceived as a threat to Google at this point in their life cycle.

    I would suspect that the chance to work with Metaweb, and use the technology they developed had to be pretty attractive to Google, however.

    Another possibility is that someone like Microsoft may have targeted Metaweb if Google didn’t. :)

  6. After never coming across Freebase before, I paid a visit there and the one page I went to (the Boston Red Sox) was 12 months out of date in places. Unless you’ve got the critical mass of visitors to self-edit a site, like Wikipedia for all its faults does, then even the backing of Google will in no way guarantee success.

  7. Hi Steve,

    I hadn’t seen Freebase before, either. I noticed some areas that were light on data as well – I’m wondering if they will have more people getting involved in adding to that data now that Google has acquired Metaweb. I’m not sure that acquiring Freebase was Google’s main objective in acquiring the company, however.

  8. Google is making the right move in purchasing Metaweb. The semantic web is the next generation of the internet, where search engines stop looking for words and start to truly understand what we are looking for.

    So far I haven’t seen any semantic database as serious as Freebase. It looks as if, while the Microsoft and Bing search alliance is coming up, Google still takes a step ahead into the semantic web.

    Thanks for the brilliant post.

  9. Looks like Named Entities might be another thing that will become a ranking factor in search results. This was definitely a valuable acquisition for Google.

  10. Actually, my friend, more than you could imagine. I have researched the latest patent that Google released in May this year, after 4 years of waiting. Some of the features in the algorithm are able to calculate a person’s quality and expertise level in his area.

    This means that if many SEO people will tend to visit your site and quote you and your articles it will mean much more than a bunch of bogus bookmark accounts with no clear entities.

    I have written the article in Hebrew – what Google really knows about surfer behavior

    I have tried to use Google Translate so people can read it. It is a bit weird, but the message can be understood well.

    Bill, if you see this and it is not of a quality that will assist anyone here, don’t think I am trying to earn a link!
    You can just remove it, though I think translating this piece of information to English could be important to anyone.
    It took me 6 hours to read all related documents and experiments to extract this information.

  11. Hi Duran,

    It’s funny, but reading the first sentence of your comment, I think you could have made the same statement back when Google purchased Applied Semantics. In many ways, what they offer in the area of organic search is an approach that looks less at keywords and more at the meaning behind those words.

    Microsoft has been working on an object-level (pdf) search approach for a few years, and you can see it in action at Microsoft Academic Search. They have a few other papers on this kind of object level ranking, including how it can be used in other kinds of vertical searches such as for products.

    Google has also allocated a lot of time and effort to fact extraction on web sites about specific people, places, and things, and it’s possible that they may take what they’ve acquired from Metaweb to build something that goes beyond what Freebase offers presently.

  12. Hi Alex,

    I’m not sure if this approach to named entities might translate into another ranking factor as much as it might signify a different approach to collecting and indexing information found on the web. Rather than helping to rank web pages presently, it focuses upon how “facts” found on pages about a specific person, place, or thing might be collected, organized, and presented to searchers.

    Combine the technology that Google acquired from Metaweb with the technology that they acquired in their acquisition of Transformic, and this could potentially be a very valuable acquisition.

  13. Hi seo academic,

    Yes, I mentioned Microsoft’s Academic search a couple of comments above yours as an example of how Microsoft isn’t sitting still on getting smarter about named entities as well.

  14. Yesterday I read an article on Business Insider, and it mentioned that Google acquired about 50+ companies this year. It’s a gigantic growth rate. Hats off to Google.

  15. Hi Geek Revealed.

    I read that article too. I wish that there was a way to find out more about many of those acquisitions – most of the details about them really weren’t made public, and the names of most of the companies involved are unknown. It is a gigantic growth rate.

  16. I’m a big fan of Arnold. Just wondering why he quit showbiz and chose a political career. I mean not totally quit but.. you know what I mean..

  17. Hi RJ,

    Arnold has seen some tough times lately in the public eye, but his story is pretty interesting, and if he came out with an autobiography, I’d read it.

    If you read the Wikipedia article about Arnold, there seems to be a little contradiction in his choice of getting involved in politics. One biographer states that Arnold planned on getting involved in politics by using bodybuilding and then a career in movies as building blocks for gaining political office. Another section of the entry implies that Arnold wasn’t even seriously considering running for governor of California until he stated he would during an appearance on the Jay Leno show.

    There are rumors out there that he is considering getting back into movies.
