All Your Knowledge Bases Belong to Google
In a Google Inside Search blog post, Introducing the Knowledge Graph: Things, not strings, we’re told of a new Google initiative to show more information about the things we search for within the search results themselves. This is a potentially paradigm-shifting view of what a search engine does. The post tells us:
The Knowledge Graph enables you to search for things, people or places that Google knows about—landmarks, celebrities, cities, sports teams, buildings, geographical features, movies, celestial objects, works of art and more—and instantly get information that’s relevant to your query. This is a critical first step towards building the next generation of search, which taps into the collective intelligence of the web and understands the world a bit more like people do.
It’s not a surprise that Google has been working towards reinventing itself and what it does. With an increased emphasis on social and real time search results, Google has been transforming itself into a near real time monitor of activities and events in the world, rather than just a repository of links to web pages that might satisfy situational and informational needs.
With this move towards displaying information directly within search results about specific people, places, and things, Google is tapping into resources like Wikipedia, Freebase (which Google acquired when they purchased Metaweb), and other places on the Web, as well as the knowledge derived from what people are searching for, and how they might do things like refine search queries.
A recent paper from three Google Engineering team members, Extracting Unambiguous Keywords from Microposts Using Web and Query Logs Data (pdf), even provides details on how they might be working to better understand the ideas and concepts expressed in Tweets, Status Updates, and other social media, using statistical analysis of documents found on the Web and information gleaned from their query logs. So social media is potentially another source of the kind of information that could be included within a knowledge base.
A patent application that Google published in August of 2010, Identifying Query Aspects, hinted at this kind of knowledge base search and how it might be used to transform search results. I wrote about it when it was published, in the post Google and Metaweb: Named Entities and Mashup Search Results?
The inventors listed in that patent application have also written a whitepaper that describes how combining information from knowledge bases like Wikipedia and Freebase about “named entities,” or specific people and places and things, with information from search query logs could help to identify different aspects of those named entities, and help the search engine decide what to display about them. The paper is Identifying Aspects for Web-Search Queries, and it was initially published in the Journal of Artificial Intelligence Research in March of 2011.
The paper gives us an idea of why Google might have decided to move towards this knowledge base model of search with the opening statement:
Many web-search queries serve as the beginning of an exploration of an unknown space of information, rather than looking for a specific web page. To answer such queries effectively, the search engine should attempt to organize the space of relevant information in a way that facilitates exploration.
So how do these knowledge base results help “facilitate exploration”? How does Google decide what to show about specific people or places or things? The paper tells us that it looks to at least a couple of sources to understand different “aspects” of a particular named entity:
Aspector combines two sources of information to compute aspects. We discover candidate aspects by analyzing query logs, and cluster them to eliminate redundancies. We then use a mass-collaboration knowledge base (e.g., Wikipedia) to compute candidate aspects for queries that occur less frequently and to group together aspects that are likely to be “semantically” related.
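To make the query log half of that description more concrete, here is a minimal sketch of the general idea, not the paper's actual algorithm. It assumes a toy query log, treats the words that follow an entity name in a query as a candidate aspect, and merges near-duplicate candidates using simple word overlap. The function names, the sample log, and the overlap threshold are all hypothetical, and the knowledge base step is left out entirely.

```python
from collections import Counter


def candidate_aspects(entity, query_log):
    """Count the words that follow `entity` in logged queries (hypothetical helper)."""
    entity = entity.lower()
    counts = Counter()
    for query in query_log:
        q = query.lower().strip()
        # Only queries that extend the entity name yield a candidate aspect.
        if q.startswith(entity + " "):
            counts[q[len(entity):].strip()] += 1
    return counts


def cluster_aspects(counts, threshold=0.5):
    """Greedily merge candidates whose word sets overlap.

    Assumption: Jaccard word overlap stands in for whatever similarity
    measure a real system would use to eliminate redundant aspects.
    """
    clusters = []  # each cluster: {"label": str, "words": set, "count": int}
    for aspect, n in counts.most_common():
        words = set(aspect.split())
        for cluster in clusters:
            overlap = len(words & cluster["words"]) / len(words | cluster["words"])
            if overlap >= threshold:
                cluster["count"] += n  # fold the redundant candidate in
                break
        else:
            clusters.append({"label": aspect, "words": words, "count": n})
    return clusters


# Made-up sample data, for illustration only.
sample_log = [
    "vietnam travel", "vietnam travel", "vietnam travel guide",
    "vietnam food", "Vietnam War", "vietnam war history",
    "vietnam war", "hanoi hotels",
]

for cluster in cluster_aspects(candidate_aspects("Vietnam", sample_log)):
    print(cluster["label"], cluster["count"])
```

With this sample log, “travel guide” is folded into “travel” and “war history” into “war,” which is the kind of redundancy elimination the quote describes. A real system would work with far more data and, as the paper notes, would also lean on a knowledge base for rarer queries.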
It really shouldn’t come as a surprise that Google would venture in this direction. Google’s interest can be seen as far back as Sergey Brin’s publication of the paper Extracting Patterns and Relations from the World Wide Web (pdf) in the 90s, around the time that he and Lawrence Page worked to transform their Backrub search engine into Google.
Google projects like the one described in the paper WebTables: Exploring the Power of Tables on the Web (pdf) also show Google attempting to extract information from structured tables found within the unstructured pages of the Web to understand the semantic relatedness of data about people, places, and things. Google Squared was powered by the analysis and understanding derived from projects like that one.
This knowledge base approach isn’t even very novel at Google if you look at how Google has been collecting and ranking results for Google Maps for more than a couple of years. In addition to looking at databases from telecom directories for information about distinct businesses and organizations at specific locations, Google has also been crawling the Web to find mentions of those businesses where geographic location information is also included.
Google may also determine that a particular site or page is the authoritative page for a business or organization, but there are businesses and locations and landmarks that don’t even have webpages that are included in Google Maps.
Google’s knowledge base results provide information about entities that might appear within queries, and may even anticipate and answer subsequent queries. Still, they are summaries: they provide value to searchers, and they may also lead to further exploration of topics and ideas related to a search. If you’re a site owner, it might not hurt to be perceived as an authority for those topics.