Is Google Going to Marry their Knowledge Base with their Search Engine?

Google has been answering queries with its search engine for over 15 years, and has been showing us it can answer questions with facts from its Browsable Fact Repository and/or the Google Knowledge Graph.

Might Google at some point bring the two together?

To a degree, Google has been merging some results, showing a set of search results (from the search engine) and a knowledge panel (from the Knowledge Graph) on the same results page. But you could say that those are separate and unique entities on search results pages.


More recently, Google has been merging elements of both together as a single bundle, as noticed by Alex Chitu at Google Operating System in Inline Facts Next to Google Search Results, published on September 7th, 2014.

In that post he shows a couple of search results, one for [duchy of Amalfi] and one for [king of Rome]. In the search results is a query answering result from Wikipedia joined with a question answering result, also from Wikipedia.

query and question answering results for duchy of Amalfi

query and question answering results for king of Rome

I tried some other queries that might reasonably be answered with a Wikipedia page in response to a search query, and with facts taken from the same Wikipedia page, or possibly even another one.

For a number of those, I received Wikipedia results that didn’t have a set of facts after them, as the [duchy of Amalfi] and [king of Rome] results did.

But on a search for something such as [mexico], there were quicklinks linking to topics or categories from Wikipedia pages. For example, the [mexico] query had the following in those links: History of Mexico, Mexico City, Enrique Peña Nieto, Languages of Mexico.


For a query such as [oil spill gulf of mexico], the quicklinks shown include: Deepwater Horizon explosion, Ixtoc I oil spill, Timeline of the Deepwater, Corexit. Each of those links to a separate Wikipedia page.

For a query such as [oil spills mexico], instead of links, I did get facts listed after the Wikipedia entry:

Wikipedia snippet with facts following

The difference between the pages that Google Operating System pointed out, and most of the ones I found is that the “facts” or topics that Google associated with my searches were bigger topics, possibly served better by a link to a whole page from Wikipedia than just some fact that could fit under the search result from the Wikipedia page.

A Matching Patent

The title of this Google patent application, published in 2011, seems to be a good fit for result snippets with both “query terms” and “answer terms”:

User Interface for Facts Query Engine with Snippets from Information Sources that Include Query Terms and Answer Terms
Invented by Andrew William Hogue
US Application 20110295888
Published December 1, 2011
Filed: August 9, 2011


A method and a system for providing snippets of source documents of an answer to a fact query are disclosed. Snippets of source documents may be provided in response to a user request for the source documents from which the fact answer to a fact query was extracted. The snippets include the terms of the fact query and terms of the answer. The snippets may be displayed along with Uniform Resource Locators (URLs) of the source documents.

The disclosed embodiments relate generally to queries for facts, and more particularly, to a user interface for a factual query engine and snippets of sources with query terms and answer terms.
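To make the abstract a little more concrete, here is a minimal sketch in Python of picking snippets from a source document that contain both the terms of the fact query and the terms of the answer. The function name `find_snippets`, the sentence-splitting rule, and the sample text are my own assumptions for illustration; the patent does not spell out an implementation like this.

```python
import re


def find_snippets(document, query_terms, answer_terms):
    """Return the sentences that mention every query term and every answer term.

    Hypothetical helper: a simple substring match on lowercased sentences,
    not the method described in the patent.
    """
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    wanted = [term.lower() for term in list(query_terms) + list(answer_terms)]
    return [s for s in sentences if all(term in s.lower() for term in wanted)]


# Made-up source text for the "capital of Poland" example.
doc = (
    "Poland is a country in Central Europe. "
    "Warsaw is the capital of Poland and its largest city. "
    "Krakow was the capital until 1596."
)

snippets = find_snippets(doc, ["capital", "Poland"], ["Warsaw"])
for snippet in snippets:
    print(snippet)
```

Only the sentence that carries both the question terms and the answer term survives, which is the kind of snippet a searcher could use to verify the fact against its source.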

Reasons to combine query answers with question answers

The patent starts out with a number of reasons why it might be used:

1. For a question such as “what is the capital of Poland”, a question answering approach can provide a short, succinct answer that is likely to be correct. A search result is less likely to be short and focused.

In the image below, the title of the search result answers the question, but the snippet that accompanies it is pure fluff. So, the question answer OneBox and Wikipedia results answer the question quickly and well.


2. A single entry from an encyclopedia site might “limit the kinds of questions answered.”

The patent goes on to say:

For instance, a search engine based on an encyclopedia is unlikely to be able to answer many questions concerning popular culture, such as questions about movies, songs or the like, and is also unlikely to be able to answer many questions about products, services, retail and wholesale businesses and so on. If the set of sources used by such a search engine were to be expanded, however, such expansion might introduce the possibility of multiple possible answers to a factual query, some of which might be contradictory or ambiguous. Furthermore, as the universe of sources expands, information may be drawn from untrustworthy sources or sources of unknown reliability.

Right now, we are only seeing factual question results, or site links (see the result above for “mexico”) from Wikipedia. The knowledge panel patent for Google tells us that it tries to use at least 2 different sources for knowledge panel results to keep from having answers that are too limited, or don’t cover other things.

3. Is there a benefit to having more than one source?

Having multiple answers means that a search engine could choose the best of those to display, or show more than one. Providing links to their sources, like the quick links do, enables a searcher to verify the answers and their sources.
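As a rough illustration of choosing among multiple answers, the sketch below groups candidate answers by how many distinct sources support them and keeps the source links so a searcher could check them. The function `rank_answers`, the counting rule, and the example.org URLs are assumptions of mine; the patent does not say that a simple source count is how the best answer is chosen.

```python
from collections import defaultdict


def rank_answers(candidates):
    """Order candidate answers by the number of distinct supporting sources.

    `candidates` is a list of (answer, source_url) pairs. This is a
    hypothetical scoring rule used only for illustration.
    """
    sources_by_answer = defaultdict(set)
    for answer, source in candidates:
        # Normalize case and whitespace so "Warsaw" and "warsaw" count together.
        sources_by_answer[answer.strip().lower()].add(source)
    # Most-supported answer first; ties are broken alphabetically.
    return sorted(
        sources_by_answer.items(),
        key=lambda item: (-len(item[1]), item[0]),
    )


# Made-up candidates with placeholder source URLs.
candidates = [
    ("Warsaw", "https://example.org/encyclopedia"),
    ("warsaw", "https://example.org/atlas"),
    ("Krakow", "https://example.org/old-almanac"),
]

ranked = rank_answers(candidates)
best_answer, best_sources = ranked[0]
print(best_answer, sorted(best_sources))
```

A search engine could show only the top entry, or show several along with their sources when the counts are close.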

As shown in this image from the patent, the query answer and the question answer could be kept separately, and each could be cached to help the search engine/knowledge graph answer questions quickly.

A flow chart from the patent showing separate caches for question and query answers
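The idea of keeping two separate caches can be sketched in a few lines of Python. Everything named here (`HybridResultCache`, `fake_search`, `fake_fact`, and the example.org URL) is hypothetical and stands in for whatever Google actually runs; the point is only that the query answer and the question answer are stored and looked up independently, then shown together.

```python
class HybridResultCache:
    """Keeps query answers and question answers in two separate caches."""

    def __init__(self):
        self.query_cache = {}  # normalized query -> search results
        self.fact_cache = {}   # normalized query -> fact answer

    def lookup(self, query, search_fn, fact_fn):
        # Normalize case and whitespace so near-identical queries share entries.
        key = " ".join(query.lower().split())
        if key not in self.query_cache:
            self.query_cache[key] = search_fn(key)
        if key not in self.fact_cache:
            self.fact_cache[key] = fact_fn(key)
        return {"results": self.query_cache[key], "facts": self.fact_cache[key]}


# Stand-in back ends that count how often they are called.
calls = {"search": 0, "fact": 0}


def fake_search(query):
    calls["search"] += 1
    return ["https://example.org/" + query.replace(" ", "_")]


def fake_fact(query):
    calls["fact"] += 1
    return {"answer": "Warsaw"}


cache = HybridResultCache()
first = cache.lookup("capital of Poland", fake_search, fake_fact)
second = cache.lookup("  Capital of  poland ", fake_search, fake_fact)
print(first, calls)
```

The second lookup is served from both caches, so neither stand-in back end is called again.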

Is this a trend that we will be seeing more of, or just experimentation from Google?

12 thoughts on “Is Google Going to Marry their Knowledge Base with their Search Engine?”

  1. You make a great point. Got some great info here. I think that if more people thought about it that way, they’d have a better time getting the hang of the issue.

  2. Awesome post Bill. I think this is very much a trend we’ll be seeing more of, and if this doesn’t result precisely in the marriage of search engine query answers and Fact Repository/Knowledge Graph factual answers to questions, I see a close civil union brewing.

    As the patent you reference suggests, relying solely on Wikipedia, Freebase and Wikidata still leaves billions of facts on the table, as it were.

    Of particular interest to marketers should be the patent’s observation that an encyclopedia is “unlikely to be able to answer many questions about products, services, retail and wholesale businesses and so on.” But – as Google recognizes – that this information isn’t provided in encyclopedia-like sources doesn’t mean that these sort of questions aren’t answerable, and may even be answerable with factual answers (as opposed to linked sources).

    If I query [screen resolution np940x3g-k05us] the fact that that product doesn’t exist in Wikipedia doesn’t mean that that isn’t reliably answerable from other sources, and certainly that the provision of that (question) answer directly to consumers wouldn’t be useful to them – especially when combined with query answers to provide information not provided in the question answer, like offers or other technical specifications for the product in question.

  3. Hi Steve,

    I’ve seen some people who are disturbed by Google returning both query results and question results, and saying things like “Google is scraping other sites.” I’m not sure if those folks are ever going to be happy with content such as the stuff that shows up in the Google Knowledge Panel or these facts that are now showing under Wikipedia entries.

    The idea that Google is not only indexing web pages, but also indexing object and entities and facts related to them is alien to many who only want to see Google index web pages and only web pages. I don’t think that is going to change.

  4. Hi Aaron

    Thanks for sharing your thoughts. I do think we will keep on seeing Google move towards more knowledge in search results, and more hybrid results that combine both query answering and question answering.

    That the Wikipedia results are showing a mix of either quick links or facts related to the original query and the patent describes this kind of behavior could be seen as a sign that we will see even more of these kinds of results in the future.

    I liked that the patent raised some interesting issues and attempted to address them, such as the sparsity of sources of answer information, and how that doesn’t help in the domain of ecommerce. There’s a lot of room for Google to grow, and we don’t know the exact path that growth will follow, though it’s possible to make some guesses at this point. It’s probably a good idea to do things like help develop schema for types of products or services you might have an interest in.

    I think I have a handle on the 2 different types of Wikipedia SERPs, with just the facts at the bottoms of some, and quicklinks at the bottoms of others, and what the thinking is behind those different types. I’ll probably be focusing on that in one of my next posts.

  5. Is there a benefit to having more than one source?

    Speaking as a Google user, I’d probably just like to see the most popular / accurate answer in a knowledge box followed by normal SERPs. It is really hard to tell these days. Google does a ton of experimentation with regards to their UI, but a lot of it is driven by AdWords, so it is hard to tell sometimes if the decision is made for UX or upcoming 4th quarter earnings.

  6. They are/have/did… =)

    It is proactively targeting voice search….

    What if Google was to answer back, Siri can’t…

    Ask Siri,
    “Siri, get me (Some Celebrity NOT on Contacts List)’s Facebook page…”

    “Sorry (insert Celebrity) is NOT on your Contacts list”

    Ask Google…

    “okay Google get me (Justin Timberlake) Facebook page”

    HTTPS result for correct Facebook page of designated Celebrity Page.

    KBO factors heavily with Voice /Verbs vs Nouns vs Adjectives
    Time, Measurement, Distance, Contents (of a location, i.e. Restaurant vs. Grocery Store vs. Auto Parts Store vs. Home Depot or Best Buy). Whereas a BRAND is an entity associated with a Company Type – for associations to be relevant and measurable they must be quantifiable,
    Restaurant –
    Associated Touch Points
    Menu (Gluten Free, Diabetic or Vegetarian Options)
    Contact – (Phone Number, Email, Form, Email List Subscription, Forum sign up.) What is the measured conversion point?
    Directions – Map link, Address and Phone number clickable or transferable to a search snippet.
    Images – What associated images are on the website that should be indexed as classified?
    Social – any social connections, registration with other services,
    Domain – Links.. inbound
    OFFERS – Pricing or Discount information associated with the Restaurant. Coupons, etc..
    Products – Some Restaurants have a Consumer or Wholesale focused food line. BBQ sauces, etc.. Listing in Google PLAs? yes or no?
    Ratings and Reviews – etc..
    Registration or Data Collection – i.e. GA, GWT, G+, YT, GPlaces, Bing Webmaster Tools, other analytics or cookie collecting software, not limited to but including something like.. DoubleClick analytics and Retargeting..
    Down stream databases content from website is included in.

    Location listings.. bad or Good..

    (sorry just thought of something for client.. To be con’t?? TBD)

  7. Hi Bill,

    This is very interesting indeed and I do believe that the more sources, the better the answer. Even if the user only sees one answer in their results, it makes sense that Google would use both knowledge databases and a variety of documents to find the best answer.
    Mostly it’s the best way to validate data across databases and documents but also in the future it could lead to much more complex answers that go beyond the Knowledge Graph or answer box, quick facts and bottom links.
    For example right now if I search for “World War II timeline”, one of the results has a couple of dates/quick facts. With a more advanced search engine that combines all the data, we could see a real historical timeline with major events (that would be clickable and lead to a more detailed page like Wikipedia). That could be a sort of mix between an answer box and Knowledge graph results.

  8. Hi Patrick,

    The question answering results don’t impact AdWords in any manner, do they? I don’t have a clue how anyone would be able to determine which answer to a question answering result would be the “most” accurate or “most” popular.

    There are a few ways of determining things like the answer with the most “confidence” to it, or a process involving clustering facts, to see which have the highest probability of being correct.

    You could possibly attempt to gauge popularity based upon something like the # of mentions within documents across the web, or something else such as the PageRanks of those documents mentioned upon, across the web, though I’d be concerned about too much defined by a citation analysis such as PageRank.

  9. Hi Hortense,

    I agree – information from a variety of different sources and knowledge bases could significantly improve the quality of results that people are shown.

    We’ve seen timelines from Google, and things like customized maps or graphs would be another of the kinds of things that Google could possibly use data to generate in response to a query. A query such as:

    “Show me a map of all of the states, and the majority of votes for each party that each state voted for in the year 2000.”

    Being able to get those kinds of answers would be fun. Or, “who were the top-selling authors for each state, who grew up in those states?”

  10. Hi Steve,

    For the conversational entity queries of the type that Hummingbird was intended to address, I suspect that some semantic parsing is going on that involves identifying semantic frames that help put questions into context.

    Being able to extract facts is useful, but even a question such as “how fast is a Jaguar” can still involve a fact about the animal, a fact about a Jacksonville football player, a zero to 60mph time of a car, and maybe even something about the effectiveness of an Apple computer running Jaguar operating system software, so that context is still pretty important, even if you’re focusing upon facts. Sometimes additional facts, such as aggregated facts, geographic location of a searcher, past search history, searches of people with similar profiles, profiles associated with the query terms, and more can help provide context.

    The patent around Siri was much longer and more detailed than what Apple released to the world, and a lot of it involved an ontology that was much richer, especially in terms of actions and nouns and transactions. I suspect Apple will get there someday. I can’t say that they will beat Google Now to getting there.

  11. ?! Why would the official webpage of the entity I Googled – in this case the official webpage of Warsaw – in your opinion (highlighted in a word balloon) qualify as “fluff” instead of facts – in fact, the official portal does make a lot of facts about the city available. Maybe there’s a better example that illustrates your point? Wouldn’t the official page of the entity for which I’m searching very possibly be something I want? I WOULD want it – whether it’s the site of a business, political entity, or cultural or educational or emergency services provider! Full disclosure: I worked with the city of Warsaw on its sustainability reporting – our collection of facts on the city related to environmental, societal, and economic/financial/governance performance is linked to the official portal of the city. So from both the user and content producer side of things, yes, in both roles I would want that site to come up!

  12. Hi Adam,

    I understand why you are saying that, and I called the site fluff based upon the snippet for the page, which is created by a non-government based marketing agency specializing primarily in tourism (there’s a link to the agency’s site in the bottom right of the page for that site). I probably should have picked another one, or noted that the other sites listed were primarily knowledge base sites filled with short and succinct facts rather than rich persuasive pages attempting to get people to visit the City.

    I have worked on the tourist site for at least one large city in North America that was primarily aimed at marketing the city. It really wasn’t a rich source of facts about the City, and neither is this one for Warsaw. It’s probably a better source of content for a query-based answer instead of a question answering type fact, like the Wikipedia results are. I think combining such sources into one search result benefits from a prose plus facts type snippet.
