How Google was Corroborating Facts for Direct Answers

When someone searches the web and asks a question such as “what is the capital of Poland” or “what is the birth date of George Washington,” a search engine such as Google is not very helpful if it returns a list of web pages that might answer the query instead of an actual answer. People in the SEO community have been referring to such answers as “direct answers.”

Google answering a direct question with a factual answer.

A patent granted to Google this week describes how Google indexes data across the web, and may look to a large collection of facts (in a fact repository such as a knowledge graph) to check upon and verify such answers, so that it can deliver them with more confidence and certainty, like in the answer to the question about George Washington’s birthday shown above.

The patent tells us that some efforts to build a search engine that can “provide quick answers to factual questions have their own shortcomings.” One of these is that the answers may come from a single source, such as “a particular encyclopedia.” This is perceived as a shortcoming because such a source is:

…unlikely to answer many questions concerning popular culture, such as questions about movies, songs or the like, and is also unlikely to answer many questions about products, services, retail and wholesale businesses and so on. If the set of sources used by such a search engine were to be expanded, however, such expansion might introduce the possibility of contradictory or ambiguous answers. Furthermore, as the universe of sources expands, information may be drawn from untrustworthy sources or sources of unknown reliability.

If we instead use all of the data across the web as a potential source of answers, we get a much wider range of topics and things that questions can be answered about.

If facts about the things that people ask questions about can be found on many pages, then the data about those things can be corroborated from many sources to identify how likely a fact about them is to be correct. A search engine that builds up such a knowledge base, or fact-based repository, could then answer a question such as “what is the capital of Poland,” or “what is the birth date of George Washington” and use that collection of corroborated facts to return a “likely correct fact,” as an answer.

The patent that was just granted is:

Corroborating facts in electronic documents
Invented by Shubin Zhao and Krzysztof Czuba
Assigned to Google
US Patent 8,954,412
Granted February 10, 2015
Filed: September 28, 2006


A query is defined that has an answer formed of terms from electronic documents. A repository having facts is examined to identify attributes corresponding to terms in the query. The electronic documents are examined to find other terms that commonly appear near the query terms.

Hypothetical facts representing possible answers to the query are created based on the information identified in the fact repository and the commonly-appearing terms. These hypothetical facts are corroborated using the electronic documents to determine how many documents support each fact. Additionally, contextual clues in the documents are examined to determine whether the hypothetical facts can be expanded to include additional terms.

A hypothetical fact that is supported by at least a certain number of documents, and is not contained within another fact with at least the same level of support, is presented as likely correct.
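The selection rule in that last paragraph of the abstract can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the patent's implementation: the function name `corroborate`, the `min_support` threshold, and the use of simple case-insensitive substring matching to decide whether a document "supports" a fact are all hypothetical.

```python
from collections import Counter

def corroborate(hypotheses, documents, min_support=2):
    """Return (accepted facts, support counts) for a list of hypothetical facts.

    Assumption: a document "supports" a fact if it contains the fact's text
    (case-insensitive substring match). The patent does not specify this.
    """
    support = Counter()
    for doc in documents:
        text = doc.lower()
        for fact in hypotheses:
            if fact.lower() in text:
                support[fact] += 1

    accepted = []
    for fact in hypotheses:
        # Rule 1: the fact needs at least a certain number of supporting documents.
        if support[fact] < min_support:
            continue
        # Rule 2: drop a fact that is contained within another fact
        # that has at least the same level of support.
        contained = any(
            other != fact
            and fact.lower() in other.lower()
            and support[other] >= support[fact]
            for other in hypotheses
        )
        if not contained:
            accepted.append(fact)
    return accepted, dict(support)
```

With three documents that all say “Fred Mertz,” the shorter hypothesis “Fred” has the same support as “Fred Mertz” and is contained within it, so only the longer, more complete fact would be presented.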

The patent’s description starts off by telling us more about how Google crawls and indexes data from the Web, and uses data janitors to clean up that data. The “fact repository” described in that post is an early version of Google’s knowledge graph, before it was given that name.

This patent was also written before the fact repository it includes in its description was referred to as the knowledge graph by Google, and it is a precursor to the knowledge graph. The method of using lots of documents to see if facts from them support each other is similar to a desired consistency of citations in local search when it comes to NAP (name, address, and phone number). Google has much more confidence in the correctness of a local search listing, or an answer to a question about facts when a lot of documents provide the same answer to a specific question.

As to how Google might provide answers to questions about facts, the patent does give us a description of how that is done:

In one embodiment, the contents of the facts in the repository are also indexed in index. The index maintains a term index, which maps terms to {object, fact, field, token} tuples, where “field” is, e.g., an attribute or value. The service engine is adapted to receive keyword queries from clients such as object requestors, and communicates with the index to retrieve the facts that are relevant to user’s search query.

For a generic query containing one or more terms, the service engine assumes the scope is at the object level. Thus, any object with one or more of the query terms somewhere (not necessarily on the same fact) will match the query for purposes of being ranked in the search results. The query syntax can also be used to limit results to only certain objects, attributes, and/or values.
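To make the term index a little more concrete, here is a minimal Python sketch of an index that maps terms to (object, fact, field, token) tuples and matches queries at the object level. The names `Posting`, `build_term_index`, and `match_objects` are mine, and I am assuming that “token” means the position of the term within the field; the patent excerpt doesn't spell that out.

```python
from collections import defaultdict, namedtuple

# One index entry: which object, which fact of that object, which field
# ("attribute" or "value"), and (assumed) the term's position in that field.
Posting = namedtuple("Posting", ["object_id", "fact_id", "field", "token"])

def build_term_index(objects):
    """objects: {object_id: [(attribute, value), ...]} -> {term: [Posting, ...]}"""
    index = defaultdict(list)
    for object_id, facts in objects.items():
        for fact_id, (attribute, value) in enumerate(facts):
            for field, text in (("attribute", attribute), ("value", value)):
                for token, term in enumerate(text.lower().split()):
                    index[term].append(Posting(object_id, fact_id, field, token))
    return index

def match_objects(index, query):
    """Object-level scope: an object matches if any of its facts
    contains any query term (not necessarily the same fact)."""
    matched = set()
    for term in query.lower().split():
        for posting in index.get(term, []):
            matched.add(posting.object_id)
    return matched
```

A query such as “capital poland” would match the Poland object even though “capital” and “Poland” sit in different facts, which is what the patent means by scoping at the object level.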

It’s possible that an approach like that might turn up more than one fact that answers a question, so the patent also tells us about how it might rank answers:

The relevance score for each fact is based on whether the fact includes one or more query terms (a hit) in either the attribute or value portion of the fact. Each hit is scored based on the frequency of the term that is hit, with more common terms getting lower scores, and rarer terms getting higher scores (e.g., using a TF-IDF based term weighting model).

The fact score is then adjusted based on additional factors. These factors include:

  • The appearance of consecutive query terms in a fact,
  • The appearance of consecutive query terms in a fact in the order in which they appear in the query,
  • The appearance of an exact match for the entire query,
  • The appearance of the query terms in the name fact (or other designated fact, e.g., property or category), and
  • The percentage of facts of the object containing at least one query term.

Each fact’s score is also adjusted by its associated confidence measure and by its importance measure. Since each fact is independently scored, the facts most relevant and important to any individual query can be determined, and selected. In one embodiment, a selected number (e.g., 5) of the top scoring facts are retrieved in response to query.
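The scoring steps above could look something like the following Python sketch. Please treat it as a rough illustration under stated assumptions: the patent gives no formulas or weights, so the smoothed IDF, the 0.25 bonus for consecutive in-order query terms, and the 1.5 multiplier for a whole-query phrase match are numbers I made up, and the sketch covers only some of the listed factors. Each fact is a plain dictionary with hypothetical `text`, `confidence`, and `importance` keys.

```python
import math

def idf(term, facts):
    """Rarer terms get higher weights (a smoothed IDF, chosen for illustration)."""
    containing = sum(1 for f in facts if term in f["text"].lower().split())
    return math.log((1 + len(facts)) / (1 + containing)) + 1.0

def score_fact(fact, query, facts):
    terms = query.lower().split()
    words = fact["text"].lower().split()

    # Base score: one weighted hit per query term found in the fact.
    score = sum(idf(t, facts) for t in terms if t in words)
    if score == 0:
        return 0.0

    # Adjustment: consecutive query terms appearing in query order.
    adjacent = set(zip(words, words[1:]))
    in_order_pairs = sum(1 for pair in zip(terms, terms[1:]) if pair in adjacent)
    score *= 1 + 0.25 * in_order_pairs  # hypothetical weight

    # Adjustment: the entire query appears as a phrase in the fact.
    query_phrase = " " + " ".join(terms) + " "
    fact_phrase = " " + " ".join(words) + " "
    if query_phrase in fact_phrase:
        score *= 1.5  # hypothetical weight

    # Final adjustment by the fact's confidence and importance measures.
    return score * fact["confidence"] * fact["importance"]

def top_facts(facts, query, k=5):
    """Score each fact independently and return the k best matches."""
    scored = [(score_fact(f, query, facts), f) for f in facts]
    scored = [(s, f) for s, f in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [f for _, f in scored[:k]]
```

Because every fact is scored on its own, a well-corroborated fact with a high confidence measure can outrank a fact that matches the same query terms but comes with less support.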


The patent provides more details and additional examples, including a detailed look at how it answers the question “Who did William Frawley play”?

The answer to that question is that he played the character of “Fred Mertz” in “I Love Lucy,” but he also played “Bub” in the later TV series, “My Three Sons.” The patent describes why it might answer with the more recent answer first, based upon search history.

In the Google Research blog, Google announced this week that they had updated their Knowledge Graph to show improved answers to medical questions, in the post “A remedy for your health-related questions: health info in the Knowledge Graph.” They told us in the post that they had worked with people at the Mayo Clinic to respond to a wide range of health-related questions. That goes beyond the process described in this patent, which corroborates facts by looking at a wide range of other electronic documents on the Web. The answers to these health-related questions were reviewed by a number of actual doctors.

We had been getting answers to health-related questions as direct answers, such as “what are the symptoms to Mono” shown in a patent screenshot that I showed in the post Direct Answers – Natural Language Search Results for Intent Queries.

Natural Language results on a query for 'Symptoms of Mono' from authoritative sources.

Will we see Google updating other areas of knowledge using other subject matter experts to improve upon those answers? The idea in the patent on corroborating facts to provide answers does make sense, but we have an example from this week where Google was showing that it might not have been completely satisfied with those answers.

Has the corroborating facts approach described in the patent changed?

It’s possible that they might find ways to provide higher quality answers for other topics, too.


33 thoughts on “How Google was Corroborating Facts for Direct Answers”

  1. Well done Bill, thank you for enlightening us on this. Particularly for highlighting this section of the patent:

    “The fact score is then adjusted based on additional factors. These factors include:…”

    I’ve been really keeping my eye on direct answers (as many SEO’s have) for the last 6 months or so and doing my best to see what makes it tick, reverse engineering it somewhat, and documenting.

    The part that I really haven’t been able to nail down is the frequency of direct answer updates / edits. I know there is a manual edit process somewhere in here, but I also know that many of these answers are automated, as evidenced by how incorrect some of them are.

    Either way thanks for continuing to translate this for us!

  2. Hi Patrick,

    You’re welcome. I’ve been very curious about these direct answers too. The patents seem like a good starting point for information about them. I suspect that there may have been some changes, as Google needed to make tweaks and updates, like they did with answers to medical questions, so the inquiry as to how they actually work may be an ongoing thing. But I do like that the patents provide us with a peek behind the curtain.

    I really do think it’s an ideal situation if the whole updating process is an automated one as much as possible, but some quality control in a manual manner probably doesn’t hurt at all.

  3. Thanks Bill…I caught your interview on and made my way here after reading it.

    How concerned should local business owners be of direct answers?

    If the question is something like “how much does a flat of tomato plants cost?” would it be relatively easy to get a client’s gardening business some local business using DA?

    If it’s a simple process I can see local grocery store taking advantage of direct answers for every recipe they can think of just to get a source link.

    Or doesn’t it work that way?

  4. Hi Matt,

    Thanks. It’s good to see you follow me over here from that interview. Schema markup information is your chance to reinforce what you put in text on your pages, and show Google what services your site actually offers, where you are located, what your customer service phone numbers and other department numbers might be, where your social profiles for your business are, and more.

    I linked to a post that focused on one Google patent in this post that describes some aspects of what Google might be looking for in sources for direct answers. These are sites that are “authoritative” in that they tend to get selected by searchers when they show up in search results, and they tend to rank highly for related queries. The schema markup I mention above can help your site rank more highly in search results, but you still need to do things like write engaging and persuasive page titles and meta descriptions, and content on your pages.

    I’ve been seeing more “direct answer” type results when I type in a query for many terms. A number of these look like definitions, and I suspect that we will see these increase, and include information about things like local landmarks and points of interest.

    This does seem like something we are going to see more of, but a local grocery store might have to compete with national sites on topics like recipes that aren’t something only known to be local in nature (Like a Maryland Crab Soup in Maryland, maybe).

  5. Google keeps trying to change the way we search or rather it tries to copy our (human) ways to answer.

  6. Hi Anand,

    Searching with keywords was more unnatural than using a natural language approach. I kind of prefer natural language questions, in the shape of actual conversational type questions, anyway. That does seem to be what Google is targeting. As well as providing answers to questions when they are asked.

  7. Direct answers are the future of the search, whichever search engine gets it right is going to rule the segment, no doubt about it. All those long and unending search engine listings have to become passe to make way for the exact and accurate solutions.

  8. Something all SEO’s really need to be paying very close attention to. Big thank you to Bill for highlighting this patent and giving his opinion on the “translation” of a lot of this. This is clearly the future of search, and something I have my eye on.

  9. Hi Cathy,

    I do think that many searchers prefer to see answers to questions rather than having to search through web pages to find those answers. Of course, pages that do contain more information than might be in a brief answer or a knowledge panel may be preferred by searchers, who really mostly want answers to their questions.

  10. Good to know this Bill Sir. I am seeing it as a very good move from Google, from a searcher point of view as it will actually save lots of time on browsing.
    Good thing is that Google is giving a link back to the main site from where it is showing the answer and mostly that is Wikipedia.
    I have a (silly) question for you Sir. Does a search box on a website with ld+json or microdata have any effect on this, apart from the sitelinks search box?
    Thank you Sir, for sharing this information with us.

    Soumya Roy

  11. Hi Soumya

    I agree that showing direct answers like this can speed up searching for people looking for answers. Google recently stated the same thing in the 10-K financial statement for 2014, too.

    Google doesn’t always link back to the source of a direct answer, but that’s usually in cases where the answer was more of a statement of fact than anything else, like the U.S. President’s birthday.

    I don’t believe that markup for a site links search box has an impact on whether content from a site is shown as a direct answer, in Google search results.

  12. Very interesting thread from a SEO perspective once again Bill!
    Wondering if you are up to something about how does Google index Tweets today?

  13. Excellent stuff.

    I’ve suggested to fellow SEOs that Google only looks to the top 10 results for a query, when it’s providing an answer box, because I’ve never found an answer box that wasn’t pulled from the first SERP. It would also ensure some level of authority, and presumably reduce the computational?

    Do you think this is true, and if so, does that mean when Google corroborates a fact, it would be with the other 9 results?

    Thank you.

  14. I found this particular text rather amusing from your thread.
    “The method of using lots of documents to see if facts from them support each other is similar to a desired consistency of citations in local search when it comes to NAP (name, address, and phone number).”
    It’s good to know such things and it seems Google doesn’t necessarily know everything.

  15. Hi John,

    I have seen answers in answer boxes that were from results from other pages in the top 10 than from just the top result. If Google corroborates a fact, it would possibly be from other pages that might not necessarily rank in the top 10 for a particular query.

  16. Hi Beakon

    We know that Google will supposedly be including tweets in a data stream again sometime this spring. I can’t say that I tend to see many tweets listed in search results very often these days. They don’t seem to be treating them as if they were like direct answers though.

  17. Hii Bill,

    You covered a really important topic which is discussed everywhere these days. Most bloggers and webmasters (including me) are worrying about the fact that Google is showing information directly.

    Thank you so much for providing the detailed information related to it. Loved the article !!

  18. Well it’s good to understand the logic that Google is using, in that we can now understand how to shape our websites to meet what Google needs, granted we have relevant information or at least something relevant and interesting. Lots of sites with information or products have solutions and answers for people but don’t understand Google’s robot’s brain. So the connection doesn’t happen. And that’s a shame. On-site presentation, structuring and technicals are becoming more important than ever, maybe more than our off-site promotion.

  19. Hey Bill,
    Your blogs are intense! You have so much content on here and I’m really impressed. Google is an interesting thing for sure and I appreciate this post. I do Video Seo for clients and I run across your blog on occasion for advice. Thanks again for putting out so much content for us

  20. Good to know this Bill.
    This is clearly the future of search, and something I have my eye on.
    Thank you so much for providing the detailed information related to it. Loved the article!

  21. Hi Bill,

    I’m wondering where Google is sourcing its facts, and if any of them come from “the deep web”. So much rich and valuable data sits behind firewalls. I’m thinking of things like legal and medical libraries.

  22. Google is constantly changing its algorithms and we can’t predict what they will be. So it is best for SEO professionals to know what the latest changes made by Google are. Nice info

  23. Being an SEO expert I think that your info is very helpful to understand the aspects of Google analytics and it will help to boost up the ranking of any website easily.

  24. Hi Bill,
    First of all I want to thank you for writing this article in a simple way, so that new bloggers like me can understand it easily. Keep writing and helping us.

Comments are closed.