How Google May Handle Question Answering when Facts are Missing

In 2017, I wrote about a similar patent in the post Google Extracts Facts from the Web to Provide Fact Answers.

The patent this post is about starts by saying that Google may have a problem answering questions from the facts it collects from the Web to fill its knowledge graph:

Embodiments relate to relational models of knowledge, such as a graph-based data store, that can be used to provide answers to search queries. Such models describe real-world entities (people, places, things) as facts in graph nodes and edges between the nodes. While such graphs may represent a significant amount of facts, even the largest graphs may be missing tens of millions of facts or may have incorrect facts. For example, relationships, edges, or other attributes between two or more nodes can be missing.

That is the problem that this new patent tries to solve. It was filed in November 2017, while the earlier patent I linked to above was granted in June 2017. That earlier patent is not about missing or incorrect facts the way this newer one is. The newer patent tells us how Google might answer some questions without access to some facts.

It also reminds me of another patent that I recently wrote about on the Go Fish Digital website, in the post Question Answering Explaining Estimates of Missing Facts. Both the patent covered in that post and this new patent list Gal Chechik, Yaniv Leviathan, Yoav Tzur, and Eyal Segalis as inventors (the other patent has a couple of additional inventors as well).

The earlier question answering with estimates patent talks about how Google might infer answers and provide explanations with those answers. This one also says that it might infer answers, but it doesn’t include the explanations:

Facts and/or attributes missing from a relational model of knowledge often can be inferred based on other related facts (or elements of facts) in the graph. For example, a search system may learn that an individual’s grandfather is a parent’s male parent. Accordingly, the system can determine with high confidence that an individual’s grandfather, even though there is no grandfather edge between nodes, is most likely a parent of a parent (given that there is a parent edge between nodes) with an additional check the parent of the parent is male. While this example uses one piece of supporting evidence (called a feature), inferring an individual’s grandfather, functions estimating missing facts are often more complex and can be based on several, even hundreds, of such features. Once the facts and/or attributes missing from a relational model of knowledge can be inferred, queries based on the facts and/or attributes missing from a relational model of knowledge can be resolved.
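
As a rough illustration of the grandfather example in that passage, the sketch below composes two parent edges and then checks a gender property. The graph layout, the placeholder names, and the function `infer_grandfathers` are all hypothetical; none of them come from the patent.

```python
# Toy data graph: each person maps to their parents (a "parent" edge),
# plus a separate gender property. All names here are made up.
parents = {
    "child": ["mother", "father"],
    "mother": ["grandma_m", "grandpa_m"],
    "father": ["grandma_f", "grandpa_f"],
}
gender = {
    "mother": "female", "father": "male",
    "grandma_m": "female", "grandpa_m": "male",
    "grandma_f": "female", "grandpa_f": "male",
}

def infer_grandfathers(person):
    """Return parents-of-parents whose gender property is male."""
    found = []
    for parent in parents.get(person, []):
        for grandparent in parents.get(parent, []):
            # The extra check from the patent's example: the parent
            # of the parent has to be male.
            if gender.get(grandparent) == "male":
                found.append(grandparent)
    return found

print(infer_grandfathers("child"))  # ['grandpa_m', 'grandpa_f']
```

This uses a single feature (the two-hop parent path plus one property check). The patent says real estimating functions may combine many more features than this.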

This question answering patent describes how Google may go about coming up with an answer to a question. This patent was filed after the one that includes estimates of how answers were created, so it does not include that step:

In one example, a computer system includes at least one processor and a memory storing a data graph and instructions. When executed by the at least one processor, the instructions cause the system to generate a template sentence based on a fact including a first node, a second node, and a string, wherein the first node and the second node exist in the data graph and the string represents a fact that is absent from the data graph, search the internet for a document including the template sentence, and, upon determining that the internet includes the document with the template sentence, infer the fact by generating a series of connections between nodes and edges of the data graph that together with the first node and the second node are configured to represent the fact, the series of connections defining a path, in the data graph, from the first node to the second node.
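
One way to picture the steps in that passage is the minimal sketch below: build a template sentence from two nodes and a string, look for it in some documents, and, if it is found, search the graph for a path between the two nodes. The triples, the tiny in-memory `documents` list (a stand-in for searching the internet), and every function name are assumptions made for illustration, not details from the patent.

```python
from collections import deque

# Toy data graph as (subject, edge, object) triples; names are placeholders.
triples = [
    ("A", "child", "B"),
    ("B", "child", "C"),
    ("A", "occupation", "baseball"),
]

# Stand-in for "searching the internet": a tiny in-memory document list.
documents = [
    "It is well known that A is the grandparent of C.",
    "B played for several teams.",
]

def template_sentence(first, string, second):
    """Build a sentence of the form 'X is the <string> of Y'."""
    return f"{first} is the {string} of {second}"

def find_path(start, goal):
    """Breadth-first search returning the edge labels from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for subj, edge, obj in triples:
            if subj == node and obj not in seen:
                seen.add(obj)
                queue.append((obj, path + [edge]))
    return None

def infer_fact(first, string, second):
    """If a document contains the template sentence, return the graph path."""
    sentence = template_sentence(first, string, second)
    if any(sentence in doc for doc in documents):
        return find_path(first, second)
    return None

print(infer_fact("A", "grandparent", "C"))  # ['child', 'child']
```

Here the string "grandparent" has no edge of its own in the graph, but the path of two `child` edges between the nodes can stand in for it.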

This process isn’t described in much detail, but the patent does provide an example, which may help in understanding how it works. Here is that example:

For example, a node may correspond to a fact describing a parent-child relationship. For example, baseball player Bob Boone is the son of baseball player Ray Boone and the father of baseball players Aaron Boone and Bret Boone. Accordingly, the data graph may include an entity as a node corresponding to Bob Boone, which may include an edge for a parent relationship directed to Ray Boone and two edges for child corresponding, respectively, to Aaron Boone and Bret Boone. The entity or node may also be associated with a fact or an attribute that includes an edge (e.g., occupation) between Bob Boone as a node and baseball as a node. Alternatively, the node Bob Boone may include an attribute as a property (e.g., occupation) set to baseball.

However, there may be no edge in the entity (or the graph as a whole) corresponding to a grandparent relationship. Therefore, the relationship between Ray Boone and Aaron Boone may not be shown in the graph. However, the relationship between Ray Boone and Aaron Boone may be inferred from the graph so long as the question answering system knows (i.e., has been instructed accordingly) that there is such an entity as a grandparent.

The inference may be based on the joint distribution of one or more features, representing facts in the data graph related to the missing information. The system may also be used to store the inferences (e.g., like functions or algorithms). The semantically structured sentence (e.g., X is the attribute of Y) is used to generate the inference. It then uses these entities to map a new string that corresponds to relationships between nodes. By that system may be configured to learn new edges between existing nodes in the data graph. In some implementations, the system can generate an inference and algorithm from a huge data graph, e.g., one with millions of entities and even more edges. The algorithm (or function) can include a series of connections between nodes and edges of the data graph. Accordingly, the algorithm can represent an attribute as an edge, in fact. The algorithm (or function) can also include a check of a node’s property (e.g., a gender property is a male). While the system in FIG. 1 is described as an Internet search system, other configurations and applications may be used. For example, the system may be used in any circumstance where estimates based on features of a joint distribution are generated.
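
To make the Boone example concrete, here is a minimal sketch that stores an inference as a path of edge labels and applies it across the graph to find edges that are missing. The triples come from the example above. The `INVERSE` table (treating parent and child as inverse edges), the `grandchild` rule, and the function names are assumptions added for illustration.

```python
# Facts from the Boone example, as (subject, edge, object) triples.
triples = {
    ("Bob Boone", "parent", "Ray Boone"),
    ("Bob Boone", "child", "Aaron Boone"),
    ("Bob Boone", "child", "Bret Boone"),
    ("Bob Boone", "occupation", "baseball"),
}

# Assumption for this sketch: "parent" and "child" are inverse edges,
# so each one implies the other in the opposite direction.
INVERSE = {"parent": "child", "child": "parent"}
for subj, edge, obj in list(triples):
    if edge in INVERSE:
        triples.add((obj, INVERSE[edge], subj))

# A stored inference: the new edge name and the path of edges implying it.
stored_inferences = {"grandchild": ["child", "child"]}

def follow(nodes, edge):
    """Return every node reachable from `nodes` over one `edge` hop."""
    return {obj for subj, e, obj in triples if subj in nodes and e == edge}

def apply_inference(name):
    """Walk the stored path from every node and return the new triples."""
    new_edges = set()
    subjects = {subj for subj, _, _ in triples}
    for start in subjects:
        reached = {start}
        for edge in stored_inferences[name]:
            reached = follow(reached, edge)
        new_edges |= {(start, name, end) for end in reached}
    return new_edges

for triple in sorted(apply_inference("grandchild")):
    print(triple)
# ('Ray Boone', 'grandchild', 'Aaron Boone')
# ('Ray Boone', 'grandchild', 'Bret Boone')
```

Keeping the rule as plain data (a list of edge labels) means the same walk can be reused for other relations by adding entries to `stored_inferences`.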

The mentions of joint distributions in this patent are worth studying in more depth, as the relationships between properties of different entities may reveal information that is worth having in a system like the knowledge graph. For example, the son of someone’s son is their grandson. If the knowledge graph doesn’t include that grandson property, then making that connection means a question answering system can start answering questions such as whether Aaron Boone is Ray Boone’s grandson. This approach can also answer questions about relations beyond who is related to whom within a family.
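
The patent does not spell out how the joint distribution is computed, so the following is only a guess at the general idea: count how often a set of features appears together with a known fact, and use those joint counts to estimate how likely the fact is when the features are present. The observations, the feature names, and the `estimate` function are invented for this sketch.

```python
from collections import Counter

# Hypothetical observations for pairs of entities where the answer is known.
# Each row: (is_parent_of_parent, is_male, is_grandfather)
observations = [
    (True, True, True),
    (True, True, True),
    (True, True, True),
    (True, True, False),   # a noisy row, e.g. an incorrect fact in the graph
    (True, False, False),
    (True, False, False),
    (False, True, False),
    (False, False, False),
]

# Joint counts over the two features and the target fact.
joint = Counter(observations)

def estimate(parent_of_parent, male):
    """Estimate P(grandfather | features) from the joint counts."""
    yes = joint[(parent_of_parent, male, True)]
    no = joint[(parent_of_parent, male, False)]
    total = yes + no
    return yes / total if total else None

print(estimate(True, True))    # 0.75
print(estimate(True, False))   # 0.0
```

With only two features this is a simple lookup table. A function based on hundreds of features, as the patent mentions, would need a proper statistical model rather than raw counts.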

The patent aimed at helping fill in missing and incorrect facts for question answering systems is:

Semi structured question answering system
Inventors: Yaniv Leviathan, Eyal Segalis, Yoav Tzur, and Gal Chechik
Assignee: GOOGLE LLC
US Patent: 10,346,485
Granted: July 9, 2019
Filed: November 8, 2017

Abstract

In one example, a computer system includes at least one processor and a memory storing a data graph and instructions. When executed by the at least one processor, the instructions cause the system to generate a template sentence based on a fact including a first node, a second node, and a string, wherein the first node and the second node exist in the data graph and the string represents a fact that is absent from the data graph, search the internet for a document including the template sentence, and, upon determining that the internet includes the document with the template sentence, infer the fact by generating a series of connections between nodes and edges of the data graph that together with the first node and the second node are configured to represent the fact, the series of connections defining a path, in the data graph, from the first node to the second node.

Last Update July 11, 2019.

13 thoughts on “How Google May Handle Question Answering when Facts are Missing”

  1. Hi Wolfgang,

    I think the questions are being asked by people. The Web may not be quite ready to answer many of them, and it is possible to see an evolution in how Google is working towards trying to answer such questions. Yes, the son of a son is a grandson – The Web may contain information about sons and not about grandsons yet, but if Google is able to make such a connection, and answer such questions, it is good to see.

  2. Hi AGenzia SEO,

    I think that is a very reasonable point. Spoken queries are likely to grow, and people will be asking more questions in the future, where they are looking for answers rather than just links to pages. I have a Google speaker that I often ask questions in the morning, and it often provides me with answers and offers to send a link with more information to my phone. That is often a good experience for me.

  3. Hey Bill, these tips are absolutely incredible. Half of these things I didn’t even think about. I’m a beginner in online world and I found these very helpful. Thank you very much!

  4. Hi Nikhil,

    That is one of the reasons why I spend time looking at patents – because many of the topics covered in patents, I wouldn’t have thought about either.

  5. Hey Bill,

    Great tips, and really awesome. They will definitely help me better understand how Google executes queries and how it handles question answering. Thanks for sharing this great and valuable article!

  8. Hi Marcin,

    There is no timeline that tells us how often Google might make changes to the processes that they follow. They do sometimes publish continuation patents after they make changes to how they may do something. So if there are changes to how they handle question answering when facts are missing, they may file one of those continuation patents, which will have an updated claims section (the description in the patent may be identical or very similar to the one that was originally filed). The attorneys who prosecute patents and decide whether to grant them or not will do that based on the claims that are filed.