Almost a year ago, Search Engine Land published an article titled Google Search OneBox Answers Are Getting More Detailed.
Search Engine Land has been referring to question-answering type results as Direct Answers, since they don’t seem to follow the normal rules of search results that return documents matching keywords in a query. Instead, they were using an approach to try to take advantage of both question answering and keyword matching, as shown in the image below:
This post is the third of a five-part series that takes a look at natural language answers showing up in search results, possibly under Google’s patent application, Natural Language Search Results for Intent Queries.
Part 1 looked at the patent itself, and natural language search results. Part 2 described how Google was choosing “authority” sites from which to draw answers to queries, which supposedly made these high-quality results.
This part looks at how this system may use intent templates to identify queries with clear-intent questions and to identify natural language answers from the content of authoritative sources.
If a user performs a natural language query (such as “How do I make Hummus?” or “What are the symptoms for chicken pox?”), the search system may show both snippet-based results selected from web pages through a keyword-level search and natural language answers based on the “intent” of the query.
The natural language part of this process may use intent templates to translate the natural language query into a keyword query, and that keyword query may be used to determine the snippet-based results.
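As a rough illustration of that translation step, here is a minimal Python sketch. The patent application does not spell out how a template maps to a keyword query, so the `KEYWORD_FORMS` table, the function names, and the normalization rules are all assumptions made for this example.

```python
import re

# Hypothetical mapping from intent templates to keyword-query forms.
# These pairs are illustrative; they are not taken from the patent filing.
KEYWORD_FORMS = {
    "how do i make $X": "$X recipe",
    "what are the symptoms of $X": "$X symptoms",
}


def template_to_regex(template):
    """Compile a template such as 'how do i make $X' into a regex that captures $X."""
    before, after = template.split("$X")
    return re.compile("^" + re.escape(before) + r"(?P<topic>.+)" + re.escape(after) + "$")


def to_keyword_query(query):
    """Return a keyword query if the query matches a known template, else None."""
    # Assumed normalization: lowercase and drop a trailing question mark.
    normalized = query.lower().strip().rstrip("?").strip()
    for template, keyword_form in KEYWORD_FORMS.items():
        match = template_to_regex(template).match(normalized)
        if match:
            return keyword_form.replace("$X", match.group("topic").strip())
    return None


print(to_keyword_query("How do I make Hummus?"))
```

A query that matches no template returns `None`, in which case a system like this would presumably fall back to ordinary keyword matching.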
A Question and Answer (Q&A) Engine
The Q&A engine may use the Q&A data store, the search records, and the crawled documents to generate intent templates, to populate and maintain the Q&A data store, and to determine if a query includes a clear-intent question that can be answered by the Q&A data store.
The Q&A Data Store is filled with pieces of text and headers from authoritative documents that could be used to help answer questions, like in the following examples.
Using Search Records to find Intent Questions
As shown in the three examples above, the search engine may send a query to the Q&A engine, and the Q&A engine may provide natural language answers from the Q&A data store (which has collected headings and text that answer questions) to the search engine. Those natural language answers may be ranked by the Q&A engine or by the search engine using data provided by the Q&A engine.
In addition, the search system may also obtain potential intent questions from search records such as search query and click logs, aggregated data gathered from queries, or other data regarding the search terms and search results of previously processed queries.
From those search records the search system may identify queries that relate to the subject matter of the Q&A data store.
If the subject matter of a query is medical information, the search system may look for query results with pages from sources such as mayoclinic.com or webmd.com in the top ranking search results. It could then assume that the query associated with such identified search results includes a clear-intent question.
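The check described above could be sketched along these lines in Python. The record layout (`query` and `results` keys), the `top_n` cutoff, and the function names are assumptions for illustration; only the two medical domains come from the text.

```python
from urllib.parse import urlparse

# Authoritative domains per subject. The medical domains are the ones named
# in the text; the dictionary structure itself is an assumption.
AUTHORITATIVE = {"medical": {"mayoclinic.com", "webmd.com"}}


def domain_of(url):
    """Return the host of a URL without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def clear_intent_queries(search_records, subject, top_n=5):
    """Collect queries whose top-ranked results include an authoritative source."""
    trusted = AUTHORITATIVE[subject]
    found = []
    for record in search_records:
        top_urls = record["results"][:top_n]
        if any(domain_of(url) in trusted for url in top_urls):
            found.append(record["query"])
    return found


# Made-up search records, purely for demonstration.
records = [
    {"query": "heart disease treatment",
     "results": ["https://www.mayoclinic.com/a", "https://example.com/b"]},
    {"query": "cheap flights",
     "results": ["https://example.org/x"]},
]
print(clear_intent_queries(records, "medical"))
```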
By looking for clear-intent questions from queries as well as from authoritative sources, the search system could account for various different ways that an intent question can be posed. Examples of such variation could include “heart disease treatment” and “how do I treat heart disease?” Both of these represent the same intent question, but an authoritative source is more likely to include the former, while a query may be more likely to include the latter.
Intent templates may be taken from content available from both authoritative sources and from search records that include previously processed queries and their returned results.
These templates might include both a non-variable portion and a variable portion. The non-variable portion may be text, and the variable portion may be a placeholder for one or more words. The placeholder is what allows a single template to match many different queries or headings.
A template of “$X causes” has a non-variable portion of “causes” preceded by a variable portion that could include words such as “sleepiness,” “sluggishness,” “weakness,” and so on.
As another example:
A template of “recipe for $X” has a non-variable portion of “recipe for” followed by a variable portion. The variable portion could cover a wide range of terms, such as “beef stirfry” or “orange marmalade” or “shrimp jambalaya.”
Topics of Intent Templates
A query or heading corresponding to or matching the template may include a number of words followed by the word “causes”, such as “diabetes causes” or “heart attack causes.”
The variable portion (for example, “diabetes” or “heart attack” for a template of “$X causes”, or “filet and scallop stir fry with asparagus” for a template of “recipe for $X”) may be considered a topic of the query or heading.
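One way to picture template matching and topic extraction is with a regular expression built from the template. This is only a sketch: the `$X` placeholder syntax follows the examples in this post, while the function names and the choice of regular expressions are assumptions, not something the patent application specifies.

```python
import re


def compile_template(template):
    """Build a regex from a template with exactly one $X placeholder."""
    parts = template.split("$X")
    if len(parts) != 2:
        raise ValueError("template must contain exactly one $X placeholder")
    before, after = parts
    # The non-variable text is matched literally; $X captures one or more characters.
    return re.compile(
        "^" + re.escape(before) + r"(?P<topic>.+?)" + re.escape(after) + "$",
        re.IGNORECASE,
    )


def extract_topic(template, text):
    """Return the topic (the text filling $X) if text matches the template, else None."""
    match = compile_template(template).match(text.strip())
    return match.group("topic").strip() if match else None


print(extract_topic("$X causes", "heart attack causes"))
print(extract_topic("recipe for $X", "recipe for shrimp jambalaya"))
```

The same function can be applied to a query or to a heading from an authoritative page, since both are treated as text that either fits the template or does not.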
Templates Assigned to Question Categories
Each of these templates could be assigned to a question category that represents a variety of questions used to request the same specific information. Grouping templates this way lets differently worded queries be treated as requests for the same answer.
The following templates may all be classified as belonging to a treatment question category:
- How do I treat $X
- $X treatment
- How is $X treated
- How to cure $X
Likewise, these templates could be classified as templates for a recipe question category:
- How to make $X
- $X recipe
- Directions for making $X
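The two lists above can be pictured as a lookup table from template to question category. The sketch below is a simplified assumption of how such a table might be used: the dictionary, the `classify` function, and the prefix/suffix matching are illustrative choices, not the patent's described mechanism.

```python
# Template-to-category assignments, copied from the two lists above.
TEMPLATE_CATEGORIES = {
    "how do i treat $X": "treatment",
    "$X treatment": "treatment",
    "how is $X treated": "treatment",
    "how to cure $X": "treatment",
    "how to make $X": "recipe",
    "$X recipe": "recipe",
    "directions for making $X": "recipe",
}


def classify(query):
    """Return (category, topic) for the first matching template, else None."""
    # Assumed normalization: lowercase and drop a trailing question mark.
    text = query.lower().strip().rstrip("?").strip()
    for template, category in TEMPLATE_CATEGORIES.items():
        before, after = template.split("$X")
        fits = text.startswith(before) and text.endswith(after)
        # Require at least one character left over for the topic.
        if fits and len(text) > len(before) + len(after):
            topic = text[len(before):len(text) - len(after)].strip()
            return category, topic
    return None


print(classify("How do I treat heart disease?"))
print(classify("heart disease treatment"))
```

Both example queries resolve to the same category and topic, which is the point of grouping templates: the phrasing differs, but the information requested is the same.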
The patent application tells us that these templates could be assigned to a question category manually, or automatically by looking at similar search results returned for queries conforming to the templates.
For example, if search results for the queries “how is diabetes treated” and “what cures diabetes” are similar, the Q&A engine may cluster those two templates together under the treatment question category.
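To make the automatic approach more concrete, here is a small Python sketch that groups templates whose search results overlap. The patent application does not say how similarity is measured, so the Jaccard overlap, the 0.5 threshold, the greedy clustering, and the placeholder result identifiers are all assumptions for this example.

```python
def jaccard(a, b):
    """Share of results two result lists have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_templates(results_by_template, threshold=0.5):
    """Greedy clustering: a template joins the first cluster whose first member has similar results."""
    clusters = []
    for template, results in results_by_template.items():
        for cluster in clusters:
            representative = cluster[0]
            if jaccard(results, results_by_template[representative]) >= threshold:
                cluster.append(template)
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append([template])
    return clusters


# Made-up result IDs for each template filled in with the same topic ("diabetes").
results = {
    "how is $X treated": ["a", "b", "c", "d"],
    "what cures $X": ["a", "b", "c", "e"],
    "$X recipe": ["x", "y", "z"],
}
print(cluster_templates(results))
```

Here the two treatment-style templates share three of five distinct results, so they land in the same cluster, while the recipe template stays on its own.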
The purpose behind this patent application is to provide natural language answers to a query, and also to use those answers from authoritative sources, along with related search results and the clusters of intent templates, to produce better keyword-based search results alongside the natural language answer or answers.
We’ve looked at some of the important aspects of how this patent filing was intended to operate. Over the next two days (the first days of 2015), we will look at some ways to try to make it more likely that content from your pages might be used as answers from authoritative pages in response to natural language queries.