In one of those posts, I write about a paper (pdf), co-authored by the inventors of that patent, which describes ways that Google was finding and extracting facts from pages to include in a repository of facts.
Google published a foreign patent at WIPO today that takes an interesting perspective. When someone performs a search that involves a specific entity, their search may be influenced by the search engine’s knowledge of their past interactions with content involving that entity.
For example, when someone searches for “Justin Timberlake,” the search system may have collected information about the searcher’s past consumption of content related to that entity, such as a concert they attended that featured him, or a movie that he was in:
In some applications, the server-based system additionally receives and stores information describing the user’s consumption of the content. For example, the system can determine that the user viewed the movie “The Social Network” featuring “Justin Timberlake” on a particular date and at a particular location. The system can store the information at the media consumption history that identifies the particular date and the particular location where the user viewed the movie “The Social Network,” and can subsequently receive a request that identifies the user and “Justin Timberlake.” The system can provide a response to the request that includes information about “Justin Timberlake” and can also indicate that the user viewed the movie “The Social Network” that features “Justin Timberlake” on the particular date and at the particular location.
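The flow in that passage can be sketched in a few lines of Python. This is only an illustration of the idea, not the patent's implementation: the class names, the `respond` function, the user ID, and the date and location in the usage example are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class ConsumptionRecord:
    """One item a user consumed, plus the entities associated with it."""
    title: str
    entities: frozenset
    consumed_on: date
    location: str


@dataclass
class MediaConsumptionHistory:
    """Per-user store of consumption records (hypothetical structure)."""
    records: dict = field(default_factory=dict)

    def add(self, user_id, record):
        # Append the record to this user's list, creating the list if needed.
        self.records.setdefault(user_id, []).append(record)

    def lookup(self, user_id, entity):
        # Return only this user's records that feature the given entity.
        return [r for r in self.records.get(user_id, []) if entity in r.entities]


def respond(history, user_id, entity):
    """Combine the requested entity with the user's past consumption of it."""
    return {
        "entity": entity,
        "past_consumption": [
            f"{r.title} on {r.consumed_on.isoformat()} at {r.location}"
            for r in history.lookup(user_id, entity)
        ],
    }


# Usage: the date and location below are made-up placeholders.
history = MediaConsumptionHistory()
history.add(
    "user-1",
    ConsumptionRecord(
        title="The Social Network",
        entities=frozenset({"Justin Timberlake"}),
        consumed_on=date(2010, 10, 1),
        location="Example Cinema",
    ),
)
print(respond(history, "user-1", "Justin Timberlake"))
```

Keying the history by user and filtering by entity mirrors the request in the quote, which "identifies the user and 'Justin Timberlake.'"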
When someone searches the web and asks a question such as “what is the capital of Poland” or “what is the birth date of George Washington,” a web search engine such as Google may not be very helpful if it provides a list of web pages that might answer that query instead of an actual answer. People in the SEO community have been referring to such answers as “direct answers.”
A patent granted to Google this week describes how Google indexes data across the web, and may look to a large collection of facts (in a fact repository such as a knowledge graph) to check and verify such answers, so that it can deliver them with more confidence and certainty, as in the answer to the question about George Washington’s birthday shown above.
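One way to picture that kind of check is a lookup that only confirms a candidate answer when more than one source in the repository agrees with it. The sketch below is an assumption about how such a check could work, not the method the patent claims: the repository layout, the `verify_answer` function, the `min_sources` threshold, and the source names are all hypothetical.

```python
from collections import Counter


def verify_answer(candidate, repository, subject, attribute, min_sources=2):
    """Return True if the candidate is the best-supported value for the fact.

    `repository` maps (subject, attribute) to a list of (source, value)
    pairs. This layout is an assumption made for illustration.
    """
    observations = repository.get((subject, attribute), [])
    if not observations:
        return False
    # Count how many sources report each distinct value.
    counts = Counter(value for _source, value in observations)
    best_value, support = counts.most_common(1)[0]
    return best_value == candidate and support >= min_sources


# Usage: source names are placeholders. The third value is the Old Style
# (Julian calendar) date sometimes given for the same event.
repository = {
    ("George Washington", "birth date"): [
        ("source-a", "February 22, 1732"),
        ("source-b", "February 22, 1732"),
        ("source-c", "February 11, 1731"),
    ]
}
print(verify_answer("February 22, 1732", repository,
                    "George Washington", "birth date"))
```

Requiring agreement from several sources is a simple stand-in for the "confidence and certainty" the post mentions; a real system would presumably weigh sources rather than just count them.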
The patent tells us that some efforts to build a search engine that can “provide quick answers to factual questions have their own shortcomings.” One of these is that the answers may come from a single source, such as “a particular encyclopedia.” This is perceived as a shortcoming because it is:
In January, Microsoft introduced a new build of Windows 10, which it will be giving away for free to non-enterprise users running Windows 7 and Windows 8.1. One of the features in this update is a personal digital assistant that goes by the name Cortana.
You’ve likely seen Apple’s Personal Assistant Siri, which was featured in a number of celebrity-enhanced advertisements, and you may have seen people writing about Google Now, which feeds you cards to give you information that it predicts you might need or want when that information becomes available. Cortana is Microsoft’s entry into the Personal Assistant field.
Cortana is supposedly “powered by Bing” and “developed for Windows Phone 8.1,” and it looks like an important feature in Windows 10. I’ve been having difficulties defining what “powered by Bing” actually means, except that it seems to imply that all of the questions asked of Cortana are answered by the Bing search engine.
The temptation was to write this blog post mostly in pictures, since it’s about visual representations of things, based sometimes on a combination of objects that were understood using object recognition, and virtual semantic images superimposed on those, learned from a knowledge base.
I sometimes like to start looking through the documents I see listed as citations or footnotes in a paper I find interesting. As I started looking at the documents in that paper, I found many of them to be very interesting.
The patent doesn’t tell us how such natural language direct answers are chosen by the search engine, but the following document, which was written by the inventors of the patent and filed by them as a provisional patent, does give us some ideas about how those are found on the web.
We know that Google is looking for responses from pages that it considers to be “authoritative” pages.