In one of those posts, I wrote about a paper (pdf), co-authored by the inventors of that patent, that describes ways Google was finding and extracting facts from pages to include in a repository of facts.
When someone searches the web and asks a question such as “what is the capital of Poland” or “what is the birth date of George Washington,” a search engine such as Google may not be very helpful if it returns a list of web pages that might answer the query instead of an actual answer. People in the SEO community have been calling such answers “direct answers.”
A patent granted to Google this week describes how Google indexes data across the web and may look to a large collection of facts (in a fact repository such as a knowledge graph) to check and verify such answers, so that it can deliver them with more confidence and certainty, like the answer to the question about George Washington’s birthday shown above.
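To make the idea of checking an answer against a fact repository a little more concrete, here’s a minimal Python sketch. It assumes a hypothetical repository that stores each fact with the source it came from, and it accepts a candidate answer only when enough distinct sources agree. The repository contents, the source names, the `verify_answer` function, and the `min_sources` threshold are all my own illustrative assumptions, not details from the patent.

```python
from collections import defaultdict

# Hypothetical fact repository: (entity, attribute) -> list of (value, source) pairs.
# Entries and source names are illustrative assumptions, not data from the patent.
FACTS = {
    ("George Washington", "date of birth"): [
        ("February 22, 1732", "source-a"),
        ("February 22, 1732", "source-b"),
        ("February 22, 1732", "source-c"),
        ("February 11, 1731", "source-d"),  # Old Style (Julian) date from one source
    ],
}


def verify_answer(entity, attribute, candidate, min_sources=2):
    """Return (accepted, confidence) for a candidate answer.

    confidence is the share of distinct sources that agree with the candidate;
    the answer is accepted only when at least `min_sources` distinct sources agree.
    """
    records = FACTS.get((entity, attribute), [])
    sources_by_value = defaultdict(set)
    for value, source in records:
        sources_by_value[value].add(source)
    all_sources = {source for _, source in records}
    if not all_sources:
        # Nothing in the repository to check against.
        return False, 0.0
    agreeing = sources_by_value.get(candidate, set())
    confidence = len(agreeing) / len(all_sources)
    return len(agreeing) >= min_sources, confidence


print(verify_answer("George Washington", "date of birth", "February 22, 1732"))
```

Requiring agreement among several independent sources is one simple way to avoid relying on a single source, such as one particular encyclopedia.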
The patent tells us that some efforts to build a search engine that can “provide quick answers to factual questions have their own shortcomings.” One of these is that the answers may come from a single source, such as “a particular encyclopedia.” That is seen as a shortcoming because it is:
You are cloxacillin, a medication and an entity that some people may not know much about, but part of a larger class of medicines that people are familiar with. And you’re about to take a trip through a search engine, because someone was recently prescribed you and wants to know more about you.
They copy your spelling from the bottle they got at the pharmacy, since they couldn’t read the handwriting of the doctor who prescribed you. Good thing pharmacists are trained in reading doctors’ writing. Your name is spelled out, a press of the search button follows, and knowledge is on its way.
At the time, Google had a Subscribed Links program, where site owners could create specialized search results, triggered by certain patterns of queries, that would show additional content to a searcher. For some of those, you had to log into your Google Account and subscribe to certain links to be shown the special content.
Oddly, some of those specialized search results required neither a subscription nor logging in, much like these NFL scores from this weekend:
A Google patent application explores how Google may answer factual questions from unstructured Web pages and results rather than from more structured sources such as Freebase or Wikipedia. The processes described in the patent are pretty interesting, and they might seem more familiar to an SEO-trained audience than to a Semantic Web one, resembling the way a result might rank well because of a “query deserves freshness” approach.
They also avoid a problem for the search engines that I’ve been thinking about for weeks.
Tomorrow morning, I’m presenting on the Semantic Web at Google at Pubcon in Las Vegas. I’ve included my presentation deck here to use as a starting point for further discussion.
Changes to what Google shows in search results have been hard to miss, from many different types of rich snippets to the recent addition of search boxes in search results, and snippets from pages that mix query answering and question answering results together.
The Semantic Web has been making an even stronger appearance at Google recently than it has in the past. With knowledge panels, carousels listing all kinds of things (and people and places), structured snippets that merge query answers and question answers into a single snippet, OneBoxes of many different kinds, and even Hummingbird responding better to longer and more complex queries, it’s the future of Google.
I’m presenting on it this morning at the Javits Center in Manhattan at SMX (Search Marketing Expo) East, in a session titled “Hummingbird and the Entity Revolution.”
In creating a knowledge base, there seem to be a number of approaches that can be used to supply entities and facts from sources like web pages and query logs.
In my last post, I wrote about how search queries might be used, along with linguistic patterns, to extract attributes about facts from those search queries, as described in a patent titled Inferring attributes from search queries.
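Here’s a minimal Python sketch of the general idea of mining attributes from queries with linguistic patterns. It assumes two hypothetical patterns, “what is the [attribute] of [entity]” and “[entity]’s [attribute],” and simply counts how often each (entity, attribute) pair turns up in a query log. The regular expressions and the `infer_attributes` function are my own illustrations; the patterns described in the patent may be quite different.

```python
import re
from collections import Counter

# Hypothetical linguistic patterns (assumptions, not taken from the patent).
PATTERNS = [
    # "what is the capital of poland", "who was the first president of ..."
    re.compile(
        r"^(?:what|who) (?:is|was|are|were) the (?P<attr>[a-z ]+?)"
        r" of (?P<entity>[a-z .]+?)\??$"
    ),
    # "george washington's birth date"
    re.compile(r"^(?P<entity>[a-z .]+?)'s (?P<attr>[a-z ]+)$"),
]


def infer_attributes(queries):
    """Count (entity, attribute) pairs matched by the patterns across a query log."""
    counts = Counter()
    for query in queries:
        q = query.strip().lower()
        for pattern in PATTERNS:
            match = pattern.match(q)
            if match:
                counts[(match.group("entity").strip(), match.group("attr").strip())] += 1
                break  # use the first pattern that matches
    return counts


log = [
    "what is the capital of Poland",
    "What is the capital of Poland?",
    "george washington's birth date",
    "cheap flights to warsaw",  # no pattern matches, so it is ignored
]
print(infer_attributes(log))
```

Counting matches across many queries matters here: an attribute that many searchers ask about for an entity is a stronger signal than one seen in a single query.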
A Microsoft paper from 2009, Named Entity Recognition in Query, describes a manual analysis of 1,000 queries and reports that 70% of those queries contained named entities.
So entities do appear in queries, and Google receives a lot of queries a day (as do Microsoft and Yahoo).