Category Archives: Fact Extraction and Knowledge Graphs

Techniques and approaches that search engines might use to extract facts and information from the Web, as uncovered in search-related patents and whitepapers.

Google’s Knowledge Cards

The Google patent “Providing Knowledge Panels With Search Results” refers to an earlier Google patent filing that describes Knowledge Cards in depth. That provisional patent application is titled “Apparatus and Method for Supplying Search Results with a knowledge Card”, and it is identified as Patent Application No. 61/515,305, filed on Aug. 4, 2011.

This provisional patent application is not available on the Web; otherwise, I would provide a link to it.

It is supposedly “incorporated fully” into that later patent filing, but a lot of details about what a knowledge card is have been left out of the later patent filing. I wrote about that later patent in a post titled, How Google Decides What to Know in Knowledge Graph Results, but the patent specifically about knowledge cards contains information not in the later patent.

Knowledge Panel results are part of Google’s Semantic Web search results, which include a mix of result types such as Direct Answers, Structured Snippets, and Rich Snippets. They are part of an evolution of search results at Google and Bing that goes well beyond yesterday’s 10 blue links. I’ll be following this post with one about the rich search results that show up in response to queries at Bing.

Continue reading Google’s Knowledge Cards

Google on Crawling the Web of Data

A patent granted to Google this past fall explores how the search engine looks for patterns on Web pages and uses them to find facts to fill Google’s data repository (Knowledge Base).

An image from a local park in Carlsbad symbolizing the Sun.

I recently wrote a series of posts about Google collecting data to enable it to provide direct answers, starting with one titled Direct Answers – Natural Language Search Results for Intent Queries.

In one of those posts, I wrote about a paper (pdf), co-authored by the inventors of that patent, which describes ways that Google was finding and extracting facts from pages to include in a repository of facts.

Continue reading Google on Crawling the Web of Data

How Google was Corroborating Facts for Direct Answers

When someone searches the web and asks a question such as “what is the capital of Poland” or “what is the birth date of George Washington,” a web search engine such as Google may not be very helpful if it provides a list of web pages that might answer that query instead of an actual answer. People in the SEO community have been referring to such actual answers as “direct answers.”

Google answering a direct question with a factual answer.

A patent granted to Google this week describes how Google indexes data across the web, and may look to a large collection of facts (in a fact repository such as a knowledge graph) to verify such answers, so that it can deliver them with more confidence, as in the answer to the question about George Washington’s birthday shown above.

The patent tells us that some efforts to build a search engine that can “provide quick answers to factual questions have their own shortcomings.” One of these is that the answers may come from a single source, such as “a particular encyclopedia.” Why this is perceived as a shortcoming is that it is:

Continue reading How Google was Corroborating Facts for Direct Answers

Google Queries for Instances of Data Help Reveal the Classes Where They Belong

You are cloxacillin, a kind of medication and an entity that some people may not know a lot about, but part of a bigger class of medicines that people are familiar with. And you’re taking a trip through a search engine because someone has recently been prescribed you, and they want to know more about you.

cloxacillin molecule diagram
Ball-and-stick model of oxacillin molecule. The structure is taken from ChemSpider, ID 5873. Author: MarinaVladivostok. Date: 2013-07-22.

They copy your spelling from the bottle they got at the pharmacy. They couldn’t read the handwriting of the doctor who initially prescribed it. Good thing pharmacists are trained in reading doctors’ writing. Your name is spelled out, and with a press of the search button, knowledge is on its way.

A Google Knowledge panel for cloxacillin

Continue reading Google Queries for Instances of Data Help Reveal the Classes Where They Belong

Rich Snippets and Patterned Queries

Revisiting the Subscribed Links Patent Five Years Later and Finding the Rich Snippets Patent

I first looked at this patent five years ago, but called it the Subscribed Links Patent.

At the time, Google had a Subscribed Links program, in which site owners could create specialized search results, based upon certain patterns of queries, that would show additional content to a searcher. For some of those, you had to log into your Google Account and subscribe to certain links to be shown special content.

Oddly, some of those specialized search results didn’t require subscriptions or logging in, much like these NFL scores from this weekend:

A Football Score Rich Snippet

Continue reading Rich Snippets and Patterned Queries

How Google May Answer Fact Questions Using Entity References in Unstructured Data

A Google patent application explores how Google may answer factual questions from unstructured Web pages and results rather than from more structured sources such as Freebase or Wikipedia. The processes described in the patent are pretty interesting, and they might be more familiar to an SEO-trained audience than a Semantic Web one, such as a result ranking well because of a “query deserves freshness” approach.

They also avoid a problem for the search engines that I’ve been thinking about for weeks.

The problem came to me when I attended The Semantic Web Business and Technology 2014 conference about a month ago. In his presentation, Yahoo!’s Nicolas Torzec discussed Yahoo!’s relatively new Knowledge Graph, and was asked a question by someone from the audience about

Continue reading How Google May Answer Fact Questions Using Entity References in Unstructured Data

At Pubcon, Presenting on a Semantic Timeline at Google

Tomorrow morning, I’m presenting on the Semantic Web at Google at Pubcon in Las Vegas. I’ve included my presentation deck here to use as a kicking off point for further discussion.

Changes to what Google shows in search results have been difficult to miss, from many different types of rich snippets to recent additions of search boxes in search results and Google showing snippets from pages that contain both query answering and question answering results mixed together.

Thanks to Barbara Starr for taking a look at the presentation, and for suggesting that I look for a Google patent for rich snippets, which I hadn’t included. I went searching for the patent at the US Patent Office and found a good candidate for it, and will probably post a more detailed look at that one in the near future. It’s Generating specialized search results in response to patterned queries.

Here’s my presentation:

Continue reading At Pubcon, Presenting on a Semantic Timeline at Google

At SMX East; Presenting on Google and the Semantic Web

The Semantic Web is making an even stronger appearance recently at Google than it has in the past. With knowledge panels, carousels listing all kinds of things (and people and places), structured snippets merging query answers with question answers into a single snippet, OneBoxes of many different kinds, and even Hummingbird responding better to longer and more complex queries, it’s the future of Google.

I’m presenting on it this morning at the Javits Center in Manhattan at SMX (Search Marketing Expo) East, in a session titled “Hummingbird and the Entity Revolution.”


Continue reading At SMX East; Presenting on Google and the Semantic Web