Category Archives: Fact Extraction and Knowledge Graphs

Techniques and approaches that search engines might use to extract facts and information from the Web, as uncovered in search-related patents and whitepapers.

Images in Question Answers, Carousels, and Knowledge Panels at Google

When Google introduced us to the Knowledge Graph, it also introduced pictures, and the possibility of other kinds of rich content (video, audio, etc.), in knowledge panels, as well as pictorial lists displayed in carousels at the top of results pages in response to queries such as “What is the tallest building in the world?”

A carousel in response to the question “What is the tallest building in the world?”

A Google patent granted a couple of weeks ago describes how Google processes search queries and might display Knowledge Graph answers to questions that include images. Here’s where Google introduced carousels, on its Knowledge Graph page:

Google's introduction to carousels on its Knowledge Graph page.

Google’s First Semantic Search Invention was Patented in 1999

This is officially part of the story I’m telling in a presentation I’ve prepared for SMX East, in New York in a couple of weeks. The session I’m speaking in is called “Hummingbird and the Entity Revolution,” which reminds me of a Prince song from the 1980s.

The story starts with a student who is given a tour by another student, and the two get into a fight. They liked fighting with each other and ended up becoming close friends. They studied together, and when their supervising professor went away to Japan for a year, they stopped working on their advanced degrees and played on the internet instead. They created something they called Backrub. It was later renamed Google, and many people today think it is the internet.

The entrance to one of Google's buildings

On March 10, 1999, Sergey Brin filed a “Miscellaneous Incoming Letter” (that is how it is described in the USPTO’s PAIR database). It’s a provisional patent application titled “Extracting Patterns and Relations from Scattered Databases Such as the World Wide Web” (pdf). Skip quickly past the first couple of pages; it becomes much more legible from the third page on.

Lessons Learned from Using Google’s Tagging and Extraction Data Highlighter Tool

I recently found a patent with two Google search engineers, Joshua Ain and Justin Boyan, listed as two of its three inventors. Last summer, at Google I/O in San Francisco, they presented together on tools that make it easier for webmasters to add structured data markup to their pages. The patent appears to be for Google’s Data Highlighter, which was one of those tools.

It inspired me to try adding structured data markup to my own website, a task that was likely to fail for a few reasons.

Last night, I hadn’t yet read the patent, and I hadn’t done anything to make the patterns on my site more consistent. In other words, I learned the hard way, much as most non-developers and non-programmers would.

The video below is an introduction to a number of Google tools, including the Google Data Highlighter.

Is Google Going to Marry their Knowledge Base with their Search Engine?

Google has been answering queries with its search engine for over 15 years, and has shown us that it can answer questions with facts from its Browsable Fact Repository and the Google Knowledge Graph.

Might Google at some point bring the two together?

To a degree, Google has been merging some results, showing a set of search results (from the search engine) and a knowledge panel (from the Knowledge Graph) on the same results page. But you could say that those remain separate, distinct elements on the results page.

Google’s Browseable Fact Repository – an Early Knowledge Graph

In Google's earlier days, when you asked a question, you could sometimes get a response that answered questions such as:

“When was George W. Bush’s birth-date?”

We knew that Google could answer some questions like that, even if doing so might have been challenging, but we had little idea that something like Google’s Knowledge Graph existed until 2011. The answers we saw were sometimes just regular snippets with a word such as “birth-date” bolded.

The set of 17 “related patents” that I first saw listed in a patent I wrote about this past Tuesday (granted on August 19th) appears to have come from a team led by Andrew Hogue, who was tasked with creating “an annotation framework” to index more objects on the web, along with the facts associated with them. He discusses this in more depth in his presentation “The Structured Search Engine,” which I highly recommend.

He also oversaw Google’s acquisition of MetaWeb and the integration of 25 former MetaWeb staff members into Google.

Was Google Maps a Proof of Concept for Google’s Knowledge Base Efforts?

Not everything described in a paper or patent from a search engine actually happens in real life, but sometimes it does.

Now and then, I like coming across an older patent or paper that does a good job of describing something that then actually happened as it was set out.

The patent I’m writing about tonight was originally filed in 2006 and granted in 2010. It describes processes that I’ve seen firsthand, and have used to help people increase the number of visits to their offices and phone calls from prospective clients.

A Surveyor measuring land.

Google Maps a Proof of Concept of Knowledge Extraction

Looking at Peer Document Titles and Anchor Text when Collecting Facts about an Entity

During a civil or criminal legal case, the side bringing the case (the prosecution in a criminal case, the plaintiff in a civil one) needs to present evidence to a judge or a jury. No individual piece of evidence has to prove guilt or liability by itself, but the evidence taken together has to meet a certain standard. For a criminal case, the standard is beyond a reasonable doubt. For a civil case, it’s usually a standard of more probable than not. So criminal cases require a higher level of confidence.

When Google collects information about an entity from the Web for its Knowledge Vault, it wants that information to be as trustworthy as possible.

If you’ve read anything about Google’s introduction of the Knowledge Vault, one point that stands out is the high level of confidence attached to the information it lists. Facts associated with entities are held with more confidence than they might have been in the Knowledge Graph.
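
The legal analogy maps onto a simple way of thinking about confidence: several weak, independent pieces of evidence can add up to a strong belief, and the bar for accepting a fact can be set higher or lower. Here is a minimal sketch under loudly stated assumptions. It uses a noisy-OR combination, a common textbook way of pooling independent evidence; it is not the method described in the patent or used in the Knowledge Vault. The `Evidence` class, the per-source `reliability` scores, and the two thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Evidence:
    source: str         # hypothetical: a page or extractor asserting the fact
    reliability: float  # hypothetical: chance this source alone is right, 0..1


def combined_confidence(evidence: Sequence[Evidence]) -> float:
    """Noisy-OR pooling, assuming the sources are independent.

    The combined belief is 1 minus the probability that every
    source is wrong at the same time.
    """
    prob_all_wrong = 1.0
    for item in evidence:
        prob_all_wrong *= 1.0 - item.reliability
    return 1.0 - prob_all_wrong


def accept_fact(evidence: Sequence[Evidence], threshold: float) -> bool:
    """Keep a fact only if the pooled confidence meets the chosen bar."""
    return combined_confidence(evidence) >= threshold


# Hypothetical thresholds echoing the two legal standards of proof.
MORE_PROBABLE_THAN_NOT = 0.5
NEAR_CERTAIN = 0.95

if __name__ == "__main__":
    sources = [
        Evidence("page-a", 0.6),
        Evidence("page-b", 0.5),
        Evidence("page-c", 0.7),
    ]
    print(round(combined_confidence(sources), 2))        # 0.94
    print(accept_fact(sources, MORE_PROBABLE_THAN_NOT))  # True
    print(accept_fact(sources, NEAR_CERTAIN))            # False
```

With three moderately reliable sources, the pooled score clears a "more probable than not" bar but not a near-certain one. A real system would also have to deal with sources that copy one another, which breaks the independence assumption this sketch relies on.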

Extracting Facts for Entities from Sources such as Wikipedia Titles and Infoboxes

A number of Google patents, both granted patents and pending applications, describe ways that Google might learn about entities, and the facts associated with them, by extracting information from the Web itself rather than relying on people to submit information to knowledge bases such as Freebase.

We learned from Google’s recent announcement that it would be replacing the Google Knowledge Base with its Knowledge Vault, which supposedly brings with it a whole new set of extraction approaches, along with high levels of confidence in the accuracy of what they extract.

It’s hard to tell exactly which approaches Google relies upon, and which ones it described in something like a patent but no longer uses. But it doesn’t hurt to learn some of the history, and some of the approaches that might have been used in the past.

Today I’m blogging about a patent describing an approach that many of us have assumed Google has used for years: identifying objects, or entities, along with their attributes and the values that fit those attributes.
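
As a rough illustration of the object, attribute, and value idea, here is a minimal sketch that pulls (entity, attribute, value) triples out of a Wikipedia-style infobox, using the page title as the entity name. It is not the approach from the patent. The simplified wikitext format (one `| field = value` per line), the regular expressions, and the `Fact` and `extract_infobox_facts` names are assumptions made for illustration. Real infoboxes often nest templates (birth dates, for example, are usually written with a template rather than plain text), which this sketch does not handle.

```python
import re
from typing import List, NamedTuple


class Fact(NamedTuple):
    entity: str     # the object or entity, taken from the page title
    attribute: str  # the infobox field name
    value: str      # the infobox field value, with link markup removed


# Assumed simplified wikitext: one "| field = value" pair per line.
FIELD_LINE = re.compile(r"^\s*\|\s*([^=|]+?)\s*=\s*(.*?)\s*$")
# Matches [[Target]] or [[Target|Label]] and captures the visible text.
WIKILINK = re.compile(r"\[\[(?:[^|\]]*\|)?([^\]]+)\]\]")


def clean_value(raw: str) -> str:
    """Replace wiki links with their visible label text."""
    return WIKILINK.sub(r"\1", raw).strip()


def extract_infobox_facts(title: str, wikitext: str) -> List[Fact]:
    """Hypothetical extractor: read field lines between {{Infobox and }}."""
    facts: List[Fact] = []
    in_infobox = False
    for line in wikitext.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("{{infobox"):
            in_infobox = True
            continue
        if in_infobox and stripped.startswith("}}"):
            break  # end of the infobox; ignore the article body
        if not in_infobox:
            continue
        match = FIELD_LINE.match(line)
        if match:
            attribute, raw_value = match.groups()
            value = clean_value(raw_value)
            if value:  # skip empty fields, which assert nothing
                facts.append(Fact(title, attribute, value))
    return facts


SAMPLE_PAGE = """{{Infobox person
| name = George W. Bush
| birth_date = July 6, 1946
| birth_place = [[New Haven, Connecticut]]
| spouse =
}}
George Walker Bush is an American politician..."""

if __name__ == "__main__":
    for fact in extract_infobox_facts("George W. Bush", SAMPLE_PAGE):
        print(fact)
```

Empty fields, like `spouse` in the sample, are skipped, since an attribute with no value says nothing about the entity.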
