Category Archives: Fact Extraction and Knowledge Graphs

Techniques and approaches that search engines might use to extract facts and information from the Web, as uncovered in search-related patents and whitepapers.

Was Google Maps a Proof of Concept for Google’s Knowledge Base Efforts?

Not everything we read in a paper or in a patent from a search engine is something that happens in real life, but sometimes it is.

I like coming across a patent now and then that is dated but does a good job of describing something that happened as set out in that patent or paper.

The patent I’m writing about tonight was originally filed in 2006 and granted in 2010. It describes processes that I’ve seen firsthand, and have used firsthand to help people increase the number of visits to their offices or phone calls they get from future clients.

A Surveyor measuring land.

Google Maps a Proof of Concept of Knowledge Extraction

Continue reading

Looking at Peer Document Titles and Anchor Text when Collecting Facts about an Entity

During a civil or criminal legal case, the side bringing the case needs to present evidence to a judge or a jury. Each individual piece of evidence doesn’t have to prove the case against the party being tried by itself, but the combination of that evidence has to meet a certain standard. For a criminal case, the standard is beyond a reasonable doubt. For a civil case, it’s a standard of more probable than not. So, criminal cases require higher levels of confidence.

When Google collects information on the Web about an entity for its Knowledge Vault, it wants that information to be as trustworthy as possible.

If you’ve read anything about Google’s introduction of the Knowledge Vault, one point that stands out is the high level of confidence in the information listed. There is more confidence in the facts that are associated with entities than there might have been in the Knowledge Graph.

Continue reading

Extracting Facts for Entities from Sources such as Wikipedia Titles and Infoboxes

There are a number of patents from Google, both granted patents and pending patent applications, that describe ways that Google might learn about entities and the facts associated with them by extracting the information from the Web itself instead of relying upon people submitting information to knowledge bases such as Freebase.

We learned from Google’s recent announcement that they would be replacing the Google Knowledge Base with their Knowledge Vault, which supposedly brings with it a whole new set of extraction approaches that assign high levels of confidence to the accuracy of what they extract.

It’s hard to tell exactly which approaches Google might be relying upon, and which ones Google might have introduced through something like a patent but no longer uses. But it doesn’t hurt to learn some of the history and some of the approaches that might have been used in the past.

I’m blogging about a patent today that describes an approach that many of us have assumed Google has been using for years to identify objects or entities, the attributes of those entities, and the values that fit those attributes.

Continue reading

Identifying Entity Types and the Transfiguration of Search @Google

The World Wide Web is a vast resource for information. At the same time it is extremely distributed.

A particular type of data such as restaurant lists may be scattered across thousands of independent information sources in many different formats. In this paper, we consider the problem of extracting a relation for such a data type from all of these sources automatically.

We present a technique which exploits the duality between sets of patterns and relations to grow the target relation starting from a small sample. To test our technique we use it to extract a relation of (author, title) pairs from the World Wide Web.

Sergey Brin, Extracting Patterns and Relations from the World Wide Web (pdf), Stanford University, 1999
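The duality between patterns and relations described in that abstract can be pictured with a toy loop: start from a seed (author, title) pair, find where it occurs in text, turn the surrounding text into a pattern, then use that pattern to find new pairs. The sketch below is a simplified illustration, not the algorithm from the paper. The function names, the pattern format (only the text found between the two values), and the crude capitalized-word matcher are all assumptions made for this example.

```python
import re

# Crude stand-in for "a name or title": a run of capitalized words (an assumption).
NAME = r"([A-Z]\w*(?: [A-Z]\w*)*)"

def find_patterns(corpus, pairs):
    """Collect the short text found between a known author and a known title."""
    middles = set()
    for author, title in pairs:
        regex = re.escape(author) + r"(.{1,30}?)" + re.escape(title)
        for doc in corpus:
            for match in re.finditer(regex, doc):
                middles.add(match.group(1))
    return middles

def find_pairs(corpus, middles):
    """Use each collected middle as a pattern to pull out new (author, title) pairs."""
    pairs = set()
    for middle in middles:
        regex = NAME + re.escape(middle) + NAME
        for doc in corpus:
            pairs.update(re.findall(regex, doc))
    return pairs

def grow_relation(corpus, seeds, rounds=2):
    """Alternate between finding patterns and finding pairs, starting from seeds."""
    relation = set(seeds)
    for _ in range(rounds):
        patterns = find_patterns(corpus, relation)
        relation |= find_pairs(corpus, patterns)
    return relation

corpus = [
    "Isaac Asimov wrote Foundation.",
    "Charles Dickens wrote Great Expectations.",
]
relation = grow_relation(corpus, {("Isaac Asimov", "Foundation")})
```

With one seed pair, the loop learns the pattern " wrote " from the first sentence and uses it to pick up the Dickens pair from the second. A real system would need much stricter patterns to avoid drifting into bad matches.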

Torpedo as Art, in the Torpedo Factory in Alexandria
Entities Change – Torpedoes become art and Search Engines become Knowledge Repositories.

Continue reading

Semantic SEO or Semantic Search?

A few years ago, I presented at SES San Jose and someone asked me what they should be keeping an eye upon in SEO. I told them “named entities.” I was reminded of that conversation as I gave a talk today about named entities and other semantics.

I presented this morning at the San Jose McEnery Convention Center at the Semantic Technology and Business Conference (#SemTechBiz2014).

Barbara Starr and I gave a three-hour tutorial on Semantic Search to an enthusiastic and engaged audience. We also discussed which might be a better name for the tutorial, “Semantic Search” (the name it had) or “Semantic SEO” (what do you think?).

Here’s Barbara’s presentation, which is the first half of the tutorial. Thanks, Barbara. Totally brilliant stuff:

Continue reading

Google on Finding Entities: A Tale of Two Michael Jacksons

I’ve been saying for at least a couple of years that Google’s local search is a proof of concept for how the search giant can find and understand entities.

With local search, Google goes out and looks for a mention of a business on the Web, especially when it is accompanied by geographic location information. It collects and gathers facts related to businesses (which are entities; entities are people, places, and things), and then it clusters information about the objects it finds to make sure that those mentions across the Web are all referring to the same places.

If you start reading about local search, you’ll see people referring to the importance of consistency in how you present address information for a business, and the same thing is true for entities.
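To see why consistency matters for clustering, here is a minimal sketch of grouping business mentions by a normalized name and address. It is only an illustration of the general idea, not how Google does it. The `normalize` and `cluster_mentions` functions, the abbreviation map, and the sample mentions are all made up for this example.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation map; a real system would need far more than this.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "suite": "ste"}

def normalize(text):
    """Lowercase, drop punctuation, and shorten common address words."""
    words = re.sub(r"[^\w\s]", "", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)

def cluster_mentions(mentions):
    """Group (name, address, source) mentions that normalize to the same key."""
    clusters = defaultdict(list)
    for name, address, source in mentions:
        key = (normalize(name), normalize(address))
        clusters[key].append(source)
    return dict(clusters)

mentions = [
    ("Joe's Pizza", "12 Main Street", "a.example"),
    ("Joes Pizza", "12 Main St.", "b.example"),
    ("Joe's Pizza", "99 Oak Avenue", "c.example"),
]
clusters = cluster_mentions(mentions)
```

The first two mentions fall into one cluster because they normalize to the same name and address. The third looks like a different place because its address differs, which is the kind of split that inconsistent address information can cause.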

Two different Michael Jacksons

Continue reading

How Knowledge Base Entities can be Used in Searches

When Google crawls the Web to collect information about objects or entities, it also collects facts about those entities. These facts are separated into different categories or attributes associated with those entities. For example, a book may have attributes such as an author, a publisher, a year published, a web site it can call home, a genre, and more.

Identifying Entities by their Attributes

A search that includes those attributes can be used to identify the entity the attributes might be associated with.
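As a rough picture of that idea, the sketch below stores a few books with their attributes, builds an index from (attribute, value) pairs to entity names, and looks up the entities that match every attribute in a query. The index layout and the `build_attribute_index` and `find_entities` names are assumptions for illustration. They are not taken from the patent.

```python
from collections import defaultdict

def build_attribute_index(entities):
    """Map each (attribute, value) pair to the set of entities that carry it."""
    index = defaultdict(set)
    for name, attributes in entities.items():
        for attribute, value in attributes.items():
            index[(attribute, value.lower())].add(name)
    return index

def find_entities(index, **query):
    """Return the entities that match every attribute/value pair in the query."""
    matches = [index.get((attr, value.lower()), set()) for attr, value in query.items()]
    return set.intersection(*matches) if matches else set()

books = {
    "Great Expectations": {"author": "Charles Dickens", "year": "1861", "genre": "novel"},
    "Bleak House": {"author": "Charles Dickens", "year": "1853", "genre": "novel"},
    "Moby-Dick": {"author": "Herman Melville", "year": "1851", "genre": "novel"},
}
index = build_attribute_index(books)
by_author = find_entities(index, author="Charles Dickens")
by_author_and_year = find_entities(index, author="charles dickens", year="1861")
```

A single attribute narrows the candidates, and each extra attribute narrows them further, so a query that names enough attributes can point to one entity.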

Google was granted a patent recently that describes how those attributes could be searched within an attribute data store to find the entity. The patent shows how the process described within it might be used to answer some complex queries, and some interactive Answerbox type queries. The issue that this patent addresses can be summed up in a single question:

Continue reading

Finding Entity Names in Google’s Knowledge Graph

Most of us searchers and site owners and search engine optimizers are familiar with Google’s Link Graph, and how Google uses the connections between websites to help in ranking pages on the Web. In part, Google looks at the relevance of the content of a page compared to a query that a searcher enters at the search engine.

In addition to “relevance”, Google also uses the patented method of PageRank, in which the quality and quantity of links pointing to a page are used as a proxy for the quality of the page being linked to. The higher the quality of a page (and the higher the PageRank it possesses), the more PageRank it likely passes along.
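A simplified, textbook-style version of that calculation can be sketched in a few lines. This is not Google's production method. The tiny three-page link graph is made up, and the damping value of 0.85 is a commonly cited default that is used here as an assumption.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated passes over a dict of page -> outlinks."""
    pages = set(links) | {target for targets in links.values() for target in targets}
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        # Every page starts each pass with a small baseline share.
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page in pages:
            # A page with no outlinks spreads its rank across all pages.
            targets = links.get(page) or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph: pages "a" and "b" both link to "c", and "c" links to "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
```

Here "c" ends up with the highest score because two pages link to it, and "a" outranks "b" because its one link comes from the high-scoring "c", which is the "quality and quantity" idea in miniature.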

Links between pages, from the Reasonable Surfer patent

The link graph is one example of how Google ranks and measures and possibly sorts web pages. Another that Google might look at is the attention graph – how Google might use topics and concepts that may be searched upon frequently to change rankings of pages based upon freshness and hot topics.

Continue reading