Category Archives: Fact Extraction and Knowledge Graphs

Techniques and approaches that search engines might use to extract facts and information from the Web, as uncovered in search-related patents and whitepapers.

The Evolution of Search

I just returned from a few days in Las Vegas and the Pubcon Conference.

I had the chance to see some great presentations and talk to a number of interesting folks. Go Fish Digital, the company where I am the Director of Search Marketing, won a US Search Award for Best Use of Search for Travel/Leisure for a campaign we did for Reston Limo.

I wanted to share my presentation from the conference here as well.

Continue reading The Evolution of Search

How Google might Disambiguate Different Entities with the Same Names in Queries and Pages

Last year I wrote a post titled Google on Finding Entities: A Tale of Two Michael Jacksons. The post was about a Google patent that described how Google might tell apart different entities that share the same name. The patent covered in that post was filed in 2012 and granted in 2014. This week, Google was granted a new patent on disambiguating entities, which was originally filed in 2006. It is worth looking at this second one, given how important understanding entities is to Google.

Webb Telescope Mirrors Arrive at NASA Goddard, NASA Goddard Space Flight Center, Some Rights Reserved.

It contains a pretty thoughtful approach to understanding and distinguishing between different entities within documents and queries.

Continue reading How Google might Disambiguate Different Entities with the Same Names in Queries and Pages

When Google Started Answering Factual Queries

The Web is filled with factual information, and search on the Web has been changing to take advantage of all of the data found there. Mainstream search engines, such as Google, Bing, and Yahoo, traditionally haven’t given us simple, short answers to our queries. Instead, they have shown us a list of Web pages (often referred to historically as 10 blue links) where that data might be found, and then forced us to sort through that list to find an answer.

Google announced direct answers to questions on the Google Blog in April 2005, in a post titled Just the Facts, Fast.

That may have been in response to Tim Berners-Lee writing about the Semantic Web back in 2001, where he alerted us to the possibilities that freeing data otherwise locked into documents might bring. As search engines find ways to crawl the Web and collect information about objects and the data associated with them, we begin to approach the possibilities he mentioned. And we get answers that we otherwise couldn’t find as easily.

Continue reading When Google Started Answering Factual Queries

How Google may Identify How Related Different Entities Are

A patent granted to Google this week attempts to identify similarities between different types of entities when it finds information about them on the Web. It refers to these similarities as commonalities, as in things the entities may have in common. Google may use these similarities in a number of ways, such as supplementing search results with related information based upon results that might be in the same category or possibly located in the same region.

The things identified as common may be things that are moderately unique, but not completely rare.

The patent says “entities,” but it seems to be focusing upon different businesses that might share some similarities. For example, it refers a few times to a food critic writing about restaurants, and tells us that the things such a critic might write about different restaurants might be used to find similarities between those places.

Continue reading How Google may Identify How Related Different Entities Are

How Google Decides Which Images to Show For Entities in Knowledge Panels

Google has been showing knowledge panels in response to queries when it recognizes an entity within the query and has collected enough information about that entity to display a knowledge panel about it. I’ve written about these knowledge panels before in the posts How Google Decides What To Know In Knowledge Graph Results and Google’s Knowledge Cards.

I mentioned images in knowledge panels in those posts, but not how images might be chosen to represent the entities that those panels are about, especially when the entities are people.

Knowledge panel images of Thomas Edison
Google's Knowledge Panel Images for Benjamin Franklin
Knowledge Panel images for George Washington.

Continue reading How Google Decides Which Images to Show For Entities in Knowledge Panels

How Google is focusing upon Building and Promoting Entity Collections

Added 11:48 AM (PST) May 3, 2015: H/t to Natzir Turrado. Incoming news is that Google+ is introducing a new feature they are referring to as Collections. The announcement from The Windows Club features the word “curation” prominently, as do the two Google patent applications I write about in this post. Here’s how Susannah Lindsay uses the concept in The Windows Club article:

Google Plus users will get an opportunity to curate pieces of content into their collection, with others holding the permission of viewing, sharing, and following those collections as they please.

Added 12:15 PM (PST): More on the rumored Collections feature at Google+: Google+ is Testing a New “Collections” Feature That Seems to be Part Pinterest, Part Blogging

Continue reading How Google is focusing upon Building and Promoting Entity Collections

Open Data Commons Opportunities

Many government Web sites have made the data that they collect and compile freely available to the public. The licenses that data has been released under are described on the following pages:

ODC Public Domain Dedication and License (PDDL)
Open Data Commons Open Database License (ODbL)
Open Data Commons Attribution License

If you are considering starting a project using that kind of data, you should read the Open Data Handbook, which provides a lot of detail. Much more information is available on Data.gov, including a broad overview of the topics that data is available about, such as:

  • Agriculture,
  • Business,
  • Climate,
  • Consumer,
  • Ecosystems,
  • Education,
  • Energy,
  • Finance,
  • Health,
  • Local Government,
  • Manufacturing,
  • Ocean,
  • Public Safety,
  • Science & Research.

Continue reading Open Data Commons Opportunities

How Google May Identify Central Entities from Resources

A Google patent granted this week describes how Google might try to understand Entities that appear on Web pages, and how that awareness might influence the results that the search engine shows.

An Entity is a specifically named person, place, or thing (including ideas and objects) that could be connected to other entities based upon relationships between them. Some pages may make certain Entities the main subject of the page, while others may include additional information about entities that are related in some manner to those first entities. When some entities appear on pages, they may be presented in an ambiguous manner that doesn’t make them the main topic of the page they appear upon.

Entities are said to exist in a graph that connects them to other entities based upon relationships between them. For instance, Google and Bing are both Search Engines, both internet domains, and both employers of many search engineers, and both have CEOs, Vice Presidents, Marketing staff, headquarters, data centers, and Web indexes. There are a lot of related entities that might show up on Web pages about both.
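To make the idea of a graph of entities a little more concrete, here is a minimal sketch in Python. It is only an illustration and is not taken from the patent: the `graph` structure, the `add_fact` and `shared_connections` helpers, and the relationship names are all hypothetical, and the handful of facts are just examples.

```python
from collections import defaultdict

# Hypothetical toy entity graph (not from the patent):
# each entity maps to a set of (relationship, target) pairs.
graph = defaultdict(set)

def add_fact(entity, relationship, target):
    """Record one edge from an entity to a related entity."""
    graph[entity].add((relationship, target))

# Illustrative facts, loosely following the Google and Bing example.
add_fact("Google", "is_a", "Search Engine")
add_fact("Bing", "is_a", "Search Engine")
add_fact("Google", "is_a", "Internet Domain")
add_fact("Bing", "is_a", "Internet Domain")
add_fact("Google", "headquartered_in", "Mountain View")
add_fact("Bing", "owned_by", "Microsoft")

def shared_connections(a, b):
    """Return the edges that two entities have in common."""
    return graph[a] & graph[b]

print(sorted(shared_connections("Google", "Bing")))
# [('is_a', 'Internet Domain'), ('is_a', 'Search Engine')]
```

A real system would work with far more entities and relationship types, but the same kind of lookup shows how knowing about one entity can point to others connected to it.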

This view of Entities being related to each other, and belonging to an “Entity Graph,” is very similar to what is described in the Microsoft patent I wrote about recently in How Bing May Expand Queries Based upon Finding Entities Within them. A number of the ideas behind that patent and this one are similar, in that some knowledge about an entity might cause a search engine to display information about related entities.

Continue reading How Google May Identify Central Entities from Resources