I just returned from a few days in Las Vegas and the Pubcon Conference.
I had the chance to see some great presentations and talk to a number of interesting folks, and Go Fish Digital, the company where I am the Director of Search Marketing, won a US Search Award for Best Use of Search for Travel/Leisure, for a campaign we did for Reston Limo.
I wanted to share my presentation from the conference here as well.
Last year I wrote a post titled Google on Finding Entities: A Tale of Two Michael Jacksons. The post was about a Google patent that described how Google might tell apart different entities that share the same name. The patent covered in that post was filed in 2012 and granted in 2014. Google was also granted a new patent on disambiguating entities this week, which was originally filed in 2006. It is worth looking at this second one, given how important understanding entities is to Google.
It contains a pretty thoughtful approach to understanding and distinguishing between different entities within documents and queries.
The Web is filled with factual information, and search on the Web has been going through changes to try to take advantage of all of the data found there. Mainstream search engines, such as Google, Bing, and Yahoo, traditionally haven’t given us simple and short answers to our queries. Instead, they have shown us a list of Web pages (often historically referred to as 10 blue links) where that data might be found, and then forced us to sort through that list to find an answer.
Google announced direct answers to questions on the Google Blog in April 2005, in Just the Facts, Fast.
That may have been in response to Tim Berners-Lee writing about the Semantic Web back in 2001, where he alerted us to the possibilities that freeing data otherwise locked into documents might bring to us. As search engines find ways to crawl the Web and collect information about objects and the data associated with them, we begin approaching the possibilities he mentioned. And we get answers that we otherwise couldn’t find as easily.
A patent granted to Google this week attempts to identify similarities between different types of entities when it finds information about them on the Web. It refers to these types of similarities as commonalities, as in things they may have in common. Google may use these similarities in a number of ways, such as supplementing search results with related information based upon results that might be in the same category or possibly located in the same region.
The features identified as common may be ones that are moderately unique, but not completely rare.
The patent says “entities,” but it seems to be focusing upon different businesses that might share some similarities. For example, it refers a few times to a food critic writing about restaurants, and tells us that the things such a critic might write about different restaurants might be used to find similarities between those places.
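To make the idea of "moderately unique, but not completely rare" commonalities concrete, here is a minimal sketch in Python. It is not the method from the patent. The function name `commonalities`, the thresholds `min_share` and `max_share`, and the sample restaurant data are all assumptions made up for illustration.

```python
from collections import Counter


def commonalities(entities, min_share=0.2, max_share=0.8):
    """Return attributes shared by some, but not nearly all, entities.

    `entities` maps an entity name to a set of attributes. The thresholds
    are hypothetical: an attribute is kept when at least two entities have
    it and the share of entities having it falls between the two bounds.
    """
    counts = Counter(attr for attrs in entities.values() for attr in set(attrs))
    total = len(entities)
    kept = {
        attr
        for attr, n in counts.items()
        if n >= 2 and min_share <= n / total <= max_share
    }
    # Map each kept attribute to the sorted names of the entities sharing it.
    return {
        attr: sorted(name for name, attrs in entities.items() if attr in attrs)
        for attr in kept
    }


# Invented sample data, as a food critic might describe restaurants.
restaurants = {
    "Cafe A": {"restaurant", "wood-fired oven", "patio"},
    "Bistro B": {"restaurant", "wood-fired oven"},
    "Diner C": {"restaurant", "patio", "jukebox"},
    "Grill D": {"restaurant"},
    "Tavern E": {"restaurant", "trivia night"},
}

shared = commonalities(restaurants)
```

In this sample, "restaurant" is dropped because every entity has it, and "jukebox" and "trivia night" are dropped because only one entity has each. What remains are the features that a couple of places share, which is the kind of commonality that could plausibly link related results.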
Added 11:48 AM (PST) May 3, 2015: H/t to Natzir Turrado. Incoming news is that Google+ is introducing a new feature they are referring to as Collections, and the announcement from The Windows Club features the word “curation” prominently, as do the two Google patent applications I write about in this post. Here’s how Susannah Lindsay in The Windows Club article uses the concept:
Google Plus users will get an opportunity to curate pieces of content into their collection, with others holding the permission of viewing, sharing, and following those collections as they please.
There are a lot of government Web sites that have made the data that they collect and compile freely available to the public. The licenses that data has been released under are described on the following pages:
If you are considering starting a project using that kind of data, you should read the Open Data Handbook, which provides a lot of detail. Much more information is available on Data.gov, which offers a broad overview of the topics that data is available about, including:
A Google patent granted this week describes how Google might try to understand Entities that appear on Web pages, and how that awareness might influence the results that the search engine shows.
An Entity is a specifically named person, place, or thing (including ideas and objects) that could be connected to other entities based upon relationships between them. Some pages may make certain Entities the main Subject of a page, while others may include additional information about entities that are related in some manner to those first entities. When some entities appear on pages, they may be presented in an ambiguous manner that doesn’t make them the main topic of the page they appear upon.
Entities are said to exist in a graph that connects them to other entities based upon relationships between them. For instance, Google and Bing are both Search Engines, both internet domains, both employers of many search engineers, and both have CEOs, Vice Presidents, Marketing staff, headquarters, data centers, and Web indexes. There are a lot of related entities that might show up on Web pages about both.
This view of Entities being related to each other, and belonging to an “Entity Graph,” is very similar to what is described in the Microsoft Patent I wrote about recently in How Bing May Expand Queries Based upon Finding Entities Within them. A number of the ideas behind how that patent works and how this one works are similar, in that some knowledge about an entity might cause a search engine to display information about related entities.
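The entity graph described above can be sketched as a small data structure. This is only an illustration of the general idea, not the structure either patent describes. The `EntityGraph` class, its method names, and the relation labels such as `is_a` and `has` are hypothetical.

```python
from collections import defaultdict


class EntityGraph:
    """A toy graph linking entities through named relationships."""

    def __init__(self):
        # entity -> set of (relation, other entity) pairs
        self._edges = defaultdict(set)

    def relate(self, subject, relation, obj):
        """Record that `subject` is connected to `obj` by `relation`."""
        self._edges[subject].add((relation, obj))

    def related(self, entity):
        """Return every entity directly connected to `entity`."""
        return {obj for _, obj in self._edges.get(entity, set())}

    def shared(self, a, b):
        """Return the entities that both `a` and `b` are connected to."""
        return self.related(a) & self.related(b)


graph = EntityGraph()
for engine in ("Google", "Bing"):
    graph.relate(engine, "is_a", "Search Engine")
    graph.relate(engine, "is_a", "Internet Domain")
    graph.relate(engine, "has", "Web Index")
graph.relate("Google", "headquartered_in", "Mountain View")
graph.relate("Bing", "owned_by", "Microsoft")

common = graph.shared("Google", "Bing")
```

Here `common` holds the entities that both search engines connect to, while "Mountain View" and "Microsoft" stay specific to one of them. A search engine that knows a query is about one entity could use overlaps like these to decide which related entities are worth showing.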