How Google is focusing upon Building and Promoting Entity Collections


Added 11:48 AM (PST), May 3, 2015: H/T to Natzir Turrado for the news that Google+ is introducing a new feature it is calling Collections. The announcement from The Windows Club features the word “curation” prominently, as do the two Google patent applications I write about in this post. Here’s how Susannah Lindsay uses the concept in The Windows Club article:

Google Plus users will get an opportunity to curate pieces of content into their collection, with others holding the permission of viewing, sharing, and following those collections as they please.

Added 12:15 PM (PST): More on the rumored Collections feature at Google+: Google+ is Testing a New “Collections” Feature That Seems to be Part Pinterest, Part Blogging

Last November, we read about some of the efforts that Twitter was undertaking to get more visitors to its pages in the article Twitter: Renewed Focus On SEO Generated 10 Times More Visitors. That was before Google and Twitter announced a new partnership delivering a data stream of tweets to Google, which enables the search giant to include that real-time social media data in its search results.

My friend Barbara Starr commented on the approach, referring to Twitter’s use of hashtags and collections of entities to attract attention to its pages, and it appears to have been effective. Barbara’s use of the word “collections” in reference to entities is echoed in a couple of recently published patent applications from Google that focus upon building collections for the benefit of searchers and of content curators such as Twitter.

Automatic Definition of Entity Collections
Pub. No.: WO/2015/051480
International Application No.: PCT/CN2013/001213
Publication Date: 16.04.2015
International Filing Date: 09.10.2013
Applicants: Google
Invented by Faen Zhang, Keith Golden, Amit Behal, Ben Hutchinson, Alexander Oliver Marks, Fei Wu, Yuan Gao

Abstract:

A system for automatically generating entity collections comprises a data graph including entities connected by edges and instructions that cause the computer system to determine a set of entities from the data graph and to determine a set of constraints that has a quantity of constraints.

A constraint in the set represents a path in the data graph shared by at least two of the entities in the set of entities. The instructions also cause the computer system to generate candidate collection definitions from combinations of the constraints, where each candidate collection definition identifies at least one constraint and no more than the quantity of constraints. The instructions also cause the computer system to determine an information gain for at least some of the candidate collection definitions, and store at least one candidate collection definition that has an information gain that meets a threshold as a candidate collection.
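
As a rough illustration of what this abstract describes, here is a minimal Python sketch (not code from the patent application). The toy graph, the entity names, and the entropy-based information gain formula are all assumptions made for the example; the filing may compute gain quite differently.

```python
from itertools import combinations
from math import log2

# Toy data graph (made up): entity -> set of (predicate, object) edges.
GRAPH = {
    "Big": {("type", "film"), ("actor", "Tom Hanks")},
    "Apollo 13": {("type", "film"), ("actor", "Tom Hanks")},
    "Cast Away": {("type", "film"), ("actor", "Tom Hanks")},
    "Alien": {("type", "film"), ("actor", "Sigourney Weaver")},
    "Heat": {("type", "film"), ("actor", "Al Pacino")},
    "Tom Hanks": {("type", "person")},
}


def entropy(positives, total):
    """Binary entropy of a group with `positives` seed members out of `total`."""
    if total == 0 or positives in (0, total):
        return 0.0
    p = positives / total
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def shared_constraints(seed):
    """Edges shared by at least two entities in the seed set."""
    counts = {}
    for entity in seed:
        for edge in GRAPH[entity]:
            counts[edge] = counts.get(edge, 0) + 1
    return sorted(edge for edge, n in counts.items() if n >= 2)


def matches(entity, definition):
    """An entity matches when it carries every constraint in the definition."""
    return all(edge in GRAPH[entity] for edge in definition)


def information_gain(definition, seed):
    """Assumed measure: how well the definition separates seed from non-seed."""
    total = len(GRAPH)
    inside = [e for e in GRAPH if matches(e, definition)]
    outside = [e for e in GRAPH if not matches(e, definition)]
    remainder = 0.0
    for part in (inside, outside):
        positives = sum(1 for e in part if e in seed)
        remainder += len(part) / total * entropy(positives, len(part))
    return entropy(len(seed), total) - remainder


def candidate_collections(seed, threshold):
    """Try every combination of 1..N shared constraints; keep those above threshold."""
    constraints = shared_constraints(seed)
    kept = []
    for size in range(1, len(constraints) + 1):
        for definition in combinations(constraints, size):
            gain = information_gain(definition, seed)
            if gain >= threshold:
                kept.append((definition, gain))
    return kept
```

With the seed set {Big, Apollo 13, Cast Away} and a threshold of 0.5, the definition built on the shared “actor: Tom Hanks” edge is kept, while “type: film” on its own is dropped because it also matches films outside the seed set.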

Determining Collection Membership in a Data Graph
Applicants: Google
Invented by Faen Zhang, Keith Golden, Amit Behal, Ben Hutchinson, Alexander Oliver Marks, Jason Macnak
Pub. No.: WO/2015/051481
International Filing Date: 09.10.2013

Abstract:

An efficient system for evaluating collection membership in a large data graph. The system includes a data graph of nodes connected by edges and an index of constraints from collection definitions, a definition specifying at least one condition with at least one constraint, where a constraint has a constraint type and a constraint expression. Multiple conditions in the definition may be conjunctive.

The system may also include instructions that, when executed by the at least one processor, cause the system to: evaluate an edge for a node in the data graph against the index to determine conditions met by the edge and its associated neighborhood, repeat the evaluating for each edge associated with the node in the data graph, determine that conditions for a first collection are met, and generate an indication in the data graph that the node is a member of the first collection.
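
One way to picture the index-based evaluation in this abstract is the short Python sketch below. It is only an illustration under stated assumptions: the collection definitions are hypothetical, only an “equals” style of constraint is handled, and the “neighborhood” of an edge is ignored.

```python
from collections import defaultdict

# Hypothetical definitions: collection name -> conditions that must ALL hold
# (conjunctive). Each condition is (constraint type, (predicate, object)).
DEFINITIONS = {
    "Tom Hanks Movies": [
        ("equals", ("type", "film")),
        ("equals", ("actor", "Tom Hanks")),
    ],
    "Films": [("equals", ("type", "film"))],
}


def build_index(definitions):
    """Map each edge to the (collection, condition number) pairs it satisfies."""
    index = defaultdict(set)
    for name, conditions in definitions.items():
        for number, (kind, edge) in enumerate(conditions):
            if kind == "equals":  # other constraint types are left out of this sketch
                index[edge].add((name, number))
    return index


def collections_for(node_edges, definitions, index):
    """Check every edge of a node against the index and return the
    collections whose conditions are all met."""
    met = defaultdict(set)
    for edge in node_edges:
        for name, number in index.get(edge, ()):
            met[name].add(number)
    return sorted(
        name for name, numbers in met.items()
        if len(numbers) == len(definitions[name])
    )
```

Because each edge is looked up in the index once, a node is evaluated against all of the collection definitions in a single pass over its edges, rather than by testing every definition separately.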

The first of these patent applications focuses upon finding groups of related entities and defining collections for them. The second describes how members may be added to a collection.

The knowledge panel below is for the Cincinnati Reds team, and includes a scrolling list of the players on the team, likely from a table Google has found on the web:

Scrolling roster shows a collection of entities for this sports team

Defining Entity Collections

The patent application on defining entity collections tells us that Google might define them in a few different ways.

(1) Assigning Constraints to entities

As Google discovers entities on the Web, it might collect information about them in the form of “constraints,” which come in one of the following five formats: Exists, Not Exists, Equals, Not Equals, and a Template format. These constraints identify whether or not an entity is a member of a published or a candidate collection, so that the entity may be considered part of one of those.

Published collections may be found in places like a wiki or a table on the Web, such as a list of members of a sports team, US Presidents, or World Leaders.

Candidate collections may be identified from popular web queries that might indicate a collection of entities, and the patent tells us that entities that are members of those collections may be identified through semantic searches, as follows:

…Determining the first set of entities may include selecting a category from a crowd-sourced document corpus and determining entities identified by the category.

As another example, determining the first set of entities may include identifying a popular query from search records, converting the popular query to at least one semantic query, and executing the at least one semantic query against the data graph to obtain a query result, wherein the first set of entities is the query result from the data graph. Converting the popular query to the at least one semantic query may include converting the popular query to a plurality of semantic queries, running each of the plurality of semantic queries against the data graph, and determining a plurality of sets of entities, a set of the plurality of sets representing entities responsive to one of the semantic queries.

The first four types of constraints identify whether or not an entity is a member of a certain type of collection, and the “template” constraint identifies a template that might be associated with the entity and that indicates membership in a certain type of collection.
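
To make the first four formats a little more concrete, here is a small Python sketch of how they might be checked against the edges attached to one entity. The meaning given to each format is an assumed reading rather than the patent's wording, the predicate names are invented, and the Template format is left out because its details aren't covered here.

```python
def holds(edges, kind, predicate, value=None):
    """Check one constraint against an entity's (predicate, object) edges.

    The semantics below are assumptions made for illustration.
    """
    values = {obj for pred, obj in edges if pred == predicate}
    if kind == "exists":        # the entity has this property at all
        return bool(values)
    if kind == "not_exists":    # the entity lacks this property
        return not values
    if kind == "equals":        # the property has this specific value
        return value in values
    if kind == "not_equals":    # the property never has this value
        return value not in values
    raise ValueError(f"unsupported constraint kind: {kind}")


def is_member(edges, definition):
    """An entity belongs to a collection when every constraint holds."""
    return all(holds(edges, *constraint) for constraint in definition)


# Hypothetical definition for a "Chinese Scientists" collection.
CHINESE_SCIENTISTS = [
    ("equals", "nationality", "China"),
    ("equals", "occupation", "scientist"),
    ("not_exists", "fictional_universe"),
]
```

Calling `is_member` with an entity's edges and `CHINESE_SCIENTISTS` returns True only when both “equals” constraints match and the entity has no `fictional_universe` edge.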

Take-Aways

The patent describes how information about entities may exist on the Web in the form of triples, how those triples are searchable, and how they can be used to identify whether or not an entity is a member of a team, a political activist group, or a group of world leaders. It also describes how different collections may be ranked based upon things like how notable their members are.

Collections can include groups of entities such as Chinese Scientists or Tom Hanks’ Movies.

A collection of Tom Hanks Movies shown off in a Google Carousel.

From reading the patents, you get a sense that Google has implemented some aspects of them, but hasn’t yet put into place knowledge panels or carousels that might show off collections of groups of entities such as “World leaders,” though it still might.


13 thoughts on “How Google is focusing upon Building and Promoting Entity Collections”

  1. Nice find, Bill! In the first one, I find this wording interesting:
    “… where each candidate collection definition identifies at least one constraint and no more than the quantity of constraints.”
    What do you make of that “and no more than… ” bit?

  2. Hi Doc,

    Trying to define patent claims is difficult, and trying to do it out of context is more so.

    The constraints the patent talks about assigning to entities include:

    Exists,
    Not Exists,
    Equals,
    Not Equals, and a
    Template format

    The first patent involves “AUTOMATIC DEFINITION OF ENTITY COLLECTIONS”, so it’s about finding and identifying collections involving different kinds of entities.

    So, if the creation of a collection involves identifying people who acted in a Tom Hanks movie, then the semantic query involved should be in finding people who acted in at least one Tom Hanks movie. If the collection is of Chinese Scientists, then the constraints mean that at least one semantic query should identify that they are Chinese and one semantic query should identify that they are scientists.

    I think that “quantity of constraints” is telling us that a collection definition can be limited to the constraints that are necessary to identify membership in a collection. I could be reading that wrong, though. It isn’t well defined there.

  3. This is interesting and also very fascinating: how they will go about building these collections and categorizing them. But I still could not understand the clear implications once these collections become a reality.

  4. Hi Cathy,

    It appears as though Google may be learning from its users about what may be contained within a collection, with the collection curating model taking place at Google+.

  5. Thanks encore Bill.

    Sigh …

    I have to say again, perhaps with a distortion bias.

    Entities have to exist within defined zones. The factorial of the chain of entity relationships gets more drastically huge and fuzzy with the increase of entities.

    It’s the tail wagging the dog. Instead of finding by index, it is defining buckets that categorise the subject that Google thinks (dictates) the searcher *should* be searching for. IOW, it leaves nothing to human decision making. An index is infinite – defining a grid of overlapping ‘things’ will naturally be self-confining.

    The trouble with brute-force AI like this is it will extinguish the light of interesting and diverse material which ironically is what people with any intelligence and curiosity really, really want.

    If Google continues beyond its remit, the filter bubble will get so bad, even the most superficial souls might realise it and drift off.

    Jabberwocky.

  6. Hi Jon,
    At this point, these patents seem to be behind the approach shown in Google’s new Collections feature at Google+, which presently seems to leave open a lot of human decision making as to what should be placed within a collection. That freedom and flexibility may be one of the things that keeps Google’s approach from being too brute-force.

  7. Great article! It seems to me that Google is trying to slowly shift more toward semantic search. Schema markup will definitely be playing a larger role in search in the future.

  8. This is interesting and also very fascinating. It seems that Google is trying to slowly shift more toward semantic search.

  9. Google, to empower the pursuit monster to incorporate that ongoing social networking information into its indexed lists, we read about a portion of the endeavors that twitter was attempted to attempt to get more guests to its pages in the article,

  10. This is really very interesting: how they will go about building these collections and categorizing them. But I still could not understand the clear implications. Will these collections become a reality? And if so, why?

  11. Hi Sunny,

    One way they can build collections would be while crawling the web. Another is to leave it in the hands of content curators, in the new collections feature at Google+

  12. Nice article. It looks like Google is making an attempt to slowly shift more toward semantic search. Schema markup will certainly be playing a bigger role in search in the future.
