How Knowledge Base Entities can be Used in Searches

When Google crawls the Web to collect information about objects or entities, it also collects facts about those entities. These facts are separated into different categories or attributes associated with those entities. For example, a book may have attributes such as an author, a publisher, a year published, a web site it can call home, a genre, and more.

Identifying Entities by their Attributes

A search that includes those attributes can be used to identify the entity the attributes might be associated with.

Google was granted a patent recently that describes how those attributes could be searched within an attribute data store to find the entity. The patent shows how the process described within it might be used to answer some complex queries, and some interactive Answerbox type queries. The issue that this patent addresses can be summed up in a single question:

What entity of a generic type, e.g., book, medical condition, or movie, is associated with the features named in the query?

So, one query that this approach might answer is something such as, “What is the movie where Robert Duvall loves the smell of Napalm in the morning?”

Movie attributes used in query to identify movie

In this first example, the attribute data store might tell us that Robert Duvall is an actor in the movie, and it might also tell us that he has a memorable quote in the movie about loving the smell of napalm in the morning.

By searching the data store for the attributes listed in that query, “Apocalypse Now” is identified with a certain level of confidence as the movie being looked for. A query for the name of the movie is then sent to the search engine, so that results for it can be returned.
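The lookup described above can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions, not the patent's implementation: the `ATTRIBUTE_STORE` dictionary, the `identify_entity` function, the substring matching, and the 0.5 confidence threshold are all hypothetical, and the store holds only the two movie examples from this post.

```python
from dataclasses import dataclass

# Toy attribute data store (hypothetical): entity type -> entity name -> attributes.
# The entries come from the two movie examples in this post.
ATTRIBUTE_STORE = {
    "movie": {
        "Apocalypse Now": {"robert duvall", "smell of napalm in the morning"},
        "Untamed Heart": {"christian slater", "baboon heart"},
    },
}


@dataclass
class Match:
    entity: str
    confidence: float  # fraction of the entity's stored attributes found in the query


def identify_entity(query, entity_type, threshold=0.5):
    """Return the best-matching entity for the query, or None below the threshold."""
    text = query.lower()
    best = None
    for entity, attributes in ATTRIBUTE_STORE.get(entity_type, {}).items():
        # Assumption: an attribute "appears" in the query if its phrase is a substring.
        hits = sum(1 for attribute in attributes if attribute in text)
        confidence = hits / len(attributes)
        if best is None or confidence > best.confidence:
            best = Match(entity, confidence)
    if best is None or best.confidence < threshold:
        return None
    return best


match = identify_entity(
    "What is the movie where Robert Duvall loves the smell of Napalm in the morning?",
    "movie",
)
if match is not None:
    # The entity name becomes the query that is actually sent to the search engine.
    print(match.entity, match.confidence)
```

A real system would need fuzzier matching than substrings, but the shape is the same: match attributes first, then search on the entity name.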

Another query about a movie might be, “Movie where Christian Slater has a baboon heart.”

A query where attributes about the movie Untamed Heart are used to search for the movie

In this query, Christian Slater is an actor in the movie, and one of the plot elements is a rumor that he has a baboon heart. So there are two attributes associated with this movie, and a search of the attribute data store for them can identify which entity is being looked for.

A search for the movie is then conducted by the search engine so that search results for it can be returned.

As the patent notes, similar queries could be conducted that involve finding other entities, such as books or people or songs.

An Answerbox where Attributes Could be Checked Off

But the patent also describes a more complicated case: a Onebox approach that allows you to enter symptoms, and then asks whether you would like to include other symptoms, to identify a specific medical condition.

I tried a few of these searches and didn’t get results like those identified within the patent.

I don’t know if that is a technical limitation or a business decision: should Google be used to identify medical conditions based upon the symptoms associated with them?

A medical conditions Onebox where a searcher could check off symptoms related to a medical condition they want to learn about.

The patent does tell us that the medical condition is a specific entity type, and those symptoms are attributes of the entity.
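To make the checkbox idea concrete, here is a minimal sketch of how checked symptoms could narrow down candidate conditions and suggest further checkboxes. Everything in it is an assumption for illustration: the `CONDITION_SYMPTOMS` store, the function names, and the placeholder condition names ("condition A" and so on) are hypothetical and are not medical data or anything taken from the patent.

```python
# Hypothetical store: entity type "medical condition", with symptoms as attributes.
# Condition names are placeholders, not real medical data.
CONDITION_SYMPTOMS = {
    "condition A": {"fever", "cough", "fatigue"},
    "condition B": {"fever", "rash"},
    "condition C": {"cough", "sore throat"},
}


def candidates(checked):
    """Conditions whose attribute set contains every checked symptom."""
    return sorted(
        name
        for name, symptoms in CONDITION_SYMPTOMS.items()
        if checked <= symptoms
    )


def suggest_symptoms(checked):
    """Other symptoms of the remaining candidates, to offer as new checkboxes."""
    remaining = set()
    for name in candidates(checked):
        remaining |= CONDITION_SYMPTOMS[name]
    return sorted(remaining - checked)


checked = {"fever"}
print(candidates(checked))        # conditions still consistent with the checked boxes
print(suggest_symptoms(checked))  # symptoms the searcher could check off next
```

Each additional box checked shrinks the candidate list, which is the interaction the Onebox in the patent drawing seems to be built around.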

The patent is:

Identifying entities using search results
Invented by Thomas A. Lasko, Andrew Tomkins, Michael Angelo, Matthew K. Gray, Russell Ryan, Namrata U. Godbole, and Roni F. Zeiger
Assigned to Google
US Patent 8,775,439
Granted July 8, 2014
Filed September 27, 2011

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying entities using search results.

  • One of the methods includes the actions of determining that a first search query includes a respective text reference to each of one or more predetermined attributes, wherein each attribute is associated with a first entity type;
  • For each of a plurality of entities of the first entity type, generating a combined search query that includes the first search query and a name of the entity;
  • Obtaining search results for each of the plurality of entities using the combined search query for each respective entity; and
  • Using the obtained search results to generate combined search results to include in a response to the first search query.
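The steps in the abstract can be sketched roughly as follows. The abstract does not say how the per-entity results are combined, so the merge rule here (keep each URL's best score) is my own assumption, as are the function names, the `TOY_INDEX` documents, the example.com URLs, and the term-overlap scoring in `toy_search`, which stands in for a real search engine.

```python
# Hypothetical two-document index standing in for a real search engine.
TOY_INDEX = {
    "example.com/untamed-heart": "untamed heart movie christian slater baboon heart",
    "example.com/apocalypse-now": "apocalypse now movie robert duvall napalm",
}


def toy_search(query):
    """Score each document by the fraction of query terms it contains."""
    terms = set(query.lower().split())
    scored = []
    for url, text in TOY_INDEX.items():
        score = len(terms & set(text.split())) / len(terms)
        if score > 0:
            scored.append((url, score))
    return sorted(scored, key=lambda result: result[1], reverse=True)


def build_combined_queries(query, entity_names):
    """One combined query per candidate entity: the original query plus the entity name."""
    return {name: f"{query} {name}" for name in entity_names}


def combine_results(query, entity_names, search, per_entity=3):
    """Run each combined query and merge the results into one ranked list."""
    best = {}  # url -> (score, entity whose combined query produced that score)
    for name, combined in build_combined_queries(query, entity_names).items():
        for url, score in search(combined)[:per_entity]:
            if url not in best or score > best[url][0]:
                best[url] = (score, name)
    merged = [(url, score, name) for url, (score, name) in best.items()]
    return sorted(merged, key=lambda row: row[1], reverse=True)


results = combine_results(
    "movie christian slater baboon heart",
    ["Untamed Heart", "Apocalypse Now"],
    toy_search,
)
for url, score, entity in results:
    print(url, round(score, 2), entity)
```

The point of the sketch is that the entity whose name fits best with the attributes in the original query tends to produce the strongest results, which is one way the right movie could rise to the top.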

Takeaways

I’ve been doing a number of entity audits for clients looking for and identifying the use of entities on a site, and entities that should be included on those sites.

Getting a sense of the attributes that are associated with those entities is also an important element, as are entities that might appear in search results as related entities.

This patent shows how entities from your web site might be found in search results in queries that focus upon the attributes your entities might contain.

This is a different type of semantic search than a Hummingbird search, which can involve a long and complex conversational query being rewritten.

10 thoughts on “How Knowledge Base Entities can be Used in Searches”

  1. Probably a dense question but is the thinking from a practical application standpoint that you can help clients identify potential entities on their site then guide them through creating structured data around those entities – making attributes Googs can more readily link to a query? I’m intrigued by all the semantic action but am still walking through the ways you make it digestible…

  2. Oh Gosh!! That’s deep research. I was completely unaware of that. Does that relate to the ERD (Entity Recognition and Disambiguation) project powered by Microsoft? Knowledge Graph was a result of that entity relationship for sure. I am looking forward to seeing if some entity could benefit a web page through structured data markup or something else which would benefit the webmasters to rank well in search engines. Hats off Bill. Love to explore more info on that.

  3. Really interesting insights, Bill!

    This checkbox inclusion that allows for further narrowing down the search, and the ticks used as a simpler alternative to the AND and NOT search operators: that’s a really curious approach for achieving better UX.

    Though in this specific example, I have certain reservations about whether people should be offered a pool of illnesses possibly related to the symptoms they list. I don’t think that search engines should be allowed to make assumptions about what health condition one might have, i.e. to diagnose their users. Imagine the risk of people or even kids trying to treat themselves for something Google has told them they have (there are pretty gullible people out there, no doubt about that). But the illness related example might also be used just as an illustration of what the algo is all about, so I won’t read too much into it.

    The other interesting part is that the search results could be influenced by the previous queries; “the query classifier can be trained to add terms from previously submitted queries that appear in more than a threshold proportion of resources related to”. Thus if 2 entities are selected from the search results of the same query then the search engine might decide that they are equally important and combine those two in future search results:
    “The two conditions can then be presented in the form of a single link, e.g., with associated text “skin diseases (e.g., psoriasis, eczema).” The user can select the link to get search results for, e.g., a query including “skin diseases,” “psoriasis AND eczema,” or both.”

  4. Is this really entity based or keyword based search? Searching for “movie Robert Duvall loves Napalm morning” will give almost the same SERP, based on the individual keywords found in the query. Google replaced “movie” with “film” though. If it really was entity based search, Google would search through their index for “Apocalypse Now”.

  5. Hi digitaldionne

    Yes, the thinking is that named entities can be identified on a site, and attributes for those can be identified and included on a page. For example, if the page is about the author of a number of books, information about those books could be included on pages of the author’s site. An entity audit could identify entities associated with the site, as well as the attributes that are associated with those entities, related entities, and more. Some entities or related entities or attributes that should be included on a site could be added. Wikipedia articles about the entity could be updated. Freebase profiles could be updated.

  6. Hi Amit

    Both Google and Microsoft have done research on named entities over the years. I don’t think Microsoft’s project is necessarily further along than Google’s. Structured data markup isn’t necessarily going to give one search engine a benefit over another, but projects such as Freebase could be helpful. See: How Google and Microsoft taught search to “understand” the Web http://arstechnica.com/information-technology/2012/06/inside-the-architecture-of-googles-knowledge-graph-and-microsofts-satori/

  7. Hi Nevyana

    Not sure why the patent uses a checkmark user interface, but I’ll be keeping an eye on search results to see if this particular way of searching is released.

    When I first saw this patent, I wondered whether Google would create a symptom search, and I doubted that they would potentially expose themselves to the risk since they aren’t the actual publishers of the content they are helping to surface. I didn’t write about all of the variations that might be covered within the patent, but it does include some interesting features if you dig in deeper.

  8. Hi Jan-Willem,

    “If it really was entity based search, Google would search through their index for “Apocalypse Now”.”

    I’m not sure what you mean by that. The point behind the patent is to search through the attribute data store to find attributes there, and the entity they are associated with (with the highest level of confidence), and once that is done to do a keyword search on the entity and return the results for that. So, under the patent, it’s both an entity based and a keyword based search. The search for “Robert Duvall” as an actor in the movie, and the search for the quote “I love the smell of napalm in the morning” help to identify “Apocalypse Now” as the entity being searched for. The keyword search for “Apocalypse Now” then allows a searcher to choose the source they want to see, and whether that’s an informational page, such as a Wikipedia page or an official film site page or a YouTube video. So it starts out as an entity search, but transforms into a Web document based search.

  9. Hey Bill,
    Amazing research… Didn’t know that.
    Google has been eyeing entity based relationship patents and research for a long time. I think Microsoft has similar research going on, but I don’t know much detail about it. Bing has also made updates that make its search better, but it has a long way to go to compete with Google.

  10. Your research is too good and i am very happy to read this blog. this knowledge base blog is very useful for Internet users which are browsing & surfing daily. According to my knowledge first of all when you finding any entities you have a good knowledge about that entities like what are the major facts are relating with & how to find that through these facts. your example which is given above its know all about this article, these is a very simple way to describe your thoughts. I think google is one of the best search engine in the world & its find very easily to your searches, for any search engine the most important thing is how to work that and how to fulfill the requirements of user. google use best attributes to finding any query and i like google very much to other search engine.

    Thanks
