How Google Decides Which Images to Show For Entities in Knowledge Panels


Google shows knowledge panels in response to queries when it recognizes an entity within the query and has collected enough information about that entity to display a panel about it. I’ve written about these knowledge panels before in the posts How Google Decides What To Know In Knowledge Graph Results and Google’s Knowledge Cards.

I mentioned images in knowledge panels in those posts, but not how images might be chosen to represent the entities those panels are about, especially when the entities are people.

Knowledge panel images for Thomas Edison
Knowledge panel images for Benjamin Franklin
Knowledge panel images for George Washington

A patent granted to Google earlier this month describes how high-authority images might be selected to represent people shown in knowledge panels in response to a query.

Other factors may also influence which image is selected, but this patent lays out a useful framework for deciding which images to show. For someone like Thomas Edison, Benjamin Franklin, or George Washington, there are potentially many images that could be shown, so how does Google decide which ones to show?

Keep in mind that when you do an image search for a person such as Thomas Edison, each image returned appears on a web page, which may be referred to as a “landing page.” That page may have a quality score associated with it, and the quality scores of the different pages showing images of that person can be compared with each other and used to rank the images found on them.

Images themselves could also be scored based upon the quality of the image.

The combination of image scores and quality scores for web pages that contain images of entities might be used to generate an image authority score.

The images may then be ranked based upon these image authority scores. The highest ranked images may be the ones displayed to a searcher.
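As a rough illustration, and not Google’s actual formula, here is a minimal Python sketch of that combine-then-rank step. It assumes each page’s quality score is already known, that an image found on several pages takes the average of those pages’ scores, and that the authority score is simply that average multiplied by the image’s own score. The `CandidateImage` class, its fields, and the sample file names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateImage:
    url: str
    image_score: float                # hypothetical score for the image itself
    page_quality_scores: list[float]  # quality scores of the pages showing it


def authority_score(image: CandidateImage) -> float:
    """Assumed combination: mean page quality times the image's own score."""
    if not image.page_quality_scores:
        return 0.0
    page_quality = sum(image.page_quality_scores) / len(image.page_quality_scores)
    return page_quality * image.image_score


def top_images(candidates: list[CandidateImage], k: int = 3) -> list[CandidateImage]:
    """Rank candidates by authority score, highest first, and keep the top k."""
    return sorted(candidates, key=authority_score, reverse=True)[:k]


candidates = [
    CandidateImage("edison-portrait.jpg", 0.9, [0.8, 0.6]),
    CandidateImage("edison-lab.jpg", 0.5, [0.9]),
    CandidateImage("edison-fanart.jpg", 0.95, [0.2, 0.4]),
]
print([image.url for image in top_images(candidates, k=2)])
# ['edison-portrait.jpg', 'edison-lab.jpg']
```

In this toy example the fan-art image has the highest image score of the three but still ranks last, because the pages it appears on score poorly.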

We are told that quality scores for web pages may be determined independently of the content of those pages.

Click logs and query logs may be used to identify images of entities: images that are clicked on often in image searches for an entity may score higher than other images of that entity. The quality score for the page an image is found on could be based in part on the number and quality of links pointing to that page.
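The post only says that the number and quality of links might feed into a page’s quality score. One classic way to reward both is a PageRank-style iteration, sketched minimally below in Python; treating it as the method this patent relies on is an assumption, and the page names in the example link graph are hypothetical.

```python
def link_quality_scores(
    links: dict[str, list[str]], damping: float = 0.85, iterations: int = 50
) -> dict[str, float]:
    """Simplified PageRank: a page scores well when well-scored pages link to it."""
    pages = set(links) | {target for targets in links.values() for target in targets}
    if not pages:
        return {}
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for source in pages:
            targets = links.get(source, [])
            if targets:
                # Each outgoing link passes on an equal share of the source's score.
                share = damping * scores[source] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for page in pages:
                    new_scores[page] += damping * scores[source] / n
        scores = new_scores
    return scores


links = {
    "blog-a": ["museum"],
    "blog-b": ["museum"],
    "museum": ["encyclopedia"],
    "encyclopedia": ["museum"],
}
scores = link_quality_scores(links)
print(max(scores, key=scores.get))  # museum
```

The museum page wins because it has the most incoming links, and one of them comes from a page (the encyclopedia) that itself scores well.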

Images of a person may be selected partly by using facial recognition software, which produces a score based on the confidence that the person shown matches other images of that person. A “portrait score” may also be part of that image score, checking that the image contains features matching those in other images that have been determined to be similar (eyes, a nose, a mouth, ears, and other features that may indicate a face).
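Here is a minimal Python sketch of those two ideas. It assumes a face recognition model has already turned each face into a list of numbers (an embedding) and that a face detector has reported which facial features it found. The confidence score here is the average cosine similarity to reference images of the person, and the portrait score is the share of expected features present; both formulas and the tiny three-number vectors are illustrative assumptions, not the patent’s definitions.

```python
import math

EXPECTED_FEATURES = {"eyes", "nose", "mouth", "ears"}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def face_confidence(candidate: list[float], references: list[list[float]]) -> float:
    """Average similarity between a candidate face and known faces of the person."""
    if not references:
        return 0.0
    return sum(cosine_similarity(candidate, ref) for ref in references) / len(references)


def portrait_score(detected_features: set[str]) -> float:
    """Fraction of the expected facial features that the detector found."""
    return len(EXPECTED_FEATURES & detected_features) / len(EXPECTED_FEATURES)


# Hypothetical embeddings for known images of one person.
references = [[1.0, 0.0, 0.2], [0.9, 0.1, 0.1]]
print(face_confidence([0.95, 0.05, 0.15], references))  # close to 1
print(face_confidence([0.0, 1.0, 0.0], references))     # much lower
print(portrait_score({"eyes", "nose", "mouth"}))        # 0.75
```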

Image quality scores

Advantages of the Method in the Patent

  1. Authoritative, or high-quality, images may be identified because they are included in a number of high-quality resources and because similar images also appear on high-quality pages.
  2. Comparing image resource quality scores for similar images with image resource quality scores for dissimilar images provides a relative measure of image quality, which can be used to select images that have a high degree of authority for an entity relative to other images of the same entity.
  3. Images that have a relatively high authority for an entity are more likely to satisfy a user’s informational need than images with a relatively low authority for the entity.
  4. Images with high image scores for an entity are likely to be good choices (visually representative, clear, and distinguishable from other images related to that entity).

The patent is:

Scoring images related to entities
Invented by: Adam Hartwig, Sylvain Gelly, Yuan Li, Taehee Lee
Assigned to: Google
US Patent 9,098,552
Granted August 4, 2015
Filed: February 5, 2013

Abstract

Methods, systems, and apparatus for scoring images related to entities. In one aspect, a method includes:

  • Identifying images associated with a person, each image being included in one or more resources
  • Obtaining, for each resource that includes one of the images, a quality score that represents a quality of the resource; and, for each of the images:
  • Generating an image resource quality score from the quality scores of the resources that include the image
  • Identifying a set of similar images from the images, each similar image having a measure of similarity to the image that meets a similarity measure threshold
  • Generating an image score based on image resource quality scores of the resources that include the similar images relative to image resource quality scores of the resources that include each of the images
  • Generating an image authority score based on the image resource quality score and the image score.
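The abstract’s steps can be sketched in Python as follows. This is a hypothetical sketch, not Google’s implementation. It assumes each image comes with a feature vector used to judge similarity (cosine similarity against a threshold), uses a plain average wherever the abstract only says one score is generated “from” others, measures the similar images’ resource quality relative to all images by dividing one average by the other, and multiplies the two scores to get the authority score. The threshold, image names, and quality values are made up.

```python
import math
from statistics import mean


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def image_authority_scores(
    images: dict[str, tuple[list[float], list[str]]],
    resource_quality: dict[str, float],
    threshold: float = 0.9,
) -> dict[str, float]:
    """images maps an image id to (feature vector, ids of resources containing it)."""
    # Image resource quality score: mean quality of the resources that include
    # the image (every image is assumed to appear in at least one resource).
    irq = {
        image: mean(resource_quality[r] for r in resources)
        for image, (_, resources) in images.items()
    }
    baseline = mean(irq.values())  # reference level across all of the images
    authority = {}
    for image, (vector, _) in images.items():
        similar = [
            other
            for other, (other_vector, _) in images.items()
            if other != image and cosine_similarity(vector, other_vector) >= threshold
        ]
        # Image score: resource quality of the similar images relative to all images.
        if similar and baseline > 0:
            image_score = mean(irq[other] for other in similar) / baseline
        else:
            image_score = 0.0
        # Assumed combination of the two scores into an authority score.
        authority[image] = irq[image] * image_score
    return authority


images = {
    "portrait-1": ([1.0, 0.0], ["encyclopedia", "museum"]),
    "portrait-2": ([0.98, 0.05], ["museum", "news"]),
    "cartoon": ([0.0, 1.0], ["blog"]),
}
resource_quality = {"encyclopedia": 0.9, "museum": 0.8, "news": 0.7, "blog": 0.3}
scores = image_authority_scores(images, resource_quality)
print(scores)
```

Here the two portraits resemble each other and sit on well-rated pages, so both score highly, while the cartoon resembles nothing else in the set and gets an authority score of 0.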

Take Aways

The patent goes into more detail than I have here, including a discussion of how similar images found on the Web can support decisions about which images to show.

This approach may work best for people who have a lot of images of themselves on the Web.

If you hover over the images shown for a person in a knowledge panel, you’ll see that they link to the pages they come from, which is useful if you want to investigate the sources of those pictures in more detail. Note that in the knowledge panel pictures below for Tim Cook and Larry Page, one of the images for each of them comes from their company’s website.

Knowledge panel image for Larry Page
Knowledge panel image for Tim Cook


14 thoughts on “How Google Decides Which Images to Show For Entities in Knowledge Panels”

  1. Nice share. Is there any mention of Linked Data? I thought you just add Linked Data and then Google chooses the image based on a trust score of the connections.

  2. Hi bhd,

    The patent doesn’t say anything about Linked Data, but Linked Data is one way to identify an entity associated with a page, or an image associated with a person, and it could play a role in identifying an image for a person, too.

  3. Hi Merlinox,

    Thanks for pointing that out. Google is going to make mistakes from time to time. Hopefully, they can improve upon those and keep getting better and better. They’ve shown an ability to do that. Hopefully they can with the Semantic Web, too.

  4. Thanks Bill. Unfortunately, Italian is an “irregular” language: too many rules, too many exceptions, too many “common sayings” used in natural language.

    Some weeks ago Google confused Milano (a city in northern Italy) with Milano Marittima (a seaside location in central Italy)… 😉

  5. Excellent post, Bill! Even though Google Image Search already does a wonderful job, it’s still not free from glitches, as various users discover and share on social media from time to time. Identifying and filtering data by entity involves complex dynamics. I’m sure Google will improve the results soon.

  6. Not sure how all of this will work out when many people share the same name. If a person has a unique name, it’s easy to collect images of them, compare them, and know it’s that person. But if many people have the same name, even in the same state, I think this will fail. Nevertheless, very good information to know. Thank you.

  7. Hi Johnny,

    Thanks. Part of this algorithm involves looking for similar images, which would differ for people who share the same name, so the point of failure you describe should be avoided, especially if facial recognition software is used, as the patent also mentions. The inventors of the patent shared your concern, and it looks like they planned for the possibility of different people sharing the same name.
