Google shows knowledge panels in response to queries in which it recognizes an entity and has collected enough information about that entity to display a panel about it. I’ve written about these knowledge panels before in the posts “How Google Decides What To Know In Knowledge Graph Results” and “Google’s Knowledge Cards.”
I mentioned images in knowledge panels in those posts, but not how images might be chosen to represent the entities that those panels are about, especially when the entities are people.
A patent granted to Google earlier this month describes how images might be selected as high authority images to represent people returned in knowledge panels in response to a query.
There may be other issues that determine what image is selected, but this patent lays out a nice framework for determining what images to show. When you’re talking about someone like Thomas Edison or Benjamin Franklin or George Washington, there are potentially a lot of images that could be shown for each of them. So how does Google decide what to show?
Keep in mind that when you do an image search for a person, such as Thomas Edison, each image returned for that query appears on a web page, which may be referred to as a “landing page.” That web page may have a quality score associated with it, which could be used to help rank the image from that page by comparing the pages where images of that person are located against each other.
Images themselves could also be scored based upon the quality of the image.
The combination of image scores and quality scores for web pages that contain images of entities might be used to generate an image authority score.
The images may then be ranked based upon these image authority scores. The highest ranked images may be the ones displayed to a searcher.
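As a rough illustration of the ranking described above, here is a minimal Python sketch. The `Candidate` class, the sample file names, the 0-to-1 score ranges, and the choice to multiply the two scores together are all my own assumptions for the example; the patent summary above doesn't commit to a particular formula.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    url: str             # hypothetical image file name
    image_score: float   # quality of the image itself, assumed to be in [0, 1]
    page_quality: float  # quality score of the landing page, assumed to be in [0, 1]


def authority_score(candidate: Candidate) -> float:
    # Assumption: combine the two signals with a simple product.
    return candidate.image_score * candidate.page_quality


def rank_candidates(candidates):
    # Highest image authority score first.
    return sorted(candidates, key=authority_score, reverse=True)


candidates = [
    Candidate("a.jpg", image_score=0.9, page_quality=0.4),
    Candidate("b.jpg", image_score=0.7, page_quality=0.8),
    Candidate("c.jpg", image_score=0.5, page_quality=0.9),
]
ranked = rank_candidates(candidates)
```

In this made-up data, the sharpest image (a.jpg) ends up last because it sits on a weak page, which is the kind of trade-off a combined score is meant to capture.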
We are told that quality scores for web pages may be determined independently of the content of those pages.
Click logs and query logs may be used to identify images about entities; images that have been clicked on a lot in image searches for an entity may score higher than other images for that entity. The quality score for the page an image is found on could be based in part on the number and quality of links pointing to that page.
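To make the click log idea concrete, here is a small Python sketch that turns a list of clicks into a per-query "click share" for each image. The log format (query and image pairs), the file names, and the use of a simple share of clicks are assumptions of mine for illustration, not something the patent spells out.

```python
from collections import Counter, defaultdict


def click_shares(click_log):
    """click_log: iterable of (query, image_url) pairs, one pair per click."""
    counts = defaultdict(Counter)
    for query, image_url in click_log:
        # Assumption: queries are matched case-insensitively.
        counts[query.lower()][image_url] += 1

    shares = {}
    for query, counter in counts.items():
        total = sum(counter.values())
        # Fraction of this query's clicks that went to each image.
        shares[query] = {url: n / total for url, n in counter.items()}
    return shares


log = [
    ("Thomas Edison", "edison_portrait.jpg"),
    ("thomas edison", "edison_portrait.jpg"),
    ("Thomas Edison", "edison_lab.jpg"),
    ("Thomas Edison", "edison_portrait.jpg"),
]
shares = click_shares(log)
```

An image with a larger share of clicks for an entity's query could then be given more weight than its rivals for that entity.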
Images for a person may be selected in part using facial recognition software, which creates a score based upon a confidence level that the person displayed matches the person in other images of that person. A “portrait score” may be part of that image score, checking that the image shown contains features matching those in other images that have been determined to be similar (eyes, a nose, a mouth, ears, and other features that may indicate a face).
Advantages of the method in the patent
- Authoritative, or high quality images, may be identified based on being included in a number of high quality resources, and the fact that there are similar images also from high quality pages
- A comparison of image resource quality scores for similar images to image resource quality scores for dissimilar images provides a relative measure of image quality that can be used to select images with a high degree of authority with respect to an entity relative to the authority of other images with respect to the same entity
- Images that have a relatively high authority for an entity are more likely to satisfy a user’s informational need than images with a relatively low authority for the entity
- Images with high image scores for an entity are likely to be good choices (as in visually representative, clear, and distinguishable from other images related to that entity).
The patent is:
Scoring images related to entities
Invented by: Adam Hartwig, Sylvain Gelly, Yuan Li, Taehee Lee
Assigned to: Google
US Patent 9,098,552
Granted August 4, 2015
Filed: February 5, 2013
Methods, systems, and apparatus for scoring images related to entities. In one aspect, a method includes:
- Identifying images associated with a person, each image being included in one or more resources
- Obtaining, for each resource that includes one of the images, a quality score that represents a quality of the resource
- For each of the images:
- Generating an image resource quality score from the quality scores of the resources that include the image
- Identifying a set of similar images from the images, each similar image having a measure of similarity to the image that meets a similarity measure threshold
- Generating an image score based on image resource quality scores of the resources that include the similar images relative to image resource quality scores of the resources that include each of the images
- Generating an image authority score based on the image resource quality score and the image score.
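The steps in that abstract can be sketched in Python as follows. This is only my reading under stated assumptions: the image resource quality score is taken as the average quality of the pages that include the image, an image counts as similar to itself, the image score is the share of total image resource quality held by the similar set, and the final authority score is a product. The function names, the 0.8 threshold, and the sample data are all hypothetical.

```python
def image_authority_scores(resources_by_image, page_quality, similarity, threshold=0.8):
    # Image resource quality score per image.
    # Assumption: the mean quality score of the pages that include the image.
    irqs = {
        img: sum(page_quality[page] for page in pages) / len(pages)
        for img, pages in resources_by_image.items()
    }
    total = sum(irqs.values())

    scores = {}
    for img in irqs:
        # Similar images: those meeting the similarity threshold
        # (assumption: an image is always similar to itself).
        similar = [other for other in irqs if similarity(img, other) >= threshold]
        # Image score: quality behind the similar set relative to all images.
        image_score = sum(irqs[other] for other in similar) / total if total else 0.0
        # Image authority score (assumption: a simple product).
        scores[img] = irqs[img] * image_score
    return scores


# Hypothetical data: which pages include each image, and how good those pages are.
resources_by_image = {
    "portrait_a": ["wiki", "bio_site"],
    "portrait_b": ["wiki"],
    "cartoon": ["fan_blog"],
}
page_quality = {"wiki": 0.9, "bio_site": 0.7, "fan_blog": 0.2}
pair_similarity = {frozenset(("portrait_a", "portrait_b")): 0.9}


def similarity(a, b):
    # Made-up similarity lookup; unlisted pairs are treated as dissimilar.
    if a == b:
        return 1.0
    return pair_similarity.get(frozenset((a, b)), 0.0)


scores = image_authority_scores(resources_by_image, page_quality, similarity)
best = max(scores, key=scores.get)
```

With this sample data, the two near-duplicate portraits reinforce each other because both sit on good pages, while the lone cartoon on a weak page scores far lower.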
The patent is more detailed than what I’ve written about here, and there’s discussion about similar images found on the Web, and how those can support decisions made as to what images to show.
This approach may work best for people who have a lot of images of them on the Web.
If you hover over the images shown for a person in a knowledge panel, you'll see that they are linked to the pages they come from, in case you want to investigate the sources of those pictures in more detail. Note that in the knowledge panel pictures below for Tim Cook and for Larry Page, one of the images for each of them comes from his company’s website.