How Google May Decide Upon Knowledge Graph Images
Google shows knowledge panels in response to queries when it recognizes an entity within a query and has collected enough information about that entity to display a panel about it. I’ve written about these knowledge panels before in How Google Decides What To Know In Knowledge Graph Results, and in Google’s Knowledge Cards.
I mentioned knowledge graph images in those posts, but not how Google chooses the images that represent the entities those panels are about, especially when the entities are people.
A patent granted to Google earlier this month describes how entity images might be selected as high authority images to represent people returned in knowledge panels in response to a query.
There may be other factors that determine which image is selected, but this patent lays out a nice framework for deciding what entity images to show. When you’re talking about someone like Thomas Edison, Benjamin Franklin, or George Washington, there are potentially many images that could be shown for each of them. So how does Google decide what to show?
Keep in mind that when you do an image search for a person, such as Thomas Edison, each image returned for that query appears on a web page, which may be referred to as a “landing page.” That web page may have a quality score associated with it that could be used to rank the image from that page, by comparing the web pages where images of that person have been found against each other.
Images themselves could also be scored based upon the quality of the image.
The combination of image scores and quality scores for web pages that contain entity images might be used to generate an image authority score.
The images may then be ranked based upon these image authority scores. The highest-ranked images may be the ones displayed to a searcher as knowledge graph images.
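To make the idea of combining the two scores concrete, here is a minimal sketch. The patent summary above does not say how the scores are combined, so the product used here, the 0 to 1 score ranges, the `CandidateImage` class, and the example URLs are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CandidateImage:
    url: str             # where the image was found (illustrative)
    image_score: float   # quality of the image itself, assumed 0..1
    page_quality: float  # quality score of the landing page, assumed 0..1


def image_authority(candidate: CandidateImage) -> float:
    # Assumption: a simple product. The patent does not give this formula.
    return candidate.image_score * candidate.page_quality


def rank_candidates(candidates: list[CandidateImage]) -> list[CandidateImage]:
    # Highest authority first; the top results would be the panel candidates.
    return sorted(candidates, key=image_authority, reverse=True)


candidates = [
    CandidateImage("https://example.com/blurry.jpg", 0.3, 0.9),
    CandidateImage("https://example.org/portrait.jpg", 0.9, 0.8),
    CandidateImage("https://example.net/thumbnail.jpg", 0.8, 0.2),
]
for candidate in rank_candidates(candidates):
    print(f"{image_authority(candidate):.2f}  {candidate.url}")
```

With a product, a sharp image on a weak page and a poor image on a strong page both lose out to an image that does reasonably well on both counts, which matches the spirit of the description even if the real weighting differs.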
We are told that quality scores for web pages may be determined independently of the content of those pages.
Click logs and query logs may be used to identify entity images, with images that have been clicked upon a lot in an image search for an entity possibly scoring higher than other images for that entity. The quality score for a page that an image is found upon could be based in part upon the number and quality of links pointing to that page.
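As a rough illustration of the click log signal, the sketch below computes what share of image search clicks each image received for one entity query. The log format (pairs of query and clicked image), the function name `click_share`, and the file names are hypothetical; the patent does not describe the logs at this level of detail.

```python
from collections import Counter


def click_share(click_log, entity_query):
    """Fraction of image-search clicks each image received for one entity query."""
    wanted = entity_query.strip().lower()
    counts = Counter(
        image for query, image in click_log if query.strip().lower() == wanted
    )
    total = sum(counts.values())
    return {image: n / total for image, n in counts.items()} if total else {}


# Hypothetical log entries: (query, clicked image)
click_log = [
    ("thomas edison", "edison-portrait.jpg"),
    ("Thomas Edison", "edison-portrait.jpg"),
    ("thomas edison", "edison-lab.jpg"),
    ("nikola tesla", "tesla-portrait.jpg"),
]
shares = click_share(click_log, "Thomas Edison")
for image, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{share:.2f}  {image}")
```

An image with a larger share of clicks for the entity's query would be a stronger candidate under this reading.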
Images for a person may be selected in part using facial recognition software, which creates a score based upon a confidence level that the person shown matches other images of that person. A “portrait score” may be part of that image score, checking that the image contains facial features (eyes, a nose, a mouth, ears, and other features that may show a face) matching those in other images that have been determined to be similar.
Advantages of the method in the Knowledge Graph Images Patent
- Authoritative or high-quality images may be identified based on many high-quality resources. Interestingly, the existence of similar images that also come from high-quality pages counts in an image’s favor
- A comparison of image resource quality scores for similar images to image resource quality scores for dissimilar images provides a relative measure of image quality that can be used to select images with a high degree of authority about an entity relative to the authority of other images for the same entity
- Images that have a relatively high authority for an entity are more likely to meet a user’s informational need than images with a relatively low authority for the entity
- Images with high image scores for an entity are likely to be good choices (as in visually representative, clear, and distinguishable from other images related to that entity).
The knowledge graph images patent is:
Scoring images related to entities
Invented by: Adam Hartwig, Sylvain Gelly, Yuan Li, Taehee Lee
Assigned to: Google
US Patent 9,098,552
Granted August 4, 2015
Filed: February 5, 2013
Abstract
Methods, systems, and apparatus for scoring images related to entities. In one aspect, a method includes:
- Identifying images associated with a person, each image being included in one or more resources
- Obtaining, for each resource that includes one of the images, a quality score that represents a quality of the resource
- For each of the images:
- Generating an image resource quality score from the quality scores of the resources that include the image
- Identifying a set of similar images from the images, each similar image having a measure of similarity to the image that meets a similarity measure threshold
- Generating an image score based on image resource quality scores of the resources that include the similar images relative to image resource quality scores of the resources that include each of the images
- Generating an image authority score based on the image resource quality score and the image scores.
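The steps in the abstract can be sketched in a few lines of code. This is only one possible reading: the abstract does not give formulas, so averaging page scores, using a ratio for the image score, multiplying the two for the authority score, the 0.8 threshold, and the toy similarity function are all assumptions made for illustration.

```python
from statistics import mean


def image_resource_quality(page_scores):
    # Assumption: average the quality scores of the pages that include the image.
    return mean(page_scores)


def image_score(target, quality_by_image, similarity, threshold):
    # Images whose similarity to the target meets the threshold.
    # Assumption: the target counts as similar to itself.
    similar = [
        other for other in quality_by_image
        if other == target or similarity(target, other) >= threshold
    ]
    total = sum(quality_by_image.values())
    if total == 0:
        return 0.0
    # Quality behind the similar images, relative to quality behind all images.
    return sum(quality_by_image[other] for other in similar) / total


def image_authority_scores(pages_by_image, similarity, threshold=0.8):
    quality_by_image = {
        image: image_resource_quality(scores)
        for image, scores in pages_by_image.items()
    }
    # Assumption: authority is the product of the two scores.
    return {
        image: quality * image_score(image, quality_by_image, similarity, threshold)
        for image, quality in quality_by_image.items()
    }


# Toy data: page quality scores for the pages each image appears on.
pages_by_image = {"a.jpg": [0.8, 0.6], "b.jpg": [0.7], "c.jpg": [0.2]}
near_duplicates = {frozenset(("a.jpg", "b.jpg"))}


def toy_similarity(first, second):
    # Stand-in for a real visual similarity measure.
    return 0.9 if frozenset((first, second)) in near_duplicates else 0.1


scores = image_authority_scores(pages_by_image, toy_similarity)
for image, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{score:.3f}  {image}")
```

In this toy data, the two near-duplicate images that appear on well-regarded pages reinforce each other, while the lone image on a weak page ends up far behind.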
Knowledge Graph Images Takeaways
The patent is more detailed than what I’ve written about here. There’s a discussion about similar images found on the Web, and how those can support decisions made about what images to show.
This approach may work best for people who have many images of them on the Web.
The images shown for a person in a knowledge panel are linked to the pages they come from, so you can hover over them if you want to investigate the sources of those pictures in more detail. Note that in the knowledge graph images below for Tim Cook and Larry Page, one of each person’s images comes from their company’s website.
Google makes big (semantic) things, but then it has a big bug like this:
https://www.facebook.com/photo.php?fbid=10154083957016258&set=a.10151282764301258.553833.684036257&type=1
trattore = tractor
trattori = tractors
trattoria = restaurant (not tractor farm!)
Nice share, is there any mention of linked data? I thought u just add linked data and then Google chooses the image based on a trust score of the connections.
Hi bhd,
The patent doesn’t say anything about Linked Data, but that is one way to identify an entity that is associated with a page, or an image that is associated with a person, and that could play a role in the identification of an image for a person, too.
Hi Merlinox,
Thanks for pointing that out. Google is going to make mistakes from time to time. Hopefully, they can improve upon those and keep getting better and better. They’ve shown an ability to do that. Hopefully they can with the Semantic Web, too.
Thanks Bill. Unfortunately, Google has big difficulties with the Italian language. It’s not the first time it did that.
Hi Merlinox,
Sorry to hear that. It would be nice to see problems like that one go away.
Thanks Bill. Unfortunately, Italian is an “irregular” language: too many rules, too many exceptions, too many common sayings used in natural language.
Some weeks ago Google confused Milano (a city in northern Italy) with Milano Marittima (a seaside location in central Italy)… 😉
Hi Merlinox,
Google has overcome a lot of issues over the years. Hopefully, they will get better at understanding Italian.
Excellent post, Bill! Even as Google Image Search is doing a wonderful job already, it’s still not free from glitches as discovered and shared by various users on social media from time to time. Identifying and filtering data based on entity are based on complex dynamics. I’m sure Google will improve the results soon.
I really enjoyed reading your blog, you have lots of great content. I look forward to reading more posts from you.
Hi Susanta
Thank you. This post isn’t about Google Image Search, though that plays a role in which images show up in Knowledge Panels at Google. Google’s image search has grown better by leaps and bounds in the past few years. We’ll see where it goes from here. This article from the end of last year shows some of image search’s growth:
http://googleresearch.blogspot.com/2014/11/a-picture-is-worth-thousand-coherent.html
Hi Marissa,
Thank you.
Not sure how all of this will work out if it’s concerning many people with the same name. If you have persons with a unique name, it’s easy to collect images of them, compare them, and know it’s this person. But if you have many people with the same name, even from the same state, I think this will fail. Nevertheless, very good information to know. Thank you.
Hi Johnny,
Thanks. Part of this algorithm involves looking for similar images, and those would differ for different people who share the same name, so the point of failure you describe should be avoided, especially if facial recognition software is used, as the patent also mentions. The concern you raise is one the inventors of the patent had as well, and it looks like they planned for the possibility of different entities sharing the same name.