Michael Jordan playing Basketball

How Google May Annotate Images to Improve Search Results


How might Google improve on information from sources such as knowledge bases to help them answer search queries?

That information may be learned or inferred from sources outside of those knowledge bases when Google may:

  • Analyze and annotate images
  • Consider other data sources

A recent Google patent on this topic defines knowledge bases for us, explains why they are important, and points out examples of how Google looks at entities while it annotates images:

A knowledge base is an important repository of structured and unstructured data. The data stored in a knowledge base may include information such as entities, facts about entities, and relationships between entities. This information can be used to assist with or satisfy user search queries processed by a search engine.

Examples of knowledge bases include Google Knowledge Graph and Knowledge Vault, Microsoft Satori Knowledge Base, DBpedia, Yahoo! Knowledge Base, and Wolfram Knowledgebase.

The focus of this patent is upon improving upon information that can be found in knowledge bases:

The data stored in a knowledge base may be enriched or expanded by harvesting information from a wide variety of sources. For example, entities and facts may be obtained by crawling text included in Internet web pages. As another example, entities and facts may be collected using machine learning algorithms.

All gathered information may be stored in a knowledge base to enrich the information that is available for processing search queries.

Analyzing Images to Enrich Knowledge Base Information

This approach may annotate images and select object entities contained in those images. It reminded me of a post I recently wrote about Google annotating images, How Google May Map Image Queries.

This is an effort to better understand and annotate images, and explore related entities in images, so Google can focus on “relationships between the object entities and attribute entities, and store the relationships in a knowledge base.”

Google can learn from images of real-world objects (a phrase they used for entities when they started the Knowledge Graph in 2012).

I wrote another post about image search becoming more semantic, in the labels they added to categories in Google image search results. I wrote about those in Google Image Search Labels Becoming More Semantic?

When writing about mapping image queries, I couldn’t help but think about labels helping to organize information in a useful way. I’ve suggested using those labels to better learn about entities when creating content or doing keyword research. Doing image searches and looking at those semantic labels can be worth the effort.

This new patent tells us how Google may assign annotations to images to identify entities contained in the images. While labeling, they may select an object entity from the entities pictured and then choose at least one attribute entity from the annotated images that contain the object entity. They could also infer a relationship between the object entity and the attribute entity or entities and include that relationship in a knowledge base.

In accordance with one exemplary embodiment, a computer-implemented method is provided for enriching a knowledge base for search queries. The method includes assigning annotations to images stored in a database. The annotations may identify entities contained in the images. An object entity among the entities may be selected based on the annotations. At least one attribute entity may be determined using the annotated images containing the object entity. A relationship between the object entity and the at least one attribute entity may be inferred and stored in a knowledge base.
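The steps the patent describes (annotate images, select an object entity by frequency, pick co-occurring attribute entities, and store an inferred relationship) can be sketched roughly in Python. Everything here, from the entity names to the "related_to" relation, is a made-up illustration rather than the patent's actual implementation:

```python
from collections import Counter

# Hypothetical annotation data: each inner list holds the entity annotations
# already assigned to one image (e.g., by an object-recognition system).
annotated_images = [
    ["carlsbad", "legoland", "beach"],
    ["carlsbad", "legoland"],
    ["carlsbad", "beach"],
    ["carlsbad", "legoland"],
]

# Select the object entity: the entity identified in the greatest number of images.
counts = Counter(entity for image in annotated_images for entity in image)
object_entity = counts.most_common(1)[0][0]

# Determine an attribute entity: among the images containing the object entity,
# the most frequent co-occurring entity.
co_counts = Counter(
    entity
    for image in annotated_images
    if object_entity in image
    for entity in image
    if entity != object_entity
)
attribute_entity = co_counts.most_common(1)[0][0]

# Infer and store the relationship in a toy knowledge base of triples.
knowledge_base = {(object_entity, "related_to", attribute_entity)}
print(object_entity, attribute_entity)  # carlsbad legoland
```

The counting here is the simplest possible stand-in; the patent layers confidence scores on top of raw counts, as discussed below.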

For example, when I search for my hometown, Carlsbad, in Google image search, one of the category labels is for Legoland, an amusement park located in Carlsbad, California. Showing that as a label tells us that Legoland is located in Carlsbad (the captions for the pictures of Legoland tell us that it is located in Carlsbad).

Carlsbad-Legoland-Attribute Entity

This patent can be found at:

Computerized systems and methods for enriching a knowledge base for search queries
Inventors: Ran El Manor and Yaniv Leviathan
Assignee: Google LLC
US Patent: 10,534,810
Granted: January 14, 2020
Filed: February 29, 2016

Abstract

Systems and methods are disclosed for enriching a knowledge base for search queries. According to certain embodiments, images are assigned annotations that identify entities contained in the images. An object entity is selected among the entities based on the annotations and at least one attribute entity is determined using annotated images containing the object entity. A relationship between the object entity and the at least one attribute entity is inferred and stored in the knowledge base. In some embodiments, confidence scores may be calculated for the entities. The confidence scores may be aggregated across a plurality of images to identify an object entity.

Confidence Scores While Labeling of Entities in Images

One of the first phrases that jumped out at me when I scanned this patent was “confidence scores.” It reminded me of the association scores I discussed when writing about Google trying to extract information about entities, their relationships with other entities, and confidence scores about those relationships and about attributes involving the entities. I mentioned association scores in the post Entity Extractions for Knowledge Graphs at Google, because those scores were described in the patent Computerized systems and methods for extracting and storing information regarding entities.

I also referred to these confidence scores when I wrote about Answering Questions Using Knowledge Graphs, because association scores or confidence scores can lead to better answers to questions about entities in search results. That is also an aim of this patent, which attempts to analyze and label images and understand the relationships between entities shown in those images.

The patent lays out the purpose it serves when it may analyze and annotate images like this:

Embodiments of the present disclosure provide improved systems and methods for enriching a knowledge base for search queries. The information used to enrich a knowledge base may be learned or inferred from analyzing images and other data sources.

In accordance with some embodiments, object recognition technology is used to annotate images stored in databases or harvested from Internet web pages. The annotations may identify who and/or what is contained in the images.

The disclosed embodiments can learn which annotations are good indicators for facts by aggregating annotations over object entities and facts that are already known to be true. Grouping annotated images by object entity helps identify the top annotations for the object entity.

Top annotations can be selected as attributes for the object entities and relationships can be inferred between the object entities and the attributes.

As used herein, the term “inferring” refers to operations where an entity relationship is inferred from or determined using indirect factors such as image context, known entity relationships, and data stored in a knowledge base to draw an entity relationship conclusion instead of learning the entity-relationship from an explicit statement of the relationship such as in text on an Internet web page.

The inferred relationships may be stored in a knowledge base and subsequently used to assist with or respond to user search queries processed by a search engine.

The patent then tells us how confidence scores are used: they are calculated for the annotations assigned to images. Those “confidence scores may reflect the likelihood that an entity identified by an annotation is actually contained in an image.”

If you look back up at the pictures of Legoland above, Legoland may be considered an attribute entity of the object entity Carlsbad, because Legoland is located in Carlsbad. The label annotations indicate what the images portray and infer a relationship between the entities.

Similarly, an image search for Milan, Italy shows a category label for the Duomo, a cathedral located in the city. The Duomo is an attribute entity of the object entity Milan because it is located in Milan, Italy.

In those examples, we infer from Legoland being included under pictures of Carlsbad that it is an attribute entity of Carlsbad, and that the Duomo is an attribute entity of Milan because it is included in the results of a search for Milan.

Milan Duomo Attribute Entity

A search engine may learn from label annotations and confidence scores about images because the search engine (or its indexing engine) may index:

  • Image annotations
  • Object entities
  • Attribute entities
  • Relationships between object entities and attribute entities
  • Facts learned about object entities

The illustrations from the patent show us images of a bear eating a fish, to tell us that the bear is an object entity, the fish is an attribute entity, and that bears eat fish.

Bear (Object Entity) & Fish (Attribute-Entity)

We are also shown that bears, as object entities, have other attribute entities associated with them, since they will go into the water to hunt fish and roam around on the grass.

Bears and attribute Entities

Annotations may be detailed and cover objects within photos or images, like the bear eating the fish above. The patent points out a range of entities that might appear in a single image by telling us about a photo from a baseball game:

An annotation may identify an entity contained in an image. An entity may be a person, place, thing, or concept. For example, an image taken at a baseball game may contain entities such as “baseball fan”, “grass”, “baseball player”, “baseball stadium”, etc.

An entity may also be a specific person, place, thing, or concept. For example, the image taken at the baseball game may contain entities such as “Nationals Park” and “Ryan Zimmerman”.

Defining an Object Entity in an Image

The patent provides more insights into what object entities are and how they might be selected:

An object entity may be an entity selected among the entities contained in a plurality of annotated images. Object entities may be used to group images to learn facts about those object entities. In some embodiments, a server may select a plurality of images and assign annotations to those images.

A server may select an object entity based on the entity contained in the greatest number of annotated images as identified by the annotations.

For example, a group of 50 images may be assigned annotations that identify George Washington in 30 of those images. Accordingly, a server may select George Washington as the object entity if 30 out of 50 annotated images is the greatest number for any identified entity.

Confidence scores may also be determined for annotations. A confidence score indicates how likely it is that an entity identified by an annotation is actually contained in an image; it “quantifies a level of confidence in an annotation being accurate.” That confidence score could be calculated using a template matching algorithm, in which the annotated image is compared with a template image.
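The abstract's note that confidence scores “may be aggregated across a plurality of images to identify an object entity” might look something like this sketch, where per-image scores are summed per entity. The scores and entity names are invented for illustration:

```python
# Hypothetical (entity, confidence) annotations for three images, where each
# confidence reflects how likely the entity is actually in that image.
annotations = [
    [("george_washington", 0.9), ("martha_washington", 0.4)],
    [("george_washington", 0.8)],
    [("martha_washington", 0.7), ("george_washington", 0.3)],
]

# Aggregate confidence per entity across all images.
aggregate = {}
for image in annotations:
    for entity, score in image:
        aggregate[entity] = aggregate.get(entity, 0.0) + score

# The entity with the highest aggregate confidence becomes the object entity.
object_entity = max(aggregate, key=aggregate.get)
print(object_entity)  # george_washington (0.9 + 0.8 + 0.3 = 2.0)
```

Summing is just one plausible aggregation; the patent leaves room for other ways of combining the scores.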

Defining an Attribute Entity in an Image

An attribute entity may be an entity among the entities contained in images that also contain the object entity. Attribute entities are entities other than the object entity.

Annotated images that contain the object entity may be grouped and an attribute entity may be selected based on what entity might be contained in the greatest number of grouped images as identified by the annotations.

So, a group of 30 annotated images containing object entity “George Washington” may also include 20 images that contain “Martha Washington.”

In that case, “Martha Washington” may be considered an attribute entity.

(Of course, “Martha Washington” could be the object entity, and “George Washington,” appearing in a number of the “Martha Washington” labeled images, could be considered the attribute entity.)

Inferring Relationships Between Entities by Analyzing Images

If more than a threshold number of images of “Michael Jordan” contain a basketball in his hand, a relationship between “Michael Jordan” and basketball might be inferred (that Michael Jordan is a basketball player).
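A minimal sketch of that threshold rule, assuming an invented threshold value and invented annotation data:

```python
# Illustrative threshold: the fraction of a subject's images that must contain
# the candidate entity before a relationship is inferred. The value is my own
# assumption; the patent does not specify one.
RELATION_THRESHOLD = 0.6

def infer_relation(images, subject, candidate, threshold=RELATION_THRESHOLD):
    """Return True if `candidate` co-occurs with `subject` often enough."""
    with_subject = [img for img in images if subject in img]
    if not with_subject:
        return False
    co_occurring = sum(1 for img in with_subject if candidate in img)
    return co_occurring / len(with_subject) >= threshold

# Hypothetical annotated images of Michael Jordan.
images = [
    ["michael_jordan", "basketball", "chicago_bulls_jersey"],
    ["michael_jordan", "basketball"],
    ["michael_jordan", "golf_club"],
]
print(infer_relation(images, "michael_jordan", "basketball"))  # True (2/3 >= 0.6)
```

A fraction rather than a raw count keeps the rule stable whether the subject appears in thirty images or thirty thousand.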

From analyzing images of bears hunting for fish in water and roaming around on grassy fields, relationships between bears and fish, water, and grass can be made as well:

inferences between entities

By analyzing images of Michael Jordan with a basketball in his hand wearing a Chicago Bulls jersey, a search query asking a question such as “What basketball team does Michael Jordan play for?” may be satisfied with the answer “Chicago Bulls”.

To answer a query such as “What team did Michael Jordan play basketball for?”, Google could perform an image search for “Michael Jordan playing basketball”. Having images that contain the object entity of interest allows those images to be analyzed and an answer provided. See the picture at the top of this post, showing Michael Jordan in a Bulls jersey.

Takeaways

This process to collect and annotate images can be done using any images found on the Web, and isn’t limited to images that might be found in places like Wikipedia.

Google can analyze images online in a way that scales web-wide, and by analyzing images, it may provide insights that a knowledge graph might not. For example, to answer the question “where do Grizzly Bears hunt?”, an analysis of photos reveals that they like to hunt near water so that they can eat fish.
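As a toy illustration of how such image-derived relationships could serve a query once stored, here is a lookup over a small set of triples. The relation names and triples are assumptions on my part, not from the patent:

```python
# Hypothetical knowledge-base triples inferred from analyzing bear images.
knowledge_base = {
    ("grizzly_bear", "hunts_in", "water"),
    ("grizzly_bear", "eats", "fish"),
    ("grizzly_bear", "roams_on", "grass"),
}

def lookup(subject, relation):
    """Return all objects related to `subject` by `relation`."""
    return [obj for s, r, obj in knowledge_base if s == subject and r == relation]

# Answering "where do grizzly bears hunt?" becomes a simple lookup.
print(lookup("grizzly_bear", "hunts_in"))  # ['water']
```

The point of the patent is getting those triples into the knowledge base in the first place; once stored, answering is conventional retrieval.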

The confidence scores in this patent aren’t like the association scores in the other patents about entities that I wrote about; they try to gauge how likely it is that what is in a photo or image is indeed the entity that it is labeled with.

The association scores that I wrote about were trying to gauge how likely relationships between entities and attributes might be more likely to be true based upon things such as the reliability and popularity of the sources of that information.

So, Google is trying to learn about real-world objects (entities) by analyzing pictures of those entities (ones that it has confidence in), as an alternative way of learning about the world and the things within it.


14 thoughts on “How Google May Annotate Images to Improve Search Results”

  1. Hi Chris,

It does seem a little redundant, because it seems like information that Google could learn from places like Wikipedia, but it means that Google can collect information from all the images that it comes across on the Web, and they run an image search engine, so labeling images means that they have a wider choice of results to show searchers. I also really like the more semantic categories that they show at the top of image search results. I like that they are doing this.

  2. Hi Saijo,

So far, this is my favorite patent of 2020. The knowledge that they are learning from images is the same in any language, too. Bears hunt for fish in rivers – that is what you get when you stop focusing upon matching the words in a search box with words on a web page, and start focusing upon what real-world objects (entities) are doing.

  3. Hey Bill! First of all, love reading your stuff. Good job on making it easier even for people who aren’t exactly like a fish in the water in the search engine industry. Love the example about grizzly bears, makes things so much easier to understand.

    Do you think this will be done using some sort of AI or will it be a similar story we currently have with Captcha where the actual users “do the work” for Google?

  4. Hi Greg,

Glad to hear that you are liking my posts. A concept like this one, where Google is working on learning from images that have been posted to the Web, is new. The example about grizzly bears is one that I had to include, because when you see a lot of pictures of grizzly bears hunting, and they all take place with the bears in the middle of rivers, you realize how much they rely upon fish.

The analysis of pictures like this is a machine learning approach with AI involved. But Google has shown us that they are using overlapping approaches, and in addition to having computers analyze pictures for the entities in them, have also returned to using the Image Labeler game to get humans involved. The patent mentions the inventor of CAPTCHAs when talking about the Image Labeler game, because he is the inventor of the ESP game, which inspired the Image Labeler game. Combining both approaches makes a lot of sense. Google has lots of seemingly redundant practices that act as checks on each other. So, yes, both AI and crowdsourcing using humans will be involved in learning from images.

Thanks for sharing this informative article. Today images play a bigger role in search and attract users, so we give more priority to image optimization.

  6. Hey Bill,

    Your research work is commendable! I always appreciate those articles from which I have learned a lot, your article is one of them.

    Thanks for sharing!

  7. Thanks Bill,

    This is some very interesting information. I had no clue that Google image search was becoming so intricate with entities and how they relate.

  8. Hi Mike,

    They are getting good with the labels they are using for images, and the categories those labels indicate do a good job of providing related information about things in the images. They can recognize objects in images, look at metadata in images, and read text from images, too.
