Disambiguating Image Queries at Google

Better Understanding Image Queries

Years ago, I wouldn’t have expected a search engine to tell a searcher about objects in a photograph or video. Search engines have been evolving and getting better at what they do.

In February 2020, Google was granted a patent on returning results that identify objects in photographs and videos. A search engine may have trouble understanding a natural language query that refers to something the searcher is looking at. This patent focuses on disambiguating such image queries.

The patent provides the following example:

For example, a user may ask a question about a photograph that the user is viewing on the computing device, such as “What is this?”

The patent tells us that this approach may work for image queries, text queries, video queries, or any combination of those.

In response to a searcher asking a question about an image, a computing device may:

  • Capture a respective image that the user is viewing
  • Transcribe the question
  • Transmit that transcription and the image to a server

The server may receive the transcription and the image from the computing device, and:

  • Identify visual and textual content in the image
  • Generate labels for images in the image, such as locations, entities, names, types of animals, etc.
  • Identify a particular sub-image in the image, which may be a photograph or drawing

The server may:

  • Identify part of a sub-image of primary interest to a searcher, such as a historical landmark in the image
  • Perform image recognition on the sub-image to generate labels for that sub-image
  • Generate labels for text in the image, like comments about the sub-image, using text recognition on a part of the image other than the sub-image
  • Generate a search query based on the transcription and the generated labels
  • Provide that query to a search engine

The Process Behind Disambiguating a Visual Query

The process described in this patent includes:

  • Receiving an image presented on, or corresponding to, at least a part of a display of a computing device
  • Receiving a transcription of an utterance spoken by a searcher while the image is presented
  • Identifying a sub-image included in the image
  • Determining one or more first labels that show a context of the sub-image, based on performing image recognition on the sub-image
  • Determining one or more second labels that show the context of the sub-image, based on performing text recognition on a part of the image other than the sub-image
  • Generating a search query based on the transcription, the first labels, and the second labels
  • Providing, for output, the search query
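The last two steps could be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function name `rewrite_query`, the list of vague referring words, and the assumption that labels arrive ordered best-first are all hypothetical.

```python
import re

# Hypothetical list of vague referring words a searcher might speak.
VAGUE_TERMS = re.compile(r"\b(this|that|it)\b", re.IGNORECASE)


def rewrite_query(transcription, first_labels, second_labels):
    """Substitute a label for vague terms in the transcription.

    first_labels: labels from image recognition on the sub-image.
    second_labels: labels from text recognition on the rest of the image.
    Both lists are assumed (for this sketch) to be ordered best-first,
    and image-recognition labels are tried before text labels.
    """
    labels = list(first_labels) + list(second_labels)
    if not labels:
        return transcription  # nothing available to disambiguate with
    best = labels[0]
    # A function is used as the replacement so the label is inserted literally.
    return VAGUE_TERMS.sub(lambda match: best, transcription)


print(rewrite_query("What is this?", ["Eiffel Tower"], ["Paris"]))
```

With these made-up labels, the vague question becomes one that names the landmark, which a search engine can answer without seeing the screen.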

Other aspects of performing such image query searches may involve:

  • Weighting the first labels differently than the second labels
  • Substituting the first labels or the second labels for terms in the transcription when generating the search query
  • Generating, for each of the first labels and the second labels, a label confidence score indicating a likelihood that the label corresponds to a part of the sub-image that is of primary interest to the user
  • Selecting one or more of the first labels and second labels based on the respective label confidence scores, wherein the search query is based on the one or more selected first labels and second labels
  • Accessing historical query data including previous search queries provided by other users
  • Generating, based on the transcription, the first labels, and the second labels, one or more candidate search queries
  • Comparing the historical query data to the one or more candidate search queries
  • Selecting a search query from among the one or more candidate search queries, based on comparing the historical query data to the one or more candidate search queries
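To make the scoring ideas above more concrete, here is a small sketch of selecting labels by confidence and then choosing among candidate queries using historical query data. The threshold value, the function names, and the "most frequent past query wins" rule are assumptions made for illustration; the patent summary does not say how the comparison is actually done.

```python
from collections import Counter


def select_labels(scored_labels, threshold=0.5):
    """Keep labels whose confidence meets a hypothetical threshold, best first."""
    kept = [(label, score) for label, score in scored_labels if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in kept]


def pick_query(candidates, historical_queries):
    """Choose the candidate that appears most often among past queries.

    On ties, max() returns the earliest candidate, so the order of
    `candidates` acts as the tiebreaker.
    """
    counts = Counter(query.lower() for query in historical_queries)
    return max(candidates, key=lambda candidate: counts[candidate.lower()])


labels = select_labels([("eiffel tower", 0.9), ("paris", 0.6), ("sky", 0.2)])
candidates = ["what is " + label for label in labels]
history = ["what is paris", "how tall is the eiffel tower", "what is paris"]
print(labels)
print(pick_query(candidates, history))
```

The low-confidence label is dropped before any candidate query is built, so the historical comparison only has to rank plausible rewrites.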

The method may also include:

  • Generating, based on the transcription, the first labels, and the second labels, one or more candidate search queries
  • Determining, for each of the one or more candidate search queries, a query confidence score that indicates a likelihood that the candidate search query is an accurate rewrite of the transcription
  • Selecting, based on the query confidence scores, a candidate search query as the search query
  • Identifying one or more images included in the image
  • Generating for each of the one or more images included in the image, an image confidence score that indicates a likelihood that an image is an image of primary interest to the user
  • Selecting the sub-image, based on the image confidence scores for the one or more images
  • Receiving data indicating a selection of a control event at the computing device, wherein the control event identifies the sub-image. (The computing device may capture the image and capture audio data that corresponds to the utterance in response to detecting a predefined hotword.)
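Choosing the sub-image could look something like the sketch below. It assumes each candidate already carries an image confidence score, and it lets an explicit selection by the searcher (the control event) override the scores. The `SubImage` class and `choose_sub_image` function are hypothetical names, not anything taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SubImage:
    name: str
    confidence: float  # image confidence score, assumed to be precomputed


def choose_sub_image(
    candidates: Sequence[SubImage],
    user_selected: Optional[str] = None,
) -> Optional[SubImage]:
    """Return the sub-image of primary interest, or None if there is none."""
    # A control event (for example, a tap on part of the screen) wins outright.
    if user_selected is not None:
        for candidate in candidates:
            if candidate.name == user_selected:
                return candidate
    if not candidates:
        return None
    # Otherwise fall back to the highest image confidence score.
    return max(candidates, key=lambda candidate: candidate.confidence)


found = [SubImage("landmark photo", 0.8), SubImage("profile picture", 0.3)]
print(choose_sub_image(found).name)
print(choose_sub_image(found, user_selected="profile picture").name)
```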

Further, the method may also include:

  • Receiving an additional image of the computing device and an additional transcription of an additional utterance spoken by a user of the computing device
  • Identifying an additional sub-image that is included in the additional image
  • Determining one or more additional first labels that indicate a context of the additional sub-image, based on performing image recognition on the additional sub-image
  • Determining one or more additional second labels that indicate the context of the additional sub-image, based on performing text recognition on a portion of the additional image other than the additional sub-image
  • Generating a command based on the additional transcription, the additional first labels, and the additional second labels, and performing the command

Performing the command can include:

  • Storing the additional image in memory
  • Storing the sub-image in the memory
  • Uploading the additional image to a server
  • Uploading the sub-image to the server
  • Importing the additional image to an application of the computing device
  • Importing the sub-image to the application of the computing device
  • Identifying metadata associated with the sub-image, in which case determining the one or more first labels that indicate the context of the sub-image is based further on that metadata
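The command branch could be pictured with a sketch like this one, where a spoken verb is mapped to one of the actions listed above. The verb table, the `generate_command` function, and the rule for choosing between the whole image and the sub-image are all assumptions for illustration only.

```python
# Hypothetical mapping from a spoken verb to one of the actions listed above.
VERB_TO_ACTION = {
    "save": "store",
    "store": "store",
    "upload": "upload",
    "import": "import",
}


def generate_command(transcription, has_sub_image=True):
    """Return a command dict, or None if the utterance is not a command.

    Assumption for this sketch: words like "this" or "that" refer to the
    sub-image when one was identified, otherwise to the whole image.
    """
    words = [word.strip("?.!,") for word in transcription.lower().split()]
    refers_to_sub_image = has_sub_image and ("this" in words or "that" in words)
    target = "sub_image" if refers_to_sub_image else "image"
    for word in words:
        if word in VERB_TO_ACTION:
            return {"action": VERB_TO_ACTION[word], "target": target}
    return None  # treat the utterance as a question, not a command


print(generate_command("Save this"))
print(generate_command("What is this?"))
```

Returning None for a question leaves room for the search query path described earlier to handle it.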

Advantages of following the image query process described in the patent can include:

  • The methods can determine the context of an image corresponding to a portion of a display of a computing device to aid in the processing of natural language queries
  • The context of the image may be determined through image and/or text recognition
  • The context of the image may be used to rewrite a transcription of an utterance of a user
  • The methods may generate labels that refer to the context of the image, and substitute the labels for portions of the transcription (for a question such as “Where was this taken?”)
  • The methods may determine that the user is referring to the photo on the screen of the computing device
  • The methods can extract information about the photo to determine the context of the photo, as well as a context of other portions of the image that do not include the photo, such as a location where the photo was taken

This patent can be found at:

Contextually disambiguating queries
Inventors: Ibrahim Badr, Nils Grimsmo, Gokhan H. Bakir, Kamil Anikiej, Aayush Kumar, and Viacheslav Kuznetsov
Assignee: Google LLC
US Patent: 10,565,256
Granted: February 18, 2020
Filed: March 20, 2017

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for contextually disambiguating queries are disclosed. In an aspect, a method includes receiving an image being presented on a display of a computing device and a transcription of an utterance spoken by a user of the computing device, identifying a particular sub-image that is included in the image, and based on performing image recognition on the particular sub-image, determining one or more first labels that indicate a context of the particular sub-image. The method also includes, based on performing text recognition on a portion of the image other than the particular sub-image, determining one or more second labels that indicate the context of the particular sub-image, based on the transcription, the first labels, and the second labels, generating a search query, and providing, for output, the search query.

8 thoughts on “Disambiguating Image Queries at Google”

  1. Love the content, keep up the good work!

    You may want to run a spellcheck on this one though. There are a few errors, e.g.: “That query may ben be providee to a search engine”

  2. Thanks, Joe.

    I may rely on Grammarly a little too much. It doesn’t catch everything, like that “providee” which it should have caught.

  3. Very informative, thank you! I recently started my own business and am trying to learn about SEO. Feeling a little overwhelmed with all the info out there but taking baby steps one at a time

  4. Hi,
    I heard about you in Germany (evergreenmedia).
    Patents and Google SEO are very interesting.
    I really like your writing style. I’m looking forward to reading your next article.
    Best Regards from Germany,
    Bennet Arp

  5. Thanks for the great article, Bill. It might be also helpful to share some insights about how to utilize the image queries to help rank a website. SEO sometimes can be so complicated.

  6. Hi Raymond,

    When you post an image, take a look at it and decide whether it might suggest some particular questions. For a photo of a skyscraper, someone might wonder how tall that skyscraper is. Include that height in the content of the page. If there are some well-known tenants of that building, make sure that they are identified on the page. People might want to know where the skyscraper is located, so including information about the address, and what else might be nearby, would be a good idea.

  7. Hi Bill,

    Thanks for sharing the new patent and some of your insights. It keeps us up-to-date!

    I was really impressed that you’ve been doing internet marketing since 1996 and you have a JD degree. I’ll bookmark this website and follow your new posts.

    Thank you!
    Alvin, Vaughan Roofer
