The Future of Google’s Visual Phone Search?

Google Goggles lets you search by taking a picture of landmarks, books, business cards, artwork, product labels, logos, and text. It can use Optical Character Recognition (OCR) to turn text in an image into text you can search on the Web, read barcodes, and find similar images in databases of artwork, landmarks, and other collections. But with Google Goggles, we’re only seeing the surface of what phone-based visual search can offer.

A Google patent application published this week shows us what Google’s visual search for phones might evolve into. When you take a picture of a city street, your picture may include buildings, street signs, people’s faces, cars, and many other objects. If you send that picture as a query, the search engine might break the image into parts, search for many of the objects in it, and give you a mix of search results based upon all of those parts.

The patent filing is:

User Interface for Presenting Search Results for Multiple Regions of a Visual Query
Invented by David Petrou and Theodore Power
US Patent Application 20110035406
Published February 10, 2011
Filed: August 4, 2010

Abstract

A visual query such as a photograph, screen shot, scanned image, or video frame is submitted to a visual query search system from a client system. The search system processes the visual query by sending it to a plurality of parallel search systems, each implementing a distinct visual query search process.

A plurality of results is received from the parallel search systems. Utilizing the search results, an interactive results document is created and sent to the client system. The interactive results document has at least one visual identifier for a sub-portion of the visual query with a selectable link to at least one search result for that sub-portion.

The visual identifier may be a bounding box around the respective sub-portion, or a semi-transparent label over the respective sub-portion. Optionally, the bounding box or label is color coded by type of result.
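The abstract describes a fan-out architecture: one visual query goes to several independent search systems at once, and their results are merged into a document that marks each sub-portion of the image with a box and a link. The Python sketch below shows that shape under loud assumptions: `fake_ocr`, `fake_face`, and `fake_product` are hypothetical stand-ins for real recognizers, the `COLORS` mapping is an invented example of color coding by result type, and the dictionaries it returns are only a toy version of the patent’s "interactive results document", not Google’s actual format.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates."""
    left: int
    top: int
    right: int
    bottom: int


@dataclass(frozen=True)
class Result:
    kind: str  # hypothetical result type, e.g. "ocr", "face", "product"
    box: Box   # sub-portion of the visual query this result describes
    link: str  # placeholder URL for the search result


# Hypothetical color code per result type (the patent only says boxes or
# labels may optionally be color coded by type of result).
COLORS = {"ocr": "blue", "face": "red", "product": "green"}


# Stand-in searchers: each returns canned results instead of doing recognition.
def fake_ocr(image):
    return [Result("ocr", Box(10, 10, 120, 40), "https://example.com/text")]


def fake_face(image):
    return [Result("face", Box(150, 20, 220, 100), "https://example.com/person")]


def fake_product(image):
    return [Result("product", Box(0, 0, 300, 400), "https://example.com/product")]


def visual_query(image, searchers):
    """Send the same query to every searcher concurrently and merge the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(searchers))) as pool:
        # map() yields each searcher's results in submission order.
        batches = list(pool.map(lambda search: search(image), searchers))
    merged = [r for batch in batches for r in batch]
    # Toy "results document": one visual identifier (box + color) and a
    # selectable link per sub-portion of the image.
    return [
        {"kind": r.kind, "box": r.box, "color": COLORS.get(r.kind, "gray"), "link": r.link}
        for r in merged
    ]
```

Because `ThreadPoolExecutor.map` returns results in the order the searchers were submitted, the merged list is deterministic even though the searches run concurrently. A real system would also need timeouts and error handling for slow or failing back ends, which this sketch leaves out.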

The different kinds of searches that might be performed simultaneously could include:

  • A facial recognition search
  • An OCR search for text in the image
  • An image-to-terms search system, which may use object recognition
  • A product recognition search, which could recognize two-dimensional images such as book covers and CDs, and three-dimensional objects such as furniture
  • A bar code recognition search
  • A named entity recognition search, which could provide information about specific people, places, and things
  • A landmark recognition search, recognizing actual landmarks and possibly images advertised on billboards
  • A place recognition search that might be aided by geo-location information provided by something like a GPS receiver
  • A color recognition search, and
  • A similar image search, which looks for images similar to the one that you’ve used as a query
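The place recognition item mentions help from a GPS receiver without saying how that location is used. One plausible approach, assumed here rather than taken from the patent, is to narrow the candidate places to those near the phone’s reported position before any image matching. The sketch below does that with the haversine great-circle formula; `nearby_places`, the `(name, lat, lon)` candidate format, and the 1 km default radius are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_places(candidates, lat, lon, radius_km=1.0):
    """Keep candidate places within radius_km of the GPS fix, nearest first."""
    scored = [(haversine_km(lat, lon, c_lat, c_lon), name) for name, c_lat, c_lon in candidates]
    return [name for dist, name in sorted(scored) if dist <= radius_km]
```

Filtering by distance first keeps the expensive visual comparison limited to a handful of local candidates instead of every landmark in the database.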

Results that could be returned from a visual search could include links to web pages, product search results, images, videos, Google Maps results, Place Pages, Street View scenes, and many more.

A person searching could isolate certain parts of a picture to search upon, and even annotate those selections before searching.
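Isolating part of a picture amounts to cropping the query to a selected rectangle and sending only that region, optionally with the searcher’s note attached. This sketch assumes the image is a non-empty rectangular grid of pixel values (a plain list of rows) and that the selection is a half-open `(left, top, right, bottom)` box; `RegionQuery` and `make_region_query` are hypothetical names, not part of any Google API.

```python
from dataclasses import dataclass


@dataclass
class RegionQuery:
    pixels: list      # cropped rows of pixel values
    annotation: str   # free-text note the searcher attached to the selection


def crop(image, left, top, right, bottom):
    """Return the pixels inside columns [left, right) and rows [top, bottom)."""
    height, width = len(image), len(image[0])
    if not (0 <= left < right <= width and 0 <= top < bottom <= height):
        raise ValueError("selection falls outside the image")
    return [row[left:right] for row in image[top:bottom]]


def make_region_query(image, box, annotation=""):
    """Build a query from just the selected region plus the searcher's note."""
    return RegionQuery(crop(image, *box), annotation)
```

For a 4x4 image whose pixels are numbered 0 to 15 row by row, selecting the box `(1, 1, 3, 3)` yields the centre 2x2 block `[[5, 6], [9, 10]]`.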

The patent goes into considerable detail on how these searches might work together, and the images from the patent provide a great illustration of the system in action.

Someone takes a picture of a sports drink box, and the image includes some text describing the drink, a celebrity drinking out of a bottle, a larger image of the bottle, and a logo for the product.

A screenshot from the patent showing an image of a box of a sports drink.

The visual search system breaks the box down into different segments to search upon:

The same image as above, but with diagonal lines indicating different sections of the box that might be searched upon

Results appear on the image itself as text links, facial and object recognition links, product links, and trademark links.

The same image, but with different sections with labels such as trademark search, product search, facial recognition link, image recognition link, and text link.
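For the labels on the image to act as links, the client has to work out which labeled region a tap lands on. In the patent figure the regions overlap (the product region covers the whole box while the face sits inside it), so this sketch picks the smallest region containing the tap as the most specific match. That tie-breaking rule is an assumption, not something the patent specifies; `Region` and `link_for_tap` are hypothetical names and the URLs are placeholders.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    label: str   # e.g. "facial recognition link"
    left: int
    top: int
    right: int
    bottom: int
    link: str    # placeholder result URL

    def contains(self, x, y):
        """True if the point lies inside this half-open box."""
        return self.left <= x < self.right and self.top <= y < self.bottom

    def area(self):
        return (self.right - self.left) * (self.bottom - self.top)


def link_for_tap(regions, x, y) -> Optional[str]:
    """Return the link of the smallest region containing (x, y), or None."""
    hits = [r for r in regions if r.contains(x, y)]
    if not hits:
        return None
    return min(hits, key=Region.area).link
```

Preferring the smallest box means a tap on the celebrity’s face opens the facial recognition result rather than the product result that covers the whole package.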

A mix of the different types of results could be shown to a searcher, broken down into categories such as product match, logo match, facial recognition match, image match, and web results. Interestingly, under the facial recognition match section, one of the results shown is a “Social Network Friend,” presumably matched up with a profile picture or avatar.

A view of the different types of results that might be returned in response to the use of the image as a query, including products, logos, facial recognition, product and advertising matches, and web results.
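Presenting the mixed results by category is essentially a group-by with a fixed section order. The category names below come from the patent figure described above, but the ordering rule, the `(category, title)` input format, and the `group_results` function are assumptions made for illustration; categories the list doesn’t know about are appended alphabetically at the end.

```python
from collections import defaultdict

# Section order loosely following the patent figure (assumed, not specified).
CATEGORY_ORDER = ["product match", "logo match", "facial recognition match",
                  "image match", "web results"]


def group_results(results):
    """Group (category, title) pairs into ordered sections, skipping empty ones."""
    buckets = defaultdict(list)
    for category, title in results:
        buckets[category].append(title)
    ordered = [(c, buckets[c]) for c in CATEGORY_ORDER if buckets.get(c)]
    # Unknown categories go last, in alphabetical order.
    extras = sorted(c for c in buckets if c not in CATEGORY_ORDER)
    return ordered + [(c, buckets[c]) for c in extras]
```

Within each section, results keep the order in which they arrived, so any ranking done by the individual search systems is preserved.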

If you’ve used Google Goggles, you have an idea of what Google’s visual phone search can do. This patent filing shows how the individual types of searches available there might be combined.

Will we start seeing search consultants being hired in the future for the design of product boxes, billboards, and other outdoor signs?


27 thoughts on “The Future of Google’s Visual Phone Search?”

  1. Pictures contain more information than text. They also come with context, so the next question is how to take that context fairly into account. And finally: do we add keywords to product boxes and designs? Product-searchable design. But for me, the next big thing is yet to come: if visual search works, we can apply it to video captures such as movies, video clips, and TV series. Welcome to Pervasive Search.

  2. This is awesome! I wonder how accurate it could be, though. It could be a new frontier for business applications.

  3. I think that this is a great idea, but it has a long way to go. I have played around with Goggles on my Android phone and have not gotten the best results, to say the least. I can take a picture of something very obvious and the program returns results that have nothing at all to do with my request.

    The idea of pictures used like this kind of reminds me of voice recognition software – something that eventually will find its way and be the norm, only it has a long way to go before it works the way that it is supposed to.

  4. “The different kinds of searches that might be performed simultaneously could include:

    A facial recognition search”

    The better the Google algorithm gets, the more we move towards a stalker’s paradise. Can you imagine the impact of widespread and accurate facial recognition searches on the already growing stalking phenomenon?

  5. Hey Bill,

    I completely agree, we haven’t seen the full capabilities of what a phone based visual search could provide with Google Goggles. I can definitely see consultants being hired in the future for the design of product boxes and other outdoor signs.

    When this search method becomes popular, as I know it will, I think all businesses will have to pay more attention to the signs that represent their company or product. Thanks for sharing; I learned a lot from reading the patents.

  6. I have to agree with Matthew Hamilton here. This kind of facial recognition, especially if linked to the photo galleries of a site like Facebook, would be really creepy. Though I’m not as worried about it in the creepy-stalker sense as in the Big Brother sense.

    Regardless of the negative possibilities, still a VERY interesting concept, and outside of the facial recognition, it would be really cool to see where this goes as far as text recognition and object and place recognition.

  7. Hi Ben,

    I know that Google acquired at least one patent involving facial recognition technology when they purchased Neven Engineering back in 2006. I’ve wondered if they were ever going to do something with that kind of technology. It looks like they might.

  8. Hi Renaud,

    Good point. Not only might visual search be used effectively in the world around us, but also in previously recorded films and video. That would add a whole different element to YouTube, wouldn’t it?

  9. Hi Eric,

    I’m not sure what you are questioning the accuracy of. Google has published a patent on visual search that combines a number of different algorithms that could be used within different regions of images. They’ve purchased technology in the past in the form of patent filings that cover a number of the types of search in question, and they’ve developed a visual search in Google Goggles that uses some of that technology, but which is definitely in its early stages.

  10. Hi Onlinehandyman,

    It does seem like this technology still has a long way to go. The major search engines have made great strides when it comes to indexing pages on the Web, but visual search is a whole new frontier, and it’s going to take some work for this technology to work well.

  11. Hi Matthew,

    I’m not sure how much we should be concerned about the inclusion of a facial recognition algorithm in visual search at this point. While the patent doesn’t go into much detail about how it might be implemented, the images show it being used to recognize the face of a celebrity on a sports drink box. It’s possible that the search engine might limit facial recognition to celebrities and other people whose images are already extremely well known.

  12. Hi John,

    Thanks. I do think that visual search has the potential to add to the things that designers need to think about when they come up with products, product boxes, billboards, and other advertising that hasn’t been designed specifically for the Web. I think it will be interesting when we start seeing companies advertise those kinds of services in the future.

  13. Hi FinallyFast,

    It will be interesting to see where Google takes this, and it will be interesting to see what kinds of reactions people might have to the potential for harm that something like facial recognition searches might have. I suspect that aspect of a visual search may be met by a fair amount of resistance from a large segment of the public.

  14. The idea of pictures used like this kind of reminds me of voice recognition software – something that eventually will find its way and be the norm, only it has a long way to go before it works the way that it is supposed to.

  15. Hi Jankovic,

    I think that’s a good comparison. We’re getting glimpses at a lot of technology that we might take for granted in a few years, and we’re seeing them in their infancy at this point.

  16. I have played around with Goggles on my Android phone and have not gotten the best results, to say the least. I can take a picture of something very obvious and the program returns results that have nothing at all to do with my request.

    The major search engines have made great strides when it comes to indexing pages on the Web, but visual search is a whole new frontier, and it’s going to take some work for this technology to work well.

    The idea of pictures used like this kind of reminds me of voice recognition software – something that eventually will find its way and be the norm, only it has a long way to go before it works the way that it is supposed to.

  17. Hi Dan,

    Google’s published a number of whitepapers on the kinds of technologies involved in visual search, acquired more than a couple of companies that specialized in object and facial recognition, and has patented a few processes describing those technologies, but I agree completely – we’re still at the very early stages of visual search being a viable technology.

  18. I’m not comfortable with the facial recognition option, but the other abilities this has are great.

  19. Hi Scott,

    I’m not sure that a lot of people are ready to see a search engine introduce the ability for searchers to do facial searches. I like the technology that’s hinted at in Google’s presentation in this patent, and I agree that many of the other features are impressive. Guess we wait and see if Google can pull it all together.

  20. We probably have a long way to go before this is usable. We can’t even get voice recognition to work right half of the time. However, I am very excited about the applications for this sort of thing. It’s really awesome. I saw an iPhone app in development that can actually translate words using your camera. At the time it only translated Spanish to English, but it was a very awesome and useful-looking app. Especially if you’re a tourist in a city full of important signs you can’t read!

  21. Hi Penny,

    There are a number of different aspects of visual search, and while Google is pretty good with some of them, others are still a work in progress. While voice search is one of those areas that is improving, this type of visual search is just as exciting.
