When Google purchased Neven Vision last August, I expected some interesting things to come out of the combination of Google resources and technical knowledge from the Neven Vision team.
A newly published patent application from Google about a visual mobile search system describes how that system can use mobile phones with cameras in many different ways. The patent filing claims one small part of this system, but also provides a list of other related patent applications.
There are a lot of great examples of how this visual mobile search system could be used, and I cover many of those below.
Image-based Contextual Advertisement Method and Branded Barcodes
Invented by Hartmut Neven
US Patent Application 20070159522
Published July 12, 2007
Filed: December 7, 2006
Abstract
Content media having images associated with remotely stored information are provided with barcodes marked with indicia to indicate a source of the information. In this manner, a user, having, for example, a camera phone, will become aware that the particular content medium has images that can be scanned to retrieve additional information (from the remote information store) via their camera phone.
This filing covers taking a picture of a barcode on an object, and being pointed towards a Web page that can provide additional information about the object. The description included in the patent application describes the infrastructure for an image-based search service.
That service takes an image from a camera phone, and uses visual recognition engines to recognize objects shown in the image, and return search results based upon that recognition. We’re shown the infrastructure for that image search system, as well as a number of example inventions and commercial applications.
Multiple Recognition Algorithms
A number of different visual recognition engines might be used to match an incoming picture against object representations stored in a database. Some examples:
- Facial recognition – to recognize textured objects.
- Optical character recognizers – to understand text strings in images
- Barcode readers – to interpret barcodes on objects
In response to the image search, a number of different types of results could be sent back to the searcher from a media server, such as text, images, music or audio clips, or a URL that can be viewed on the phone using an inbuilt web browser.
This combination of different recognition algorithms is described in the patent application:
Years of experience in machine vision have shown that it is very difficult to design a recognition engine that is equally well suited for diverse recognition tasks. For instance, engines exist that are well suited to recognize well textured rigid objects. Other engines are useful to recognize deformable objects such as faces or articulate objects such as persons. Yet other engines are well suited for optical character recognition.
To implement an effective vision-based search engine it will be important to combine multiple algorithms in one recognition engine or alternatively install multiple specialized recognition engines that analyze the query images with respect to different objects.
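To make the "multiple specialized recognition engines" idea concrete, here is a minimal sketch in Python. It is not taken from the patent filing. The engine functions, the `Match` record, and the dictionary standing in for a query image are all hypothetical names invented for illustration, and the confidence scores are made-up constants.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Match:
    engine: str   # name of the engine that produced the match
    label: str    # what the engine thinks it saw
    score: float  # confidence in [0, 1] (illustrative values only)


# Each hypothetical engine takes a query "image" (here just a dict of
# precomputed hints) and returns a Match, or None if it found nothing.
Engine = Callable[[dict], Optional[Match]]


def barcode_engine(image: dict) -> Optional[Match]:
    code = image.get("barcode")
    return Match("barcode", code, 0.99) if code else None


def ocr_engine(image: dict) -> Optional[Match]:
    text = image.get("text")
    return Match("ocr", text, 0.80) if text else None


def object_engine(image: dict) -> Optional[Match]:
    obj = image.get("object")
    return Match("object", obj, 0.60) if obj else None


def recognize(image: dict, engines: list) -> list:
    """Run every engine on the query and return matches, best score first."""
    matches = []
    for engine in engines:
        match = engine(image)
        if match is not None:
            matches.append(match)
    return sorted(matches, key=lambda m: m.score, reverse=True)


ENGINES = [barcode_engine, ocr_engine, object_engine]

# A photo of a menu: no barcode, but readable text and a recognizable object.
results = recognize({"text": "Café au lait", "object": "menu"}, ENGINES)
print([(m.engine, m.label) for m in results])
```

The point of the design is that each engine stays specialized and simply declines (returns `None`) when the query is outside its competence, while a thin dispatcher merges whatever comes back.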
Filtering Objects Being Compared
This would be a slow process if each query image were compared against every image in a database. To make object recognition work quickly and effectively, limiting those comparisons is helpful. Some additional information may be sent to the search engine when a search like this is conducted, such as:
- Time,
- Location of the handset,
- User profile,
- Recent phone transactions, and
- Other sources of external image information through additional inputs provided by the user.
This example from the patent filing shows a couple of the benefits of this approach:
It may be beneficial to make use of this information to narrow down the search. For instance, if one attempts to get information about a hotel by taking a picture of its facade and knows it is 10 pm in the evening than it will increase the likelihood of correct recognition if one selects from the available images those that have been taken close to 10 pm. The main reason is that the illumination conditions are likely to be more similar.
Location information may also be used. Staying with the hotel example, one would arrange the search process such that only object representations of hotels are activated in the query of hotels that are close to the current location of the user.
So, searching for objects that may be closer in time and space may make sense in some instances.
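A rough Python sketch of that narrowing step follows. It assumes each stored reference image carries a location and the hour at which it was taken. The `Candidate` record, the 1 km radius, the two-hour window, and the coordinates are hypothetical choices for illustration, not values from the filing.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    lat: float
    lon: float
    hour: int  # hour of day (0-23) when the reference image was taken


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def hour_gap(h1, h2):
    """Difference between two hours on a 24-hour clock, wrapping at midnight."""
    d = abs(h1 - h2) % 24
    return min(d, 24 - d)


def narrow(candidates, lat, lon, hour, max_km=1.0, max_hours=2):
    """Keep only reference images taken near the user and near the same time of day."""
    return [
        c for c in candidates
        if distance_km(lat, lon, c.lat, c.lon) <= max_km
        and hour_gap(hour, c.hour) <= max_hours
    ]


hotels = [
    Candidate("Hotel A (night shot)", 48.8606, 2.3376, 22),
    Candidate("Hotel A (noon shot)", 48.8606, 2.3376, 12),
    Candidate("Hotel B (night shot)", 48.9000, 2.4500, 23),
]

# The user is standing near Hotel A at 10 pm.
nearby = narrow(hotels, lat=48.8610, lon=2.3380, hour=22)
print([c.name for c in nearby])
```

Only the surviving candidates would then be handed to the slower visual comparison, which is where the time savings come from.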
Algorithmic Principles
There’s some discussion of how the algorithms work in the object recognition engine, involving things such as:
- Extraction of feature vectors from key interest points,
- Comparison of corresponding feature vectors,
- Similarity measurement and comparison against a threshold to determine if the objects are identical or not,
- Recognition of objects from multiple viewpoints,
- Coarse-to-fine, simple-to-complex search strategy,
- Color histograms and texture descriptors.
If you want to delve into those in more detail, the patent application provides more information and some citations. While a look at the algorithms is informative, the many examples included in the patent application are really worth spending some time thinking about.
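As a rough illustration of two of the items listed above (comparing feature vectors, and checking a similarity score against a threshold), here is a small Python sketch. It is a simplification and not the method described in the filing: a coarse global color histogram stands in for the feature vector, cosine similarity stands in for the similarity measure, and the 0.9 threshold and the sample pixel data are arbitrary assumptions.

```python
import math


def color_histogram(pixels, bins=4):
    """Count (r, g, b) pixels into bins**3 buckets and normalize to sum to 1."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def same_object(query, reference, threshold=0.9):
    """Declare a match when similarity meets the (assumed) threshold."""
    return cosine_similarity(query, reference) >= threshold


# Toy "images": lists of RGB pixels.
red_poster = color_histogram([(250, 10, 10)] * 90 + [(255, 255, 255)] * 10)
red_photo = color_histogram([(240, 20, 5)] * 85 + [(250, 250, 250)] * 15)
blue_sign = color_histogram([(10, 10, 250)] * 100)

print(same_object(red_photo, red_poster))  # mostly-red photo vs. red poster
print(same_object(red_photo, blue_sign))   # mostly-red photo vs. blue sign
```

A cheap global descriptor like this could serve as the coarse first pass in a coarse-to-fine strategy, with more expensive interest-point comparisons reserved for the candidates that survive it.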
Example of Usefulness of Image-Based Search
The document offers a great example of how image-based mobile search could be used:
Let us start the discussion of the usefulness of image-based search with an anecdote. Imagine you are on travel in Paris and you visit a museum. If a picture catches your attention you can simply take a photo and send it to the VMS service. Within seconds you will receive an audio-visual narrative explaining the image to you. If you happen to be connected a 3G network the response time would be below a second.
After the museum visit you might step outside and see a coffeehouse. Just taking another snapshot from within the VMS client application is all you have to do in order to retrieve travel guide information. In this case location information is available through triangulation or inbuilt GPS it can assist the recognition process.
Inside the coffeehouse you study the menu but your French happens to be a bit rusty. Your image based search engine supports you in translating words from the menu so that you have at least an idea of what you can order.
This anecdote could of course easily be extended further. Taking a more abstract viewpoint one can say that image-based search hyperlinks the physical world in that any recognizable object, text string, logo, face, etc. can be annotated with multimedia information.
The “image-based intelligent museum guide” example is built upon in more detail (I’d love to see this in action as described in the patent filing). Here are characteristics of such a system using visual image search:
1) Users can interactively perform queries about different aspects of an artwork, such as: “Who is this person in the cloud?”.
2) Visitors can keep a log of the information that they asked about the artworks and cross-reference them;
3) Visitors can share their gathered information with their friends;
4) Developing an integrated global museum guide is possible;
5) No extra hardware is necessary as many visitors carry cell-phones with inbuilt camera; and,
6) The service can be a source of additional income where applicable.
A similar approach could be applied to other objects that might be interesting to tourists, such as landmarks, hotels, restaurants, etc.
Other Examples
The filing also offers a nice list of many other ways visual mobile search could be used effectively:
1) Translations – of printed documents through optical character recognition.
2) New print-to-Internet applications – for example, take a picture of a movie ad in a newspaper or on a billboard, and find out with a click which movie theaters it is playing at.
3) Conducting retail transactions – take a picture of a product and send it to a server that recognizes it and associates the input with the user. That person could be entered into a sweepstakes, receive a coupon or rebate offer, or be guided to an information page or ordering page.
4) Annotations of printed pages – can allow the picture taker to receive additional, real-time information about the text.
5) Ad to phone number feature – take a picture of an ad, and a phone number might become available to contact the advertiser (or email, or SMS, or Web addresses).
6) Digital Billboards – take a picture of a digital billboard or large television screen, and get more information about the ad:
A user may take of picture of the billboard and the displayed advertisement to get additional information about the advertised product, enter a contest, etc. The effectiveness of the advertisement can be measured in real time by counting the number of “clicks” the advertisement generates from camera phone users. The content of the advertisement may be adjusted to increase its effectiveness based on the click rate.
The billboard may provide time sensitive advertisements that are target to passing camera phone users such as factory workers arriving leaving work, parents picking up kids from school, or the like. The real-time click rate of the targeted billboard advertisements may confirm or refute assumptions used to generate the targeted advertisement.
7) Payment systems – imagine going shopping, and bringing your camera phone with you. Take a picture of the barcode or label on a product, and a commercial transaction such as credit card payment might take place with a record of the transaction sent to some controller to make certain that payment was made.
8) Educational Games – providing a way for children to learn things by having them take pictures of numbers or letters or countries on a map or parts of the body.
Essentially a child could read a picture book just by herself by clicking on the various pictures and listen to audio streams triggered by the outputs of the recognition engine.
9) Accessibility assistance – for people with visual handicaps.
10) New Forms of Games – Scavenger hunt type games, where a picture of the correct object will result in instructions in a task to perform or a new image to capture to continue.
11) Service technician information – Take a picture of a machine part and query about it, and you may be led to information identifying the part, and a user manual.
12) Meeting real-time information needs – for example, take a picture of a bus stop sign, and you could retrieve real-time information on when the next bus will come, because the location information available to the phone is often accurate enough to decide which bus stand you are closest to.
13) Virtual post it notes – Take a picture and annotate it, for later retrieval and use.
A newly published patent application from Google about a visual mobile search system describes how that system can use mobile phones with cameras in many different ways. The patent filing claims one small part of this system, but also provides a list of other related patent applications.
As opposed to the Neomedia Technologies IP that already covers this type of ‘one click’ technology to the physical world?
Something sounds fishy, IMO. Why not just buy the technology if it is already there?????
Google kind of blew it with Neven Vision.
But there is still time! I think. HP is working with Neomedia and Gavitec on a Universal mobile reader for all codes. Could Google be smart enough to swipe it out from under them?
Qualcomm had them under the BREW umbrella at the CTIA show in 2007. Neomedia was the only U.S. mobile barcode scanning application displayed.
What is MSFT going to do if they get this????? Could they give Google a run for the money if they own one click to content and sell it to the brands??????
A lot of questions who has the answers???? Anyone???????
Hi Swampthing,
Appreciate your comments, and Neomedia Technologies does seem to have some pretty interesting patents in this area, including this one:
Automatic access of internet content with a camera-enabled cell phone
What’s the difference in these barcode approaches? I’m not certain that there is a lot of difference. That may be a problem when it comes to the patent examiner reviewing the patent application.
Unfortunately, Swampthing, I felt that I had to remove the last sentence of your comment, which was addressed to someone who linked here, but didn’t post or comment here.
I know Google is one of the big boys and has billions of dollars backing them up, but they had better start lining up their defensive law team, because this “new” visual mobile search system is a blatant patent infringement and attempt to steal or incorporate an already existing technology. And, oh yes, this ten-year-old technology is already protected by patents and it is owned by a small Ft. Pierce, Florida company called NEOMEDIA. It is also referred to as “Qode” and it is a simple way to read barcodes on products, RFIDs, or company logos, and be directly connected (via your cell phone) to the URL or website page connected with that product. The QODE system is already installed on forty or fifty of the most commonly used handsets in the world, and is being used more and more as people catch on to the advantages of its use.
GOOGLE-you should be ashamed of yourself to tout a stolen product/process as your own and think you can get away with it. Consumers beware, and please, check out the legitimate physical world connection technology (system)- QODE by NEOMEDIA. They have a website (www.neom.com) that explains it all.
Thanks, John.
The visual search system in the filing’s description goes beyond reading barcodes and associating them with web pages or URLs, in a number of significant ways.
This particular patent application makes claims for a barcode aspect of that system, which seems to be distinguished by the use of specifically branded barcodes, and could differ in a number of other ways. I don’t know if those differences are ones that make too much of a difference, but the filing of a patent application itself isn’t necessarily patent infringement.
For example, what if, instead of bringing someone using the barcode technology to a specific product page, this system brings a person to Google search results for the product? Is that difference a significant one?
Here’s a snippet from the patent application that describes that use:
William,
You are right for removing. I was having second thoughts after I hit the send button. Thanks.
If a mobile web user / consumer wants to use the mobile device to find information they are going to move toward the easiest way to find it. They are not going to want pop up ads, other images, click thru ads, or other useless info. They are not going to want to waste their time wading thru clicks. There is no free Wi-Fi and seamless mobile web plan, yet.
They are going to want to click on the logo, trademark, keyword, slogan, billboard, 1D, 2D, UPC, RFID, etc. Google should review the IP again.
There the consumer can find schedules, buy tickets, get a recall sent by the brand to their mobile, have a message sent to the consumer when the product they want goes on sale, price comparison, location to the nearest restaurant, get mobile coupons, get further information, etc. The possibilities are endless. This is just a scratch on the surface. (Oh, speaking of Surface, what a useless product developed by Microsoft, IMO.)
THIS IS THE START OF THE INTERNET OF THINGS.
Better yet, why doesn’t Google grasp the idea and own it for themselves?
They already have the largest PC database.
Doesn’t it make sense in today’s world to become the largest database for the internet of things?
It is not just about barcodes.
This is about a “one click” application to anything.
A Universal Reader/qode seeking to become the world’s leading Active Mobile and Optically Initiated transaction provider for the mobile device.
Nokia, Deutsche Telekom, Telefónica O2 Europe, and KPN have all recently joined the Hewlett Packard backed Mobile Codes Consortium.
http://www.mobilecodes.org/
http://mobilecodes.org/JuneAnnouncement.pdf
20+ parties attended the inaugural Mobile Codes Consortium meeting in London on February 27, 2007, consisting of carriers, handset manufacturers, and technology innovators such as Nokia, Sony Ericsson, Vodafone, France Telecom, Telefonica O2, Deutsche Telekom, Hutchison Whampoa, Telecom Italia, and Cingular, just to name a few. With such predominant backing from these 20+ deeply involved parties, MC2 aims to set the global standards for mobile smartcode dissemination.
The soon to be released Universal Code Reader will be able to read and decipher the industry standard 1D UPC/EAN and 2D (Datamatrix, QR, Aztec, and Maxi) codes.
Thanks, Swampthing,
Agreed completely. But, let’s take that to the next logical step. Why should someone have to look for the barcode and include it in a photo? Why not just take a picture of the object itself?
For a product, I won’t necessarily want to go to the homepage of the product manufacturer. I may want to see who else sells the product, for what prices, especially if I’m standing in the aisle of a store, trying to decide if I’m getting a good deal on something. In that case, those ads may be relevant for my situation, as well as other sites that offer the product.
Thanks also, streetstylz
Appreciate the additional information and context that your links have to offer, and the information about the mobile codes consortium.
William,
I agree, start with image recognition.
But either way you slice it, “Automatic access of internet content with a camera-enabled cell phone,” IMO, has been covered.
If I am standing in Best Buy and I click on the object’s name, let’s just say Motorola Razr phone, just for an idea and this discussion, I should be able to link to the type of phone to find further information, or where I can get one, or find a price comparison. If I want to know more about Motorola I can click on the name and get content.
We share the same ideas.
My point is, why hasn’t Google seen this and started incorporating something like the ‘qode’ platform into their mobile ideas?
Everyone keeps talking about it and tap dancing around it.
Is it the carriers holding up mass adoption of a “one click” application?
Why hasn’t Google developed a mobile phone to do just that? Surf the mobile web and internet of things and avoid the carriers. Why doesn’t Google develop a way around the carriers by bouncing seamless internet so that the consumer never loses internet connection no matter where they are?
Is this too much for them to handle, as big of a company as they are?
Again, too many questions. Who has the answers?
Where is this mobile industry and the internet of things?
Who holds the key?
Better yet, let’s take it one step further.
There are over 5-10 trillion barcodes produced annually.
These are already in use on every product in the world.
I understand the next question: what do you do if it is out of the package? Say the name of the product in the mobile browser.
Does Neomedia cover this also? I am pretty sure, but I am not a patent expert.
Why just link to a photo when there is so much more?
Google ranks .mobi sites separately from sites built for standard desktops and laptops, so when a query is sent from a mobile phone, the SERPs/results will be drastically different than if sent from a regular computer/browser.
Hi San Diego Web Designer,
That’s a good point.
Something else to consider. One of the patent filings from Google published last year described how the search engine might blend together web search results and mobile search results:
Blending Mobile Search Results
In blending these results, the search engine focuses upon finding resources that are ideal for mobile devices as well as web pages that are good resources for the queries submitted.
Right now the company I work for utilizes Facial recognition, Optical character recognizers & Barcode readers. The problem with using these items for search is the open nature of the system. It is fine when it has a huge database and tries to figure out one object. But when you open up that object and possibly make it something outside of the database it has, well, let’s say it doesn’t work. The database really is the key for all these technologies and search is primarily about expanding the database in ways you couldn’t expect. I just can’t see it working.
Hi Sean,
A very good point. I recently wrote a post about a new patent filing from Google that attempts to get people involved in playing an image treasure hunt game to identify objects within images. Google has another game where they try to get people to label images. It’s possible that they are turning to games like this to try to get humans to solve some problems that computers have difficulties with.