Added 6/20/2020 – This image annotation patent application was granted as a patent to Google on November 22, 2011 – Method and apparatus for automatically annotating images
How effectively can a search engine automatically create annotations for images and videos, so that they can serve as good responses to searchers’ queries? How much of that image annotation can be done without human intervention and review?
A newly published Google patent application explores the topic and describes a method of annotating an image by comparing it to similar images found on the Web and drawing on the text surrounding those similar images.
Method and apparatus for automatically annotating images
Invented by Jay N. Yagnik
US Patent Application 20080021928
Published January 24, 2008
Filed July 24, 2006
Abstract
One embodiment of the present invention provides a system that automatically performs image annotation. During operation, the system receives an image. Next, the system extracts image features from the image.
The system then identifies other images that have similar image features. The system next obtains text associated with the other images and identifies intersecting keywords in the obtained text. Finally, the system annotates the image with the intersecting keywords.
Problems with Image Annotation
Connections to the Web have become faster and faster, with many higher bandwidth options available to people. This has led to a large increase in the use of pictures and videos on web pages.
Many of those images don’t have accompanying text-based information, such as labels, captions, or titles, that can help describe the content of the images.
Search is predominantly text-based, with keyword searches being the most common way for someone to look for something, even pictures. That can make it difficult to search for images through a search engine. Creating annotations for images, such as a set of keywords or a caption, can make those searches easier.
Traditional methods of annotating images tend to be manual, expensive, and labor-intensive.
There have been other approaches to the automatic annotation of images, like the one described in Formulating Semantic Image Annotation as a Supervised Learning Problem (pdf). While approaches like that can greatly reduce the manual effort involved in annotation, they still require some human interaction and review.
An Approach to Automate Annotation of Images
The annotation system, in a nutshell, would go as follows (a rough code sketch follows the list):
- An image is received
- Image features are extracted
- Other images with similar image features are identified
- Text associated with those other images is obtained
- Keywords from that text are identified
- The image is annotated with those keywords
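To make the flow concrete, here is a minimal sketch of that pipeline in Python. It is not the patent’s implementation: the joint color histogram, the L1-distance ranking, and the small in-memory `corpus` list are all stand-ins for whatever feature extraction, probability models, and Web-scale image index Google would actually use.

```python
from collections import Counter
import numpy as np

def color_histogram(image, bins=8):
    # Joint RGB histogram, one of the feature types the filing mentions.
    # `image` is an HxWx3 uint8 array.
    hist, _ = np.histogramdd(image.reshape(-1, 3),
                             bins=(bins,) * 3, range=[(0, 256)] * 3)
    hist = hist.ravel()
    return hist / hist.sum()

def annotate(image, corpus, top_k=5, min_overlap=2):
    # `corpus` is a list of (image_array, surrounding_text) pairs standing
    # in for the Web-scale index of images the patent assumes.
    query = color_histogram(image)
    ranked = sorted(corpus,
                    key=lambda item: np.abs(color_histogram(item[0]) - query).sum())
    keyword_sets = [set(text.lower().split()) for _, text in ranked[:top_k]]
    # "Intersecting keywords": words shared by several similar images' text.
    counts = Counter(kw for kws in keyword_sets for kw in kws)
    return sorted(kw for kw, c in counts.items() if c >= min_overlap)
```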
A more technical approach might call for one or more of the following (one of these is sketched after the list):
- Generating color histograms
- Generating orientation histograms
- Using a discrete cosine transform (DCT) technique
- Using a principal component analysis (PCA) technique
- Using a Gabor wavelet technique
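As an example of one of those feature types, here is a generic orientation histogram (a histogram of gradient directions, weighted by gradient strength) in plain NumPy. The filing names the technique but not a specific formulation, so this is just one common way to compute it.

```python
import numpy as np

def orientation_histogram(gray, bins=16):
    # `gray` is an HxW grayscale image as a float array.
    gy, gx = np.gradient(gray.astype(float))
    angle = np.arctan2(gy, gx)      # gradient direction, in [-pi, pi]
    magnitude = np.hypot(gx, gy)    # gradient strength
    hist, _ = np.histogram(angle, bins=bins, range=(-np.pi, np.pi),
                           weights=magnitude)
    return hist / max(hist.sum(), 1e-12)  # normalize; guard flat images
```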
Some other variations include (the synonym expansion is sketched after the list):
- Identifying image features in terms of shapes, colors, and textures
- Identifying the other images by searching through images on the Internet
- Finding other images with similar image features by using probability models
- Expanding keywords in the obtained text by adding synonyms
- Using images from videos
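Here is a sketch of the synonym-expansion variation, using WordNet through NLTK as one plausible synonym source; the filing does not say where the synonyms would come from. It assumes the WordNet corpus has been downloaded once with `nltk.download("wordnet")`.

```python
from nltk.corpus import wordnet  # requires nltk.download("wordnet") once

def expand_with_synonyms(keywords):
    # Add synonyms for each annotation keyword, as the filing describes.
    expanded = set(keywords)
    for word in keywords:
        for synset in wordnet.synsets(word):
            expanded.update(lemma.name().replace("_", " ")
                            for lemma in synset.lemmas())
    return expanded

print(expand_with_synonyms(["car"]))  # adds "auto", "automobile", "machine", ...
```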
Using this Process with Videos
Videos without titles or descriptions can benefit from the same approach.

They may be partitioned into a “set of representative frames,” and each of those frames can be processed as an image, using the process described above. After those frames are annotated with keywords, the annotations can be analyzed to create a set of common annotations for the whole video.
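A sketch of the video case, reusing the `annotate` function from the pipeline sketch above. Uniform frame sampling with OpenCV stands in for whatever method of choosing “representative frames” the filing intends, and keeping keywords that recur across frames stands in for its analysis step.

```python
import cv2
from collections import Counter

def annotate_video(path, corpus, num_frames=10):
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    counts = Counter()
    for i in range(num_frames):
        # Jump to evenly spaced frames as crude "representative frames".
        cap.set(cv2.CAP_PROP_POS_FRAMES, i * total // num_frames)
        ok, frame = cap.read()
        if not ok:
            continue
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR
        counts.update(annotate(frame, corpus))          # per-frame keywords
    cap.release()
    # Keep keywords that appear in at least half of the sampled frames.
    return sorted(kw for kw, c in counts.items() if c >= num_frames // 2)
```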
Good idea, especially given the huge number of accurate labels they have from the Google Image Labeler game.
What I’m interested to find out is how they break down their image analysis – they must have huge variation between different images in such a massive collection.
That’s probably why I get so many hits for The Jonas Brothers. Instead of just posting the picture of the band with my daughter, I post the URL of the images (after uploading them to Photobucket). The keyword on the hyperlink I use is “The Jonas Brothers.” That makes it easier for the post to be indexed and found by the major search engines.
The result?
The particular site mentioned above (1 of 22), which is mainly about my 7 kids, autism, and everyday life, gets many hits from Jonas Brothers fans. Oh, and yes, I found a way to monetize that traffic.
🙂
I think another interesting question about all of this is:
How can Google tie their image search to geocoding in photographs, comparing a digital image and the place it was taken (longitude, latitude, time, direction, and perspective) against the tons of information already found in Google Local, Google Earth, and Google Maps, to assess the place, context, and “topic” of a particular photograph?
If you can start comparing the growing use of geocoded information found in digital images (video and pictures) to the available and growing information already found in all of Google’s local, mapping, and satellite databases, I think that would be a major step toward one hell of a powerful image search.
Have you read any mention in any of Google’s patent filings on image searching regarding any of this? I think it has major potential. Thanks.
Geocoded pictures do seem like a perfect application for the technology. It could allow Google to pack Google Earth with millions of local pictures.
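None of this geocoding is in the patent itself, but the matching step the comments describe is easy to picture: compare a photo’s EXIF latitude/longitude against a database of known places. A minimal sketch, with a hypothetical two-entry `places` list standing in for Google’s local and mapping data:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two lat/long points, in kilometers.
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# Hypothetical place database: (name, latitude, longitude).
places = [("Golden Gate Bridge", 37.8199, -122.4786),
          ("Eiffel Tower", 48.8584, 2.2945)]

def nearest_place(photo_lat, photo_lon):
    # Match a photo's EXIF geotag to the closest known place.
    return min(places, key=lambda p: haversine_km(photo_lat, photo_lon, p[1], p[2]))

print(nearest_place(37.81, -122.47))  # -> ('Golden Gate Bridge', ...)
```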
The problem is that many sites tweak their labels to help with search results. I know someone who labels their picture of the Atlanta skyline with “Atlanta Houses For Rent.” This seems so common that there could be some really bad results.
It would really be something to see totally inappropriate search results from the king of search, but if anyone can do it, Google can.
Hi Supermom,
Congratulations on doing so well on that search. Making it easier for the search engines to index information like that makes it easier for you to do well. The annotations described here are for when people don’t make it as easy as you do.
Hi Tim,
Thanks. I didn’t go into a lot of detail on the different image analysis processes that Google uses, but they do mention a number of possibilities, which are worth looking at. The labeling game, based upon the ESP game, is something I thought about mentioning here. I think it probably does help them a lot.
I’m sure Google could come up with a user contribution system in which users who tag an image with an appropriate description get “Google Points” to be used for AdWords or other Google product purchases.
Some of those user contributions could be validations of other users’ descriptions.