Smaller Screens Means Smarter Image Processing by Search Engines

A recent commenter here asked whether image search from the search engines would soon involve indexing text found within images, using optical character recognition (OCR) software, which tries to read words that are part of images.

My answer was that it would be computationally expensive for a search engine to try to do that, so it might be a while before we see it.

So, it’s kind of fun to eat my words and unveil a new Google patent application that describes how the search engine might handle image map navigation when rendering web pages for small screens, using OCR to read text within those image maps.

Google Image Map Extraction Process

The document not only talks about using OCR software to read image text, but also about using face recognition technology, so that it can crop larger “image map” images and display faces and other interesting parts of the pictures from those image maps.

Back in May, Google Blogoscoped reported on an unofficial way of showing human faces in image searches. Is Google also reading text that is part of an image? Are they developing technology for mobile devices that can be used in image search?

System and method for image processing
Invented by Michael F. Lueck
US Patent Application 20070201761
Published August 30, 2007
Filed: September 22, 2005


A computer implemented method of processing an image for display on a mobile communication device includes extracting a portion of an image based on an image map. The image map relates to the portion of the image.

The method also includes generating a document that comprises the extracted portion of the image and transmitting the generated document to a remote device for display. The method may also include assigning a selectable link to the extracted portion of the image and receiving a request from the remote device for an initial document having the image and image map.

Additionally, the method may include storing in a database the generated document and transmitting the stored generated document in response to future requests for the initial document.
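
As a rough illustration of that caching step, storing a generated document might amount to little more than keying it by the initial document it was built from. Here is a minimal sketch in Python; the cache structure, function names, and the idea of keying only on the URL are my own assumptions, not the patent's:

# Minimal cache of generated documents, keyed by the URL of the initial document.
# generate_document() stands in for whatever processing builds the small-screen page.
_document_cache = {}

def serve(initial_url, generate_document):
    """Return the stored generated document, building and caching it on a miss."""
    if initial_url not in _document_cache:
        _document_cache[initial_url] = generate_document(initial_url)
    return _document_cache[initial_url]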

Here are some of the methods described in the patent that might be used for taking parts or all of an image from an image map:

Presenting an image from an image map on a phone or PDA might involve (a rough sketch follows this list):

  • Extracting part of an image based on an image map,
  • Creating a document which includes that extracted portion of the image,
  • Sending that generated document to be displayed on a handheld device, and
  • Assigning a hyperlink to the extracted portion of the image.

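A rough sketch of how the extraction, document generation, and link assignment might fit together, assuming Pillow for the image work; the coordinates, file names, and function name are hypothetical, not taken from the patent:

from PIL import Image   # Pillow, assumed to be installed

def extract_map_region(image_path, coords, href, alt=""):
    """Crop one image-map rectangle out of a larger image and wrap it
    in a small HTML document with a hyperlink assigned to it."""
    left, top, right, bottom = coords                 # rectangle from an <AREA> tag
    region = Image.open(image_path).crop((left, top, right, bottom))
    region_file = "region_%d_%d.png" % (left, top)    # hypothetical output name
    region.save(region_file)

    # The generated document: just the extracted portion, linked to the AREA's target.
    return ('<html><body><a href="%s"><img src="%s" alt="%s"></a></body></html>'
            % (href, region_file, alt))

# Example: the "Access Guide" area of an image map (coordinates made up)
doc = extract_map_region("map1.png", (0, 0, 120, 40), "guide.html", "Access Guide")
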
Receiving a request from the phone or PDA could involve (see the sketch after this list):

  • Receiving and understanding information about the display capabilities of the handheld
  • Modifying the dimensions of the extracted portion of the image based on those display capabilities
  • Cropping the extracted portion of the image based on the display capabilities, if necessary.

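Adjusting the extracted portion to the reported display capabilities might look something like the sketch below, again using Pillow; the screen dimensions are stand-ins for whatever the handheld actually reports:

from PIL import Image

def fit_to_screen(region, screen_width, screen_height):
    """Scale an extracted region down to the reported screen width,
    then crop whatever still overflows vertically."""
    if region.width > screen_width:
        new_height = max(1, round(region.height * screen_width / region.width))
        region = region.resize((screen_width, new_height))
    if region.height > screen_height:
        region = region.crop((0, 0, region.width, screen_height))
    return region

# Example: fit a cropped image-map region onto a 176 x 208 pixel phone screen
small = fit_to_screen(Image.open("region_0_0.png"), 176, 208)
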
The initial document might be retrieved from a web server, and the image map located in a number of different ways (a couple of sketches follow this list):

  • Organizing elements in the initial document into a document object model (DOM) tree and traversing the tree to locate the image map;
  • Serially parsing elements in the initial document to locate the image map;
  • Generating content of the image map by using a facial recognition algorithm, where the content includes coordinates used to specify the portion of the image for extraction; or
  • Generating content of the image map by using an optical character recognition algorithm, where the content comprises coordinates used to specify the portion of the image for extraction.

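The first two approaches boil down to finding the <MAP> and <AREA> elements in the fetched HTML. Here is a minimal sketch using Python's built-in html.parser to walk the document serially; the class and variable names are mine, not the patent's:

from html.parser import HTMLParser

class ImageMapFinder(HTMLParser):
    """Collect every image map in a document, with its AREA attributes."""
    def __init__(self):
        super().__init__()
        self.maps = {}          # map name -> list of <AREA> attribute dicts
        self.current = None     # name of the <MAP> we are currently inside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "map":
            self.current = attrs.get("name")
            self.maps[self.current] = []
        elif tag == "area" and self.current is not None:
            self.maps[self.current].append(attrs)

    def handle_endtag(self, tag):
        if tag == "map":
            self.current = None

finder = ImageMapFinder()
finder.feed(open("initial.html").read())   # hypothetical initial document
for name, areas in finder.maps.items():
    print(name, [area.get("href") for area in areas])
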
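For the last two approaches, the patent does not name any particular face recognition or OCR software. As one hedged illustration of how such coordinates might be produced, the sketch below uses OpenCV's Haar cascade face detector (detection rather than full recognition) and the Tesseract OCR engine as stand-ins:

import cv2            # opencv-python, assumed installed
import pytesseract    # Tesseract wrapper, assumed installed

def generate_map_coords(image_path):
    """Produce image-map style rectangles from detected faces and recognized text."""
    gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
    rects = []

    # Faces become candidate regions to extract.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
        rects.append(("face", (x, y, x + w, y + h)))

    # Words read by OCR become candidate regions as well.
    data = pytesseract.image_to_data(gray, output_type=pytesseract.Output.DICT)
    for i, word in enumerate(data["text"]):
        if word.strip():
            x, y, w, h = (data[k][i] for k in ("left", "top", "width", "height"))
            rects.append((word, (x, y, x + w, y + h)))
    return rects
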
Some specialized HTML tags for specific types of handheld devices might be used to display an image map, such as <PDA>, which could include attributes such as “height” and “width” for display on screens of different sizes for different phones or PDAs.

Looking at the HTML for the image map, including coordinates (positions) within the images in area elements, like the following, where the linked-to documents can be associated with the parts of the image that are mapped out (the coordinate values shown below are placeholders):

<MAP name="map1">
  <!-- coords values are illustrative placeholders -->
  <AREA shape="rect" coords="0,0,100,50" href="guide.html" alt="Access Guide">
  <AREA shape="rect" coords="100,0,200,50" href="search.html">
  <AREA shape="rect" coords="0,50,100,100" href="shortcut.html">
  <AREA shape="rect" coords="100,50,200,100" href="top10.html" alt="Top Ten">
</MAP>


This system attempts to associate parts of an image map with links contained within the map, even if it means breaking the image up into parts, cropping some of those parts, and possibly using facial recognition software, as well as optical character recognition software for text displayed within the image.

Being able to continue to use an image map as part of the navigational system shown on a handheld may be the best way to provide a good user experience to someone viewing a web page through a phone or PDA.

While this patent application applies to pages that use image maps as navigation when presenting a web page to someone on a handheld device, it may point to some useful image processing techniques that could be implemented in other ways in the future.

6 thoughts on “Smaller Screens Means Smarter Image Processing by Search Engines”

  1. Bill, sorry if this swings a little bit off topic but I wanted to comment on Image Annotation and Retrieval.

    In the UCSD piece, Nuno Vasconcelos is quoted as saying, “You might finally find all those unlabeled pictures of your kids playing soccer that are on your computer somewhere.”

    That statement screams desktop search, but I think to myself, “you” and who else? An algo that analyzes the images themselves would certainly open up a whole new dimension in information retrieval, along with creating new privacy concerns.

    And such technologies would most certainly be helpful with Detecting Online Commercial Intention. Google is certainly a dominant name in search, but I also like many of the ideas and prototyped predictive tools that come out of adlab.msn.com/ResearchAudience.aspx (added: link now non-existent)

    If the Live algo takes the direction I predict, it will be a blessing for online merchants, all Live will need is users.

  2. Interestingly, when Google purchased Nevenengineering (Neven Vision), an announcement was made on the Official Google Blog that the technology would be useful for Desktop search, too – A better way to organize photos?

    Determining commercial intent may be a beneficiary of better understanding of images, but I think that its usefulness will go beyond that.

  3. Pingback: This Week In SEO - 10/5/07 | TheVanBlog
