How Do Images Get Ranked in Image Search?

When you perform a search for images at a search engine, do you ever wonder why some pictures show up before others?

A recently published paper from Google, PageRank for Product Image Search (pdf), provides some thoughts on how the actual content of images themselves can be incorporated into how images are ranked for terms at Google.

A patent application published last week from Microsoft provides another look at the ranking of images in image search, and some of the things that might be considered when ranking images.

It’s not quite as revolutionary an approach as the one suggested in the Google paper, but there may be some surprises in how images might be ranked. The patent filing is:

Ranking Images for Web Image Retrieval

Invented by Hugh J. Williams, Nick Craswell, Nicholas A. Whyte, Julie H. Farago, James E. Walsh, and Carsten Rother
Assigned to Microsoft
US Patent Application 20080097981
Published April 24, 2008
Filed: October 20, 2006

Abstract

A system, method, and computer-readable media are disclosed for providing images in a ranked order. The system can include an aggregation component for aggregating a plurality of images with corresponding text.

Additionally, the system can include a name detector for detecting names within a search query. Moreover, the system can include a ranking component for ranking the aggregated images based on whether the name detector detects a name.

Image search at the major search engines relies mostly on the keywords associated with images rather than on specific details about the images themselves (such as file size, file type, or resolution). Images are indexed by the URL where they appear, and by the text associated with the page at that URL.

Sometimes the text on those pages isn’t very relevant to the images presented there. Ranking images using a mix of ranking factors might make it more likely that the images shown to searchers are related to the queries they used.

The authors of the patent filing tell us that the factors they list in the document are an illustration of what they might look at when ranking images, and that these signals may be modified or added to in the final process.

Associating Text and Names with Images

Deciding what images are associated with which queries first depends upon a search engine associating images with keywords that might be used as search queries.

A web crawling program travels through the web and aggregates images and the text that appears on the same pages as those images. It might take all of the text from those pages and store it in a database, or only the text that is within a certain distance of the pictures.

It might also look for text that is associated with an image but is found on different pages (perhaps links to the picture, and possibly text associated with those links).

Ranking factors are then used to determine the relevance of a picture to the query, and the order that these associated images are presented to a searcher.

A name detection program might recognize a query as a person’s name, and might trigger the use of a face detector program, to show people in response to queries that use people’s names.

Some ranking factors that can be employed when ranking images

Number of Websites that Contain an Identical Image

Images that appear on more than one web site might be more relevant for a query term than images that only show up on one web site, or they could be considered less relevant.

The reasoning behind this isn’t described, but maybe the text associated with each showing of the image is compared, and if it is similar from one to another it might be considered relevant for the text used. If that text differs with each display, it might be considered less relevant.

Finding whether images are identical might mean looking to see if the images shown on different pages are actually at the same address. For example, the same picture may be shown on ten different web pages, but the image itself is at one address, such as:

http://www.example.com/picture.jpg.

Identical pictures that aren’t at the same address might be compared by electronically reducing them to a computer readable hash value and comparing them to each other.
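As a rough illustration of that hash comparison, here is a minimal Python sketch. The patent filing doesn't say which hash function is used, so SHA-256 is my assumption, and the URLs and byte strings below are made up for the example.

```python
import hashlib
from collections import defaultdict

def image_fingerprint(image_bytes: bytes) -> str:
    """Reduce raw image bytes to a fixed-length hex digest."""
    return hashlib.sha256(image_bytes).hexdigest()

def group_identical(images: dict[str, bytes]) -> dict[str, list[str]]:
    """Map each fingerprint to the URLs whose bytes hash to it."""
    groups = defaultdict(list)
    for url, data in images.items():
        groups[image_fingerprint(data)].append(url)
    return dict(groups)

# Hypothetical crawl results: URL -> downloaded bytes (not real image data).
crawled = {
    "http://www.example.com/picture.jpg": b"\xff\xd8fake-jpeg-bytes",
    "http://mirror.example.org/copy.jpg": b"\xff\xd8fake-jpeg-bytes",
    "http://other.example.net/different.jpg": b"\xff\xd8other-bytes",
}

# Keep only the groups where the same bytes were found at more than one address.
duplicates = [urls for urls in group_identical(crawled).values() if len(urls) > 1]
print(duplicates)
```

A byte-level hash like this only catches exact copies. A picture that has been re-saved or resized produces completely different bytes, which is where the "similar image" factor comes in.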

Number of Websites that Contain a Similar Image

Possibly following the same thinking as above, text associated with similar versions of images at different pages may reinforce the relevance of an image to text or may make it seem less relevant depending upon the similarity of text on the different pages.

A similar image is one that is resized to be larger or smaller, or has been cropped to contain only part of another image, or has had a border added to it.
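The patent filing doesn't explain how similar images would be detected. One common family of techniques is perceptual hashing, and the sketch below shows a simple "average hash" as an assumption on my part, not as Microsoft's method. It works on toy grayscale grids rather than real image files, and it only handles the resizing case. Cropped or bordered copies would need something more robust.

```python
def shrink(pixels, size=4):
    """Block-average a grayscale grid (list of rows) down to size x size."""
    height, width = len(pixels), len(pixels[0])
    small = []
    for by in range(size):
        y0, y1 = by * height // size, (by + 1) * height // size
        row = []
        for bx in range(size):
            x0, x1 = bx * width // size, (bx + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small

def average_hash(pixels, size=4):
    """One bit per cell: 1 if the cell is brighter than the overall mean."""
    cells = [v for row in shrink(pixels, size) for v in row]
    mean = sum(cells) / len(cells)
    return [1 if v > mean else 0 for v in cells]

def hamming(a, b):
    """Number of positions where two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

# Toy 8x8 image: a left-to-right brightness ramp.
original = [[x * 30 for x in range(8)] for _ in range(8)]
# The same picture enlarged to 16x16 by repeating every pixel.
enlarged = [[original[y // 2][x // 2] for x in range(16)] for y in range(16)]
# A different picture: the ramp reversed.
reversed_ramp = [[255 - v for v in row] for row in original]

print(hamming(average_hash(original), average_hash(enlarged)))       # small distance: similar
print(hamming(average_hash(original), average_hash(reversed_ramp)))  # large distance: different
```

A small Hamming distance between two hashes suggests the pictures are versions of each other, and a large one suggests they are not. Where to draw the line would be a tuning decision.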

Size of Images

The size of an image may also be considered a ranking factor. The patent application tells us that “users are more likely to click on images with the greater number of pixels,” so they may rank images higher if they have more pixels. But, we are also told that images with a great number of pixels might be ranked lower than images with a lesser number.

Not quite sure what the size of an image has to do with its relevance. Perhaps this ranking factor has more to do with providing a good user experience.

Link Relationships Between Images

Some images are linked together, such as a thumbnail version, and a larger version of the same picture. Information about the images could be shared, such as file sizes in pixels, as well as text associated with each picture.

Again, the patent application tells us that this information might cause the images to be ranked more highly or less highly for certain keywords, but doesn’t explain why.

It’s possible that if the linked pictures each have text near them that is related in a meaningful manner, it could increase the relevance of those images for that text. If there is associated text, and it isn’t related, they might be seen as less relevant.

Frequency of an Image Within a Website

The number of times that an image is used on the same web site might influence the ranking of that image for certain keywords, both positively and negatively. This could be an image used on more than one page, or on the same page more than once.

If the image is part of the graphic design of the site, like a list bullet, rather than being meaningful on its own, it might be ranked lower. If it has more meaning on its own (perhaps a logo for the site?), it might receive a higher ranking.
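Counting how often an image appears within one site is simple to sketch. The page paths and image URLs below are hypothetical, and the patent filing gives no threshold for when repetition stops looking meaningful and starts looking like decoration, so this only produces the raw counts.

```python
from collections import Counter

def image_frequency(pages):
    """Count how many times each image URL is used across a site's pages.

    `pages` maps a page path to the list of image URLs found on that page;
    an image repeated on one page is counted once per use.
    """
    counts = Counter()
    for image_urls in pages.values():
        counts.update(image_urls)
    return counts

# Hypothetical site: a list bullet used everywhere, plus a few one-off pictures.
site = {
    "/": ["/img/bullet.gif", "/img/bullet.gif", "/img/logo.png"],
    "/about": ["/img/bullet.gif", "/img/team.jpg"],
}

frequency = image_frequency(site)
print(frequency.most_common())
```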

Image Feature Levels

Features of a picture can also affect its ranking, such as “number of pixels, aspect ratio, image file size, image entropy, and image gradient.” Not sure how this is considered a relevance factor, but there might be some thought behind the quality of images as a quality or importance measure.
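The filing doesn't define "image entropy," but the usual meaning is the Shannon entropy of the image's brightness histogram, so that is what I assume in this sketch. A flat, single-color graphic scores zero, while a picture with many evenly used brightness levels scores higher. The pixel lists are toy data.

```python
import math
from collections import Counter

def image_entropy(pixels):
    """Shannon entropy, in bits, of a flat list of grayscale values."""
    total = len(pixels)
    counts = Counter(pixels)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())

flat = [128] * 64                      # a single solid color
two_tone = [0] * 32 + [255] * 32       # half black, half white
four_level = [0, 85, 170, 255] * 16    # four brightness levels, evenly used

for name, pixels in [("flat", flat), ("two_tone", two_tone), ("four_level", four_level)]:
    print(name, image_entropy(pixels))
```

A very low entropy value could be one hint that an image is a spacer, bullet, or other design element rather than a photograph.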

It could also be a nod again to the idea that the better quality an image is, the better the user experience in seeing it.

Other Ranking Factors

a) Total number of images on a page
b) Total number of images that are linked to by a particular page
c) Total number of thumbnail images that are located on the same webpage as the ranked image.
d) The total number of links there are to the URL of an image.
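The first two counts in that list can be gathered with a basic HTML parser. This sketch uses Python's standard html.parser module. The sample markup and the list of file extensions are my own assumptions, and it doesn't attempt the thumbnail count, which would need the actual image dimensions.

```python
from html.parser import HTMLParser

# Assumed set of extensions that mark a link as pointing to an image.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")

class ImageCounter(HTMLParser):
    """Count images embedded in a page and images the page links to."""

    def __init__(self):
        super().__init__()
        self.embedded = 0   # <img> tags on the page
        self.linked = 0     # <a href="..."> pointing at an image file

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img":
            self.embedded += 1
        elif tag == "a":
            href = (attributes.get("href") or "").lower()
            if href.endswith(IMAGE_EXTENSIONS):
                self.linked += 1

# Hypothetical page: a thumbnail linking to a larger picture, a logo, a text link.
page = """
<a href="/photos/big.jpg"><img src="/photos/thumb.jpg" alt="Thumbnail"></a>
<img src="/logo.png" alt="Logo">
<a href="/about.html">About</a>
"""

counter = ImageCounter()
counter.feed(page)
print(counter.embedded, counter.linked)
```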

Weighting Text by Its Distance from an Image

Text that is closer to an image on a web page may be more relevant to what the picture is about than text that is further away. The distance may be calculated a number of ways, including looking at different distance elements, such as:

a) The number of intervening words between the text and the image,
b) The number of intervening full stops such as “.” “?” “!” and other sentence-ending punctuation/symbols between the text and the image,
c) The number of intervening table data tags (<td>) between the text and the image, and;
d) The number of intervening table rows tags (<tr>) between the text and the image.
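To make those four distance elements concrete, here is a small sketch that counts each of them in the markup sitting between a piece of text and an image, then folds them into one weight. The counting follows the list above, but the cost values and the 1 / (1 + distance) formula are purely my own illustration. The patent filing doesn't give numbers.

```python
import re

def distance_features(between: str) -> dict[str, int]:
    """Count the separators in the markup between a text span and an image."""
    td_tags = len(re.findall(r"<td\b", between, flags=re.IGNORECASE))
    tr_tags = len(re.findall(r"<tr\b", between, flags=re.IGNORECASE))
    visible = re.sub(r"<[^>]+>", " ", between)   # drop the tags, keep the text
    return {
        "words": len(re.findall(r"\w+", visible)),
        "full_stops": len(re.findall(r"[.?!]", visible)),
        "td_tags": td_tags,
        "tr_tags": tr_tags,
    }

# Hypothetical costs: a table row boundary counts for more than a single word.
DEFAULT_COSTS = {"words": 1, "full_stops": 5, "td_tags": 10, "tr_tags": 20}

def proximity_weight(features, costs=DEFAULT_COSTS):
    """Turn the counts into a weight between 0 and 1; closer text scores higher."""
    distance = sum(costs[name] * features[name] for name in costs)
    return 1.0 / (1.0 + distance)

# Markup found between a caption and the image it might describe.
between = "and nothing else. </td><td>Next cell"
features = distance_features(between)
print(features, proximity_weight(features))
```

Text sitting directly beside the image has nothing in between, so it gets the full weight of 1.0, and every intervening word, sentence break, or table cell pushes the weight down.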

Face and Name Detection

Images with the greatest number of faces might be ranked higher than others in some instances, and images with only one face in them might be ranked higher in other instances. Again, the rationale behind those factors isn’t detailed in the patent filing.

It’s possible that if the search query is detected to be for a specific person’s name, that a single face might be the one ranked higher.
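Here is how that guess might look in code. Everything in it is hypothetical: the name list stands in for whatever a real name detector would do, the face counts stand in for the output of a face detector, and the ordering rule is my speculation rather than anything the patent filing spells out.

```python
# Hypothetical stand-in for a name detector: a fixed set of known names.
KNOWN_NAMES = {"ada lovelace", "alan turing"}

def looks_like_name(query: str) -> bool:
    """Treat the query as a person's name if it matches the known list."""
    return query.strip().lower() in KNOWN_NAMES

def rank_by_faces(images, query):
    """Order (filename, face_count) pairs according to the type of query.

    Name queries put single-face images first; other queries put the
    images with the most faces first. Ties keep their original order.
    """
    if looks_like_name(query):
        key = lambda item: 0 if item[1] == 1 else 1
    else:
        key = lambda item: -item[1]
    return [name for name, _ in sorted(images, key=key)]

# Face counts as a face detector might report them (made-up values).
candidates = [("group.jpg", 5), ("portrait.jpg", 1), ("landscape.jpg", 0)]

print(rank_by_faces(candidates, "Alan Turing"))
print(rank_by_faces(candidates, "conference photos"))
```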

Conclusion

The Microsoft patent application on ranking images doesn’t go into much detail on why some of the factors it points out might help determine how relevant an image is for certain keywords. But it does describe a number of factors that go well beyond just associating the text that appears on a page with the image or images on that page.

It leaves a lot to think about…

39 thoughts on “How Do Images Get Ranked in Image Search?”

  1. Pingback: How Are Images Ranked in Image Search?
  2. Nice post, this is something that’s been on my mind for quite some time, and is something I don’t feel enough SEOs are paying attention to, especially as Google universal search becomes more and more apparent. A savvy image optimizer could outrank the competition simply by having the most optimized image on their site.

  3. Thanks, Eric

    I really liked this patent filing because it expanded upon how images could be ranked in ways that aren’t so obvious.

    They didn’t spell out a lot of the specifics regarding how they might use some of the ranking factors that they listed, and I got a sense that they may have left more than a couple out of the document, but it’s interesting to see how a search engine could look at more than just the text that surrounds a picture, or that is included in alt text for an image, to determine how relevant it might be for a keyword based search.

    And as you note, with Universal Search being so important these days, it makes sense to delve deeper into image optimization.

  4. Thanks for alerting us to what may be happening in image search, Bill. I still find it an impenetrable jungle.

    I am not persuaded that the data they are using will do a significant part of the relevance job, unlike regular web search. It seems to me that this is an area where you have to bring human judgements into the equation. This is a prime area for incorporating ‘click through’ data from searchers. When they’ve got it working in images, they could then distill their wisdom and see what weight they might then give clicks in regular relevance measures for textual searching.

  5. I’ve never really understood how it all works, so I’m glad to see that it’s a number of different variables that ultimately rank images in their order.

  6. Pingback: Trouble with Google Images - Bloggeries Blog Forum
  7. Great post! I have about 150 RSS images on one of my pages I just can’t seem to get listed in searches…it is kind of frustrating but I will see what I can do by following your post here.

    Thanks

  8. Pingback: MSN is still trying to keep up, now with Image Search | Greenlane Blog-O-Matic
  9. Great post – I just blogged it / linked it. I always think that it’s important to know what’s coming, to start working it into your strategies. You’re the man again, Bill.

  10. Hi Barry,

    Human judgment in determining what an image is about may be the way to go. I wonder how well Google is finding that the Google Image Labeler is working, speaking of getting humans to label pictures.

    It would be interesting to compare that data to click through data.

    Hi Jessi,

    I imagine that there may be some other factors that we haven’t been told about, too.

    Hi Collin,

    I think that I see the page that you are talking about on your site. It may sound like a lot of work, but I wonder if you added captions for each of those if it would make a difference.

    Hi Bill,

    Thanks. We can never be quite sure how much of the things described in each of these patent applications will actually become part of what a search engine does, but a lot of the ranking factors for images described in this patent application seem to make sense as things that a search engine might look at when ranking images.

  11. It’s a nice article.
    You were right, I have had a question like this too, about why I get visitors from Google image search every day. I post about online games and take screenshots of the games. But when I searched through Google image search using the names of the games as keywords, my blog was shown at the top of the list, ahead of the games’ real websites.

  12. Great Bill!
    Your posts are always fantastic!
    In my experiments in images optimization, i’ve found a connection between classic on-page factors (title, description-H1) and keywords related to images.
    So I’d add “On page SEO factors” in the list of ranking factors.
    Does anyone confirm my idea?
    Greetings from Italy!

  13. @DazzleCat
    I know that it seems logical, but it’s not so considered in images optimization.
    That’s why wikimedia commons images are so well ranked on Google Universal Search result pages.

  14. Hi mriza,

    It’s interesting that an image from your site ranks ahead of the game’s site in Google’s image search. Perhaps there are some features of your screen shot, or the text you use, or both, that Google likes about the image.

    Hi Andrea and DazzleCat,

    Thanks. The patent filing does note that “classic” on page factors may play a role in how images might be ranked, but it also mentions that those factors may be misleading if the textual content of a page, and an image on the page aren’t good matches for each other. Many of the additional factors listed attempt to provide more ways of learning what an image might be about, other than just the text and HTML that appears upon the same page.

  15. How about the ‘alt tag’? Doesn’t that help in ranking the images according to the keywords?

  16. Hi Eva,

    Things like alt text probably still hold value in the way that images are ranked and indexed. Hopefully this post gave you some ideas on other things that a search engine might consider when it tries to decide which images to show people in response to a search.

  17. Hi Richard,

    Thank you very much for sharing your observations on image rankings.

    I like captions for images as well, especially if they are contained in the same HTML container as the image – paragraph, table cell, etc.

  18. Once more a very useful post. Thx for that!

    Since a part of my final paper is about the influence of google services on google serps, trying to figure out how image search works is quite a big issue. Actually it’s a much bigger issue to figure out why and when images appear within the serps since those few pics are almost never the same as the first images when performing an image search.

    Regarding the serps: After a lot of testing I’d say that size doesn’t really matter (hehe, what a nice statement ^^). The obvious things like text and links and so on obviously do. Furthermore the use of image labeler looks, at least for now, quite interesting. But you never know how much impact the use of this gadget really has… I think I gotta do a lot more testing on this and hope to come up with some more detailed results.

    Cheers,
    Sascha

  19. Hi Sascha,

    You’re welcome. Thanks for your thoughts on this topic.

    I do think that size can play a role, especially in keeping very small or very large images from being ranked well.

    Images showing up in a Google Web search as blended results is interesting. I’d take a look at the patent that I discuss in this post for some ideas on which images show up in web search results and why:

    How Google Universal Search and Blended Results May Work

    Good luck with your testing.

  20. Content and images have helped our ranking and the ranking of our clients more than anything. Quality content on quality sites has helped the most but I am going to concentrate a little more on the images.

  21. Hi Elizabeth,

    Quality content can help a site significantly, but many of the sites I look at have problems with search engines just being able to crawl and index their pages well. The greatest content hidden on pages that search engines can’t reach goes unseen, unfortunately.

  22. I’m recently approaching the SEO and your blog is really a great source of information.
    I get right on your website as a favorite. :)

  23. Some people say image hot-linking increases SERP rankings in Bing & MSN, but has an adverse impact on image rankings in Google… However, I’m more concerned that my bandwidth will suffer, and hot-linking isn’t such a good practice when it comes to running media-extensive websites… But, again I’ve read countless people saying that one should allow image hot-linking… What do you think?
