How Search Engines May Use Images to Rank Web Pages

Images on a web page can express ideas visually and convey a considerable amount of information, and may also add to the attractiveness and perceived quality of a site.

When search engines rank pages in search results, images may have some impact on those rankings.

A search engine might look at the captions associated with pictures, or at alt text, which is provided as an alternative for people who browse the Web with images turned off or who use screen reading software.

Search engines might also look at text surrounding an image, especially text within the same HTML container, block, or segment.

Those indexing services could also associate other content on a page with an image, including the page’s title.

People might use images in a number of ways when creating a Web page, such as:

  • Page and layout decorations
  • Bullets for lists
  • Site logos
  • Spacer images to help keep a layout in place
  • Topical images – pictures that add to and illustrate concepts discussed on pages
  • Galleries of pictures of people and places and things
  • Advertising images
  • Banners

Some images are more important to the content of a page than others, so how might a search engine use the text associated with some images and not others? What does a search engine look at when deciding how important an image might be to a page?

Image Scores

A recently published Microsoft patent application describes some approaches to creating and using a score for images that may impact Web search results in some very interesting ways.

This scoring system identifies text associated with an image in a document, referred to as “image text,” and determines an “image score” based upon the image text for an image in that document.

The image score can be used as an indication of the relevance of the document to the image text. The image score may be used in a number of ways, including as a signal for ranking the web pages in a search result.

The patent application is:

Scoring Relevance of a Document Based on Image Text
Invented by Qing Yu, Shuming Shi, Zhiwei Li, Ji-Rong Wen, Wei-Ying Ma
Assigned to Microsoft
US Patent Application 20080215561
Published September 4, 2008
Filed March 1, 2007

Abstract

A method and system for determining relevance of a document having text and images to a text string is provided. A scoring system identifies image text associated with an image of the document.

The scoring system calculates an image score indicating relevance of the image text to the text string. The image score may be used in many applications, such as searching, summary generation, and document classification, image search, and image classification.

Applications that Can Use an Image Score

The image score may be used in many applications. Some examples mentioned in the patent application include:

1) A search engine may rank web pages of a search result based on their image scores. The image score may be combined with a text relevance score and a static ranking score (like PageRank) to provide an overall ranking.

2) A document summary system may look at an image score to determine whether an image of a document should be included in a summary, or snippet, of the document.

3) A document classification system may use an image score calculated from a comparison of image text to a textual description of a class as an indication of similarity between the document and the class.

4) A vertical search system may factor in an image score to search for items, such as products or news stories.
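
As a rough illustration of the first application, the sketch below blends the three scores with a weighted sum. The patent application, as summarized here, only says that the scores may be combined. The linear form, the weights, and the function name overall_score are assumptions made for illustration.

```python
def overall_score(text_relevance, static_rank, image_score,
                  weights=(0.6, 0.2, 0.2)):
    """Blend three ranking signals into one number.

    The linear blend and the default weights are hypothetical; the
    patent application does not say how the scores are combined.
    """
    w_text, w_static, w_image = weights
    return (w_text * text_relevance
            + w_static * static_rank
            + w_image * image_score)


# Two hypothetical pages with identical text and static scores;
# only the image score differs.
pages = {
    "page_a": overall_score(0.5, 0.5, 1.0),
    "page_b": overall_score(0.5, 0.5, 0.0),
}
ranked = sorted(pages, key=pages.get, reverse=True)
```

With everything else equal, the page whose image text matches the query better sorts first. How much the image score should matter relative to the other signals is exactly the kind of detail a search engine would tune and not publish.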

Calculating an Image Score

A web page may have a large image positioned at the top center of the web page and small images at the bottom of the page acting as links to other pages.

The large image may be more important to the web page, and the image text of that image may better represent the overall topic of the page than the image text of the other images.

The scoring system may increase the image score of that larger image, and decrease the image scores of the smaller pictures.
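
One way to picture that adjustment is to scale a text relevance measure by an importance weight for the image. In the sketch below, simple term overlap stands in for whatever relevance measure a search engine would really use, and the names term_overlap and image_score and the multiplication itself are my assumptions, not details from the patent application.

```python
def term_overlap(image_text, query):
    """Fraction of query terms that also appear in the image text."""
    query_terms = set(query.lower().split())
    if not query_terms:
        return 0.0
    text_terms = set(image_text.lower().split())
    return len(query_terms & text_terms) / len(query_terms)


def image_score(image_text, query, importance=1.0):
    """Scale text relevance by how important the image is to its page.

    An importance above 1.0 raises the score (a large, central image);
    an importance below 1.0 lowers it (a small navigation thumbnail).
    """
    return term_overlap(image_text, query) * importance


# Hypothetical image text for a large main image and a small thumbnail.
large = image_score("red mountain bike on a trail", "mountain bike", importance=1.5)
small = image_score("mountain bike", "mountain bike", importance=0.5)
```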

The importance of an image may be based on:

  • Image level features – taken from the image itself
  • Page level features – based upon the relationship of an image to a page, and,
  • Site level features – based upon the relationship of the image to the web site of its web page

A table from the patent gives us an idea of what might be looked at in creating an image score based upon image level features, page level features, and site level features:

Image level features:

  • Size – Area of the image
  • Width/height ratio – Ratio of the width of the image to the height of the image
  • Blurriness – Degree of blur of the image
  • Contrast – Contrast within the image
  • Colorfulness – Measure of color within the image
  • Face – Flag indicating whether the image contains a face
  • Photo vs. graphic – Flag indicating whether the image is a photograph or computer generated graphic

Page level features:

  • Relative position X – Relative horizontal position of the image within the web page
  • Relative position Y – Relative vertical position of the image within the web page
  • Relative size – Percentage of the web page occupied by the image
  • Relative width/height – Ratio of the width-to-height ratio of the image to the width-to-height ratio of the page

Site level features:

  • Inner site – Flag indicating whether the image is contained within the web site of the web page
  • Frequency in site – Number of times the image appears on different web pages of the web site of the web page
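
To make the three feature levels a little more concrete, here is a minimal sketch that turns a handful of them into a single importance weight. The patent application, as described above, lists the features but this post does not cover how they are weighted, so every weight and every direction of influence below (for example, that photographs count for more than graphics) is a guess on my part. The class and function names are hypothetical too.

```python
from dataclasses import dataclass


@dataclass
class ImageFeatures:
    relative_size: float     # fraction of the page occupied, 0..1 (page level)
    relative_y: float        # 0 = top of page, 1 = bottom (page level)
    inner_site: bool         # hosted on the page's own site (site level)
    frequency_in_site: int   # pages of the site the image appears on (site level)
    is_photo: bool           # photograph rather than graphic (image level)


def importance(f: ImageFeatures) -> float:
    """Return a hypothetical importance weight between 0 and 1."""
    score = 0.4 * min(f.relative_size * 4, 1.0)  # larger images count more
    score += 0.2 * (1.0 - f.relative_y)          # nearer the top counts more
    score += 0.1 if f.inner_site else 0.0
    score += 0.1 if f.is_photo else 0.0
    # An image repeated across many pages looks like a logo or icon.
    score += 0.2 / max(f.frequency_in_site, 1)
    return score


# A large photo near the top of one page versus a small icon used site-wide.
hero = ImageFeatures(0.3, 0.1, True, 1, True)
icon = ImageFeatures(0.01, 0.95, True, 200, False)
```

A real system would more likely learn these weights from data than set them by hand, but the hand-set version shows how a large, unique photo near the top of a page could end up far ahead of a repeated icon.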

Image Text – What the Picture is About

Image text may be the anchor text (if the image is also a link), URL text, alt text, and surrounding text associated with an image.

The scoring system may use various techniques for identifying surrounding text of an image:

  • Rendering a web page in memory and analyzing its layout to identify the surrounding text based on distance from the image.
  • Using rules to identify surrounding text from the HTML document representing a web page (e.g., passages consisting of 20 terms before or after the image).
  • Using a Document Object Model (“DOM”) based technique for identifying surrounding text.
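
The rule-based approach is the easiest of the three to sketch. The example below uses Python's standard html.parser module to collect the terms on a page in order, remembers where each img tag falls, and then takes a window of terms on either side. The 20-term window comes from the example in the patent application; the class and function names are hypothetical, and this simple version does not filter out script or style content.

```python
from html.parser import HTMLParser


class ImageTextParser(HTMLParser):
    """Collect page terms in order and note where each <img> occurs."""

    def __init__(self):
        super().__init__()
        self.terms = []    # every whitespace-separated term of text
        self.images = []   # (index into self.terms, alt text) per image

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt") or ""
            self.images.append((len(self.terms), alt))

    def handle_data(self, data):
        self.terms.extend(data.split())


def surrounding_text(html, window=20):
    """Return (alt text, terms within `window` terms of the image) pairs."""
    parser = ImageTextParser()
    parser.feed(html)
    parser.close()
    results = []
    for position, alt in parser.images:
        start = max(position - window, 0)
        nearby = parser.terms[start:position + window]
        results.append((alt, nearby))
    return results
```

Calling surrounding_text on a page's HTML returns one entry per image, pairing its alt text with the nearby terms. A layout-based or DOM-based technique would replace the flat term window with distance on the rendered page or with position in the document tree.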

Discarding Some Images

Pictures that aren’t important may be removed to improve the accuracy of the scoring, such as really small images, or ones below a certain threshold in terms of importance.
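
A filter like that could be as simple as the sketch below. The dictionary keys, the function name keep_important, and both threshold values are invented for illustration; the patent application, as summarized here, only says that very small or unimportant images may be removed.

```python
def keep_important(images, min_area=2500, min_importance=0.2):
    """Drop images that are tiny or fall below an importance threshold.

    `images` is a list of dicts with hypothetical keys "width",
    "height", and "importance". Both thresholds are made-up defaults.
    """
    kept = []
    for image in images:
        area = image["width"] * image["height"]
        if area >= min_area and image["importance"] >= min_importance:
            kept.append(image)
    return kept


# Hypothetical images found on one page.
candidates = [
    {"name": "spacer", "width": 1, "height": 1, "importance": 0.9},
    {"name": "hero", "width": 600, "height": 400, "importance": 0.8},
    {"name": "banner", "width": 468, "height": 60, "importance": 0.1},
]
remaining = keep_important(candidates)
```

Only the image text of the images that survive the filter would then feed into the image score for the page.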

Conclusion

It has been a fairly common belief that a search engine will look at text associated with web pages when ranking those pages, but information from the search engines about how they might actually use text related to images has been somewhat scarce.

This patent application provides us with an opportunity to think about how a search engine may place more importance upon text related to some images over other images based upon how important an image might seem to be to a page.

It also describes how a score associated with images might be used in other ways, such as including an image with a snippet of text that shows up in search results for News and Web searches.

How important are the different images that appear on your site, and what do they tell search engines about your pages?

63 thoughts on “How Search Engines May Use Images to Rank Web Pages”

  1. as someone who works with images a lot, these are some nice insights into what future “updates” might do :)

    Thank you for sharing this Bill :)

  2. Hello Bill,

    it could also make a difference, if the image is referenced by an HTML img tag or via CSS (like background-image). That would be another point for Page Level.

    Maybe future search algorithms will utilize information from EXIF data. I think some people try to use RDF in the EXIF data of an image (Image Level).

    And there is not just the binary graphics formats (.jpg, .gif, .png). We have SVG, which I like very much :)
    It’s possible to store a lot of additional information in SVG-Graphics, it has title and descriptions tags and as a markup language it is indexable and searchable.

  3. Hi Praveen,

    You’re welcome. While reading this patent application, I wondered how much of what it describes is actually happening today, and which search engines might be using some of the techniques involved. For example, we know that Google can identify faces in images. Is the inclusion of a face in an image on a page something that might cause that image to be considered more “important” than other images on the same page, causing the text and meta data associated with the image to carry more weight than the text and meta data related to other images on the same page?

    This patent filing carries a lot of questions with it. :)

  4. Hi Alphane Moon,

    Some very good points.

    When a patent application like this is published, and the authors describe the kinds of things that might be done with it, they usually don’t go on to discuss every aspect of how it might work and be used. You’ve come up with some additional image level and page level factors that would fit in nicely with how this patent application might work. Thanks for suggesting those.

    I know that Yahoo is taking advantage of EXIF data in creating automatic tags for Flickr for things like time and date that a picture was taken, and what kind of camera was used. Interesting to think that a search engine might be looking at that information when crawling the Web and looking at pictures. :)

    I’d like to see more done on the Web with SVG; hopefully we will sometime soon. This interview was interesting: Ted Gould: SVG, Inkscape and Web Standards

  5. Hi Bill,

    Yeah, this gives way to more questions than a def answer :)

    As with any new things, this opens a new set of challenges to work with and new avenues for more growth ;)

    Exciting times ahead for sure :)

  6. Very insightful post Bill! This just shows you the importance of having unique optimized images embedded inline in your content.

  7. Hi Praveen,

    I really enjoy running across a patent or paper that leads to a lot of questions. It does mean new challenges, new growth, and a chance to experiment and observe.

    Hi mrspoton,

    Thanks. It does emphasize the importance of making sure that you have unique alt text, captions, etc., with images that are important to the content of a page. I really liked the lists of image level, page level, and site level features described in the patent application that can influence how important an image might be considered.

  8. Hi Bill,

    I use SVG inline in XHTML on every page of my blog. SVG is one of my topics :)
    The more I use it the more I love it.

    Thank you for writing this wonderful article, it’s very inspiring.

  9. I have always tried to keep the use of images on my sites to a minimum for search engine purposes. It is interesting to see Microsoft making advances into the ranking of images on sites. However, it seems that even with image and graphics search, content, context and text are still the three wise men.

  10. Thanks for the interesting read Bill.
    I’ve always known the importance of alt tags and captions but never considered how, or that, images carry other SEO factors.

  11. I don’t tend to use many images on a website, because I think that the text is more important for the search engines but this is a very interesting article and would be good if this caught on. It’s always good to see advances in SEO. Thanks for this information.

  12. That’s a very interesting article and one which makes complete sense. I’ve always understood that for optimal SEO links should be ‘text links’ with more emphasis placed on these by search engines than an ‘image link’, even with ‘alt’ tags. With the large uptake in broadband, websites are much more reliant on images for their content and it is only just for these to count in search rankings.

  13. Hi Peoplefinder,

    I’ve seen many sites that have really benefitted from having the right images in the right places. One of the thoughts going through my head while reading this patent filing is that it raises some ideas on how many different kinds of content other than text might influence how pages are ranked by search engines – not just images, but also video and audio.

    With increases in broadband, with more people using video and audio and images on web sites, search engines need to consider how they are going to rank nontextual content, and how that content may influence the rankings of pages that it is contained upon. What unique features of images, of video, of audio might influence how a search engine may rank those pages?

    One of the statements in the early sections of the patent application tells us:

    [0005]Web pages are multimedia documents that include various media types such as text, images, video, and audio. The non-textual media types play an important role in conveying the information content of a web page to a user. Images in particular play an important role in conveying information to a user. The authors of web pages may prefer to express information as an image, rather than as text, because as the adage says, “a picture is worth a thousand words.” The authors may also prefer to use appealing images, rather than text, because the resulting web pages may be more attractive and may be perceived to be of higher quality.

    As described above, typical techniques for ranking web pages factor in textual relevance and static importance. These techniques, however, typically do not factor in the information associated with or based on the non-textual media types.

    I like your statement about content, context, and text, but we are going to see multimedia play an even larger role on the Web. :)

  14. Hi Mobility Creative,

    Thank you. It’s understandable that you’re considering search engines on your site, but if images may be beneficial to your actual visitors, you may want to consider them. Sometimes an image really can make a difference in helping someone understand what you are writing about.

  15. Hi Mark,

    I like text links myself, but since search engines will consider a variety of signals when ranking pages, it might not hurt to have a greater range of signals, including images with associated text – captions, alt tags, nearby text, etc. Those signals may act to reinforce each other, strengthening all of them.

  16. Hi,

    William,

    Thanks for the reply. I am constantly tweaking my site in an attempt to achieve better results and I will certainly incorporate some of the things I have learnt here in my future updates.

    Thanks again.

  17. Until now I didn’t optimize my images. Thanks to you I learned to do this, and my site gained 2 spots in the SERPs in two days. Thanks William.
    Keep up the good work, here a new fan.

  18. Hi Tomelloso,

    You’re welcome, and congratulations. That’s good news. :)

    I’m happy to hear that you decided to make those changes, and have seen positive results. Nice going.

  19. Having experimented for YEARS with image SEO, it appears through trial and error that:

    Keywords in Image URL
    Alt Tags
    Surrounding text
    Captions
    Image Size
    Image Placement

    can have an enormous effect on placement. For many sites, image search is not important, but for many retail sites, it does bring in a significant amount of relevant traffic.

    It also appears that on Google’s organic SERPs, three images will show up in the top spot for searches for celebrities. This brings in significant traffic. One thing worth noting about those algos is that, although Google will usually display the first three images that are ranked in the image search, sometimes they will skip one image out of order.

    Analyzing the image that was substituted for the skipped image, it will often be larger and a higher quality image in terms of color and sharpness.

  20. Hi AD New York,

    Thank you for sharing your experiences with Google’s image search, and their inclusion of images in organic search.

    Interesting to have some indication that Google does seem to make use of image size and quality when interleaving pictures in organic search.

  21. I haven’t tried optimizing for images, as text seems to be enough of a challenge. I’ve read a few tips before about SERP image optimization, but size, quality and face recognition were never included in the discussions. Those are some interesting points to consider.

  22. Hi Chris,

    It’s worth trying some image optimization on your pages, even if it’s just to add alt text for people who are using screen readers, and captions to add some context to the pictures you use. If you pick images that do a good job of illustrating what you are writing about, and are decent quality, you’re a good part of the way towards optimizing images for your pages.

  23. I think one aspect of SEO is keeping up with how Google changes the way in which they rank or rate webpages. As images and graphical content become more important, I’m sure their method of analysing pages tries to take this into account to keep their search results accurate. It’s an ever changing world which site owners need to keep on top of.

  24. I still find it quite unbelievable that image optimization is going to be very important. Of course, it could be a minor factor when you are thinking about graphs, but there is only so much that Google can see in an image. Most scholarly papers do not include many images and are usually of very high value to a searcher.
    That titles and alt texts are important is not really anything new. They are also very useful for readers that can’t view your images.

  25. Very interesting article–some very good points were raised both in the article and through the user comments. I had never really stopped to truly consider the effects images would have in rankings, as I had always assumed everything was based on text content. Definitely something I’ll be considering in the future.

    Thanks for another interesting read!

  26. Hi Mark Jones,

    Good point. From a business sense, it’s important for business owners who have a web site to understand as much as possible the atmosphere that surrounds them and their business online, including some idea of how a search engine might treat their site, and act as an index of their web pages.

    SEO at its primary level involves keeping an eye on search engines, and making sure that a client’s site has a chance to be indexed and found by people who might be interested in what it has to offer.

    With faster connection times, it does seem like images and graphical content are playing a larger role on the Web, and that’s a framework that the search engines as well as site owners all have to operate within.

  27. Hi SEO Sensation,

    With all of the patent filings and whitepapers that have been coming out from search engines about how they may index and rank images based upon more than just the text and meta data associated with those images, I think it might be important to consider the possibility that image optimization is an area where we are seeing some interesting advances in indexing from the search engines.

  28. Hi Mark,

    This patent filing did raise some interesting points about how images might impact the rankings of web pages that I hadn’t seen before.

    One of the things I found most interesting was the breakdown of factors involved in creating an image score into image level features, page level features, and site level features to rank images. The idea of a search engine looking at more than one page at a time when considering a score for an image is interesting. What if a search engine does a similar kind of analysis for links to external pages, for instance. :)

  29. I am currently moving and updating images from Picasa to Flickr and somebody posts a comment to an old post of yours about images, so I guess the subject is images.

    I do not understand why image meta data saved directly within the image is not more widely used and accepted. If you use a service like Picasa or Flickr, the meta data associated with the images is not stored within the images themselves, meaning that tagging images once and adding descriptions only once is far from reality. If you have local catalog software, you have to tag and label every image again, and if you upload it to any photo-sharing site or social network, you have to do it over and over again as well.

    There are some meta tags formats for images, similar to the ID3 tags for MP3 audio files.

    I just came across EXIF, which is used for JPEG images only from what it seems. Google Picasa seems to read some of such Meta data at least.

    I noticed that the Camera model that took the picture showed up in image descriptions from time to time. Some digital cameras add this to the images.

    In addition to my wish that a standard will be adopted that works for all image formats (gif, png, jpg, bmp, etc.) and is widely supported by tools and services, I am wondering if Google already uses some of the scarce Meta Data in images and considers it for image search results and ranking.

    It has to come one day and then should the SE be ready and not just start looking into this subject after it took off.

    video is similar and more used than the meta data for images, however, there are different formats today, depending on the video container. WMV tags are different from AVI tags for example. If other formats like MOV/MP4 or FLI support meta tags, I don’t know. But same thing here… it will come one day, because it makes sense and would be right.

    More stuff to think about :)

  30. Hi Carsten,

    Good to see you.

    I suspect that as more people want to use the kind of meta data that you’re writing about, such as EXIF, that more standards will develop around it. Of course, those standards will have to address privacy concerns as well, which should be the basis for some interesting discussions. I’m looking forward to that kind of information being more readily available.

    I understand that the ID3 meta data for audio isn’t used as much in the US as it is in other places such as Europe. I think that’s a shame. I’d love it if my car radio had a display that showed me the name of the artist, song, and album as it was playing.

    Thanks.

  31. I think the implementation of this would be a great idea. The internet has moved on, with higher resolution images and hd videos just a download click away. It may be a good idea to analyse this type of media because the increase of internet speeds = increased quality of media + quantities.

    Take image content and how we use it today: more and more social image websites are cropping up because it’s in demand! So exploring this will be the way forward for ranking.

  32. Hi Simon,

    Good points. This does seem like an avenue that search engines should consider traveling down. The web will likely become more and more multi-media rich in the future.

  33. I think that image optimization has been playing a SMALL role for a while now. SEO’s have been saying to keep clear alt tags and try to fit the keywords you are looking to achieve within the tag if possible.

  34. Hi Graphic SEO,

    It’s likely that image optimization has been playing a role in how pages are ranked from a very early stage in the development of search engines. What I liked about this patent filing is that it provided us with more information on the kinds of things that search engines might be looking at when they do look at pictures. :)

  35. Very interesting article, thank you William! I’ll be very interested in seeing what Google implements when they roll out one of their ‘revolutionary’ updates! It’s the third page I’ve bookmarked on your blog for easy reference now!

    Thanks

  36. Hi Horsham Web Designer,

    Thanks. :)

    This patent application is Microsoft’s, but it wouldn’t be a surprise if Google was considering some of the same factors.

  37. Excellent article, I agree just because it’s a Microsoft patent doesn’t mean that Google isn’t using something very similar. I think image optimisation is exceptionally important, especially if you want the upper hand on your competitors.

  38. Hi Creare Design,

    Thanks. I do think that Google would have come to many of the same conclusions and ideas about interpreting, understanding, and optimizing images that Microsoft has. I’m not sure that there’s any way for them to avoid doing so.

  39. A great insight here. Image search really is a valuable addition to SEO that needs to be thought about; however, most people aren’t as sure about how to rank highly for images.

  40. Thanks. There really hasn’t been much from the search engines on what they might favor when it comes to ranking images, so I was happy to see this patent filing from Microsoft that provided some ideas of what they might be looking for.

  41. This sounds like pretty advanced stuff. I wonder how well this will work in practice though, especially when judging the quality of the actual image in terms of blurriness, etc. Does this mean that people who aren’t very technical with web design may suffer somewhat because they might not know how to include scaled pictures in text, for example?

    James

  42. Hi James,

    I did really like that they broke ranking elements down into categories on three levels – image, page, and site. If you spend some time thinking about it, I believe the approach makes a lot of sense. It’s not as simple as just thinking about the file name, alt text, possible caption, and maybe some other text on the page the picture appears upon, but I think it falls into common sense when you take a step back and consider what they are doing.

    I’m not sure that I have a problem with higher quality images meaning that a page might rank more highly. I think someone who puts forth some effort, even if they aren’t professional and very experienced designers, can come up with good quality images on pages. If a search engine considers the quality of a page, and all of them do in one way or another when they rank pages, then higher quality images on a page might be one signal that search engines may look at. Of course, it’s not the only signal.

    A Microsoft paper from 2005, Learning to Rank using Gradient Descent, describes a method of ranking documents on the Web where the researchers involved looked at 569 different features from web pages:

    Query dependent features are extracted from the query combined with four different sources: the anchor text, the URL, the document title and the body of the text. Some additional query-independent features are also used. In all, we use 569 features, many of which are counts.

    So, if the quality of images is one feature that might be considered, there are many others that may be weighed as well. Someone who doesn’t know how to scale their pictures well may not suffer too badly. But if they can master doing that, it may help at least a little. :)

  43. Hi Richard,

    A lot of what I’ve read in the past about possible ranking signals for images, and how images may influence the rankings of pages has been pretty limited. This patent added a good number of other possible signals that search engines might also consider. You’re welcome.

  44. Hi Webdesign Hamburg,

    I’m not sure that the days of search engines reading text in images are too far off. Considering acquisitions by Google of neven engineering, and the Optical Character Recognition and object recognition technology that they developed, I think we’re getting closer. Google’s investment in understanding images in their streetviews technology, as seen in a couple of patent filings from Google, shows that they have a strong interest in understanding what is seen in those images.

    See my post from a couple of years ago:

    Better Business Location Search using OCR with Street Views

    If Google is investing effort in reading signs from images found on streets in the world, text within images on web pages may not be as difficult a challenge as might be thought.

  45. I would agree with you there Bill. If you look at speed camera technology or the technology they use for the Congestion Charge cameras in London they already use image and optical character recognition so it’s highly likely search engines already use it to rank pages based on image data. For example, when you look at image and text data as a whole they can generally both be as abstract as each other, especially when trying to place that data within the context of a subject.

  46. Hi Rich,

    The growth of this kind of technology is incredible, and object and optical character recognition software is being used in cameras around the world. It’s hard to tell what stage the search engines are at in using this kind of technology on the Web, but as you note, they are able to compare what they see in text that they associate with images and what their recognition technology might tell them about those images. Those would be fun experiments to conduct.

    I do see a lot of images that are placed upon the Web that don’t have alt text or captions, or that are misleading in what they include as alt text. Having the ability to use recognition software could make indexing of those images much better.

  47. Image links and alt tags have been abused for years now and I don’t see search engines using them highly for relevancy anytime in the near future. Just ask a search engineer what they think about meta tags.

  48. Hi Jacob,

    There have been a number of whitepapers and patent filings over the past few years from the search engines that describe how they might use meta description elements for determining snippets, and the alternatives that they might use instead, and I’ve written about a number of those. They do a decent job of documenting how they feel about those.

    There have also been many that describe how they might use image links and alt attributes for ranking both images in image search and pages in organic search. The patent filing I wrote about in this post actually takes a fairly informed approach to looking at a wide range of ranking signals that goes beyond a reliance upon image links and alt attributes.

  49. Good grief, some of the attributes seem arbitrary. Blurriness? How are you supposed to optimize a monochromatic or purposely desaturated image? seems silly.

    Others are kind of genius. Multiple uses across the site, for example, would be an easy way to discern whether or not the image has something to do with the immediate content surrounding it, or if it’s just a generic icon or stock image… using sources like iStockPhoto could become a problem.

    But things like html5 page/article segmenting are probably a better use of time.. until you’ve exhausted everything else.

    Great article, though. Food for thought.

  50. Oh man, I just realized my comment was pretty late in the game. Bill, has any of this changed or proved itself important?

  51. Hi Houston SEO,

    What I liked about this patent is that it really expanded the kinds of things that search engines might be looking for to determine how important some images are on a page, and how much information gleaned about those images can impact that page’s rankings.

    I think there’s a lot of value in trying to do some more of the kinds of things listed here, especially for images that might be very meaningful to the content that might be on the pages of your site. If you have multiple images on the same page, and the ones that seem to be the most important (looking at some of the signals listed in the patent filing) focus upon the same topic/keywords/categories as the rest of the content of your page, then they may help to increase the ranking of that page.

  52. Do you think that an image could be used as anchor material from a referring site?

    Since text anchor links are known to be beneficial, would an image being used as a link instead of text be as good as text?

  53. Hi Durham,

    No, I don’t believe that an image used as an anchor would pass along quite the same value as actual text that a search engine can read and index.

    It’s possible that alt text associated with that image might have some value, but I’m not sure that it would count as much as anchor text.

  54. Interesting post, now in 2012 it seems that still the engines are fairly useless at reading anything but text. Maybe within the next couple of years technology that can assess an image’s content and/or relevance may come to fruition.