How a Search Engine Might be able to Tell Whether an Image is an Advertisement

Recently, many are pointing to Google’s Panda update as one that considers things like how much advertising and where advertisements are located on a web page as indications of the quality of a Web site.

Of course, there are likely other factors that the search engine would consider when scoring a site based upon quality signals, but the ratio of advertising to content seems to be an important signal.

[Image: someone posting a wall advertisement between two other posters on a brick wall.]

Many sites rely upon advertisements as a source of revenue, and being able to offer ads that are relevant to a visitor’s informational and transactional needs isn’t a bad thing. Many sites referred to as content farms primarily offer enough information to have their pages rank highly in search engines for certain terms without providing a range and depth of information on topics related to those terms.

Their goal is to attract visitors and get them to click upon advertising found on their pages. Some of those content farm sites do this by using content found on other pages on the web, with or without permission or the licensing of that content. Others hire people to write articles at very low prices, and don’t focus much upon the actual quality of that content.

Over at Google’s Webmaster Central help forums, a Google Search Engineer started a thread for people who believe that they were negatively impacted by the Panda Update but “shouldn’t” have been. At the time of writing this post, there are more than 700 responses in that thread, “Think you’re affected by the recent algorithm change? Post here.”

Since advertising seems to be one of the things considered in the update, I thought it was interesting to see a patent filing published from Microsoft that explores the differences between images that are used as advertising, and images that aren’t. I imagine that some people who were carrying text-based ads like Google’s AdSense might consider replacing some of the textual ads with image-based ads. I’m not sure that would make much of a difference. As the Microsoft patent points out, there are ways to determine whether an image is an ad.

The Microsoft patent filing describes a machine learning approach that learns to classify images as advertisements by looking at a range of features associated with those images. The patent application is:

Classification of Images as Advertisement Images or Non-Advertisement Images
Invented by Mingjing Li, Zhiwei Li, Dongfang Li, Bin Wang
Assigned to Microsoft
US Patent Application 20110058734
Published March 10, 2011
Filed: November 12, 2010

Abstract

An advertisement image classification system trains a binary classifier to classify images as advertisement images or non-advertisement images and then uses the binary classifier to classify images of web pages as advertisement images or non-advertisement images. During a training phase, the classification system generates training data of feature vectors representing the images and labels indicating whether an image is an advertisement image or a non-advertisement image.

The classification system trains a binary classifier to classify images using training data. During a classification phase, the classification system inputs a web page with an image and generates a feature vector for the image. The classification system then applies the trained binary classifier to the feature vector to generate a score indicating whether the image is an advertisement image or a non-advertisement image.
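To make the two phases in the abstract a little more concrete, here is a minimal sketch in Python. The abstract doesn't name a learning algorithm, so the logistic regression used here is my own assumption, and the three features and the tiny training set are made up purely for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(vectors, labels, epochs=500, learning_rate=0.5):
    """Training phase: fit weights to labeled feature vectors (1 = ad, 0 = not an ad)."""
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            predicted = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = predicted - y
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias

def score(model, x):
    """Classification phase: a score near 1 suggests an ad, near 0 a non-ad."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical binary features per image:
# [links off-site, short-and-wide shape, ad-like word nearby]
TRAINING_VECTORS = [
    [1, 1, 1], [1, 1, 0], [1, 0, 1],             # labeled as ads
    [0, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0],  # labeled as non-ads
]
TRAINING_LABELS = [1, 1, 1, 0, 0, 0, 0]

model = train(TRAINING_VECTORS, TRAINING_LABELS)
print(score(model, [1, 1, 1]))  # an image showing every ad-like feature
print(score(model, [0, 0, 0]))  # an image showing none of them
```

Notice that in this toy data no single feature settles the question; an image that only links off-site is still labeled as a non-ad. The point is that the score comes from the features in combination.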

While this process comes from a Microsoft patent filing, chances are that Google could be looking at very similar information to distinguish between images that are ads and images that aren’t.

There are a number of reasons to distinguish between advertising and non-advertising images. One is that a search engine wouldn’t want to include images that are advertisements in its image search results. Another might be to get a sense of how much advertising is on a page compared to non-advertising content.

Feature Types Associated with Classification

The patent filing provides us with four different possible feature types that the search engine might look at, and examples or reasons why these features are useful in determining whether or not an image is an advertisement. These include text features, link features, visual layout features, and content features.

A text feature looks at words that might be associated with an image that could be found in places like the URL of the image, the ALT text from the image, or text that surrounds the image. These words might provide an indication that the image is an advertisement, including words like “pop-up” and “advertisement.”
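A rough sketch of what a text feature extractor might look like follows. The two keywords are the ones mentioned above; a real list would presumably be much longer. The function name and the shape of the output are my own invention rather than anything taken from the patent.

```python
# Keywords named in the post; any real system would likely use a longer list.
AD_KEYWORDS = ("pop-up", "advertisement")

def text_features(image_url, alt_text, surrounding_text):
    """Flag whether each ad-like keyword appears in each text source for an image."""
    sources = {"url": image_url, "alt": alt_text, "surrounding": surrounding_text}
    features = {}
    for source_name, text in sources.items():
        lowered = text.lower()
        for keyword in AD_KEYWORDS:
            features[f"{source_name}_has_{keyword}"] = int(keyword in lowered)
    return features

example = text_features(
    "http://www.example.com/images/banner.gif",
    "Advertisement",
    "Close this pop-up to keep reading.",
)
print(example)
```

Each flag would become one slot in the feature vector that gets handed to the classifier.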

Link features are ones involving where links associated with an image might point. Does the image link to a page on the same site where the image is found, or to a different location which might be an advertisement server or a web page where an advertised item might be purchased?
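The same-site question is easy to sketch with Python's standard `urllib.parse` module. The list of ad server hosts below is hypothetical; I'm only assuming that a search engine would keep some list like it.

```python
from urllib.parse import urlparse

# Hypothetical list of known advertisement server hosts.
AD_SERVER_HOSTS = {"ads.example.net"}

def link_features(page_url, link_url):
    """Compare the host an image links to with the host of the page it appears on."""
    page_host = urlparse(page_url).hostname
    link_host = urlparse(link_url).hostname  # None for relative links such as "/about"
    return {
        "links_off_site": int(link_host is not None and link_host != page_host),
        "links_to_known_ad_server": int(link_host in AD_SERVER_HOSTS),
    }

print(link_features("http://www.example.com/post", "http://ads.example.net/click?id=1"))
print(link_features("http://www.example.com/post", "/gallery/photo-1"))
```

A relative link has no host of its own, so the sketch treats it as pointing to the same site.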

A visual layout feature identifies where an image is visually laid out within a web page.

Content features of an image relate to the content of the image itself, and can include:

  • Aspect ratio of the image,
  • Image format,
  • Whether the image is a photograph or a graphic,
  • Size of the image,
  • Number of different colors of the image,
  • Percentage of gray area of the image, and
  • An indication of whether the image has high contrast.

Some of these content features may be helpful in determining whether or not an image is an ad.

We’re told by the authors of the patent that an aspect ratio may be useful in identifying banner advertisements which tend to be short and wide.

The image format can identify images with multiple frames (used to provide animation).

Distinguishing between photos and graphics is important because most ads tend to be graphics which incorporate information about an advertising offer.

Size is useful because ads need to be at least a certain size to be noticed.

The number of colors used in a graphic is important because advertising images generally have fewer different colors than non-advertisement images.

Gray area is worth considering because “advertisement images often have varying shades of gray as a background.”

Contrast is looked at because advertisement images usually are created to have sharp contrast.
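Several of these content features can be computed directly from an image's dimensions and pixel values. The sketch below is only an illustration: it takes pixels as a plain list of (red, green, blue) tuples, treats a pixel as gray when its three channels are equal, and uses a contrast threshold of 128 that I picked arbitrarily. None of those choices come from the patent. Image format and the photograph-versus-graphic distinction are left out.

```python
def content_features(width, height, pixels):
    """Compute simple content features from a list of (r, g, b) pixel tuples."""
    # Approximate perceived brightness of each pixel (standard luma weights).
    luminances = [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels]
    gray_pixels = sum(1 for r, g, b in pixels if r == g == b)
    return {
        "aspect_ratio": width / height,    # banners tend to be short and wide
        "area": width * height,            # ads need a minimum size to be noticed
        "distinct_colors": len(set(pixels)),
        "gray_fraction": gray_pixels / len(pixels),
        # Assumed threshold: a wide gap between darkest and brightest pixel.
        "high_contrast": (max(luminances) - min(luminances)) > 128,
    }

# A tiny 4x1 "image": white, light gray, black, and pure red.
tiny = content_features(4, 1, [(255, 255, 255), (200, 200, 200), (0, 0, 0), (255, 0, 0)])
print(tiny)
```

Working from a raw pixel list keeps the sketch free of any image library; a real system would decode the image file first.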

Conclusion

While this patent provides a number of examples of how a search engine might decide whether or not an image is an advertisement, chances are that there are other things the search engine might look at as well.

In a system like the one that Google is using to rerank search results based upon the “quality” of pages, chances are that when the search engine is looking at the advertisements on pages it isn’t just looking at the text-based ads, but also at images that are advertisements.

52 thoughts on “How a Search Engine Might be able to Tell Whether an Image is an Advertisement”

  1. Bill – While this comment is not related to the classification of advertising, we are testing the impact of image classification. Previously, we pro-actively reused images on our site in order to take advantage of the fast downloading of cached images. However, since the Panda update, we are experimenting with cropping images so that it may appear to a search engine with an image classification system that there is additional unique content on the site. It is too early to get a read on this test. Given that it reduces the usability of our site and slows download speed, it is not an effort that we will expand unless we see positive results for search rankings.

  2. Advertising image classification certainly creates conflicts of interest for search engines. Google would benefit financially from penalizing sites that had too many image ads, unless they were AdWords display ads. I am not suggesting that Google is favoring their ad publishers, as the negative publicity if they were found to be doing so could outweigh the benefits of the increased revenue, but it is another example of a Google conflict of interest.

  3. “We’re told by the authors of the patent that an aspect ratio may be useful in identifying banner advertisements which tend to be short and wide.”

    That sounds like one of the few useful things you can do, since aspect ratio and placement are, to some extent, standardized throughout large portions of the industry.

    But otherwise it’s just too vague, iffy, and unrelated to the core user experience. It’s right up there with how alt-text somehow gets a substantial boost in Google, even though the vast majority of users don’t know what alt-text is, much less look at it.

  4. So that’s what they are calling the latest google (spank),err..update. Interesting. I find it humorous to be looking for something online and find a forum post from 5 years ago that is not even relevant any more at the top of page one and some article that is exactly on point buried on page 15. I don’t think google can ever program relevance into an AI. Just sayin’.

  5. I heard that Google was changing up its algorithm, I wasn’t sure what would be changing though. I knew it was changing its search engine, but the picture aspect is new, I hadn’t heard this before—I’m sure as soon as we figure it out, it’ll change to something else.

  6. I think you can still get away with it for quite some time when you have a very large image ad broken down into several pieces. I think there is still some work to do for the engineers to make this system bullet-proof, while it will always hit some innocent people.

  7. Hi Tracy,

    I’m afraid that I’m not sure that cropping images to make them a different size is something that might have too much of an impact given Google’s image similarity search technology, but it does sound like an interesting experiment.

  8. Hi Sandy,

    There are a few different reasons for a search engine like Google to take a step like this. The simplest one is that it really doesn’t want to index advertising images and associate them with the web pages that they appear upon.

    It’s also important that the search engine doesn’t deliver searchers to pages that have very little helpful content, and are covered with ads, regardless of whether those ads are text-based or image-based. I believe that advertising is a very valid business model for many sites, and I’m not arguing here for or against the use of ads on web pages. Instead, I thought people would find it interesting to see how a search engine might distinguish between images that are advertisements and images that aren’t, especially in light of Google’s Panda update which seems to be concerned about the quantity and placement of advertising on web pages.

  9. Hi Max,

    I’m not sure how your analogy to alt text as a ranking signal applies here. How much advertising a web page contains, and whether that advertising sits in places where most visitors expect to see content, is an important part of a user’s experience on that page. Those visitors don’t need to be technically proficient in things like aspect ratios, image resolution, use of alt text, or other related topics. When you visit a page and have to scroll down past big blocks of both text-based and image-based ads, see ads set up in ways that make them look like navigation for a page, and so on, it’s probably not seen by visitors as a good experience.

  10. Hi Kathy,

    I think Google would be the last to say that they’ve developed an artificial intelligence that can distinguish between relevant and irrelevant pages, or pages that are higher quality or lower quality. We’re probably a long way away from a computer being able to look at a page and make that kind of judgment. Instead, what we have is an algorithm that looks at certain signals from a page to compare it to other pages. It’s quite possible that a system like that may be flawed based upon the assumptions programmed into it, and the data that it’s been given as examples of higher quality sites.

  11. Hi Michael,

    Google has been changing and amending its ranking algorithms on a regular basis for more than a few years. Representatives from Google like their head of Web Spam, Matt Cutts, have noted publicly that they’ve been averaging around 400 changes a year.

  12. Hi Andreas,

    Good point, though I suspect that someone may have considered the possibility that images found in an analysis like this might be broken down into different tiles. Slicing images into pieces wasn’t that uncommon an approach among designers concerned about things ranging from people stealing images to load times for those images.

    The “visual layout feature” described in the patent may be one way of identifying when an image is tiled like that.

  13. Hi David,

    I’m not sure that would be too much of a problem by itself.

    Reducing file sizes by doing things such as reducing the number of colors in an image might make that image potentially seem more like an ad, but that’s just one feature amongst many. Chances are really good that this algorithm wouldn’t place too much weight upon any one feature. Things like where an image is pointing to might potentially carry more weight than how well optimized an image might be to load quickly.

    I’d also say that optimizing the file size and an image size are usually worth doing, especially if you use a number of images on a page and the original sizes can negatively impact the loading times of a page. I see too many sites where images are way too large.

  14. Another excellent post, although I suspect that Google’s algorithms (and those of other search engines) are already quite adept at distinguishing images from adverts; there’s one simple – albeit a bit superficial – test of this, and that’s looking at how many adverts are returned in image searches. The number of adverts returned is low in most cases.

    Ironically, even for a search on ‘advert’ most of the results that are returned are non-adverts.

    One important element that’s missed from the post (unless I’ve overlooked it) is the position of the adverts on a page, which ties in with one of your earlier posts on page segmentation. Adverts are usually located in several key areas.

    Cheers

  15. This is interesting, Bill; however, being skeptical, there is the danger that Google misreads some pictures and misinterprets them.
    They may need to set up a help line for people that have had their pictures misjudged.

  16. Hi Bill,
    Even though I’ve had some of my more commercial websites up for almost 4 years (e.g. one sells actual goods that need to be shipped by freight to customers), complete with what I believe is good quality and descriptive content on all my pages, it astonished me when I went from a page rank of 3 to a page rank of 1 after one of Google’s changes. And, at that time, I didn’t even HAVE any external ads anywhere on the site.

    Since then, that particular site has fared a little better w/ my page ranking (inching back up, and now at a 2/10) – and now I DO have some external ads – some w/ photos and some w/o. I guess these algorithms are just a bit beyond my understanding. But, I just keep plugging away, doing the best I can to add appropriate content. It’s sometimes tough when you have a site that only deals with a specific number of products. Sometimes there’s only so much one can write about then.

    Kath

  17. Brilliant information. Images play an important role in ranking. I think that images carry more weight than text due to the fact that a picture is worth more than 1000 words.

  18. Bill you write some interesting topics that I can sit and discuss forever, my comments on this could be larger than the post itself.

  19. Hi Michal,

    There’s no denying the value of the right image on a page or in an article. You can describe a shoe to someone, but you really need to show them a picture of it (and even better, a couple of images from different angles) before they might buy it online. You can tell someone about a news event, but an image that helps them visualize what happened adds tremendously to that story.

    I don’t know if the image itself carries more weight than the text on a page because much of the rankings associated with images tend to rely upon text associated with those images – file name, alt text, captions, possibly some text surrounding the image, but I think that when those types of signals are added to the other text-based signals on a page, and match up well together, it does have an impact. See my post from a couple of years ago, “How Search Engines May Use Images to Rank Web Pages.”

  20. Hi Twosteps,

    We’re on the same wavelength there – one of the first things I asked myself as I was reading through the patent was how many advertising images I could recall seeing in Google’s image search. While that might seem to be a rough measure, I think it’s one worth considering.

    The patent does include looking at “visual layout features,” which involve where an image might appear on a page. The patent didn’t go into much detail regarding those features, and I didn’t either, but I agree with you that it’s the kind of thing that could be used with a segmentation process to add one more thing to examine in deciding whether an image is an advertisement or not.

    I have seen people put pictures in sidebars, often where others would put ads, that might be rotating images from a flickr account or images that might be used to help describe or define different sections of a site. So the placement of an image by itself may not be a strong signal – but that’s why the process in this patent looks at a range of signals.

  21. Hi Dave,

    There’s always the chance that Google and their algorithms might misread or misinterpret something. That’s part of the reason why I try to pay as much attention as I do to patents like this one. This patent on its own isn’t really groundbreaking, but it does give us lots of details about the depth of analysis that a search engine might go to in deciding whether or not an image is an advertisement, and it raises a lot of questions that we can ask ourselves about how a search engine might be interpreting what it sees on our pages.

  22. Hi Kathy,

    The toolbar pagerank you see is a little misleading. It supposedly is tied only to PageRank, and gives us a hint at the quantity and quality of links pointed to our pages. It’s not updated very frequently, and so only can be seen as a snapshot of what the pagerank of a page might have been at one point in time in the past.

    Signals that might help determine how relevant a page might be for a particular query, or the quality of the content of our pages isn’t something that we can learn from toolbar pagerank.

    If the pagerank of a page on your site went from a 3 to a 1, it’s possible that the reason was that some links that you might have had to the page were lost, or that the pages that were linking to you may have lost some of their pagerank. But the toolbar pagerank doesn’t tell you anything about how Google feels about advertising on your pages or the quality of your content.

    Now, if a page on your site isn’t showing any pagerank in the toolbar, that could be because the page is newer than the last time Google updated toolbar pagerank, or it might be a sign that Google hasn’t crawled and indexed that page. You could check with a “site” search to see if it is included in Google’s index (see this page from Google for more about their search operators).

  23. Hi James,

    Thank you. I’ve found myself writing comments on blog posts, and wondering if I should take what I’ve written and turn it into a blog post instead, with a link to the original post, and I’ve done that sometimes, and published the comment as it was other times. It’s not always easy to decide what might be the best way to approach those, but I appreciate the thought that goes into your comments.

  24. Hi Bill,

    “Google feels about advertising on your pages or the quality of your content”.

    So Google has an issue if a site that sells widgets is indexed for the keyword “we sell widgets” but also displays advertising that is not related to widgets?

  25. Just my two cents about any kind of images and SEO:

    Alt Tags (attributes!) are useful when used correctly
    Use Alt Tags for humans first, Google second
    Describe the image in an accessible manner
    I surmise Alt Tags would be heavily policed – spam them at your own risk
    Use empty ALTs for design elements, descriptive text for pictures
    Opinion – Optimised ALT tags are about as useful as a link to a page
    without the word on it that’s in the anchor text. It’s second order.

  26. Interesting thoughts and information. It’d be interesting to see the amount of advertisement banners removed from search. This seems to plague my mind with the question though – what if you’re specifically searching for an advertisement in the image search? Seems a little difficult to assess.

  27. Thanks for the informative post.

    I’m pretty sure search engines can tell the difference between an advert and an image. A check of Google image search would tend to confirm that. I think you’re right in saying that the Panda update seems to be mostly about quantity and placement of ads.

    I’m wondering though about link-through banner ads that are primarily charity related, and if this type of banner could be taken as a commercial-type advertisement?

  28. I wonder how many of the images on the web are related to commercials and how many to content. I mean, how would it look as a percentage? It may be a big problem for Google, because I think right now there are really a lot of ads on websites.

  29. Do you guys think it’s possible to “overuse” the ALT tags for images? I recently re-did my site and optimized the ALT tag for each individual page (about a 30 page site). I didn’t just repeat keywords, I think I have about a sentence in the ALT tag. Is that overdoing it?

    Great information here, I love reading this site!

  30. Hi Dave,

    What I said was that:

    Recently, many are pointing to Google’s Panda update as one that considers things like how much advertising and where advertisements are located on a web page as indications of the quality of a Web site.

    A lot of the articles and forum commentary that I’ve been seeing mention those things as possibilities, but Google hasn’t necessarily said that. I think the Wired interview with Amit Singhal and Matt Cutts gives us a better sense of what Google is saying about this update – TED 2011: The ‘Panda’ That Hates Farms: A Q&A With Google’s Top Search Engineers

  31. Hi Peter,

    I agree with you that alt text can be very helpful when done right.

    When it comes to a search engine attempting to distinguish between when an image is an advertisement or a non-advertisement, alt text can be helpful, but I think search engineers are right in wanting to look at as many signals as they can when making that distinction.

    I’m not talking about people using images in malicious ways with this post, but rather a process by which a search engine might understand the purpose of an image on a page better.

    There can be a few reasons for making this distinction. A couple of them could include:

    1. A search engine probably doesn’t want to fill up image search with advertising images
    2. If a search engine includes a thumbnail from a page (like they often do with news articles), they would want to use an image that supports the story rather than an ad that might accompany it.

  32. Hi Christian,

    I don’t know how interested Google might be in showing images that are advertisements in image search or pulled into web search from image search. I imagine that they would want to avoid that.

  33. Hi Jay,

    I would think that the search engines would still consider images that are advertisements for nonprofits to be advertisements. I think their focus is more upon what they believe that searchers would want to see when they perform a search.

  34. Hi Mike,

    I’m not sure, but I’m tempted to do a few thousand image searches, and see how many of the images I see are advertisements.

  35. Hi Nick,

    A sentence’s worth of words as alt text probably wouldn’t be a problem. I’m not sure that I’ve ever seen a specific amount of words or characters one should use as alt text for an image, but the W3C does recommend that when your alt text starts getting too long, you use a longdesc.

    I do think it’s possible to overuse alt text, and in some pretty spammy ways. I’ve seen people do it before. I would guess that it’s something that might have the potential to harm your rankings for pages if you placed a lot of spammy content in alt text for images.

  36. Hi Bill, for the most part banners comply with IAB sizes, i.e., 300×250, 728×90, 160×600. I created a banner a month ago and a week later after searching for the product in the banner, it appeared in Google image search, first row. I just checked it today after reading your post and it’s not showing up in the first page section. This particular banner wasn’t being included with Google Ad Manager or OpenX, just a simple anchor link. Maybe it takes into account file naming conventions, e.g., campaign-name_300x250_en_v1.gif; naming files with the dimensions is pretty standard.
    However, I found that custom sized banners for the homepage for example were still indexed and weren’t left out of the first section of Google Image Search results. It seems Google’s got IAB’s standard ad sizes number!
    Thanks for the post Bill!

  37. Hi Bill,

    An informative post, so much so that it was cited by Rand Fishkin last Friday at the SEOmoz Seminar in London. Being new to the industry I’ve been on a quest to soak up as much info as possible. I definitely reach the point of information overload at times. My suspicions were validated at the SEOmoz seminar that your blog is a trusted source. Having been lucky enough to attend I took the mention of your site as a glowing referral from the community and will definitely be back for more.

    Cheers!

  38. Hi King,

    Thanks for sharing your experiences with advertisements showing up (or not showing up) in image searches. The patent that I wrote about was Microsoft’s rather than Google’s, but I would definitely say that Google does something similar.

    It’s likely that the search engines have a good idea of standard sizes of banner advertisements, and file name standards as well. Interesting that your custom sized banners were indexed and showing in Google’s image search.

  39. Hi Daniel,

    Good to meet you.

    Thanks for sharing that information about Rand’s mention of my post. I didn’t know he mentioned it. There is a lot of information to absorb about SEO, which is why I like spending a lot of time with the patents and whitepapers from the search engines, and part of the reason why I share information about those here. I consider my blog to be my workbook – a place where I can revisit my research, and share it with others.

  40. You say “We’re told by the authors of the patent that an aspect ratio may be useful in identifying banner advertisements which tend to be short and wide” does this imply that we should avoid using any image size similar to an ad, or the exact sizes? Any clarifications here?

  41. I would imagine alt text counts very little in the determination. Aren’t a lot of the ad images that search engines want to avoid created by savvy online marketers? Does this make sense to anyone else? I understand it as a determining factor in all traditional ways, but for ad distinction, it seems less easily manipulated areas are going to be more important.

  42. Well surely alt text I think is not relevant (very simple to make it to confuse spider) but what i don’t understand is .. if i have images (photos) with really high contrasts (and I have someone) them could be affected by this new evaluating system.
    But if on a website on 1000 images uploaded I have only 100 can all the website be affected by this problem.. or only pages where are places the 100 images?.

  43. Gustissimo, can you restate that question, slightly confusing even for me :/

    Alt text is most definitely relevant, based on page content, name of the image, and many other factors that allow analysis of the alt text.

    If you have photos of high contrast? Do you mean high resolutions (bigger images in width and height)?

  44. Hi Herman,

    I’m not suggesting or advising that you do anything to purposefully manipulate the way that you use images on your site, but instead rather sharing information from Microsoft’s patent filing on how they attempt to analyze images that they find on web pages.

    If you use images on your pages as content, you probably don’t want them to be interpreted as advertisements. And if you use images as advertisements, you probably don’t want those interpreted as images for your content. The patent describes a number of factors that explain how they might distinguish between the two, and it seems like they’ve done a good job of differentiating between ads and non-ads.

  45. Hi John,

    I’m not sure why a marketer would really want their advertising image to be seen as something other than an ad. I agree that the search engines probably pay more attention to things other than alt text when making that decision. Features such as placement on a page and where the image might link to are probably more helpful than the alt text being used.

  46. Hi Gustissimo

    Like James, I’m not quite sure what it is that you are asking.

    This process looks at a number of different features to try to distinguish between images that are advertising and images that aren’t, so if your nonadvertising images tend to be high contrast (a feature often common in ads), but don’t meet a number of the other features such as linking to pages where people can buy products or services, chances are that high contrast alone isn’t going to get them classified as ads.
