How might a search engine approve or reject ads automatically, without human review, on the basis that the ads are annoying or displeasing in some way?
If you didn’t consider the very large volume of ads submitted to Google every day, you might think that Google manually reviews every ad that advertisers submit for publication, which would take a lot of people. While ads should attract some attention, they shouldn’t be annoying or offensive. Google has set a number of standards for image ads, video ads, and text ads.
A patent application from Google goes into a good amount of depth on how it might take a programmatic approach to identifying ads and Web pages that are “annoying.”
The patent filing describes some of the methods used when reviewing images and text and audio, with tools like Optical Character Recognition and pattern matching against large databases of images and sounds. It also details how Flash and animated images might be reviewed, but is silent on what it is looking at when it refers to things like a “Trust Score.”
Detecting and rejecting annoying documents
Invented by Deepak Jindal and Anurag Agarwal
Assigned to Google.
US Patent Application 20070133034
Published June 14, 2007
Filed December 14, 2005
Abstract
A system and method for evaluating documents for approval or rejection and/or rating. The method comprises comparing the document to one or more criteria determining whether the document contains an element that is substantially identical to one or more of a visual element, an audio element or a textual element that is determined to be displeasing.
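The abstract’s test of whether an element is “substantially identical” to a known displeasing one can be illustrated with a small sketch. The patent application does not say how the comparison is made; the average-hash technique, the function names, the tiny 4x4 grayscale thumbnails, and the `max_distance` tolerance below are all assumptions chosen for illustration.

```python
def average_hash(pixels):
    """Return one bit per pixel: 1 where the pixel is brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p > mean else 0 for p in flat)


def hamming(a, b):
    """Count the positions at which two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def is_substantially_identical(pixels, known_hashes, max_distance=2):
    """True if the image hash is within max_distance bits of any known hash."""
    h = average_hash(pixels)
    return any(len(h) == len(k) and hamming(h, k) <= max_distance
               for k in known_hashes)


# Hypothetical thumbnails (grayscale values 0-255).
flagged = [[0, 0, 255, 255], [0, 0, 255, 255],
           [255, 255, 0, 0], [255, 255, 0, 0]]
near_copy = [[10, 5, 250, 240], [0, 12, 255, 251],
             [245, 255, 3, 0], [250, 249, 20, 8]]
unrelated = [[0, 255, 0, 255] for _ in range(4)]

# Stand-in for the database of displeasing elements.
known = [average_hash(flagged)]
print(is_substantially_identical(near_copy, known))
print(is_substantially_identical(unrelated, known))
```

A slightly altered copy of a flagged image still matches because small brightness changes rarely flip a pixel from one side of the mean to the other, while an unrelated pattern differs in many bits.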
Internet advertisements could contain offensive language or annoying actions such as flashing or strobing or be of poor image quality. Manual review of an ad is one possible way to filter such ads, but reviewing ads can require a lot of time and expense.
The sheer volume of ads may make manual review practically impossible. Processing ads using a machine-implemented process and/or without human input or intervention would be an ideal approach.
The method in this patent application would use a document processor that looks at images, sound files, and other data to identify text, images, spoken words, and actions in the ad. Optical character recognition (OCR) technology would be used to extract text from ads so that it can be reviewed and rated.
It is possible that this system could be used to assess Web content other than advertisements, such as ordinary web pages.
The kinds of information that might be reviewed include:
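Once text has been pulled out of an image, screening it is a matter of matching it against a list of terms. The sketch below assumes the OCR step has already produced a string; the blocklist, the function names, and the normalization rule are hypothetical, since the patent application does not publish any of them.

```python
import re

# Hypothetical blocklist; the patent application does not publish one.
DISALLOWED_TERMS = {"scam", "miracle cure"}


def normalize(text):
    """Lowercase and turn runs of punctuation/whitespace into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def find_disallowed(extracted_text, terms=DISALLOWED_TERMS):
    """Return the blocklisted terms that appear as whole words or phrases."""
    padded = f" {normalize(extracted_text)} "
    return sorted(t for t in terms if f" {normalize(t)} " in padded)


print(find_disallowed("MIRACLE   Cure!! Click now"))
print(find_disallowed("Fresh flowers delivered"))
```

Padding with spaces means a term only matches on word boundaries, so “scammer” does not trigger the “scam” entry. A real system would likely need something more tolerant of OCR misreads.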
- Document information,
- Document performance information,
- Document characteristics rating information,
- Sensitivity rating information,
- Suitability standard information,
- Trust score information,
- Provider information,
- Link information, and;
- Other information.
Document information may include:
- The document itself,
- Any languages used in the document,
- Length information,
- Information regarding the types of files in the document (e.g., html, doc, zip, etc.),
- Type of document (advertisement, educational document),
- Summary information,
- Audio content (e.g., song lyrics),
- Visual content (e.g., pictures of faces),
- Pornographic content,
- Other offensiveness content (e.g., use of potentially offensive words),
- Programming code,
- Image quality,
- Actions associated with the document,
- Age-related content,
- The identity of the document owner and/or the document creator,
- Information about the document’s intended audience (such as geographic area, age range, gender, race, national origin, religion, other demographic information), and;
- Any other information related to a document or to the server, providers, or document sources.
A characteristics database may identify documents of a certain type, based upon certain features, such as:
- Subject matter,
- Characteristics rating,
- Aggregate characteristics rating,
- Sensitivity score,
- Characteristics type,
- Language,
- Geographic origin (e.g., country or city of origin),
- Geographic area of target audience,
- Document source,
- Owner of content,
- Creator of content,
- Target demographic,
- Actions (such as image flashing),
- Image movement,
- Hardware usable by the document (such as a mouse, game controllers, camera, or microphone),
- Whether user interaction is provided by the document (which may indicate a game),
- Whether the document’s programming involves random number generation, or;
- Other criteria.
Documents may be identified according to:
- Their offensiveness/appropriateness characteristics,
- Associated keywords,
- Associated site (e.g., a site explicitly or implicitly linked from the document),
- Status of associated site (e.g., whether a link in a document is broken and/or points to an invalid URL),
- Flesh content (e.g., state of undress of human images),
- Pornographic or other prurient content,
- Adult content,
- Drug or alcohol related content,
- Children’s content.
The documents may also contain annoying actions such as:
- Flashing,
- Strobing,
- Repetitive movement,
- Infinitely looping animation,
- Use of streaming video and/or audio,
- Open network connections,
- Game playing,
- Poor image quality, or;
- Other actions which the provider may wish to use for criteria for approving or rejecting a document.
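The approval step implied by the list above can be pictured as a simple rule check: detect which actions a document performs, then reject it if any are on the provider’s disallowed list. This is only a sketch under that assumption; the action labels, the `Review` type, and `review_document` are hypothetical names, and the patent application also describes ratings and scores that this ignores.

```python
from dataclasses import dataclass, field

# Hypothetical labels named after the actions listed above.
DISALLOWED_ACTIONS = frozenset({
    "flashing",
    "strobing",
    "infinite_loop",
    "streaming_media",
    "open_network_connection",
})


@dataclass
class Review:
    approved: bool
    reasons: list = field(default_factory=list)


def review_document(detected_actions, disallowed=DISALLOWED_ACTIONS):
    """Approve only if none of the detected actions are disallowed."""
    reasons = sorted(set(detected_actions) & disallowed)
    return Review(approved=not reasons, reasons=reasons)


print(review_document({"strobing", "user_interaction"}))
print(review_document([]))
```

Keeping the disallowed set as a parameter reflects the last list item: the provider decides which actions count as grounds for rejection.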
The patent goes into great detail on how Flash documents might be reviewed, using text extraction, checking for disallowed actions such as streaming audio or video or opening network connections, looking for games, reviewing the use of sound and video, and policing for infinitely looping animations.
Animated GIFs would be checked for flashing, flickering and jiggling images.
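One way to picture a flashing check is to measure how often the overall brightness of an animation swings sharply between frames. The sketch below assumes the frames have already been decoded into a mean brightness (0-255) and a display duration per frame; the `min_delta` and `max_rate` thresholds are illustrative assumptions, not values from the patent application.

```python
def flashes_per_second(brightness, durations_ms, min_delta=50):
    """Estimate flash rate from per-frame mean brightness and frame durations."""
    if len(brightness) != len(durations_ms):
        raise ValueError("need one duration per frame")
    total_s = sum(durations_ms) / 1000
    if total_s <= 0:
        return 0.0
    # A swing is a large brightness change between consecutive frames.
    swings = sum(1 for prev, cur in zip(brightness, brightness[1:])
                 if abs(cur - prev) >= min_delta)
    # Count two swings (for example dark to light and back) as one flash.
    return (swings / 2) / total_s


def is_flashing(brightness, durations_ms, max_rate=3.0):
    """True if the estimated flash rate exceeds the assumed limit."""
    return flashes_per_second(brightness, durations_ms) > max_rate


strobe = [0, 255] * 5            # alternating black and white frames
print(is_flashing(strobe, [100] * 10))
print(is_flashing([100, 110, 120, 115], [500] * 4))
```

Whole-frame brightness misses flicker confined to a small region, so a fuller check would probably look at blocks of the image rather than a single average.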
Static images may be checked for poor quality and layout problems.
Images intended to trick a user, such as those showing text boxes, drop-downs, and buttons that aren’t functional but are just part of the picture, would be checked for using edge/corner detection techniques.
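To give a feel for why edge and corner detection helps here: buttons and text boxes are drawn as crisp rectangles, so finding closed rectangular outlines in an image is a hint that it may be imitating a form control. The patent application names the technique but not an algorithm, so the brute-force search below is purely an assumption. It works on a prepared edge map (a list of strings where `#` marks an edge pixel) that stands in for the output of a real edge detector.

```python
def find_box_outlines(edges, min_w=4, min_h=3):
    """Return (top, left, bottom, right) for each closed rectangular outline.

    `edges` is a list of equal-length strings; '#' marks an edge pixel.
    """
    rows, cols = len(edges), len(edges[0])

    def on(r, c):
        return edges[r][c] == "#"

    boxes = []
    for top in range(rows):
        for left in range(cols):
            if not on(top, left):
                continue  # a box must start at an edge pixel (its corner)
            for bottom in range(top + min_h - 1, rows):
                for right in range(left + min_w - 1, cols):
                    horizontal = all(on(top, c) and on(bottom, c)
                                     for c in range(left, right + 1))
                    vertical = all(on(r, left) and on(r, right)
                                   for r in range(top, bottom + 1))
                    if horizontal and vertical:
                        boxes.append((top, left, bottom, right))
    return boxes


fake_button = [
    "........",
    ".######.",
    ".#....#.",
    ".######.",
    "........",
]
print(find_box_outlines(fake_button))
```

This only handles thin, axis-aligned outlines on a tiny grid. Rounded corners, shading, and photographs that happen to contain rectangles would all need a more careful approach.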
My gosh... frankly, I’m stunned that they could build a program like this. Technology continues to blow me away.
When you say that the ads are annoying, Bill, if Google deemed an ad to be this way…what would the outcome be? They would reject it? They wouldn’t allow it to be shown on their properties?
And, if a web page were found to be annoying, do you think that would mean a lowering in the serps, a penalty, banning?
I’m trying to get a handle on Google’s ultimate goal with this. It’s so interesting!
Kind Regards,
Miriam
Hi Miriam,
It was a surprise to see the processes described in this patent application, but it makes sense for them to try to automate this process as much as possible.
If the ads were annoying, chances are that the result would be for Google to not show them. For an annoying page on the web, I would suspect that they might filter those out (I am extremely interested in what they might actually do with those, but we aren’t given a lot of clues in that area).
The easiest of the “annoying” documents to point out are the kinds of ads that have buttons or text boxes on them that look like you could interact with them, but are in actuality an image link to another page – tricking people into clicking upon them.
Likewise, ads that are too bright, too animated, and take too much attention away from the pages they are placed upon may create a negative impression of the ads for both consumers and publishers who display image ads on their sites.
Myspace would be an ideal testing ground, wouldn’t it?
There’s a rich diversity of images and information on those pages.
Well at least myspace gives them a perfect testing ground for the new technologies as they are developed.
Thanks, Bill.
Ultimately, I think this will be a good thing. I really dislike obnoxious advertising…so does everyone. It’s nice to think that it could be reduced through an effort like this.
I really enjoyed this post!
Miriam
Thanks, Miriam,
Unfortunately, Google doesn’t seem to have found a way of stopping scam websites from advertising, but I hope they do soon.
Hi Craig
I’m not sure how easy it is for us to tell how effective Google has been on that point because we don’t have access to the advertisements that they might see every day. Hopefully they are successfully filtering a lot of those sites out.
At least Google is considering doing something to stop companies from just willy-nilly placing an ad that flashes and gets in people’s way. I don’t know if Google will ever be able to stop all bad ads, but at least they are trying.
Hi Craig,
Having to manually check each ad is an impossible endeavor that just wouldn’t scale well at all, so the idea that Google has come up with an algorithmic approach to filter ads isn’t surprising. What’s described in the patent does seem like a pretty thoughtful and intelligent approach.