My last post was on examples from Google’s recent audio patent applications. Today, I’m going to take a closer look at the processes described in the patent filings. The two documents are:
- Using speech recognition to determine advertisements relevant to audio content and/or audio content relevant to advertisements
- Advertising with audio content
Assumptions behind Google’s advertising
Interactive advertising lets advertisers target ads to a receptive audience, and the relevancy of targeted ads makes it more likely that the ads will be useful to end users. Query-based keyword targeting has been effective in delivering relevant ads, and AdSense has been an effective way of serving ads that are relevant to the content found on Web pages.
In other words, serving ads relevant to concepts on Web pages and to keywords in search queries is useful because the ads often match current user interests.
Present-day audio advertising:
Ads for audio content, such as podcasts or Internet radio stations, often use a “reservation” model, in which advertisers reserve spots in audio streams for fixed fees. This reservation model may not maximize revenue for audio publishers, because many advertisers lack the resources to negotiate agreements for ad spots and so don’t compete for them.
Google wants to make audio advertising more targeted and relevant. A good part of the patent filings involves the advertising environment and how relevancy might be determined.
Audio advertising might be placed within:
- Radio programs,
- Live or recorded musical works with lyrics,
- Live or recorded dramatic works with dialog or a monolog,
- Live or recorded talk shows,
- Voice mail,
- Segments of an audio conversation.
The kinds of devices these ads might play on include:
- Desktop and laptop computers,
- Radios and car radios,
- Mobile telephones,
- Audio players.
They may be transmitted from:
- Terrestrial radio (or television, or telephony, or data) stations,
- Cable television (or radio, or telephony, or data) stations,
- Satellite radio (or television, or telephony, or data) stations,
- Audio content servers (e.g., Webcasting servers, podcasting servers, audio streaming servers, audio download Websites, etc.),
- The Internet,
- Telephone service providers via networks such as the Public Switched Telephone Network (“PSTN”) and the Internet.
Relevancy in Audio Advertising
Some of the ways that relevant information might be extracted or determined from audio documents:
- Analyzing audio content to derive textual information,
- Analyzing textual information for relevancy,
- Textual information may be derived through speech recognition,
- Converting audio to text through automatic speech recognition techniques, such as those described in Kai-Fu Lee’s Automatic Speech Recognition: The Development of the SPHINX System,
- Once a rough transcription is available, relevance information may be derived from the transcription and used to select relevant ads.
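As a rough sketch of that last step (the ad structure, the stop-word list, and scoring by simple keyword overlap are my own illustration, not anything specified in the filings), matching ads against a transcription might look like:

```python
from collections import Counter

# Tiny stop-word list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}

def relevance_terms(transcription: str) -> Counter:
    """Tokenize a rough transcription and count content terms."""
    tokens = [t.strip(".,!?").lower() for t in transcription.split()]
    return Counter(t for t in tokens if t and t not in STOPWORDS)

def score_ad(ad_keywords: set, terms: Counter) -> int:
    """Score an ad by how often its keywords appear in the transcription."""
    return sum(terms[k] for k in ad_keywords)

def select_ad(ads: dict, transcription: str) -> str:
    """Pick the ad whose keywords best match the transcription terms."""
    terms = relevance_terms(transcription)
    return max(ads, key=lambda name: score_ad(ads[name], terms))
```

For example, given an ad keyed on {"marathon", "running", "shoes"} and a transcription about marathon training, `select_ad` would pick that ad over an unrelated one.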
Relevancy information may include one or more of:
- Term vectors,
- Weighted term vectors,
- Weighted clusters,
- Categories (e.g., vertical categories),
- Weighted categories.
The clusters may be probabilistic hierarchical inferential learner (referred to as “PHIL”) clusters. Those are described in a couple of unpublished patent applications:
- “Methods and Apparatus for Probabilistic Hierarchical Inferential Learner”
- “Methods and Apparatus for Characterizing Documents Based on Cluster Related Words”
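Weighted term vectors lend themselves to standard similarity measures. This is not PHIL clustering (those applications are unpublished), just a minimal sketch of comparing a transcription’s weighted term vector against an ad’s, with raw term counts standing in for weights:

```python
import math
from collections import Counter

def term_vector(text: str) -> Counter:
    """Build a simple term-frequency vector (weights = raw counts)."""
    return Counter(w.strip(".,").lower() for w in text.split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two weighted term vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Under this scheme an ad’s vector that shares weighted terms with the audio transcription scores between 0 and 1, and the highest-scoring ad would be served.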
An audio publisher may annotate audio documents with textual information or encoded textual information in the audio content (e.g., in packets, portions of packets, portions of streams, headers, footers, etc.).
A radio broadcaster may provide, in its broadcast, a station identifier, a song identifier, an artist identifier, an album identifier, a program identifier, location information, etc.
In this case, genre and location information might be taken from the audio broadcast and used to target relevant ads.
Compact discs may encode information about an album, an artist, a list of songs, etc. Genre information may be taken from the artist, album, and/or songs, and the lyrics of the songs may be looked up.
A voice message could have an associated IP address, or a telephone conversation may have an area code, from which location information can be taken.
A program may be annotated with keywords, topics, etc. Such relevance information may be used to target relevant ads.
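A minimal sketch of that kind of metadata-driven targeting (the field names `genre` and `location`, and the exact-match rule, are my assumptions for illustration, not anything the filings spell out):

```python
def ads_for_metadata(ads: dict, metadata: dict) -> list:
    """Return ads whose targeting matches the broadcast annotations.

    `ads` maps ad names to targeting constraints (e.g., genre, location);
    an ad matches when every constraint it specifies equals the
    corresponding annotation on the audio stream.
    """
    return [
        name
        for name, constraints in ads.items()
        if all(metadata.get(k) == v for k, v in constraints.items())
    ]
```

So a jazz program annotated with a New York location could pull in both location-targeted and genre-targeted ads at once.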
Other relevancy information:
Audio information may be analyzed to generate other types of relevancy information about a speaker in voice audio content (e.g., a participant in a conversation), such as:
- Gender (e.g., from pitch, tone, etc.),
- Nationality, and/or
- Ethnicity (e.g., from language, accent, etc.).
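As a toy illustration of the pitch part of that idea (the naive autocorrelation pitch tracker and the 165 Hz threshold are my simplifications; real speaker classification uses far richer features than pitch alone):

```python
import math

def estimate_pitch(samples: list, sample_rate: int) -> float:
    """Estimate fundamental frequency via naive autocorrelation,
    searching lags that correspond to roughly 60-400 Hz."""
    best_lag, best_corr = 0, 0.0
    for lag in range(sample_rate // 400, sample_rate // 60):
        corr = sum(samples[i] * samples[i + lag]
                   for i in range(len(samples) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else 0.0

def likely_gender(pitch_hz: float, threshold: float = 165.0) -> str:
    """Crude pitch threshold between typical male and female ranges."""
    return "female" if pitch_hz >= threshold else "male"
```

Feeding this a 120 Hz tone yields “male” and a 220 Hz tone yields “female”; actual speech would need framing, voicing detection, and smoothing on top of this.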
The filings also reference these documents on relevancy for audio:
1) M. A. Siegler, U. Jain, B. Raj, and R. M. Stern, “Automatic Segmentation, Classification and Clustering of Broadcast News Audio,” (pdf) Proceedings of the Ninth Spoken Language Systems Technology Workshop, Harriman, N.Y., 1996;
2) Greg Sanders, “Metadata Extraction for EARS,” (link no longer available) Rich Transcription Workshop, Vienna, Va., 2002