Google Mass Media Patent Applications


Google Mass Media Patents Disclose an Interest in Entertainment

Is it time for a new way to rate television shows? Could television and radio give us a much richer and more interactive experience than they presently do?

The world of broadcast media has followed a pretty simple approach for years – broadcasters present entertainment and information, mixed with advertisements.

Google has been publishing a number of papers and filing patent applications that describe some innovations in the broadcast arena which are worth looking at closely. These would allow people to interact with television and radio, receive additional real-time content, interact with others watching or listening to the same broadcast, see real-time popularity ratings, and bookmark mass media.

They also provide ways for people to advertise in some interesting ways. Imagine personalized information layers that correspond to shows you are watching on TV:

For example, while watching a news segment on a celebrity, a fashion layer is presented to the viewer on a television screen or a computer display device, which provides information and/or images related to the clothes and accessories the celebrity is wearing in the news segment. Additionally, personalized layers may include advertisements promoting products or services related to the news segment, such as a link to a clothing store that is selling clothes that the celebrity is wearing.

Two new Google mass media patent applications describe some of those innovations. How far off are these kinds of innovations? Frankly, I’d like to see some of them as quickly as possible, though the privacy-minded part of me worries about how Google might collect this information by recording sounds and possibly taking pictures of audience members.

Social and Interactive Applications for Mass Media
Invented by Michele Covell, Shumeet Baluja, and Michael Fink
Assigned to Google
US Patent Application 20070130580
Published June 7, 2007
Filed: November 27, 2006


Systems, methods, apparatuses, user interfaces and computer program products provide social and interactive applications for mass media based on real-time ambient-audio and/or video identification.

Detecting Repeating Content in Broadcast Media
Invented by Michele Covell, Shumeet Baluja, and Michael Fink
Assigned to Google
Published May 31, 2007
Filed: November 27, 2006


Systems, methods, devices, and computer program products provide social and interactive applications for detecting repeating content in broadcast media. In some implementations, a method includes: generating a database of audio statistics from content; generating a query from the database of audio statistics; running the query against the database of audio statistics to determine a non-identity match; if a non-identity match exists, identifying the content corresponding to the matched query as repeating content.
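The method in that claim can be sketched in a few lines. Assuming a toy representation where each window of broadcast audio has already been reduced to a small vector of statistics (the vectors, distance function, and threshold below are illustrative stand-ins, not the filing's actual audio statistics), a "non-identity" match is simply a close match against a *different* window, which flags the content as repeating:

```python
from math import sqrt

def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_repeats(windows, threshold=0.1):
    """Flag windows whose statistics closely match a *different* window
    (a 'non-identity' match), suggesting repeated content such as an ad."""
    repeats = set()
    for i, query in enumerate(windows):
        for j, ref in enumerate(windows):
            if i != j and euclidean(query, ref) < threshold:
                repeats.add(i)
                break
    return repeats

# Toy 'audio statistics': an ad's signature appears at windows 0 and 3.
stats = [
    [0.9, 0.1, 0.5],   # ad
    [0.2, 0.8, 0.3],   # show segment A
    [0.4, 0.4, 0.7],   # show segment B
    [0.9, 0.1, 0.5],   # ad again (repeat)
]
print(find_repeats(stats))  # → {0, 3}
```

In this reading, an identity match (a window matching itself) is uninformative; only a match at a different position in the database marks content, such as a commercial, that has aired more than once.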

Imagine being able to jump into a chat room or message board while watching a CNN broadcast, and being able to talk with others who are watching the same show. Or check into a social network where you can access a program guide for television or radio, and find ratings for those programs by people you know or folks who share similar interests with you.

A mass media personalization network from a system like this might work with:

  • Desktop or portable computers
  • Telephones with display screens
  • Televisions
  • Portable media players/recorders
  • Personal digital assistants (PDAs)
  • Game consoles

Personalized information sent to a viewer or listener could include:

  • advertisements
  • personalized information layers
  • popularity ratings, and
  • information associated with a commenting medium (e.g., ad hoc social peer communities, forums, discussion groups, video conferences, etc.).

Privacy Issues

The method involved in this Google mass media system might include a microphone, which may monitor and record the ambient audio in your living room while a broadcast is playing. Yes, this system would monitor what you are watching on television, or listening to on the radio. I see some possibly significant privacy issues there. The patent application seems to address those well, pointing us to a paper titled Computer Vision for Music Identification.

What happens is that snippets of the recorded sound are transformed into images. The audio used to create those images can’t be recovered from them:

The descriptors are sent to the audio database server 104. In some implementations, the descriptors are compressed statistical summaries of the ambient audio, as described in Ke et al. By sending statistical summaries, the user’s acoustic privacy is maintained because the statistical summaries are not reversible, i.e., the original audio cannot be recovered from the descriptor. Thus, any conversations by the user or other individuals monitored and recorded in the broadcast environment cannot be reproduced from the descriptor. In some implementations, the descriptors can be encrypted for extra privacy and security using one or more known encryption techniques (e.g., asymmetric or symmetric key encryption, elliptic encryption, etc.).
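A toy illustration of that one-way property follows. This is not Ke et al.’s actual wavelet-based descriptor; the frame, band count, and bit scheme are invented for illustration. Each audio frame is reduced to a few coarse bits, so many different waveforms map to the same descriptor and the original audio cannot be recovered from it:

```python
def descriptor(frame, n_bands=4):
    """Compress an audio frame to a short bit-tuple of coarse band
    energies. Many frames map to the same descriptor, so the original
    audio cannot be recovered from it (a one-way summary)."""
    band = max(1, len(frame) // n_bands)
    energies = [sum(s * s for s in frame[i:i + band])
                for i in range(0, band * n_bands, band)]
    mean = sum(energies) / n_bands
    # One bit per band: is this band louder than the frame average?
    return tuple(int(e > mean) for e in energies)

speech = [0.1, -0.2, 0.9, -0.8, 0.05, 0.02, 0.3, -0.3]
print(descriptor(speech))  # → (0, 1, 0, 0)

# A different (here, louder) waveform yields the same descriptor,
# showing why the summary cannot be inverted back into audio.
print(descriptor([s * 2 for s in speech]))  # → (0, 1, 0, 0)
```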

In some implementations, the descriptors are sent to the audio database server 104 as a query submission (also referred to as a query descriptor) in response to a trigger event detected by the monitoring process at the client-side interface 102. For example, a trigger event could be the opening theme of a television program (e.g., opening tune of “Seinfeld”) or dialogue spoken by the actors. In some implementations, the query descriptors can be sent to the audio database server 104 as part of a continuous streaming process. In some implementations, the query descriptors can be transmitted to the audio database server 104 in response to user input (e.g., via remote control, mouse clicks, etc.).
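The trigger-event flow quoted above might look something like this sketch, where the trigger table, the descriptor tuples, and the submit callback are all hypothetical stand-ins for the client-side monitoring process and the audio database server:

```python
# Hypothetical table mapping known descriptors to trigger events,
# e.g., a show's opening theme.
KNOWN_TRIGGERS = {
    (1, 0, 1, 0): "Seinfeld opening theme",
}

def monitor(descriptor_stream, submit):
    """Watch the stream of ambient-audio descriptors and submit a
    query to the (hypothetical) audio database server whenever a
    trigger event is detected."""
    for d in descriptor_stream:
        if d in KNOWN_TRIGGERS:
            submit({"descriptor": d, "event": KNOWN_TRIGGERS[d]})

sent = []
monitor([(0, 0, 1, 1), (1, 0, 1, 0), (0, 1, 0, 0)], sent.append)
print(sent)  # → one query, triggered by the opening theme
```

The filing also mentions continuous streaming and explicit user input (a remote-control press, a mouse click) as alternative ways to generate query submissions; those would replace the trigger check with an unconditional or user-driven submit.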

How do these audio snippet descriptors lead to personalization?

An example from the Google mass media patent application addresses that pretty well:

For example, if the user is watching the television show “Seinfeld,” then query descriptors generated from the show’s ambient audio will be matched with reference descriptors derived from previous “Seinfeld” broadcasts. Thus, the best matching candidate descriptors are used to aggregate personalized information relating to “Seinfeld” (e.g., news stories, discussion groups, links to ad hoc social peer communities or chat rooms, advertisements, etc.).

In some implementations, query descriptors from different viewers are directly matched rather than matching each query with a database of reference descriptors. Such an embodiment would enable the creation of ad hoc social peer communities on subject matter for which a database of reference descriptors is not available. Such an embodiment could match in real-time viewers who are in the same public forum (e.g., stadium, bar, etc.) using portable electronic devices (e.g., mobile phones, PDAs, etc.).
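That direct peer-matching embodiment can be sketched as a simple grouping by descriptor, under the simplifying (and hypothetical) assumption that viewers hearing the same broadcast produce identical query descriptors:

```python
from collections import defaultdict

def group_viewers(queries):
    """Match viewers' query descriptors directly against each other:
    viewers with the same descriptor are presumed to be hearing the
    same broadcast and are grouped into an ad hoc peer community."""
    groups = defaultdict(list)
    for viewer, desc in queries.items():
        groups[desc].append(viewer)
    # Only groups with more than one viewer form a community.
    return [sorted(v) for v in groups.values() if len(v) > 1]

queries = {
    "alice": (1, 0, 1, 0),
    "bob":   (1, 0, 1, 0),   # same ambient audio as alice
    "carol": (0, 1, 1, 0),
}
print(group_viewers(queries))  # → [['alice', 'bob']]
```

Note that no reference database is consulted; the communities emerge entirely from viewer-to-viewer matches, which is what makes the approach workable for content Google has never indexed.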

Popularity Ratings

The audio snippets could be used to allow this system to create and collect a wide range of statistics, such as:

1) The average number of viewers watching the broadcast;
2) the average number of times viewers watched the broadcast;
3) other shows the viewers watched;
4) the minimum and peak number of viewers;
5) what viewers most often switched to when they left a broadcast;
6) how long viewers watch a broadcast;
7) how many times viewers flip a channel;
8) which advertisements were watched by viewers; and,
9) what viewers most often switched from when they entered a broadcast.

Different types of popularity ratings could be determined from these types of statistics.

1) Demographic group data or geographic group data;
2) determinations of “what’s hot” that could be shown to people in real time;
3) the popularity of a television broadcast versus a radio broadcast, by demographics or time;
4) the popularity of times of day, i.e., peak watching/listening times;
5) the number of households in a given area;
6) the amount of channel surfing during particular shows (genres of shows, particular times of day);
7) the volume of the broadcast;
8) and so on.
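A crude version of the real-time “what’s hot” rating could be computed from a running log of descriptor matches. The log format and viewer IDs here are invented for illustration; the idea is just to rank broadcasts by distinct matched viewers in the current window:

```python
def whats_hot(match_log, top_n=2):
    """Rank broadcasts by how many distinct viewers' descriptors
    matched them in the current time window -- a crude real-time
    popularity rating of the kind the filing describes."""
    viewers_per_show = {}
    for viewer, show in match_log:
        viewers_per_show.setdefault(show, set()).add(viewer)
    ranked = sorted(viewers_per_show.items(),
                    key=lambda kv: len(kv[1]), reverse=True)
    return [(show, len(viewers)) for show, viewers in ranked[:top_n]]

# Each entry: (viewer ID, show their ambient audio matched).
log = [("v1", "Seinfeld"), ("v2", "Seinfeld"), ("v3", "News"),
       ("v1", "Seinfeld"), ("v4", "News"), ("v5", "Seinfeld")]
print(whats_hot(log))  # → [('Seinfeld', 3), ('News', 2)]
```

Counting distinct viewers (rather than raw matches) keeps a single household that matches repeatedly from inflating a show’s rating, which is roughly what a ratings service would want.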

Advertisers and content providers could also use popularity ratings to dynamically adjust the material shown in response to those ratings. Since ads are shorter than most other content, they may be affected most by this kind of measurement, because they can easily be adjusted based upon statistics about viewership.

Broadcast Measurement Boxes and Image Capture Devices

One version of this system may use some dedicated hardware: a broadcast measurement box (BMB), a simple device similar to a set-top box, but without a connection to the broadcast device. It would be used to monitor the ambient sounds from which the descriptors described above are created.

A camera, video recorder, or other image capture device might also be used to measure how many people are watching or listening to a broadcast, applying a pattern-matching algorithm to the gathered images to determine the number of viewers present in a broadcast environment during a particular broadcast. The information collected could be combined with the audio descriptors to provide more data for the system to “gather personalized information for a user, compute popularity ratings, or for any other purpose.”

Google Mass Media Patents Conclusion

The idea of Google’s new Street View feature in Google Maps is a little concerning. I did write above that part of the process described in this document could involve pictures taken of people watching TV or listening to the radio. I’m not fond of the idea of a “Living Room View” in Google. From what I read, I’m assuming that the pictures themselves wouldn’t be sent back to Google, but rather just information derived from them, such as the number of people present.

The sounds being recorded wouldn’t be transmitted to Google as sounds, but rather as images (descriptors) that couldn’t be transformed back into sounds. Those descriptors could, however, be matched against similar descriptors from other viewers, or from television shows, movies, or songs.

The recent cancellation (and resurrection) of the TV show Jericho makes me wonder whether changing the way television shows are rated might not be such a bad idea. A system like this could transform what we watch on television and listen to on the radio, but it would require letting Google collect a lot of information about what we view and listen to. Is that a step that we are willing to take?

A couple of Google papers provide a deeper look into Google’s mass media work:

Social- and Interactive-Television Applications Based on Real-Time Ambient-Audio Identification (pdf)

Audio Fingerprinting: Combining Computer Vision & Data Stream Processing (pdf)

Added (6-8-2007 @ 5:00 pm EST)

Interesting post at Marketing Pilgrim by Paul Bennett on the Nielsen service – Mobile Metering To Follow? A short snippet:

Mobile Vector, the venture’s first product, will segment existing TV metering by wireless carrier using “in the home” media behavior.

Not quite sure how they will work that out. Paul notes that a full report on the subject will be out in July 2007.

