Every business has the potential to gather an incredible amount of data, and a business like Yahoo can be an extreme case.
What kinds of things can you do with that data? Where do you even begin? What interesting patterns might you find in very large databases? How can that data be used within your business processes? What does it mean for blogs, and photo-sharing sites, and social search?
What does the use of that data mining mean to searchers, and to advertisers?
At the KDD 07 conference (Knowledge Discovery and Data Mining) in San Jose this past August, Usama Fayyad, Yahoo’s Chief Data Officer and Executive Vice President of Research & Strategic Data Solutions, presented on a couple of topics and was interviewed. Videos of those sessions are now available on the Web.
The videos offer a unique perspective on how Yahoo uses data mining in its everyday business (including advertising), and an interesting look at knowledge discovery and data mining itself, especially when it needs to scale to very large databases.
If you only have time to watch one of these videos, make it the first one, which provides some wonderful insights into how Yahoo works.
This presentation covers a number of Yahoo case studies on how data mining might be used with advertising and behavioral targeting to increase relevance to users and value to advertisers, and it also discusses social media. Definitely skip past the first 8 minutes or so of introduction, and use the full screen mode (to the left of the volume control on the player).
In one case study involving Harris Direct (25:25 minutes in), people who viewed a graphical ad for Harris Direct (building brand awareness) on a Yahoo property were compared with a control group who didn’t. Up to a week later, those who had seen the ad were 60% more likely to search in that category, 147% more likely to click on algorithmic results for that brand, and 249% more likely to click through on sponsored results.
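Figures like "60% more likely" are relative lifts: the rate in the exposed group compared with the rate in the control group. As a minimal sketch of that arithmetic, assuming it is computed as a simple ratio of rates (the presentation does not spell out the method), something like the following would do. The function name and the example rates are hypothetical and are not taken from the Harris Direct study.

```python
def relative_lift(exposed_rate: float, control_rate: float) -> float:
    """Return how much more likely the exposed group was to act, in percent.

    Assumes both arguments are rates (for example, searches per user)
    measured over the same time window for each group.
    """
    if control_rate <= 0:
        raise ValueError("control_rate must be positive")
    return (exposed_rate / control_rate - 1.0) * 100.0


# Hypothetical rates chosen only to illustrate the calculation;
# they are not numbers from the presentation.
control_search_rate = 0.010   # 1.0% of the control group searched in the category
exposed_search_rate = 0.016   # 1.6% of the ad viewers searched in the category

lift = relative_lift(exposed_search_rate, control_search_rate)
print(f"Ad viewers were {lift:.0f}% more likely to search in the category")
```

Note that a lift of 147% means the exposed rate was about 2.47 times the control rate, not 1.47 times.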
The presentation also delves into the topic of social media, including blogs, and sites like Flickr and YouTube. When Usama first heard about Flickr, he wondered why someone would want to share their images on the Web. The numbers show how wrong he was, with 50 million users adding images in the first year after Yahoo acquired the service.
What makes Flickr so compelling? User tagging, commenting, user-distributed content, viral growth, and user-developed functionality built on the open API and development tools. Flickr is an entire ecosystem with 50 million users, run by 10 employees. He asks, “What makes a community grow?” and “What makes a community die?”
Search is in its early infancy, with 2.8 words in a query as the typical model we think about. When Yahoo started hiring economists, those people attended meetings and were shocked that deep business decisions were made by search engineers. In an example for Yahoo Dating, one economist introduced the concept of scarcity, with a limited number of roses that could be given by users each month, making some messages “special.” That helped the service grow, and was well liked by users.
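The rose idea is essentially a per-user monthly quota. As a rough sketch of how such a scarcity rule could be modeled (this is not Yahoo's implementation; the class name, the limit of three roses, and the month key format are all assumptions made for illustration):

```python
from collections import defaultdict


class RoseAllowance:
    """Tracks a hypothetical monthly rose quota per user."""

    def __init__(self, roses_per_month: int = 3):
        # The limit of 3 is an assumed value; the talk only says "limited".
        self.roses_per_month = roses_per_month
        # Maps (user_id, "YYYY-MM") to the number of roses already sent.
        self._used = defaultdict(int)

    def send_rose(self, user_id: str, month: str) -> bool:
        """Attach a rose to a message if the user has any left this month."""
        key = (user_id, month)
        if self._used[key] >= self.roses_per_month:
            return False  # Quota exhausted: the message goes out without a rose.
        self._used[key] += 1
        return True


allowance = RoseAllowance(roses_per_month=2)
print(allowance.send_rose("alice", "2007-08"))  # first rose this month
print(allowance.send_rose("alice", "2007-08"))  # second rose this month
print(allowance.send_rose("alice", "2007-08"))  # over the limit
```

Because the supply is capped, a message that carries a rose signals that the sender chose to spend something scarce on it.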
Social search – computer vision is a very hard problem. The ESP game, from CMU, was aimed at getting people to label images for free. Many images got labeled in this game in a very short period of time. How does this work in Flickr? Tags come from the person who took the picture and put it online, and from visitors who came later and added their own tags (people like to tag).
How does this type of image tagging roll over into social search? What motivations inspire people to participate (not always money)? How do you fight spam?
Yahoo Answers is people search. It went through three generations, including the use of PageRank, before the present version. Search engines can’t answer the kinds of questions that Yahoo Answers can, like “Where can I find a good plumber in Atlanta?” Yahoo Answers has approached 100 million users in a year. A mechanism of community voting and rating people helps make these unverifiable answers worth listening to. Unlike Google Answers, which was a paid service, Yahoo Answers is free. Yet people take the time to answer questions that a search engine wouldn’t be able to answer.
Usama Fayyad describes the evolution of his personal approach to understanding what is important in data mining, including why he joined Yahoo, which he calls the world’s largest data source.