Yahoo on How Internet Activity Can Predict Event Outcomes

Two men outside a fortune teller's shop, with a sign advertising a free reading with every article sold.

Imagine that you run a search engine, and you find a way to predict the outcomes of certain events fairly closely based upon internet activity such as browsing and search histories, clicks on search results, actions taken on social networking applications, and so on. The events might include the winners of American Idol, political election outcomes, weekend movie revenues, music album sales, attendance at sporting events, or television ratings for different shows.

What would you do with that power?

A Yahoo patent application published today explores how the search engine might use data about how people act on the Web to predict that kind of information.

How successful might Yahoo be at making such a prediction?

Keep in mind that Yahoo originally started out with the name “Jerry and David’s Guide to the World Wide Web,” and switched over to the name Yahoo, in part because to them it was an acronym for “Yet Another Hierarchical Officious Oracle.” As an “oracle,” a search engine should be able to make predictions.

The patent describes, for example, how they might take information about people’s activities on the Web, such as the search term popularity behind the Yahoo Buzz Index, and create a system that could predict the financial success of movies. From the patent filing:

For example, consider the movie Wall-e™ which was released the weekend of Jun. 27, 2008. The Buzz score for Wall-e™ for the 7 day period ending in the opening weekend was 45.92. According to the prediction algorithm 140 above, prediction application 104 would predict opening weekend box office revenues of approximately $61.91 Million.

It may be noted that the actual box office collection during that weekend for Wall-e was $63 Million dollars, thus indicating the accuracy of the methodology.
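
As a quick check on the figures quoted above, the gap between the predicted and actual opening weekend can be expressed as a relative error. This is only arithmetic on the two numbers from the patent filing; the variable names are illustrative, and a single data point says little about how accurate the method might be in general.

```python
# Figures quoted from the patent filing, in millions of US dollars.
predicted_musd = 61.91
actual_musd = 63.0

# Absolute gap between the prediction and the reported result.
abs_error_musd = abs(actual_musd - predicted_musd)

# Relative error, as a fraction of the actual box office figure.
rel_error = abs_error_musd / actual_musd

print(f"Absolute error: ${abs_error_musd:.2f} million")
print(f"Relative error: {rel_error:.2%}")
```

The prediction misses by a little over one million dollars, which is under 2 percent of the reported figure.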

The patent filing is:

Predicting the Outcome of Events Based on Related Internet Activity
Invented by Anurag Kumar, Supereeth Hosur Nagesh Rao, and N. S. Sekar
Assigned to Yahoo! Inc.
US Patent Application 20100205131
Published August 12, 2010
Filed: February 9, 2009

Abstract

Particular embodiments of the present invention are directed to systems and methods for predicting the outcome of events based on internet activity associated with such events. For example, an internet activity metric associated with one or more events may be determined. The internet activity metric may be based at least on the popularity of particular search terms related to the events. The outcomes of the events may also be determined.

The determined internet activity metrics and the outcomes of the one or more events may be analyzed to generate an algorithm for predicting the outcomes of subsequent events. For example, an internet activity metric associated with a particular subsequent event (e.g., based at least on the popularity of a particular search term related to the subsequent event) may be determined, and the algorithm may be applied to the internet activity metric to predict the outcome of the particular subsequent event.
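
The abstract does not say what form the generated algorithm takes, and the patent's own "prediction algorithm 140" is not reproduced here. As a minimal sketch of the general idea, assuming a simple least-squares line that relates an activity metric to an outcome, the two steps (learn from past events, then apply to a new one) might look like the following. The function names `fit_line` and `predict` are hypothetical, and every number in the example is made up for illustration; none of it is Yahoo data.

```python
from typing import Sequence, Tuple


def fit_line(metrics: Sequence[float], outcomes: Sequence[float]) -> Tuple[float, float]:
    """Fit outcome = slope * metric + intercept by ordinary least squares."""
    n = len(metrics)
    if n < 2 or n != len(outcomes):
        raise ValueError("need at least two (metric, outcome) pairs of equal length")
    mean_x = sum(metrics) / n
    mean_y = sum(outcomes) / n
    # Sum of squared deviations of the metric, and its co-deviation with the outcome.
    sxx = sum((x - mean_x) ** 2 for x in metrics)
    if sxx == 0:
        raise ValueError("all metric values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(metrics, outcomes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model: Tuple[float, float], metric: float) -> float:
    """Apply a fitted (slope, intercept) pair to a new event's metric."""
    slope, intercept = model
    return slope * metric + intercept


# Hypothetical past events: an activity score and the outcome that followed
# (say, opening weekend revenue in millions). Invented values only.
past_scores = [10.0, 20.0, 30.0, 40.0]
past_outcomes = [15.0, 28.0, 41.0, 54.0]

model = fit_line(past_scores, past_outcomes)

# Hypothetical score for an upcoming event.
upcoming_score = 50.0
print(f"Predicted outcome: {predict(model, upcoming_score):.2f}")
```

A real system would presumably need many more past events, and some way to measure how far its predictions fall from actual outcomes, before anyone relied on it.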

Conclusion

The patent application provides a fair amount of detail behind the prediction algorithms that could be used, as well as some other examples, but I’m not sure if this is the most impressive Oracle-type patent in Yahoo’s IP closet.

Back in 2008, I wrote about Techniques for Searching Future Events, in the post The Oracle at Yahoo: Using Yahoo News to Search the Future.


19 thoughts on “Yahoo on How Internet Activity Can Predict Event Outcomes”

  1. With respect to short term events, I believe that this prediction system would be fairly accurate. It is definitely possible. I read somewhere (unfortunately I don’t remember where) that Google was doing the same thing with Twitter data real-time. Looks like you can now get the results before they are released based on the buzz. Thanks for sharing, Bill.

  2. This is too good to be true. But the $61.91 Million revenue prediction is really close to what was achieved, $63 Million. So, I guess event outcome prediction really is possible as well as believable.

  3. Great post! I love all the technical points you bring out in your blog. Speaking of predicting the future, after reading your post I wrote a blog post based on this Yahoo patent that you might find interesting. I was thinking about what this could mean for internet marketing – specifically, if this technology that Yahoo now has becomes mainstream, wouldn’t internet marketers try to alter the predictions of events, and therefore possibly alter the actual outcome of these events? It’s a very interesting idea. Thanks again!

  4. Hi Mark and Andrew,

    It’s hard to tell how accurate this system might be, but I would love to see more predictions/results than just the one they included on revenues from Wall-E. Hopefully someone at Yahoo will publish a paper showing more, to give us an idea of how accurate this system might be.

    I hadn’t seen anything about Google trying to use Twitter data to make similar predictions – will have to look for that. Thanks.

  5. Hi Michael,

    Thanks. Interesting post.

    I think to a degree, internet marketers do often try to use search volume data to try to make predictions, by looking at things like search volumes for keywords when doing keyword research.

    Yahoo tells us in the patent filing that a system like theirs might be useful, for example, to theater owners to help them decide which movies to show, and how much room they should allocate to specific movies in their theaters, and how many showings they should have.

    Marketing does often involve creating buzz about new topics, and changing public discussion, and altering the outcomes of future events as well. A tool like this might be helpful in making decisions about directions to follow.

  6. New marketing idea for the future. Psychic computer hotlines…er…websites. Connect, pay a monthly or one-time fee and get your psychic reading from the world’s best psychic computer! Thoughts?

  7. The ability to reliably predict financial gains and losses? Oh I can’t imagine anything bad coming from that.
    Like, say people start relying on Yahoo’s financial prediction algorithm, and then it predicts a catastrophe so everyone bails out – entire markets collapse and it becomes a self-fulfilling prophecy.

    Between this story and this one about the CIA and Google working together on ‘recorded future’, I’m thinking about getting some tinfoil and folding it into a hat.

    As a marketer, though, I’d love to see the data.

  8. Very interesting. I just finished reading The Predictioneer’s Game by Bruce Bueno De Mesquita. Would be fascinating to get his take on the concept. Search engines are not public services, at least not the Big 3, but perhaps they could take a hint from a relevant Prime Directive.

  9. I wish I could find out when I will win the lottery! Kidding; since I’m not playing, the answer is pretty predictable.

    The art of prediction, or rather the science of prediction, since it is based on facts, is I guess used on a large scale, especially by Google and Yahoo search. Google accounted for 72.17 percent of all U.S. searches conducted in the four weeks ending May 29, 2010. Just the thought that almost 3 out of 4 Americans are using Google search on a daily basis shows how much information about the market is available to these two big corporations.

    And then, of course, come Twitter and other social media websites with their buzz, which also give a lot of information about users, viewed of course as consumers.

  10. I see the point of the patent, but only 12% of Internet surfers use Yahoo!. And they don’t have access to Google’s online activity databases. So I wonder how accurate their predictions would be using such a small portion of the population. Like to see more predictions. lol…

  11. Hi Jason,

    It does sound a little silly that way, but I do think that it might be possible to come up with some interesting business decision making tools based upon information about internet activity.

  12. Hi Pavlicko,

    Does decision making become more risky or less risky when we start making decisions based upon algorithms where we don’t have much information about how the results of those algorithms are attained? I would say that there’s a considerable amount of risk in that kind of situation, and it might lead to self-fulfilling prophecies.

    Having said that, I’d love to see the data as well.

  13. Hi SEO Bever,

    Very good points. Prediction does seem to be at the core of what search engines provide. The concept of PageRank itself is a prediction of the likelihood that you will arrive at a certain page from anywhere else on the Web. Probabilities, and confidence scores and statistical language models are at the heart of many of the functions of a search engine.

    Regardless of whether or not a search engine offers an explicit type of prediction engine, the information that they do provide in response to our many queries can and often does influence our actions and the decisions that we make.

  14. Hi Ali,

    If Yahoo is providing answers to more than 3 billion queries a month, which is one claim that I’ve seen from a Yahoo search scientist who was working on snippets generation for Yahoo (and who is now with Bing), I would say that’s a significant amount of information to try to analyze and understand. I’m not sure that they could be faulted on having a small sample size. :)

  15. Hi Peter,

    You’re welcome. Things are starting to get interesting at Yahoo again these days, with a new CEO, and a number of new board members.

  16. Two points on this topic:

    1) As a marketer and amateur sociologist, I find the distribution of “raw search text” a fascinating indicator of how people actually “think” about a subject. E.g., given a choice of 5 – 10 logical words to describe a topic, which two prevail and get 80% of the impression opportunities? And what are the third and fourth words in the query string? Google is a (manipulated) version of a global subconscious.

    Of course, you can have a little fun with this. For example, I ran across an SEO firm that ranked their client #1 for the “polite euphemism” the client used to describe their business (which wasn’t something everyone wants on their business card) and few others did. They declared victory and moved on. The client now had the #1 slot in a 2,000 search/month keyword. And was effectively unlisted for half a dozen larger keywords (representing 500,000 searches/month) the CUSTOMERS actually used to describe the service, but that the firm was effectively “blinded” from ranking for by their own egos. Thanks to Google AdWords and Alexa, it was easy to quantify the gap :)

    Second – search is great, but the firehose I’d really like to use to run an event-driven hedge fund is LinkedIn. Imagine a feed of how often profiles were updated by company (particularly if you could detect the level of stealth involved – what % of people turned off the auto-publish feature)….

    I suspect a properly modeled view of that would be a few weeks ahead of the typical newspaper…

  17. Hi John,

    I have a friend who likes to go through the web sites of big companies and look for executives’ names and look up their listings on LinkedIn, to see if they do have listings, to see if they are still working for the company, to get an idea of what they do, and so on. Funny thing is, a lot of times business websites continue to list employees weeks, and even months, after they’ve left, as indicated by LinkedIn. I guess LinkedIn is perceived by people as being useful enough as a networking tool and a way to possibly get new jobs that people will update their profiles there a lot quicker than their former employers do.

    It is interesting to think about how popular search terms related to specific events might tell you something about those events, but I agree with you that LinkedIn seems to be an interesting place to gather data as well.

    Vanity keywords can be extremely harmful to businesses. It eludes me sometimes why someone might want to rank for a term that no one will search for, to their own detriment.
