Imagine mining millions of news pages and other documents for mentions of events that are scheduled to happen in the future, and using those mentions to help anticipate what is coming.
This kind of future search, or future retrieval, could support decision making in many different fields.
Information about possible future events could be extracted from news stories and made searchable, to help people plan ahead.
The Yahoo patent application is:
Techniques for Searching Future Events
Invented by Ricardo Alberto Baeza-Yates
Assigned to Yahoo
US Patent Application 20080040321
Published February 14, 2008
Filed August 11, 2006
Under this process, time would become a standard part of information collected about documents. A ranking model would be built based on time segments.
Much of the news does contain information about future events. The patent application tells us:
An exemplary sample from a web-based news service on Dec. 1st, 2003, included more than one-hundred thousand references to years 2004 and beyond. About 80% of the references related to the immediate future (e.g., within days, weeks, or a few months) and, on average, more than one future reference was included per article.
We estimated that there were at least half a million references to future events in the sample. Assuming that there is a ten-fold repetition redundancy (i.e., similar articles in different newspapers), this yielded an estimate of about fifty thousand unique articles about the future. A similar analysis only on headlines gave around 10% of that number.
The application also looks closely at references to future events in a sample from a date in 2005:
In a sample taken from the same news service on Jul. 15th, 2005, the number of references to years 2006 or later was over 250 thousand. For example, for the year 2034, news items relating to the following topics were included in a sample of almost 100 news items:
(1) The license of nuclear electric plants in Arkansas and Michigan will end;
(2) The ownership of Dolphin Square in London must revert to an insurance company;
(3) Voyager 2 should run out of fuel;
(4) Long-term care facilities may have to house 2.1 million people in the USA; and
(5) A human base in the moon would be in operation.
So, when searching for “energy” or “health” in the future, a future retrieval system should return, for example, items 1 and 4, preferably classified by year. On the other hand, when searching for “2034” and “space,” the system should return items 3 and 5.
This kind of future search would include an information extraction system that would recognize expressions of time, dates, and durations, as well as how likely it is that certain events will happen.
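To make the extraction step more concrete, here is a minimal sketch in Python, not the method described in the patent application. It assumes that future references show up as four-digit years and that the modal verb in a sentence hints at how certain the event is. The function name, the regular expressions, and the confidence numbers are all hypothetical and chosen only for illustration.

```python
import re
from datetime import date

# Hypothetical confidence weights for modal verbs. These numbers are
# illustrative assumptions, not values from the patent application.
MODAL_CONFIDENCE = {
    "will": 0.9,
    "must": 0.9,
    "should": 0.7,
    "may": 0.5,
    "would": 0.4,
}
DEFAULT_CONFIDENCE = 0.5  # assumed fallback when no modal verb is found

YEAR_PATTERN = re.compile(r"\b(20\d{2})\b")
MODAL_PATTERN = re.compile(
    r"\b(" + "|".join(MODAL_CONFIDENCE) + r")\b", re.IGNORECASE
)


def extract_future_references(text, published):
    """Return (year, confidence) pairs for years after the publication year."""
    modal = MODAL_PATTERN.search(text)
    confidence = (
        MODAL_CONFIDENCE[modal.group(1).lower()] if modal else DEFAULT_CONFIDENCE
    )
    # Collect distinct four-digit years and keep only the future ones.
    years = sorted({int(y) for y in YEAR_PATTERN.findall(text)})
    return [(year, confidence) for year in years if year > published.year]


sentence = "Voyager 2 should run out of fuel in 2034."
print(extract_future_references(sentence, date(2005, 7, 15)))
```

A real system would also need to handle relative expressions such as "next month" or "within five years," which this sketch ignores.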
It would also include an information retrieval system, so that people can search using text queries, and possibly specify time segments during their searches. So, if you search for the year 2034, you might find the most important topics or likely events or both, associated with that year.
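A toy version of such a query might look like the following Python sketch. It uses the five 2034 items quoted above, but the topic labels, the confidence scores, and the `search` function are assumptions added for illustration. The patent application does not spell out a data model like this one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FutureItem:
    item_id: int
    year: int
    text: str
    topics: frozenset   # hypothetical topic labels, assigned by hand here
    confidence: float   # hypothetical likelihood score between 0 and 1


ITEMS = [
    FutureItem(1, 2034, "Nuclear plant licenses in Arkansas and Michigan will end",
               frozenset({"energy"}), 0.9),
    FutureItem(2, 2034, "Ownership of Dolphin Square in London must revert to an insurance company",
               frozenset({"property"}), 0.9),
    FutureItem(3, 2034, "Voyager 2 should run out of fuel",
               frozenset({"space"}), 0.7),
    FutureItem(4, 2034, "Long-term care facilities may have to house 2.1 million people in the USA",
               frozenset({"health"}), 0.5),
    FutureItem(5, 2034, "A human base on the moon would be in operation",
               frozenset({"space"}), 0.4),
]


def search(topics, start_year=None, end_year=None):
    """Return ids of items matching any topic inside the optional time segment."""
    wanted = {t.lower() for t in topics}
    hits = [
        item for item in ITEMS
        if item.topics & wanted
        and (start_year is None or item.year >= start_year)
        and (end_year is None or item.year <= end_year)
    ]
    # Most confident events first; ties broken by item id.
    hits.sort(key=lambda item: (-item.confidence, item.item_id))
    return [item.item_id for item in hits]


print(search(["energy", "health"]))       # [1, 4]
print(search(["space"], 2034, 2034))      # [3, 5]
print(search(["space"], 2035, 2040))      # []
```

The first two queries mirror the examples in the quoted passage: "energy" or "health" returns items 1 and 4, and "space" restricted to 2034 returns items 3 and 5.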
In addition to providing information about possible futures that might be used to help support decision making in many fields, the same system could be turned backwards to look at past events, and perhaps understand them better.
Some related publications from the inventor listed on the patent application, Ricardo Alberto Baeza-Yates: