How does a search engine decide whether or not to show news items in web search results?
If you live in Bealton, Virginia, chances are you may not be very interested in news of a car crash in Brooklyn, New York, when you search for information about Brooklyn. If you’re from Brooklyn and want to find vacation information about the parks in Wisconsin, you may not care much about the latest winning numbers in the Wisconsin lottery. Yet someone searching these days for information about one of the states bordering the Gulf of Mexico would probably want to see news about the oil spill in the region.
A Yahoo patent filing published recently describes how they might use a prediction system based upon the search engine’s query logs to decide whether or not to show news results. The prediction system uses a mixture of geographic information related to queries and to searchers as well as information about how “newsworthy” a location might be to make that determination. The patent tells us that it might create similar prediction models to determine whether or not to show other types of results as well. The patent application is:
System and Method of Geo-Based Prediction in Search Result Selection
Invented by Rosie Jones, Fernando Diaz, and Ahmed Hassan Awadallah
US Patent Application 20100161591
Published June 24, 2010
Filed December 22, 2008
A system and method is disclosed for determining a prediction measurement, or measure, using geo-spatial information which can be used to determine whether or not to include type of information in search results.
The prediction measurement comprises a measure of the likelihood that an item of the type of information for which the prediction measure is determined will be selected, or clicked on, by a user, if the item of the type of information is included in the search result. Without limitation, one such information type is news.
In the patent filing’s description, we are shown how a prediction system might analyze queries to predict whether someone might be interested in seeing news items in their search results. It would look at the query submitted by a searcher, as well as historic data such as previous queries, to decide what kinds of results to show.
That historic data is taken from query logs covering a certain period of time, such as several weeks. It could contain a few million queries submitted over that period, and include information associated with each query string, such as:
- The search terms used
- The search results shown to the searcher
- Details about the types of pages included in those results
- Whether news results were shown, and if shown, which ones were clicked upon, if any
- IP address or other information indicating locations of searchers
- Query and click or selection information
- Population information for the regions the searchers were from
- Population density information for the regions identified in queries
- Geographic information extracted from the query, such as a place name
- Geographic distance between the searcher and the location indicated in the query
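To make that list a bit more concrete, here’s a minimal sketch of what one query log record holding those kinds of fields might look like. The class name, field names, and the helper function are my own hypothetical choices for illustration; the patent filing doesn’t specify any particular schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryLogRecord:
    """One hypothetical query-log entry (field names are assumptions, not from the patent)."""
    query: str                   # the search terms used
    result_urls: list[str]       # the search results shown to the searcher
    result_types: list[str]      # types of pages in those results, e.g. "web", "news"
    news_shown: bool             # whether news results were shown
    news_clicked: bool           # whether any news result was clicked
    searcher_ip: str             # used to infer the searcher's location
    searcher_lat: Optional[float] = None
    searcher_lon: Optional[float] = None
    searcher_region_population: Optional[int] = None
    query_place: Optional[str] = None            # place name extracted from the query
    query_place_density: Optional[float] = None  # people per square km at that place
    searcher_to_place_km: Optional[float] = None  # distance searcher -> query location


def news_impressions(log: list[QueryLogRecord]) -> list[QueryLogRecord]:
    """Keep only records where news was shown, since only those can produce a news click."""
    return [r for r in log if r.news_shown]
```

Filtering down to records where news was actually shown matters because a query where no news appeared tells you nothing about whether a news result would have been clicked.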
The geographic location of a query might be identified using processes like those described in the Yahoo patent Geographical Location Extraction, which looks at things like place names in a query and scores them on the probability that they refer to places listed in a database of geographic place names.
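As a rough illustration of the idea (not the method from that Yahoo patent), a very simple approach might check every short word sequence in a query against a small gazetteer, where each entry carries an assumed probability that the string really refers to a place. The gazetteer contents and scores below are made up for the example.

```python
from typing import Optional

# Hypothetical gazetteer: place name -> assumed probability that the string,
# when it shows up in a query, actually refers to that place. Values are invented.
GAZETTEER = {
    "brooklyn": 0.9,
    "new york": 0.95,
    "warrenton": 0.8,
    "virginia": 0.85,
    "phoenix": 0.6,   # ambiguous: also a mythological bird, a band, and so on
    "kosovo": 0.97,
}


def ngrams(tokens: list[str], max_n: int = 3) -> list[str]:
    """All contiguous word sequences of length 1..max_n."""
    out = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out


def extract_place(query: str) -> tuple[Optional[str], float]:
    """Return the highest-scoring gazetteer match in the query and its confidence."""
    best, best_score = None, 0.0
    for gram in ngrams(query.lower().split()):
        score = GAZETTEER.get(gram, 0.0)
        if score > best_score:
            best, best_score = gram, score
    return best, best_score
```

The returned score plays the role of a “query location confidence”: a query like “hotels in new york city” would match “new york” with high confidence, while a query with no place name gets a confidence of zero.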
The authors of the patent filing tell us that testing has shown that there’s a correlation between “query location confidence” and the probability of a click on a news result. For instance, we are told that queries which contain a place name can be up to twice as likely to receive a news click as queries which do not.
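A comparison like that one can be computed from query logs by grouping queries where news was shown by whether they contained a place name, and then comparing the news click rates of the two groups. This is a minimal sketch; the input format and the sample counts used in the check are hypothetical, not figures from the patent.

```python
def news_ctr_by_place(records: list[tuple[bool, bool]]) -> dict[bool, float]:
    """News click-through rate, split by whether the query contained a place name.

    records: (query_has_place, news_clicked) pairs, one per query where news was shown.
    """
    shown = {True: 0, False: 0}
    clicked = {True: 0, False: 0}
    for has_place, was_clicked in records:
        shown[has_place] += 1
        clicked[has_place] += int(was_clicked)
    # Guard against dividing by zero when a group has no queries.
    return {k: clicked[k] / shown[k] if shown[k] else 0.0 for k in shown}


def place_name_lift(records: list[tuple[bool, bool]]) -> float:
    """How many times more likely a news click is when the query contains a place name."""
    rates = news_ctr_by_place(records)
    return rates[True] / rates[False] if rates[False] else float("inf")
```

With made-up counts of 10 news clicks out of 100 place-name queries and 5 out of 100 other queries, the lift would be 2.0, which is what “twice as likely” would look like in practice.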
Searchers supposedly also tend to use country and state names more often when they are looking for news, and they use town names more often when they are looking for non-news results like services and businesses.
A place name may also be assigned a click probability, which is a measure of the location’s “newsworthiness.”
The newsworthiness click probability of a location can be influenced by the number of newsworthy events that have happened there. The patent provides some examples: a query that includes “kosovo” or “pakistan” is more likely to lead to a click on a news result than a query which includes place names such as “cedar point” or “utah”.
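One way a per-location newsworthiness score might be estimated is as the share of news impressions for queries mentioning that place that led to a news click. Places seen only a few times would produce noisy estimates, so this sketch blends the observed rate with a global prior (a simple smoothed average). The smoothing, the prior values, and the counts in the check are my own assumptions; the patent filing doesn’t say how the probability is computed.

```python
def newsworthiness(clicks: int, impressions: int,
                   prior_rate: float = 0.05, prior_weight: float = 20.0) -> float:
    """Smoothed news click probability for a place name.

    Adds prior_weight "pseudo-impressions" at the assumed global click rate
    prior_rate, so rarely seen places are pulled toward the global average
    instead of being scored as extremely newsworthy or not newsworthy at all.
    """
    return (clicks + prior_rate * prior_weight) / (impressions + prior_weight)
```

With hypothetical counts, a place like “kosovo” with 300 news clicks out of 1,000 impressions would score much higher than “cedar point” with 2 clicks out of 200, and a place never seen in the logs would simply get the prior rate.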
The population density of the location of a searcher, taken from a source such as United States Census Bureau population data, may also be used to predict whether or not that searcher might click upon a news result. We’re told that searchers from areas with high population density are more interested in news, and are 20% more likely to click on news results than searchers from lower population density areas.
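Turning census figures into a usable signal could be as simple as dividing population by land area and bucketing the result. The cutoff between “high” and “low” density below is a hypothetical value, since the patent filing doesn’t give one. The last function shows what “20% more likely” means in terms of click rates.

```python
def population_density(population: int, land_area_sq_km: float) -> float:
    """People per square kilometre, e.g. from census population and land-area figures."""
    if land_area_sq_km <= 0:
        raise ValueError("land area must be positive")
    return population / land_area_sq_km


# Hypothetical cutoff separating "high" from "low" density regions.
HIGH_DENSITY_CUTOFF = 1000.0


def density_bucket(population: int, land_area_sq_km: float) -> str:
    """Classify a searcher's region as "high" or "low" density."""
    density = population_density(population, land_area_sq_km)
    return "high" if density >= HIGH_DENSITY_CUTOFF else "low"


def relative_lift(ctr_high: float, ctr_low: float) -> float:
    """Relative difference in news click rate; 0.20 means "20% more likely"."""
    return ctr_high / ctr_low - 1.0
```

So if searchers from low-density areas clicked news results 5% of the time, a 20% lift would put searchers from high-density areas at about 6%.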
The distance between a searcher’s geographic location and a location indicated in a query can be used in making a prediction as to whether or not a searcher may be more likely to click upon a news result. For instance, some kinds of news may appeal to an audience on a national or regional level, such as news of large natural disasters. It may be much less likely that a searcher from a good distance away would be interested in news that might be considered more local, such as a state lottery result or a car crash.
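The distance between a searcher and a place named in a query would most likely be measured as a great-circle distance between two latitude/longitude points. Here’s a sketch using the standard haversine formula; the “local” radius is a hypothetical threshold for illustration, not something stated in the patent filing.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius of the Earth


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical radius within which a searcher counts as "local" to a place.
LOCAL_RADIUS_KM = 80.0


def is_local(searcher: tuple[float, float], place: tuple[float, float]) -> bool:
    """True if the searcher is within LOCAL_RADIUS_KM of the place in the query."""
    return haversine_km(*searcher, *place) <= LOCAL_RADIUS_KM
```

A searcher in Bealton, Virginia looking up Brooklyn would fall well outside that radius, which is the kind of signal that might suppress purely local news like a car crash.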
In many of the patents and papers from the search engines on Universal Search or Blended search, where news items might be inserted into web search results, we’ve been told that the decision to include those kinds of results is based upon relevancy factors.
Here, we’re shown that a search engine might look at other information to make more informed decisions about whether or not to show news results, such as only showing news of local importance to people who are “local.”
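To tie these signals together, a prediction measure like the one described in the patent could be produced by a simple model that outputs a click probability, with news shown only when that probability clears a threshold. The logistic model, its weights, and the threshold below are all hypothetical; in a real system the weights would be learned from query logs, and the patent doesn’t commit to this particular kind of model.

```python
import math
from dataclasses import dataclass


@dataclass
class QueryFeatures:
    place_confidence: float      # 0..1, from place-name extraction
    place_newsworthiness: float  # smoothed news click probability for the place
    searcher_high_density: bool  # searcher comes from a high-density area
    distance_km: float           # distance from searcher to the query location


# Hypothetical weights, chosen by hand for illustration.
WEIGHTS = {
    "bias": -3.0,
    "place_confidence": 1.0,
    "place_newsworthiness": 8.0,
    "searcher_high_density": 0.2,
    "log_distance": -0.1,  # farther away -> somewhat less likely to click
}


def predict_news_click(f: QueryFeatures) -> float:
    """Logistic model: estimated probability that a shown news result gets clicked."""
    z = (WEIGHTS["bias"]
         + WEIGHTS["place_confidence"] * f.place_confidence
         + WEIGHTS["place_newsworthiness"] * f.place_newsworthiness
         + WEIGHTS["searcher_high_density"] * float(f.searcher_high_density)
         + WEIGHTS["log_distance"] * math.log1p(f.distance_km))
    return 1.0 / (1.0 + math.exp(-z))


def should_show_news(f: QueryFeatures, threshold: float = 0.15) -> bool:
    """Include news results only if the predicted click probability clears the threshold."""
    return predict_news_click(f) >= threshold
```

Because newsworthiness carries a large weight, a highly newsworthy place can still get news results for a distant searcher, while a place with little news history (like a town where a lottery result is the biggest story) would not, which fits the idea of showing local news mainly to local searchers.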
It’s possible that similar prediction models might be used by Google and Bing as well, and for information other than whether or not to include news results. For instance, someone searching in Google for a local business might be shown a map for that business in their search results, while people further away might not be shown that map.
Here’s another example. When I search for “New York” or “New Orleans” in Google, I’m shown news results. When I search for “Warrenton, Virginia,” my search results don’t contain any news items. Is that because Google predicts that I’m more likely to be interested in news from areas that have higher population densities? Or because not many “newsworthy” events have taken place in Warrenton? Both are possibilities.
10 thoughts on “How Search Engines May Use Geography and Population Info in Deciding to Show News in Web Searches”
“It would look at a query submitted by a searcher as well as historic data, such as previous queries, to decide what kinds of results to show. That historic data is taken from query logs from a certain amount of time, such as weeks.”
Feels like you’re being stalked but if it’s in the name of better search engine results then I guess it’s ok. Sometimes you really need to know the user better to be able to give a better service.
Thanks, Bill. It’s only a prediction, so it may not be true for everyone, because some people from other places may be interested in news results. The search engine is just considering people’s behavior on the basis of their location. Still, whatever they do to improve their search engine will help us.
Now that is interesting... I have always enjoyed watching how search continues to evolve, and I think that local and hyperlocal search is an obvious trend.
I agree with you. Having some information is better than having none at all, whether that information is about individual searchers or aggregated from a number of searchers, and whether it’s used to offer some level of personalization or to provide something like news results that might actually be of interest to the people who see them.
On the Google Privacy Center page, we’re told that Google aims to follow five privacy principles in the way that it handles user data:
I think those are pretty reasonable objectives, and hopefully Google is attempting to follow them as closely as they can.
You’re welcome. The locations of people searching and locations associated with the information being searched for are definitely one part of this process, but they aren’t the only part.
Another aspect of the process can involve how newsworthy a topic might be, and I think that may overcome some of the filtering that location information might provide. For example, it’s unlikely that you would want to see news about a town council vote in my community rezoning part of a farm from rural to commercial so that they can include a farm store on their property. But, if Barack Obama visited my town and gave a speech about healthcare on the courthouse steps in front of a large crowd, that would be much more newsworthy and news results might be shown to you if you searched for my town’s name.
I really enjoy watching that evolution as well. The process in this patent filing definitely has some implications for local and hyperlocal results. I’ve been doing a number of searches to see when news results appear within my web results and when they don’t.
Thank you for the awesome article! Even when I’m searching the web for things other than Internet marketing, it’s so nice to be able to have local results–if I want to go to dinner in Washington DC, why would I care about restaurants in North Carolina? It’s great to see that SEO bloggers are recognizing the changes within the search engines and writing about it!
Changes in searches and search engines are a constant. If they didn’t happen, I wouldn’t have much to write about. 🙂
Excellent Article, Added to favs, keep up the good work.
Thank you very much, Jason.