If you search at Yahoo for the phrase “world cup” (without the quotation marks), chances are good that the search engine will show you mostly pages about the 2010 World Cup, even though the tournament is held every 4 years and there may be many pages relevant for the phrase that don’t focus specifically upon a particular year.
How likely is it that when someone searches for “world cup,” they are looking for information about the upcoming tournament, taking place in South Africa between June 11th and July 11th, 2010? On the other hand, how likely might it be that they want to find information about the World Cup held in 2006? Or just general pages about the sporting event?
If I told you that the search engine was likely reordering those search results based upon time-based data, would it surprise you? Would you expect Yahoo or Google or Bing to rerank search results in a manner like this when the queries have some temporal aspect to them, such as a search for the Olympics, or the World Series, or the World Cup?
It’s possible for a search engine to look through its query logs, see whether a particular query is often included in more specific searches that contain some kind of temporal data such as a year, month, day, or time of day, and then rewrite a searcher’s query to include that time-based information. A recent Yahoo patent application explains one fairly simple approach towards showing such information. The patent application is:
Identifying and Expanding Implicitly Temporally Qualified Queries
Invented by Rosie Jones, Donald Metzler, and Fuchun Peng
Assigned to Yahoo
US Patent Application 20100131538
Published May 27, 2010
Filed: November 24, 2008
Methods and apparatus are described for identifying implicitly temporally qualified queries, i.e., queries for which a period is implied but not explicitly stated, and for expanding such queries to include one or more temporal references.
I’ve written in the past about how Yahoo might look at some particular queries where a searcher doesn’t include geographical terms, but where the best search results might involve the search engine inferring a geographical location for search results in How Search Engines Might Divine the Intent behind Regional Queries vs. Global Queries. I’ve also written about a previous Yahoo patent that describes how temporal data like this might be used similarly, in How Search Engines May Try to Match Searchers’ Intents from Analysis of Search Engine Query Logs.
This patent application provides more explicit details on how the search engine might analyze or update information about searchers’ intents taken from query logs, to give more weight to search results that include time-based information.
For instance, a counting-based approach could be applied across all of the queries in a log:
1) A search engine query log may be analyzed to count the number of queries that explicitly include a specific year, for instance by using that year as a prefix or suffix in the search (“2004 Olympics,” “Olympics 2008,” etc.).
2) The number of times the term is used in a query is also counted.
3) For each year, a ratio is calculated of the specific year/query combinations to all of the times the broader query was used, such as (olympics+2004)/olympics or (olympics+2008)/olympics.
4) If the ratio is over a certain threshold, then all queries that contain the query term will be considered to be “implicitly year qualified.”
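As a rough illustration, the four counting steps above might look something like the following Python sketch. This is not code from the patent filing. The function names, the 0.3 threshold, and the rule that a year is a four-digit token starting with 19 or 20 are all assumptions made for the example, and the sketch only treats a query as "containing" a term when the query is that term alone or that term plus a year.

```python
import re
from collections import Counter

# Assumption: a "year" is a 4-digit token beginning with 19 or 20.
YEAR_TOKEN = re.compile(r"(?:19|20)\d{2}")


def split_year(query):
    """Return (base_query, year) when a year is the first or last token,
    otherwise (normalized_query, None)."""
    tokens = query.lower().split()
    if len(tokens) >= 2:
        if YEAR_TOKEN.fullmatch(tokens[0]):
            return " ".join(tokens[1:]), tokens[0]
        if YEAR_TOKEN.fullmatch(tokens[-1]):
            return " ".join(tokens[:-1]), tokens[-1]
    return " ".join(tokens), None


def implicitly_year_qualified(query_log, threshold=0.3):
    """Map each base query to the years whose share of that query's
    total volume is over the (hypothetical) threshold."""
    base_counts = Counter()  # step 2: uses of the base query, with or without a year
    year_counts = Counter()  # step 1: uses of each (base query, year) pair
    for query in query_log:
        base, year = split_year(query)
        base_counts[base] += 1
        if year is not None:
            year_counts[(base, year)] += 1

    qualified = {}
    for (base, year), count in year_counts.items():
        ratio = count / base_counts[base]  # step 3: e.g. (olympics+2008)/olympics
        if ratio > threshold:              # step 4: over the threshold
            qualified.setdefault(base, {})[year] = ratio
    return qualified


if __name__ == "__main__":
    log = ["olympics", "olympics 2008", "2008 Olympics", "olympics 2004"]
    log += ["pizza"] * 9 + ["pizza 2009"]
    print(implicitly_year_qualified(log))  # {'olympics': {'2008': 0.5}}
```

In this toy log, “olympics” passes the threshold for 2008 (2 of 4 queries) but not for 2004 (1 of 4), and “pizza” is not year qualified at all because only 1 of its 10 queries includes a year.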
This kind of analysis of query logs would likely be conducted offline rather than every time someone searches for a term like “world cup” or “Olympics,” and cached log file information may be updated upon an ongoing basis as the search behavior data changes.
In a search for “world cup” on Yahoo, the search engine provides several “try also” search suggestions at the top of the search results, including “world cup 2010” and “world cup 2006.” The patent filing mentions that the search engine could provide that type of query suggestion or even a timeline that searchers could use to select the year most relevant to their search.
We’re told that the search engine could also note at the top of the search results that the results have been modified to include the most relevant year, and offer a way that a searcher could reject that modification.
An alternative approach to calculating a ratio of queries that might include a specific year would be for the search engine to look at individual query sessions from searchers, and see how many times a searcher reformulates his or her query to include a year.
So, if someone was interested in learning who won American Idol in a specific year, and their first search was “American Idol,” and their follow-up search included a specific year such as “American Idol 2006,” that information about reformulations within query sessions could be used instead of the count of queries described above.
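The session-based alternative could be sketched along the same lines. The Python below is only an illustration under stated assumptions: a session is represented as an ordered list of query strings, a reformulation only counts when the very next query is the earlier query plus a year as a prefix or suffix, and the function names are hypothetical rather than anything named in the patent filing.

```python
import re
from collections import Counter

# Assumption: a "year" is a 4-digit token beginning with 19 or 20.
YEAR_TOKEN = re.compile(r"(?:19|20)\d{2}")


def normalize(query):
    """Lowercase the query and collapse whitespace."""
    return " ".join(query.lower().split())


def added_year(first, followup):
    """Return the year if `followup` is `first` plus a year as a prefix
    or suffix, otherwise None."""
    first_tokens = normalize(first).split()
    follow_tokens = normalize(followup).split()
    if not first_tokens or len(follow_tokens) != len(first_tokens) + 1:
        return None
    if follow_tokens[:-1] == first_tokens and YEAR_TOKEN.fullmatch(follow_tokens[-1]):
        return follow_tokens[-1]
    if follow_tokens[1:] == first_tokens and YEAR_TOKEN.fullmatch(follow_tokens[0]):
        return follow_tokens[0]
    return None


def year_reformulation_rates(sessions):
    """For each (query, year) pair, return the fraction of times the query
    was immediately followed by the same query with that year added."""
    issued = Counter()   # how often each query was issued
    refined = Counter()  # how often it was refined with a given year
    for session in sessions:
        for i, query in enumerate(session):
            issued[normalize(query)] += 1
            if i + 1 < len(session):
                year = added_year(query, session[i + 1])
                if year is not None:
                    refined[(normalize(query), year)] += 1
    return {pair: count / issued[pair[0]] for pair, count in refined.items()}


if __name__ == "__main__":
    sessions = [
        ["American Idol", "American idol 2006"],
        ["american idol", "american idol 2006"],
        ["american idol"],
        ["american idol", "american idol winners"],
    ]
    print(year_reformulation_rates(sessions))
    # {('american idol', '2006'): 0.5}
```

A rate like this could then be compared against a threshold in the same way as the ratio in the counting approach.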
While I only see one search result for “American Idol” that includes a year in the Yahoo search results for that query term, the “predictive” search results that appear under the Yahoo search box offer me the following suggestions, amongst others: “American Idol 2010,” “American Idol 2009,” and “American Idol 2008.” It looks like time-based information might be offered in those predictive searches even when this kind of temporal data isn’t necessarily used to rerank search results.
The patent application tells us that it might look at temporal data other than years, such as:
- Times of day,
- Individual dates,
- Days of the week,
- Specific weeks,
- Specific months,
- Specific decades,
- Specific centuries,
- Specific millennia.
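To give a sense of how a few of the temporal types in the list above might be spotted inside a query string, here is a small Python sketch. The patterns are my own assumptions rather than anything taken from the patent filing, and they are deliberately incomplete: weeks and millennia are not covered, and “may” is left out of the month pattern because it is too ambiguous as a plain word.

```python
import re

# Hypothetical patterns, checked in this order; each is applied to the
# lowercased query. None of these come from the patent filing.
TEMPORAL_PATTERNS = [
    ("time of day", re.compile(r"\b\d{1,2}(?::\d{2})?\s?(?:am|pm)\b")),
    ("date", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("day of week", re.compile(r"\b(?:mon|tues|wednes|thurs|fri|satur|sun)day\b")),
    ("month", re.compile(
        r"\b(?:january|february|march|april|june|july|august|"
        r"september|october|november|december)\b")),
    ("decade", re.compile(r"\b(?:19|20)\d0s\b")),
    ("century", re.compile(r"\b\d{1,2}(?:st|nd|rd|th) century\b")),
    ("year", re.compile(r"\b(?:19|20)\d{2}\b")),
]


def temporal_qualifiers(query):
    """Return a list of (kind, matched_text) pairs, one per pattern that
    matches somewhere in the query."""
    text = query.lower()
    found = []
    for kind, pattern in TEMPORAL_PATTERNS:
        match = pattern.search(text)
        if match:
            found.append((kind, match.group(0)))
    return found


if __name__ == "__main__":
    for q in ["world cup 2010", "music of the 1990s", "flights Friday 9 am"]:
        print(q, "->", temporal_qualifiers(q))
```

A query with no temporal reference, such as “world cup” on its own, returns an empty list, which is the case where the search engine would have to infer an implied period from its logs.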
Search engines are attempting to look at more than the keywords that a searcher uses when they type a term into a search box. In their efforts to get at the intent behind a search, they may look through their query log files to see what other people have searched for about the query terms used.
That query log information could evidence a location-based intent in some instances, or time-based intent, or some other kind of intent completely.
What other kinds of signals in query logs besides location and time might the search engines decide to use to rerank the search results you see?
This kind of reranking of search results shouldn’t come as a surprise to people paying close attention to what search engines are doing these days.
The results of a search for “World Cup” seem like a pretty good example of search results for the tournament being reranked to focus upon 2010. I’m not sure that I’ve seen results reranked based upon periods other than years, but I’ll be looking. If you come up with any examples of those and are willing to share, let us know in the comments below. Thanks.
39 thoughts on “How a Search Engine Might Rerank Search Results Based upon Time-Based Data in Query Logs”
I’m not sure this is particularly useful to the average searcher. I would think that a query that’s not very specific (in terms of date or some other factor) deserves a result that’s not very specific either. If I search for [world cup] I’m looking for information about the tournament, and not necessarily this year’s tournament. But maybe that’s just how I search.
I can see how it would be useful to put this data to use in suggestions for other searches, however. I noticed about a year ago that Google was using very short-term trends in searches in determining what to suggest: apparently, if a number of people searched on numerous clues in a particular crossword puzzle, then when one searched on any of those clues the other searches would come up as suggestions.
Hey Bill, first time here 🙂 I recently started a full-time internet marketing consultant job, and the guys at work have referenced your site several times! So I guess I’ll be a regular from now on. And as for reranking based on time-based data, I think it is very interesting, and you offered some great points. I will show the guys at work.
This reminds me a bit of QDF, especially for huge volume queries and spikes at certain periods. Engines tend to favor relevant fresh content to an upcoming trend. I would also imagine a category-based results basing on your query. For ex. “World Cup” will include results like:
1. Upcoming Africa 2010 World Cup
2. History of the World Cup
3. Previous World Cup (say 2006, 2002, etc.)
4. World Cup academic facts
At least we can rest in the knowledge that Google is doing everything they can to bring the best possible search results to people.
too bad Yahoo! never used it.. lol
Bill- I thought you couldn’t get a patent for an idea? Isn’t that basically what most software is classified as?
I guess I was misinformed. I’m surprised to see that Yahoo! grabbed this up before Google.
This is a pretty cool blog. I did backlink checks from people to find it. You have a lot of interesting ideas and I like how you try to figure out what the search engines are doing/thinking. I normally do this myself and like your point of views.
Great article as usual, you might look at those searches in google as good example, if you search for “SF giants” you’ll get the team previous and next match.
Likewise, my intuition tells me that if I search for “attorney,” I’m likely looking for pages for an actual attorney, and if I’m searching for “attorneys,” I likely want a directory type page that I can search through to find a choice of attorneys.
I’m not always sure that we can trust our intuitions, however. If Yahoo is tracking query sessions, and they notice that most people searching for “world cup” look at the search results, and then type in “world cup 2010,” they might find themselves justified in making the assumption that they seem to be making by giving more weight to “world cup” results that include “2010.”
Interesting example with query suggestions and crossword puzzle clues. I need to experiment with that to see if it’s still something going on.
There’s likely some element of QDF or burstiness that might be found in this approach. Chances are that the search engine would limit how far back they might look in query logs to make sure that they are getting the latest user-behavior information when it comes to the queries they review.
Category based results would be an interesting approach as well. Query log file information wouldn’t be a bad thing to look at to come up with a way of presenting search results based upon categories.
It’s likely that Google is looking at user-behavior data as well, but this particular patent filing is from Yahoo.
It’s hard to tell exactly which approach Yahoo has taken to incorporate query log file data into reranking search results. We don’t know for certain whether or not they are using the method described in this particular patent, or some other method, but we do know for certain that they are using query data to assign locations to search queries that don’t specifically state a geographical location.
Thank you. If nothing else, looking at patents and papers from the search engines raises a lot of questions. I think that’s a good thing.
You can’t get a patent for an idea. You need to present that idea as an actual process of some type that should be new, nonobvious, and useful. There are some other threshold tests as well that need to be passed by the inventors listed on a patent as to whether their invention is granted. Note that this particular patent is still pending, and it’s possible that it may not get granted.
But, frankly, the purpose behind my writing blog posts about patent filings from the search engines isn’t to argue whether or not they should be allowed, but rather to get an idea of what the search engines are doing, what assumptions they might be making about searchers and the Web, and what areas they may be conducting research in.
I guess the question for me is what should determine if the query deserves freshness. If we go back to the original example of the World Cup, I would think that once the tournament starts up, a search for [world cup] ought to return the latest match results. But what if I ran the search back in March? The patent would suggest that, if enough of the people running the search (or similar searches) ended up clicking through to a page about this year’s tournament, the results should be biased toward such information, but just because that’s what more people are selecting, it doesn’t mean that that’s what I’m looking for.
Very interesting article. It certainly makes sense that Google would be using other users’ search queries for relevancy; I guess in a way that’s what Google Insights gives you. Interestingly, everyone here assumes that if you do a search for ‘world cup’ you are automatically talking about the football world cup (or soccer if you are in the US!) but there are many other types of world cups, including the recent cricket 20/20 world cup.
I did a search on Google Insights for ‘world cup’ and the data it returned was interesting. In summary from an SEO perspective knowing how and why Google ranks a search for ‘world cup’ in terms of search demand is important if you want to target keywords and traffic.
As Dave mentions: “QDF”. I agree with this methodology as it assists the searcher in finding topical, up-to-date information. If the searcher wants something other than what pops up in the SERPs they can use “suggested” or re-enter a more specific long tail search. Either way it is educating the searcher to be as precise as they can about what they want. All in all, the user becomes more educated in terms of search and how to do it.
Interesting piece, Bill. I enjoy reading your attempts to analyze what the search engines are trying to do, and I think you are usually spot on. I agree with Dave and Lee that QDF is probably a sub that’s getting triggered somehow. I can’t imagine how they can possibly do it, given the horrific volume of queries they’re handling, but then, much of what they’re doing is over my head. Your posts, however, help simplify the process for me. Keep ’em coming!
I would prefer that the SE’s would stick to the principle of showing result pages based on relevancy solely related to the used search term.
The fact that many searchers were looking for “world cup 2010” doesn’t necessarily mean that I am interested in this year’s world cup. Maybe I am just interested in some general information about the world cup, and maybe there is a better general description on an older page.
Result is that I am presented with content of lower quality – this is not in my interest, and certainly not in the interest of the SE’s!
I have read in other blogs that many bloggers are actually removing their post dates using various techniques so as to negate this time-based effect. Should I be doing this?
I’m not sure that I understand what you mean by “team previous and next match” in reference to a search for SF Giants.
There are some similarities between this approach and what was described in a Google patent from 2007, which I wrote about in Google on Generating Statistics from Search Engine Query Logs (Hot Trends and More).
If the search engine is looking at how often people are making certain queries, and identifies query terms that are suddenly appearing more frequently, then search results might be reranked to show fresher content for those queries. There’s something like that in the Yahoo patent, though there is not much discussion in it about tracking the frequency of queries over time to see if there is a burst of interest in a specific topic.
I’m not sure that either approach really helps educate searchers on how to become better searchers, but they could be helpful to someone searching who wants to find information about a certain topic, but doesn’t know which query terms might get them the best results.
In some ways, the approach in this patent does seem like it’s Yahoo’s approach to meeting Google’s “query deserves freshness” algorithm.
I think we are going to see more attempts from all of the major search engines to try to understand the intent behind the queries that we use when we search, and some of those are going to be more helpful than others. I agree – what should determine if the query deserves freshness?
In the example that you provide, about a search for World Cup in March, and a search for World Cup in a couple of weeks after the tournament has started, I would hope that more weight would be given towards 2010 results for the tournament in a couple of weeks, based upon a greater amount of queries and refinements during query sessions that indicate an increased interest over those searches performed in March.
Very good points. I purposefully chose “world cup” over “world series” as an example because there are so many sporting tournaments that are referring to themselves as a “world series.” I know that there are some other “world cups,” but I decided to use the football world cup because it’s only a couple of weeks away.
It is also possible that the search engine may look at other signals outside of this one as well. If you are doing a personalized search and your web history shows a very well defined interest in cricket, which could also cause your search results to be given different weights, how would Google resolve the two different algorithms? Would it attempt to show you more cricket results as well as boosting fresher results for the 2010 football/soccer tournament?
I agree with you about understanding how search demand might influence keyword targeting and traffic. That’s part of the reason why I thought this particular patent filing was interesting enough to write about.
Thank you for your kind words.
We know that Google is doing something with the frequency of queries, by looking at their search logs and using statistics to determine how often those queries are used and if there is a burst of interest in a specific topic. If the patent filing I wrote about in my link from 4 comments up is one of the ones behind QDF (a decent possibility), then it shows how Google might address situations like this where it might make sense to boost pages that include the year 2010 in queries for world cup.
As I noted in the comments above, this is Yahoo’s patent filing, and it’s likely that Microsoft is doing something similar as well. An interesting paper worth looking at if you’re interested in the topic is one by Jon Kleinberg: Bursty and Hierarchical Structure in Streams (pdf)
I understand why you would prefer to see pages which might be most relevant for the query terms that you entered into the search engine, but it appears that search engines are exploring other facets of relevancy. From a pure keyword matching type of approach, they are moving towards trying to match the intent behind that search more than matching keywords.
On the positive side, they aren’t changing your original query such as “world cup” to a new one “world cup 2010”, but they are boosting the “world cup 2010” results so they appear higher in search results. You should still see results that are relevant and important for just “world cup” but you may need to look a little further down on the search results to see some pages that might have been ranked higher.
This isn’t necessarily a question of “older” pages versus “newer” pages either, for all of the discussion in the comments on Google’s Query Deserves Freshness (QDF). If more searchers were searching for “world cup 2006” than “world cup 2010”, then we would possibly be seeing older pages given more weight than newer ones.
This patent filing isn’t really about trying to show searchers “newer” pages over older ones. I wouldn’t recommend that anyone remove post dates, and I hate arriving at a blog post that is missing a date, which makes it impossible for me to know when it was originally written.
A better piece of advice might be to anticipate that searchers’ attention has changed, or will likely change, on a particular topic and take positive steps. For example, if you have a site about football/soccer, and a blog post about the 2006 world cup, instead of removing the blog post, it would probably be better to create a new page about the 2010 world cup.
See my post: Reblossoming Content: Transforming Events Pages from Transitory to Evergreen
It would be interesting to know if any search engine looks at a user’s location or previous searches rather than assuming it’s the football(soccer) world cup they’re interested in. In some countries users may be more likely to be looking for information on a cricket or rugby world cup perhaps?
Good question. There are a wide number of potential reasons why a search engine might rerank search results, and more than one of those reorderings of results may be in effect when we perform searches.
We know that Google will change the order of the results you see based upon both location and previous searches, and I’ve written in the past about patents, white papers, and blog posts from Google that describe a number of reasons on how and why they reorder those results (though there are likely a good number that we don’t know about).
It’s interesting to see Yahoo also digging into their query logs and looking at statistics related to different searches to change the results that you see, and consider how the approaches that they take may be similar to or different from what Google is using.
There may be some bias on the part of the search engines to show football(soccer) results for “world cup” over cricket or rugby results for the term, but I haven’t been doing past searches for football(soccer) or for cricket or rugby. I don’t know if there is much of a country-based bias for football(soccer) over rugby or cricket either, in the US. If I look at Google Insights for Search, comparing searches for “rugby world cup,” “cricket world cup,” and “football world cup” in the United States over the past 12 months, my results show just a little more interest in the football results, mostly because that search increased highly over the past month or so.
But what was interesting at Yahoo was that most of the top search results for “world cup” at Yahoo appeared to focus upon “world cup 2010” results. It could be that those weren’t reranked, and were the highest results based upon relevance and importance and possibly even increased levels of traffic.
I would definitely have to look at considerably more search results to get a sense of whether or not Yahoo was using the process described in this patent filing, but because of the patent filing we now know that this is a question that we could be asking – is Yahoo giving more weight to search results that include a year, or some other time-based indicator, when that year is included in many queries submitted to the search engine?
For anyone doing something like that, it is important though to ask questions like the ones that you are – what else might be influencing those search results.
Thanks, nice post. I’d describe this post as a thesis post, very informative. When we search on a search engine, it’s our duty to be as specific as possible rather than wait for the engine to read our mind. Thanks
Sometimes people searching don’t know much about the topic they are searching for, which may be why they are searching in the first place. When that is the case, it can be difficult to be “as specific as possible.”
I totally agree with you. A few days back I was searching for “Toronto Wine And Spirit Festival” and all the results were for this year, 2010.
I think as far as events are concerned, Google usually shows the latest articles in results to provide schedules, timing, etc.
I have also noticed that it also includes Wikipedia page of the event if available in case users are looking for more information about the event.
Good example – thanks.
While my post was about Yahoo’s patent filing, I do think that both Google and Bing also try to pay attention to queries involving recurring events as well.
I really don’t understand why Yahoo makes it so complicated for themselves. It’s no wonder Google has such a big share of the search engine market.
Do you personally think that within 10 years or so there will be any competitor to Google that grows bigger in the US and Europe?
@Albin Considering that Google only has 13 years under their belt, I’d say that ten years would be an e-ternity. 😉
A newcomer could conceivably (although admittedly, not easily) appear tomorrow and leave Google dying in the dirt in far less time. Five years is even a long time, in Internet terms.
Hi Albin and Doc,
I’ve been wondering that about Yahoo for years. Google has held a strong lead in Web search for a long time, but it’s possible that with some strong gains from others and some missteps from Google, they could find themselves with one or more serious competitors.
It’s possible that challengers to Google could emerge from places like companies that have been involved in enterprise and desktop search as well as newcomers, and as Doc notes, 10 years is an eternity on the Web.