A Role for Time and Query Quality in Search Results

Some search queries are better than others at returning results that searchers expect to see.

How would you measure how good a query is? If you were a search engineer, how might that knowledge help you return search results that are relevant to what a searcher is seeking? If you were creating a web site about specific topics, how might it influence the words you choose for copy, page titles, anchor text, and other elements of your pages?

What role might the age of documents returned in response to a query play in these determinations, and the decisions that might follow from them?

Measuring the Quality of Queries

How would you decide that one query is better than another? A study from 2002, described in Predicting Query Performance (pdf), explains that the difficulty is that a searcher has no idea what to expect from the collection of data being searched, or what types of results it might return. One example the authors use is a person searching a database of news articles from 1998 with the query “world cup”, expecting to find information about soccer (or football, depending upon where you are located), without realizing that most of the results from that year are about chess.

The authors explore a number of ways to determine the quality of a query. Broadly, a query whose highly ranked documents are about a single topic, or a small number of topics, would be a high quality query, while a query whose highly ranked documents cover a wide variety of topics would be a lower quality query.
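The 2002 paper formalizes this idea as a "clarity score": the divergence between a language model built from the top-ranked documents for a query and a language model of the whole collection. The Python below is a minimal sketch of that idea, not the paper's implementation; the smoothing weight `lam=0.6`, the uniform weighting of top documents, the function names, and the toy collection are all assumptions made for illustration.

```python
import math
from collections import Counter


def collection_model(collection_docs):
    """Relative term frequencies across the whole collection (the background model)."""
    counts = Counter(term for doc in collection_docs for term in doc)
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()}


def doc_model(doc, p_coll, lam=0.6):
    """Jelinek-Mercer smoothed unigram model of one non-empty document."""
    counts = Counter(doc)
    return {t: lam * counts[t] / len(doc) + (1 - lam) * p for t, p in p_coll.items()}


def clarity_score(top_docs, collection_docs, lam=0.6):
    """KL divergence (in bits) between a query model and the collection model.

    Simplification (assumed): the query model is a uniform average of the
    smoothed models of the top-ranked documents, rather than weighting each
    document by how well it matches the query.
    """
    p_coll = collection_model(collection_docs)
    models = [doc_model(d, p_coll, lam) for d in top_docs]
    p_query = {t: sum(m[t] for m in models) / len(models) for t in p_coll}
    return sum(p * math.log2(p / p_coll[t]) for t, p in p_query.items() if p > 0)


# Hypothetical toy collection; each document is a list of terms.
collection = [
    ["chess", "chess", "kasparov", "match"],
    ["chess", "kasparov", "match", "board"],
    ["soccer", "goal", "striker"],
    ["stock", "market", "shares"],
    ["rain", "storm", "weather"],
]
focused = collection[:2]  # top results share one topic
mixed = collection[1:]    # top results span four topics
print(round(clarity_score(focused, collection), 3))  # higher: clearer query
print(round(clarity_score(mixed, collection), 3))    # lower: vaguer query
```

A query whose top results share one topic diverges further from the collection as a whole, so it scores higher than a query whose top results scatter across topics.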

Adding Time to the Determination of Quality

Could the times that documents were created or last updated, or the times referred to within those documents, provide information that makes it easier to determine the quality of a query, and to find higher quality queries? Could a “temporal profile” for a query, showing that certain results were more relevant at one time than at another, be useful in deciding upon that quality?

A paper that builds upon the first one I cited above, Using Temporal Profiles of Queries for Precision Prediction (pdf), adds a temporal element to determining the quality of searches and considers questions like those.
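One way to picture a temporal profile is as a distribution over time built from the dates of a query's top-ranked documents, compared against the distribution of dates across the whole collection. The sketch below is an illustration under stated assumptions, not the paper's method: it bins documents by year, treats every top result as equally relevant, smooths with the collection's own distribution (`lam=0.9` is arbitrary), and measures the gap with KL divergence. All names are hypothetical.

```python
import math
from collections import Counter


def year_distribution(years, bins):
    """Fraction of the given documents dated in each year bin."""
    counts = Counter(years)
    return {b: counts[b] / len(years) for b in bins}


def temporal_profile(result_years, collection_years, lam=0.9):
    """Smoothed distribution over time of the top-ranked results for a query.

    Assumes each result has a single year and that all results are equally relevant.
    """
    bins = sorted(set(collection_years))
    p_coll = year_distribution(collection_years, bins)
    p_raw = year_distribution(result_years, bins)
    # Mixing in the collection's own distribution keeps every bin above zero.
    profile = {b: lam * p_raw[b] + (1 - lam) * p_coll[b] for b in bins}
    return profile, p_coll


def temporal_kl(profile, p_coll):
    """How far (in bits) the query's profile departs from the collection's."""
    kl = sum(p * math.log2(p / p_coll[b]) for b, p in profile.items() if p > 0)
    return max(0.0, kl)  # KL is non-negative; clamp tiny floating-point error


# Hypothetical collection: ten documents dated in each year from 2000 to 2006.
collection_years = [y for y in range(2000, 2007) for _ in range(10)]
bursty, bg = temporal_profile([2005] * 5 + [2006] * 5, collection_years)
flat, _ = temporal_profile(list(range(2000, 2007)), collection_years)
print(round(temporal_kl(bursty, bg), 2))  # results cluster in two years
print(round(temporal_kl(flat, bg), 2))    # results mirror the collection
```

A query like “world series”, whose results pile up around particular years, produces a profile far from the background, while a query whose results are spread evenly over time produces almost none.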

Which is the better result in a search for “world series”?

A page about the St. Louis Cardinals (winner of this year’s World Series)

A page about the Chicago White Sox (winner of last year’s World Series)

Depends upon what you were looking for, doesn’t it?

Should searchers be given the option of refining their searches based upon some type of timeline, especially if it would help return higher quality results? Of course, there’s often something that muddies the waters. What about the “World Series of Poker”, which may be just as relevant a result? Regardless, if letting a searcher choose a timeframe makes a meaningful difference in the quality of results, it could make it easier for that searcher to find what they are looking for.
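A timeframe refinement could be as simple as offering the years that results fall into and then keeping only the results from the years a searcher picks. This Python sketch assumes each result carries a single year; the `Result` class, both function names, and the sample titles are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    year: int  # hypothetical: the date a search engine associates with the page


def timeframe_facets(results):
    """Years that results fall into, with counts, for offering as refinements."""
    return sorted(Counter(r.year for r in results).items())


def refine_by_timeframe(results, start_year, end_year):
    """Keep results dated within [start_year, end_year], preserving rank order."""
    return [r for r in results if start_year <= r.year <= end_year]


results = [
    Result("St. Louis Cardinals win the World Series", 2006),
    Result("World Series of Poker main event", 2006),
    Result("Chicago White Sox sweep the World Series", 2005),
]
print(timeframe_facets(results))  # [(2005, 1), (2006, 2)]
print([r.title for r in refine_by_timeframe(results, 2005, 2005)])
```

The example also shows the muddied waters mentioned above: the poker result shares a year with the Cardinals result, so a time filter alone does not separate them.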

Yahoo Temporal Relevance

A patent application from Yahoo researchers explores a number of these concepts. The inventors are the same people who wrote the paper on Using Temporal Profiles of Queries.

Temporal search results
Invented by Rosie Jones and Fernando Diaz
US Patent Application 20060248073
Published: November 2, 2006
Filed: April 28, 2005

Abstract

In certain implementations, a system and/or method is provided for providing search results in response to a search query. In this implementation a temporal profile of the search query is built from temporal data associated with documents retrieved in response to the search query. From features of the temporal profile it may determined whether the search query would benefit from relevance feedback from a user. If there is a determination that the search query will benefit from the relevance feedback, relevance feedback is sought from the user. Search results are provided based on the relevance feedback.

The patent filing describes a way to determine whether time makes a difference to the quality of results for a query and, if it does, gives the searcher the opportunity to refine the query by time. That might include presenting the searcher with a timeline that displays some of the events relevant to the query, along with summaries of those events.
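How the patent filing selects and summarizes events isn't described here, so the following is only a sketch under assumed rules: group a query's results by year, treat years with unusually many results as events, and use the best-ranked result in each such year as a stand-in for an event summary. The function name, the `k=1.0` threshold, and the sample titles are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_timeline(results, k=1.0):
    """Pick out years where results cluster and label each with its best-ranked title.

    results: non-empty list of (year, title) pairs in rank order. A year counts
    as an "event" when its result count exceeds mean + k * population standard
    deviation across every year in the span, including empty years. Both the
    rule and the threshold are assumptions.
    """
    by_year = defaultdict(list)
    for rank, (year, title) in enumerate(results):
        by_year[year].append((rank, title))
    years = list(range(min(by_year), max(by_year) + 1))
    counts = [len(by_year[y]) for y in years]
    threshold = mean(counts) + k * pstdev(counts)
    timeline = []
    for year, count in zip(years, counts):
        if count > threshold:
            _, summary = min(by_year[year])  # lowest rank = best-ranked result
            timeline.append((year, count, summary))
    return timeline


results = [
    (2006, "Cardinals win the World Series"),
    (2005, "White Sox sweep the Astros"),
    (2006, "Game 5 recap"),
    (2005, "White Sox victory parade"),
    (2001, "Diamondbacks beat the Yankees"),
    (2006, "Eckstein named Series MVP"),
    (2005, "Konerko grand slam in Game 2"),
    (2003, "Marlins upset the Yankees"),
]
for event in build_timeline(results):
    print(event)
```

Here 2005 and 2006 stand out as events, each labeled with the highest-ranked page from that year, while the single results from 2001 and 2003 stay off the timeline.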

Queries can be classified into three different types when it comes to time:

  • Atemporal Queries – The topics that are relevant to these queries don’t change much over time.
  • Temporally Unambiguous Queries – These queries may refer to a specific event and include temporal information within them, such as “world series 2005”.
  • Temporally Ambiguous Queries – These queries may involve a number of events that happened at different times, such as my “world series” example.

Knowing which of these types a query belongs to may help in deciding whether or not to display a timeline.
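A rough way to sort queries into the three classes above is sketched below. The patent application describes a trained predictive model built on features of the temporal profile; these hand-written rules (an explicit year in the query, a divergence threshold, and a count of peaks) are hypothetical stand-ins, and the thresholds and sample profiles are arbitrary.

```python
import math
import re

# Hypothetical rule: a four-digit year from 1900-2099 in the query text.
YEAR_IN_QUERY = re.compile(r"\b(?:19|20)\d{2}\b")


def classify_query(query, profile, background, kl_threshold=0.5):
    """Label a query with one of the three temporal classes.

    profile and background map time bins to probabilities. The rules are
    illustrative stand-ins for the trained model the patent describes.
    """
    if YEAR_IN_QUERY.search(query):
        return "temporally unambiguous"
    kl = sum(p * math.log2(p / background[t]) for t, p in profile.items() if p > 0)
    if kl < kl_threshold:
        return "atemporal"  # results are spread over time much like the collection
    # Bins holding more than twice their background share count as peaks.
    peaks = [t for t, p in profile.items() if p > 2 * background[t]]
    return "temporally unambiguous" if len(peaks) <= 1 else "temporally ambiguous"


def wants_timeline(label):
    """Temporally ambiguous queries are the strongest candidates for a timeline."""
    return label == "temporally ambiguous"


background = {y: 1 / 7 for y in range(2000, 2007)}
flat = dict(background)
two_peaks = {y: 0.45 if y >= 2005 else 0.02 for y in range(2000, 2007)}
one_peak = {y: 0.88 if y == 2006 else 0.02 for y in range(2000, 2007)}
for query, prof in [("world series 2005", two_peaks), ("world series", two_peaks),
                    ("world series", one_peak), ("cookie recipe", flat)]:
    print(query, "->", classify_query(query, prof, background))
```

Only the temporally ambiguous case returns True from `wants_timeline`, which mirrors the point that such queries are the strongest candidates for temporal relevance feedback.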

In some implementations, it may not be necessary to utilize all the above discussed features for classification. Furthermore, of the above classifications, temporally ambiguous queries indicate a strong candidate for temporal relevance feedback. For example, in some implementations, such as when seeking relevance feedback, it may only be necessary to identify temporally ambiguous queries as distinguished from the other classifications. Conversely, identifying a query as non atemporal and not temporally unambiguous may indicate that the query is a candidate for temporal relevance feedback. In other implementations, it may be useful to fully classify the query so that additional temporal analysis may be performed. Furthermore, full classification may be desirable to determine if the classification of the query is identifiable at all. Information about whether the trained predictive model is capable of classifying a query can be used to decide if further analysis is required to determine if relevance feedback would be beneficial.

The patent filing goes into a number of other details about this process, such as choosing which results to display, how to present a timeline, other factors to consider in determining which results belong to which time periods, and so on. It’s an interesting look at how the age of documents, or information within documents about the times they refer to, may be used to help searchers receive results from a search engine that are relevant to what they are looking for.

Conclusion

I can see how presenting a timeline might be helpful to some searchers, and some searches.

If you are working to get the pages of your web site to rank for certain phrases, do you look at the top results to see how wide and diverse, or how narrowly focused, those results are?

Do you include information on your pages about specific events, and the dates of those events? How do you attempt to distinguish them from other events?

Do you pay attention to changing uses of language when it comes to the way people talk about what you might have to offer them?

I’ve had people ask me if it was worth keeping older articles and documents on their web sites, especially when information in those documents might become outdated. My response has been that keeping those pages can be helpful, as long as the pages clearly indicate the time periods they are relevant to and, where the site owners have newer information, make it easy for visitors to know that and to find it.

For example, if an organization holds a yearly conference, it doesn’t hurt to keep the pages about older conferences instead of deleting them, and to write about those conferences in a way that gives people interested in attending future ones a good taste of what past conferences were like.

I think that these papers and the patent application show that those older pages can have value.

9 thoughts on “A Role for Time and Query Quality in Search Results”

  1. Bill,

    I think that time relevance should be considered only in certain topical areas. News is news, but websites have transitioned into a more stable resource for specific, content-driven topics. I think that if time is a concern for the user, the engines should give the user that versatility by letting them sort by date. They offer it now in news and blog searches, so why not in standard search?

    Another idea: possibly give the user the top three to five results in each of the search categories. AJAX is being used by many online services to give the user more control, so why not in standard search?

    Of course, they could just offer suggestions:
    for World Series
    Baseball
    - History
    - Recent Champions
    Poker
    - Players
    - Recent Champions

    You don’t have to read a user’s mind, just give them a chance and they will respond. That way, they won’t think you are trying to show them only what you want them to see. A win-win for the User and the Engine!

  2. Hi Stephen.

    You raise some good points. The patent application does talk about email and news searches because the times on those tend to be more meaningful than the times on web pages.

    But I wonder if search engineers are considering web pages to be stable sources of information. The rate of change on web pages is supposed to be significant, and I think we’re seeing more and more dynamic sites.

    The patent also talks about burstiness, and as Kleinberg notes in this interview, blogs also have dates associated with them.

    The idea of using something like Ajax, and providing query refinements based upon some type of topical/temporal categorization is a great idea. That may be what Google Suggest, which is now a standard part of Google’s toolbar search, evolves into.

    Type “world ser” into the search box on the toolbar, and you’ll see the following suggestions:

    World Series
    World Series of Poker
    World Series Winners
    World Series History
    World Series Champion
    World Series Poker
    World Series of Poker 2005
    World Series trophy
    World Series 2004
    World Service

    I’m wondering, looking at the choices with years after them, if those reflect actual user queries, or some type of segmentation of results based upon something like we see in this patent application.

    It’s hard to tell. But worth looking into more.

  3. I certainly think it’s worth leaving any web pages with content up and running. This is an excellent point. Made me think that perhaps it would be useful to have a script for the end of a blog post that would automatically display related posts within the same blog, even including later posts.

  4. Hi Barry,

    I agree with you.

    It can be a good idea, depending upon how you create those pages in the first place.

    For instance, one site I’ve worked on had a huge archive of newsletters announcing events, with links to pages specifically about those events. Some time after the events ended, they would take those event pages down, but they kept the newsletters online.

    So, the newsletters were filled with links to pages that no longer existed, and the descriptions of those events in the newsletters were minimal.

    We removed most of those older newsletters, and started writing new ones with a little more meat to them so that if a link stopped working, the newsletter was still a nice chronicle of what was planned, and what happened at those events.

    From that point on, pages about upcoming events were also written so that people who wanted to learn more about past events, to see what they were like and what they missed, could do so. If the event was a recurring one, a link to the newer page would be added to the older page when the new page was created.

    I’ve thought about ways to update older posts, too. When you have a new blog post that is very relevant to something you wrote in an older post, it doesn’t hurt to go back to that older post, and add a note to it, with a link to the new post. Internal trackbacks can also provide a link to newer posts when you link to those older posts.

    Cheers.

  5. Great topic Bill.

    No question a slider like Kayak employs to modify results based on timeframe can help deliver more relevance.

    I disagree with Stephen that it should only be used in certain topical areas. I’m having a hard time thinking of where it couldn’t provide benefit for certain users, based on the goals of their query. Also, as more pages get indexed, the value of this ability increases.

    I think this is something we’ll see from all the major engines in the next 24 months.

  6. Thanks, Jonathan.

    I’m assuming that you mean the sliders to modify times to leave and return on Kayak. Nice tool there. I’ve been booking my flights through them because they make it so easy to do.

    It does seem like an evolution to search that could make a lot of sense. Of the many things that search engines could do to try to improve the relevancy of results, this is one that I would really like to see them work upon.

  7. What the patent tells us is that we should always include relevant temporal information on our pages. But that’s just common sense, as it’s logical to remember that pages last forever (unless deleted) and anyone can land on them from the search engines.

    So any old page should provide relevant information and links to the newer sources.

    I don’t think the results should be grouped by topic, or that the topics should be fixed either. Some industries may have their own events as well, so why cloud the results with the most popular industries?

    I’d say that just making the temporal relation of articles (pages) on the site clear should do the job. Of course, that may include having descriptions written for the future, as Bill suggested.

    Btw, Barry, there already is a related posts WordPress plugin. If you add the code into the loop, the old posts will have some related posts associated with them as well.
