It’s interesting to see how a search engine might try to calculate relevance of search results.
A recently granted Yahoo patent investigates an approach that might help it identify how relevant the results it displays to searchers might actually be, and how likely those results are to show a variety of results when a searcher uses a query term that might cover a range of topics.
Before presenting their automated approach for checking relevance and variety, the patent tells us about some of the limitations it sees in using manual review or click data for determining how relevant results might be.
One option for checking on the relevancy of search results would be to manually screen results for each query. That might be pretty time consuming, involve the possibility of human error, and doesn’t seem like it would even begin to cover all of the queries that are conducted on the web.
I did see an ad on Craig’s List a few weeks ago from Lionbridge Technologies, Inc., asking for part-timers to act as Internet Judges. A little sleuthing on the Web revealed Google may have used Lionbridge in the past to hire people to rank the relevancy of search results, though Craig’s List posting didn’t identify the ultimate employer. From the job description from the ad:
Relevance measurement is the foundation of all search engines, without it no one can tell whether a change has made the system better or worse. As an Internet Judge, you will be a key participant in helping determine the relevance of search engines. We are looking for Internet Judges who would work from home; review and rate websites based on an objective set of guidelines. Candidates must be avid internet enthusiasts. If you love browsing the web and can follow a specific set of guidelines to rate websites, then we want to hear from you.
Search engines do use manual reviewers. So does baseball. They never make mistakes, now do they?
In a recent post, I described a patent filing from Yahoo, where they presented a method of ranking images based upon a method of predicting clickthroughs of images at different positions in search results.
The assumption behind that approach was that images that appeared to be relevant for a query would be clicked upon, and that a prediction rate for images at certain positions in the search results could be used to identify images which overperformed based upon where they appeared in the results and move those up, and to find images which underperformed based upon their position, and move them down in the results. With image search results showing a thumbnail of an image, that might work well for images.
Would tracking the number of times that web search results get clicked upon when they appeared in search results reveal that those results are relevant for the query terms that they rank for?
A problem with that approach is that searchers only see a page title, abstract (or snippet), and URL for web pages, and those may not accurately reflect the content that appears upon the pages they represent. That limitation means that clicks upon search results for web pages may not be a good indication of how relevant those results are for a particular query.
The image above is from a patent for an Automated Baseball Umpiring System. While it may possibly do a good job of calling balls and strikes, it probably isn’t going to be helpful with other tasks, such as determining whether a hitter was hit by a pitch, or if a runner is safe or out on a close play at the plate.
An Algorithm to Determine the Relevancy and Variety of Search Results
Yahoo’s patented process uses information from recent searches to see if search results match up well with the searches that people are performing at the search engine.
Automatic relevance and variety checking for web and vertical search engines
Invented by Jignashu G. Parikh
Assigned to Yahoo
US Patent 7,558,787
Granted July 7, 2009
Filed July 5, 2006
Techniques for automatically checking the relevance and variety of search results are provided.
A query is submitted to a search engine, which uses a search algorithm to obtain search results based on the query. A set of the top n related terms for the query is identified. For each related term in the set of terms, its relative frequency in relation to all terms in the set of terms is determined. If the term does not occur in any of the results, then a loss in variety proportional to the relative term frequency for the term has occurred.
Otherwise, the relevance of the search results is calculated by comparing the proportion of results containing the term with the relative term frequency for a term. This process is repeated for all terms in the set of related terms to produce a total variety and relevance for the results.
When someone searches at a search engine, they enter a query term into the search box and hit enter.
A set of results is returned by the search engine which ranks those results according to search algorithms. The actual algorithms used to rank those results usually include elements that measure both the relevance and the importance of pages matching the query searched for.
This patent filing describes a testing interface that search algorithm and search engine developers can use to test the variety and relevance of search results.
As I noted at the start of this post, it’s interesting to see how a search engine might attempt to determine how relevant search results might be.
Using Related Terms
This process of determining relevancy and variety in search results starts by identifying terms that might be related to a searcher’s query.
Someone searches for [Amazon] and the search engine retrieves results related to the query and displays results to the searcher.
The results that appear may be relevant to the online store at “Amazon.com” or to the “Amazon river.”
There’s no way to actually determine automatically whether the searcher wants information about one or the other, or about something even different.
But, the search engine might look at query logs and session-based search data and other data sets to determine sub-concepts for a query.
Those sub-concepts might be the kind that you see offered at query suggestions by a search engine. See my previous post, How Search Engines May Decide Upon and Optimize Query Suggestion for some ideas on how a search engine might identify and optimize query suggestions for a specific query.
The same kind of data that Yahoo might use to offer “Also Try” type queries or Yahoo predictive search suggestions might also be used to identify sets of related terms for a searcher’s query.
A search engine also tracks the times that queries are submitted to a search engine, which may be helpful in identifying time-sensitive queries.
Related terms may be collected from search engine query log data from the last week, rather than the last year, to make sure that the information is timely.
So, if an earthquake took place a couple of months ago, the query logs around that time might have included many searches for [Amazon earthquake]
A month or more later, there might be a lot less searches for that term, and [amazon earthquake] might not be considered a related query like it would be recently after the time of the event.
A search through recent query logs might show how many times queries that included, or co-occurred” with “Amazon” appeared in that data. So related queries such as “amazon books,” “amazon river,” and “amazon rainforest” might be determined to be related queries if they show up frequently enough in the query logs that are examined.
The search engine may also look at search sessions from searchers in the query logs, to see how often other queries appear in the same search sessions as queries for, or that contain “Amazon.”
A search session might be defined as multiple searches from a searcher within a specific amount of time, such as an hour or a day.
Relative Term Frequency and Checking for Relevancy
Once a search engine has come up with a set of related terms for a query, it might calculate the relative frequency of each of those related terms compared to the original searcher’s query in the query logs that were examined. Here’s an example of how that calculation might work from the patent filing.
For example, referring to table 216, the F.sub.term of the term “books” is 25, meaning that “books” co-occurred with “Amazon” 25 times within the selected portion of Query Log 210, represented by table 212. Further, the F.sub.total is 50, corresponding to the total number of co-occurrences for all terms within the set of table 216.
Therefore, a determination can be made that the F.sub.relative of the term “books” is 25/50 or 50%. Table 216 further contains the relative term frequencies of all the other terms within the set of related terms. Specifically, the term frequency of “rainforest” is 12/50, or 24%, of “river” is 8/50, or 16%; and of “fish” is 5/50, or 10%.
The relative term frequency of each related term in the set is used to determine both the relevance and variety of search results for a primary query as further described herein.
Those ratios might be used in looking at the search results for the original search query.
If you look at the titles and snippets (or the actual content) of the top ten results in a search for [amazon], does half of those results contain the word “books” like the query logs examined do? Does a quarter of them contain the word rainforest? Is there a mention of the word “river” in one or two of them? Is there at least one with the word “fish” in it?
If the ratios between the query logs and the search results match up well, it might be an indication that the relevance of those results is pretty good. It may also indicate that the variety of results is good as well.
The patent does warn that some search results may be very relevant, but may also completely lack variety if a searcher’s query doesn’t contain may sub-topics, or related terms involving different topics.
I thought it was interesting that this patent describes a way of finding related query suggestions that is very similar to the method described in a Microsoft patent application in my last post.
The idea that the frequency of appearance of words from related queries could be used to gauge the relevancy and variety of results for a searcher’s query is also worth thinking about.
If half the people using [amazon] in their searches include the word “books” in those searches, should half the search results in a search for [amazon] contains the word “books?” If 20 percent of searchers looking for [amazon] include the word “rainforest’ in those searches, should two of the top ten search results be results about the Amazon rainforest?
Presently, the top ten search results at Yahoo for [amazon] contain two results for the .com version of the bookstore, followed by two results for the .ca version of the bookstore, then the wikipedia page for amazon.com, an entry about the Amazon river, a couple of pages about Amazon’s web services, a result for the co.uk Amazon store, and a final result for Amazon seller services, which lets people sell their products through Amazon.
Do these results reflect recent searches contained in Yahoo’s query logs that include the word “Amazon,” or that show up in the same search sessions as a search for [amazon]?
Should the relevancy of search results be based upon the frequency of related terms in recent query logs? Is that a good measure of how relevant those results might be?
I’ve actually written an earlier post about this patent when it was published as a patent application in January of 2008. I didn’t realize that until I was most of the way done with this post, but I think that the two posts actually compliment each other, so I decided to go ahead and publish this post.
I think the two posts do a good job of emphasizing the importance of trying to understand what a search engine might see as “related queries” for a specific query, and how those might not only influence which search suggestions might be shown in a set of search results but also how relevant a search engine might believe those search results to be based upon those related queries.