User Behaviour: Deletion Predictions

Many searches involve more than one word, and search engines pay attention to which words are being used. They may also be tracking and counting the words used in queries, as well as which pages searchers select from the results of those searches.

But search engines can pay attention to more than just the words being used. They can also look at user behavior from one search to another.

The Importance of Deleted Query Terms

Imagine that a Yahoo or a Google is watching and connecting how a searcher acts in a string of searches. Someone performs a search that involves more than one word, looks at the results, and then deletes one of the search terms, and searches again. The searcher then selects a result from that second search. What does that tell the search engine about the original search query, the deleted term, and the result chosen?

Now consider that the search engines are watching and collecting this type of information for a very large number of searchers.

Can that information help them make their results better? Can it help them make their contextual advertising show more relevant ads for some queries?

A new patent application from Yahoo tells us that:

Determining the relative value of the many terms used in search engine queries of two or more terms can make many search queries of two or more terms valuable for use as advertising links and for improving search results.

Knowing the frequency with which a term has appeared in actual previous searches before the term itself or another term in the same query was deleted in actual subsequent searches by the same search-engine user can give a deletion probability for the term which can be used to calculate the relative value of a search engine query of two or more terms.

The Patent Application

System and methods for ranking the relative value of terms in a multi-term search query using deletion prediction
Invented by Rosemary Jones and Daniel C. Fain
US Patent Application 20060129534
Published June 15, 2006
Filed: December 14, 2004


The likely relevance of each term of a search-engine query of two or more terms is determined by their deletion probability scores. If the deletion probability scores are significantly different, the deletion probability score can be used to return targeted ads related to the more relevant term or terms along with the search results. Deletion probability scores are determined by first gathering historical records of search queries of two or more terms in which a subsequent query was submitted by the same user after one or more of the terms had been deleted. The deletion probability score for a particular term of a search query is calculated as the ratio of the number of times that particular term was itself deleted prior to a subsequent search by the same user divided by the number of times there were subsequent search queries by the same user in which any term or terms including that given term was deleted by the same user prior to the subsequent search. Terms are not limited to individual alphabetic words.

How is a Deletion Probability Determined?

An example, roughly paraphrased from the patent application:

  1. Choose a search term, like “Honda,” from the records of two-word search queries.
  2. The other word in the two-word search query could be any other single word.
  3. Check whether there are cases where the same user ran a subsequent search after deleting either “Honda” or the other word from the two-word query.
  4. If so, calculate the deletion probability score for “Honda” by:
    • Counting how many times a word was deleted from a two-word search query that includes Honda before a subsequent search by the same user. Imagine for this example that there were 6059 such cases in a sample of data.
    • Looking at how many times the selected search term (Honda) was the term deleted. Now imagine that in those 6059 cases, the word Honda was deleted 1874 times.
    • Dividing the number of times the selected search term (Honda) was deleted by the total number of times any word was deleted from the two-word search queries that included Honda. For our example, this would be 1874/6059, or about 0.31. That number is the deletion probability score for Honda, for a two-term query. Smoothing or other statistical methods could also be used to calculate the deletion probability score.
  5. After calculating the deletion probability score for Honda, add it to the list of deletion probability scores for other terms.

The deletion probability score can then be used to compare the probability of the deletion of “Honda” to the deletion of any other term paired with Honda in a two-word query. This is the deletion probability score of Honda vs. the deletion probability score of “anything else” for two-word queries.
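The counting steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method claimed in the patent application: the log format (pairs of an original query and the same user's follow-up query), the function names, and the additive smoothing scheme are all hypothetical. The application says only that smoothing or other statistical methods could be used.

```python
from collections import Counter


def deletion_probability(times_deleted, times_seen, smoothing=0.0):
    """Ratio from the worked example: deletions of the term / all deletions seen.

    `smoothing` is an assumed additive pseudo-count over two outcomes
    (term deleted vs. term kept); it is not taken from the patent application.
    """
    return (times_deleted + smoothing) / (times_seen + 2 * smoothing)


def deletion_scores(rewrites, smoothing=0.0):
    """Score every term found in a hypothetical log of query rewrites.

    `rewrites` holds (original_query, follow_up_query) pairs from the same user.
    Only two-word originals whose follow-up keeps exactly one of the two words
    are counted, mirroring the two-word example in the text.
    """
    seen = Counter()     # times a term appeared in a qualifying two-word query
    deleted = Counter()  # times that term was the word that got dropped
    for original, follow_up in rewrites:
        words = original.lower().split()
        kept = follow_up.lower().split()
        if len(words) != 2 or words[0] == words[1]:
            continue
        if len(kept) != 1 or kept[0] not in words:
            continue
        for word in words:
            seen[word] += 1
        dropped = words[0] if kept[0] == words[1] else words[1]
        deleted[dropped] += 1
    return {
        term: deletion_probability(deleted[term], seen[term], smoothing)
        for term in seen
    }


# The figures from the example: Honda deleted 1874 times out of 6059 cases.
honda_score = deletion_probability(1874, 6059)

# A tiny made-up log; the three-word query is ignored by this sketch.
sample_log = [
    ("honda civic", "civic"),
    ("honda parts", "honda"),
    ("honda dealers", "honda"),
    ("honda accord reviews", "honda accord"),
]
sample_scores = deletion_scores(sample_log)
```

In the made-up log, “honda” is dropped in one of the three qualifying queries, so its score is 1/3, while “parts” is dropped the only time it appears. A lower score suggests that searchers tend to keep the term, which is the sense in which the term is treated as more relevant.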


The patent application explores some additional ground, such as when a two-word query might be determined to be a meaningful phrase in and of itself (for example, “ice” and “cream” can be joined together as “ice cream,” which means something completely different from either part).

But the basic premise is that there are times when someone searches with more than one term, and one of the terms could be more relevant to what a searcher is looking for than the other. By using something like this deletion probability score, it might be possible to tell which one is more relevant.

This might be helpful in improving search results. The patent application notes that search results may be improved by this process, but the focus of the examples described tends to be more on how this process can help deliver more relevant advertisements to searchers.

As for ads shown to searchers: when no ads match all of the words of the query, but there are ads that match the more relevant of the terms used, those ads may be shown to a searcher instead of ads that match the less relevant term or terms.
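A rough sketch of that fallback, again in Python and again hypothetical: the ad index, the `min_gap` threshold standing in for scores that are “significantly different,” and the neutral default score for unknown terms are all assumptions made for illustration, not details from the patent application.

```python
def pick_ads(query, ad_index, scores, min_gap=0.1, default_score=0.5):
    """Return ads for the whole query, else for its most relevant term.

    ad_index: hypothetical mapping from a query or single term to a list of ads.
    scores:   deletion probability per term; lower means more likely to be kept.
    min_gap:  assumed threshold for a "significant" difference between scores.
    """
    key = query.lower().strip()
    if key in ad_index:
        return ad_index[key]  # an ad matches every word of the query

    terms = key.split()
    if not terms:
        return []

    # Rank terms from least likely to be deleted to most likely.
    ranked = sorted(terms, key=lambda term: scores.get(term, default_score))
    if len(ranked) > 1:
        gap = scores.get(ranked[1], default_score) - scores.get(ranked[0], default_score)
        if gap < min_gap:
            return []  # no term clearly stands out, so do not guess

    return ad_index.get(ranked[0], [])


# Made-up data for illustration only.
ad_index = {"honda": ["Honda dealer ad"], "cheap": ["Discount ad"]}
scores = {"honda": 0.31, "cheap": 0.69}
ads = pick_ads("cheap honda", ad_index, scores)
```

Here no ad matches “cheap honda” as a whole, and “honda” has the clearly lower deletion probability, so the Honda ad is chosen over the ad for “cheap.”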

4 thoughts on “User Behaviour: Deletion Predictions”

  1. I’m not a fan of patents like this. Yahoo isn’t really doing very much that is truly striking here, in my own opinion anyway.

    People add terms because they need to both zero in on a concept and discard a lot of material that can be involved in multiple topics. There is no mystery or magic behind this.

    People delete terms because they have focused in on the wrong area or they have discarded the type of material they are looking for. There is no mystery or magic behind this either.

    Finding ways to predict keyword additions or deletions really means they are finding ways to figure out what concepts people associate various keyword sets with, based on their behavior.

    I don’t think you should patent the process of studying human behavior and mapping out how they relate language to the topics.

    I could be barking up the wrong tree here… it’s not like I understand everything I take a glance at! 😉

  2. Hi Grokodile,

    I think that you have a good sense of what they are trying to do, and I’m in agreement that it is common sense. I think what they are patenting isn’t so much the way of understanding based upon the behavior they see, but rather the monetizing of it by defining that concept so that they can show the right ads with the results that come back.

    I should add a link to the poster on this topic that Jean-Marie Le Ray located and shared in the comments section of another post – Query word deletion prediction.

    Reading through that, I think that you’re on target with how it works. Should that process be protected by a patent? Maybe, if someone comes out with something that is almost exactly the same as what Yahoo is doing.

    But chances are that if someone wants to do something similar, they can probably find a way that doesn’t step on the toes of this patent application.
