How Google May Reform Queries Based on Co-Occurrence in Query Sessions
When you search, especially for topics that you know little about, chances are that you might not include the most relevant terms in your query, or you might use words that may have ambiguous meanings.
One area where search engines focus a lot of attention is reformulating queries through query suggestions and query expansion, to help searchers meet their situational and informational needs quickly.
When you search, you might see a number of query suggestions at the bottom of the results that were first returned, like the ones above on a search for [find airedale terrier puppies]. Or a search engine might include synonyms or substitute queries to expand your original query.
Search engines will sometimes also expand the query terms used to return advertisements that might be relevant to your search.
Google recently published a patent application that explores ways to make it easier to find what you are looking for by offering more when it comes to:
- Generating query suggestions
- Keyword suggestions
- Query expansions
- Keyword expanded matches
Co-Occurrence in Query Sessions
This process involves looking at user query sessions to find words that co-occur within query sessions, especially when they are consecutive to each other. For example, someone searches for “find airedale terrier puppies,” and then follows up that search with another for “find airedale terrier puppies for sale.” There’s a pretty high confidence level that the two searches are related based upon the co-occurring words within them, and the fact that they are consecutive searches. If a lot of searchers perform similar searches within query sessions, that indicates that there is a strong relationship between the two queries.
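To make the idea of consecutive queries in a session concrete, here is a minimal sketch in Python. The session data, the function names, and the choice to split queries on whitespace are all my own assumptions for illustration; the patent does not spell out an implementation like this.

```python
from collections import Counter


def consecutive_query_pairs(sessions):
    """Count how often one query is immediately followed by another.

    `sessions` is a list of sessions, each an ordered list of query strings
    (a made-up format for this example).
    """
    counts = Counter()
    for session in sessions:
        # zip pairs each query with the one submitted right after it
        for first, second in zip(session, session[1:]):
            if first != second:
                counts[(first, second)] += 1
    return counts


def shared_terms(first, second):
    """Terms that co-occur in both queries (simple whitespace tokens)."""
    return set(first.split()) & set(second.split())


# Hypothetical query sessions, invented for this example
sessions = [
    ["find airedale terrier puppies", "find airedale terrier puppies for sale"],
    ["airedale terrier", "find airedale terrier puppies",
     "find airedale terrier puppies for sale"],
    ["become a dentist", "become a dental assistant"],
]

pair_counts = consecutive_query_pairs(sessions)
for (first, second), count in pair_counts.most_common():
    print(count, first, "->", second, sorted(shared_terms(first, second)))
```

The more sessions in which the same pair of queries shows up back to back, the higher the count, which is the kind of signal the paragraph above describes.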
I’ve written about co-occurrence in the past when it comes to queries that might share co-occurring words within their sets of search results. I’ve been seeing a number of blog posts that equate co-occurrence with co-citation, and they are not the same thing.
I’ve also seen claims that when certain words tend to appear on pages near brand names and/or links to pages, that is considered co-occurrence and it can impact the rankings of those brands and/or URLs. That’s not how co-occurrence is used in search engine rankings. The observations that led to such a conclusion could easily involve a search engine looking not just at the anchor text in links, but also at the words that appear in a window around that anchor text.
For example, in the following screenshot, “Followerwonk” is a link, and the words “Twitter metrics and analytics” appear near the link. Under an incorrect application and analysis of co-occurrence, Google might associate those words with “Followerwonk”. It’s much more likely that Google is associating some of the words around the link, such as those within a “window” of 25 words, with the destination page that the link is pointing to.
This patent looks for co-occurring words within search sessions instead of on web pages or within search results for particular queries.
In many ways, this patent resembles one I wrote about very recently, which looked at relationships between search entities, but it digs a little deeper into relationships and probabilities between query terms that show up in search sessions.
Benefits of Query Suggestions, Substitution, and Expansion
The patent tells us that the process it describes can help in the following ways:
- Query suggestions incorporate information-theoretical interpretations of taxonomic relations such as specification and generalization (how queries might be related to smaller categories and larger categories).
- Query results may be improved through query substitution and query expansion.
- Related keywords may be identified.
- The relevance of advertisements delivered to users may be improved.
- Query classification can be improved.
- Query completions may be improved to reflect semantic similarities between entered terms and suggested completions.
- Query suggestions may be adapted to match user intent in terms of generalization or specialization.
Successful refinements are closely related to the original query. This is not surprising as reformulations involve spelling corrections, morphological variants, and tend to reuse parts of the previous query. More precisely, reformulations are close to the previous query both syntactically, as sequences of characters or terms, and semantically, often involving transparent taxonomic relations. As an example, for the query “becoming a dentist”, the reformulation “becoming an oral surgeon” might have a higher chance of producing relevant results than “becoming a doctor”.
Query terms that might be similar are selected in part on how closely they might be related semantically. For example, in a set of query sessions, it’s much more likely to see “become a dentist” followed by a query for “become a dental assistant” than by “become a doctor.” It’s likely that we’ll see people change their queries in such a manner when they are performing searches in a search session.
In addition to this kind of semantic relationship, we might also look at how queries are physically transformed when searchers make changes to them. We can look at how this is done with physical changes to the words within a query, or changes to the terms themselves.
For instance, when someone finishes a query for “become a dentist,” they might then keep the first two words the same, change “dentist” to “dental” (which means removing “ist” and adding “al”), and add the term “assistant.” This isn’t a big change in terms of an “edit distance” from one query to the other.
The patent explores the cost of changing one query to another with changes to strings of letters and/or the addition or removal of terms.
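As a rough illustration of edit distance, here is a sketch of the classic Levenshtein distance in Python. This is the standard textbook algorithm, not the patent’s generalized version, and the function name is my own. It treats every insertion, deletion, and substitution as a cost of 1, and it works on strings of letters or on lists of terms.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn sequence `a` into sequence `b`.

    Works on strings (characters) or on lists (whole terms).
    """
    # previous[j] is the distance from the first i-1 items of a to b[:j]
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into an empty sequence takes i deletions
        for j, item_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (item_a != item_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


# Character level: swap "i" for "a", swap "s" for "l", drop the final "t"
print(levenshtein("dentist", "dental"))  # 3

# Term level: replace "dentist" with "dental", then add "assistant"
print(levenshtein("become a dentist".split(),
                  "become a dental assistant".split()))  # 2
```

Plain Levenshtein treats every change as equally expensive. The approach in the patent differs in that the cost of moving from one term to another depends on how the terms co-occur in query sessions.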
Combining these two signals, co-occurrence values drawn from consecutive (or near-consecutive) queries within a query session and small edit distances between query terms, provides a framework for finding terms that are near in meaning (semantically) and near in form (syntactically).
The patent is:
Generalized Edit Distance for Queries
Invented by Massimiliano Ciaramita, Amac Herdagdelen, and Daniel Mahler
US Patent Application 20130226950
Published August 29, 2013
Filed April 3, 2013
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining a generalized edit distance for queries.
In one aspect, a method includes selecting query pairs of consecutive queries, each query pair being a first query and a second query consecutively submitted as separate queries, each first and second query including at least one term. For each query pair, the method includes selecting term pairs from the query pair, each term pair being a first term in the first query and a second term in the second query; and determining a co-occurrence value for each term pair.
The method also includes determining transition costs based on the co-occurrence values for term pairs, each transition cost indicative of a cost of transitioning from a first term in a first query to a second term in a second query consecutive to the first query.
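Here is one way the steps in the abstract might fit together, sketched in Python. Nearly everything specific here is my assumption rather than something the abstract states: the made-up sessions, the use of normalized pointwise mutual information as the co-occurrence value, the mapping of that value to a cost between 0 and 1, and the fixed cost of 1.0 for adding or removing a term.

```python
import math
from collections import Counter
from itertools import product


def build_transition_cost(sessions):
    """Return a cost(t1, t2) function learned from consecutive query pairs."""
    pair_counts, first_counts, second_counts = Counter(), Counter(), Counter()
    total = 0
    for session in sessions:
        for first, second in zip(session, session[1:]):
            # Term pairs: one term from the first query, one from the second
            for t1, t2 in product(set(first.split()), set(second.split())):
                pair_counts[(t1, t2)] += 1
                first_counts[t1] += 1
                second_counts[t2] += 1
                total += 1

    def cost(t1, t2):
        if t1 == t2:
            return 0.0  # keeping a term is free (assumption)
        joint = pair_counts[(t1, t2)]
        if joint == 0:
            return 1.0  # never observed together: maximum cost (assumption)
        if joint == total:
            return 0.0  # the only pair ever seen; avoids dividing by log(1)
        p_joint = joint / total
        p_first = first_counts[t1] / total
        p_second = second_counts[t2] / total
        pmi = math.log(p_joint / (p_first * p_second))
        npmi = pmi / -math.log(p_joint)  # normalized PMI, between -1 and 1
        return (1.0 - npmi) / 2.0        # high co-occurrence -> low cost

    return cost


def generalized_edit_distance(first_query, second_query, cost, indel_cost=1.0):
    """Term-level edit distance where substitutions use learned costs."""
    a, b = first_query.split(), second_query.split()
    previous = [j * indel_cost for j in range(len(b) + 1)]
    for i, term_a in enumerate(a, start=1):
        current = [i * indel_cost]
        for j, term_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + indel_cost,                # remove term_a
                current[j - 1] + indel_cost,             # add term_b
                previous[j - 1] + cost(term_a, term_b),  # transition
            ))
        previous = current
    return previous[-1]


# Hypothetical sessions, invented for this example
sessions = [
    ["become a dentist", "become a dental assistant"],
    ["become a dentist", "become a dental assistant"],
    ["become a dentist", "dentist salary"],
    ["become a doctor", "doctor salary"],
]
cost = build_transition_cost(sessions)
print(cost("dentist", "dental"), cost("dentist", "doctor"))
print(generalized_edit_distance("become a dentist",
                                "become a dental assistant", cost))
```

In this toy data, moving from “dentist” to “dental” costs less than moving from “dentist” to “doctor,” because the first pair shows up in consecutive queries and the second never does. A real system would presumably also learn the costs of adding and removing terms from the data, rather than fixing them at 1.0 as I have here.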