How Google May Reform Queries Based on Co-Occurrence in Query Sessions

When you search, especially for topics that you know little about, chances are that you might not include the most relevant terms in your query, or you might use words that may have ambiguous meanings.

One area where search engines focus a lot of attention is reformulating queries, through query suggestions and query expansion, to help searchers meet their situational and informational needs quickly.

"Google query suggestions on a search for [find airedale terrier puppies]"

When you search, you might see a number of query suggestions at the bottom of the results that were first returned, like the ones above on a search for [find airedale terrier puppies]. Or a search engine might include synonyms or substitute queries to expand your original query.

Search engines will sometimes also expand the query terms used to return advertisements that might be relevant to your search.

Google recently published a patent application that explores how to make it easier to find what you are looking for by offering more when it comes to:

  • Generating query suggestions
  • Keyword suggestions
  • Query expansions
  • Keyword expanded matches

Co-Occurrence in Query Sessions

This process involves looking at user query sessions to find words that co-occur within query sessions, especially when they are consecutive to each other. For example, someone searches for “find airedale terrier puppies,” and then follows up that search with another for “find airedale terrier puppies for sale.” There’s a pretty high confidence level that the two searches are related based upon the co-occurring words within them, and the fact that they are consecutive searches. If a lot of searchers perform similar searches within query sessions, that indicates that there is a strong relationship between the two queries.
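To make the idea concrete, here is a minimal sketch of counting how often a term in one query is followed by a term in the next query of the same session. The function names, the whitespace tokenization, and the two sample sessions are assumptions made for illustration; the patent filing does not spell out an implementation like this.

```python
from collections import Counter
from itertools import product


def consecutive_pairs(session):
    """Return (first_query, second_query) for each pair of consecutive queries."""
    return zip(session, session[1:])


def term_pair_counts(sessions):
    """Count (term in first query, term in second query) pairs across sessions."""
    counts = Counter()
    for session in sessions:
        for first, second in consecutive_pairs(session):
            # Each distinct term pair is counted once per query pair.
            for pair in product(set(first.split()), set(second.split())):
                counts[pair] += 1
    return counts


# Hypothetical query sessions, each an ordered list of queries.
sessions = [
    ["find airedale terrier puppies", "find airedale terrier puppies for sale"],
    ["airedale terrier puppies", "airedale terrier puppies for sale"],
]
counts = term_pair_counts(sessions)
```

Here `counts[("puppies", "sale")]` is 2 because both sessions move from a query containing “puppies” to one containing “sale,” while `counts[("find", "sale")]` is 1. The more sessions that share a transition like that, the stronger the suggested relationship.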

I’ve written about co-occurrence in the past when it comes to queries that might share co-occurring words within their sets of search results. I’ve been seeing a number of blog posts that equate co-occurrence with co-citation, and they are not the same thing.

I’ve also seen claims that when certain words tend to appear on pages near brand names and/or links to pages, that is considered co-occurrence, and it can impact the rankings of those brands and/or URLs. That’s really just not how co-occurrence is used in search engine rankings. The observations that led to such a conclusion could easily involve a search engine looking not just at the anchor text in links, but also at the words that appear in a window around the anchor text.

For example, in the following screenshot, “Followerwonk” is a link, and the words “Twitter metrics and analytics” appear near the link. Under an incorrect application and analysis of co-occurrence, Google might associate those words with “Followerwonk”. It’s much more likely that Google is associating some of the words around the link itself with the destination page that the link is pointing to, such as a “window” of 25 words around the link.

A link to the Moz followerwonk program.
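A window like that is easy to picture in code. This is only a sketch: the function name, the whitespace tokenization, the choice to apply the window on each side of the link, and the sample sentence are all assumptions for illustration, not something taken from a Google patent.

```python
def words_around_link(tokens, link_start, link_end, window=25):
    """Return the words within `window` positions before and after a link.

    tokens[link_start:link_end] is the anchor text itself and is excluded.
    """
    before = tokens[max(0, link_start - window):link_start]
    after = tokens[link_end:link_end + window]
    return before + after


# Invented sentence in which "Followerwonk" (token 1) is the link.
sentence = "Try Followerwonk for Twitter metrics and analytics"
tokens = sentence.split()
context = words_around_link(tokens, link_start=1, link_end=2, window=25)
```

The words in `context` are the ones a search engine might associate with the page the link points to, rather than with the brand name in the anchor text.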

This patent looks for co-occurring words within search sessions instead of on web pages or within search results for particular queries.

In many ways, this patent seems similar to a patent I wrote about very recently, which looked at relationships between search entities. It digs a little deeper, though, into relationships and probabilities between query terms that show up in search sessions.

Benefits of Query Suggestions, Substitution, and Expansion

The patent tells us that the process it describes can help in the following ways:

  • Query suggestions incorporate information-theoretical interpretations of taxonomic relations such as specification and generalization (how queries might be related to smaller categories and larger categories).
  • Query results may be improved through query substitution and query expansion.
  • Related keywords may be identified.
  • The relevance of advertisements delivered to users may be improved.
  • Query classification can be improved.
  • Query completions may be improved to reflect semantic similarities between entered terms and suggested completions.
  • Query suggestions may be adapted to match user intent in terms of generalization or specialization.

Edit Distances

Successful refinements are closely related to the original query. This is not surprising as reformulations involve spelling corrections, morphological variants, and tend to reuse parts of the previous query. More precisely, reformulations are close to the previous query both syntactically, as sequences of characters or terms, and semantically, often involving transparent taxonomic relations. As an example, for the query “becoming a dentist”, the reformulation “becoming an oral surgeon” might have a higher chance of producing relevant results than “becoming a doctor”.

~ Generalized Syntactic and Semantic Models of Query Reformulation

Query terms that might be similar are selected in part based on how closely they are related semantically. For example, in a set of query sessions, it’s much more likely to see “become a dentist” followed by a query for “become a dental assistant” than by “become a doctor.” It’s likely that we’ll see people change their queries in such a manner when they are performing searches in a search session.

In addition to this kind of semantic relationship, we might also look at how queries are physically transformed when searchers make changes to them. We can look at changes to the characters within the words of a query, or changes to the terms themselves.

For instance, when someone finishes a query for “become a dentist,” they might then keep the first two words the same, change “dentist” to “dental” (which means removing “ist” and adding “al”), and add the term “assistant.” This isn’t a big change in terms of an “edit distance” from one query to the other.

A flow chart showing the edit distance from one phrase to another.

The patent explores the cost of changing one query to another with changes to strings of letters and/or the addition or removal of terms.
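As a baseline for comparison, here is a sketch of the classic (Levenshtein) edit distance, where every insertion, deletion, or substitution costs 1. It is the standard textbook algorithm rather than the patent’s generalized version, and the function name is my own. Because it works on any sequence, the same code can compare strings of letters or lists of terms.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance between two sequences (strings or lists)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or free if equal
            ))
        prev = curr
    return prev[-1]


# Character level: "ist" is replaced by "al".
letters = edit_distance("dentist", "dental")

# Term level: each whole word is one symbol.
to_assistant = edit_distance("become a dentist".split(),
                             "become a dental assistant".split())
to_doctor = edit_distance("become a dentist".split(),
                          "become a doctor".split())
```

With unit costs, `letters` is 3, `to_assistant` is 2, and `to_doctor` is 1. At the term level, a plain edit distance would rank “become a doctor” as the closer query, which is one reason costs that reflect how searchers actually reformulate queries are attractive.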

Combining close co-occurrence values found in consecutive (or near-consecutive) queries within a query session with a search for smaller edit distances between query terms provides a framework for finding terms that are near in meaning (semantically) and near in edit distance (syntactically).
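One way to picture that framework is an edit distance over terms in which the cost of swapping one term for another comes from a lookup table instead of always being 1. The sketch below assumes such a table exists; the cost values, the insertion cost of 0.5, and the function names are invented for illustration and are not figures from the patent.

```python
def generalized_edit_distance(first, second, sub_cost,
                              ins_cost=1.0, del_cost=1.0):
    """Term-level edit distance with a pluggable substitution cost."""
    a, b = first.split(), second.split()
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * del_cost]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + del_cost,             # drop term x
                curr[j - 1] + ins_cost,         # add term y
                prev[j - 1] + sub_cost(x, y),   # rewrite x as y
            ))
        prev = curr
    return prev[-1]


# Invented transition costs: cheaper means a more common reformulation.
COSTS = {("dentist", "dental"): 0.2, ("dentist", "doctor"): 0.9}


def sub_cost(x, y):
    """Identical terms are free; unknown pairs fall back to a cost of 1."""
    return 0.0 if x == y else COSTS.get((x, y), 1.0)


close = generalized_edit_distance("become a dentist",
                                  "become a dental assistant",
                                  sub_cost, ins_cost=0.5)
far = generalized_edit_distance("become a dentist",
                                "become a doctor",
                                sub_cost, ins_cost=0.5)
```

Under these assumed costs, `close` comes out lower than `far`, so “become a dental assistant” is treated as the nearer reformulation even though it involves more edits.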

The patent is:

Generalized Edit Distance for Queries
Invented by Massimiliano Ciaramita, Amac Herdagdelen, and Daniel Mahler
US Patent Application 20130226950
Published August 29, 2013
Filed: April 3, 2013

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining a generalized edit distance for queries.

In one aspect, a method includes selecting query pairs of consecutive queries, each query pair being a first query and a second query consecutively submitted as separate queries, each first and second query including at least one term. For each query pair, the method includes selecting term pairs from the query pair, each term pair being a first term in the first query and a second term in the second query; and determining a co-occurrence value for each term pair.

The method also includes determining transition costs based on the co-occurrence values for term pairs, each transition cost indicative of a cost of transitioning from a first term in a first query to a second term in a second query consecutive to the first query.
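The abstract doesn’t say how a co-occurrence value becomes a transition cost. As a sketch under one assumed choice, the code below uses the negative log of how often a first term is followed by a given second term, relative to everything that first term is followed by. That formula, the function name, and the counts are assumptions for illustration only.

```python
import math
from collections import Counter


def transition_costs(pair_counts):
    """Turn (first_term, second_term) counts into costs; common pairs are cheap."""
    totals = Counter()
    for (first, _second), n in pair_counts.items():
        totals[first] += n
    return {
        (first, second): -math.log(n / totals[first])
        for (first, second), n in pair_counts.items()
    }


# Invented co-occurrence counts from consecutive query pairs.
pair_counts = {("dentist", "dental"): 6, ("dentist", "doctor"): 2}
costs = transition_costs(pair_counts)
```

With these made-up counts, moving from “dentist” to “dental” is cheaper than moving from “dentist” to “doctor,” because the first transition was seen three times as often.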


16 thoughts on “How Google May Reform Queries Based on Co-Occurrence in Query Sessions”

  1. This is definitely in line with how we search and will yield better results. However, I just recall Ask.com’s interface from a few years back where they enabled users to select a category for generic keywords and then drill down.

    For generic keywords where we are trying to figure out how to communicate our own intent, that may be a better experience.

  2. Thanks Bill for the clarification on co-occurrence and co-citation.

    Do you think Google is using this just in-session? I would hope that they build models off this semantic analysis and not just serve the according results to the user, but also use the outcome to feed into other elements of their algorithm.

    One of these could be displayed in the ‘related searches’ and / or feed into the broad keyword selection – especially when narrowed by industry.

  3. Hi Rajat,

    To a degree, this is attempting to give searchers query suggestions, and in some cases, even expanding queries so that people might get results that may be more in line with their intent behind the queries that they perform. If there isn’t a high enough level of confidence in results that might be very related, Google likely wouldn’t expand the results of a query. But people can click upon query suggestions that might be a better match for what they were looking for. Those aren’t categories, but they would provide options based upon what other people searched for.

  4. Hi Andreas,

    You’re welcome regarding co-citation and co-occurrence. I think I’ve seen about 7-8 blog posts over the last couple of weeks where people wrote about both co-citation and co-occurrence, and in some cases treated them as the same thing or something very related, and they really just aren’t. Most of those also quoted Rand from Moz and his “hunch” about how Google might be treating co-occurrence. Unfortunately, there really isn’t anything from Google in patents or papers or blog posts or public statements that backs Rand’s guess.

    There have been a number of patents from Google that involve co-occurrence and describe different ways that it might be used, and none of them are like what Rand describes. I did respond to Rand’s Whiteboard Friday video in this post:

    http://www.seobythesea.com/2012/11/not-all-anchor-text-is-equal-other-co-citation-observations/

    While Rand changed the title and the post to remove “co-citation”, that didn’t change what some of those recent bloggers wrote, and they didn’t even refer to the change. I followed that post up with another one that took a deeper dive into co-occurrence:

    Ranking Webpages Based upon Relationships Between Words (Google’s Co-Occurrence Patent)
    http://www.seobythesea.com/2012/11/ranking-webpages-relationships-co-occurrence-patent/

    It describes co-occurrence in a very different way than Rand does in his Whiteboard Friday video.

    I can’t say with any certainty that Google is using the process described within this patent filing at this point, but it does describe a workable framework for using query session data that could produce useful and helpful query suggestions, or as you called them, related searches (see my airedale puppies query refinement suggestions at the top of this post).

    The patent does tell us that this process can be used to broaden the advertisements shown for a search as well.

  5. Hi Gregory,

    After the first 3-4 days that I spent working on paring this patent down to a summary that attempted to capture as much of it as I possibly could, there’s still a lot left to it. I definitely recommend that anyone interested in digging in deeper do so.

    This really is the simple version. :)

  6. OK I have to admit I’m trying to learn more about SEO, but a lot of that went way over my head, still a way to go on the learning curve :-)

    The stuff about query sessions is an interesting one that I hadn’t really considered. Makes sense though, it’s the way I search myself; if I’m not happy with the original search results I’ll refine the terms.

  7. First off, I wish to thank you for such an informative site. Far too often I’ve seen people purport to be SEO experts simply because “they know a guy who knows a guy that works at Google” or something similar. Finding good SEO articles has been quite the challenge for me and I was very happy when I stumbled across your site.

    I have noticed that Google’s search results are tailored to the specific user. I am curious how this is accomplished. Is it dealing with cookies on the client’s browser, or does Google use the user’s IP address? Or perhaps a little of both?

    I made some changes to a site of mine a few months ago, and was really happy when I Googled a prominent keyword and it came 4th on the 1st page. Unfortunately when my friends tried the same search, my site showed up on page 10. I deleted my cookies and sure enough, I got a similar result. That’s kind of why I’m wondering if it’s cookie based and/or IP based. Tailoring a search for each user is great, but as a developer, it makes it challenging to see when and if a site’s ranking has increased. :)

    Thanks!

  8. Hi JL,

    Thanks for your kind words.

    Google does personalize results when you’re logged in, and even when you’re logged out. If you’re logged in, those personalized results might be based in part on your search history associated with your account. If you’re logged out, they might be associated with a cookie that might have captured some history of where you’ve been searching and browsing lately.

    I’ve written more than a couple of posts on the topic, and many of those can be found in my personalization category.

    Google also uses context for results, and they consider that separate from personalization. That can influence the language you see results in, your location, the time of day and time of year, and other things that are based more on the context of your search than on the things that you might have shown some kind of interest in during the past.

    I’ve had at least one potential client call and inquire about SEO services in the past, and complain about a specific competitor, who he saw as ranking really well for many of his keywords, when that competitor wasn’t really ranking well. He was so fixated on them that he visited their site frequently, and personalization made that competitor’s site rank well for him.
