How Google May Substitute Query Terms with Co-Occurrence

But I’m a substitute for another guy
I look pretty tall but my heels are high
The simple things you see are all complicated
I look pretty young, but I’m just backdated, yeah

- Pete Townshend

When you search at Google, how easy is it to find what you’re looking for? If your first attempt doesn’t uncover pages that you find useful, do you search again with different but related words?

If I search for “car repair” and follow it up with a search for “auto repair,” I would expect to see a lot of the same pages, but perhaps not in the same order. I would also expect to see local search results for both, and I do. The local search results aren’t in exactly the same order either. Some words or phrases do make good substitutes for others, though, as can be seen in the image below:

A comparison of co-occurring terms for 'french open' and 'frenchopen'.

When I do the “car repair” search, I see some pages that have the word “auto” instead of “car,” and it’s bolded, as described in a 2010 Google Official Blog post, Helping computers understand language. For a while now, Google has been expanding queries with synonyms of the search terms you originally used.

Is Keyword Matching Dying?

As a site owner, designer, or developer, when you create a webpage, you may try to optimize the page for a specific term or phrase, one that people interested in what you offer might use in a search and might expect to see on your page, and hope that the page ranks in search results for it. But what if Google finds a way to match the concepts and the things, or entities, that your page is about without relying upon exactly matching those words, to better enable searchers to find what they are looking for?

We’ve seen Google attempt this by using synonyms to expand the queries people search with. Synonyms for the categories that different businesses might be found under may also help expand the results returned in local search.

One of the really tricky aspects of a search engine using synonyms is that words sometimes change meanings in different contexts. For example, “car” and “auto” might be synonyms when you see them in relation to car or auto “mechanics,” or car or auto “repair,” but not when you’re discussing a Ford auto and a railroad car. In that context, they are just not the same thing anymore.

One of my favorite approaches that Google uses to find synonyms within context is a statistical language translation approach, where a term or phrase might be translated into a different language and then translated back into the first language. For example, “car mechanic” might be translated from English into French, and upon translating it back into English, two options might be returned: “car mechanic” and “auto mechanic.” If there’s enough confidence that they mean the same thing, they might be considered synonyms of each other.

Substitute Query Terms Rather than Synonym Query Terms

A patent granted to Google this past week also explores the idea of finding terms or phrases to use to expand queries, but calls those terms “substitute terms” rather than “synonyms.” The image above comparing co-occurring words in search results for “french open” and “frenchopen” illustrates a process that can be used to explore other words as well, though sometimes the words aren’t good substitutes for each other, such as “warrant” and “warranty,” as seen in the screenshot from the patent below:

A comparison of co-occurring terms for 'warrant' and for 'warranty'.

In an example from the patent, two words that might potentially be substitutes for each other are “felines” and “cats”.

The process used to find substitute terms focuses upon the use of the co-occurrence of words found on pages returned in response to a query, and to a potential substitute query. These candidate substitute terms might originally show up in documents ranking for the first query term, or in meta data associated with those documents.

For example, to find potential substitute query terms for “cats,” terms that appear in documents ranking for “cats” may be explored. One of those might be “feline.” If we perform a search for “cats” and look through the top 10 (or top 20, or even top 100) results for words that tend to co-occur on those pages, we might see words such as “furry,” “domesticated,” “carnivorous,” and “mammal” appear on a lot of the top pages returned for that query. If those terms tend to co-occur often in the results of a search for “cats,” they are considered co-occurring terms.
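The counting step described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the patent's method: the function name `cooccurring_terms`, the tiny stopword list, the 50% threshold, and the four sample "pages" are all made up for the example.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stopword list for the example.
STOPWORDS = {"a", "an", "and", "are", "is", "of", "the", "that", "to", "in"}

def cooccurring_terms(query_term, documents, min_doc_fraction=0.5):
    """Return terms found in at least min_doc_fraction of the documents,
    mapped to the fraction of documents they appear in."""
    doc_freq = Counter()
    for text in documents:
        # Count each term once per document (document frequency).
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        tokens -= STOPWORDS
        tokens.discard(query_term)
        doc_freq.update(tokens)
    threshold = min_doc_fraction * len(documents)
    return {t: c / len(documents) for t, c in doc_freq.items() if c >= threshold}

# Invented stand-ins for the top results of a search for "cats".
top_results_for_cats = [
    "Cats are furry domesticated carnivorous mammals.",
    "The domesticated cat is a small furry mammal.",
    "Cats the Broadway musical is based on poems by T.S. Eliot.",
    "A carnivorous mammal, the furry house cat is domesticated.",
]

result = cooccurring_terms("cats", top_results_for_cats)
print(sorted(result, key=result.get, reverse=True))
```

A real system would work over far more pages and would need stemming (so that “mammal” and “mammals” count together), but the idea is the same: terms that show up on many of the top pages for a query are treated as co-occurring with it.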

If we perform a search for “felines,” we might see a lot of the same terms or phrases co-occurring on the top results for that search that we see for “cats.” The patent tells us that:

One particular indicator of how good a particular candidate substitute term is for an original query term is to compare co-occurrence frequencies for terms that co-occur with the original term and with the candidate substitute term in search queries.

Building Substitute Rules

On a search for “cats”, some of the terms that might show up frequently on some top pages in results might be terms like “Broadway,” and “Acting” and “T.S. Eliot”. Those pages aren’t about cats themselves, but rather a play about cats. When Google analyzes terms that co-occur in search results, it may come up with rules that it will follow to determine which pages to use, and to not use.

Some pages might be included in an analysis looking for terms that might be reliable substitutes for each other because they share a number of search results that contain many of the same co-occurring terms. The pages that appear to have other contexts completely might be ruled out from those computations.
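The patent summary here doesn't spell out what those rules look like, so the following is only a guess at the general shape of such a filter. It assumes a simple heuristic of my own: keep a page only if it shares at least a couple of terms with a set of context terms. The names `filter_by_context` and `min_overlap`, and the sample pages, are hypothetical.

```python
import re

def tokens(text):
    """Lowercase a page's text and split it into a set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def filter_by_context(documents, context_terms, min_overlap=2):
    """Keep documents that share at least min_overlap terms with context_terms."""
    context = set(context_terms)
    return [doc for doc in documents if len(tokens(doc) & context) >= min_overlap]

# Invented stand-ins for pages ranking for "cats".
results = [
    "Cats are furry domesticated carnivorous mammals.",
    "Cats on Broadway: acting in the musical based on T.S. Eliot.",
    "The furry house cat is a domesticated mammal.",
]

kept = filter_by_context(results, {"furry", "domesticated", "carnivorous", "mammal"})
for page in kept:
    print(page)
```

With these sample pages, the Broadway page shares no terms with the animal context and is left out of the analysis, while the two pages about the animal are kept.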

The Google patent is:

Evaluation of substitute terms
Invented by Daisuke Ikeda and Ke Yang
Assigned to Google
US Patent 8,504,562
Granted August 6, 2013
Filed: April 3, 2012

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for evaluating substitute terms. One of the methods includes selecting a first term and a candidate substitute term for the first term.

A first vector is generated for the first term using co-occurrence frequencies of terms that occur in search queries that include the first term.

A second vector is generated for the candidate substitute term using co-occurrence frequencies of terms that occur in search queries that include the candidate substitute term.

The first vector and the second vector are compared to score an association between the first term and the candidate substitute term.
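The abstract doesn't say how the two vectors are compared. One common way to compare frequency vectors, assumed here purely for illustration, is cosine similarity. In this sketch each vector is a dictionary mapping a co-occurring term to a frequency; the function name and every number and term in the sample vectors are invented, not taken from the patent.

```python
import math

def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse term -> frequency vectors."""
    shared = set(vec_a) & set(vec_b)
    dot = sum(vec_a[t] * vec_b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has nothing in common with anything
    return dot / (norm_a * norm_b)

# Hypothetical co-occurrence frequencies, made up for the example.
cats = {"furry": 0.8, "domesticated": 0.7, "mammal": 0.6, "broadway": 0.2}
felines = {"furry": 0.7, "domesticated": 0.6, "mammal": 0.7}
warrant = {"arrest": 0.9, "search": 0.8, "police": 0.6}
warranty = {"extended": 0.8, "car": 0.7, "home": 0.6}

print(cosine_similarity(cats, felines))      # high: mostly the same terms
print(cosine_similarity(warrant, warranty))  # zero: no terms in common
```

Under this assumption, a score close to 1 would suggest that the candidate is a reasonable substitute for the original term, and a score close to 0 would suggest that it isn't, which matches the intuition behind the “cats”/“felines” and “warrant”/“warranty” examples.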

Knowledge Base Substitutes

Google’s efforts to build a knowledge base will likely explore similarities between different entities that might share similar names or concepts. For example, the knowledge panel that shows up on a search for “cats” includes the Broadway Play “Cats”, the Charlotte Area Transit System (CATS), and “Felidae” – the biological family of the cats.

Google's knowledge panel results on a search for cats shows alternatives based upon the word and the concept both.

When I search for “cats”, I’m as likely to be looking for the domesticated variety as I am for all kinds of cats. Google’s knowledge base provides alternative results that let me decide what I want to substitute for my original search.

26 thoughts on “How Google May Substitute Query Terms with Co-Occurrence”

  1. Do you think this technology will be used for geographies as well? E.g. NYC and New York City etc…

  2. Bill, thank you for all the hard work in putting this together and it is an additional signal in how semantic search is developing and the semantic web (Google’s version of it, at any rate) is being built. Spam sites and Black Hat SEO have always relied on the fact that it got them fast results that could then be repeated after the site got burnt, so from a profit point of view, this made perfect sense. It no longer does. If it takes just as long to rank a site using Black Hat techniques as it does to do normally then the answer is obvious. We may well end up with a more honest web and a more honest SEO industry.

  3. It is only logical that with the advances of the knowledge graph by Google, they will start to implement it further and further into their core algorithm, and thus, as said in the patent, start providing search results based on synonyms and words relevant to the query.

    Once Google figures out the real intent of users (if that is even possible), I believe we will see much more of that, as they will start targeting solutions for problems and will also show you relevant solutions that may not target you by keyword.

  4. I think users’ behavior on these substitute queries also plays a great role in deciding whether they can be displayed in the SERP or not. Suppose Google provides auto-related results along with car-related results for a query related to cars. If users’ engagement with such substitute results is satisfactory, then that may be a factor in deciding whether to provide such results.

  5. Hi Alex and Patrick,

    It’s possible that this substitute query approach might be used, but it’s also just as likely that synonyms for geographies might be used as well. For example, the patent I write about in one of the posts I linked to above about synonyms uses some examples that involve geography, such as searches for “Fort Wayne” showing results for “Ft. Wayne”, because that’s often how the place is referred to. That post is http://www.seobythesea.com/2009/12/how-google-may-expand-searches-using-synonyms-for-words-in-queries/

  6. Hi David,

    Thank you.

    The more attention needed for different signals that might be involved in how something ranks at Google, the less likely that quick and automated approaches will be used to rank pages. At some point, it may be easier and less expensive to actually attempt to market pages than it is to manipulate them, by people using high risk methods like those. That definitely seems to be true with social signals like reputation scores at Google Plus as well. If creating as human a profile as possible might make a difference in how things rank, spending efforts to create realistic human profiles might not be economically feasible.

  7. Hi Or,

    There are a number of different ways that Google might attempt to use the context of a query to understand how different pages might be boosted or reduced in rankings, or which queries might potentially be good synonyms or substitutes for others. By “context of a query,” we could mean, as in this patent, that the top results for those might contain a certain amount of the same terms or phrases showing up as co-occurring terms. I’ve seen patents that look for the same documents in the search results for both as well, to try to understand the similarity of search terms.

    The “categories” that different queries might be classified under, and the “intent” that might be seen for them, could depend upon your location. For example, if I search for “tour guide” in the area of semi-rural Virginia I’m located in, I might not see a lot of results. If I do the same search closer to Washington DC, where there are a lot of tour guides, that’s another story: local search results will show up, and there will be a lot more local web results (likely under the Venice algorithm). The context of a query can be measured in a number of different ways. And the impact of that does seem like it will sometimes involve showing substitute terms for the queries you do use, as described in this patent.

  8. I can’t help but wonder if Google’s strategy to move away from terms is part of a concerted response to the explosion of the SEO industry, which has served to make their algorithm more readily exploited by those with the time, knowledge, and skills to do so. Do you have any sense that co-occurrence will be more difficult for people to manipulate than query terms?

  9. Hey Bill, great research. I think it’s only a matter of time before Google starts to implement this strategy. Keyword optimization has always been the forefront of search results. It’s been a little too easy for people to control their results. Substitute terms and queries might be the better route for the search giant. Thanks again

  10. It seems that this goes hand in hand with Google’s latest in-depth article update. If you want to rank for something, it is becoming seemingly more important to write an exhaustive resource on the subject. This has already been true for some competitive phrases (like learn seo) but I think this is now true across all verticals.

    -B

  11. Hey Bryant,

    Very good point. Although becoming a dominant resource and authority in your vertical has always been recommended by quality SEOs, it seems to finally be the best strategy for ranking in the long term.

    Co-Occurrence is already the future and trying to rank for one specific keyword only is becoming a bad strategy for any good SEO.

  12. Good article, Bill. After reading this, it got me thinking: what happened to the keyword research tool? I went to use it and I cannot find it. Also, where can I find Google’s latest article that Bryant mentioned in his comment? It sounds interesting.

  13. Hi Ryan,

    Bryant was referring to Google’s new rich snippet, for “in depth articles,” which is described in much more detail here:

    Appearing in the “In-depth articles” feature
    https://support.google.com/webmasters/answer/3280182?hl=en

    Google has closed the external keyword suggestion tool, but there is one that you can access after logging in that brings you to the AdWords keyword suggestion tool. You don’t need to be running any paid search campaigns to use it, and it is set up differently than the old free one. For example, the search volumes shown are by default “exact match” numbers instead of the old default “broad match.” If you’re using the numbers for SEO results, you want the exact match numbers anyway.

  14. “Hi Bill!

    As always, another excellent post from you! I tend to agree with you that one of the deceptive aspects of a search engine using synonyms, is that sometimes words change meanings in different contexts. Kudos for this nice article of yours.”
