How Google Might Ignore Insignificant Terms in Queries

The most important step in doing keyword research is entering a keyword phrase into a search engine like Google, and seeing what results show up, and trying to understand why the pages that appear within results are there. If you can’t do that, then it’s time to dig down and start learning.

Whether you’re a searcher looking for information on the Web, or someone doing keyword research for a website, it’s important to have an idea of the many different ways that a search engine might treat a search you perform. For instance, if your search is one that might trigger Google to show results from a specific web page associated with a named entity (a particular person, place, or thing) at the top of those results, you shouldn’t necessarily be surprised to see that site listed first in search results. This is something that is done algorithmically by Google. Just stating that Google has a “magical” brand preference is a mistake in that instance. It’s better to try to understand how that algorithm might be triggered instead.

Would you eat this mushroom before researching and investigating first whether or not it was safe?

Likewise, when you perform a search for a term such as [hospice], Google might decide to show a map result from Google Maps in Web search results because their universal search algorithm suggests that the query has a local intent, and the searcher is likely looking for a nearby hospice. Again, it would be a mistake to make the assumption that Google is favoring their own “property” in Google Maps when the reality is that the vertical search result of Google Maps is what searchers are actually looking for.

And with a term such as [hospice], Google may also insert within those search results some web results as well, as localized organic results, since searchers are likely looking for a nearby hospice when using that term as a query. If you look at results for a query and how they change when you change the specified location on your browser in Google, you can often see the results change for these localized organic results. For some locations, you might see changes, while for other locations, those localized results might not appear.

Insignificant Approaches to Backing Off Query Terms

Sometimes when you perform a search, and the results aren’t great, a search engine might remove some of the terms of your query. This has sometimes been referred to as “backing off” on that query.

In the past, these less-than-helpful terms might have been stop words, such as “a,” “the,” “of,” and “is,” when they were included in a query. But that doesn’t work well when it comes to a query such as [the matrix], which is usually best served by results about the movie starring Keanu Reeves rather than about an array of numbers or symbols. It also doesn’t work well with a quote such as [to be or not to be], which could be construed as a string of stop words, or as a question about mortality itself.
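To see why blanket stop word removal breaks down, here is a minimal sketch of the naive approach. The stop list is a hypothetical one that I made up for illustration (real lists vary), and nothing here is taken from Google or from the patent.

```python
# Hypothetical stop list for illustration only; real lists vary by engine and language.
STOP_WORDS = {"a", "the", "of", "is", "to", "be", "or", "not"}


def strip_stop_words(query):
    """Drop every stop word from a query, ignoring context (the naive approach)."""
    return [term for term in query.lower().split() if term not in STOP_WORDS]


# [the matrix] loses the word that points to the movie.
print(strip_stop_words("the matrix"))

# [to be or not to be] loses every term, leaving nothing to search for.
print(strip_stop_words("to be or not to be"))
```

The first query is reduced to a single generic term, and the second is left with no terms at all, which is the kind of failure that a context-based approach is meant to avoid.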

Or the words that might be pulled from a query could have been “common words” that tend to show up frequently in lots of documents on the web. While these frequently occurring words could also be stop words, they don’t necessarily have to be. But removing them might not work well either when it comes to returning the results that searchers really want to see.

A Google patent granted today looks at terms within queries, and explores how a search engine might ignore some of the words that might be of little significance based upon the context of a query. It might consider such terms as optional.

Insignificant Terms in Queries

Instead of classifying words on the basis of whether or not they are stop words, or words that frequently appear on the Web, it makes sense to look instead at words that may not appear too frequently within the context of the other words within the same query. At least, that is, if there aren’t many results, or even any, that show up for the original query performed by a searcher.

The method in this patent would look at query logs to “identify pairs of queries that are the same except that one of the queries in each of the pairs of queries includes an extra term.”

The extra term might be one of little significance when you compare a query phrase that includes the additional term with a similar query that leaves it out. For example, if I search for [bombastic texas plumbers] and [texas plumbers], I might see a few extra search results at the top of the results for the longer query, but then many of the same results appearing for both phrases. It appears that Google has removed the requirement that the word “bombastic” appear in results for the longer query, [bombastic texas plumbers], in order to show more results for it.
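The pairing step quoted above can be sketched roughly as follows. This is only my own illustration, assuming the query log is a plain list of query strings; the function name and the sample log are hypothetical, and the sketch stops at finding the pairs rather than deciding which extra terms are insignificant.

```python
def find_extra_term_pairs(query_log):
    """Find queries that match another logged query except for one extra term.

    Returns sorted (longer_query, shorter_query, extra_term) tuples.
    """
    # Normalize each logged query to a tuple of lowercase terms.
    seen = {tuple(query.lower().split()) for query in query_log}
    pairs = set()
    for terms in seen:
        # Try dropping each term in turn and check whether the rest was also logged.
        for i, extra in enumerate(terms):
            shorter = terms[:i] + terms[i + 1:]
            if shorter and shorter in seen:
                pairs.add((" ".join(terms), " ".join(shorter), extra))
    return sorted(pairs)


# Hypothetical log entries, borrowed from the examples in this post.
log = [
    "bombastic texas plumbers",
    "texas plumbers",
    "about mazda cars",
    "mazda cars",
]

for longer, shorter, extra in find_extra_term_pairs(log):
    print(f"[{longer}] vs [{shorter}]: extra term is '{extra}'")
```

Once pairs like these are found, a search engine could compare what searchers did with each version of the query, which is where the patent's analysis of significance would come in.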

The patent is:

Determining query terms of little significance
Invented by John Lamping and Christophe Bisciglia
Assigned to Google
US Patent 8,346,757
Granted January 1, 2013
Filed: March 28, 2005

Abstract

A system determines whether a term of a search query is a term with little significance based on a context of the search query. The system performs a search based on the search query while considering the term with little significance as optional when the search query includes the term with little significance and presents a list of search results based on the search.

The patent goes into a fair amount of detail distinguishing between “document selection” and “document ranking.”

Document selection involves identifying documents that may be candidates for retrieval by a search engine. To be a candidate, a document usually needs to include all of the words within a query, with the possible exception of stop words.

Document ranking, on the other hand, looks at how well a candidate document might rank for a specific query.

When searchers include a lot of terms within a query, the number of results that match it often tends to decrease. Sometimes those queries include extra terms that have little informational significance and don’t describe the searcher’s actual informational needs.

Consider the query [information about mazda cars], and whether or not the terms “information” and “about” really help a searcher who wants to learn more about Mazda cars. They really are insignificant because there may be many very relevant pages about that brand of car that don’t include either term.
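Here is a rough sketch of what treating a term as optional could mean during document selection. The tiny document set and the function name are hypothetical, and the optional terms are passed in by hand here, whereas the patent describes working them out from the context of the query.

```python
def select_candidates(documents, query_terms, optional_terms=frozenset()):
    """Return ids of documents that contain every required query term.

    Terms listed in optional_terms are not required for selection.
    """
    required = [term for term in query_terms if term not in optional_terms]
    selected = []
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        if all(term in words for term in required):
            selected.append(doc_id)
    return selected


# Hypothetical documents for illustration.
docs = {
    "review": "mazda cars reviewed by owners",
    "guide": "information about mazda cars and trucks",
    "other": "information about toyota cars",
}
query = ["information", "about", "mazda", "cars"]

# Strict selection: every query term must appear in the document.
print(select_candidates(docs, query))

# Relaxed selection: "information" and "about" are treated as optional.
print(select_candidates(docs, query, optional_terms={"information", "about"}))
```

With strict selection only the "guide" page qualifies. With the two terms treated as optional, the relevant "review" page becomes a candidate as well, while the page about a different brand is still left out. Ranking those candidates would be a separate step.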

The patent goes into a fair amount of detail about how terms within queries might be analyzed to see whether or not they are significant, on the basis of previous similar queries performed by other searchers. It also includes a few examples as well.

Take Aways

This particular patent describes how Google might analyze query terms to determine whether or not some are insignificant within a search, and to remove those terms as an absolute requirement for documents selected to be shown as results to a searcher.

The bigger lesson though is to have a sense of when Google might boost some search results and diminish others based upon a number of different factors, especially when you’re doing keyword research.

Is the search query a navigational one, where Google might prefer to show a certain website first before any others?

Is the query one that might be associated with a particular named entity, which might be associated with a particular website?

Will Google assign specific categories to certain queries, and assign categories to specific websites as well, and boost search results in document rankings where the category for the query and the category for the website match?

Websites may also be implicitly associated with specific locations as well.

Some pages may rank well in search results based upon how the terms and phrases within them co-occur in other documents returned for the same query terms under a word relationship approach, or based upon which phrases tend to show up repeatedly in documents within search results for a query under a phrase-based indexing approach.

The more you understand how a specific site might be reranked in search results, and why, the easier it may be to analyze the search results that you see for a specific query phrase, and get a sense of what it might take to rank well for a query term. I’ve written about many other reranking approaches from the search engines in the past.

34 thoughts on “How Google Might Ignore Insignificant Terms in Queries”

  1. Great breakdown of this latest Google patent. How do you think this will affect long tail search results?

  2. First of all, great blog Bill. You’ve just gotten a new follower. I find it amazing how google has evolved into a thinking monster and I believe there is still a lot to come. I just hope we are benefited not affected.

  3. Thank you Bill, I always enjoy reading your research, you’ve done it again with this one.
    It never ceases to amaze me to see so many SEOs actually get this part, “keyword research,” wrong, because just like you said, it’s important “to have a sense of when Google might boost some search results and diminish others based upon a number of different factors, especially when you’re doing keyword research.”

    Having that sense will certainly allow the upcoming SEO’s precision target keywords which as we all know, “NO keyword research tool can give”. Anyways, I am right now posting your article on the usual places.
    By the way, can you also give us your predictions about where Google seo will be at in 2013?

  4. “Websites may also be implicitly associated with specific locations as well.”

    Isn’t Google doing this all the time anyway with Google Places. Maybe I missed the point here somehow and apologies if that is the case.

  5. Interesting article. In my experience the use of insignificant terms in search queries is usually linked to the experience of the user, i.e. casual or new internet users often tend to include more of them as they are more likely to enter natural sentences, whereas more technical users are most likely to enter only keywords, often ranked in importance. So how search engines treat insignificant terms is crucial to better SEO.

  6. Great post Bill. In some of the niches I’m in I’ve already started seeing evidence of this sort of activity in search results. It’s definitely affected the long tail as pages without the “less important” terms are often now out-ranking pages with the extra terms. That never used to happen.

  7. As someone with a place name in my URL (which is relevant to my content), I’m rather confused post Panda what the implications are. I’m sure all will be clear with time!

  8. With every new update to its algorithm, Google is inevitably heading in the direction of becoming more like a “human”. That is to say, Google tries as much as it can to answer queries based on a really, really comprehensive analytical model that manifests itself in its complex ranking algorithm. So comprehensive and contextually aware, in ways we never knew were possible.

    Thanks for this post Bill!

  9. It always takes some time before these things hit Google.se (Sweden). But I think it’s good to be prepared, and even if you use a lot of “may” and “might” in your posts, these are still things to take into consideration. Thanks for a great roundup Bill.

  10. very solid article backed up by truly amazing data – I get lost in hours studying different SERPs and JUST when I’m about to reach a conclusion, it all changes :)

    Would love to see a full list of the “insignificant” keywords per the patent, if there is one.

  11. Bill,

    Great post as always! This may be referenced in another post, and if it is I apologize. You bring website categorization into the post in the take aways. Is there any evidence to suggest that Google is categorizing websites on a deeper level, such as subfolders or subdomains? For example, if XYZ.com’s main goal is to sell mazda car parts, it’s reasonable to assume they would be classified with transactional queries, but what if they have a blog describing mazda maintenance? Would that get shafted because the main site is transactional? Looking forward to your thoughts.

    Thanks,
    Chris

  12. Awesome post Bill! I’m really fascinated with the implicit location association that you mentioned. As a searcher this seems like an area where Google could do better. For example, when I search “SEO” my results are tailored based on my browser location, but why? There is certainly no implicit location association there, is there?

  13. Keywords, tags, categories, images, and a little seo seem to be the key to ranking well once a blog has significant content.

    I do not understand Google’s complete search algorithms, nor do I ever expect to understand them. They are just way too far over my head.

    Interesting article but not sure how I can actually use the information.

    Wayne Melton

  14. Interesting stuff as ever Bill. Quick question:

    From your experience and knowledge, when you see a patent filed in 2005 but only granted in 2013, would you think this is something that may already be in use by Google, or would you expect them to wait until the patent is granted before implementing it?

    I know it’s difficult to know the inner workings of Google, but it would be interesting to know your thoughts considering your familiarity with the patent system.

    Cheers

  15. Hi David,

    There’s really no telling.

    I recently saw a Google patent granted involving how Google ranks pages in the Google directory. It was filed in 2000 or 2001, and appears to have influenced how pages were listed in the Google directory for years. The Google directory was discontinued by Google before the patent was ever granted.

    I’ve also seen a granted Google patent that described how Google might show instant results as a query was being typed into search results. The patent was granted 5 years before Google started showing instant search results.

  16. As a new reader/subscriber to your blog, I would just say you give the REAL heads-up on what Google and other search engines are implementing and your access to patents approved are just awesome!

  17. Great point, I think that search engines are our most advanced form of artificial intelligence. There is a huge difference between the results from different search engines. Google is number 1 because they return the best results, even when we don’t type the most accurate search keywords.

  18. It’s interesting to see that instead of creating a basic stop word list, they use an algorithm based on the context of the search query. I think one of Google’s biggest assets is its search history since 1998, which allows it to analyze trends and developments, and of course allows it to refine ambiguous cases.

  19. Great post on a fascinating topic. It seems like Google is getting more and more involved with all sorts of patents (ongoing lawsuits with companies like Vringo might add to the reasons for all of these patents). This one is particularly interesting due to the many points you bring up in the “Take Aways”. Great piece!

  20. Great post Bill,

    I already do something like this actually. When I need to rank for a term, I open up opensiteexplorer and research what the top few sites did. After I figure out what they did I then figure out whether or not it’s ethical and if it’s worth my time pursuing.

    Have a nice day and great post!

  21. Very insightful. I never realized that the stop words could actually be important in the query, but it makes sense that they do, depending on context. I guess if the query has already some strong keywords, the stop words can be discarded.

  22. Wow, lots of educated theories on how Google works! It’s true that much of it is theoretical and has no practical use, but some of it just goes over my head.

    “And with a term such as [hospice], Google may also insert within those search results some web results as well, as localized organic results, since searchers are likely looking for a nearby hospice when using that term as a query.” I couldn’t really understand this part. Google inserts words to our searches according to user tendencies?

  23. Hi Dim

    When Google files a patent on something, that means that the ideas within the patent are something that they’ve considered carefully, and that it can often be more of a business decision or a programming decision as to whether or not they add ideas or processes described within the patent, than it might be whether or not what’s being discussed is possible.

    It’s a very real fact that some queries don’t return highly relevant search results, and that just removing one or two terms from those queries may not be the answer to producing better results, or even which terms to remove if that’s going to be the approach that Google uses. The patent gives us some ideas of how Google might approach that issue. Not theory that I’m making up, but rather exploration on the part of Google itself.

    Google does sometimes insert organic results into a set of search results that are based upon the location where a searcher says they are searching from (the part of Google that lets you set your location). I refer to these results as “localized organic results”, and I’ve been seeing them on and off since at least 2009. Google isn’t inserting those based on user tendencies, but rather on context: what your Google setting is for your location. If you haven’t set a specific location in Google, they will often tend to guess at your location. If you look at your settings from the Google Home page (regardless of whether or not you are signed into Google), you will see where they think you are searching from (or a different location, if you’ve intentionally set one).

  24. Hi Bill

    Great article here. I still need to do some more research on this topic, as my clients get most of my time these days ;-). But, is this saying that if you searched “seo boca raton” and “seo in boca raton” that the results would be different?
    Please feel free to look at our Blog and comment if you would like..

    Justin
    SEOjus

  25. Hi Justin,

    Thanks. It is hard to try to do research and give clients the time they deserve, but I think doing enough research to help their needs is essential to giving them the best you can with the time you have.

    What I’m saying is that this patent describes how Google might handle search results when there may not be a lot of relevant search results.

    At some point in the past, search engines would take a query like that, and try to provide more results by doing things like adding more results after removing stop words from the query, or removing other terms so that there would be more results for a searcher to look at. Under this approach, Google might still remove one or more words from a query when the results might not be very good. But this patent describes how it might compare the search results from the original query with the results that it might show after removing one term or another, in an effort to try to give searchers better results. In other words, it’s trying to figure out if some of the query terms are insignificant.

  26. I believe that in 2013 chasing the long tail may become less profitable as well. In the past, SEO and PPC effort was poured in to chasing lesser searched for long tail terms. With the new patent granted to Google around ignoring terms of little significance, this could change. This patent means that unless you’re ranking absolutely top for that long tail term, shorter tailed results may appear in favor of the longer query with insignificant terms in it.

  27. Hi
    I think this is a bit in depth for me at the moment going to have to read it a few times to get it clear in my head. My keyword research so far has basically using Adwords and seeing how many searches it was getting. I am starting to see that there is a lot more to it than that. Have to see if you have a more basic post as well.

    Thanks lee

  28. Hi Bill

    Great blog. I’ve enjoyed reading several of your articles.

    The first paragraph hits the nail on the head – I often tell clients to type something into the search engine, sit back and then think about why the sites in the results shown are there before getting carried away with SEO on their sites.

    Also, good to see some details on the logic G uses to discard terms which are not important.
