How Google May Map a Query to an Entity for Suggestions

Search predictions come from:

– The terms you’re typing.

– What other people are searching for, including trending searches. Trending searches are popular stories in your area that change throughout the day. Trending searches aren’t related to your search history.

– Relevant searches you’ve done in the past (if you’re signed in to your Google Account and have Web & App Activity turned on).

Note: Search predictions aren’t the answer to your search, and they’re not statements by other people or Google about your search terms.

~ Search on Google using autocomplete
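To illustrate how those three inputs (typed terms, popular and trending searches, and personal history) could be combined, here is a minimal Python sketch. The source names, the weights in SOURCE_WEIGHTS, and the predict() function are all hypothetical. Google doesn't publish how it blends these signals, so this only shows the general shape of merging prefix matches from several candidate lists.

```python
from collections import defaultdict

# Hypothetical weights for each source; Google does not publish how it blends them.
SOURCE_WEIGHTS = {"popular": 1.0, "trending": 1.5, "history": 2.0}


def predict(prefix, sources, history_enabled=True, limit=5):
    """Blend prefix-matching candidates from several sources into one ranked list.

    `sources` maps a source name to a dict of {query: score}. The "history"
    source is only consulted when the user has Web & App Activity turned on.
    """
    prefix = prefix.lower()
    blended = defaultdict(float)
    for name, candidates in sources.items():
        if name == "history" and not history_enabled:
            continue  # signed-out users, or activity turned off
        weight = SOURCE_WEIGHTS.get(name, 1.0)
        for query, score in candidates.items():
            if query.startswith(prefix):  # only candidates matching the typed terms
                blended[query] += weight * score
    # Highest blended score first; ties broken alphabetically for stable output.
    ranked = sorted(blended.items(), key=lambda item: (-item[1], item[0]))
    return [query for query, _ in ranked[:limit]]
```

With history turned off, any personal entries drop out and only the popular and trending candidates that match the prefix remain.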

A website called SourceFed produced a video claiming that Google was intentionally manipulating search results to make Hillary Clinton look good, because Google wasn’t showing autocomplete suggestions tied to her name that SourceFed insisted it should be showing.

SEO consultant Rhea Drysdale posted a response on Medium that shot holes in their argument. Rhea started off with:

SourceFed believes Google is manipulating search results in favor of Hillary Clinton, because “Hillary Clinton cri-” did not return “Hillary Clinton criminal charges” and “Hillary Clinton in-” did not return “Hillary Clinton indictment.”

I thought it was interesting that on May 31, Google was granted a new patent that describes one way it might generate suggestions and autocomplete responses to queries, and I thought it was worth looking at. It also caught my attention because it addresses how entity information might be used with autocomplete suggestions. The patent is:

Associating an entity with a search query
Inventors: Olivier Jean Andre Bousquet, Oskar Sandberg, Sylvain Gelly, Randolph Gregory Brown
Assignee: Google
US Patent 9,355,140
Granted: May 31, 2016
Filed: March 13, 2013

Abstract

Methods and apparatus for associating an entity with at least one search query. Some implementations are directed to methods and apparatus for identifying multiple queries associated with an entity and identifying one or more of the queries as an entity search query that provides desired search results for the entity. Some implementations are directed to methods and apparatus for identifying a particular entity and, in response to identifying the particular entity, identifying an entity search query corresponding to the particular entity.

The process described in this patent provides search suggestions to searchers using a query-to-entity mapping, intended to surface new aspects of entities and queries and to improve the search results searchers receive. It is a fairly complicated process, but it is worth looking at to get a better sense of what goes on behind the curtain, so that we don’t make poor assumptions when Google doesn’t do what we expect it to.

When we search for Hillary Clinton in the Google search box, we see a number of query terms that Google presents as autosuggestions.

Hillary Clinton Auto Suggestions

When we choose one of those, like the term “email,” we see some additional words added to that query term:

Hillary Clinton email query suggestions

If we follow the suggestion [hillary clinton email charges], we see a story about the possibility of criminal charges being filed against the candidate:

Hillary Clinton email query charges results

Google’s algorithm mapped a query to the entity “Hillary Clinton” using the terms “email charges” rather than “criminal charges,” the phrasing SourceFed assumed Google should use for that topic. SourceFed didn’t map out the query the way Google did, but Google did have autosuggestions that covered the topic. If we compare Google Trends data for both terms added to the entity “Hillary Clinton,” the two queries appear to have drawn similar levels of search interest:

Email Charges vs. Criminal Charges trends

Takeaways

I was left wondering why this patent doesn’t discuss trends, and whether I would have to look for another patent that did (I chose to do that).

This patent doesn’t mention the use of Google Trends in identifying queries to map to entities, but we do know that Google Trends has used the machine identifiers (MIDs) assigned to entities in Freebase.

This patent does tell us that properties associated with some entities may be identified at online encyclopedias such as Freebase, and that entities may be assigned unique entity identifiers.

This patent does focus on how properties associated with different entities might help tell one entity apart from another. It uses the entity “Sting” as an example, since a well-known musician and a well-known professional wrestler both use that name, and they are different people:

Also, for example, in some implementations, the query suggestion system 135 may identify one or more entities associated with a received query via the query to entity association database 125. The query suggestion system 135 may provide one or more query suggestions based on the identified entities, with each of the query suggestions being particularly formulated to focus on a particular entity. For example, the musician Gordon Matthew Thomas Sumner and the wrestler Steve Borden may be associated with the query “sting” in the query to entity association database 125. In response to a received query “sting”, the query suggestion system 135 may identify the musician Gordon Matthew Thomas Sumner as the dominant entity from the query to entity association database 125 and suggest an alternative query suggestion to the user, with the alternative query suggestion being particularly formulated for the musician Gordon Matthew Thomas Sumner (e.g., “sting musician”).

The query-to-entity mapping described in this patent is based on terms describing properties found in a knowledge base such as Freebase, which can help tell that one Sting is a musician and the other is an athlete. Basing autosuggestions on properties of those entities shows how carefully query terms may be selected when mapping them to an entity.
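The Sting example can be sketched as a small lookup. In this minimal Python sketch, the QUERY_TO_ENTITY table stands in for the patent’s query to entity association database 125. The entity IDs, the association scores, and the use of a “profession” property as the disambiguating word are all assumptions for illustration; the patent doesn’t say exactly how the qualifier term is chosen.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    entity_id: str  # hypothetical unique identifier, not a real Freebase MID
    name: str
    properties: dict = field(default_factory=dict)


# Hypothetical entities with knowledge-base-style properties.
ENTITIES = {
    "ent:sumner": Entity("ent:sumner", "Gordon Matthew Thomas Sumner", {"profession": "musician"}),
    "ent:borden": Entity("ent:borden", "Steve Borden", {"profession": "wrestler"}),
}

# Hypothetical query-to-entity association table: query -> {entity_id: score}.
QUERY_TO_ENTITY = {
    "sting": {"ent:sumner": 0.8, "ent:borden": 0.2},
}


def entity_suggestions(query, associations, entities, qualifier_property="profession"):
    """Return one entity-specific suggestion per associated entity, dominant entity first."""
    normalized = query.strip().lower()
    scores = associations.get(normalized, {})
    # The highest-scoring entity is treated as the dominant one.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    suggestions = []
    for entity_id, _score in ranked:
        qualifier = entities[entity_id].properties.get(qualifier_property)
        if qualifier:  # skip entities with no distinguishing property
            suggestions.append(f"{normalized} {qualifier}")
    return suggestions
```

Calling entity_suggestions("Sting", QUERY_TO_ENTITY, ENTITIES) returns ["sting musician", "sting wrestler"], with the musician first because he is the dominant entity in this made-up table.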

Since that patent focuses on which queries might fit best with different entities, I looked at other autocomplete patents to see what they said about using trend information. This one shows how trend information and personalized search histories could be used to generate autocomplete suggestions:

Providing customized autocomplete data
Inventors: Nicholas B. Weininger and Radu C. Cornea
Assigned to: Google
US Patent 8,868,592
Granted: October 21, 2014
Filed: May 18, 2012

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for providing customized autocomplete suggestions. First profile data is obtained for a first user. Second profile data is obtained for second users that submitted search queries, where the second users are different from the first user. Based on the first profile data and the second profile data, similarity scores are determined. The similarity scores are each indicative of a degree of similarity between the first user and at least one of the second users. A proper subset of the search queries is selected based on the similarity scores, and an update for an autocomplete cache of a computing device associated with the first user is generated using the selected subset of search queries. The update is provided to the computing device associated with the first user.
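The abstract’s steps (score the similarity between the first user and other users, keep a proper subset of their queries, and turn that subset into a cache update) can be sketched in Python. The abstract doesn’t say what a profile contains or how similarity is measured, so representing profiles as sets of interests, using Jaccard similarity, and the threshold and max_queries values are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Similarity between two interest profiles represented as sets (assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_cache_update(first_profile, second_users, threshold=0.3, max_queries=100):
    """Pick queries from sufficiently similar users for the first user's cache update.

    second_users is a list of (profile_set, [queries]) pairs, one per other user.
    """
    scored = {}
    for profile, queries in second_users:
        similarity = jaccard(first_profile, profile)
        if similarity < threshold:
            continue  # dissimilar users contribute nothing to the update
        for query in queries:
            # Keep the best similarity seen for each query.
            scored[query] = max(similarity, scored.get(query, 0.0))
    # Most similar sources first; ties broken alphabetically.
    ranked = sorted(scored, key=lambda q: (-scored[q], q))
    return ranked[:max_queries]
```

For example, with a first user interested in hiking, photography, and camping, another user interested in hiking, camping, and fishing scores 0.5 and contributes queries, while a user interested only in photography and cameras scores 0.25 and falls below the default threshold.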

This patent tells us that autocomplete suggestions may be customized or personalized, and that they may also reflect trends in word usage:

Autocomplete suggestions can be customized for the interests, attributes, and behavior of a particular user or a group of users. Using an autocomplete cache, personalized autocomplete suggestions can be generated when a network connection is unavailable. Using the autocomplete cache, personalized autocomplete suggestions can be presented in a manner that limits network latencies. The autocomplete cache can be updated to reflect current topics and trends in word usage, especially topics and trends among users with similarities to a particular user.
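The offline behavior described there depends on the cache answering prefix lookups locally. Here is a minimal Python sketch of such a cache, using a sorted list and the standard bisect module. The AutocompleteCache class and its method names are hypothetical, and a real on-device cache would presumably also need ranking and eviction, which this leaves out.

```python
import bisect


class AutocompleteCache:
    """A tiny local cache that answers prefix lookups without a network call."""

    def __init__(self, queries=()):
        self._queries = sorted(set(queries))

    def apply_update(self, new_queries):
        """Merge an update (for example, queries chosen from similar users) into the cache."""
        for query in new_queries:
            i = bisect.bisect_left(self._queries, query)
            if i == len(self._queries) or self._queries[i] != query:
                self._queries.insert(i, query)  # keep the list sorted, no duplicates

    def complete(self, prefix, limit=5):
        """Return cached queries starting with prefix, in sorted order."""
        # In a sorted list, every match for a prefix sits in one contiguous run.
        start = bisect.bisect_left(self._queries, prefix)
        results = []
        for query in self._queries[start:]:
            if not query.startswith(prefix) or len(results) == limit:
                break
            results.append(query)
        return results
```

Because matches for a prefix are contiguous in a sorted list, each lookup is a binary search plus a short scan, which is the kind of cheap local work that avoids network latency.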

So, for most people, the “trend” information used in autocomplete may not be quite the same as what Google Trends shows, and may instead be customized for each searcher performing a search.

Regardless of which autocomplete process Google is following, rather than charging Google with bias, it may be best to look at the query suggestions Google provides and see what range of topics and concepts they cover, instead of expecting certain words to show up. In this instance, “email charges” was a suggestion and “criminal charges” wasn’t, but Google appeared to be covering very similar concepts with those suggestions.

Google wasn’t purposefully avoiding a topic; it was just using words it preferred to use to offer as a query suggestion.
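One way to follow that advice is to check which concepts a set of suggestions covers instead of matching exact phrases. This Python sketch groups suggestions by the concepts their words touch. The CONCEPTS lexicon is entirely hypothetical and would need to be written by hand for each audit.

```python
# Hypothetical, hand-built lexicon: each concept maps to words that express it.
CONCEPTS = {
    "legal trouble": {"charges", "criminal", "indictment", "indicted", "investigation"},
    "email": {"email", "emails", "server"},
}


def concepts_covered(suggestions, concepts):
    """Map each concept to the suggestions that mention any of its words."""
    coverage = {name: [] for name in concepts}
    for suggestion in suggestions:
        words = set(suggestion.lower().split())
        for name, vocabulary in concepts.items():
            if words & vocabulary:  # any shared word counts as coverage
                coverage[name].append(suggestion)
    return coverage
```

Run on the two suggestions discussed above, [hillary clinton email] and [hillary clinton email charges], the “legal trouble” concept is covered by the second one even though the word “criminal” never appears.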

22 thoughts on “How Google May Map a Query to an Entity for Suggestions”

  1. Good stuff, Bill.

    It does seem however that there is something else in play these days where an entity Google suggest can recognize is an individual.

    Certainly, across a whole range of searches done, on persons of notoriety including politicians, sportsmen, entertainers, and other ‘celebrities’, the suggestions are the cleanest and most positive I have ever seen them. I found this held true for pretty much all major celebrities, but not for mere local celebrities or ‘C-listers’ etc. It seems too widespread, going too far back into years past, for me to believe the selection was by hand. Thus my theory is that the ‘Entity’ level or recognition of a person is involved.

    However, I’m currently investigating the fact that certain locations recognized as entities seem to also have more positive and less negative suggestions. Shaun Anderson of Hoboweb shared in a post earlier that ‘Dunblane’ gets suggestions that omit ‘dunblane shootings’ until after it is clear a searcher is about to input more than the place name alone.

    However, a search for ‘Columbine’ instantly suggests ‘columbine shooters’ and ‘columbine victims’. No need for a space after Columbine to show you have more than a general tourism interest in the location.

  2. Hi Ammon,

    Thanks for sharing those thoughts.

    The places you are mentioning are related to well-known events, and they are entities in Google’s knowledge graph as events, such as the Columbine shooting. The query suggestions shown for them appear to come from the entity-based query mapping type of autocomplete suggestions.

    I say this because when I search for either place, the knowledge panel that appears for each is about an event:

    Dunblane school massacre
    Columbine High School massacre

    Shaun’s observation about Dunblane, and how Google hesitates to show what may be a negative query suggestion about the shootings, is interesting; I could see that continued typing being perceived as purposefully triggering those query suggestions.

    The entity process regarding autocomplete has only been a granted patent for a little more than a week now, though Google may have started using it before then. But now that I’m aware of it, I’ll be looking at those suggestions with it in mind.

  3. Hello Mr Slawski,
    Great post.

    Some years ago, I found some websites where some marketers were positioning themselves as “Online Reputation Managers”.

    Among the services they offered was the ability to help their clients change or “flush down” the negative words that appear next to their names on Google’s autocomplete.

    I always wondered whether this was actually possible and how they would even go about doing it.
    So this article brings that back to mind.

    Because, if it were possible, then couldn’t another theory be that some so-called Online Rep marketers would poach the celebs and politicians and offer to clean-up their autocomplete suggestions?

    Just wondering if you knew about this.

  4. Hi Rotimi,

    Thank you. Happy that you liked the post. Go Fish Digital https://gofishdigital.com/, the firm where I am Director of Search Marketing, does online reputation management campaigns for clients. There are a few different tasks we undertake to help clients with how they are perceived online, and autocomplete results are one of them. Many people might work to help improve those autocomplete terms, including public relations people. Celebrities and politicians don’t necessarily want those results showing on searches of their names and will often seek someone out to help them have more positive things showing for them.

    Online marketers don’t manipulate Google search terms to say bad things about people in order to gain new clients; bad things happen often enough that people and businesses need help with things written about them online that might hurt their reputations. In addition to negative autocomplete results, people are sometimes unhappy with the search results that appear for them as well. There are plenty of reasons why people might seek help with their online reputations. But we would never purposely take actions aimed at negatively impacting someone’s reputation. The SourceFed piece, possibly made to draw attention to itself, ended up having possible negative implications for the politician it was writing about.

  5. Ok great !
    Thanks sir.

    But actually, I wasn’t thinking that marketers would create negative results in order to win new clients.
    I imagine the negative news on Mrs Clinton would be generated organically (through the influence of traditional news).

    Anyway, I asked my question because, as I was reading your post, I imagined another way SourceFed could have looked at their dilemma, instead of instantly assuming and concluding that Google was “cheating” for Mrs Clinton:

    So I thought, “Is it possible that Hillary’s campaign team had hired a squad of Online Reputation Managers to try to clean up as many results that could show up for queries that start with Hillary Clinton Cr….???”

  6. Hi Rotimi,

    As we can see if we look at the autocomplete results that appear for Hillary Clinton, there are ones that might be seen as negative, but they don’t follow the formatting that Sourcefed was guessing they would. It’s possible that Hillary Clinton has hired someone to do online reputation management on her behalf (what we see in autocomplete possibly could have been worse). But, there’s no saying that there was ever a query suggestion for Hillary Clinton for “criminal charges.”

  7. Hi Bill,

    Great post. Have you seen or read these two posts? One is about how Eric Schmidt is funding and backing three data science startups (Civis Analytics, cir.cl, and the Groundwork) that are working for Hillary’s campaign to help her get elected: http://qz.com/520652/groundwork-eric-schmidt-startup-working-for-hillary-clinton-campaign/ “With tech policy an increasingly important part of the president’s job—consider merely the issues of NSA surveillance and anti-trust policy, not to mention self-driving cars and military robots—helping to elect yet another president could be incredibly valuable to Schmidt and to Google.” The other is this one by Julian Assange: https://wikileaks.org/google-is-not-what-it-seems/

  8. Hi Andrew,

    Thanks. The SourceFed video tried to create linkbait by claiming that Google was purposefully presenting biased search queries: they cherry-picked specific queries they expected to be shown as suggestions, and when those didn’t show, they claimed that Google was purposefully hiding information from the public. In reality, SourceFed was showing a lack of understanding of how autocomplete works, and using that ignorance to claim that Google was intentionally biased, which the evidence doesn’t support (there are query suggestions offered by Google that show people were searching Google for information about charges and a possible indictment related to emails; Google wasn’t hiding those but was just using different language than SourceFed did).

    Eric Schmidt’s funding of a startup that is working to support Hillary Clinton’s political campaign, and the WikiLeaks article about Eric Schmidt, are both about Eric Schmidt and have nothing to do with Google’s presentation of query suggestions about Hillary Clinton. They neither support nor provide us with more information about the claims SourceFed was making. I wrote this post because I was interested in exploring how Google generates query suggestions, and sharing information about that.

  9. Nice post, Bill, as always. It’s interesting to see how the trends continued:

    http://bit.ly/1UgDYom – last 7 days “email charges” vs “criminal charges”

    http://puu.sh/prJGu/411bfa73b8.png – last 7 days stopping at “email” vs “criminal”

    Clearly some very high interest in criminal over email, yet I pulled the same autosuggest results as you did above.

    I wonder ultimately how this determination is made.

  10. It does seem however that there is something else in play these days where an entity Google suggest can recognize is an individual.

  11. Hi Todd,

    Google has made some statements about this recently, and one of the points they made is that they filter out some query suggestion terms that might be perceived as negative. See:

    https://search.googleblog.com/2016/06/google-search-autocomplete.html

    They tell us there:

    “The autocomplete algorithm is designed to avoid completing a search for a person’s name with terms that are offensive or disparaging. We made this change a while ago following feedback that Autocomplete too often predicted offensive, hurtful or inappropriate queries about people.”

    It’s possible that “crimes” and “criminal” may be considered hurtful.

  12. I always thought that Google search predictions were the terms most searched by its users.

    Hi Rocky,

    As we see from this particular example, (1) Google may filter the use of certain words that may reflect negatively on people, and (2) the results you see may be personalized based on your location and past search history, so the numbers behind them may differ from overall Google Trends data.

  14. Really liked your blog. It inspired me to read more useful blogs like this. Most of the points I read are valid and factual. Thanks for sharing your knowledge.

  15. Hello, this post is exceptionally educational and an awesome post.
    I always felt that Google search predictions were the terms mostly searched by its users.
    Truly enjoyed your blog. It inspired me to read more helpful blogs like this. Most of the points I read are legitimate and genuine. Much obliged for sharing your insight.

  16. Nice explanation of how Google helps with queries; interesting article. I am amazed: I have been reading the articles one by one since last night, and every time I find a new article grabbing my attention within a post.

  17. I had not previously thought to apply it this way; after reading the tips you gave, I’m eager to try it.

  18. I figured I would see just how quickly Google can use timely events to finish your query (autocomplete).

    So there has been a big fire here in Southern California called the “sand fire.”

    So I started to type in the phrase, and behold, the first two autosuggest phrases were:

    “sand fire update”
    “sand fire map”

    I would never have thought Google would be able to return such timely suggestions!
