Not Brands but Entities: The Influence of Named Entities on Google and Yahoo Search Results

Does Google favor big brands when showing search results? That question has been bandied about on the Web for a while, but the answer may be more complicated than just a matter of brands.

The question arose this morning on Malcolm Coles’ blog, in his post “Google treating brand names in search terms as site: searches?”, after Malcolm astutely noticed certain sets of search results showing more than 2 results from the same domain.

Rather than just looking for brands, it’s more likely that Google is trying to understand when a query includes an entity (a specific person, place, or thing). If it can identify an entity, that identification can influence the search results that you see.

I’ve written about the topic before. Google was granted a patent named “Query rewriting with entity detection” back in May of 2009, which I covered in “Boosting Brands, Businesses, and Other Entities: How a Search Engine Might Assume a Query Implies a Site Search.”

It’s possible that Google has been doing something like this for a while, but may have turned this process up a notch very recently.

The process in that patent may mean that if Google recognizes that a search query involves a particular entity, and if the entity can be associated with a specific web site, it might show multiple results for that site. For example, Google recognizes that “SEO by the Sea” is an entity, and when I perform a search such as “SEO by the Sea entities” (without the quotation marks), Google will show a number of search results from SEO by the Sea:

Search results from Google on a search for [SEO by the sea entities] showing the first 8 listings from seobythesea.com

In the past, it was quite likely that Google would have only shown a couple of results from SEO by the Sea, possibly with a link under the second to “see more results from this site.” Now, if I have Google set to show 10 results, the first 6 are from SEO by the Sea. If I have Google set to show 100 results per search result page, the first 8 are presently from SEO by the Sea.

There are more than 8 pages on the seobythesea domain that are about entities, but Google is presently limiting how many it is showing. I’m not sure how those 6 or 8 were selected, but it’s something to investigate.
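
To make the idea concrete, here is a minimal Python sketch of what rewriting a query as an implied site search might look like. It is only an illustration under my own assumptions: the ENTITY_SITES table and the rewrite_query function are hypothetical names I made up, and the patent doesn't describe Google's implementation at this level of detail.

```python
# Hypothetical entity-to-site table, invented for illustration only.
# A real search engine would have to learn these associations somehow.
ENTITY_SITES = {"seo by the sea": "seobythesea.com"}


def rewrite_query(query, entity_sites=ENTITY_SITES):
    """Return a site-restricted query if the query names a known entity."""
    # Lowercase and collapse whitespace so matching is case-insensitive.
    normalized = " ".join(query.lower().split())
    # Pad with spaces so entity names only match on whole-word boundaries.
    padded = f" {normalized} "
    # Try longer entity names first so the most specific match wins.
    for name in sorted(entity_sites, key=len, reverse=True):
        needle = f" {name} "
        if needle in padded:
            # Drop the entity name and keep the remaining query terms.
            remainder = padded.replace(needle, " ", 1).strip()
            return f"site:{entity_sites[name]} {remainder}".strip()
    # No known entity found: leave the query untouched.
    return query


print(rewrite_query("SEO by the Sea entities"))
```

With the table above, the query “SEO by the Sea entities” becomes “site:seobythesea.com entities,” while a query with no recognized entity passes through unchanged.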

Named Entities, Google, and Metaweb

Google’s recent acquisition of Metaweb is noteworthy for a number of reasons. One of them is that Metaweb has developed an approach to cataloging different names for the same entity, so that, for example, when Google sees names on the Web such as Terminator, Governator, Conan the Barbarian, or Kindergarten Cop, it can easily associate those mentions with Arnold Schwarzenegger.
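
As a rough illustration of that kind of alias catalog, here is a small Python sketch. The ALIASES table and the function names are hypothetical, made up for this example, and this is not how Metaweb or Freebase actually store their data.

```python
# Hypothetical alias catalog: canonical entity -> other names seen on the Web.
ALIASES = {
    "Arnold Schwarzenegger": [
        "Terminator",
        "Governator",
        "Conan the Barbarian",
        "Kindergarten Cop",
    ],
}


def build_alias_index(aliases):
    """Build a lowercase lookup from every known name to its canonical entity."""
    index = {}
    for entity, names in aliases.items():
        index[entity.lower()] = entity  # the canonical name maps to itself
        for name in names:
            index[name.lower()] = entity
    return index


def resolve(mention, index):
    """Return the canonical entity for a mention, or None if it is unknown."""
    return index.get(mention.strip().lower())


index = build_alias_index(ALIASES)
print(resolve("Governator", index))
```

The point of the single lookup table is that any of the names, in any capitalization, leads back to the same entity.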

A Google patent filing published earlier this month, “Identifying Query Aspects,” which I wrote about in “Google and Metaweb: Named Entities and Mashup Search Results?”, identified Metaweb’s Freebase directory as one place where Google might learn more about named entities, and different aspects of those entities, so that it could present a new type of search result broken into categories.

For example, search results on a search for “Hawaii” might include segmented sections involving different aspects of the entity “Hawaii,” such as “beaches,” “hotels,” and “weather.”

Named Entities and Yahoo

Will the ideas showing up in Yahoo patent filings be incorporated into what Bing offers sometime in the future? If so, Yahoo’s ideas on entities may fuel some very interesting approaches that we might see from Bing.

My post from yesterday described how Yahoo might interpret queries by looking for entities or concepts within those queries, and applying labels to them in an effort to understand the intent behind a search.

A search for [new york pizza sunnyvale] could be interpreted in a number of different ways. Yahoo might rewrite the query in several ways, pulling entities from it and labeling each of them.
For instance:

[new york pizza]/food [sunnyvale]/location
[new york pizza]/business [sunnyvale]/location
[new york]/location [pizza]/food [sunnyvale]/location

The term “new york pizza” might be identified as an entity representing a particular type of pizza, or it might be seen as the name of a particular business, with the entity “Sunnyvale” being seen as a specific place. Or the query might be rewritten with the entities “New York” and “Sunnyvale” seen as locations, and the entity “pizza” as a kind of food.

The process involved in the patent would try to identify, using a confidence score, which breakdown of the query is the one most likely intended by a searcher. If more than one interpretation appears reasonable, the search results might contain results from more than one interpretation.
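
Here is a minimal Python sketch of how labeled interpretations like the three above might be enumerated and ranked. Everything in it is an assumption for illustration: the LEXICON table, its scores, the choice to multiply the segment scores together, and the min_ratio cutoff are all invented, and the patent filing doesn't spell out how Yahoo computes its confidence scores.

```python
# Hypothetical lexicon: phrase -> {label: confidence}. All scores are invented.
LEXICON = {
    "new york pizza": {"food": 0.5, "business": 0.4},
    "new york": {"location": 0.9},
    "pizza": {"food": 0.9},
    "sunnyvale": {"location": 0.95},
}


def interpretations(tokens, lexicon=LEXICON):
    """Enumerate every split of tokens into labeled lexicon phrases.

    Returns (score, segments) pairs, where segments is a list of
    (phrase, label) tuples and score is the product of segment scores.
    """
    if not tokens:
        return [(1.0, [])]
    results = []
    for end in range(1, len(tokens) + 1):
        phrase = " ".join(tokens[:end])
        for label, score in lexicon.get(phrase, {}).items():
            for rest_score, rest in interpretations(tokens[end:], lexicon):
                results.append((score * rest_score, [(phrase, label)] + rest))
    return results


def rank(query, min_ratio=0.5):
    """Keep interpretations scoring at least min_ratio times the best one."""
    ranked = sorted(interpretations(query.lower().split()),
                    key=lambda item: item[0], reverse=True)
    if not ranked:
        return []
    best = ranked[0][0]
    return [item for item in ranked if item[0] >= best * min_ratio]


def format_interpretation(segments):
    """Render segments in the [phrase]/label notation used above."""
    return " ".join(f"[{phrase}]/{label}" for phrase, label in segments)


for score, segments in rank("new york pizza sunnyvale"):
    print(round(score, 3), format_interpretation(segments))
```

Keeping every interpretation above a cutoff, rather than only the top one, mirrors the idea that search results might draw from more than one reasonable reading of the query.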

In Yahoo’s description of the implications of this process, they mentioned that their entity-based query interpretations could influence more than the choice of web pages that show up in search results. They could also influence things such as the advertisements that appear.

While performing some queries at Yahoo, I also noticed that map results were possibly being shown based upon query interpretations, even though that wasn’t mentioned in the patent filing.

Today, another patent application was published by Yahoo which describes how query rewriting based upon entities might influence map results as well as other geographic features such as weather results, or if the word “entertainment” is searched for, a list of local events. The patent is:

Entity-Based Search Results and Clusters on Maps
Invented by Joy Ghanekar, Jerry Cheng, Edward Stanley Ott, IV, and Marc Eliot Davis
Assigned to Yahoo!
US Patent Application 20100211566
Published August 19, 2010
Filed: June 23, 2009

Abstract

Techniques are described for providing geographically-related search results in map interfaces that are derived with an understanding of the intent behind the user’s query, and the abstract entities to which the query maps.

This patent filing refers to an even larger Yahoo initiative involving entities, something they refer to as the W4 COMN. I’ll post about that next…

Added, 8/21/2010 – The Official Google Blog posted about this update in a post titled “Showing More Results from a Domain,” noting that the change in their algorithm is intended to show searchers more results from a single domain when there is evidence of a “strong user interest in a particular domain.” They also note that the last few results (on a search results page set to show 10 results) are from other domains to preserve diversity in the results.
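
One way to picture that trade-off is a simple cap on how many slots a favored domain can take on a page. The Python sketch below is purely my own illustration: the build_page function, the page_size of 10, and the 3 slots reserved for other domains are assumed values, since Google’s post doesn’t give exact numbers or describe the mechanism.

```python
def build_page(ranked, favored_domain, page_size=10, reserved_for_others=3):
    """Fill a results page in rank order, capping the favored domain.

    ranked is a list of (domain, url) pairs, best first. The cap and
    page size are assumed values for illustration only.
    """
    favored_cap = page_size - reserved_for_others
    page = []
    favored_count = 0
    for domain, url in ranked:
        if len(page) == page_size:
            break
        if domain == favored_domain:
            if favored_count >= favored_cap:
                continue  # cap reached: leave the slot for another domain
            favored_count += 1
        page.append((domain, url))
    return page


# Made-up ranking: 12 pages from one site, then 5 pages from other sites.
ranked = [("seobythesea.com", f"/post-{i}") for i in range(12)]
ranked += [(f"other{i}.example", "/") for i in range(5)]
page = build_page(ranked, "seobythesea.com")
print(len(page), sum(1 for domain, _ in page if domain == "seobythesea.com"))
```

In this made-up ranking, the favored site fills the first 7 slots and the last 3 go to other domains.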


43 thoughts on “Not Brands but Entities: The Influence of Named Entities on Google and Yahoo Search Results”

  1. Thanks for keeping up with all these patents for us – it seems like there’s always something new on the horizon. Search Engines keep getting more advanced – I’m sure the concept of basing search results on entities wasn’t even considered a possibility ten years ago. I keep thinking that eventually they will run out of new things to try, but I guess they are smarter than me.

  2. Howdy Bill,

    Do you suppose it would be a good idea to submit data to Freebase? This post (and your last one about it) seems to provide a very compelling hypothesis that is persuading me to start interacting with Freebase.

    What’s your take on it? Have you/ will you for SEO by the Sea?

  3. Hi Michael,

    Thank you. There is always something new. The idea of using entities isn’t completely new, but it does look very different than it did 10 years back. You can see some of the roots of an “entity” approach in a 1998 paper from Sergey Brin, on extracting information from the Web to build a database of information about books, or menus from restaurants:

    Extracting Patterns and Relations from the World Wide Web (pdf)

  4. Hi Donnie,

    I personally haven’t made any additions to Freebase, but I’ve been thinking about it. It looks like they are very inviting to having people join and make additions. I suspect that Google is looking at other sources of information on the Web other than just Freebase, and there’s probably a good likelihood that whatever Google is doing to identify entities goes far beyond Freebase itself. The Google patent filing I mentioned in the post above, Identifying Query Aspects also tells us that they could be looking at Wikipedia and other sources for entity information as well.

  5. Interesting article, I had never heard (or really thought of) the possibility of using brands / entities to influence results. Whether people are fans of the prospect of this occurring I guess really depends if they own / have rights to that entity / brand.

  6. Pingback: links for 2010-08-19 | Glenn Friesen
  7. Pingback: Google tweaks search results: Vol. 3,454,908 « Let's all go to McDonald's
  8. Very interesting … This could actually be pretty useful. If indeed a phrase from the search query is associated with a site, this could mean that the user wants results from this site. If it was me I would have used “site:” but the average user does not know about this command and just uses the name of the site. So Google is on the right track here if you ask me. But it could have a negative side too. If I am looking for something related to Microsoft, for example, I will need results from many sites, not only from microsoft.com. So pretty complex subject indeed.

  9. I noticed this early last week, a particular vendor dominated the first page results, they have always had 2 listings for the domain, 2 for the shop and 2 for the support site (on separate subdomains) but last week they had 4 listings for the main site! Luckily the extra ones vanished a couple of days later, but I hope this is not a sign of things to come.

  10. Interesting, I have seen this result in test searches before, but never made any deeper conclusions about it. Have to look in to it more, thanks for the info. Cheers, mate!

  11. Hi Geoff,

    I think one of the important things to keep in mind with this is that Google is trying to show multiple results only when they believe that a particular query looks like a searcher is trying to find as many results as possible from a site that is closely associated with an entity included in a query. On the queries I’ve seen that produce more than 2 results for my domain, they do seem like they could easily be interpreted to be requests for multiple pages from my site.

  12. Hi Paul,

    I agree – Google is straddling a tight line here. I do think this change will be useful for people who want to find multiple pages from the same site – but it does create the possibility that searchers might not see useful and helpful pages that might fit their query well and not be on a site showing multiple pages.

    I imagine that Google did a lot of testing before deciding to go live with this approach, and I’m wondering how they feel about that decision now. I’ve seen both positive and negative responses to their displaying multiple results.

  13. Hi Johan,

    You’re welcome. Malcolm indicated in his post (linked to at the start of this post) that he was hesitant to write about this change in Google’s search results because he wasn’t sure if it was something new, or if it had been happening for a while, and he just hadn’t noticed.

    Google does do a lot of live testing of changes that they may or may not implement, and if you see something odd, it may be a test, or it may be a permanent change. Don’t be afraid to take a screenshot, and talk about it with others. :)

  14. I think this new “named entities” approach of Google’s kinda decreases its overall usability. If they want to provide a “good user experience”, and this is a mantra that they have been espousing throughout their history, then provide the 2-3 search results that they used to, and then have a search box in the SERPs for that site, if it is indeed the search result that most users are clicking on after entering a specific search query.

  15. Also, the average user will not perform a site search – it’s generally webmasters and SEOs, who know how to do that anyway.

  16. Pingback: Brands In Google Search Results | Search Engine Optimisation (SEO) | WMpS Blog - Surfing The Digital Wave
  17. Hi Gracie,

    The change does seem to be geared towards helping searchers who seem to be looking for information from just one specific site and who, as you note, may not be aware of how to use the special “site” search operator that Google offers for that purpose (site:www.example.com). The “Query rewriting with entity detection” patent makes that explicit.

    But I think it might be more helpful for them to show a couple of results from a site like they have in the past, and include an expanding link that says something like “see more results from this site.”

    Google does seem to have gotten a lot of peoples’ attention with this change.

  18. “then provide the 2-3 search results that they used to, and then have a search box in the SERPs for that site”

    Agreed, definitely a better option.

  19. Hi Andrew,

    I have seen Google provide a search box like that in the past, which I thought was a good idea. I am wondering, though, if they made this change because they found that not too many people were using those search boxes. Would love to see the numbers.

  20. Pingback: Risultati multipli per dominio nelle SERP: non c’è bisogno di essere la Apple
  21. I read another article on this at another site. It’s all a big fix :-) That article also mentioned that when the little person uses black hat methods, they are banned, but a report found BMW had been doing the same thing, and they were banned for a total of 3 days.

    Slowly losing faith in Google, especially with some of the results they return.

  22. Hi Darius,

    I don’t have any of the specific details of what BMW was doing, and what they did to get back into Google’s search results, but I do believe that anyone caught using blackhat methods should be banned, regardless of the size or stature of the company involved. Google does have a reconsideration request process that site owners could follow to attempt to get back into Google’s search results.
