10 Most Important SEO Patents: Part 6 – Named Entity Detection in Queries

In the last installment of this series, we looked at how Google may be using phrase-based indexing to re-rank pages, relying on the fact that many phrases tend to co-occur with other phrases within the content of web pages. When we look at phrases, we also need to drill down to a special set of phrases describing named entities, or specific people, places, or things. In addition to trying to understand which phrases might tend to co-occur with those named entities, the search engines may look to other sources such as Wikipedia, Freebase from Metaweb, the Internet Movie Database (IMDB), and different map databases to attempt to understand when a phrase indicates an actual (or fictional) entity.

Google, Bing, and Yahoo all look for named entities on web pages and in search queries, and will use their recognition of named entities to do things like answer questions such as “where was Barack Obama born?”

A Question and Answer result at Google for the question of where Barack Obama was born.

The search engines associate attributes and facts with named entities, and when it comes to local search, they will associate addresses and websites as well. I described how Google may be associating specific websites with specific businesses at specific locations back in 2006 in the post Authority Documents for Google’s Local Search.

How search engines treat named entities specially can be seen pretty clearly in the following Yahoo search results, where a search for “Justin Timberlake” displays both “related people” and “related movies” in the left column:

On a Yahoo search for [Justin Timberlake], the left column of the search result shows related people such as NSync, Andrew Garfield, Mike Myers, and Joey Fatone, and related movies such as Alpha Dog, and The Love Guru.

There are other cases where it’s not so obvious that a search engine is using its recognition of a named entity to affect search results. The number 6 patent in this series of the 10 most important SEO patents has sometimes been pointed to as proof that Google is biased towards brands, but in reality it has a broader impact than that. The patent is Query rewriting with entity detection.

I wrote about this particular patent in the post Boosting Brands, Businesses, and Other Entities: How a Search Engine Might Assume a Query Implies a Site Search. The Official Google Webmaster Central Blog also described the impact of the approach behind this patent in their post, Showing more results from a domain.

Yahoo was granted a patent that is similar in a number of ways, which I wrote about in the post Not Brands but Entities: The Influence of Named Entities on Google and Yahoo Search Results.

Microsoft also uses its recognition and knowledge of named entities in a number of ways. For example, in the third part of this series we looked at how Microsoft might be Classifying Web Blocks with Linguistic Features. One of the “linguistic features” described in the Microsoft patent is the presence of named entities.

The classification system uses linguistic features to help classify the function of a block because developers of web pages tend to use different linguistic features within blocks having different functions. For example, a block with a navigation function will likely have very short phrases with no sentences. In contrast, a block with a function of providing the text of the primary topic of a web page will likely have complex sentences. Also, a block that is directed to the primary topic of a web page may have named entities, such as persons, locations, and organizations.
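The passage above can be made concrete with a rough sketch. The features and thresholds below are assumptions for illustration only, not the method described in the Microsoft patent: the function names are hypothetical, sentences are counted by terminal punctuation, and runs of capitalized words stand in as a very crude proxy for named entities.

```python
import re


def block_features(text: str) -> dict:
    """Compute a few simple linguistic features for a block of page text."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    words_per_line = [len(ln.split()) for ln in lines]
    # A "sentence" here is any run of text ending in ., ! or ?
    sentences = re.findall(r"[^.!?]+[.!?]", text)
    # Crude named-entity proxy: two or more capitalized words in a row.
    entity_like = re.findall(r"\b[A-Z][a-z]+(?:[ ]+[A-Z][a-z]+)+\b", text)
    avg_words = sum(words_per_line) / len(words_per_line) if lines else 0.0
    return {
        "avg_words_per_line": avg_words,
        "sentence_count": len(sentences),
        "entity_like_count": len(entity_like),
    }


def guess_function(text: str) -> str:
    """Guess a block's function with a hand-picked (assumed) rule."""
    features = block_features(text)
    # Very short phrases and no sentences suggest a navigation block.
    if features["sentence_count"] == 0 and features["avg_words_per_line"] <= 3:
        return "navigation"
    return "main content"


nav_block = "Home\nAbout Us\nContact\nBlog"
article_block = (
    "Barack Obama was born in Honolulu, Hawaii. "
    "He later served as President of the United States."
)
print(guess_function(nav_block))
print(guess_function(article_block))
```

A real classifier would more likely be trained on labeled blocks than hand-tuned like this, but the sketch shows how features such as phrase length, sentence structure, and entity mentions could separate a menu from a block of primary content.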

In the Entity Detection patent from Google, the search engine attempts to identify when there is a named entity included within a search query, and if it has associated a specific website with that named entity, it may show more than one or two results from that website at the top of search results.
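As a minimal sketch of that idea, the rewrite might look something like the following. Everything here is an assumption for illustration: the entity table, the placeholder `.example` domains, and the `rewrite_query` function are hypothetical, and a search engine would learn entity-to-site associations from data rather than from a hard-coded dictionary.

```python
# Hypothetical entity-to-site associations (placeholder domains).
ENTITY_SITES = {
    "barack obama": "obama-campaign.example",
    "space needle": "space-needle.example",
    "seo by the sea": "seobythesea.example",
    "bill slawski": "seobythesea.example",
}


def rewrite_query(query: str) -> dict:
    """Detect a known entity in a query and build an implied site search."""
    normalized = " ".join(query.lower().split())
    padded = f" {normalized} "
    # Check longer entity names first so they win over shorter ones.
    for entity in sorted(ENTITY_SITES, key=len, reverse=True):
        needle = f" {entity} "
        if needle not in padded:
            continue
        # Whatever is left after removing the entity becomes the site search.
        remainder = " ".join(padded.replace(needle, " ", 1).split())
        if not remainder:
            break  # the entity alone: leave the query as it is
        site = ENTITY_SITES[entity]
        return {"entity": entity, "site": site,
                "rewritten": f"site:{site} {remainder}"}
    return {"entity": None, "site": None, "rewritten": None}


print(rewrite_query("Barack Obama campaign")["rewritten"])
print(rewrite_query("bill slawski named entity")["rewritten"])
```

In this sketch the rewritten query would presumably be run alongside the original one, with a few of its results promoted to the top of the normal listing, rather than replacing the original results outright.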

For example, on a search that includes a specific person such as [Barack Obama campaign], it might show a number of results from the same site:

A search result for the query [barack obama campaign] showing 4 results.

In a search that includes a particular place or landmark such as [space needle hours], Google may also show a number of results from a particular domain:

A search result for the query [space needle hours] showing 4 results.

In addition, a search query that includes a business name or brand, such as [seo by the sea named entities] may also include a number of results from a site that it has associated the named entity with:

A search result for the query [seo by the sea named entities] showing 4 results.

More than one named entity might be associated with a particular website, which we can see for the query [bill slawski named entity], which shows 4 results similar to those from the “seo by the sea named entities” query above:

A search result for the query [bill slawski named entity] showing 4 results.

The results for the queries that include the entities “SEO by the Sea,” and “Bill Slawski” (yes, I’m an entity according to Google, but likely so are you), show the same pages but in a slightly different order. Google was treating my name as a named entity associated with my site before Google launched their Authorship markup, but it’s possible that the authorship markup that enables the search engine to associate specific people with content they’ve created on the web might help Google make associations between named entities and websites.

Conclusion

Knowing that queries that include named entities might be treated differently than queries that don’t is important to both searchers and SEOs, and can result in special features appearing within search results such as the “related people” display at Yahoo, or the expanded results (like an implied site search) at Google, or possibly in a number of other ways.

I’ve written about named entities a number of times in the past and how search engines might be using them:

Seems like this was the week for people to write about named entities, with some excellent posts from Justin Briggs – Entity Search Results – The On Going Evolution of Search and David Harry, who had a 2 part series on the subject – Named Entities; associations for SEO and SEO & Named Entities; what can we learn?

All parts of the 10 Most Important SEO Patents series:

Part 1 – The Original PageRank Patent Application
Part 2 – The Original Historical Data Patent Filing and its Children
Part 3 – Classifying Web Blocks with Linguistic Features
Part 4 – PageRank Meets the Reasonable Surfer
Part 5 – Phrase Based Indexing
Part 6 – Named Entity Detection in Queries
Part 7 – Sets, Semantic Closeness, Segmentation, and Webtables
Part 8 – Assigning Geographic Relevance to Web Pages
Part 9 – From Ten Blue Links to Blended and Universal Search
Part 10 – Just the Beginning


40 thoughts on “10 Most Important SEO Patents: Part 6 – Named Entity Detection in Queries”

  1. Personally, I like this technology. One thing I noticed specifically was the fact that the results you displayed surrounding the Justin Timberlake query seemed so much more colorful…so much more social.

    I do wonder though, as an SEO, would it be possible to optimize your site so as to show complimentary sites and services as opposed to competitors in search queries should your site be labeled as an entity?

    You never know. Competitor sites may appear in results if they are similar to yours.

    Mark

  2. Got to give it to Yahoo for their amazing named entity implementation, but I presume Google will be there soon enough. It's really great to see how search engines are beginning to understand what users are actually looking for; it makes the services so much more user friendly. And it will only get better.

    Lana

  3. Interesting using the president as an example. So how damaging to the presidents search results would it be to use the presidents full name of Barack Osama Obama?

  4. I think it is amazing that the search engine will try to interpret and answer my question. It sounds like program that was playing jeopardy. What was it called? Watson?

    As the technology grows it will reduce the really odd search results that pop up.

  5. Awesome writing. but may I point out something? not to flame or come off rude but you mention 4 results within the same page stemming from the search “[seo by the sea named entities]” you mention 4 listings. but the top listing should not be included. it simply shows up as a number 1 result because you posted it via Google, that’s why your picture avatar shows up. Only you will see it in that position. it’s also stored in your cookies as well via Google. read through your cookies sometimes. you would be amazed by how much B.S Google, Amazon and Facebook store/track and transmit.

  6. That is true,Bill. As an author and business man, I can relate to how you said, “The Official Google Webmaster Central Blog also described the impact of the approach behind this patent in their post”. I hope more people discover your blog because you really know what you’re talking about. Can’t wait to read more from you!

  7. Man I love digging through your posts to see what the search engines may be or are up to next!

    For example, it’s easy to see how the named entities patent ties into Google’s latest take on Rich Snippets which is great. I do worry about the potential implications of named entities in search as a whole though as it could be misused as some kind of additional Google Instant that favors large brands over small sites in a roundabout way, know what I mean? Let’s say I’m a travel affiliate and want to boost seo for my travel site and I type in seo sea, will I get your site as the only site in the search results? And will Google go as far as to link sea to travel meaning you will pop up in travel results?

    It’s just that Google has a tendency to underplay the impact of changes you know…

  8. Hi Mark,

    It is an interesting technology, and the search engines' ability to identify specific people, places, and things in search queries means that they can potentially serve us much more than just a set of 10 links to web pages, or modify the results that we do see, such as showing us some more pages from a site that the search engines may have associated with those entities.

    With the Yahoo results, I think they are only showing things like “related people” or “related movies” for celebrities at this point, and I would guess that they wouldn’t show related businesses.

    Unfortunately, the Google Related Toolbar seems to understand when entities are shown as well, and will show competitors sometimes for businesses. Not in search results, but actually when people are on those pages, in a toolbar that comes up at the bottoms of pages. This “related” toolbar is a feature of the Google Toolbar, and is turned on by default in Internet Explorer.

  9. Hi Lana,

    Don’t know if Google will ever implement the kind of “related persons” or “related movies” feature that Yahoo had. Yahoo’s always been more of a portal type site than Google has, and they seem to love celebrities.

    But there are likely a number of other things that Google might do with named entities that might make search results more interesting.

  10. Hi John,

    That’s an interesting question, and it looks like Google’s Freebase (acquired when Google purchased Metaweb) is one of the founders behind the CTags.

    I don’t know how much trust Google has in the use of meta tags most of the time, and if they would accept a new type that identifies concepts the way that CTags do.

    But, we have seen a good number of different types of meta data in the form of schema.org code that people can use to make it easier for the search engines to identify that kind of information.

  11. Hi Thomas,

    One of the reasons that I decided to use Obama as an example, with the use of the word “campaign,” was because it might have been easy for Google to associate the president with the White House site, but harder to associate him with his official campaign site.

  12. Hi Kentaro,

    There may be some reputation management issues implicated by a named entity approach. But I can’t say that I’ve seen a named entity, or specific person or place or thing, associated with a website that it shouldn’t be in Google at this point, when Google shows extra search results for that website (at least not yet).

  13. Hi Mike,

    Sometimes people do perform searches in question format, and the information extraction approaches that Google uses can help the search engine answer questions like the ones in my example. But you don’t need to write content in question form for Google to answer questions about named entities.

    However, some researchers at Microsoft did a manual analysis of queries from a commercial search engine's log and noticed that about 70% of those queries included named entities (Named Entity Recognition in Query):

    We have conducted a manual analysis on 1,000 unique queries randomly selected from the search log of a commercial web search engine. It indicates that named entities appear very frequently in queries and about 70% of the queries contain named entities. Furthermore, if a named entity occurs in a query, usually only that single named entity occurs and less than 1% of the queries contain two or more named entities.

    I think that’s a good thing to know, and you might keep it in mind when you do create content for pages.

    For example, when Google chooses to answer a question about someone’s birthplace, they might be looking at all the biographical templates at wikipedia that have the word “birthplace:” in one column, and in the same row but next column over have the answer. Google is treating those as “key/value” pairs.

    When you want to improve the local search SEO for a page you are working on, it could help for you to do things like put the word “phone:” in front of a phone number for a business. These days you could even use the formats at Schema.org which helps organize the data around a particular business into a certain structure.
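To illustrate the "key/value" idea, here is a small sketch that pulls labeled facts out of lines of text. It is only an assumption about how such extraction could work, not Google's actual approach; the `extract_pairs` function, the pattern, and the sample phone number and address are all made up for the example.

```python
import re

# A label made of letters and spaces, a colon, then the value.
PAIR = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?)\s*:\s*(.+?)\s*$")


def extract_pairs(lines):
    """Collect label/value pairs from lines shaped like 'label: value'."""
    pairs = {}
    for line in lines:
        match = PAIR.match(line)
        if match:
            # Lowercase the label so 'Phone' and 'phone' are treated alike.
            pairs[match.group(1).lower()] = match.group(2)
    return pairs


page_lines = [
    "Birthplace: Honolulu, Hawaii",
    "Phone: (555) 010-0199",
    "Address: 1 Main St",
    "Open daily",
]
print(extract_pairs(page_lines))
```

A line without a label, like "Open daily" in the sample, is skipped, which hints at why putting a word like "phone:" in front of a number could make that number easier to identify.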

  14. Hi RichardC,

    The top listing should not be included. it simply shows up as a number 1 result because you posted it via Google, that’s why your picture avatar shows up. Only you will see it in that position. it’s also stored in your cookies as well via Google. read through your cookies sometimes. you would be amazed by how much B.S Google, Amazon and Facebook store/track and transmit.

    I’m not going to take your comment as flaming, but I have been paying a lot of attention to authorship profiles, and to how Google does social search. You’re right that sometimes images show up in Google results because of personalization. But in the screenshots above, that’s not why we are seeing them.

    There are two different kinds of pictures of authors that will show up next to a search result. One of those is an authorship image, like the one in my search result above. It’s a larger image (larger than the other kind of image), and it shows up adjacent to the text in the snippet for the result. Google provides an example of that type of result on their page: Author information in search results.

    That authorship image shows up for everyone, regardless of whether they are logged into Google or not. It has nothing to do with personalization, and is a result that appears because the page ranks highly for that term in Google.

    There’s another type of image that will show up in search results that only appears when you are logged into your Google account. These are smaller images (smaller than the picture of me next to my pages' results in the search results above), and appear under the snippet instead of next to it. They show people who have clicked on a Plus button for a particular page that might be relevant for that query, or shared it on Google Plus. That result might not have shown up in search results if you weren’t logged in. See: Social Connections

    Beneath some search results, you may see the names and photos of social connections who have shared or created web content. To see how you’re connected to a particular social connection, hover over the person’s name on an annotated search result. For instance, if you’ve added your Twitter account on your Google profile and you follow Bob on Twitter, you might see Bob’s name attached to a search result. Click the name to see the person’s profile page.

    That particular page was written back when Google was using Twitter as a source of social search results, and the deal between Google and Twitter has ended, so Google is now showing people’s Google Profile images. See also: Search plus Your World: Personal results

    So my “SEO by the Sea” and “Bill Slawski” search result examples above would still have 4 results, because my Google Profile avatar shows up next to them as a result of setting up Google authorship markup on the pages of my site, and not because it might be a personalized result.

  15. thanks for taking the time to reply. I have a couple of questions if you don’t mind? email me when you get a chance? Thanks.

  16. Hi Dennis,

    Thank you very much.

    I’ve been seeing a lot of people who have been writing about named entities as if they might only affect large brands, and express a great deal of concern about that.

    But this named entity approach applies to large brands, small brands, celebrities, and other specific people who aren’t celebrities (like me above), places (which is why “mentions” of businesses along with some location information helps in Google Places results), and even sometimes things like ideas.

    Google has a specific algorithm that it uses to try to understand when a query is a “navigational” query, in that a searcher might have intended to see a particular page instead of searching for information about what was typed into a search box. For example, when I type [ESPN] into my toolbar search, I want to see the homepage for that site, and Google shows it at the top of results.

    For a query to be associated with a specific website for purposes of navigational results, Google might look at how often people use that query term as anchor text to link to the page, it might look at how often other sites that show up in the top search results for that query link to that particular site, it might see how relevant that site is for that query, and more.

    Google has a similar algorithm that will sometimes associate a named entity with a specific site. When that named entity appears in a query along with an additional term or terms, Google may assume that the searcher is attempting to perform a site search on that site. That’s what we see in the [Barack Obama campaign] and [bill slawski named entity] search results above. Now Barack Obama may be famous and well known and a big “brand,” but I’m not.
