Search engines are hard at work transforming the Web from a place of words to a place of people, places, and things. An Ars Technica article from earlier this month, How Google and Microsoft taught search to “understand” the Web, discusses this evolution, though I think it incorrectly treats the trend as one that only goes back a few years.
The first post I wrote about search engines extracting entities from webpages was in January of 2006, in Providing related links to documents. I’ve written a number more since then that describe how the identification and extraction of an entity from a page might be useful in one manner or another to a search engine. This is true with local search, as well as with practices that can drastically impact the composition of the search results that we see every day. Over at the SEOmoz blog a couple of days ago, Dr. Pete Myers wrote The Bigfoot Update (AKA Dr. Pete Goes Crazy).
If Google were to take the approach that I described in January in 10 Most Important SEO Patents: Part 6 – Named Entity Detection in Queries, and turn it up a notch or two, you would see the kinds of results that Dr. Pete described in his post about the “Bigfoot” update.
Instead of just associating an entity with one website, and assuming that a searcher might want to perform a “site search” on that site, it looks like Google recognized that sometimes an entity might be associated with more than one site. The result is that you may sometimes see a set of search results where you may have 3-4 results from one site, followed by 3-4 results from another site, and sometimes even another 3-4 results from a third site, all in the same set of search results for a single query.
In the example below, we see an expanded set of search results from a single site on a search for [space needle hours]. Google recognized that “space needle” is an entity, and that there’s a web site associated with it that does a good job of answering questions about its hours of operation. But what if there were more than one site that provided answers to that query that were just as good? Dr. Pete started running across search results that contained multiple “entity associations.”
Growing Entity Usage
Last week I saw a few different patent applications published which build upon the identification and use of entities even more. While Yahoo’s results are powered by Bing, their presentation of those results is unique in some cases. In my post on Named Entity Detection above, I pointed to how Yahoo might display “related movies” or “related songs” or “related people” in a sidebar when you search for certain people, like in the example below on a search for Justin Timberlake:
Dynamic Identification of Related Entities
A Yahoo patent published last week tells us about how Yahoo might sometimes associate different entities based upon things such as whether they tend to appear together frequently in a short period of time in pages or articles published in the Web, or in query sessions from searchers.
We’re given a few examples in the patent filing, including some involving people participating in the O.J. Simpson murder trial. Back when the trial was taking place, a search for [o.j. simpson] may have been satisfied with pages or articles about him. The results could also have included query refinement suggestions that focused upon other people involved in the case, from the lawyers, to the judge, to the witnesses, to the victims. Or pages involving those other entities might also have been included within those search results.
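To make the co-occurrence idea a little more concrete, here is a minimal sketch in Python. It is not the method from the Yahoo filing; it only counts which entities show up alongside a target entity in documents published within a recent time window. The function name, the window length, and the toy documents (including their dates) are all assumptions made up for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def related_entities(documents, target, now, window=timedelta(days=7), top_k=3):
    """Return the entities that most often co-occur with `target`
    in documents published during the last `window` before `now`.

    `documents` is a list of (published_datetime, set_of_entity_names).
    """
    counts = Counter()
    for published, entities in documents:
        # Skip documents outside the recent time window.
        if published > now or now - published > window:
            continue
        if target in entities:
            counts.update(set(entities) - {target})
    return counts.most_common(top_k)


# Toy data: the dates and entity sets are invented for this example.
NOW = datetime(1995, 1, 27)
DOCS = [
    (datetime(1995, 1, 24), {"O.J. Simpson", "Johnnie Cochran", "Lance Ito"}),
    (datetime(1995, 1, 25), {"O.J. Simpson", "Johnnie Cochran"}),
    (datetime(1995, 1, 26), {"O.J. Simpson", "Johnnie Cochran", "Marcia Clark"}),
    (datetime(1994, 1, 1), {"O.J. Simpson", "Buffalo Bills"}),  # too old
]

top = related_entities(DOCS, "O.J. Simpson", NOW, top_k=1)
```

The same counting could be run over query sessions instead of documents, which is the other source of co-occurrence the filing mentions.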
The Yahoo pending patent is:
Method and System for Discovering Dynamic Relations Among Entities
Invented by Anish Das Sarma, Alpa Jain and Cong Yu
US Patent Application 20120143875
Published June 7, 2012
Filed: December 1, 2010
Entity Following
A Microsoft patent application from last week describes a way that a search engine can decide to “follow” an entity in real time, and receive alerts when there’s new information regarding an entity.
While this might not sound too different from “alerts” that people have been able to set up in the past to follow specific topics or see pages that might have certain keywords within them, it is different. Instead of “subscribing” to some RSS feeds from specific sources, like a “Google News”, this kind of “Entity Following” actually sets a dedicated crawler in place that actively looks for content on the Web involving a specific entity. The patent filing is:
Entity Following
Invented by Zhaowei Jiang, Xavier Legros, Ronald H. Jones, JR., and Ryan Panchadsaram
Assigned to Microsoft
US Patent Application 20120143845
Published June 7, 2012
Filed: December 1, 2010
Abstract
The present invention outlines a genuine entity following system that also addresses data source limitation. When reviewing entity-related objects in web content, a web user designates one or more entities to follow in real time.
More particularly, the present invention is directed through strategic deployment of a dynamic crawler upon selection of a “follow” pointer over an object in a web browser such that a web user can automatically designate entities to be followed and receive alerts at predetermined temporal intervals when new information regarding such designated entities becomes available.
A web entity engine of the present invention is designed to discover trending entities at any given time while generating output activity (i.e., signal) streams for this entity.
Entity Extraction Complexity
One of the patent filings published last week from Microsoft gives us a look at some of the inner workings of the search engine when it comes to identifying and indexing entities. For example, imagine that the search engine recognizes that “Star Trek” is an entity that it wants to collect information about. It might find out that there’s a Star Trek television series, a Star Trek video game, a Star Trek movie, a Star Trek comic book, and so on. While they might be related, they are separate entities.
There are a few different ways that a search engine might be able to distinguish between these different entities and organize them in a manner that might be useful. For example, it might turn to a knowledge base like Wikipedia, which has a Star Trek disambiguation page telling us that there are separate entries for two different television series, three films or film series, some different video games, a novel, and other “related” topics.
Other things might be looked at as well. For instance, the first Star Trek television series (as opposed to the animated series) had a number of actors and directors and other people related to it. A search engine might be able to find documents on the web that discussed things like who played which character in that first television series, and associate the content (and facts) on that page with the version of the entity “Star Trek” related to the first TV show.
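As a rough illustration of that kind of association, the sketch below picks the “sense” of an entity whose known related names appear most often on a page. This is only a toy under my own assumptions, not the approach in the Microsoft filing: the sense labels, the related-name sets, and the function name are hypothetical, and a real system would use far richer signals than substring matching.

```python
# Hypothetical sense profiles: each sense of "Star Trek" maps to
# a few people known to be associated with it.
SENSES = {
    "Star Trek (original TV series)": {
        "william shatner", "leonard nimoy", "deforest kelley",
    },
    "Star Trek (2009 film)": {
        "chris pine", "zachary quinto", "j.j. abrams",
    },
}


def best_sense(page_text, senses):
    """Return the sense whose related names appear most on the page,
    or None when no related name appears at all."""
    text = page_text.lower()
    scores = {
        sense: sum(name in text for name in names)
        for sense, names in senses.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


page = "Leonard Nimoy played Spock opposite William Shatner's Kirk."
sense = best_sense(page, SENSES)
```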
If you’re interested in the approaches that Bing might take in locating and extracting entity information, and making sure that it’s associated with the right entity, the patent filing describes other approaches that might be used as well:
Measuring Entity Extraction Complexity
Invented by Amir J. Padovitz and Bala Meenakshi Nagarajan
Assigned to Microsoft
US Patent Application 20120143869
Published June 7, 2012
Filed: February 10, 2012
Abstract
A named entity input is received and a target sense for which the named entity input is to be extracted from a set of documents is identified. An extraction complexity feature is generated based on the named entity input, the target sense, and the set of documents. The extraction complexity feature indicates how difficult or complex it is deemed to be to identify the named entity input for the target sense in the set of documents.
Question Answering with Entities
One approach to entities that a search engine might use is to crawl the Web and collect facts and information about specific people, places, or things. Google has been answering questions like “when was Babe Ruth born” (February 6, 1895), or “How tall is mt. Fuji” (12,388 feet or 3,776 m), for a few years.
A Microsoft patent filing explores when they might recognize that someone is asking a question like “when was XXXX born,” or “What are the titles of Shakespeare’s plays,” where the search engine could provide an answer (or list of answers) rather than just returning a set of search results that may or may not contain those answers. The patent filing is:
Query Pattern Generation for Answers coverage Expansion
Invented by Franco Salvetti, Ying Tu, and David D. Ahn
Assigned to Microsoft
US Patent Application 20120143895
Published June 7, 2012
Filed: December 2, 2010
Abstract
Answers are provided to users in response to queries as a supplement to any responsive documents. Query formats for entity and attribute combinations are identified. The query formats can be substituted with entity and attribute combinations that have a corresponding attribute value to form a list of answered queries. The attribute value corresponding to an answered query can be provided when a query is received that matches an answered query.
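The abstract’s idea of filling query formats with entity and attribute combinations can be sketched in a few lines of Python. This is a simplified illustration, not Microsoft’s implementation: the fact table, the query formats, and every function name are assumptions of mine, and matching here is exact after light normalization.

```python
import re

# Hypothetical fact table: (entity, attribute) -> attribute value.
FACTS = {
    ("babe ruth", "born"): "February 6, 1895",
    ("mt. fuji", "height"): "12,388 feet (3,776 m)",
}

# Hypothetical query formats for each attribute.
QUERY_FORMATS = {
    "born": ["when was {entity} born", "{entity} birth date"],
    "height": ["how tall is {entity}", "{entity} height"],
}


def normalize(query):
    """Lowercase, collapse whitespace, and drop trailing question marks."""
    return re.sub(r"\s+", " ", query.strip().lower()).rstrip("?")


def build_answered_queries(facts, formats):
    """Substitute each entity into the formats for its attribute,
    producing a lookup of answered query -> attribute value."""
    answered = {}
    for (entity, attribute), value in facts.items():
        for fmt in formats.get(attribute, []):
            answered[normalize(fmt.format(entity=entity))] = value
    return answered


def answer(query, answered):
    """Return the stored value if the query matches an answered query."""
    return answered.get(normalize(query))


ANSWERED = build_answered_queries(FACTS, QUERY_FORMATS)
```

A query that matches nothing in the lookup simply gets no direct answer, and the search engine would fall back to its regular results.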
Local Search and Recommendations for Entities
We’re all probably used to a search engine showing us the locations of pizza places around us when we search for pizza in a search box. But what if, when we performed that search, the search engine took other kinds of information into consideration that we might not expect? For instance, one pizza place might be much more popular at night than during lunch time. A traffic estimate based upon our location and the location of nearby pizzerias might indicate that one pizza joint that can usually be reached quickly has an accident between it and us this afternoon, making it not a good choice to show first.
We’re told in this patent:
The location-related entity ranking technique described herein is a technique to rank location-related entities, such as, for example, local businesses, restaurants, entertainment venues, events, and so forth. To do this, one embodiment of the technique leverages the mobile search logs (logs of searches conducted on mobile computing devices) to rank location-related entities in real-time or near real-time.
Whenever a user submits a query, the technique examines the location-related entities in the search results that other nearby users have selected after submitting the same or similar queries. In one embodiment, the technique only includes a portion of the mobile search logs that correspond to a given time window. Additionally, in one embodiment of the location-related entity ranking technique, there are two options for searching for location-related entities in response to a search query: real-time search and near real-time search.
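Here is a minimal Python sketch of the idea in that passage: rank entities by how often nearby users selected them for the same query within a recent time window. It is an illustration under my own assumptions, not the technique as claimed. The `Click` record, the radius, the window length, and the log entries are all invented, and “same or similar queries” is reduced to an exact match.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


@dataclass
class Click:
    """One hypothetical mobile search log entry."""
    query: str
    entity: str
    lat: float
    lon: float
    when: datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def rank_entities(log, query, lat, lon, now,
                  window=timedelta(hours=2), radius_km=5.0):
    """Rank entities by recent selections from nearby users for `query`."""
    counts = Counter(
        c.entity for c in log
        if c.query == query
        and timedelta(0) <= now - c.when <= window
        and haversine_km(lat, lon, c.lat, c.lon) <= radius_km
    )
    return [entity for entity, _ in counts.most_common()]


# Invented log entries for illustration only.
NOW = datetime(2012, 6, 14, 19, 0)
LOG = [
    Click("pizza", "Pizzeria A", 47.621, -122.349, NOW - timedelta(minutes=10)),
    Click("pizza", "Pizzeria A", 47.615, -122.352, NOW - timedelta(minutes=50)),
    Click("pizza", "Pizzeria B", 47.623, -122.355, NOW - timedelta(minutes=30)),
    Click("pizza", "Pizzeria C", 47.620, -122.350, NOW - timedelta(days=1)),    # too old
    Click("pizza", "Pizzeria D", 48.620, -122.350, NOW - timedelta(minutes=5)),  # too far
    Click("sushi", "Sushi E", 47.620, -122.350, NOW - timedelta(minutes=5)),
]

ranking = rank_entities(LOG, "pizza", 47.620, -122.350, NOW)
```

Shrinking the window makes the ranking more sensitive to what is popular right now, at the cost of having fewer selections to count.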
The patent filing is:
Real-Time Personalized Recommendation of Location-Related Entities
Invented by Dimitrios Lymperopoulos, Jie Liu, Melissa Wood Dunn, Ashwini K. Varma, Fang Wang, Jen-Hsien Kenny Chien
Assigned to Microsoft
US Patent Application 20120143859
Published June 7, 2012
Filed: December 1, 2010
Takeaways
Bing announced last week that they would start showing Britannica Online Encyclopedia Answers in search results where appropriate. These “knowledge base” results, like Google’s “knowledge base” results from Wikipedia or Freebase or other data sources, may provide some answers to questions that people have, and may also cause people who are exploring a topic to dig further and do more searches.
But those types of entity-based results aren’t the only impact of entity identification and association by a search engine.
The patent filings I’ve listed above were all ones that were published last week, and they show a variety of approaches involving named entities, from how a search engine might determine that certain entities are related based upon some dynamic event, to how entities may be extracted and distinguished from one another. Another of the patent filings allows for near real time tracking of a specific person or place or thing.
The last one I listed describes businesses within local search results as entities, and how they might be ranked in those real time results based upon real world considerations (and personalization).
About a month ago, I wrote the post All Your Knowledge Bases Belong to Google to describe some of the things that Google is doing with knowledge bases and entities.
Google’s not alone in that effort. The patent filings I wrote about in this post are from Yahoo and Bing, showing their commitment as well towards a Web of things.
Well-sourced coverage on entities, Bill. I tend to think a lot of the “search haterade” that I see doesn’t fully appreciate the direction that search engines are taking. We’re moving well beyond 10 blue links of scraped title tags and meta descriptions.
Thanks for the great follow-up! You explained the entity detection aspect of my post about 50X better than I did. It’ll be very interesting to see how the “web of objects” and Knowledge Graph start to filter into organic results and evolve search in general.
It has been a few years since I visited SEObytheSea. You were my favorite search-related resource when I was getting up to speed with SEO. Aaron Bradley @aaranged is one of the few “SEO for Semantic Web” types I know, and he recommended this post of yours. He was right.
Thank you so much for presenting a more complete picture of major general purpose search engine providers’ use of person, place and thing entity relationships. A lot of ire was directed at Google last month. Your review of similar, long in-progress efforts by Yahoo and Microsoft Bing is great. I noticed the announcement by Bing regarding the inclusion of Britannica. Too bad Microsoft couldn’t have preserved Encarta, and somehow integrated it into Bing, rather than retiring it several years ago. It could be useful for entity search purposes. Have you ever used Freebase directly? It seems rather a mess. Maybe Google is able to effectively extract useful entity data from it (I couldn’t, not on my own)!
Have to agree with Gyi. I’m also seeing these types of results with smaller, broader search queries, whereas becoming more specific with the search query returns more relevant results.
I’m wondering if anyone else has noticed this as well? Why would these entities be shown for the broader searches and not the other? Or am I just seeing things? 😉
BTW, caught a spelling mistake, which took me forever to figure out. 4th paragraph from bottom. “to how entities amy be extracted” amy = may I assume. I kept reading that sentence and getting hung up. 😀
Loving the concept of entity following…
It’s a good idea for search engines to do, but I still think it’s open to spamming. The search engines should know for a fact that “philadelphia personal injury lawyer” is not an entity, it’s an attempt to grab an expensive search phrase, and yet searching for that phrase produces nothing but lawyers who have used that phrase in the title of their homepage and in dozens of spammy backlinks with that anchor text, as if this was their “entity.” The first local result That sort of behavior should be discouraged, and entities using branded names — “SEO By The Sea” is a perfect example, it has no AdWords, no monetization — should be rewarded for being honest with the search engine and by accurately describing their own pages. Penguin is a step in this direction, but they’ve got some ways to go.
Max, I completely agree. It is definitely still open to spamming. Hopefully penguin/panda will help clear it up a little bit more…
Very insightful details concerning mobile and its effect on Google maps listings during different hours of the day.
“A traffic estimation based upon our location and the location of nearby pizzerias might indicate that one pizza joint usually reached quickly might have an accident between it and us this afternoon, making it not a good choice to show first.”
The implication of information being displayed based on this type of data is tremendous! This is the type of forward-thinking that will hopefully decrease the success of spamming.
@ Shannon – Most of the time I wonder why people comment on blogs with their links. Even without any proper anchor text, they just drop the links in. Most of them are not even hyperlinks; they look like normal text. My site also gets some links like that. I hate spamming too.
Pretty useful information, thanks. I’ve been looking more and more into SEO information and I’m still learning about it. I suppose I started learning at a good time since it seems like search engines are changing things a lot. Either way the information on the entities is interesting. I’ll be interested in seeing how it has an effect on people who have their own blogs.
Hi Gyi,
Thanks.
If it’s not quite clear at this point how big an impact entities will have on search, I think it’s going to be something people start recognizing more and more in the future. Part of the fun of SEO is in recognizing when changes like this start happening.
Hi Michael,
Sorry about the typo (and thanks for pointing it out). I did manage to pop in and correct that one, and a couple of others. 🙂
The entity associations tend to happen when Google thinks there is an entity included within a query, and possibly one or more words in addition, with some level of confidence that the entity can be associated with one or more sites.
It’s possible that when a query is more narrow, that type of entity association might not be as easy to make.
For example, an entity association with a particular site might be based upon a few different factors, but one of them might be that within the top n (10, 100, 1000) pages returned for a particular query, a certain percentage of those might point to a specific domain. On a search for ESPN, if you looked through the top 100 results for that query, a good percentage of those might link to the ESPN site, possibly with “ESPN” as the anchor text.
When we have a more unique query term which might result in a much more diverse set of results, we might not have that same kind of percentage of links to a particular domain or two or three.
There are likely other signals that help determine whether an entity is associated with a specific site, giving the search engine a certain level of confidence both that an entity appears within a particular query and that a specific domain is associated with that query.
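As a back-of-the-envelope illustration of that one factor, the Python sketch below measures what share of the top n results link to each domain and keeps the domains above a threshold. The function name, the threshold, and the sample links are assumptions of mine; I don’t know what thresholds or signals Google actually uses.

```python
from collections import Counter
from urllib.parse import urlparse


def associated_domains(outlinks_per_result, n=100, min_share=0.3):
    """Return domains linked to by at least `min_share` of the top n results.

    `outlinks_per_result` is a list with one entry per ranked result,
    each entry being the list of URLs that result page links out to.
    """
    top = outlinks_per_result[:n]
    if not top:
        return []
    counts = Counter()
    for links in top:
        # Count each domain once per result page, however many links it has.
        counts.update({urlparse(url).netloc.lower() for url in links})
    return sorted(d for d, c in counts.items() if c / len(top) >= min_share)


# Invented outlinks for four results on a search for [espn].
RESULTS = [
    ["https://espn.com/", "https://example.org/a"],
    ["https://espn.com/nba"],
    ["https://espn.com/nfl"],
    ["https://example.net/x"],
]

domains = associated_domains(RESULTS, n=100, min_share=0.5)
```

With a more diverse set of results, no single domain would clear the threshold, which fits the point above about more unique query terms.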
Hi Dr. Pete,
Thank you for your (twitter) questions and post, which provided a great introduction to this topic. Google has been doing this kind of entity association and showing expanded results for certain queries, but they definitely turned things up a couple of notches recently. Nice catch on the change.
It’s going to be interesting to see how things go from here. I’ve been seeing some result sets that include a large number (10+) of results from one site for some queries where it seems Google is really confident that there’s an intent to search one specific site.
Hi Ellie,
Thank you, and welcome back.
I was hoping to help make it clearer that Google’s work on their knowledge base isn’t the only show in town. In some ways, Google’s even a little behind. Microsoft has used an object level ranking approach for both their product search and their academic search for a few years.
It’s interesting seeing Bing use Knowledge Base results in a little more task oriented manner than Google, but I like that they teamed with Britannica. I know the team that came to Microsoft from Powerset was doing a lot with Wikipedia pages, so I’m surprised that they aren’t focusing more on Wikipedia, too.
I’ve been to Freebase a few times, and looked around, but really didn’t quite know how or where to start. I’ll be back there to see if I can get a little further.
Hi Tom,
Ideally Entity Following should uncover a lot more results than an approach like alerts has in the past. I hope that it is something Microsoft moves forward with. I’m interested in trying it out.
Hi Max,
While there are people spamming search results by trying to stuff as many keywords as possible into titles, meta descriptions, content, anchor text, and more, understanding and indexing entities looks at signals other than just the repetitive use of the same text over and over.
It looks for aspects and attributes of an entity that might be associated with it, looks for content that contains descriptions and information about those related topics and concepts. An entity association will be less prone to spam, because it will be about actual entities, and not just keyword stuffed pages.
Hi Shannon,
That “philadelphia personal injury lawyer” better have some great pages that a lot of other people decide to link to. He or she better not only tell us what the laws are, provide details about the outcomes of real cases and settlement decisions, give us details on what a personal injury lawyer actually does in investigating and litigating a case, but also provide a look at this area of the law that most people don’t expect. He better do a lot more than stuffing his pages with phrases. Otherwise it’s unlikely that his or her website will ever get that kind of entity association.
Some phrases and queries may never get this kind of entity association. But it is definitely something that should end up being a lot less prone to web spam.
Hi Ren,
That last patent filing on “Local Search and Recommendations for Entities” is pretty interesting, but it’s not from Google. The patent is from Microsoft.
It is quite possible that Google will be doing some very similar things.
Hi Curtis,
The web is ever changing and evolving. Many of the old rules of SEO still apply in some ways, but updates like Panda and Penguin are changing the ways that some people have been doing SEO.
The search engines are trying to provide more useful and better information, even if it means making some significant changes to the way they display that information, like the Knowledge Base interface, and like showing more results from individual domains when they think it might be appropriate.
Search Engines and Entities, Bill Slawski: In this blog post, Bill Slawski looks at how search engines no longer understand the web as a place of mere words, but as a place of people and places.