As a recent post on Google’s Inside Search blog noted, the Web doesn’t just contain strings of text, but also includes a great amount of information about things. The post was an introduction by Google to search results that would contain a lot more information about things that people might search for, with textual summaries and links to related topics in Google’s sidebar when appropriate. If you create Web pages, perform keyword research, and even search the Web, this presents some new challenges and some new opportunities, including a need for concept research.
A news story at Fast Company in 2010 carried the interesting title, Bing to Lap Google in Making Search an App? The article tells us about Microsoft finding ways to understand when it might be appropriate to show more than just links to web pages or images or news stories when certain searches might be performed. The “instant answers” displayed in the Bing search results aren’t the informational type results that Google is beginning to display alongside its search results but are rather more akin to the OneBox type of results that Google has been displaying for a few years.
Bing, Entities, and Knowledge Bases in Concept Research
Earlier this month, Microsoft published a patent application that describes some of the processes behind this identification of queries where such results might appear. The patent filing, Presenting Actions and Providers Associated with Entities, tells us how Microsoft might identify entities within search queries, and respond by showing answers that might make it easier for searchers to perform tasks like buying tickets for events and much more.
Some examples from the pending patent include answers on a wide range of topics, including, “weather, news, area codes, conversions, dictionary terms, encyclopedia entries, finance, flights, health, holidays, dates, hotels, local listings, math, movies, music, shopping, sports, package tracking, and the like.”
There are two main steps Bing might take towards being able to present those kinds of answers. One of them involves recognizing that a query includes an “entity”. This can be done in part by looking at a range of knowledge bases such as Wikipedia, Freebase, the IMDB, and other sites on the Web that contain encyclopedic information about specific people, places, and things.
Those knowledge bases may also be used to understand attributes or aspects related to an entity. For example, a search for [Ronald Reagan] might involve a recognition that the subject of that search was a real person and use sources such as Wikipedia and the IMDB to understand aspects about Ronald Reagan, such as date and place of birth, information about his radio and movie careers, movies acted in, his military service, and his political career, including being governor of California and President of the United States. Knowledge bases are a good starting point to target when performing concept research.
The second step involves understanding the intent behind a search.
To recognize query intent, a query input by a user is referenced (e.g., received, retrieved, etc.). A past query log(s), such as a query log associated with the user that input the query, a query log of a group of users, or query logs of all users can be used to recognize query intent. Other data, such as user data, may additionally or alternatively be used to determine query intent.
For example, interests of the user may be utilized to determine query intent. A query may be evaluated for the intent of the query using machine learning algorithms such as clustering. As can be appreciated, in some embodiments, query intent may be or include the query input by a user, without additional analysis.
A search for “when was Ronald Reagan born?” shows an intent for an instant answer that takes advantage of the information found within a knowledge base.
The instant answers shown can be more complex as well. Imagine someone searching for the query [cincinnati reds]. Again, we have an entity that could be found in a number of knowledge bases, and a number of aspects could be found within that knowledge base related to the term.
A look through the search engine’s query logs might also show query sessions from searchers that could indicate the intent behind the search. Someone searching for [cincinnati reds] might follow up with a second search in the same query session for [cincinnati reds tickets] or [cincinnati reds score] or [cincinnati reds schedule].
In addition to Bing searching for all documents that might contain the string of words “Cincinnati Reds” and returning just those, it also understands from knowledge bases that the term is about a specific entity – the major league baseball team from Cincinnati, and it understands from its query logs that people searching for that entity are often interested in tickets or schedules or scores.
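The idea of reading intent from follow-up searches in the same query session can be sketched in a few lines of Python. This is only an illustration, not the method from the patent filing: the session data is made up for the example, and the function name follow_up_intents is hypothetical.

```python
from collections import Counter


def follow_up_intents(sessions, entity):
    """Count the words searchers append to `entity` later in the same session."""
    entity = entity.lower()
    counts = Counter()
    for session in sessions:
        queries = [q.lower().strip() for q in session]
        if entity not in queries:
            continue  # this session never searched for the bare entity
        start = queries.index(entity)
        for query in queries[start + 1:]:
            if query.startswith(entity + " "):
                # Keep only the refinement, e.g. "tickets" or "score".
                counts[query[len(entity):].strip()] += 1
    return counts


# Toy sessions, invented for illustration only.
sessions = [
    ["cincinnati reds", "cincinnati reds tickets"],
    ["cincinnati reds", "cincinnati reds score"],
    ["cincinnati reds", "cincinnati reds schedule", "cincinnati reds tickets"],
    ["weather cincinnati"],
]
intents = follow_up_intents(sessions, "Cincinnati Reds")
print(intents.most_common())
```

A real search engine would work with far larger logs and fuzzier matching, but the same kind of tally is something you can run against your own site search logs when doing concept research.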
Finding answers to commonly asked questions is another important effort to take when conducting concept research.
The context of the search might also make a difference:
- A search for the team during the baseball season might show a result related to one of those topics.
- A search before or after the season might show the team’s record from the previous season, or other information.
- A search performed by someone during the morning might show a score from the night before.
- A search from nearby where a game is to be played might show ticket and schedule information.
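The list above could be expressed as a small set of rules. The sketch below is an assumption about how such rules might be ordered, since the patent filing does not spell out priorities; the function pick_answer and its parameters are hypothetical names chosen for the example.

```python
from datetime import datetime


def pick_answer(when, in_season, near_game_venue):
    """Choose which kind of instant answer to show for a team query.

    The order of the checks is an assumption made for illustration.
    """
    if near_game_venue:
        return "tickets and schedule"
    if not in_season:
        return "previous season record"
    if when.hour < 12:
        return "last night's score"
    return "score, schedule, or tickets"


# A morning search during the season, away from the stadium.
morning = pick_answer(datetime(2012, 6, 1, 8, 0), in_season=True, near_game_venue=False)
# An off-season search.
winter = pick_answer(datetime(2012, 1, 15, 20, 0), in_season=False, near_game_venue=False)
print(morning, "|", winter)
```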
In All Your Knowledge Bases Belong to Google, I wrote about how Google was also looking at entities from knowledge bases, and its own query log files to determine whether or not they should show additional information in search results to searchers about the entities within those queries.
As the Inside Search Blog link I started this post off with tells us, that information is intended especially to help searchers who might not know much about a topic and are interested in performing discovery type searches where they can learn more. The post also tells us that it looks to its log files in an attempt to anticipate some of the followup queries that people often perform when their searches contain that entity.
Google’s recent announcement regarding the use of knowledge bases appears to be more informational than Bing’s, which focuses more on providing situational or transactional information about entities. Then again, Google has been providing OneBox type results for a few years that help book flights, or provide information about the weather, or show maps of local businesses, or similar results that attempt to match the intent of searchers.
Concept Research and Web Page Associations
A patent application from Bing that came out this past week goes beyond the use of knowledge bases to understand just entities. It also looks at a range of concepts that might be found in queries.
Imagine that the search engine takes one or more knowledge bases and maps concepts to the pages of those knowledge bases. It might do so manually, or it might automate the process. For instance, it could take the titles from Wikipedia articles and use those as concepts, and associate those pages with those concepts.
The end goal would be to create an ontology of concepts that could be used to identify concepts used in searches to determine which pages to show searchers, and in some cases to provide instant answers of the types mentioned above.
For example, someone performs a search for “Kennedy birthday.” If a search engine just searches for strings of words within its index of the Web, it might only return a list of web pages that contain that phrase. Instead, imagine that it attempts to understand which “Kennedy” may be referred to within the query, and decides that John F. Kennedy, Bobby Kennedy, and Ted Kennedy might be the most likely choices, with John F. Kennedy having the highest confidence or probability of being the right answer based upon query log refinements for similar searches.
Or on a search for [Java], there’s some ambiguity as to whether a searcher is interested in the programming language, the island, or the coffee. Those three different concepts related to the term might be identified as part of an ontology created from knowledge bases like Wikipedia, Freebase, and others. The search results returned might include information about each of the different concepts related to the term, showing a diversity of results that could satisfy different searchers.
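One simple way to show a diversity of results for an ambiguous term is to interleave the best pages for each concept. The sketch below is only one possible approach and is not taken from the patent filing; the page names are placeholders invented for the example, and diversify is a hypothetical function name.

```python
from itertools import zip_longest


def diversify(results_by_concept, limit=10):
    """Interleave ranked results so each concept appears near the top."""
    mixed = []
    # Take the first result for every concept, then the second, and so on.
    for tier in zip_longest(*results_by_concept.values()):
        mixed.extend(page for page in tier if page is not None)
    return mixed[:limit]


# Placeholder page names, assumed for illustration.
results_by_concept = {
    "programming language": ["lang-page-1", "lang-page-2", "lang-page-3"],
    "island": ["island-page-1", "island-page-2"],
    "coffee": ["coffee-page-1"],
}
mixed = diversify(results_by_concept, limit=5)
print(mixed)
```

A search engine would more likely weight each concept by its confidence score rather than treating them equally, but the interleaving shows the basic effect.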
The patent filing is:
Concept Disambiguation via Search Engine Search Results
Invented by David Ahn, Michael Paul Bieniosek, Andrei Peter Makhanov, Franco Salvetti, and Giovanni Lorenzo Thione
Assigned to Microsoft
US Patent Application 20120130972
Published May 24, 2012
Filed: November 23, 2010
Abstract
Concept disambiguation is provided for search queries by analyzing search results in conjunction with an ontology of concepts. An ontology of concepts is identified, and at least one document is associated with each concept. The document associated with a concept is representative of the concept and used to generate a concept signature. When a search query is received, it is processed to obtain search results.
The search results are used to generate a search results signature, which is compared to the concept signatures to identify one or more concepts that are relevant to the search query.
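To make the abstract a little more concrete, here is a minimal sketch of comparing a search results signature against concept signatures. The abstract does not say how signatures are built or compared, so the word-count signatures and cosine similarity used here are my assumptions, and the short concept documents and snippets are invented for the example.

```python
import math
import re
from collections import Counter


def signature(texts):
    """Build a simple term-count signature from one or more texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def cosine(a, b):
    """Cosine similarity between two term-count signatures."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_concepts(concept_docs, result_snippets):
    """Score every concept's document against the search results as a whole."""
    results_sig = signature(result_snippets)
    scores = {
        concept: cosine(signature([doc]), results_sig)
        for concept, doc in concept_docs.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# One representative document per concept (toy text, assumed for illustration).
concept_docs = {
    "Java (programming language)": "java is a programming language used to write software that runs on a virtual machine",
    "Java (island)": "java is an island of indonesia with volcanoes and the city of jakarta",
    "Java (coffee)": "java is a coffee grown on the island and brewed as a drink",
}
snippets = [
    "download the java programming language runtime",
    "learn to write java software",
]
ranked = rank_concepts(concept_docs, snippets)
print(ranked[0][0])
```

In this toy example the results for the query lean towards the programming language, so that concept scores highest. A production system would likely weight terms (for example, by rarity) rather than use raw counts.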
While this patent filing tells us that sources such as knowledge base pages might be used as pages associated with specific entities, it seems to leave the door open to having other pages being identified as the primary pages about specific concepts as well. A knowledge base page might be helpful in creating an ontology of concepts, and in identifying different aspects or attributes associated with those concepts.
That raises an important question to keep in mind when doing concept research:
What does someone have to do to have their page identified as embodying a specific concept, and to have that page associated with the concept by Bing?
It’s possible that the home page for a particular State might be seen as a better page to associate with a “concept” related to that State, or that the homepage of a business might be associated with the “concept” of that business.
The processing part of a search for [Titanic Director] might identify the query to be about the movie Titanic and more specifically about the person who directed the movie. A page within the search results for that query might be identified as being about that particular aspect of that concept by an analysis of the textual content of the page, and a “feature vector of terms and/or phrases found in the textual content.”
In other words, for a page to rank well for a query such as [Titanic director], that page should be one that is about the concept related to that query as indicated by the terms and phrases on the page itself.
It’s not just a matter of how many times the term “Titanic director” appears in the page title, in headings on the page, in the textual content of the page, and in links pointing to the page.
Instead, it’s a question of how much the page embodies the concept by addressing different attributes and aspects of the concept in meaningful ways, and possibly addressing related queries about the concept.
Concept Research Takeaways
Search is evolving to understand the meanings and concepts contained on pages on the Web, and search engines are increasingly looking to sources of information such as knowledge bases and their own search query logs to understand entities and concepts that might appear within queries.
While Google may be looking at knowledge bases such as Wikipedia and Freebase to learn about the entities that it sees, it’s possible that it also incorporates the kind of ontology described by the CIRCLA technology (pdf) that came to it in its merger with Applied Semantics. It’s also likely that Bing is developing its own ontology from similar knowledge base sources.
One of the more interesting papers that I’ve seen from Microsoft recently was Improving Entity Resolution with Global Constraints (pdf), which told us about how it might look to more commercial type knowledge bases such as the movie database from Netflix or the music database from iTunes to understand which entities might be referred to in a query. Those commercial databases have an economic incentive to only have a single entry for the entities they contain, since they would want to have all information about a single entity together, including reviews and other user-generated content about each.
It’s important to keep in mind, when you’re writing about a topic, or doing keyword research, that the words that you’re choosing to use aren’t just strings of words, but rather embody certain concepts that may contain many different aspects. Doing concept research on those can pay off richly.
If you want to create a page or site about the [cincinnati reds], it doesn’t hurt to understand that both Google and Bing may have an ontology about the team that includes many different aspects related to that term. It could include history, schedule, standings, stadium information, ticket sales, players, statistics, and so on.
Your research into a term as a keyword should probably go beyond a look at the search volume for the term and an analysis of how competitive it might be compared to other pages about the team. It should extend to an exploration of the different concepts, aspects, and attributes related to the term in sources like knowledge bases and pages that tend to rank well for the term, as well as an exploration of related queries that people might search for when they search for that term.
Addressing those related concepts and aspects when doing concept research might make it more likely that your page will be seen as one that should be associated with a particular entity or query.
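A quick way to check a draft against a list of aspects is to see which ones the page mentions at all. The sketch below uses the aspects named above for the [cincinnati reds] example. It is a rough aid for your own research, not a description of how any search engine scores pages, and aspect_coverage is a hypothetical helper name.

```python
import re


def aspect_coverage(page_text, aspects):
    """Split aspects into those the page mentions and those it does not."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    # An aspect counts as covered when every word in it appears on the page.
    covered = [a for a in aspects if set(a.lower().split()) <= words]
    missing = [a for a in aspects if a not in covered]
    return covered, missing


aspects = ["history", "schedule", "standings", "stadium",
           "ticket sales", "players", "statistics"]
draft = "The Reds schedule, current standings, and ticket sales for games at the stadium."
covered, missing = aspect_coverage(draft, aspects)
print("covered:", covered)
print("missing:", missing)
```

Simple word matching like this misses synonyms and says nothing about how meaningfully an aspect is addressed, so treat the missing list as a prompt for ideas rather than a score.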
On a side note, David Ahn, who is listed first as an inventor on this second patent, came to Microsoft when the company acquired the semantic search engine Powerset. According to his LinkedIn profile, it appears that he left Microsoft to join Google around a month ago. While that doesn’t mean that he is taking the technology behind Microsoft’s concept patent application with him, he is taking his knowledge of assigning concepts to web pages.
Added June 1, 2012 – The Bing Community Blog announced some new features today, including a central column they are calling a “snapshot,” which includes entity and knowledge base type information, in the post: Social Meets Search with the Latest Version of Bing, Available to Everyone in US Today. The snapshot results sound very much like the things described in the Microsoft pending patent I wrote about above, Presenting Actions and Providers Associated with Entities, which may be more transactional in many ways than Google’s knowledge base results. From the Bing post:
Available for the first time today, we designed a new center column, called snapshot, which makes it easy to get things done in one place. It includes information, such as maps and reviews, and quick ways to take action, so booking a restaurant reservation or checking hotel rates are now quicker and easier. The snapshot feature offers restaurant reservations, hotel reviews, movie trailers and show times, maps, and more right within the search experience; giving you everything you need to take action in one place and eliminating the need to visit multiple sites to complete tasks.
Are you ready to add concept research to the keyword research that you do when optimizing a website for search and searchers?
Last Updated June 8, 2019.
Great post Bill. I’ve always been a big supporter of understanding searchers intent rather than just keyword lists and volumes. This certainly pushes things light years ahead in that direction and I can’t wait for this to be in full force in our SERPs.
As Matthew states, I’m also a fan of intention-based searches, as long as the data is presented in a clean and useful fashion. There are also a lot of opportunities for copyright infringement this way, especially when you are using others’ data and taking away their organic search volume.
Bill – great stuff. That list of relationships in that paper is really interesting.
This concept stuff is challenging because there’s not really a bunch of standard “mental models” to think through and organize these things easily. I did a piece recently on Paid Search where I show how the “problem/solution” mental model works really well for me during keyword research – I guess that fit the “causation” category per the paper maybe?
http://searchengineland.com/how-to-use-the-keyword-funnel-to-understand-searcher-intent-121463
There have to be another 4 or 5 good mental models that can be used for keyword research that cover other concepts besides problem/solution (to organize it, and figure out how complete a job you’ve done). I think that list in that paper may be the key. Good food for thought, thanks.
For years I’ve been helping clients optimize their website for specific themes and topics. For the longest time I had to compete with “SEO companies” who would only optimize sites for 1-5 keywords and “guaranteed rankings”. Even though the changes happening at Google and Bing are greatly affecting the SEO industry, I see this as a positive as it really separates poor SEO from quality SEO.
Indeed, I find that not relying on keyword research alone can actually BENEFIT your keyword research, if that makes any sense. I sat down with a client recently with a list of keywords based on a topic. We then brainstormed other things that a searcher would use. What we ended up finding were keyword phrases that were actually being searched more than the original keyword and were much less competitive.
Hi Matthew,
Thank you.
This kind of concept matching seems to be the kind of thing that both Google and Microsoft have been working on developing over the course of a number of years. It seems to be something they are building brick by brick, rather than being able to implement overnight.
It’s likely that just indexing the content found at many different pages on the Web, and the URLs where those are located, and then using a number of different signals to rank them is easier than extracting a concept graph, and associated attributes and aspects of those concepts, and then mapping content related to those across the Web.
Using knowledge bases the way that both Google and Bing have been is bringing us closer to those days where those kinds of associations can be made. Looking at query logs associated with those, and possibly even social media, brings us even closer by helping to identify intent.
These do appear to be positive signs that both Google and Bing are constructing a concept graph that will play an ever increasing role in how pages are indexed on the Web.
Hi Brent,
Understanding what situational and informational needs different keyword phrases might help fulfill is definitely a helpful step towards optimizing the pages of a site for searches that can help meet the objectives of both searchers and site owners.
There is a risk of someone copying your content when you publish to the Web, and it is likely that someone writing about a topic that you’ve written about will look at your page if it ranks well.
I do think it’s helpful and important when you’re going to optimize for a certain query to look at other pages that do well, including knowledge base pages, to see what they’ve written and what aspects and attributes related to that query they’ve covered or failed to cover. It’s also important to create unique, interesting, and engaging content, and just copying what someone else might have written is no way to do that.
Hi Ted,
Thanks. I’m guessing that you’re referring to the different types of relationships discussed in the CIRCLA paper:
It’s definitely a good list of things to keep in mind when conducting keyword research.
I do like the funnel approach that you describe in your Search Engine Land article for understanding the intent behind searches related to specific queries. I think that exercise does provide some valuable insights into what people might be looking for and expecting when they use certain queries to search with.
Thank you for providing such an interesting approach.
Hi Bloghands,
Some SEO initiatives I’ve seen put into place are really limited. There are an incredible amount of strategies and approaches that can be used to gain visibility in search engines and on the Web, but many don’t often even try.
For example, some pages actually work best not optimized for a specific head term, but as pieces that focus upon a collection of related long tail terms.
I’ve seen many sites that have 20+ pages, and only attempt to focus upon optimizing a handful of pages for specific terms as well. Talk about missing opportunities.
Hi Eric (evolvor)
There’s nothing quite like a brainstorming session with people who are actually subject matter experts on a topic, and who are willing to sit down and engage in some mind mapping and brainstorming.
There’s a risk that sometimes they might offer inside jargon and terms that people who use their goods or services, or who aren’t experts on the topics they deal in might not understand or use, but that’s why it’s good to do some of that initial research exploring knowledge bases, competitor’s sites, as well as forums and other places where people who might be interested in what they offer might use to have conversations about things they offer (or on things that are very related).
You have to look at it as a buying process. Search engine marketing is the art of figuring how to interject yourself into the customer’s buying process to a) get their attention and b) guide the sale to closure.
Not surprised the search engines are figuring how to associate your content into the broader base of knowledge about a topic. They are also probably looking at where / how your content performs within the narrative of the customer’s discovery process.
Sometimes the insights can be a bit surprising. For example, when I was analyzing a competitor for my day job, I noticed that one of their best performing keywords was a part number!
But…thinking about the broader customer interaction and buying process, it makes perfect sense. My company is in a pretty boring product category. Very rarely does a purchasing agent get up and say – I’m going to go buy widgets today.
What happens is someone storms into their office and screams – our $1MM/day production line is down and if we don’t get part #542343B on that machine in the next 24 hours, we are . So… our now highly motivated buyer pulls up google, types the part number, and….
Keyword research is also as important as concept research. I do both of them before I publish any post.
Anyways nice post Bill!
Looks like everyone’s in a race to reinvent Windows Encarta!
Great post Bill.
@Evolvor finding all the keywords within a subject must be defined as keywords research.
Many keyword tools filter a database with keywords and therefore don’t find related keywords that don’t include the searched keywords.
It saves some time sorting, but you also miss out on the low-competition, high-value keywords that don’t include the keywords, but are about the subject.
Hi John,
I’ve worked with site owners who have a variety of objectives for their site. Sometimes those involve sales, and sometimes they focus upon education, sharing information, gaining newsletter subscribers, and even influencing some actions and activities (plant a tree, sign a petition, vote, etc.).
But you’re absolutely spot on about understanding the different stages that a visitor might be going through, from information discovery, to comparing prices or services or opportunities, to becoming a consumer of goods or services or information, to returning for customer service or using online services or sharing a site with others.
The kind of emergency that you mention does happen, and being ready to help someone in that kind of situation can be something you could base a good part of your business around.
Understanding that kind of buying or buy-in process is another great way to think about concepts that you can optimize the pages of a site for.
Hi Suraj
Exactly. I’m not saying to not do keyword research as much as I’m saying to approach those as concepts that have relationships with other terms and phrases, and possibly a number of informational and situational needs that they might address.
Hi Bill,
IMO yes, folks should be expanding content writing capabilities. The semantic web from a linguistic point of view is already happening, see: http://blogs.wsj.com/digits/2012/05/29/google-search-engine-surging-following-revamp/?mod=wsj_share_twitter
It’s a good idea to slow down the churning and improve the output. Ontologies and topic models are more than keywords. Keywords are the seeds but to get that sweet fruit of relevance you must go beyond just doing keyword research.
Hi @Matdwright,
It seems like a search engine taking advantage of knowledge base and query log information to rank search results might lead to something like trying to rebuild an encyclopedia in your web pages, but it’s not. The approach can make content creation a little more complicated, but in some ways it might be simpler. I think it makes the planning of what content you’re creating a little easier because you’re putting more thought into it at the front end.
Hi Brian,
Many keyword tools do have limitations that often don’t make them a good starting point. The list in my response to Ted’s comment, taken from the CIRCLA paper, presents some relationships that could be used in a process to find ideas for related terms as well.
Hi Scott,
Thanks for the link to the WSJ article. I wasn’t aware of the impact of the knowledge base results additions, but it’s understandable that searches at Google have increased. Those results both provide some useful document summaries and aid in exploratory searches.
One of the most successful pages I’ve ever worked upon started out as a total failure. We focused upon a specific keyword phrase for a very recently added page that was both thin on content and ideas. Just adding the phrase in some appropriate places within the existing content brought us a page that really was pretty low quality in terms of content. And many of the competitors’ pages that did rank well for that term were content-rich and great experiences.
We shifted our approach by finding some great examples of what the keyword phrase suggested. We created a page that was long on content, but also rich with long tail terms that people were searching for. Individually, they weren’t at the levels of search volume of the main phrase, but taken together they easily surpassed it. On a site of around 300 pages, this page started attracting about 6% of all new visitors to the site within a month or so. The purpose behind the site was to get people to take some specific actions, and the content of this page did that much more effectively than the previous incarnation did, as well.
We went from a keyword rich and user experience poor page to a keyword rich and user experience rich page by understanding the concept of the main phrase behind the page better, and writing about concepts and entities related to it. That made all the difference.
Makes sense. They may do it already but that concept may impact the longtail keyword directly too, as opposed to it being 3 separate keywords within the query.
Dear Bill,
I love your articles. It’s like taking a bath in the future.
Ines
This will really be great Boolean Algebra. Imagine somebody typing Java as you stated in the article and the search engine has to decide whether the intent of the surfer is seeking information on the island, the software or the coffee brand? That withstanding a concept based search would go beyond the keyword and inform the search engines the intent of the person conducting the search. This way the SEs would match the queries and the content based on the wider intent of the search.
If Google pushes the knowledge graph more than necessary, don’t you think that it will just bypass lots of websites? E.g., usually when I need to learn about a person, I search for the name in Google, and most of the time Wikipedia is the first site in the SERP, and I usually navigate to the Wikipedia article. Given that Google is bringing in this knowledge graph, it would be better for me: I won’t have to go through the Wikipedia article, and I’ll be able to get the required results easily. But if we think from the perspective of sites, Google would actually be stopping users from clicking!
Secondly, is there a separate Google bot for the knowledge graph that crawls reference sites like Wikipedia and other such sites and collects data from there? And is there any way of stopping Google from picking data from our sites, so that Google may not be able to steal data from our site and show it in its SERP as knowledge graph?
Hi Thomas,
The idea is definitely to find pages that may contain a concept related to what is contained within a query, and if the impact is in returning results where the terms do express a concept as opposed to showing pages where the query terms are scattered across a page in ways that may be unrelated to each other, that’s an improvement from a search standpoint.
Hi Ines,
Thank you.
Hi Gichuki
Ideally, a search engine would recognize that a query has potential multiple meanings, and potentially involves some possibly different intents behind it, and will provide a diversity of results. On a set of the top 10 results for a search on [Java], we should ideally see some pages related to the programming language, some related to the drink, and some related to the Island. That’s a much better result than just providing us with a set of results that don’t understand that the term might refer to different meanings.
Hi Asad,
The knowledge graph results aren’t an attempt to replace the Wikipedia results, or even to discourage searches. If someone is just searching for some very small detail, like someone’s birth date, then they might do that.
But, if the only reason why someone might be coming to your webpage is to find some small snippet of information like that anyway, then they might not necessarily be a visitor that might bookmark your site, refer it to others, subscribe to your newsletter, click on your links to other articles and so on, anyway.
The knowledge base results are intended to be a brief summary of information about a topic, and possibly include some additional links to further information, as indicated in query sessions that might be related to the entity identified in the knowledge base sidebar information. Their purpose is to provide some information, especially to searchers who are performing exploratory type searches, and want to find out more about a topic. They are intended to encourage more searches, and to help guide people who might not know much about a specific topic.
Intention based searches sound like one of the best approaches that I have heard yet. The main limitation is getting a knowledge base with enough knowledge.
Hi Jonathan,
Intention based searches are just as effective without a knowledge base approach, and have been for more than a couple of years now. Having a sense of what the intent might be behind a search when someone types it into a search box can be really helpful (or at least knowing how a search engine might interpret that intent.)
Nice Tips