How Google Finds ‘Known For’ Terms for Entities

Google identifies the terms and phrases that businesses, locations, and other entities are known for, and associates those terms with the entities. These terms can influence what shows up in search results and in knowledge panels for those entities. Consider it part of a growing knowledge base of concepts, entities, attributes of entities, and keywords that shape the new Google after Hummingbird. Semantics play a role in identifying the things that specific entities are known for.

The Red Truck Bakery in Warrenton, Virginia

For example, the Warrenton, Virginia, Red Truck Bakery (local to me) is known for:

Continue reading

How Google Might Use the Context of Links to Identify Link Spam

With Google’s Penguin update, it appears that the search engine has been paying significantly more attention to link spam, meaning attempts to manipulate links and anchor text pointing to a page. The Penguin update launched on April 24th, 2012, and was accompanied by a post on the Official Google Webmaster Central Blog titled Another step to reward high-quality sites.

The post tells us about efforts that Google is undertaking to decrease Web rankings for sites that violate Google’s Webmaster Guidelines. The post is written by Google’s Head of Web Spam, Matt Cutts, and in it Matt tells us that:

…we can’t divulge specific signals because we don’t want to give people a way to game our search results and worsen the experience for users, our advice for webmasters is to focus on creating high quality sites that create a good user experience and employ white hat SEO methods instead of engaging in aggressive webspam tactics.

Continue reading

Hummingbird and Author Rank Authority

Is Hummingbird the key to understanding the expertise of an author for things like In-Depth articles, and a possible future Author Rank? If an author’s content is analyzed using a concept-based knowledge base, it’s quite possible.

The Google Hummingbird rewrite of Google’s search engine wasn’t just aimed at providing a way to better understand long and complex queries, like the type that someone might speak into their phone. It was also likely aimed at better understanding the concepts and topics written about and discussed on Web pages and in social signals, such as posts at Google+ and comments on those posts, Tweets, status updates, and other short text-based messages that might not carry much additional context.

The following screenshot shows the concepts that might appear for Tweets when they are analyzed using the Probase Concept-Based knowledge base (from Short Text Conceptualization using a Probabilistic Knowledgebase):

A breakdown of concepts that appear in specific tweets, according to the Probase knowledge base

Continue reading

Concept-Based Web Search

There are a few different parts to this story, though I’m not sure how many there will be because I’m still in the middle of writing them. I started with a prologue, titled Are You, Your Business, or Products in a Knowledge Base?, which introduced Microsoft’s Conceptual Knowledge Base Probase.

Microsoft’s Probase Knowledge Base

Sometime between when Microsoft acquired semantic search company Powerset and now, the software company began work on one of the largest knowledge bases in the world, Probase. Why Bing doesn’t use it now is a mystery, but it doesn’t appear to. There are a few papers about Probase, including one titled Concept-Based Web Search. Here’s a snippet from the paper, which might evoke some recent memories of Google’s Hummingbird update:

It is important to note that the lack of a concept-based search feature in all main-stream search engines has, in many situations, discouraged people from expressing their queries in a more natural way. Instead, users are forced to formulate their queries as keywords. This makes it difficult for people who are new to keyword-based search to effectively acquire information from the web.

Continue reading