Most searchers, site owners, and search engine optimizers are familiar with Google’s link graph, and with how Google uses the connections between websites to help rank pages on the Web. In part, Google looks at how relevant the content of a page is to the query a searcher enters at the search engine.
In addition to “relevance”, Google also uses the patented method of PageRank, in which the quality and quantity of links pointing to a page are used as a proxy for the quality of the page being linked to. The higher the quality of a page (and the more PageRank it possesses), the more PageRank it likely passes along.
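The core PageRank idea above can be sketched with a few lines of power iteration. The pages and links below are made up for illustration; this is a minimal approximation of the published algorithm, not Google’s implementation.

```python
# Minimal PageRank sketch (power iteration) on a tiny hypothetical link graph.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],  # d links out, but nothing links to d
}
damping = 0.85
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}

for _ in range(50):  # iterate until the scores settle
    new = {p: (1 - damping) / len(pages) for p in pages}
    for page, outlinks in links.items():
        share = damping * rank[page] / len(outlinks)
        for target in outlinks:
            new[target] += share
    rank = new

# "c" gathers the most inbound links, so it ends with the highest score,
# and it passes much of that score along to "a".
print(sorted(rank, key=rank.get, reverse=True))
```

Note how “d”, with no inbound links, ends up with only the baseline score, while the quantity of links into “c” (and the quality of the pages linking to it) lifts both “c” and the page it links to.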
The link graph is one example of how Google ranks, measures, and possibly sorts web pages. Another signal Google might look at is the attention graph: topics and concepts that are searched for frequently, which could be used to adjust rankings of pages based on freshness and hot topics.
When Google indexes the Web, it’s often convenient to think of the search engine as running two different methods or approaches in parallel. One involves crawling, indexing, and ranking pages on the Web (along with images, videos, news, podcasts, and other documents).
The other approach doesn’t index pages so much as the objects it finds on the Web, what we often refer to as named entities: specific people, places, or things, real or fictional. This second kind of crawling is often referred to as fact extraction, and we see its results in Knowledge Panels and even in things like Google’s OneBox question-and-answer results.
When SEOs talk about the programs Google uses to crawl and index pages on the Web, we usually refer to those crawlers as robots, spiders, or simply Googlebot, without differentiating the crawling programs much. It’s probably time to start thinking of Googlebot differently (and not as the kind of robot pictured above, which is a new twist from Google).
When we talk about how websites are related, it’s not unusual for us to talk about links between sites and pages. Google pays a lot of attention to such links, and they are at the heart of one of its best-known ranking signals: PageRank. PageRank is now more than 15 years old, predating Google itself, with its origins in the BackRub search engine.
Google finds terms and phrases to associate with entities, terms of interest for businesses, locations, and other entities. These terms can influence what shows up in search results and in knowledge panels for those entities. Consider it part of a growing knowledge base of concepts, entities, entity attributes, and keywords that shape the new Google after Hummingbird. Semantics play a role as the things that specific entities are known for are identified.
For example, the Warrenton, Virginia, Red Truck Bakery (local to me) is known for:
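The entity-to-attribute association described above can be sketched as a simple mapping from entities to the terms they are “known for.” The entities and terms below are hypothetical placeholders for illustration, not data extracted by Google.

```python
# Toy knowledge-base fragment: entities mapped to terms they are "known for".
# All entries are invented examples, not real extracted data.
knowledge_base = {
    "Example Bakery": {"type": "business", "known_for": ["sourdough", "pies"]},
    "Example Park": {"type": "location", "known_for": ["hiking trails"]},
}

def terms_for(entity):
    """Return the 'known for' terms a knowledge panel might draw on."""
    record = knowledge_base.get(entity)
    return record["known_for"] if record else []

print(terms_for("Example Bakery"))  # ['sourdough', 'pies']
```

A real knowledge base would of course be populated by fact extraction at Web scale, but the shape of the data (entity, type, associated terms) is the same idea in miniature.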
A transformation was triggered at Google with its announcement of the Knowledge Graph in the Official Google Blog post, Introducing the Knowledge Graph: things, not strings. That transformation was one less concerned with matching keywords and more concerned with matching concepts, understanding entities, and bringing knowledge about entities to searchers in knowledge panels next to search results.
Google published a patent application last week that describes the knowledge panels that appear next to search results as part of the new knowledge graph. Here’s the video that accompanied the post (note the reference to a “panel” in the presentation):
When we talk about indexing and crawling content on the Web, it’s usually within the context of pages being ranked in response to queries on the basis of a number of signals found on those pages. Google has told us that the future of search involves Knowledge Bases, and the indexing of Things, Not Strings. Gianluca Fiorelli explored Google’s ideas of Search in the Knowledge Graph Era earlier this week.
A few years back, I wrote some posts about Google patents that explored how Google might be extracting and visualizing facts, and using Data Janitors to process, clean up, and sort that information. Google was granted another, closely related patent this week, looking at how Google might understand locations for places collected from Web pages. One of the inventors, Andrew Hogue, gave this Google Tech Talk presentation last year:
When you walk into the lobby of Building 42 at the Googleplex, you can see a display that shows you queries entered into the search engine at any one time. It’s a mesmerizing sight, and I found myself wondering about the people and motivations behind some of the search terms I saw flowing down the screen.
Imagine that instead of seeing one query at a time, that search information was analyzed and queries were bundled together, perhaps providing us with more meaning.
Can search engines be used to tell us what the world is thinking at any one time? Would looking at the most popular keywords or queries that people type into a search engine provide us with some insights?
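The simplest version of that kind of bundling is just counting: aggregate a stream of individual queries into frequencies and surface the most popular ones. The query stream below is invented for illustration; real analysis would be far more sophisticated (clustering related queries, tracking spikes over time, and so on).

```python
from collections import Counter

# Hypothetical stream of queries; counting them is one crude way to
# "bundle" individual searches into a picture of aggregate interest.
query_stream = [
    "world cup", "weather", "world cup", "election results",
    "weather", "world cup",
]
popular = Counter(query_stream).most_common(2)
print(popular)  # [('world cup', 3), ('weather', 2)]
```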
A tool from Google that is often overlooked is Google Sets (no longer available), which allows you to “automatically create sets of items from a few examples.”
Google Sets was one of the first applications in the Google Labs (no longer available) pages.
Those pages are “Google’s Technology Playground,” and contain a number of programs that may or may not be tomorrow’s useful applications from the search engine. As Google tells us,
Google labs showcases a few of our favorite ideas that aren’t quite ready for prime time. Your feedback can help us improve them. Please play with these prototypes and send your comments directly to the Googlers who developed them.
Google was granted a patent this week on the process behind Google Sets, and the patent document provides some details on how the program finds additional words based on “items from a set of things” that you enter.
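One way to sketch the spirit of that process is co-occurrence scoring: given a few seed items, look at lists found on the Web that contain a seed, and score the other items in those lists by how often they appear alongside the seeds. The lists below are invented for illustration, and this is only a toy approximation of what the patent describes, not the actual Google Sets method.

```python
# Sketch of set expansion in the spirit of Google Sets: given seed items,
# score candidates by how often they co-occur with seeds in the same lists.
# These lists are invented examples; the real system mined lists from the Web.
lists_on_web = [
    ["red", "green", "blue", "yellow"],
    ["red", "blue", "purple"],
    ["monday", "tuesday", "wednesday"],
    ["green", "blue", "cyan"],
]

def expand(seeds, lists, top_n=3):
    """Return up to top_n candidate items ranked by co-occurrence with seeds."""
    scores = {}
    for items in lists:
        if any(seed in items for seed in seeds):
            for item in items:
                if item not in seeds:
                    scores[item] = scores.get(item, 0) + 1
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(expand({"red", "blue"}, lists_on_web))
```

Seeding with “red” and “blue” pulls in other colors because they share lists with the seeds, while the weekday list contributes nothing, which is roughly the behavior Google Sets exhibited.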