I’ve been saying for at least a couple of years that Google’s local search is a proof of concept for how the search giant might find and understand entities.
With local search, Google goes out and looks for mentions of a business on the Web, especially when they are accompanied by geographic location information. It collects facts related to businesses (entities are people, places, and things), and then it clusters the information it finds about those objects to make sure that the mentions across the Web all refer to the same places.
If you start reading about local search, you’ll see people referring to the importance of consistency in how you present address information for a business, and the same thing is true for entities.
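To show why consistency matters, here is a minimal sketch of clustering business mentions by a normalized phone number and address. The abbreviation table, the normalization rules, and the choice of (phone, address) as the grouping key are all hypothetical simplifications for illustration, not Google's method. The sample mentions are made up.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; real address normalization is far richer.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "suite": "ste", "road": "rd"}

def normalize_address(address: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse street-type variants."""
    words = re.sub(r"[^\w\s]", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)

def normalize_phone(phone: str) -> str:
    """Keep only the last ten digits, so a leading country code is ignored."""
    digits = re.sub(r"\D", "", phone)
    return digits[-10:]

def cluster_mentions(mentions):
    """Group mentions whose normalized phone and address agree (a hypothetical key)."""
    clusters = defaultdict(list)
    for m in mentions:
        key = (normalize_phone(m["phone"]), normalize_address(m["address"]))
        clusters[key].append(m["source"])
    return dict(clusters)

# Made-up mentions of what is meant to be one business.
mentions = [
    {"source": "directory-a", "address": "123 Main Street, Suite 4", "phone": "(555) 010-0199"},
    {"source": "review-site", "address": "123 Main St., Ste 4", "phone": "+1 555 010 0199"},
    {"source": "blog-post", "address": "123 Main Avenue", "phone": "555-010-0199"},
]
clusters = cluster_mentions(mentions)
print(len(clusters))
```

The first two mentions collapse into one cluster despite surface differences, while the third, with an inconsistent street type, ends up as a separate "business." That split is the kind of ambiguity consistent address information helps avoid.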
A couple of months ago, I wrote a post about a new patent from Google, the first Google patent granted to Navneet Panda as an inventor. The patent described a complicated way for Google to judge the quality of websites, and my post was titled Is this Really the Panda Patent? Simon Penson wrote a follow-up post at Moz titled The Panda Patent: Brand Mentions Are the Future of Link Building, which looked at some other aspects of the patent.
On August 1st, Jayson Demers published a post to Forbes titled Implied Links, Brand Mentions And The Future Of SEO Link Building which covers a lot of the same ground as Simon’s post. I contacted an editor at Forbes and stated that the post plagiarized Simon’s post. Jayson didn’t give me any credit for my post about the patent either, but Simon did.
When Google crawls the Web to collect information about objects or entities, it also collects facts about those entities. These facts are separated into different categories or attributes associated with those entities. For example, a book may have attributes such as an author, a publisher, a year published, a website it can call home, a genre, and more.
Identifying Entities by their Attributes
A search that includes those attributes can be used to identify the entity the attributes might be associated with.
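As a rough illustration, the sketch below builds an inverted index over a tiny attribute data store and ranks entities by how many queried attribute values they match. The entity ids, attribute values, and match-count scoring are hypothetical placeholders; the patent's actual matching and ranking are not shown here.

```python
from collections import defaultdict

# Hypothetical attribute data store: entity id -> {attribute: value}.
ENTITIES = {
    "book:1": {"type": "book", "author": "Jane Doe", "publisher": "Example Press", "year": "2010", "genre": "mystery"},
    "book:2": {"type": "book", "author": "Jane Doe", "publisher": "Sample House", "year": "2014", "genre": "mystery"},
    "book:3": {"type": "book", "author": "John Roe", "publisher": "Example Press", "year": "2010", "genre": "history"},
}

def build_index(entities):
    """Invert the store so each (attribute, value) pair points to the entities that have it."""
    index = defaultdict(set)
    for entity_id, attributes in entities.items():
        for attribute, value in attributes.items():
            index[(attribute, value.lower())].add(entity_id)
    return index

def find_entities(index, query):
    """Rank entities by how many of the queried attribute values they match."""
    scores = defaultdict(int)
    for attribute, value in query.items():
        for entity_id in index.get((attribute, value.lower()), ()):
            scores[entity_id] += 1
    # Highest score first; ties broken by entity id so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

index = build_index(ENTITIES)
result = find_entities(index, {"author": "Jane Doe", "year": "2010"})
print(result)
```

Only "book:1" matches both the author and the year, so it comes out on top; the other two books each match one attribute.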
Google was granted a patent recently that describes how those attributes could be searched within an attribute data store to find the entity. The patent shows how the process it describes might be used to answer some complex queries, as well as some interactive Answerbox-type queries. The issue that this patent addresses can be summed up in a single question:
Years ago, I started referring to search results as recommendations, seeing how they’ve been starting to look more and more like that part of a page at Amazon that says “people who viewed this book also looked at these books.”
When someone searches at a search engine, one of the things they look for in the results they receive is trustworthy pages (or recommendations) that look (and are) legitimate. How does a search engine deliver pages that are trustworthy?
One way to do that might be to boost pages in search results that the search engine considers more trustworthy, and Google developed its own version of Trust Rank for that purpose. The inventor of Google’s Trust Rank (which differs from the version that Yahoo invented) is Ramanathan Guha.
As part of the business analysis I do on an ongoing basis, I like to keep an eye out for acquisitions made by search engines, and to look at the patents that the acquired companies have filed.
When I heard about Google’s acquisition of Skybox, I jumped to the conclusion that low-orbiting satellites might be used, much like Google’s Project Loon, to spread internet access to a wider audience across the globe. Or they might be used to make Google Maps a lot better with high-resolution, frequently updated satellite images.
And then I looked at the patent filings assigned to Skybox Imaging, and quashed those assumptions, or put them off as secondary reasons why Google might have purchased the satellite company.
How much of an impact might high-resolution, very frequently updated satellite images have on business analysis?
Most of us searchers and site owners and search engine optimizers are familiar with Google’s Link Graph, and how Google uses the connections between websites to help in ranking pages on the Web. In part, Google looks at the relevance of the content of a page compared to a query that a searcher enters at the search engine.
In addition to “relevance”, Google also uses the patented method of PageRank, in which the quality and quantity of links pointed to a page are used as a proxy for the quality of the page being linked to. The higher the quality of a page (and the higher PageRank it possesses), the more PageRank it likely passes along.
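For readers who want to see the idea in motion, here is a minimal sketch of the classic textbook power-iteration form of PageRank, with a damping factor of 0.85 and dangling pages spreading their rank evenly. This is an assumed simplification of the published algorithm, not Google's current ranking code, and the four-page link graph is made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Classic power-iteration PageRank over a dict of page -> list of outlinks."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly across its outlinks.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outlinks): spread its rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: "c" is linked to by three pages, "d" by none.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this formulation, how much PageRank a page passes along depends both on its own score and on how many outlinks share it.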
The link graph is one example of how Google ranks, measures, and possibly sorts web pages. Another graph Google might look at is the attention graph: how Google might use topics and concepts that are searched for frequently to change the rankings of pages based on freshness and hot topics.
When Google indexes the Web, it has often been convenient to think of the search engine as running two different approaches in parallel. One of those involves crawling, indexing, and ranking pages on the Web (as well as images, videos, news, podcasts, and other documents).
The other approach looks less at pages and instead indexes objects it finds on the Web, or what we often refer to as named entities: specific people, places, or things, real or fictional. This second kind of crawling is often referred to as fact extraction, and we see the results of such extraction in Knowledge Panels and even in things like Google’s OneBox Question & Answer results.
When SEOs talk about Google and the programs it uses to crawl and index pages on the Web, we usually refer to those crawlers as robots or spiders or even Googlebot, and don’t differentiate these crawling programs much. We don’t mean the mechanical kind of robot here (which is a new twist from Google), but it’s probably time to start thinking of Googlebot differently.
There are things that we just don’t know about search engines, things that aren’t shared with us in an official blog post, in a comment from a search engine representative speaking at a conference, or in a publicly published white paper. Often we do learn some aspects of how search engines work through patents, but the timing of those is controlled more by the US Patent and Trademark Office than by the search engines themselves.
For example, back in 2003 Google was filing some of its first patents describing changes to how its ranking algorithms worked, and among those was one with a name similar to the original Stanford PageRank patents filed by Lawrence Page. It contains some hints about PageRank and Google’s link analysis that we haven’t officially seen before.
If you want a bit of a history lesson, you can see the first couple of those PageRank patents at Method for scoring documents in a linked database (US Patent 6,799,176) and Method for node ranking in a linked database (US Patent 6,285,999).