As part of the business analysis I do on an ongoing basis, I like to keep an eye out for acquisitions made by search engines, and to look at the patents that the acquired companies have filed.
When I heard about Google’s acquisition of Skybox, I jumped to the assumption that its low-orbiting satellites might be used, much like Google’s Project Loon, to spread internet access to a wider audience across the globe. Or they might be used to make Google Maps a lot better, with high-resolution, frequently updated satellite images.
Then I looked at the patent filings assigned to Skybox Imaging, which led me to drop those assumptions, or at least to treat them as secondary reasons why Google might have purchased the satellite company.
How much of an impact might high-resolution, very frequently updated satellite images have on business analysis?
Most of us searchers, site owners, and search engine optimizers are familiar with Google’s Link Graph, and how Google uses the connections between websites to help rank pages on the Web. Ranking is based in part on relevance: how well the content of a page matches the query that a searcher enters at the search engine.
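Google doesn’t publish how it scores relevance, but a classic information-retrieval measure, TF-IDF, gives a feel for what comparing page content to a query can mean. The sketch below is a generic textbook technique, not Google’s method; the whitespace tokenizer, the function names, and the smoothed IDF formula are assumptions chosen for simplicity.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; real systems tokenize far more carefully.
    return text.lower().split()


def tf_idf_scores(query, docs):
    """Score each document against the query with a basic TF-IDF sum.

    A generic illustration of query/page relevance, not Google's formula.
    """
    tokenized = [tokenize(d) for d in docs]
    query_terms = tokenize(query)
    n_docs = len(docs)

    # Document frequency: how many documents contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term in tf:
                # Smoothed IDF (an assumption): rarer terms weigh more.
                idf = math.log((1 + n_docs) / (1 + df[term])) + 1
                # Term frequency normalized by document length.
                score += (tf[term] / len(tokens)) * idf
        scores.append(score)
    return scores
```

A document that never mentions the query terms scores zero, and a shorter document that mentions them scores higher than a longer one with the same mentions, because term frequency is normalized by length.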
In addition to relevance, Google also uses PageRank, a patented method in which the quality and quantity of links pointing to a page are used as a proxy for the quality of that page. The higher the quality of a page (and the more PageRank it has), the more PageRank it is likely to pass along through its own links.
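To make the idea of PageRank being passed along more concrete, here is a minimal sketch of the textbook power-iteration formulation from the original Stanford work, using the commonly cited damping factor of 0.85. Spreading the rank of pages with no outlinks evenly across all pages is one common convention, and the function name, dictionary-of-lists graph format, and fixed iteration count are illustrative choices; this is not Google’s production implementation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank scores for a small link graph by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    # Collect every page, including ones that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the baseline "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}

        # Rank held by pages with no outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        for p in pages:
            new_rank[p] += damping * dangling / n

        # Each page splits its current rank among the pages it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank
```

Because each page’s share is its own rank divided by its number of outlinks, a page with more PageRank passes more along, which matches the description above. Scores always sum to 1, and a page with no inbound links ends up with only the baseline share.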
The link graph is one example of how Google might measure, rank, and possibly sort web pages. Another is the attention graph: Google might use topics and concepts that are searched for frequently to adjust the rankings of pages based on freshness and hot topics.
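Nothing here specifies how an attention graph would actually work, so the following is a purely hypothetical illustration: it flags topics whose recent query volume jumps well above a baseline, then gives fresher pages on those topics a small ranking boost. The function names, the `spike_ratio`, `min_count`, and `freshness_weight` parameters, and the decay formula are all invented for this example.

```python
def find_hot_topics(recent_counts, baseline_counts, spike_ratio=3.0, min_count=10):
    """Hypothetical: return topics whose recent query volume spikes above baseline."""
    hot = set()
    for topic, recent in recent_counts.items():
        baseline = baseline_counts.get(topic, 0)
        # Treat a missing or zero baseline as 1 so the ratio test stays well defined.
        if recent >= min_count and recent >= spike_ratio * max(baseline, 1):
            hot.add(topic)
    return hot


def rerank_for_freshness(results, hot_topics, freshness_weight=0.5):
    """Hypothetical: re-sort (url, topic, base_score, age_days) tuples.

    Pages on hot topics get a boost that shrinks as the page gets older.
    """
    def adjusted_score(result):
        _url, topic, base_score, age_days = result
        if topic in hot_topics:
            return base_score + freshness_weight / (1.0 + age_days)
        return base_score

    return sorted(results, key=adjusted_score, reverse=True)
```

In this sketch, a brand-new page on a spiking topic can overtake an older page with a slightly higher base score, while results on topics that aren’t spiking keep their original scores.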
When Google indexes the Web, it’s often been convenient to think of the search engine as running two different approaches in parallel. One of those involves crawling, indexing, and ranking pages on the Web (along with images, videos, news, podcasts, and other documents).
The other approach looks less at pages and more at the objects it finds on the Web, or what we often refer to as named entities: specific people, places, or things, real or fictional. This second kind of crawling is often referred to as fact extraction, and we see its results in Knowledge Panels, or even in things like Google’s OneBox Question & Answer results.
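As a rough illustration of what fact extraction produces, this sketch pulls (attribute, value) pairs about named entities out of plain text using a couple of hand-written surface patterns. Google’s actual extraction methods aren’t described here; the `PATTERNS` list, the attribute names `born_in` and `capital_of`, and the naive sentence splitting are hypothetical simplifications.

```python
import re
from collections import defaultdict

# Hypothetical surface patterns, each mapped to an attribute name.
PATTERNS = [
    (re.compile(r"(?P<subject>[A-Z][\w ]*?) was born in (?P<value>[A-Z][\w ]*)"), "born_in"),
    (re.compile(r"(?P<subject>[A-Z][\w ]*?) is the capital of (?P<value>[A-Z][\w ]*)"), "capital_of"),
]


def extract_facts(text):
    """Return {entity: {(attribute, value), ...}} for sentences matching a pattern."""
    facts = defaultdict(set)
    # Naive sentence split on periods; abbreviations like "St." would break it.
    for sentence in re.split(r"\.\s*", text):
        sentence = sentence.strip()
        for pattern, attribute in PATTERNS:
            match = pattern.fullmatch(sentence)
            if match:
                facts[match.group("subject")].add((attribute, match.group("value")))
    return dict(facts)
```

Sentences that don’t match any pattern are simply ignored, which is why real fact extraction relies on much richer parsing and on cross-checking facts across many sources.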
When SEOs talk about the programs Google uses to crawl and index pages on the Web, we usually call them robots, spiders, or simply Googlebot, and don’t differentiate between these crawling programs much. It’s probably time to start thinking about Googlebot differently.
There are things that we just don’t know about search engines: things that aren’t shared with us in an official blog post, in a comment from a search engine representative speaking at a conference, or in a publicly published white paper. We do often learn about some aspects of how search engines work through patents, but the timing of those is controlled more by the US Patent and Trademark Office than by the search engines themselves.
For example, back in 2003 Google was filing some of its first patents describing changes to how its ranking algorithms worked, and among those was one with a name similar to the original Stanford PageRank patents filed by Lawrence Page. That patent contains some hints about PageRank and Google’s link analysis that we haven’t officially seen before.
If you want a bit of a history lesson, you can see the first couple of those PageRank patents at Method for scoring documents in a linked database (US Patent 6,799,176) and Method for node ranking in a linked database (US Patent 6,285,999).
Does Google’s newly granted patent co-invented by Navneet Panda describe Google’s Panda Update?
Search Quality vs. Web Spam
Many of the patent filings that I’ve written about from Google address Web Spam issues, and how the search engine may take steps or follow approaches to keep its search results from being manipulated. An early example of Google tackling such issues is their patent filed in 2003 titled Methods and systems for identifying manipulated articles.
But many of the patents I’ve written about involve ways that Google is trying to improve the quality of search results that searchers see.
One of the most impactful updates at Google was the Panda Update, released in February 2011 and affecting almost 12% of all search results. In a Wired interview with Google’s Amit Singhal and Matt Cutts, TED 2011: The ‘Panda’ That Hates Farms: A Q&A With Google’s Top Search Engineers, it was revealed that the update was named after a Google engineer who played a significant role in its development:
Wired.com: What’s the code name of this update? Danny Sullivan of Search Engine Land has been calling it “Farmer” because its apparent target is content farms.
Amit Singhal: Well, we named it internally after an engineer, and his name is Panda. So internally we called a big Panda. He was one of the key guys. He basically came up with the breakthrough a few months back that made it possible.
In January of 2011, Google’s Matt Cutts published a blog post on the Official Google Blog, titled Google search and search engine spam, which told us:
One misconception that we’ve seen in the last few weeks is the idea that Google doesn’t take as strong action on spammy content in our index if those sites are serving Google ads. To be crystal clear:
- Google absolutely takes action on sites that violate our quality guidelines regardless of whether they have ads powered by Google;
- Displaying Google ads does not help a site’s rankings in Google; and
- Buying Google ads does not increase a site’s rankings in Google’s search results.
These principles have always applied, but it’s important to affirm they still hold true.
I’ve been seeing a few long posts lately that list ranking signals from Google, and they inspired me to start writing a series about ranking signals over on Google+. Chances are good that I will continue to work on the series there, especially since I’ve been getting some great feedback on it.
This post includes the first seven, plus an eighth signal: the Co-Occurrence Matrix described in Google’s Phrase-Based Indexing patents.
I’ve also tried to include links to papers and patents that support these signals, and that I think are among the most important for people interested in SEO.
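The Phrase-Based Indexing patents describe tracking how often pairs of "good phrases" appear near each other in documents, and comparing that actual co-occurrence rate with the rate you would expect by chance. The sketch below is loosely modeled on that idea rather than taken from the patents: it counts documents in which two phrases appear within a window of positions, divides the actual rate by the expected rate (the product of the two phrases’ individual document rates), and treats pairs above a threshold as related. The window size, the threshold of 1.5, the function names, and document-level counting are assumptions.

```python
from collections import Counter


def cooccurrence_matrix(docs, window=5):
    """Count, per document, phrases and phrase pairs appearing within `window` positions.

    Each document is a list of phrases already identified as good phrases.
    """
    doc_freq = Counter()   # documents containing each phrase
    pair_freq = Counter()  # documents where each pair co-occurs nearby
    for phrases in docs:
        doc_freq.update(set(phrases))
        pairs = set()
        for i, a in enumerate(phrases):
            for b in phrases[i + 1 : i + 1 + window]:
                if a != b:
                    # Sorted tuple so (a, b) and (b, a) share one matrix cell.
                    pairs.add(tuple(sorted((a, b))))
        pair_freq.update(pairs)
    return doc_freq, pair_freq


def related_pairs(docs, window=5, threshold=1.5):
    """Return {pair: actual/expected ratio} for pairs above the threshold."""
    n = len(docs)
    doc_freq, pair_freq = cooccurrence_matrix(docs, window)
    related = {}
    for (a, b), count in pair_freq.items():
        actual = count / n
        expected = (doc_freq[a] / n) * (doc_freq[b] / n)
        ratio = actual / expected
        if ratio > threshold:
            related[(a, b)] = ratio
    return related
```

Dividing by the expected rate keeps a very common phrase from looking related to everything simply because it appears almost everywhere.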
Here are the first 8 signals: