When Google indexes the Web, it has often been convenient to think of the search engine as running two different processes in parallel. One of those involves the crawling, indexing, and ranking of pages on the Web (along with images, videos, news, podcasts, and other documents).
The other approach doesn’t look at pages so much as it indexes the objects it finds on the Web, what we often refer to as named entities: specific people, places, or things, real or fictional. This second kind of crawling is often referred to as fact extraction, and we see its results in Knowledge Panel results and even in features like Google’s OneBox Question & Answer results.
When SEOs talk about the programs Google uses to crawl and index pages on the Web, we usually call those crawlers robots, spiders, or simply Googlebot, and don’t differentiate among them much. It’s probably time to start thinking of Googlebot differently.
There are things that we just don’t know about search engines; things that aren’t shared with us in an official blog post, in a comment from a search engine representative at a conference, or through a publicly published white paper. We often do learn some aspects of how search engines work through patents, but the timing of those disclosures is controlled more by the US Patent and Trademark Office than by the search engines themselves.
For example, back in 2003 Google was filing some of its first patents describing changes to how its ranking algorithms worked. Among those was one with a name similar to the original Stanford PageRank patents filed by Lawrence Page, and it contains hints about PageRank and Google’s link analysis that we hadn’t officially seen before.
If you want a bit of a history lesson you can see the first couple of those PageRank patents at Method for scoring documents in a linked database (US Patent 6,799,176) and Method for node ranking in a linked database (US Patent 6,285,999).
Does Google’s newly granted patent co-invented by Navneet Panda describe Google’s Panda Update?
Search Quality vs. Web Spam
Many of the patent filings that I’ve written about from Google address Web Spam issues, and how the search engine may take steps or follow approaches to keep its search results from being manipulated. An early example of Google tackling such issues is their patent filed in 2003 titled Methods and systems for identifying manipulated articles.
But many of the patents I’ve written about involve ways that Google is trying to improve the quality of search results that searchers see.
One of the most impactful updates at Google was the Panda Update, released into the world in February of 2011 and affecting almost 12% of all search results. In a Wired interview with Google’s Amit Singhal and Matt Cutts, TED 2011: The ‘Panda’ That Hates Farms: A Q&A With Google’s Top Search Engineers, the update was revealed to be named after a Google engineer who played a significant role in its development:
Wired.com: What’s the code name of this update? Danny Sullivan of Search Engine Land has been calling it “Farmer” because its apparent target is content farms.
Amit Singhal: Well, we named it internally after an engineer, and his name is Panda. So internally we called it big Panda. He was one of the key guys. He basically came up with the breakthrough a few months back that made it possible.
In January of 2011, Google’s Matt Cutts published a blog post on the Official Google Blog, titled Google search and search engine spam, which told us:
One misconception that we’ve seen in the last few weeks is the idea that Google doesn’t take as strong action on spammy content in our index if those sites are serving Google ads. To be crystal clear:
- Google absolutely takes action on sites that violate our quality guidelines regardless of whether they have ads powered by Google;
- Displaying Google ads does not help a site’s rankings in Google; and
- Buying Google ads does not increase a site’s rankings in Google’s search results.
These principles have always applied, but it’s important to affirm they still hold true.
I’ve been seeing a few long posts lately that list ranking signals from Google, and they inspired me to start writing a series about ranking signals over on Google+. Chances are good that I will continue to work on the series there, especially since I’ve been getting some great feedback on them.
This post includes the first seven, plus an eighth signal – the Co-Occurrence Matrix described in Google’s Phrase-Based Indexing patents.
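The phrase-based indexing patents describe tracking which phrases tend to occur together across documents. As a rough sketch of that idea (the function name and the simple document-level counting are my own illustration, not the patents’ actual method), a phrase co-occurrence matrix could be built like this:

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_matrix(documents):
    """Count how often each pair of phrases appears in the same document.

    documents: a list of documents, each a list of phrases.
    Returns a dict mapping (phrase_a, phrase_b) -> number of documents
    containing both. Pairs are stored in sorted order so lookups are
    symmetric.
    """
    counts = defaultdict(int)
    for doc in documents:
        # Use a set so a phrase repeated within one document counts once.
        phrases = set(doc)
        for a, b in combinations(sorted(phrases), 2):
            counts[(a, b)] += 1
    return dict(counts)

docs = [
    ["search engine", "ranking", "pagerank"],
    ["search engine", "pagerank"],
    ["ranking", "content farm"],
]
matrix = cooccurrence_matrix(docs)
print(matrix[("pagerank", "search engine")])  # 2 documents contain both
```

A real system would weigh these counts against how often each phrase appears on its own, so that expected co-occurrences can be distinguished from surprisingly frequent ones.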
I’m also trying to include links to some of the papers and patents that support the signals I’ve included, ones that I think are among the most important for people interested in SEO.
Here are the first 8 signals:
Will Google be transforming the way that we order from restaurants and other merchants such as pharmacists? A patent application published by Google this past week points to the possibility.
Google has been experimenting with showing menus from restaurants in its search results recently, and added them as reported in Search Engine Land on Friday – Now Official: Google Adds Restaurant Menus To Search Results.
The article seems filled with more questions than answers, such as where Google is getting the menu information, and even why it is publishing that information at all. I suspect that a lot of restaurants will be begging Google for ways to submit their latest menus in the near future.
Knowing what the menu looks like at a restaurant might make the difference between dining there and driving past. For example, if I didn’t know better from word of mouth, I wouldn’t begin to suspect that the Inn at Little Washington, in the middle of nowhere in rural Virginia, might be one of the best restaurants in the United States. Here’s part of their menu:
When I’m looking for something at a search engine, I often start with a particular query and then, depending upon the kinds of results I see, change the query terms I use. It appears that Google has been paying attention to this kind of search behavior. A patent granted to Google earlier this month watches the queries a searcher performs during a search session, and may give more weight to the words and phrases used earlier in that session, while giving less weight to terms added as the session continues.
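The patent doesn’t publish a formula, but the core idea of weighting terms by how early they appear in a session can be sketched simply. In this illustration the geometric decay factor and the function name are my own assumptions, not anything from the patent:

```python
def session_term_weights(session_queries, decay=0.7):
    """Assign each term a weight based on the earliest query it appeared in.

    Terms from the first query in a session get weight 1.0; terms first
    introduced in later queries get geometrically smaller weights.
    The decay factor is an illustrative assumption, not from the patent.
    """
    weights = {}
    for position, query in enumerate(session_queries):
        for term in query.lower().split():
            # Keep the weight from the earliest query that used the term.
            weights.setdefault(term, decay ** position)
    return weights

session = [
    "restaurant menus",               # earliest terms weigh the most
    "restaurant menus virginia",
    "best restaurant virginia reviews",
]
print(session_term_weights(session))
# 'restaurant' and 'menus' get 1.0; 'virginia' gets 0.7; 'reviews' ~0.49
```

The effect is that the terms a searcher led with anchor the session’s topic, while later refinements nudge results rather than redefine them.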
This patent seems like part of an evolution of algorithms from Google that has brought us to their Hummingbird update.