I’ve written about Google Bombs in the past, and how a bio page featuring President George Bush ranked highly on a search for “Miserable Failure” as a result of a Google Bomb, in a post from 2011 titled How a Search Engine Might Fight Googlebombing.
Earlier today, Nemek Nowaczyk published a post, Google Bombing the Knowledge Graph: Who’s a Liar? He noticed that on a search for “liar” (in Polish), Poland’s Prime Minister Donald Tusk appears in the knowledge base results. Nemek sent me an email with a link to his post, and within seconds I was typing one of the better known English-language Google bomb phrases into a Google search, with a guess as to what I would see there.
OK, so the top knowledge base result on a search for “miserable failure” wasn’t George Bush. But a smiling George Bush was close enough to appear as a “see results about” disambiguation knowledge panel result.
The great thing about HTML is that it’s so flexible and offers so many ways to do things. The worst thing about HTML is that it’s so flexible and offers so many ways to do things. I’ve looked at a lot of websites and I still see people doing things new ways.
A common issue across many websites is that a single page can be found at more than one URL. A site owner might create those duplicate addresses for a number of reasons, and in a number of ways; the content management system being used can be a cause as well.
A patent application published by Google explores how the search engine might recognize when a URL found through a web crawl and another URL found through a feed, such as a product feed, both refer to the same page, even though the two URLs are structured differently.
This seems like a lot of work to me, and the patent filing has me shaking my head at the resources Google might spend figuring out duplicated content on a site, even if doing so could help the search engine better understand URLs and the products and other information associated with them.
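To make the idea concrete, here is a minimal sketch of how two differently structured URLs might be recognized as the same page: normalize each one and compare. This is my own illustration, not anything the patent spells out, and the list of tracking parameters is an assumption.

```python
# Illustrative sketch only: normalize a URL so that a crawl-discovered
# address and a feed-discovered address for the same page compare equal.
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

# Parameters that don't change page content (an assumed list).
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref"}

def normalize(url: str) -> str:
    """Lowercase the host, drop tracking parameters, sort the rest."""
    parts = urlsplit(url)
    query = sorted(
        (k, v) for k, v in parse_qsl(parts.query)
        if k not in TRACKING_PARAMS
    )
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path.rstrip("/") or "/",
        urlencode(query),
        "",  # drop fragments; they never reach the server
    ))

crawl_url = "http://Example.com/widgets/?color=red&utm_source=feed"
feed_url = "http://example.com/widgets?utm_source=crawl&color=red"
print(normalize(crawl_url) == normalize(feed_url))  # True
```

A real system would also need to confirm the match, for instance by comparing the fetched content, since two URLs that normalize alike could still serve different pages.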
Google was granted a patent this week that describes how web sites might be given quality ratings, based upon a model that looks at human ratings for a sample set of sites, and web site signals from those sites.
The patent tells us that the advantages of such an approach would be to:
Provide greater user satisfaction with search engines
Return sites whose quality rating exceeds a certain threshold
Rank sites appearing in search results based upon quality
Identify quality sites without having a human review each site first
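As a toy sketch of the idea as I read it: a model learns from a human-rated sample of sites, then predicts quality for sites no human has reviewed. The signal values and the simple nearest-centroid model below are my assumptions; the patent doesn't commit to a particular model.

```python
# Toy illustration: learn from human-rated sites, label unrated ones.
from math import dist

# (signal vector, human rating) -- the signals might be things like
# content measures or link measures; these numbers are invented.
rated_sample = [
    ((0.9, 0.8, 0.7), "high"),
    ((0.8, 0.9, 0.6), "high"),
    ((0.2, 0.1, 0.3), "low"),
    ((0.3, 0.2, 0.1), "low"),
]

def centroid(vectors):
    return tuple(sum(col) / len(col) for col in zip(*vectors))

high = centroid([v for v, r in rated_sample if r == "high"])
low = centroid([v for v, r in rated_sample if r == "low"])

def predicted_quality(signals):
    """Label an unrated site by its nearer human-rated centroid."""
    return "high" if dist(signals, high) < dist(signals, low) else "low"

print(predicted_quality((0.85, 0.7, 0.8)))  # high
```

The payoff in the patent's terms is the last bullet above: once the model is fit to the human-rated sample, every other site can be scored without a rater ever seeing it.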
This patent was originally filed in 2008, and the use of quality signals sounds similar to what Google has shared with us regarding the Panda Update. It’s more of a search quality “improvement” than a web spam penalty.
The patent uses blogs as a type of site it can be applied to within its claims and description sections. One of the inventors, Christopher C. Pennock, was a Senior Software Engineer on Google Blog Search, according to an early 2009 SMX session with him which discusses ranking signals in Blog Search.
On May 1st, Google’s Head of Webspam Matt Cutts published a video in his series of Google Webmaster Help videos, answering the question, “What’s the latest SEO misconception that you would like to put to rest?”
For some reason, Matt decided to focus upon patents, with a video about people possibly placing too much faith in what is uncovered in patents related to search engines. To a degree, I agree with his response, but a number of people reached out to me because they saw the video as aimed specifically at me, since I write about search-related patents so often. I felt that I had no choice but to respond. Here’s the video from Matt:
In an ideal world, your site architecture should be set up so that search engine crawlers can visit each page of your site at exactly one web address, and no more. You may be laughing, but when Google sends you the “I give up, your site has too many URLs” message in Google Webmaster Tools, you won’t be laughing. Seriously.
Keep Colors and Sizes Together
If you create multiple product pages where the only difference is offering the product in red or green or blue, or small or medium or large, you are creating too many pages. The same is true when you let “email a friend” pages get indexed, along with “Add to my wishlist,” “Compare Products,” and other pages that Google doesn’t want in its index either.
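One way to keep color and size variants from multiplying your URLs is to strip the variant parameters when deciding a product's canonical address. A small sketch, with hypothetical parameter names:

```python
# Collapse product-variant URLs to one canonical address by dropping
# purely presentational parameters (names here are hypothetical).
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

VARIANT_PARAMS = {"color", "size"}

def canonical_url(url: str) -> str:
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in VARIANT_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

# All three variants collapse to the same indexable page:
for u in [
    "https://shop.example/tee?id=42&color=red",
    "https://shop.example/tee?id=42&color=blue&size=m",
    "https://shop.example/tee?id=42",
]:
    print(canonical_url(u))  # https://shop.example/tee?id=42 each time
```

On the pages themselves, the usual way to announce that choice to crawlers is a rel="canonical" link element pointing at the chosen address.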
On September 8, 2011, Google filed a patent named “System and Method for Confirming Authorship of Documents,” (U.S. Provisional Application Ser. No. 61/532,511). This provisional patent expired on September 9, 2012 without being prosecuted. A day later, on September 10th, Google filed two new versions of the patent, using the same name for both of them. Google’s Othar Hansson’s name appears on both as lead inventor, and the description sections are substantially similar, with a couple of very small changes.
The claims sections of the two patents are different, however. The first patent application (US20130066970) describes a link based approach to claiming authorship of a site, or being a contributor to that site. The second patent application (US20130066971) describes an email based method of claiming authorship (or of being a contributor).
The approaches described in both patent filings appear to be substantially similar to the instructions that Google describes in their help pages starting at Author information in search results
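For the link-based flavor, Google’s published authorship instructions relied on a rel="author" link from a content page to an author profile. The snippet below just finds such links on a page; how the claim is then confirmed is the subject of the patent filings above, and this detection step is my own illustration.

```python
# Find rel="author" links on a page -- the markup Google's authorship
# help pages asked publishers to add.
from html.parser import HTMLParser

class AuthorLinkFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.author_links = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a" and "author" in (a.get("rel") or "").split():
            self.author_links.append(a.get("href"))

page = '<p>By <a rel="author" href="https://plus.google.com/12345">Jane</a></p>'
finder = AuthorLinkFinder()
finder.feed(page)
print(finder.author_links)  # ['https://plus.google.com/12345']
```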
In 2006, Google battled Yahoo! and Microsoft for an algorithm developed by an Israeli Ph.D. student in Australia. The algorithm had a semantic element to it, and advanced Google in an algorithm arms race between the search giants (one of which doesn’t even have a search engine of its own now). We’ve seen the technology described in terms of how it is displayed in search results, but not how it does what it does. Until now.
Google was awarded a patent this week that looks at the search results for specific queries, and the entities that appear within them, to produce query refinements. This invention is from Google, but its lead inventor was the subject of a bidding war between Google, Yahoo!, and Microsoft. In 2009, the breakthrough appeared publicly on Google in the form of the Orion technology.
The Orion approach involved both extended snippets for queries (three or more lines of descriptive snippet instead of two for some longer queries), and “more and better query refinements.” How this technology is displayed is described in a Google Official Blog post from March 24, 2009 titled Two new improvements to Google results pages.
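The patent, as summarized above, derives refinements from entities appearing in the results for a query. The toy version below simply counts candidate entities across result snippets and appends the most common ones to the original query; the tiny entity list is a stand-in for whatever entity recognition Google actually uses.

```python
# Toy refinement generator: count known entities across result snippets
# and suggest the most frequent as query refinements.
from collections import Counter

KNOWN_ENTITIES = {"guitar", "bass", "amplifier", "pickups"}  # hypothetical gazetteer

def refinements(query, snippets, top_n=2):
    counts = Counter(
        word
        for snippet in snippets
        for word in snippet.lower().split()
        if word in KNOWN_ENTITIES
    )
    return [f"{query} {entity}" for entity, _ in counts.most_common(top_n)]

snippets = [
    "Shop electric guitar and bass gear",
    "Guitar pickups and amplifier reviews",
    "Best beginner guitar brands",
]
print(refinements("fender", snippets))  # 'fender guitar' ranks first
```

The appeal of the approach is that the refinements come from what actually ranks for the query, so they track how searchers and documents use the terms rather than a precomputed taxonomy.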