A patent granted to Google today explores Web spam and the manipulation of documents and links on the Web. It describes how the rankings of pages may be influenced if they are identified as “manipulative.”
The identification of manipulative documents, how they might be grouped together, and how they could be treated by the search engine is described in some detail. That treatment might include removal of pages from the search index, reductions in rankings for pages, and possibly a change in how quality scores (PageRank) are calculated for links from manipulative pages.
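One way to picture the last of those treatments, a change in how link-based scores are calculated for links from manipulative pages, is a simplified PageRank-style iteration in which flagged pages pass on only a fraction of their score. This is a minimal sketch, not the method claimed in the patent: the function name, the `flagged` set, the `penalty` factor, and the toy link graph are all assumptions made for illustration.

```python
def discounted_pagerank(links, flagged=frozenset(), penalty=0.0,
                        damping=0.85, iterations=50):
    """Simplified PageRank where flagged pages pass on reduced score.

    links:   dict mapping each page to the list of pages it links to
    flagged: pages treated as manipulative (hypothetical input)
    penalty: fraction of normal link weight a flagged page may pass on
    Scores are not renormalized, so discounted score is simply lost.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if not targets:
                # Dangling page: spread its score evenly over all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
                continue
            weight = penalty if page in flagged else 1.0
            share = damping * rank[page] * weight / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph: two throwaway pages exist only to link to "target".
graph = {
    "a": ["b"],
    "b": ["a"],
    "spam1": ["target"],
    "spam2": ["target"],
    "target": ["a"],
}
baseline = discounted_pagerank(graph)
treated = discounted_pagerank(graph, flagged={"spam1", "spam2"})
print(baseline["target"] > treated["target"])
```

In this toy graph, the page propped up by the two flagged pages ends up with a lower score once their links are discounted, while the flagged pages themselves stay in the graph.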
The patent was filed almost 4 years ago, on December 10, 2003, and wasn’t granted until today.
A good number of papers and patent applications on Web spam have been published since then, and have explored more detailed approaches, but this patent is interesting in that it captures some aspects of how Google may have been detecting and fighting Web spam over the past few years (and may still be).
(Updated 11/26/2007 at 4:30pm to clarify the relationship between Google Website Optimizer and Google Analytics)
Google introduced a new tool in October of last year, the Website Optimizer, that enables website owners to test out different versions of pages on their website. Some new patent applications from Google focus upon testing and optimizing landing pages for conversions, using a tool that is very much like the Website Optimizer.
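To make the idea of testing page versions for conversions concrete, here is a minimal sketch of comparing two landing page variants with a standard two-proportion z-test. It is not a description of how the Website Optimizer or the patent filings evaluate tests. The function name and the visit and conversion counts are made up for illustration.

```python
import math


def compare_variants(conversions_a, visits_a, conversions_b, visits_b):
    """Compare conversion rates of two page variants (two-sided z-test).

    Returns (rate_a, rate_b, z, p_value). All inputs are hypothetical
    counts gathered while each variant was shown to visitors.
    """
    rate_a = conversions_a / visits_a
    rate_b = conversions_b / visits_b
    # Pooled rate under the assumption that both variants convert equally.
    pooled = (conversions_a + conversions_b) / (visits_a + visits_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / visits_a + 1 / visits_b))
    if std_error == 0:
        # No variation at all (for example, zero conversions everywhere).
        return rate_a, rate_b, 0.0, 1.0
    z = (rate_b - rate_a) / std_error
    # Two-sided p-value from the normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


# Made-up example: the original page versus a reworked headline.
rate_a, rate_b, z, p_value = compare_variants(50, 1000, 80, 1000)
print(f"A: {rate_a:.1%}  B: {rate_b:.1%}  z={z:.2f}  p={p_value:.4f}")
```

A small p-value suggests that the difference between the two variants is unlikely to be chance alone, though a real test would also need enough traffic and a sensible stopping rule.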
There’s a lot of supporting documentation from Google on how to use their Website Optimizer tool, including a detailed Quick Start Guide, and a couple of videos:
How might statistics created from user query logs be useful to search engines and to searchers?
A Google patent application published at the World Intellectual Property Organization, Systems and Methods for Generating Statistics from Search Engine Query Logs, explores how such statistics might be created.
The filing lists Olcan Sercinoglu, Artem Boytsov, and Jeffrey A. Dean as inventors, and was filed with WIPO on May 9, 2007. It was published on November 22, 2007, and appears to show the process behind Google Trends. But it provides much more information than that.
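As a rough illustration of the kind of statistic a query log can yield, the sketch below computes the share of each day's searches that match a given query, which is loosely the sort of normalized trend line that Google Trends displays. This is an assumption about the general idea, not the process described in the filing. The log format, the function name, and the sample entries are invented for the example.

```python
from collections import Counter


def query_share_by_day(log_entries, query):
    """Return {day: fraction of that day's queries matching `query`}.

    log_entries: iterable of (day, query_text) pairs, a hypothetical
    and heavily simplified stand-in for a real query log.
    """
    wanted = query.strip().lower()
    totals = Counter()   # all queries seen per day
    matches = Counter()  # queries equal to `wanted` per day
    for day, text in log_entries:
        totals[day] += 1
        if text.strip().lower() == wanted:
            matches[day] += 1
    return {day: matches[day] / totals[day] for day in sorted(totals)}


# Invented sample log entries.
sample_log = [
    ("2007-11-20", "flu symptoms"),
    ("2007-11-20", "weather"),
    ("2007-11-21", "Flu Symptoms"),
    ("2007-11-21", "flu symptoms"),
    ("2007-11-21", "news"),
    ("2007-11-21", "maps"),
]
shares = query_share_by_day(sample_log, "flu symptoms")
print(shares)
```

Dividing by each day's total keeps a busy day from looking like a spike in interest when overall search volume simply went up.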
A real-life example that expands upon how such statistics might be useful is a study that was conducted with the help of two of the inventors listed in the patent filing, Language Preferences on Websites and in Google Searches for Human Health and Food Information.
Where will many Google employees be five years from now? How many will be running their own technology companies, and pursuing their own projects? How many will be investing in other companies, and helping to drive innovation?
Georges Harik was one of Google’s first 10 employees, the Director of Googlettes and a Distinguished Engineer at Google. He’s been involved in financially backing a number of startups, and is involved in a project by Pagebites, Inc., which may be poised to bring some interesting twists to online communications with Imo.im.
Under Georges Harik’s watch, the Googlettes worked upon efforts involving Gmail, Google Talk, Google Video, Picasa, Orkut, Google Groups and Google Mobile. He was also a co-developer of the technology behind AdSense and the Google Search Appliance.
He worked upon the first product plan for the AdWords Online system. A number of the Google patent filings I’ve written about here have his name on them as inventor.
The evolution of language used to discuss a topic can be interesting. One of the words that seems to be increasingly tied to Google is “platform,” as in the Android mobile platform and the OpenSocial platform.
Mike Elgan wrote an interesting post earlier this month about a combination of these platforms in Making the Google Phone, OpenSocial connection. Mobile social networks seem like a great combination.
In working to create these applications, Google could make it easier for developers to put together applications that can be used across social networking sites, across different mobile devices, or both, as Mike Elgan points out.
A Google patent application from last week explores how the development and use of applications could become even more affordable to developers. The abstract from the patent filing tells us:
Google published three patent applications on Google Notebook this week, which describe the fundamentals of how the program works, and provide a hint at how notebooks may influence some search results.
The nice thing about Google Notebook is that it has the potential to be a helpful research tool, enabling you to quickly save and organize information that you find on the Web. Having said that, I’ve had it installed on my desktop for many months and rarely find myself using it.
Some of the newest features that aren’t covered in the patent filings include the ability to turn your notes into Google Documents, the mobile version of Notebooks, an integration with Google Maps and with the personalized home page, and the ability to add labels to notebooks.
If you’re interested in some of the finer details of how notebooks work, and the assumptions behind their creation, then you may find some value in looking over the patent applications that describe them:
A couple of months back, when I was traveling, I wrote a quick post about a new PageRank patent issued to Stanford University, and asked if anyone would be interested in trying to break it down to see if it had anything interesting in it. David Harry took a look in a post titled Tale of the two PageRank Patents.
David and I have been exchanging emails since then about some of the patents that we see, and one area that fascinates us both is patents that delve into a kind of behind-the-scenes personalization. He has written a couple of very thoughtful and interesting posts involving personalization at Google recently, which are definitely worth checking out:
If you could limit the results of a search at Google to a specific point of view, would you? Depends upon what I mean by point of view, doesn’t it? I’ll get to that below.
A Google patent granted this week shows a screen shot of an advanced search that could have been:
There are a number of interesting features in this advanced search that would enable searchers to filter or expand search results in response to their queries.
These would require a searcher to make some choices about which URLs are looked at (as “on-topic” or “off-topic”), or about categories or keywords, enabling them to add some or reject others.