This past June, Google presented a way for us to use HTML to indicate that we are the authors of blog posts, online articles, and other content on the Web. The details were introduced in Authorship markup and web search. I wrote more about it in Author Markup, Schema.org and Patents, Oh My!
One of the benefits of using Authorship Markup is that Google search results may show your Google Profile image, along with a link to that profile, to the right of pages on which you've used the markup to identify yourself as the author. It's possible that the Authorship markup might be the start of something bigger.
Sir Bedevere: What makes you think she’s a witch?
Peasant 3: Well, she turned me into a newt!
Sir Bedevere: A newt?
Peasant 3: [meekly after a long pause] … I got better.
Crowd: [shouts] Burn her anyway!
From the color-me-unsurprised department comes news from Time Magazine's Techland that 92% of Newt Gingrich's Twitter Followers Aren't Real. With this post, I'm not making a statement about the politician's politics or his character, or even issuing an indictment of social media itself. That's mainly because I think far too many people are guilty of the same thing: trying to use inflated social media stats to prove their social worth.
I discussed this with keynote marketing speaker David Dalka this morning, and he shared his thoughts in Twitter Gate – Buy More Twitter Followers Free Instantly – Business Marketing Strategy Implications?, digging into some of the business issues surrounding social media and the pursuit of followers on social networks:
It makes one wonder where all these non-real followers are coming from and more than a few CEOs are likely reading this article and asking the question, “Is all this investment in social media justified and an activity that will grow my business and improve the bottom line or are there wiser investments to be made?”
In the Google paper, Predicting Bounce Rates in Sponsored Search Advertisements (pdf), we're told about an experiment at Google where researchers used a document classification model on sponsored advertisements and landing pages to try to predict how many of the people who click on an advertisement in Google's search results might leave the landing page very quickly. The experiment in that paper is also described in another Google paper, PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce (pdf), which tells us how Google might be able to take an extremely large amount of observational data and use it to create classifications that, amongst other things, could potentially be used to help rank pages in organic search, as we've been told Google's Panda updates do.
A patent from Google was granted today that appears to use a similar approach to determine whether sponsored advertisements in Google might lead to malware. The patent describes malware as malicious software that might be deceptively or automatically installed on a visitor’s computer when they arrive at a page. In addition to trojan horses and viruses, this can include monitoring software. In some instances a landing page may be the first in a series of one or more redirections, which can include malware on the page or pages being redirected to. The need for such a classification approach comes about because of the sheer volume of advertisements that Google shows.
We know that Google’s Panda updates look for features on websites that indicate “quality” in some manner. Under the document classification approach in this patent, “intrusion features” are tested and weighted on landing pages.
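To make the idea of weighted intrusion features a little more concrete, here's a minimal sketch, assuming a simple linear model whose weighted sum is turned into a probability with a logistic function. The feature names, weights, bias, and threshold below are all hypothetical illustrations of mine; the patent doesn't hand us this feature set or these numbers, and it may well use a different kind of classifier.

```python
import math

# Hypothetical intrusion features and weights, for illustration only.
WEIGHTS = {
    "redirect_count": 0.8,      # length of the redirection chain from the landing page
    "offsite_iframes": 1.2,     # iframes pointing at other domains
    "obfuscated_scripts": 1.5,  # scripts that appear deliberately obfuscated
    "auto_download": 2.5,       # page starts a download without any user action
}
BIAS = -4.0  # hypothetical intercept: pages with no intrusion features look benign


def malware_probability(features):
    """Combine the weighted intrusion features into a 0..1 score via a logistic function."""
    z = BIAS + sum(weight * features.get(name, 0) for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def flag_landing_page(features, threshold=0.5):
    """Flag an ad's landing page when the estimated probability reaches the threshold."""
    return malware_probability(features) >= threshold
```

A landing page with no intrusion features scores close to zero, while one that redirects three times, loads an off-site iframe, runs obfuscated scripts, and triggers a download scores close to one. In a setup like this, the weights would be learned from labeled examples rather than set by hand.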
In my last couple of posts, I wrote about the acquisition of over 1,000 patents from IBM by Google. There are a number of reasons why a company might acquire a patent. In the case of the IBM patents, it’s likely that many of those will be used to protect Google from patent infringement litigation. It’s possible that some might be considered as launching points for the development of technology or processes that the company could use internally, or may offer to others outside of Google itself.
Some other recent patent acquisitions by Google include patents from Exbiblio, from Widevine, the phone patents from Myriad Group, more phone related patents from Verizon, and a number of memory chip related patents from Metaram, amongst others.
While many of those have the flavor of patents acquired either to help protect Google from patent litigation or to help them develop new technologies, a pair of patents recorded in the USPTO assignment database this Thursday, assigning the interests of Successes.com in those patents to Google, have a different taste to them. Successes.com is a company run by veteran broadcaster Jan Ziff (a correspondent for the BBC, the State Department, Voice of America, Associated Press, Mutual Radio, and National Public Radio's Morning Edition and All Things Considered) and her executive producer on the 20-year CBS international news show Sound*Bytes, Allan Davidson. The testimonials page from successes.com includes many from some fairly large tech companies, including Nextel, America Online, Red Hat, Zonelabs, and this one from Google:
Yesterday I noticed a very large number of new patents listed in the USPTO assignment records for Google from IBM, and made note of them in a post, Google Acquires Over 1,000 IBM Patents in July.
I didn't anticipate the interest that my post would stir up, though I probably should have, given what seems to be an increasing amount of patent infringement litigation directed at Google, with Apple taking on HTC and Google, Oracle and Google disputing the use of Java in Android, Purple Leaf taking exception to Checkout, and other suits.
Given the interest in the IBM patents in a number of places on the web and some conversations I had, I thought it might be a good idea to provide the list of patents that Google acquired earlier this month. Google acquired a number of additional patents from IBM earlier this year and last year as well. I included those in my February post, Google Patents, Updated and Google Self Driving Cars Get Jumpstart from IBM Patents.
In yesterday's post, I mentioned that these newly acquired patents cover a wide range of topics, and I've had little chance to go through most of them. Some appear to be very broad, while others are much narrower. Google might find a number of them useful in covering activities they are engaged in presently, such as the manufacture of a very large number of servers. Some cover industries that Google might not venture into, such as chip fabrication. Many of them might help limit litigation aimed at Google.
Google was recently involved in a bidding war with Apple, Microsoft, and others over more than 6,000 patent filings from Nortel. It was a war that the search giant lost when a group composed of Apple, Microsoft, Research in Motion, Ericsson, Sony, and EMC joined together to bid $4.5 billion in cash. Google oddly chose to bid using numbers based upon mathematical formulas and constants, with their final bid based upon pi: $3.14159 billion.
A post at the Official Google Blog, Patents and innovation, by Google's Senior Vice President and General Counsel Kent Walker in early April discussed patent reform and the need for a company to defend itself by having a formidable patent portfolio. Google's decision to pursue the Nortel patents was based in part upon creating a "disincentive for others to sue Google."
While Google might not have been successful in the auction for Nortel’s intellectual property, they haven’t been standing pat. On July 11th and 12th, Google recorded the assignment of 1,030 granted patents from IBM covering a range of topics, from the fabrication and architecture of memory and microprocessing chips, to other areas of computer architecture including servers and routers as well. A number of the patents also cover relational databases, object oriented programming, and a wide array of business processes.
Search engine optimization grows and changes much as the Web itself does. With the recent addition of Google Plus to the services that Google offers, and this year's introduction of the Big Panda updates, one of the growing areas of SEO involves seeing how Google and other search engines might incorporate more user information into how they rank webpages. The introduction of Google Plus has highlighted the importance of looking at how the search engine collects information about how people search, how they browse the Web, what they publish online, and how they interact with others in social networks, as well as what the search engine might do with that information.
With the Panda updates, we've seen Google introducing a way of modeling information in large scale data sets, like the Web, to try to identify and predict features of webpages that can be used to rank pages not only on the basis of relevance and popularity (based upon the links pointing to those pages), but also upon a range of other features such as credibility, trust, originality, range of coverage of a topic, usability, and more.
I’ve been looking back at some of the patents that Google published, and ran into a couple that really weren’t discussed much when they were originally published, and probably should be talked about a little more.
Historically, search engines have ranked web pages in search results based upon a combination of an information retrieval (IR) score, which matches the terms in a query to the terms in a document, and a link-based score, which calculates the quality and quantity of links pointing to a page using a method like PageRank.
A new patent filing from Google explains some shortcomings of these approaches, and describes how a score based upon usage data for a document might be used either in combination with those approaches or in place of them. The patent tells us that term-based methods can be biased towards pages whose content or display has been manipulated to focus upon those terms. We're also told that link-based approaches are limited because relatively new pages usually have fewer links pointing to them than older pages, so they often have a lower link-based score.
Instead, pages that are returned as responsive to a particular query might be assigned a score based upon usage information, and then ranked on that score alone or on that score in combination with IR and link-based scores.
The patent application includes examples of two types of usage data, the frequency of visits to a page or site and the number of unique visitors to a page or site, but it tells us that other usage data might be included as well.
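Here's a minimal sketch of how a usage score built from those two signals might be blended with IR and link-based scores, assuming a simple weighted sum. The `Page` fields, the logarithmic dampening, and the default weights are my own hypothetical choices, not details from the patent application, which doesn't spell out a formula here.

```python
import math
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    ir_score: float       # term-matching score between the query and the page
    link_score: float     # PageRank-like score from links pointing at the page
    visits: int           # hypothetical: visit frequency over some time window
    unique_visitors: int  # hypothetical: distinct visitors over that window


def usage_score(page):
    # log1p dampens very large counts so heavy traffic doesn't swamp every other signal
    return math.log1p(page.visits) + math.log1p(page.unique_visitors)


def combined_score(page, w_ir=1.0, w_link=1.0, w_usage=1.0):
    # Setting w_ir and w_link to 0 ranks on usage alone, as the patent says is possible
    return w_ir * page.ir_score + w_link * page.link_score + w_usage * usage_score(page)


def rank(pages, **weights):
    return sorted(pages, key=lambda p: combined_score(p, **weights), reverse=True)
```

The example shows the patent's point about newer pages: a recent page with few links but a lot of visitors can outrank an older, better-linked page once usage data is counted, while dropping the usage weight to zero reverses that order.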