“All mushrooms are edible; but some only once.” ~ Croatian proverb
Google was granted a patent today that could be used to collect a seed set of data about features associated with different types of mushrooms, to “determine whether a specimen is poisonous based on predetermined features of the specimen.” The patent also describes how that process could help filter email spam based on the features found within an email, determine whether images on a page are advertisements, or assign categories to pages on the Web on the basis of textual features within those pages. The image below, from the patent, shows how features of a picture, such as its height, width, placement on a page, and caption, might be examined when determining whether or not it is an advertisement:
This machine-learning approach can be trained on data with known outcomes, and what it learns could then be applied to very large data sets to classify data according to patterns identified within the seed set. When Google published Finding more high quality sites in search in February of 2011, they introduced what would become known as the Big Panda update. Google’s Matt Cutts and Amit Singhal elaborated on the approach about a week later in an interview with Wired Magazine, TED 2011: The ‘Panda’ That Hates Farms: A Q&A With Google’s Top Search Engineers.
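The seed-set approach described in the patent (learn from specimens whose outcome is already known, then label new specimens by their features) can be illustrated with a small sketch. The code below uses a categorical naive Bayes classifier, a technique swapped in here purely for illustration; nothing above says the patent uses naive Bayes, and the feature names, values, and seed data are all invented.

```python
import math
from collections import Counter, defaultdict

# Hypothetical seed set: categorical features plus a known outcome.
# Feature names, values, and labels are invented for illustration.
SEED_SET = [
    ({"cap_color": "white", "odor": "none", "ring": "one"}, "edible"),
    ({"cap_color": "brown", "odor": "almond", "ring": "one"}, "edible"),
    ({"cap_color": "white", "odor": "foul", "ring": "none"}, "poisonous"),
    ({"cap_color": "red", "odor": "foul", "ring": "one"}, "poisonous"),
    ({"cap_color": "brown", "odor": "none", "ring": "one"}, "edible"),
]


def train(seed_set):
    """Count label frequencies and per-label feature-value frequencies."""
    label_counts = Counter()
    value_counts = defaultdict(Counter)  # (label, feature) -> value counts
    vocab = defaultdict(set)  # feature -> values seen during training
    for features, label in seed_set:
        label_counts[label] += 1
        for name, value in features.items():
            value_counts[(label, name)][value] += 1
            vocab[name].add(value)
    return label_counts, value_counts, vocab


def classify(model, features):
    """Return the label with the highest naive Bayes log-score."""
    label_counts, value_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior for this label
        for name, value in features.items():
            seen = value_counts[(label, name)]
            # Laplace smoothing; the extra +1 in k leaves room for one
            # value never seen in training, so no probability hits zero.
            k = len(vocab[name]) + 1
            score += math.log((seen[value] + 1) / (count + k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(SEED_SET)
print(classify(model, {"cap_color": "white", "odor": "foul", "ring": "one"}))
# → poisonous
```

The same train-then-classify shape would carry over to the patent’s other examples (email spam, advertisement images, page categories) by swapping in different features and labels.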
As a recent post on Google’s Inside Search blog noted, the Web doesn’t just contain strings of text; it also holds a great amount of information about things. The post introduced search results that contain much more information about the things people search for, with textual summaries and links to related topics in Google’s sidebar when appropriate. If you create Web pages, perform keyword research, or even just search the Web, this presents some new challenges and some new opportunities.
A news story at Fast Company in 2010 carried the interesting title, Bing to Lap Google in Making Search an App? The article describes Microsoft’s efforts to determine when it might be appropriate to show more than just links to web pages, images, or news stories for certain searches. The “instant answers” displayed in Bing’s search results aren’t the kind of informational results that Google is beginning to display alongside its search results; they are more akin to the OneBox results that Google has been displaying for a few years.
Bing, Entities, and Knowledge Bases
Google’s Project Glass seems to be moving closer and closer to reality, with the granting of 7 more patents today. Last week, I pointed out 4 patents related to the project in Google Glasses Design Patents and Other Wearables. Of those, 3 were design patents filed to protect the look and feel of the glasses, and the fourth described a way of using an infrared (IR) reflective surface on rings, gloves, or even fingernails to provide input to the eyeglass display device. The patents granted today include only 1 design patent, along with 6 patents that describe some of the more technical details of how Google’s heads-up display might work.
The first patent is a design patent from inventors who worked on the three design patents granted last week, Matthew Wyatt Martin and Maj Isabelle Olsson (Mitchell Joseph Heinrich was a co-inventor of one of the earlier three).
In a Google Inside Search blog post, Introducing the Knowledge Graph: Things, not strings, we’re told about a new initiative from Google to show more information within search results themselves about the things we search for. This is a potentially paradigm-shifting view of what a search engine does. The post tells us:
The Knowledge Graph enables you to search for things, people or places that Google knows about—landmarks, celebrities, cities, sports teams, buildings, geographical features, movies, celestial objects, works of art and more—and instantly get information that’s relevant to your query. This is a critical first step towards building the next generation of search, which taps into the collective intelligence of the web and understands the world a bit more like people do.
It’s not a surprise that Google has been working towards reinventing itself and what it does. With an increased emphasis on social and real-time search results, Google has been transforming itself into a near-real-time monitor of activities and events in the world, rather than just a repository of links to web pages that might satisfy situational and informational needs.
Google was granted three different design patents for augmented reality glasses today, each showing a slightly different look. The first includes lenses, while the second and third show variations of the glasses without lenses. A fourth granted patent describes how augmented reality glasses could receive input by tracking IR-reflective painted markers on fingernails, gloves, or other wearable items.
The first design patent, Wearable Display Device (US Patent D659,741), shows the following pair of glasses. As a design patent, its purpose is to protect the look and feel of the invention, without providing details of how it might work.
Author ranking in social media is more than just a popularity contest. It can include signals such as how frequently an author surfaces content that later becomes popular, the author’s topical authority on different subjects, and measures of popularity and influence.
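To make the three kinds of signals concrete, here is a minimal sketch that blends them into one per-topic score with a weighted sum. The weighted sum, the weights, the field names, and the way each signal is computed are all hypothetical simplifications for illustration, not the method described in the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Author:
    # All fields are hypothetical stand-ins for the signals named above.
    name: str
    posts_shared: int  # items the author surfaced
    posts_later_popular: int  # of those, how many later became popular
    topic_posts: dict = field(default_factory=dict)  # topic -> post count
    followers: int = 0


# Hypothetical weights; a real system would tune or learn these.
WEIGHTS = {"early_discovery": 0.5, "topical": 0.3, "popularity": 0.2}


def author_score(author, topic, max_followers):
    """Blend discovery, topical, and popularity signals into a 0..1 score."""
    # Share of surfaced items that subsequently became popular.
    early = (author.posts_later_popular / author.posts_shared
             if author.posts_shared else 0.0)
    # Fraction of the author's output that is about this topic.
    total = sum(author.topic_posts.values())
    topical = author.topic_posts.get(topic, 0) / total if total else 0.0
    # Followers relative to the most-followed author in the pool.
    popularity = author.followers / max_followers if max_followers else 0.0
    return (WEIGHTS["early_discovery"] * early
            + WEIGHTS["topical"] * topical
            + WEIGHTS["popularity"] * popularity)


authors = [
    Author("curator", 40, 20, {"search": 30, "cooking": 10}, followers=500),
    Author("aggregator", 400, 8,
           {"search": 50, "cooking": 50, "sports": 100}, followers=2000),
]
max_f = max(a.followers for a in authors)
ranked = sorted(authors, key=lambda a: author_score(a, "search", max_f),
                reverse=True)
print([a.name for a in ranked])
# → ['curator', 'aggregator']
```

In this toy data, the focused curator (score 0.525) outranks the high-volume aggregator (0.285) despite having a quarter of the followers, which is the sense in which author ranking can be more than a popularity contest.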
Author Authority to Distinguish Signal From Noise?
Social media contains a lot of signal, and a lot of what might be considered noise. Social streams of real-time communication, such as tweets, status updates, and blog posts, contain information that can be invaluable on many different topics.
How does a search engine pick out which authors are actual authorities on different topics, and which are sharing, resending, and adding to authoritative content? How does it tell which authors are piggybacking off such content, and which authors just really aren’t authorities on any given topic?
Some authors aren’t even real people, but instead exist as spam and/or aggregator accounts, adding little or no value to other members of a social network.
Manipulative, repetitive anchor text, blog comments filled with spam, Google bombs, and obscene content could be the targets of a system described in a patent granted to Google today. The system gives arbiters (human and possibly automated) ways to disassociate some content found on the Web, such as web pages, from other content, such as links to that content.
In an Official Google Blog post, Another step to reward high-quality sites, Google’s Head of Webspam Matt Cutts wrote about an update to Google’s search results targeted at webspam that they’ve now started calling the Penguin update. The day after, I wrote about some patents and papers that describe the kinds of efforts Google has made in the past to try to curtail web spam in my post Google Praises SEO, Condemns Webspam, and Rolls Out an Algorithm Change.
The patent doesn’t describe in detail an algorithmic approach to identifying practices that might have been used to manipulate the rankings of pages in search results. Instead, it tells us about a content management system that people who identify content affected by such practices might use to disassociate certain content from webpages and other types of online content.
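As a rough illustration of what “disassociating” a link from a page could mean in practice, the sketch below keeps a set of (source, target) pairs that an arbiter has flagged and skips those pairs when tallying the anchor text pointing at a page. The link records, the flag set, and both helper functions are hypothetical; the patent’s actual content management system is not described here at this level of detail.

```python
from collections import Counter

# Hypothetical link records: (source_page, target_page, anchor_text).
LINKS = [
    ("blog-a.example/post", "target.example", "cheap widgets"),
    ("blog-b.example/comment", "target.example", "cheap widgets"),
    ("news.example/story", "target.example", "official site"),
]

# Hypothetical set of (source, target) pairs an arbiter has disassociated.
disassociated = set()


def disassociate(source, target):
    """Record an arbiter's decision that a link should no longer count."""
    disassociated.add((source, target))


def anchor_text_for(target, links):
    """Tally anchor text pointing at target, skipping disassociated links."""
    return Counter(
        anchor
        for source, dest, anchor in links
        if dest == target and (source, dest) not in disassociated
    )


print(anchor_text_for("target.example", LINKS))
# → Counter({'cheap widgets': 2, 'official site': 1})
disassociate("blog-a.example/post", "target.example")
disassociate("blog-b.example/comment", "target.example")
print(anchor_text_for("target.example", LINKS))
# → Counter({'official site': 1})
```

The pages themselves stay in place; only their association with the target is removed, so the repetitive anchor text no longer contributes anything to how the target page is described.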
Google published 8 patent applications at the USPTO today that describe key elements of Google Plus, along with a number of alternatives that may or may not become part of Google’s social network. These include 2 applications on how social connections can be sorted into different social circles, 4 filings about how content can be shared in the system, and 2 more pending patents on the differences between what the author of content on the social network might see and what people viewing that content who aren’t its author might see.
The patent filings are pretty detailed, and if you’ve spent some time using Google Plus, you’ll recognize a lot of the features being described within the patents, and see some that you might wish were included and others you may hope are never added.