Back in September of 2009, I wrote a blog post that I titled Google’s 10 Oddest Patents. The first of those that I included in that list was one named Instrument for medical purposes, I included it mostly because Google was a search company, and it felt odd that Google would have a patent on a medical process. That one used “ultrasonic sound to investigate the structural makeup of biological tissue in organs and vessels.”
Times have changed, and since that time, Google has restructured and put itself under a holding company structure with the name Alphabet running all elements of the company. A branch of the Company had evolved that was being referred to as “Google Life Sciences,” and it changed names recently as well, to Verily Life Sciences.
What role and what kind of impact might these new subsidiary have? I was wondering if Google would make changes to the patent assignments it had made along with the name changes, and I was surprised to see them do so, where they assigned 148 patents to Verily Life Sciences on two different days. It’s an interesting list, and I’ve provided it here. They may technically have ownership under other patents as well, but this list points to a number that could possibly become products that the company offers to the public, after any government approval that they may need to pursue.
In the post, the author (Chuck Rosenberg) tells us how they improve image searching at Google by labeling images with entities, rather than text strings. The entities they used are entities that you would find at a source such as Freebase. He tells us that they use Freebase Machine ID numbers for those labels:
As in ImageNet, the classes were not text strings, but are entities, in our case we use Freebase entities which form the basis of the Knowledge Graph used in Google search. An entity is a way to uniquely identify something in a language-independent way. In English when we encounter the word “jaguar”, it is hard to determine if it represents the animal or the car manufacturer. Entities assign a unique ID to each, removing that ambiguity, in this case “/m/0449p” for the former and “/m/012×34” for the latter.
Advertising on the Web is going through some changes because of how smart phones and tablets track visitors on a site, and how advertisements may broadcast high-frequency sounds that may act as audio watermarks that other devices can pick up upon. Imagine watching TV, and your TV broadcasts a high-frequency sound from an advertisement that your phone hears, and shares with the advertiser, who may then track whether you search for or purchase the product offered on a web site?
A couple of months ago, I wrote about a Google patent that involved rewriting queries, titled Investigating Google RankBrain and Query Term Substitutions. There’s likely a lot more to how Google’s RankBrain approach works, but I came across a patent that seems to be related to the patent I wrote about in that post, and thought it was worth sharing and starting a discussion about. The patent I wrote about in that post was Using concepts as contexts for query term substitutions. The title for this new patent was very similar to that one (Synonym identification based on categorical contexts), and the more recent patent was granted on December 1st of this year.
The new patent starts off describing a scenario that is a good example of how it works. The inventors tell us:
Googlebot Doesn’t Read Pictures of Text During Web Crawls
When I was an Administrator at Cre8asiteforums (2002-2007), one of my favorite forums on the site was one called the Website Hospital. People would come with their sites and questions about how they could improve them. One problem that often appeared was people having problems being found in search results for their sites for geographically related queries. One symptom for many sites experiencing that problem was that the only time the address of their business appeared on the site was in pictures of text, rather than actual text. This can be a problem when it comes to Google indexing that information. Google tells us they like text, and can have troubles indexing content found within images:
Google’s web crawler couldn’t read pictures of text, and Google wasn’t indexing that location information for their sites’ because of that. Site owners were often happy to find out that they just needed to include the address of their business in text, so that Google could crawl and index that information, and make it more likely that they could be found for their location.
Under this new patent, Google adds a diversified set of trusted pages to act as seed sites. When calculating rankings for pages. Google would calculate a distance from the seed pages to the pages being ranked. A use of a trusted set of seed sites may sound a little like the TrustRank approach developed by Stanford and Yahoo a few years ago as described in Combating Web Spam with TrustRank (pdf). I don’t know what role, if any, the Yahoo paper had on the development of the approach in this patent application, but there seems to be some similarities.