In October of 2015, a new algorithm was announced by members of the Google Brain team, described in this post from Search Engine Land – Meet RankBrain: The Artificial Intelligence That’s Now Processing Google Search Results. One of the Google Brain team members who gave Bloomberg News a long interview on RankBrain, Gregory S. Corrado, was a co-inventor, along with other members of the Google Brain team, on a patent that was granted this August.

In the SEM Post article, RankBrain: Everything We Know About Google’s AI Algorithm, we are told that RankBrain uses concepts from Geoffrey Hinton, involving Thought Vectors. The summary in the description from the patent tells us about how a word vector approach might be used in such a system:

Particular embodiments of the subject matter described in this specification can be implemented so as to realize one or more of the following advantages. Unknown words in sequences of words can be effectively predicted if the surrounding words are known. Words surrounding a known word in a sequence of words can be effectively predicted. Numerical representations of words in a vocabulary of words can be easily and effectively generated. The numerical representations can reveal semantic and syntactic similarities and relationships between the words that they represent. By using a word prediction system having a two-layer architecture and by parallelizing the training process, the word prediction system can be effectively trained on very large word corpuses, e.g., corpuses that contain on the order of 200 billion words, resulting in higher quality numeric representations than those that are obtained by training systems on relatively smaller word corpuses. Further, words can be represented in very high-dimensional spaces, e.g., spaces that have on the order of 1000 dimensions, resulting in higher quality representations than when words are represented in relatively lower-dimensional spaces. Additionally, the time required to train the word prediction system can be greatly reduced.

So, an incomplete or ambiguous query that contains some words could use those words to predict missing words that may be related. Those predicted words could then be used to return search results that the original words might have difficulties returning. The patent that describes this prediction process is:

Computing numeric representations of words in a high-dimensional space

Inventors: Tomas Mikolov, Kai Chen, Gregory S. Corrado and Jeffrey A. Dean

Assignee: Google Inc.

US Patent: 9,740,680

Granted: August 22, 2017

Filed: May 18, 2015

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for computing numeric representations of words. One of the methods includes obtaining a set of training data, wherein the set of training data comprises sequences of words; training a classifier and an embedding function on the set of training data, wherein training the embedding function comprises obtained trained values of the embedding function parameters; processing each word in the vocabulary using the embedding function in accordance with the trained values of the embedding function parameters to generate a respective numerical representation of each word in the vocabulary in the high-dimensional space; and associating each word in the vocabulary with the respective numeric representation of the word in the high-dimensional space.
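To make the abstract a little more concrete, here is a minimal sketch of the general idea: an embedding function and a classifier are trained together to predict a word from the words around it, and each vocabulary word is then associated with its trained vector. This is a toy illustration only, not the patent's implementation. The function name `train_embeddings`, the tiny corpus, the 8 dimensions, the window size, and the learning rate are all assumptions made up for this example; the patent talks about corpuses on the order of 200 billion words and spaces on the order of 1000 dimensions.

```python
import math
import random

def train_embeddings(sentences, dim=8, window=1, epochs=200, lr=0.1, seed=0):
    """Toy two-layer word predictor (hypothetical helper, not from the patent):
    average the context word vectors, then score every vocabulary word."""
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    size = len(vocab)
    # Embedding function parameters: one vector per vocabulary word.
    emb = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(size)]
    # Classifier parameters: one weight vector per candidate output word.
    out = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(size)]

    # Training data: (surrounding word ids, word to predict) pairs.
    pairs = []
    for s in sentences:
        ids = [index[w] for w in s]
        for pos, target in enumerate(ids):
            ctx = ids[max(0, pos - window):pos] + ids[pos + 1:pos + 1 + window]
            if ctx:
                pairs.append((ctx, target))

    losses = []
    for _ in range(epochs):
        total = 0.0
        for ctx, target in pairs:
            # Layer 1: average the embeddings of the surrounding words.
            h = [sum(emb[c][d] for c in ctx) / len(ctx) for d in range(dim)]
            # Layer 2: softmax classifier over the whole vocabulary.
            scores = [sum(out[j][d] * h[d] for d in range(dim)) for j in range(size)]
            top = max(scores)
            exps = [math.exp(x - top) for x in scores]
            z = sum(exps)
            probs = [e / z for e in exps]
            total += -math.log(probs[target])
            # Cross-entropy gradient with respect to the scores.
            grad = [p - (1.0 if j == target else 0.0) for j, p in enumerate(probs)]
            # Gradient for the averaged vector, computed before updating `out`.
            grad_h = [sum(grad[j] * out[j][d] for j in range(size)) for d in range(dim)]
            for j in range(size):
                for d in range(dim):
                    out[j][d] -= lr * grad[j] * h[d]
            for c in ctx:
                for d in range(dim):
                    emb[c][d] -= lr * grad_h[d] / len(ctx)
        losses.append(total / len(pairs))

    # Associate each vocabulary word with its trained numeric representation.
    vectors = {w: emb[index[w]] for w in vocab}
    return vectors, losses

# Made-up training sentences, purely for illustration.
sentences = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat chased a dog".split(),
]
vectors, losses = train_embeddings(sentences)
```

After training, `vectors` maps every word in the toy vocabulary to a list of 8 numbers, and `losses` holds the average prediction loss for each pass over the data. On a corpus this small the vectors are not meaningful; the point is only to show the shape of the process the abstract describes.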

One of the things that I found really interesting about this patent was that it includes a number of citations from the applicants for the patent. They looked worth reading, and many of them were co-authored by inventors of this patent, by people who are well-known in the field of artificial intelligence, or by people from Google. When I saw them, I started hunting for locations on the Web for them, and I was able to find copies of them. I will be reading through them and thought it would be helpful to share those links, which was the idea behind this post. It may be helpful to read as many of these as possible before tackling this patent. If anything in them stands out to you, let us know what you’ve found interesting.

