Quality Scores for Queries: Structured Data, Synthetic Queries and Augmentation Queries

Augmentation Queries

In general, the subject matter of this specification relates to identifying or generating augmentation queries, storing the augmentation queries, and identifying stored augmentation queries for use in augmenting user searches. An augmentation query can be a query that performs well in locating desirable documents identified in the search results. The performance of an augmentation query can be determined by user interactions. For example, if many users that enter the same query often select one or more of the search results relevant to the query, that query may be designated an augmentation query.
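The patent passage does not say how user interactions are turned into a decision, so here is only a minimal sketch of one way it could work, assuming a simple click log. The function name `find_augmentation_queries`, the `(query, clicked)` log format, and both thresholds are my own assumptions for illustration, not anything taken from the patent.

```python
from collections import defaultdict


def find_augmentation_queries(log, min_impressions=3, min_click_rate=0.5):
    """Return queries whose results were selected often enough.

    `log` is an iterable of (query, clicked) pairs, where `clicked` is True
    when the searcher selected at least one result for that query.
    Both thresholds are illustrative assumptions.
    """
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for query, clicked in log:
        impressions[query] += 1
        if clicked:
            clicks[query] += 1
    # Keep queries seen often enough and clicked at a high enough rate.
    return sorted(
        query
        for query, seen in impressions.items()
        if seen >= min_impressions and clicks[query] / seen >= min_click_rate
    )


sample_log = [
    ("yankee stadium tickets", True),
    ("yankee stadium tickets", True),
    ("yankee stadium tickets", False),
    ("stadium", False),
    ("stadium", False),
    ("stadium", True),
]
print(find_augmentation_queries(sample_log))
```

In this toy log, the more specific query earns clicks two times out of three and clears the threshold, while the vague one does not.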

In addition to actual queries submitted by users, augmentation queries can also include synthetic queries that are machine generated. For example, an augmentation query can be identified by mining a corpus of documents and identifying search terms for which popular documents are relevant. These popular documents can, for example, include documents that are often selected when presented as search results. Yet another way of identifying an augmentation query is mining structured data, e.g., business telephone listings, and identifying queries that include terms of the structured data, e.g., business names.

These augmentation queries can be stored in an augmentation query data store. When a user submits a search query to a search engine, the terms of the submitted query can be evaluated and matched to terms of the stored augmentation queries to select one or more similar augmentation queries. The selected augmentation queries, in turn, can be used by the search engine to augment the search operation, thereby obtaining better search results. For example, search results obtained by a similar augmentation query can be presented to the user along with the search results obtained by the user query.
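The matching step described above could be sketched roughly as follows. This is only an illustration that assumes similarity is measured by term overlap (Jaccard similarity); the patent excerpt does not name a specific measure, and `select_similar`, `min_overlap`, and `limit` are hypothetical names chosen for this example.

```python
def tokenize(query):
    """Split a query into a set of lowercase terms."""
    return set(query.lower().split())


def select_similar(user_query, stored_queries, min_overlap=0.5, limit=2):
    """Pick stored augmentation queries whose terms overlap the user's query.

    Similarity is Jaccard overlap of the term sets, which is an assumption
    made for this sketch rather than the measure the patent uses.
    """
    user_terms = tokenize(user_query)
    scored = []
    for candidate in stored_queries:
        candidate_terms = tokenize(candidate)
        union = user_terms | candidate_terms
        if not union or candidate.lower() == user_query.lower():
            continue  # skip empty queries and exact repeats of the user query
        score = len(user_terms & candidate_terms) / len(union)
        if score >= min_overlap:
            scored.append((score, candidate))
    # Highest overlap first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [candidate for _, candidate in scored[:limit]]


store = ["eiffel tower height", "eiffel tower tickets", "weather in paris"]
print(select_similar("eiffel tower", store))
```

A search engine could then run the selected queries alongside the original one and blend the two sets of results, as the passage suggests.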


Citations behind the Google Brain Word Vector Approach


In October of 2015, a new algorithm was announced by members of the Google Brain team, described in this post from Search Engine Land – Meet RankBrain: The Artificial Intelligence That’s Now Processing Google Search Results. One of the Google Brain team members who gave Bloomberg News a long interview on RankBrain, Gregory S. Corrado, was a co-inventor on a patent that was granted this August, along with other members of the Google Brain team.

In the SEM Post article, RankBrain: Everything We Know About Google’s AI Algorithm, we are told that RankBrain uses concepts from Geoffrey Hinton involving Thought Vectors. The summary in the patent’s description tells us how a word vector approach might be used in such a system:

Particular embodiments of the subject matter described in this specification can be implemented so as to realize one or more of the following advantages. Unknown words in sequences of words can be effectively predicted if the surrounding words are known. Words surrounding a known word in a sequence of words can be effectively predicted. Numerical representations of words in a vocabulary of words can be easily and effectively generated. The numerical representations can reveal semantic and syntactic similarities and relationships between the words that they represent.
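To make the idea of numerical representations revealing similarity a little more concrete, here is a small sketch using cosine similarity, a common way to compare word vectors. The three-dimensional vectors below are made up by hand for illustration; real word vectors are learned from large amounts of text and have many more dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hand-made toy vectors (an assumption for this example, not learned values).
vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "banana": [0.1, 0.0, 0.9],
}

print(cosine(vectors["king"], vectors["queen"]))
print(cosine(vectors["king"], vectors["banana"]))
```

With these toy values, "king" scores much closer to "queen" than to "banana", which is the kind of relationship the patent says the vectors can reveal.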


How Google Might Make Better Synonym Substitutions Using Knowledge Base Categories

Leigh Miller – Yankee Stadium, francis_leigh, Some rights reserved

How Google May Use Synonym Substitutions to Rewrite Queries

A couple of months ago, I wrote about a Google patent that involved rewriting queries, titled Investigating Google RankBrain and Query Term Substitutions. There’s likely a lot more to how Google’s RankBrain approach works, but I came across a patent that seems to be related to the patent I wrote about in that post and thought it was worth sharing and starting a discussion about. The patent I wrote about in that post was Using concepts as contexts for query term substitutions. The title for this new patent was very similar to that one (Synonym identification based on categorical contexts), and the more recent patent was granted on December 1st of this year.


How Google May Transform Queries into Trigger Queries

Recently I wrote about Google’s Enriched Results Patent, where Google looked at the query terms searched for and, for some of them, returned special “enriched” search results that showed things such as financial information when the query was something like a stock ticker symbol, such as “GOOG” for Google.

A search result for Goog returns a financial listing for Google in search results.

At Search Engine Land in 2007, I wrote about Google’s OneBox patent. Much like Google looking for query terms that might return an enriched search result, under the OneBox patent Google might decide among a range of seven different types of search results, including things such as news results, images, videos, local results and others.


Searching with Pronouns: What are they? Coreferences in Followup Queries

At Google’s 15th anniversary celebration last summer, shortly after Hummingbird was introduced, Tamar Yehoshua, Google VP of Search, showed us conversational search at Google by first demonstrating a query asking for “pictures of the Eiffel Tower”, and then following up with the query “How tall is it?”

Looking through the base of the Eiffel Tower.

In that second query, Google had to not only remember that the Eiffel Tower was being asked about, but also recognize the Eiffel Tower when it was referred to as “it.” That is part of the new “conversational search” that Google is now engaging in, using something known by linguists as a “coreference.” I wanted to write about coreferences to clear up confusion that people might have about them.

I was inspired to do that after reading an article from Eric Enge earlier today, where he wrote about Knowledge Graph Advances From Google.
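A very rough sketch of that two-step behavior (remember the entity, then resolve the pronoun) might look like the following. Real coreference resolution is far more involved; the `KNOWN_ENTITIES` list, the `Conversation` class, and the rule of replacing only the word "it" are all simplifying assumptions made for this example and say nothing about how Google actually does it.

```python
import re

# Hypothetical entity list; a real system would draw on something like a knowledge base.
KNOWN_ENTITIES = ["Eiffel Tower", "Yankee Stadium"]


class Conversation:
    """Remembers the last entity mentioned and resolves a later "it" to it."""

    def __init__(self):
        self.last_entity = None

    def rewrite(self, query):
        # If the query names a known entity, remember it and leave the query alone.
        for entity in KNOWN_ENTITIES:
            if entity.lower() in query.lower():
                self.last_entity = entity
                return query
        if self.last_entity is None:
            return query  # nothing to resolve the pronoun against
        # Replace the standalone word "it" with the remembered entity.
        return re.sub(r"\bit\b", "the " + self.last_entity, query, flags=re.IGNORECASE)


session = Conversation()
print(session.rewrite("pictures of the Eiffel Tower"))
print(session.rewrite("How tall is it?"))
```

The second call prints the follow-up query with the pronoun swapped for the entity from the first query, which is the rewritten form a search engine could then answer.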


Google Enriched Results Patent

I’ve been exploring some of the different search results that we see at Google, including things such as rich snippets and question-answering results, and came across a couple of patent filings from Google that describe something called “Enriched Results.”

You’ve seen enriched results before. As the first of the patent filings tells us, these results tend to be for things such as:

  • Airline flights – live flight status information
  • Athletes – player statistics
  • Sports – league scores
  • Weather – local weather information
  • Financial topics – financial data; and
  • Television programs – programming schedules
