In earlier days at Google, when you used to ask a question, you could sometimes get a response providing answers to questions such as:
“When was George W. Bush’s birth-date?”.
We knew that Google could answer some questions like that, even if it might have been challenging, but we didn’t have much of a clue regarding the existence of something like Google’s Knowledge Graph until 2011. The answers we would see would sometimes be regular snippets where a word such as “birth-date” might be bolded.
Our set of 17 “related patents” that I first saw mentioned in a patent I wrote about this past Tuesday, and which was granted on August 19th, appear to have been created by a team under Andrew Hogue who was tasked to create “an annotation framework” to index more objects and facts associated with them on the web, which he would discuss more deeply during the presentation The Structured Search Engine, which is highly recommended.
He also oversaw the acquisition of MetaWeb by Google and the introduction of 25 former Meta-Web staff members from the company into Google.
Not everything we read in a paper or in a patent from a search engine is something that happens in real life; but sometimes it is.
I like coming across a patent now and then that is dated but does a good job of describing something that happened as set out in that patent or paper.
The patent I’m writing about tonight was originally filed in 2006 and granted in 2010, and it provides a description of processes that I’ve seen first hand, and have used first hand to help people increase the number of visits they get to their offices or phone calls they get from future clients.
Google Maps a Proof of Concept of Knowledge Extraction
During a civil or criminal legal case, the prosecuting side needs to present evidence to a judge or a jury. Each individual piece of evidence doesn’t have to prove the innocence or guilt of the party being tried by itself, but the combination of that evidence has to meet a certain standard. For a criminal case, the standard is beyond a reasonable doubt. For a civil case, it’s a standard of more probable than not. So, criminal cases tend to require higher levels of confidence.
When Google collects information on the Web about an entity, for their knowledge vault, they want that information to be as trustworthy as possible.
If you’ve read anything about Google’s introduction of the knowledge vault, one of the points about it that stands out is that there’s a high level of confidence in the information listed. There is more confidence in the facts that are associated with entities than there might have been in the Knowledge Graph.
There are a number of patents from Google, both granted patents and pending patent applications, that describe ways that Google might learn about entities and about facts associated with those by extracting the information from the Web itself instead of relying upon people submitting information to knowledge bases such as Freebase.
We learned from Google’s recent announcement that they would be replacing the Google Knowledge Base with their Knowledge Vault, and that supposedly brings a whole new set of extraction approaches with it that have high levels of confidence with them as to how accurate they might be.
It’s hard to tell exactly which approaches Google might be relying upon, and which ones that Google might have introduced through something like a patent that is no longer being used. But, it doesn’t hurt to learn some of the history and some of the approaches that might have been used in the past.
I’m blogging about a patent today that describes an approach that many of us have assumed that Google has been using for years to identify Objects or Entities and attributes about those and the values that fit those attributes.
Google was officially assigned the pending patent applications from CiiNow last Wednesday (August 27, 2014) in a transaction that was reported as being executed at the end of July.
From searching through the USPTO, I don’t see any other patents assigned to CiiNow, so that appears to have been all they owned. The USPTO assignments don’t include financial details, so that information is unavailable.
The Ciinow.com website appears to be completely unresponsive to visits. The LinkedIn profile of CiiNow Co-Founder and VP of Engineering Devendra (Deven) Raut left CiiNow in 2014 and joined Google as a Tech Biz Dev. It looks to me that Google acquired CiiNow, Inc.
The title from a Google patent reached out and grabbed me as I was skimming through Google’s patents. It has the kind of title that captures your attention, as a weapon in the war that Google wages against people who might try to spam the search engine.
The title for the patent is Reverse engineering circumvention of spam detection algorithms. The context is local search, where some business owners might be striving to show up in results in places where they don’t actually have a business location, or where heavy competition might convince them that having additional or better entries in Google Maps is going to help their business.
The result of such efforts might be for their local listings to disappear completely from Google Maps results. The category Google seems to have placed such listings under is “Fake Business Spam.”
The World Wide Web is a vast resource for information. At the same time it is extremely distributed.
A particular type of data such as restaurant lists may be scattered across thousands of independent information sources in many different formats. In this paper, we consider the problem of extracting a relation for such a data type from all of these sources automatically.
We present a technique which exploits the duality between sets of patterns and relations to grow the target relation starting from a small sample. To test our technique we use it to extract a relation of (author, title) pairs from the World Wide Web.
A few years ago, I presented at SES San Jose and someone asked me what they should be keeping an eye upon in SEO. I told them “named entities.” I was reminded of that conversation as I gave a talk today about named entities and other semantics.
Barbara Starr and I gave a 3 hour Tutorial on Semantic Search to an enthusiastic and engaged audience. We also discussed which might be a better name for the tutorial, “Semantic Search” (the name it had) or Semantic SEO (what do you think?).
Here’s Barbara’s presentation, which is the first half of the tutorial Thanks, Barbara – totally brilliant stuff: