Was Google Maps a Proof of Concept for Google’s Knowledge Base Efforts?

Not everything we read in a paper or a patent from a search engine happens in real life; but sometimes it does.

I like coming across a patent now and then that is dated but does a good job of describing something that actually happened the way the patent or paper set it out.

The patent I’m writing about tonight was originally filed in 2006 and granted in 2010, and it describes processes that I’ve seen firsthand, and have used firsthand, to help people increase the number of visits to their offices and the phone calls they get from prospective clients.

A Surveyor measuring land.

Google Maps as a Proof of Concept of Knowledge Extraction

If you’ve used Google Maps, you’ve used one of Google’s best-known implementations of information extraction and knowledge sharing. Other Google applications that use such methods include Google Now, Knowledge Panels in search results, and Google Books.

I had an email conversation with Mike Blumenthal about recent errors Mike had been seeing in Google Maps, and he asked me if those might have to do with Google possibly changing how they extract information from the Web. He told me that he had read something I wrote about the Google Knowledge Vault, and how it was aimed at providing results that were more complete and backed by greater confidence in their accuracy.

I told him that I would look into it.

Google had sent Search Engine Land an email telling them that the Knowledge Vault was one of many projects at Google and might not be a replacement for the Google Knowledge Graph. Regardless, the series of patents I’m writing about now and the papers that accompanied news of the Knowledge Vault all discuss fact extraction and higher confidence levels for facts, which is something worth discussing.

If you’ve spent time in the past doing local SEO, you’ve probably seen that it fits into the world of the Knowledge Web very well.

This Google Maps patent that addresses how information is extracted from the Web to build Google Maps is cited as being related to the patent I wrote about a couple of posts ago, Learning objects and facts from documents.

The patent is:

Generating structured information
Invented by Egon Pasztor and Daniel Egnor
Assigned to Google
US Patent 7,788,293
Granted August 31, 2010
Filed: March 1, 2006

A structure generation engine collects data from multiple sources on the network, unstructured or structured. It parses the data to create structured facts, which it presents as entries in a local directory, as results to a search query, and/or in response to another request for information.

Structured, Semi-structured, and Unstructured Data

Structured data are data that have been organized to allow identification and separation of the key (i.e., context) of the data from the content.

Structured data can be understood by a computer or other machine. For example, consider a telephone number organized in the structure “TN:xxx-xxx-xxxx” where an “x” denotes a number. A computer-implemented process that encounters data organized in this format, such as “TN:212-864-6137”, can determine that the key for the data is a telephone number, and the value of the number is 212-864-6137.

Unstructured data are data that are not organized in a particular format and where ascertaining the context and content might be difficult.

Semi-structured data are data that are partially organized.
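To make that distinction concrete, here is a minimal sketch in Python of how a machine could read the structured telephone-number format above. The function name and the key/value output shape are my own illustration, not something spelled out in the patent.

```python
import re

def parse_structured_phone(text):
    """Separate the key (a telephone number) from its value in a record
    that follows the fixed "TN:xxx-xxx-xxxx" layout described above."""
    match = re.fullmatch(r"TN:(\d{3}-\d{3}-\d{4})", text.strip())
    if match:
        return {"key": "telephone number", "value": match.group(1)}
    # Anything that doesn't match is unstructured (or differently structured)
    # and would need heavier extraction machinery to interpret.
    return None

print(parse_structured_phone("TN:212-864-6137"))
# -> {'key': 'telephone number', 'value': '212-864-6137'}
```

The point of the fixed layout is that no guesswork is needed: the prefix tells the machine what the data is, and the digits tell it what the value is.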

Structure Generation Engine

The structure generation engine includes an interface for receiving data from one or more commercial data providers, as well as web pages from sources such as an enterprise web site and a directory web site.

The engine analyzes the received data to identify facts formed of key-value pairs, and normalizes them to produce structured data.

Keep in mind that the “facts” collected are fairly simple, and include contact information, hours of operation, handicap accessibility information, and parking information.

Data Related to Enterprises

The structure generation engine receives data related to enterprises local to a particular geographic region such as a city. An enterprise can be a “business, school, government office, non-profit organization and/or other similar entity.” These ended up including parks, forests, golf courses, and more.

For some places, such as restaurants, the data might relate to aspects of the restaurant, such as business hours, reservation policies, and accepted payment methods.

Instead of going into a knowledge base, this data goes into a local directory for a geographic region. Google has been answering Q&A-type queries with data collected into a fact repository, but this patent doesn’t focus upon those.

Examples of commercial data providers that generate data that might be used to provide information about these enterprises can include telecommunications providers such as telephone companies, media providers such as newspaper companies, and commercial directory providers, such as the D&B Corp.

The types of data about these places tend to be limited and often have some degree of structure to them. For example, a restaurant in the directory web site might contain the text “Reservations:” followed by a “yes” or “no” to indicate whether the restaurant takes reservations. Some web sites might contain less well structured content and facts: one site might specify a restaurant’s business hours as “open Mon to Fri 9-5, Sat until 6” while another specifies the hours as “open 6-2, closed Sundays and Holidays.”
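The patent doesn’t spell out a normalization grammar, so here is a rough sketch of the kind of lightweight parsing that could turn those two differently worded hours strings into comparable key-value facts. The function, the patterns, and the output shape are my own illustration, not the patent’s method.

```python
import re

# Toy normalizer for semi-structured business-hours text. Day aliases and
# output keys are assumptions made for this example.
DAY_ALIASES = {
    "mon": "Monday", "tue": "Tuesday", "wed": "Wednesday",
    "thu": "Thursday", "fri": "Friday", "sat": "Saturday", "sun": "Sunday",
}

def normalize_hours(text):
    """Pull day names and hour ranges out of loosely formatted text like
    'open Mon to Fri 9-5, Sat until 6' and return simple key-value facts."""
    facts = []
    lowered = text.lower()
    days_found = [full for short, full in DAY_ALIASES.items() if short in lowered]
    hour_ranges = re.findall(r"\b(\d{1,2})\s*-\s*(\d{1,2})\b", lowered)
    if days_found:
        facts.append(("days_mentioned", days_found))
    if hour_ranges:
        facts.append(("hour_ranges", hour_ranges))
    if "closed" in lowered:
        facts.append(("closed_note", lowered.split("closed", 1)[1].strip()))
    return facts

print(normalize_hours("open Mon to Fri 9-5, Sat until 6"))
print(normalize_hours("open 6-2, closed Sundays and Holidays"))
```

Once both strings are reduced to the same keys, the engine can compare, merge, and store them as structured facts, whichever wording the source happened to use.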

Some data might be purchased from commercial data providers.

Some of the data might be from verified site owners, who claim a listing in exchange for the ability to maintain and update it, and view analytics related to it.

Some data might also be collected from web crawlers which recognize the business name and the associated location information.

The patent provides more details on processes such as the normalization of facts.

This data extraction and normalization process is very similar to the one described for building a knowledge base at Google.

The patent also discusses confidence levels and the importance of facts (as “weights of the fact”), and tells us that if those aren’t high enough, the facts might not be shown at all.

Mike Blumenthal had asked me whether a data extraction process might be in use that showed inaccurate data, but this patent seems to be telling me that incorrect or immaterial data is more likely not to be shown at all.
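Read that way, display becomes a filtering step: facts whose confidence or weight falls below some cutoff are simply withheld. A minimal sketch of that idea follows; the field names, scores, and threshold are made up for illustration, since the patent publishes none of them.

```python
# Hypothetical extracted facts with confidence scores attached.
facts = [
    {"key": "telephone number", "value": "212-864-6137", "confidence": 0.95},
    {"key": "reservations",     "value": "yes",          "confidence": 0.90},
    {"key": "business hours",   "value": "Mon-Fri 9-5",  "confidence": 0.35},
]

CONFIDENCE_THRESHOLD = 0.6  # assumed cutoff; the patent gives no number

# Only facts above the threshold are displayed; low-confidence facts are
# withheld rather than shown incorrectly.
for fact in facts:
    if fact["confidence"] >= CONFIDENCE_THRESHOLD:
        print(f'{fact["key"]}: {fact["value"]}')
```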


4 thoughts on “Was Google Maps a Proof of Concept for Google’s Knowledge Base Efforts?”

  1. Hi @bill I’ve written a couple of posts stating that the Knowledge Graph makes more sense for Google in the local search landscape, as it’s noisy, and applying data extraction at large scale is the only way for Google to compete with established IYPs that know their clients directly. I totally agree the Knowledge Graph is more useful for Google to understand local signals and address real-time info, but both are tied. Thanks for sharing as usual ;)

  2. One of the real-life problems with this patent, application, and the theory behind it is that business data has been systematically falsified in the directories; the IYPs and many other directories. It’s been going on for years. The most notorious vertical in this regard has been the locksmiths, but it’s been true for other verticals.

    The second issue is that once data is in the local index system, Google doesn’t seem to eliminate it. It exists, or can exist, for a long time. Old data from years ago that was initially not true, and/or data about a business that has since relocated from 125 Main Street to 285 Hampton Avenue, remains in the local system. If one searches hard and long enough, one can find data in the local system about the 125 Main Street location.

    Periodically, as Google has made large-scale changes to the overall local algorithms, the old data surfaces in search, in the Pack results, and on Maps. One reason is that it was never “retired”.

    Now here is a real-life issue. Suppose Mr. Slawski’s dry cleaner at 125 Main purchases Mr. Oremland’s dry cleaner at 125. Mr. Slawski renames the dry cleaner Good Guy Cleaners. The former name was Down and Dirty Cleaners.

    Mr. Slawski keeps the phone number. It’s a smart, everyday move. Down and Dirty had some customers Good Guy wants to keep.

    So a new business goes up at 125 Main, named Good Guy Cleaners. Meanwhile, the parties completely clean up the Google My Business record, and Good Guy has this new wonderful website, a claimed listing in GMB, appropriate categorization, and goes out and gets a lot of great citations.

    BUT ALAS. Google has all that old data for Down and Dirty. It exists in its local database. It actually maintains the same address and the same phone number as Good Guy Cleaners; at least 2 of the 3 elements of NAP (and if one considers NAP plus URL, 2 of 4 elements).

    Anyway, both Good Guy and Down and Dirty did everything above board, and Good Guy did the business natural thing in retaining the phone number.

    Alas, it appears to me, after some investigation over four years, that Good Guy’s strength of signal is somewhat depreciated. It’s not as strong as it would be with a totally new record and totally new, non-confusing citations.

    That comes from a real-life example (names and vertical changed). Years after the change, one can still see Down and Dirty on Maps. The Pack visibility strength of Good Guy is not as strong as it should be (in my estimation).

    It’s a bit of a real-life dilemma. I wish Google would filter out all the old data. It has a “controlling” record that states that Down and Dirty no longer exists. But dang it… it still exists on Google Maps, and periodically, when Google impacts the base algorithm, it resurfaces.

    It’s a problem.

  3. Hi Dave,

    Hopefully, Google is getting smarter and using more available data in dealing with Map Spam. I didn’t write a post about this patent, but it seems to go in that direction, too:

    DETECTING POTENTIALLY FALSE BUSINESS LISTINGS BASED ON GOVERNMENT ZONING INFORMATION
    http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=1&f=G&l=50&s1=%2220150154611%22.PGNR.&OS=DN/20150154611&RS=DN/20150154611

    A bunch of the younger software engineers at Google started up the Knowledge Vault project, which involves collecting data from the Web, but with a higher percentage of correct data. They were the group behind the Knowledge-Based Trust paper – http://arxiv.org/abs/1502.03519

    So, yes, there have been some issues with relying upon methods that index based upon data instead of indexing URLs based upon links. Both approaches may have some issues, but I suspect that Google is going to get better at identifying false data, even some of the really old stuff, and at holding on to and updating correct data.

    Google does need to learn better how to verify facts about business entities. We’ll see if they can fix those.
