Incomplete and Wrong Data in Google Local Search

Google Local Search uses address information that it buys from data suppliers like telephone companies.

Sometimes street numbers or other location information for businesses are missing in the information provided by those data suppliers.

How might Google fill in the missing information? One way might be for Google to search the web to find more about those businesses.

A newly published patent application from Google explores how it might perform web searches with the incomplete location information that it does have for businesses, and look at the snippets returned for more address information about those businesses.

What if any of that information is wrong?

Mike Blumenthal recently discussed an interesting question related to local search in Google Plus Box – Where does the (wrong) data come from? It’s a very good question, and he has some interesting examples.

The Address Completion Patent Application

For some types of businesses (and non-commercial organizations) and for businesses in some geographic areas such as China, address information can be difficult to obtain.

While Google can buy business listing data from commercial data vendors such as telecoms, that information may often lack street numbers or other important information, making it difficult for local search to display complete addresses and maps to searchers.

The Google patent filing describes how the search engine might try to get complete information for businesses.

Local Search Using Address Completion
Invented by Jiang Qian
Assigned to Google
US Patent Application 20080065694
Published March 13, 2008
Filed May 22, 2007

Abstract

A local search server receives queries for information about businesses from clients. The local search server searches a local information database for information about a business and reports the information about the business to the client that requested it.

Sometimes, the database lacks complete information for the business. For example, the database might be missing the street number for the business.

The local search server obtains the missing information by interfacing with a search engine and searching for hosted documents about the business.

The local search server receives snippets of text from the documents. The local search server applies one or more heuristics to the text snippets to determine the missing information. The missing information is saved in the local information database.

About Local Search

A local search engine may collect information taken from many data suppliers that provide listing data about businesses and other entities from specified geographic areas. The listing data might contain complete addresses for some businesses, but only partial addresses for others.

Local search works by responding to search queries for information about businesses within particular geographic regions by providing searchers with information about the businesses that may satisfy the query. Information about a business can include:

  • An address or other location information,
  • Business hours,
  • Phone number,
  • Editorial reviews of the business,
  • User-submitted ratings of the business,
  • A map displaying the location of a business,
  • Accepted forms of payment,
  • Whether parking is available,
  • Photos,
  • A link to the business’s web page,
  • Other information.

Missing Address Information

While the local search server may have general address information for a business from a data supplier, other information might be missing that would help show the exact location of the business and enable it to be shown on a map.

There might be more information about well-known businesses, and less information for lesser-known businesses. In places like certain regions of China, complete address information is difficult to obtain from any data supplier.

Local search doesn’t need to show something like the floor number for a business in a skyscraper, but being able to show a business on a map, or to provide driving directions is helpful to searchers.

More Address Information from Search Snippets

Google might try to make up for this lack of complete information by looking for more information about the business on the Web.

Some types of businesses, such as parking lots, might be excluded from this process of gathering more information.

For businesses that are included, Google might search for web pages that describe those businesses and contain terms matching some or all of the known address information, and then return snippets of text from the pages that satisfy the queries. Text in those snippets near the search terms may include additional address information.
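To make the query-building step concrete, here is a minimal sketch of how a partial listing might be turned into search terms. The `Listing` structure and the function names are hypothetical and are not taken from the patent filing; the only thing borrowed from it is the idea of searching on the business name plus whatever address words are already known.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    """A business listing with a possibly partial address (hypothetical structure)."""
    name: str
    street: str               # street name only, e.g. "Freeport Road"
    city: str
    region: str
    street_number: str = ""   # empty when the data supplier left it out


def needs_completion(listing: Listing) -> bool:
    """A listing needs completion when its street number is missing."""
    return not listing.street_number


def build_query_terms(listing: Listing) -> list[str]:
    """Return the business name followed by each known address word."""
    terms = [listing.name]
    for part in (listing.street, listing.city, listing.region):
        terms.extend(part.split())
    return terms
```

For a listing named "Wal-Mart" on "Freeport Road" in "Pittsburgh", "PA", this yields the same five terms used in the Wal-Mart example quoted later in this post.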

A snippet analysis program would follow certain rules to identify missing address information. Those rules can vary depending upon things like:

  • The language in which the search results are presented,
  • The type of missing address information that is sought,
  • The type of business, and/or
  • Other factors.

Examples of Snippet Analysis Rules:

Choosing consistent versions – There may be more than one way to describe a street address, and the search engine may choose one way over another. For instance, in China, numbers can be represented in number form and in Chinese character form. That information might be shown in local search only in number form.

Looking for correct information – When looking at individual snippets to find a street number or other part of the address, an address that follows the business name may be more likely to be the correct address for the business than one that comes before the name.
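The "consistent versions" rule can be illustrated with a small normalizer that converts Chinese-character numerals to number form. This is only a sketch under my own assumptions: it handles values below 10,000, and the function names are made up for illustration. The patent filing does not describe how the conversion would actually be done.

```python
# Digit and unit characters used in everyday Chinese numerals.
DIGITS = {"〇": 0, "零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
UNITS = {"十": 10, "百": 100, "千": 1000}


def chinese_to_int(text: str) -> int:
    """Convert a Chinese numeral below 10,000 to an int."""
    if not text or any(ch not in DIGITS and ch not in UNITS for ch in text):
        raise ValueError(f"not a supported Chinese numeral: {text!r}")
    if not any(ch in UNITS for ch in text):
        # Positional style, read digit by digit (e.g. 一〇八).
        return int("".join(str(DIGITS[ch]) for ch in text))
    total, current = 0, 0
    for ch in text:
        if ch in DIGITS:
            current = DIGITS[ch]
        else:
            # A unit with no digit before it counts once (十五 is 10 + 5).
            total += (current or 1) * UNITS[ch]
            current = 0
    return total + current


def normalize_street_number(text: str) -> str:
    """Return the street number in number form, whichever form it arrived in."""
    text = text.strip()
    if text.isdigit():
        return str(int(text))
    return str(chinese_to_int(text))
```

Storing every street number in one form means two snippets that write the same number differently can be recognized as agreeing with each other.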

How Snippet Analysis May Determine Address Information

A snippet analysis program might favor more precise information over less precise information. The patent application provides a number of examples:

  1. When two street matches are found in a snippet and only the second match has a number, the snippet analysis program may treat the second street match as the address of the business.
  2. If a single snippet contains two different addresses of equal precision (e.g., two different street numbers), the snippet analysis program may favor the first address appearing in the snippet.
  3. For a snippet that contains multiple different addresses, the snippet analysis program might favor addresses that occur more frequently and/or occur earlier in the snippet than other addresses.
  4. If there are multiple snippets with inconsistent address information, the snippet analysis program may favor snippets from pages with titles that include the name of the business over snippets from pages with other titles, seeing the pages with the business name in the title as more relevant.
  5. When a snippet includes a cross street in the address, the snippet analysis program may favor the street having the street number and use that street and number as the address.
  6. If a snippet includes a cross street but lacks a street number, the snippet analysis program may infer a street number based on the cross street.
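A rough sketch of how the first four examples above might be combined into a ranking is shown here. The `Candidate` structure, the function names, and the order in which the rules are applied are my own assumptions for illustration; the patent filing lists the heuristics but does not say how they are weighed against each other. The cross street examples (5 and 6) are left out.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """One address match found in a snippet (hypothetical structure)."""
    street: str
    number: Optional[str]   # None when the match has no street number
    position: int           # character offset of the match in the snippet


def pick_address(candidates: list[Candidate]) -> Optional[Candidate]:
    """Pick one candidate from a single snippet.

    Matches with a street number beat matches without one (example 1),
    then more frequent addresses win, then earlier ones (examples 2 and 3).
    """
    if not candidates:
        return None
    counts = Counter((c.street, c.number) for c in candidates)

    def rank(c: Candidate) -> tuple:
        return (c.number is None, -counts[(c.street, c.number)], c.position)

    return min(candidates, key=rank)


def pick_across_snippets(business_name: str,
                         snippets: list[tuple[str, list[Candidate]]]) -> Optional[Candidate]:
    """Pick across (page_title, candidates) pairs, trying pages whose
    title contains the business name first (example 4)."""
    name = business_name.lower()
    ordered = sorted(snippets, key=lambda s: name not in s[0].lower())
    for _title, candidates in ordered:
        best = pick_address(candidates)
        if best is not None and best.number is not None:
            return best
    return None
```

Expressing the rules as a single sort key keeps the priority order explicit, which makes it easy to experiment with a different ordering.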

Asking Users and Other Data Providers for Address Information

This address completion program could run when someone searches, and it could possibly ask for more information from the searcher.

Sometimes searchers do know the complete address for a business, and they might provide actual street numbers or other information, such as a cross street near the business.

It’s also possible for the search engine to try to obtain additional address information from an alternative paid data supplier.

One Example of Address Completion

From the patent application:

For example, assume that the local information database 310 contains an entry titled “Wal-Mart,” and that this entry contains the address information “Freeport Road, Pittsburgh Pa.”

An embodiment of the local search server uses the search engine to search for documents from document hosts having the terms “Wal-Mart,” “Freeport,” “Road,” “Pittsburgh,” and “PA” in order to ascertain the complete address. In return, the search engine returns the snippet: Wal-Mart Store 877 Freeport Road, Pittsburgh, Pa. 15238. Wal-Mart Super Center 250 Summit Park Drive, Pittsburgh, Pa. 15275. Select from the listings above

The local search server uses heuristics to parse this snippet and determines that “877” is the street number for the Wal-Mart store on Freeport Road in Pittsburgh, Pa. In response to a query from a client, the local search server uses the geocoder module 314 to generate a map that accurately identifies the location of the store and reports this result to the client.
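The parsing step in this example can be approximated with a simple pattern match: look for digits that come immediately before the known street name. This is a minimal sketch, not the heuristic from the patent filing, and `find_street_number` is a hypothetical helper name.

```python
import re
from typing import Optional


def find_street_number(snippet: str, street: str) -> Optional[str]:
    """Return the digits that immediately precede the street name, if any."""
    pattern = re.compile(r"\b(\d+)\s+" + re.escape(street), re.IGNORECASE)
    match = pattern.search(snippet)
    return match.group(1) if match else None


snippet = (
    "Wal-Mart Store 877 Freeport Road, Pittsburgh, Pa. 15238. "
    "Wal-Mart Super Center 250 Summit Park Drive, Pittsburgh, Pa. 15275. "
    "Select from the listings above"
)

print(find_street_number(snippet, "Freeport Road"))  # prints 877
```

Because the pattern is anchored to the known street name, the second address in the snippet (250 Summit Park Drive) is ignored, even though it also belongs to a Wal-Mart.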

Other Uses

The process described in this patent application could be used for other purposes than local search, such as generating facts for a general fact repository that stores information from the Web (such as Google Q&A), covering a much wider range than just address information.

Why Does Incorrect Address Information Show Up in One Box Results?

Mike Blumenthal asked in his blog post where the wrong data comes from in One Box results, which appear when Google shows address information for only one business in response to a query. I really can’t answer that with any certainty.

Since I’m writing about how Google might gather address information when there is incomplete information for a business, I thought it might be good to look at some of the other patent applications to see if they provided some helpful information.

It’s possible that there are other things going on behind the scenes at Google that we don’t know about, but here are some of the things I’ve seen from the patent filings. Maybe they can help us.

Onebox Results as Contact Information

A patent application that seems to address where one box information comes from is Enhanced Search Results, which I wrote about in When Might Google Show Local Search Information in Web Search Results?.

It provides a number of assumptions and insights into when Google might try to use information from their local search database, and when they might try to show information from the web site listed instead.

It’s possible for incorrect or incomplete information to be available in the primary data sources that are purchased by Google from telecom providers and other collectors of location data. It’s also possible that web sites may contain old or inaccurate addresses for businesses.

Onebox Results as the Best Match for a Query

The decision to show either one address, or more addresses with location information for businesses during a web search, is described in a Google patent application titled, Geographic Coding for Location Search Queries, which I wrote about in Girl Scouts with Guns: Geographic Coding in Google Location Searches.

If one business is a best result for a web search query, and any other results have a much lower confidence score, then only one result will be shown. If there might be several good results for the query used, then multiple results might be shown:

If a best score is more than a predetermined multiple of a next best score, the location corresponding to the best score may be provided to the user along with a map image of the corresponding location. The map image may be centered on the corresponding location and may be sized to include a predetermined bounding box, region or window around the corresponding location.

Alternatively, if the best score is less than the predetermined multiple, several locations corresponding to a range of scores may be provided to the user.
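The decision described in that passage comes down to comparing the best score against a multiple of the next best score. Here is a minimal sketch; the function name, the default multiple of 2.0, and the cap of five results are placeholders of mine, since the patent filing does not give actual values.

```python
def choose_results(scored, multiple=2.0, max_results=5):
    """Return one result when the best score clearly dominates, else several.

    scored: list of (location, score) pairs.
    multiple: assumed stand-in for the "predetermined multiple".
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if not ranked:
        return []
    if len(ranked) == 1 or ranked[0][1] > multiple * ranked[1][1]:
        return ranked[:1]
    return ranked[:max_results]
```

With a multiple of 2.0, scores of 0.9 and 0.3 would produce a single result, while scores of 0.9 and 0.6 would produce a list of both.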

Political and Other Reasons for Incomplete Information

Some of the difficulties in getting good location information from places like China are explored more fully in a patent application titled Local Search, which I wrote about in Google Local Search in China: Export Restrictions, Filtering Sensitive Keywords, and Limited Data.

While the focus of that patent application is upon local search in China, it provides some other insights into approaches that Google could use anywhere when there is limited address information for businesses.

It also describes the possibility of using different colored icons on a map for businesses where less than complete address information is available.

Best Data Problems and Incorrect Address Information

A couple of patent applications from Google explore the database side of Google’s local search database more deeply.

The patent filing Generating structured information (which I wrote about in Google’s Local Search Patent Application) provides a lot of information about the collection of data from different sources, comparison of that data, and choices made as to which data should be shown for a business at a location.

The document tells us about how “confidence levels” for different geographic facts might be scored, and used to determine which information to display for a location.
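One simple way to picture confidence scoring is to add up the confidence attached to each reported value and display the value with the highest total. The sketch below assumes exactly that. Summing is my own simplification, and the function name and numbers are hypothetical; the patent filing may combine confidence levels quite differently.

```python
from collections import defaultdict


def most_confident_value(observations):
    """Return the value with the highest summed confidence, or None.

    observations: iterable of (value, confidence) pairs, one per source
    that reported the value.
    """
    totals = defaultdict(float)
    for value, confidence in observations:
        totals[value] += confidence
    if not totals:
        return None
    return max(totals, key=totals.get)
```

Under this scheme, two moderately trusted sources that agree on an old address could outweigh one more trusted source with the new address, which is one way that stale data might win out.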

The patent filing Identifying locations takes an even deeper look at local search database structures, and at the choices made about which information to show for specific locations. I blogged about that document in A Google Approach to Improving Location Information Accuracy.

The location repository approach in Identifying Locations sounds like a way of following best practices in building a database, but a database can only be as good as the data that goes into it.

Direct Address Information in the Future?

A potential future source of address information might come directly from signs on buildings in the street view images of the buildings, which I wrote about in Google on Reading Text in Images from Street Views, Store Shelves, and Museum Interiors. Another post on that topic is Better Business Location Search using OCR with Street Views.

It’s possible that errors caused by the software that tries to read characters on signs from buildings might be limited by the use of Global Positioning System (GPS) information that is also collected when images are taken of buildings for Google’s Street Views program.

The images and address numbers of buildings photographed can be matched up with a database that shows which buildings are at which GPS location.

Conclusion

Indexing the World with local search may face more challenges than just indexing the World Wide Web. A recent news story at the LA Times about “Google Street Views and US Military Bases” described what happens when a street view team tries to film a military base.

Local, national, and international methods and laws about collecting and sharing data about locations differ.

Telecoms that collect data are more interested in phone numbers than street addresses. Business owners differ in how they provide contact and business location information on their web sites, if they provide that information at all.

Mistakes in address information are going to happen – it’s inevitable. New businesses open, old businesses may close, or move, or change names, or merge with other businesses. Telecom information may not be corrected, or may not be updated very quickly, and can often be incomplete.

Websites may contain pages that show older addresses, or the addresses of agents or attorneys or mail drops or post office boxes or multiple locations.

Judging from the webmaster posts and comments at the Google Maps Help Center, correcting address information can be a challenge.

Some steps in making sure that Google’s local search has the right location for your business may include:

(1) checking your site or sites for old or incorrect address information,

(2) searching Google with your business name and then with address information (partial and complete) for your business to see what comes up, and making whatever changes you can,

(3) checking to see what information that telecoms and other data suppliers might have about your business, and

(4) verifying and editing your business information at the Google Local Business Center.

29 thoughts on “Incomplete and Wrong Data in Google Local Search”

  1. Hi Bill-

    The interesting thing is that Google could get one “local” data set correct i.e. the OneBox and get one completely wrong i.e. the Plus Box at exactly the same time. You can see an example of this in this screen shot for Top Hat Dance Studio.

    The other interesting point is that the Plus Box is not included on all websites, just ones where Google has a “high confidence” level that it is correct. Yet it often is not as correct as the OneBox.

    So clearly even with Google’s programming there are contradictions that are often costly to the business involved. Sort of a yellow pages from hell scenario.

    Mike

  2. Hi Mike,

    It’s a little disturbing that there is different information showing up in the plus box as opposed to the one box.

    I should have included a link to my post about the plus box, too:

    Google Plus Box Patent Application

    In the instance that you write about, the old address for the dance studio was showing up in the plus box, while the new address was appearing in the one box. Google’s index is still showing some mentions of the old address with the studio, and a couple of pages on their web site still contained the old address. It is possible that having both addresses on their site may have caused the problem.

    Google is showing about 200 snippets in response to the search:

    “Top Hat dance studio” 3114 Willits

    Clicking through, it appears that they managed to change at least one of those, but a few others I checked show the business with the old address. If they can get more of those to change, it might help.

    Google is also showing 608 snippets in response to:

    “Top Hat dance studio” 10771 Bustleton

    So, the existence of the old address, and the many mentions of the business at the old address, likely has something to do with this error. There are fewer results for the old address now, but it’s possible that some of them may appear to be more “authoritative” somehow.

    Definitely worth exploring further.

  3. Dave

    I have found instances where the data is clearly coming from 3rd party websites. It appears that in some cases the signals are just from the specific company’s own website but there are others where it is definitely coming from 3rd party sites as well.

    The question in attempting to find a rational solution in these cases is:
    1)Can the website owner change their website in such a way so as to become the primary signal for the plusbox info
    or
    2)Can the webmaster identify a limited number of “trusted” data sources that Google is using that could be changed that would affect the plusbox data.

    If the website owner can’t do either of the above then it truly would become an IYP hell as you point out.

    It is conceivable that Google could fix this with a programming change. Until then, advice from Google would be helpful, but they have not offered any on this matter.

    Mike Blumenthal

  4. Thanks for commenting on this, Bill. I’ve been discussing this with Mike and working with a site pti.edu that currently has erroneous information coming up in a plusbox, but previously had an authoritative one box showing for certain queries with the exact same erroneous address information.

    Specifically, the map insert into organic searches was continuously sending potential customers off to the wrong address. There are fewer reported problems coming from the erroneous plusbox information.

    The business moved in late 2007 from its former main site to a new main site. It changed the information about address in the Google Local Business Center and changed the address information on its main address/directions page on its website.

    The erroneous information persisted, and still does.

    Further investigation of the site turned up about 15 pages with footer information describing 3 addresses that need to be changed. Among the 3 footer addresses the erroneous information contained descriptions of the 3 addresses with the old address being described as “Downtown Site”

    Additionally there are numerous snipped pieces of information on the web from a great variety of sites including telecom data sources and dot.gov sites that reference the old address.

    The webmaster is aware of the old addresses on their site and will remove the information pertaining to the old address.

    THE BIG PROBLEM, though, is if a crawl is being done of 3rd party sites and contributing to the erroneous address on the current plusbox showing in organic results, and/or a variation of a universal map showing in organic results, wherein the wrong information is sourced from 3RD PARTY websites, the task of removing that information is exceedingly difficult for most business owners/webmasters.

    The 3rd party website owners might not respond by removing or updating information. Cripes, there might not be an operating webmaster for a site.

    I hope the information is being scraped only from the existing website, otherwise it might be difficult or impossible for some webmasters to get an effective change.

    Dave

  5. Hi Dave

    I agree and it would be a simple matter for Google to add a button to the Local Business Center that says “Use this address for all Google address instances” Y/N (or whatever).

    Mike

  6. Based on your findings, Mike, and the large volume of potential sources of address information referenced by Bill above, it seems that there is a good chance that address information in places such as the plusbox or authoritative map, both showing in organic results, could derive from sources other than the website and the Google Local Business Center. In fact it appears Google is writing patents that suggest this is part of the algos that give this information.

    From the perspective of the webmaster, though, if a wrong address is showing in a plusbox, onebox, etc. and the information is coming from 3rd party sources it may well be impossible or extraordinarily difficult to get the old or wrong address information corrected.

    Not all webmasters of 3rd party sites will respond to one’s efforts to get info about my site corrected. In fact there are plenty of websites around without an active webmaster anymore.

    Either of your suggestions might be acceptable.

    On the other hand, having read the complaints coming out of Google Groups for Business Owners, with regard to Maps, it is apparent that some address correcting mechanisms should be available, and currently there aren’t any easy or apparent solutions.

    Dave

  7. Hi Bill!
    Thanks so much for writing about this. Incorrect information is definitely a plague. Because we’ve grown up in a YP world, we have come to depend on all contact information being corrected once a year with the publication of the phone book. By contrast, people expect web information to be accurate all the time.

    The question we still haven’t quite gotten a confirmation on about this is regarding the way Yahoo! works. I was told by a Yahoo! Local rep that if you pay $9.95 for their more advanced local listing, it locks in your business information so that it can’t be overwritten by scraped data from elsewhere.

    I mentioned this both to Mike and to Matt McGee. Matt was going to confirm this with a Yahoo! employee he was hoping to interview, but I think maybe that fell through.

    If the information is correct, that one can pay Yahoo to lock one’s listing, that would at least give business owners a way to pay ‘protection money’ to control their own presentation on the web. Frankly, if Google would adopt this, I’m betting all of those poor folks in Google Groups suffering major headaches over wrong data would gladly pay it!

    It wouldn’t solve the issues with unclaimed listings, conflicting data sets throughout the web and the other important challenges you’ve outlined, but it would be a start.

    Great post!
    Miriam

  8. Hiya Bill, exSEOllent post as usual! Just been sitting doing directory listings, and as we all know, the better business directories require very accurate information. Perhaps this will also blow more life into them if their content gets spidered?

  9. What has always interested me is the concept of ownership of all this address data. You figure there are dozens of phone companies who gather the information. In addition, there are hundreds of spiders that are gathering and reorganizing the information. Then other spiders are comparing and correcting the information. Then new data sources are overlaid on older data. After a while, there is really no original source in existence, and virtually every source in some way received information from many other sources and reorganized it. I feel like this information is nearly impossible to truly copyright, and would be interested in hearing your opinion on this.

  10. Dave (earlpearl) and Mike,

    There appear to be a few competing philosophies on mapping at work in Google Maps.

    From the patents on local search, Google doesn’t appear to locate the “authoritative” site for a business at a location first, but rather collects information about locations and the businesses that might be present first, and then determines an “authoritative” web site.

    So the search engine will look at the purchased telecom data about locations, look at semistructured directory sites to gather more information, and crawl/search for (partial or full) address information for a location to discover what might be present at a location.

    All of that information might be given different weights of confidence to determine what is at that location.

    Competing with that is the ability for businesses to register with the search engine where they are located, and to have that location verified. This is less a mapping approach, and more of a business directory method.

    A third approach is to return a web site for a specific query, and if it is determined to be a business or organization that might have contact information associated with it, to make it easy for people to see that contact information when viewing the page in search results.

    In the first, the focus is upon the location and not the business that might be located there, in the second we have a business directory, and in the third, we have a search result with associated contact information.

    It’s not a surprise that the three different approaches can cause different and uncoordinated results.

  11. Hi Miriam,

    I’m not surprised that Yahoo would come up with an approach that involved paying money to make sure that the correct information was listed for a business at a location. I don’t think that Google would go down that path.

    Hi MSN hacken,

    Good to hear that you’ve seen some results online that appear to support what is described in this patent filing.

    Hi Jacques,

    Being included in a directory listing that displays some very detailed information is probably pretty helpful as long as the information changes when there are changes at the business, like the move to a new location.

    One of the issues with some of the directories that require a lot of information is that they often charge for listings, which means that businesses that can afford to register with directories like that can be better represented in local search than nonprofits and organizations like parks and others that may not have a budget or marketing department.

    Hi David,

    I’m not too sure that ownership of the data is too much of an issue. Some of the data used by the search engines is most probably paid for, and licensed for use.

    Grabbing location information from web sites is also probably considered a use of data made freely available on the Web rather than any kind of copyrighted information.

    It’s also likely that as Google collects information from sites, it also collects information about the sources of that data. See my link above to A Google Approach to Improving Location Information Accuracy.

  12. Bill, et al:

    There might be an interesting albeit ironic solution in the works:

    Google removes the erroneous plusbox manually.

    The website/business owner complained on 3/13. Mike gave a very thorough response on 3/13. A google rep commented on 3/17 promising a “fix” in about 1 week.

    As of today 3/25, the plusbox is no longer visible.
    I think Mike’s suggestion of a button to push might be another way to work a solution.

    :D

    Dave

  13. Hi Dave,

    Your link is missing the URL. :(

    If the plusbox was showing a different address than the onebox, then at least the confusion is gone regarding which address information to use. Shame that the problem couldn’t be corrected.

  14. Mike’s blog reports a “fix” for erroneous information in the plusbox today 4/1/08. Hope it isn’t an april fools joke.

    The correction mirrors mike’s suggestion of a fix it button.

    Pretty impressive.

    Guess we’ll never learn “exactly” how that particular algo works.

  15. Bill:

    Google has been implementing a correction. Unfortunately it is an opaque correction.

    As of early April (and possibly during March) Google has been removing erroneous plusboxes that had reached the comment section of google groups for business owners.

    I had been helping one group and got an email the other day saying that all was well. Google removed the plusbox.

    This group first started contacting google in google groups in late November 07.

    I did a search for “plusbox” at google groups for business owners and identified five sites that had complained earlier in the year about erroneous plusbox information. All five had the plusboxes removed. On the other hand there was a complaint from a webmaster from early April and the wrong plusbox info was still showing.

    Well that is one way to deal with the problem.

    :D I’m sort of disappointed. I wanted to track one down and do the work to determine what sources were causing the erroneous plusbox info.

    Dave

  16. Hi Dave,

    It’s good to hear that Google is being responsive to people who are reporting incorrect plusbox information in Google groups.

    It would be great though, if they could come up with a way to resolve the problem so that it wouldn’t cause harm to people who don’t notice it is a problem.

  17. Dear Bill,

    We are currently shown with wrong address on google maps.
    Our address is 1510 12th Ave N Fargo, ND
    But google maps show it as 1510 12th St N Fargo, ND
    Kindly suggest us what needs to be done to correct the address on google maps.

    Thank you.

    Kanwal Gagneja
    North Dakota Center for Distance Education
    Fargo, ND

  18. Hi Kanwal,

    Google Maps does have the correct address for you listed in its index. It shows you at 1510 12th Ave N, Fargo, ND when I do a search for North Dakota Center for Distance Education at maps.google.com.

    The problem is the onebox result that shows in the Web search results when I do a search for your name, and it shows your site, and a contact address. That’s where Google lists your address as being at 12th St instead of 12th Ave.

    The onebox result that shows in the Web search results doesn’t necessarily get its information from the Google local database, but rather from an attempt to try to get the information from your site itself.

    The only place that I see your address on your site is on the contact page, and the formatting of that address might be a little confusing to the program that tries to extract information from pages because of the way that PO Box information is also included in your address.

    I might recommend trying a couple of things to help Google get your address right. The first may be to shorten “Avenue” in your address to “Ave”, which appears to be the way that Google likes to see avenue.

    The second might be to include your address on additional pages of your site, like your home page – you could include the address in the footer sections of every page, or at least the home page, and that might increase the chances of the search engine getting the information correct.

    Good luck.

    You could try to ask for help at the Google Groups for Google Maps pages: http://groups.google.com/group/Google-Maps

    I believe that some people from Google actually show up there, and try to help people who have problems like yours.

  19. It’s nice that Google is trying to fill in the information, but I think it would be very difficult for Google because several companies listed in Local Search may not have any website or official page, which will make it tougher for Google to get information on these companies.

    A better approach for Google would be to require all of the necessary information by default in order to get listed.

  20. Hi Max,

    I don’t believe that Google Maps was ever intended to be a directory of businesses, but rather a tool for people to use to search locally, for businesses, schools, parks, churches, and other organizations. Limiting it to businesses that have web sites, or to organizations that submitted information about themselves would possibly leave some substantial gaps in local search results.

    I think the idea of starting out with data from sources like telecommunications companies and directories, and adding to that data with information found on the Web, and information submitted by the owners of businesses and other organizations presents a significant challenge, but the results could possibly be very useful.
