Google Patents Identifying User Location Spam

Google collects information about where you compute from, and provides location based services based upon where you travel. Google might follow processes both to protect that information and to analyze it, so that it can also be used to protect people from spam and scrapers.

Post a review from Germany about a restaurant, and then 15 minutes later from Hawaii about another restaurant, it’s spam. Drive down a highway where the cell towers collecting information about your journey are located in the middle of Lake Michigan, it’s likely spam. If GPS says you’re in NYC, and you then connect via Wifi in Wisconsin a few minutes later, spam. This information may not even come from you, but rather from others that might impersonate you.
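The Germany/Hawaii and NYC/Wisconsin examples come down to one test: could a real person have moved between two reported positions in the time between them? A minimal sketch of that check follows. It assumes great-circle (haversine) distance and a hypothetical ceiling of 1,000 km/h; the patent specifies neither, and `Observation`, `is_impossible_travel`, and the threshold are names invented for illustration.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0
# Hypothetical ceiling, roughly airliner cruising speed; not from the patent.
MAX_PLAUSIBLE_KMH = 1000.0


@dataclass
class Observation:
    lat: float          # degrees
    lon: float          # degrees
    timestamp_s: float  # seconds since the epoch


def haversine_km(a: Observation, b: Observation) -> float:
    """Great-circle distance between two observations, in kilometres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def is_impossible_travel(a: Observation, b: Observation,
                         max_kmh: float = MAX_PLAUSIBLE_KMH) -> bool:
    """True if getting from a to b would require moving faster than max_kmh."""
    hours = abs(b.timestamp_s - a.timestamp_s) / 3600.0
    distance = haversine_km(a, b)
    if hours == 0:
        # Same instant in two places; allow 1 km (hypothetical) for sensor noise.
        return distance > 1.0
    return distance / hours > max_kmh


# Usage: a review from Berlin, then one from Honolulu 15 minutes later.
berlin = Observation(52.52, 13.405, 0)
honolulu = Observation(21.3069, -157.8583, 15 * 60)
print(is_impossible_travel(berlin, honolulu))  # True
```

With these numbers the Berlin and Honolulu reviews imply a speed in the tens of thousands of km/h, so the pair is flagged no matter where the tolerance is set within reason.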

A Union soldier pointing out a location on a map of Virginia.

Google was granted a patent last week that explores how it could use location based data to identify spammers and scrapers. The process would also put user location information into a quarantine, to explore whether or not the information is spam, and might hide the starting and/or ending points of journeys recorded by mobile devices to protect the privacy of users. The search engine could use that detailed information about locations to keep some information from being used in location based services, or in other services that Google might offer.

In my last post, Google Patent Granted on Mobile Location Detection, I described how Google might use a combination of GPS, mechanical electronic devices such as gyroscopes and accelerometers, and algorithms to pinpoint locations more accurately and at lower cost than GPS alone. I also described many of the location based services Google offers that rely upon such technology.

Google’s Mobile Maps page describes some of those. The patent tells us that Google might also rely upon location for desktop services, and that it might consider desktop location information when collecting location information for the processes described in this patent.

I’ve written a lot of posts about Web spam in the past, but this is the first patent from Google I can recall that uses the locations people report to Google to help the search engine check whether that information is real, legitimate, and trustworthy. On a Web that will provide more information through mobile devices such as Android phones, and wearable computing such as Google Glass, it’s not a surprise to see this patent.

Given the amount of work and effort that Google has put into Google Maps, and services that fall into Social/Local/Mobile categories, it makes sense for the search engine to put into place methods to verify user location information.

A flowchart from the patent showing data collected from phone, laptop, and another mobile device for location based services going into a quarantine server and then a long term server.

The patent also describes methods that the search engine might use to protect that information from others, and to remove personally identifiable information as well. Location based information might be stored initially in a quarantine server, and one or both endpoints of a journey (when one is involved) might be removed from that data.
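Removing the endpoints of a journey might look something like the sketch below. It assumes a journey is an ordered list of (latitude, longitude) points and hides every point within a hypothetical radius of the start and/or the end. The patent says endpoints might be removed but not how, so `strip_endpoints` and its 0.5 km default are illustrative assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(p: Point, q: Point) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(h)))


def strip_endpoints(journey: List[Point], trim_km: float = 0.5,
                    trim_start: bool = True, trim_end: bool = True) -> List[Point]:
    """Drop every point within trim_km of the journey's first and/or last point."""
    if not journey:
        return []
    start, end = journey[0], journey[-1]
    kept = []
    for p in journey:
        if trim_start and haversine_km(p, start) <= trim_km:
            continue  # too close to where the trip began
        if trim_end and haversine_km(p, end) <= trim_km:
            continue  # too close to where the trip ended
        kept.append(p)
    return kept


# Usage: a trip heading north in 0.01-degree steps (about 1.1 km each).
journey = [(40.0 + 0.01 * i, -75.0) for i in range(10)]
print(len(strip_endpoints(journey)))  # 8
```

Trimming by radius rather than by a fixed number of points means a slow trip that lingers near home loses all of those lingering points, not just the first one.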

Challenges – Privacy and Automated Spammers and Scrapers

While location services can be really helpful to people who use them to navigate, to find local businesses, and to potentially interact with others whom they know, there are also privacy issues that the people providing such services have to contend with. I’m happy to see this particular patent address some of those.

In providing such services, storing that information in a way that avoids connecting an individual with the locations they visit is likely something that we all want. It’s just as important to make sure that the location data collected and analyzed is legitimate and comes from real people, not from third party automated programs that might try to gain access to database information, or compromise such a system with faked user location information.

The patent tells us that there’s a balance that needs to be maintained in collecting location information, and in protecting that information.

Thus, with regard to how much and for how long user location information should be collected and stored, there can be tension between the business utility of the information and user privacy.

On one hand, the more data collected, the more interesting and useful business uses created for users. As such, collecting comprehensive user location data can help create better products and services for end users.

On the other hand, due to the sensitive nature of the location data and user privacy concerns, removing personally identifiable information and/or storing less information may be desirable as well. Therefore, collection and retention of user location data can be a delicate trade-off, often with no single correct solution to the problem.

The patent describes in detail how such location information might be quarantined, and how it might be analyzed to identify spammers and scrapers. Starting points and/or ending points of journeys might also be removed from such data.

For instance, Google uses location data to estimate traffic on roadways, and to alert drivers to congestion ahead of them. It has no reason to record the start points or end points of any of the travelers that it collects speed or congestion data from, or their identities. But it does want to make sure such information is correct.

While a user ID associated with location information used in a location based service would likely be encrypted, a user reputation score might be created based upon that data. Someone who appears to be traveling down a roadway beyond a “travel tolerance,” faster than any reasonable driving speed, may be a spammer, and their reputation score might reflect that.
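As a rough illustration of how a “travel tolerance” might feed a reputation score, the sketch below flags travel segments faster than a speed limit multiplied by a tolerance factor, turns the flagged fraction into a spam score, and blends that score into a running reputation. The 130 km/h limit, the 1.25 tolerance, the blending weight, and the function names are all hypothetical; the patent does not publish a formula like this.

```python
from typing import Iterable

# Hypothetical values for illustration; the patent publishes no formula.
ROAD_SPEED_LIMIT_KMH = 130.0
TRAVEL_TOLERANCE = 1.25  # allow 25% over the limit for GPS noise


def spam_score(segment_speeds_kmh: Iterable[float],
               limit_kmh: float = ROAD_SPEED_LIMIT_KMH,
               tolerance: float = TRAVEL_TOLERANCE) -> float:
    """Fraction of travel segments faster than limit_kmh * tolerance (0.0 = clean)."""
    speeds = list(segment_speeds_kmh)
    if not speeds:
        return 0.0
    flagged = sum(1 for v in speeds if v > limit_kmh * tolerance)
    return flagged / len(speeds)


def update_reputation(previous: float, score: float, weight: float = 0.2) -> float:
    """Blend new evidence into a 0..1 reputation, where 1.0 means fully trusted."""
    return (1 - weight) * previous + weight * (1.0 - score)


# Usage: two of four segments imply 400+ km/h on a highway.
rep = update_reputation(1.0, spam_score([100, 400, 500, 90]))
print(round(rep, 2))  # 0.9
```

Blending with a moving average, rather than overwriting the reputation, means a single glitchy trip lowers trust a little while a long pattern of impossible trips drives it down steadily.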

The patent is:

User location reputation system
Invented by Yan Yu, Sam Liang, Michael Chu, Yuhua Luo, and Zhengrong Ji
Assigned to Google
US Patent 8,370,340
Granted February 5, 2013
Filed: March 26, 2012

Abstract

A computer-implemented method and system of building a user reputation database for use in a user location data system. The method and system receive user location information containing personally identifiable data of a user and user position data. The user position data may or may not represent one or more actual geographic positions of the user.

The user location information is temporarily stored and analyzed to provide a spam score associated with the user position data indicative of whether the user position data represents the actual geographic positions of the user. Data indicative of the spam score is also provided to user reputation database to store a user reputation score associated with the user.
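The flow in the abstract (temporary storage, a spam score, a reputation database, and long term storage that no longer carries user identifiers) can be modeled in a few lines. This is a sketch under stated assumptions, not Google’s implementation: `QuarantineServer`, the pluggable `detector` function, and the 0.5 release threshold are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Position = Tuple[float, float, float]  # (lat, lon, timestamp_s)


@dataclass
class QuarantineServer:
    """Hypothetical model of the flow in the patent abstract."""
    detector: Callable[[List[Position]], float]  # returns a spam score in [0, 1]
    release_threshold: float = 0.5               # hypothetical cut-off
    pending: Dict[str, List[Position]] = field(default_factory=dict)
    reputation_db: Dict[str, float] = field(default_factory=dict)
    long_term_store: List[Tuple[Position, float]] = field(default_factory=list)

    def receive(self, user_id: str, position: Position) -> None:
        # Raw data is held temporarily, still keyed by the user identifier.
        self.pending.setdefault(user_id, []).append(position)

    def flush(self) -> None:
        for user_id, positions in self.pending.items():
            score = self.detector(positions)
            # The identifier and its score go only to the reputation database.
            self.reputation_db[user_id] = score
            if score < self.release_threshold:
                # Released points keep their score but not the user identifier.
                self.long_term_store.extend((p, score) for p in positions)
            # Points at or above the threshold are simply dropped.
        self.pending.clear()


# Usage with a toy detector that flags any user sending more than three points.
server = QuarantineServer(detector=lambda ps: 1.0 if len(ps) > 3 else 0.0)
server.receive("alice", (38.9, -77.0, 0.0))
for t in range(5):
    server.receive("bot-17", (42.3, -83.0, float(t)))
server.flush()
print(server.reputation_db)  # {'alice': 0.0, 'bot-17': 1.0}
```

Keeping the detector pluggable separates the storage rules (what is kept, and with or without an identifier) from the question of how spam is recognized.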

The patent describes a number of possible fingerprints that might indicate that information comes from a spammer or scraper instead of a legitimate user. These might include things like the rate at which information is uploaded or downloaded from different sources, whether or not there are matches between location information collected by GPS and Wifi, and more. Such information may never escape from a quarantine server into long term data storage.
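Two of the fingerprints mentioned, disagreement between GPS and Wifi locations and an unusually high upload rate, could be checked roughly as follows. The 5 km gap, the 30-uploads-per-minute ceiling, and the function names are hypothetical values chosen for the example, not figures taken from the patent.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees

# Hypothetical thresholds, not figures taken from the patent.
MAX_GPS_WIFI_GAP_KM = 5.0
MAX_UPLOADS_PER_MINUTE = 30


def haversine_km(p: Point, q: Point) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(h)))


def gps_wifi_mismatch(gps: Point, wifi: Point,
                      max_gap_km: float = MAX_GPS_WIFI_GAP_KM) -> bool:
    """True if the GPS fix and the Wifi-derived location disagree by too much."""
    return haversine_km(gps, wifi) > max_gap_km


def upload_rate_too_high(timestamps_s: List[float],
                         max_per_minute: int = MAX_UPLOADS_PER_MINUTE) -> bool:
    """True if any 60-second sliding window holds more than max_per_minute uploads."""
    ts = sorted(timestamps_s)
    start = 0
    for end in range(len(ts)):
        # Shrink the window until it spans less than 60 seconds.
        while ts[end] - ts[start] >= 60:
            start += 1
        if end - start + 1 > max_per_minute:
            return True
    return False


def fingerprint_flags(gps: Point, wifi: Point, timestamps_s: List[float]) -> List[str]:
    """Collect the names of every suspicious fingerprint for one session."""
    flags = []
    if gps_wifi_mismatch(gps, wifi):
        flags.append("gps_wifi_mismatch")
    if upload_rate_too_high(timestamps_s):
        flags.append("upload_rate")
    return flags


# Usage: GPS says New York City while Wifi places the device in Wisconsin.
nyc, madison = (40.7128, -74.0060), (43.0731, -89.4012)
burst = [i * 0.5 for i in range(100)]  # two uploads per second
print(fingerprint_flags(nyc, madison, burst))  # ['gps_wifi_mismatch', 'upload_rate']
```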

A flowchart from the patent showing data in a quarantine server, through spam detection analysis and a defense server where a user reputation score is calculated.

While the patent describes a lot of details on how it might identify information entered into this system by spammers or scrapers, and how it might assign reputation scores to users indicating how much to trust that information, what’s really important here is that a search engine can look at a lot of offline signals to identify efforts to spam or scrape location based information.

It makes a lot of sense for search engines to explore how such signals might be used to identify attempts to manipulate location based services, while protecting the identities of people using such services.


26 thoughts on “Google Patents Identifying User Location Spam”

  1. This patent, and what is being done in local search to fight location spam, is very exciting.

    Another great read, Bill.

    Trond

  2. Hi Trond,

    Thanks. This does seem to indicate a pretty big move on Google’s part, to collect data about how and where user information is collected.

    Given the growth of the mobile Web, and Google’s work on things like Google Glass, it makes sense for Google to start looking at signals like these. It’s going to be interesting to see how this might impact the Web.

  3. Hi Andrew,

    I guess I should have just waited for your blog post on this rather than spending the hours I did over the past few days going through the patent and writing a post on it. :)

    So how did you rank a user reputation score based upon their content in the quarantine server? :)

  4. I think I like the idea of Google doing this. I mean, the internet is so full of people trying to spam everything just to benefit themselves. If it were not for those people in the first place, the internet would still be a reliable source for anything you searched for, instead of a slew of useless information. I just hate that you can’t trust the first thing that pops up on Google; you have to spend the time to see if there are 3 or 4 articles along the same lines to know whether you are getting the truth. I don’t know if this is going to help with all of that, but maybe it will.

  5. What if I don’t share my location and keep it private? Could it still be possible to detect my location? Can spammers still access my info if I made it private?

  6. This. Is. Awesome! I love Google; its constant evolution absolutely amazes me. This anti-spam move is welcome news to me as a small business owner who has to compete with rivals who base their SEO on spamming, etc.

    It all does feel a bit Orwellian though doesn’t it?!

  7. Thanks for yet another fascinating read, Bill (and Google).

    I’d be really interested in knowing what opt-out mechanism (if any) you’d say Google might be brewing up. It doesn’t sound as though being signed into / out of one’s Google account plays into the tracking of location data, although I certainly could be missing something. I would think G+ accounts would be important – or at least useful – to assigning a “user reputation.”

    On another note, wow, I can’t believe Google is self-imposing restrictions on what it can and cannot do with our location data. Boy, could that data be mined for some green ad dollars. You sure this is the same Google we’re talking about here? :)

  8. Does this include those “Click here to share your location” popups you get on sites like Twitter? I’m not a fan of sharing my location with the world, but Google seems to be putting more and more emphasis on removing anonymity and rewarding users/sites for doing so (as evidenced by the power of authorship). As you mentioned, hopefully they really focus on user privacy concerns and minimize the transfer of personal information; otherwise people who don’t like to give up personal info will be at a disadvantage.

    Cheers,
    Oleg

  9. Hi Bill
    A good read, thanks.
    Just a thought, but won’t the spammers be able to carry on just by using something like an IP hider?
    I have never used one but I have seen them advertised, and I would guess that as Google becomes more efficient at stopping the spam, so the spammers themselves would develop ever more sophisticated software.
    Regards

  10. I’ve noticed YouTube trying to force me to merge the user name I picked randomly aeons ago with one of my Gmail accounts. I’m not sure I like being forced to use just one name for all services. At the moment it gives you a questionnaire, so they seem to be gathering feedback.
    We are a couple of steps away from having local police knock on the door for having views against the incumbent local government. Scary times and most people are oblivious to it.

  11. Google never ceases to amaze me with its ever changing ways to combat spam. As a local business owner I take my hat off to Google; as a spammer I would be shaking in my boots.

  12. It’s nice to see Google constantly evolving with its algorithm updates mainly to combat spam on many of its properties on desktop and mobile.

  13. Hi Neil,

    If they were a spammer or scraper instead of a legitimate owner of an account, the anonymous IP address from the IP hider would stand out and be different from the owner’s (who might also be using the same account, from a completely different location and IP address), so the information they added would likely remain in quarantine.

    If they were a legitimate account holder, but their IP address varied greatly each time they logged in, that would be pretty suspicious as well. If they used a mobile device that didn’t match up with the location they connected from, that might also be an issue.

  14. Thanks for such a detailed article Bill,

    Google recently announced improvements in local business reviews and in identifying spammers. I am excited to see the correlation between this announcement and this patent. One can easily predict Google’s next step by monitoring its patents.

    Search is getting purer day by day!

  15. Just to clarify, Google would check against historical/stored GPS data to verify the reviewer had been at/near the location they’re reviewing?

    What if I leave my phone at home when I go to the UK, and upon my return want to review restaurants I’d been to?

  16. Hi Sharon,

    It’s not just about reviews, even though I included an example about reviews.

    A little different than that. Google would collect location information associated with your phone or computing device, and include that in a quarantine server where it would be analyzed and anonymized. If it sees data that looks questionable, like two sets of data it collected that aren’t reasonable, it might question that data. For instance, if you connect via your account to a location-based service like navigation in Virginia, and then use another location-based service minutes later in Michigan, something is definitely wrong.

    If you visit the UK, and then return home and use your phone to write a review about a place you ate at in the UK, that isn’t a problem. If your account connects from the UK to write a review, and then five minutes later your account writes another review from New Jersey or India or Moscow, that is a problem. It’s likely that a spammer may be using your account to write one or the other of those reviews. It doesn’t matter whether or not you are presently near the place that you are writing a review about.

  17. Thanks Bill!

    That does make sense, and given the history of “reviewers” with 200 reviews in 20 cities in a one month window it sounds like a really good way to detect that spam.

  18. It seems to me that this method can, in real conditions, detect only a very small percentage of spam if it wants to avoid too many false positives. And I am quite sure spammers will develop techniques to overcome this; for example, how would this method deal with proxy servers?

  19. So if you’re a large company with multiple locations, and wish to centralize your marketing, does this not mean Google will assume all your locations in Places, for example, are fake? Bill?

  20. Hi Michael,

    If “centralizing your marketing” means having multiple people logging into the same account from different locations at the same time, or at substantially the same time, that would likely lead Google to consider most of those messages spam, especially if that involves doing things like writing fake reviews of a business from such an account. I would expect other search engines and social sites that might offer local/mobile/social services to do something similar.

    As the first claim in the patent notes:

    A computer-implemented method of building a user reputation database, the method comprising: receiving, at a quarantine server, user location information for a plurality of users, the user location information for each particular user of the plurality of users including a user identifier and position data for the particular user, the position data for the particular user being associated with timestamp information, and wherein the position data for the particular user represents one or more actual geographic positions of a client device associated with the particular user; temporarily storing the user location information in a quarantine data storage database; for each particular user, assigning the particular user’s user location information to one of a plurality of predetermined time-slots based on the timestamp information while maintaining a relative order and a temporal distance of the position data for the particular user, each pre-determined time slot of the plurality of pre-determined time slots being associated with a predetermined expiration; analyzing, by the quarantine server during the predetermined quarantine time, the user location information assigned to the plurality of pre-determined time-slots to determine if any position data is indicative of spam and to assign a spam score to each user identifier; storing the user identifiers with the associated spam scores; and filtering any position data indicative of spam so that the position data indicative of spam is not stored until the predetermined expiration, wherein: before filtering, providing data indicative of the spam scores and the user identifier to a user reputation database; and after filtering and before the predetermined expirations, providing data indicative of the spam scores and the associated user position data without the user identifiers to a filtered data storage system for long term storage.

    I’m not being “defensive” of Google. I’m just writing about what seems to be a fairly reasonable way that a business (whether Google, Bing, Facebook, Twitter, or any other that collects information about people who log in to use their services) might look at location based signals to keep people from manipulating it.
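The time-slot step in the claim quoted above (assigning location data to predetermined time slots while keeping the relative order and temporal distance of the points, with each slot carrying an expiration) might be sketched like this. The reading used here is an assumption: coarsen each trace’s absolute start time to the start of its slot and keep only the offsets between points. The one-hour slot, the one-day quarantine, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical durations; the claim only says the slots are predetermined.
SLOT_SECONDS = 3600          # one-hour slots
QUARANTINE_SECONDS = 86_400  # one day in quarantine after the slot closes


@dataclass
class SlottedTrace:
    slot_start_s: int  # start of the predetermined time slot
    expires_at_s: int  # moment the slot's quarantine expires
    points: List[Tuple[float, float, float]]  # (lat, lon, seconds after first point)


def assign_to_slot(points: List[Tuple[float, float, float]]) -> SlottedTrace:
    """Map (lat, lon, absolute_timestamp_s) points to a slot-relative trace.

    The absolute time is coarsened to the slot start, while the order of the
    points and the gaps between them are preserved as offsets.
    """
    if not points:
        raise ValueError("need at least one point")
    ordered = sorted(points, key=lambda p: p[2])
    t0 = ordered[0][2]
    slot_start = int(t0 // SLOT_SECONDS) * SLOT_SECONDS
    relative = [(lat, lon, t - t0) for lat, lon, t in ordered]
    return SlottedTrace(slot_start, slot_start + SLOT_SECONDS + QUARANTINE_SECONDS, relative)


def is_expired(trace: SlottedTrace, now_s: float) -> bool:
    """True once the trace's quarantine window has closed."""
    return now_s >= trace.expires_at_s


trace = assign_to_slot([(1.0, 1.0, 7300.0), (2.0, 2.0, 7200.0), (3.0, 3.0, 7260.0)])
print(trace.slot_start_s, [p[2] for p in trace.points])  # 7200 [0.0, 60.0, 100.0]
```

Under this reading the analysis can still measure speeds between points, since the gaps survive, while the stored data no longer says exactly when the trip began.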

Comments are closed.