Google’s Legitimacy Ratings: Should a Search Engine Determine the Legitimacy of Advertisers and Content Providers?

When You Go Shopping, Who Do You Trust?

In the late 90s, my sister called to tell me that she, my brother-in-law, and my parents were going shopping for large-screen projection TVs. I suggested that she do some searching around on Usenet for information and complaints about popular recent models.

Calling a couple of days later, she described what happened at one of the stores on their TV shopping trip. The salesman showed them a model they all liked, and then dragged them over to a more expensive model. My sister innocently asked why they should want the pricier model over the earlier one, and the salesman started answering her as if she were a dumb blonde.

She told me she smiled, and then explained to him why they weren’t interested in the newer model – the company that had been producing it had been purchased by another company which cut costs by switching from three internal projectors to one, and raised the prices of that model in addition. The first model was a higher quality television at a lower price. She said that his jaw dropped a few inches, and he quickly stopped extolling the virtues of that model while ushering them back to the television that they liked.

Usenet isn’t always helpful in finding information about consumer products, but sometimes it’s a goldmine.

The Problems with Recommendations and Reviews

Yesterday, I wrote about The Growing Power of Online Reviews, and how mainstream media is recognizing the ability of review sites and customer reviews to have an impact upon the success of a business found through those sites or on a local search from a major search engine.

I received a couple of excellent comments in response to the post, which mentioned the possibility that sabotage through false or misleading reviews, possibly from competitors, could harm businesses with good reputations. That is a real possibility, especially on sites that allow anonymous reviews.

Should Search Engines Provide Reviews of Sites Found in Organic and Paid Search?

I’m going to set up a couple of scenarios to illustrate a process that might help people decide how reliable the information they find about businesses online really is – a process described in a new patent filing from Google.

Scenario One – Legitimate Search Results

You go to a search engine and perform a search by entering a query, receiving a set of results in response. The links to pages include a page title, a snippet of text about the page, a link to a cached copy of the page, another link to “similar pages,” and something that you can hover over which shows “legitimacy information” about the site.

That information would include things like how long the site has been online, and review information from previous visitors to the site. It also allows you to find out about other domains which may offer similar services, goods, or information, along with a legitimacy score for those.

Scenario Two – Advertising through Paid Search with Legitimacy Ratings

You set up an account to advertise your goods or services online. You choose keywords to advertise with, create ads, and build quality-filled persuasive landing pages. You get the account started, and your ads begin showing in search results. Underneath your ad is a link for people to get a “legitimacy score for this web site.”

They can click on it, and find out information about you, possibly how much you’ve been spending on advertising online, and for how long, what other people think of your site and your business, and comparison information about other businesses that are bidding on the same or similar keywords.

Google’s Patent Application on Legitimacy Through History and Transaction Volume

I started this post with the story about my sister’s shopping experience because something about the following patent application reminded me of it. The salesman had more incentive to sell an inferior product because he made more money from the transaction. In this patent filing, it appears that a site with a longer spending relationship, which spends more money on advertising, would have a higher legitimacy score than one with a shorter history and a smaller account.

Providing history and transaction volume information of a content source to users
Invented by Johnny Chen and Mohit Aron
Assigned to Google, Inc.
US Patent Application 20060200445
Published September 7, 2006
Filed on March 3, 2005


A computer-implemented system and method for providing a legitimacy rating of a content source are provided. A request for a document is received. An electronic document associated with a content source is passed by a document provider in response to the request. A legitimacy rating of the content source is passed. Examples of legitimacy rating information include, for example, a history rating of the content source based on the length of time the document provider has published documents associated with the content source and a transaction volume rating of the content source based on the number of electronic documents associated with the content source that are passed by the document provider.

Here’s a list of some of the legitimacy information that may be provided to a searcher when confronted with the chance to see that information for a paid search result:

(1) Advertiser’s transaction volume compared to the transaction volume of comparison advertisers (e.g., advertisers in the same industry, or who bid on the same or similar keywords);

(2) Advertiser’s transaction volume compared to all advertisers (or all content sources of a particular type);

(3) How often a user has selected the document;

(4) How many times a user purchased from the advertiser after choosing the document;

(5) The ratio of user clicks / user purchases;

(6) A comparison of that ratio compared to similar advertisers;

(7) Average buyer purchases;

(8) Total value of goods purchased from the advertiser from people who selected the advertisement;

(9) Amount paid by the advertiser to the ad distributor who passed the advertisement to users;

(10) Start date of the ad, or any ad from the advertiser and how long the advertiser has been advertising;

(11) A score based upon that length of time compared to similar advertisers, or all advertisers;

(12) When the advertisement, or the advertiser’s advertisements, were selected a certain number of times;

(13) Number of rating users;

(14) User approval score – perhaps based upon a percentage of rating users who approve of the advertiser within a certain period of time;

(15) Advertiser’s industry, or other identifying information such as an identification of the keywords bid;

(16) The industries for which users have approved the advertiser;

(17) Geographical areas associated with the advertiser (geographic locations where the ad has been published, or the location of the advertiser’s headquarters);

(18) Number or percentage of user complaints;

(19) Appropriateness ratings along a variety of criteria.
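To make the idea concrete, here’s a minimal sketch of how a handful of these signals might be combined into a single score. The weights, the capping, and the 0–100 scale are my own assumptions for illustration – the patent filing doesn’t specify a formula:

```python
# Hypothetical sketch only: the patent describes history, transaction-volume,
# and user-approval signals, but the weights and normalization below are
# invented for this example.

def legitimacy_score(days_advertising, peer_median_days,
                     transactions, peer_median_transactions,
                     approval_ratio):
    """Return a 0-100 score from history, volume, and user-approval signals."""
    # History rating: advertiser tenure relative to comparable advertisers
    # (signals 10-11), capped at 1.0 so very old accounts don't dominate.
    history = min(days_advertising / peer_median_days, 1.0)
    # Transaction-volume rating relative to peers (signals 1-2), also capped.
    volume = min(transactions / peer_median_transactions, 1.0)
    # approval_ratio: fraction of rating users who approve (signal 14).
    weights = {"history": 0.4, "volume": 0.4, "approval": 0.2}
    raw = (weights["history"] * history
           + weights["volume"] * volume
           + weights["approval"] * approval_ratio)
    return round(raw * 100, 1)

# An advertiser with twice the peer-median tenure, half the peer-median
# transaction volume, and 90% user approval:
print(legitimacy_score(730, 365, 500, 1000, 0.9))  # prints 78.0
```

Note how this construction bakes in exactly the concern raised later in the post: tenure and spend-driven volume together account for most of the score.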


I understand the impetus behind this patent application – people engaging in phishing and other illegal activities are growing more sophisticated, and it’s getting harder to recognize a potential scam as a result. Helpful and legitimate review organizations like Consumer Reports can only do so much while the web increases dramatically in size. There’s also no telling who penned the anonymous reviews on review sites.

The search engine is at the center of much ecommerce, and may be in an ideal place to provide information about a business, like the legitimacy information described in the patent.

But using criteria that benefit long-term advertisers over new ones, and that rate bigger spenders higher than businesses with smaller advertising budgets, seems like it could place the search engine in a conflict of interest, where the consumer is the one who ends up being harmed.

12 thoughts on “Google’s Legitimacy Ratings: Should a Search Engine Determine the Legitimacy of Advertisers and Content Providers?”

  1. I agree with most of the points about the information that can be gathered and used, but it seems to me it is more useful as an algo to assess landing page quality rather than the legitimacy of reviews, and could be very useful in both cases. The problem of advantages to larger ad budgets can be smoothed. The problem of not all PPC advertisers providing conversion data would skew it more and provide definite advantages to those who provide conversion data. Which I assume wouldn’t sit well with many SEMs. 😉

  2. I’m working today and just spoke w/five potential customers. Two of them knew one former customer and a third knew 3 different former customers.

    All said nice stuff about us. The stuff I was telling them jibed totally with what the first two people had heard from the one former customer.

    We weren’t bsing.

    Those three told me they will buy.

    The power of word of mouth is awesome.

    Now potentially developing a form of “legitimacy” based on payment schedules to G has nothing to do w/true legitimacy.

    This type of “community” commentary will flourish on the web. I’d rather hope that Google would work toward real legitimate “legitimacy” than something based on $.

    Yet we will have to work on coping with it….or a different system will arise and supplant G.

    You can fool some of the people all of the time and all of the people some of the time but you can’t fool all the people all the time.


  3. Now potentially developing a form of “legitimacy” based on payment schedules to G has nothing to do w/true legitimacy.

    That was the part of this patent application that really rubbed me the wrong way, in a fairly big way. Legitimacy based upon the advertiser’s relationship with the Search Engine, with the amount of their spend, and the duration of their relationship being indications of a legitimate business?

    The problem of advantages to larger ad budgets can be smoothed. The problem of not all PPC advertisers providing conversion data would skew it more and provide definite advantages to those who provide conversion data. Which I assume wouldn’t sit well with many SEMs.

    Good to see you here, Terry.

    I agree that the volume and transaction histories can be smoothed, but the choice of a title for the patent application seems to indicate that those are the focal points of the document.

    And I know many SEMs who refuse to use Google Analytics because they believe that it’s none of Google’s business. I’m in agreement.

    A good strong business model shouldn’t have to rely upon another business like Google for so much – delivering organic traffic, telling people that a site is more legitimate than others because it has been spending more money longer on paid search, and tracking and interpreting results for that business.

    This isn’t a good direction for Google to travel down, and I hope that the ideas expressed in this patent are buried somewhere dark for a very long time. I think Google inserts itself too much into the middle of things with this one. The reason people trust Consumer Reports, or even the random folks on Usenet complaining about some consumer product, is that they have no economic incentive to recommend one thing over another.

    Yet here we see “legitimacy” based upon ad spends. There’s something not quite right there.

  4. We’re building a system, SiteTruth, to address the legitimacy issue. It’s based on real-world business identity. We try to tie each web site to the real-world business behind it, then check out the real-world business. If we can’t associate a name and address with the web site, we don’t consider it a legitimate business. This is consistent with the law in several major jurisdictions.

    Name and address information is obtained from SSL certificates, addresses on the web site itself, various business directories, and seals of approval from organizations like the Better Business Bureau Online. The basic question we’re answering is, “If you needed to sue this business, could you find them?”

    We rate sites accordingly, and crank that into the search engine results, displaying a rating icon for each search result. We have this working with Google and Yahoo search.

    This is an automated process, performed on demand the first time a site comes up in a search result. So coverage is broad, unlike manual rating systems, and real-time, unlike web crawler-based systems.

    It’s reasonably effective now, and we’re working on making it better. We’re in alpha test. Comments on the site would be appreciated.

  5. Hi John,

    I do see some issues with your ratings/rankings. My business, for instance, is a real business, legally incorporated in the State of Delaware, with a bank account, clients, and a location. It scores a “not rated” under your system, and there seems to be a problem extracting the business location, which appears on every page of the site.

    I have no need for SSL since I don’t conduct ecommerce transactions online, and your rankings based upon that criterion make my site look illegitimate. I’ve reviewed the costs and benefits of online Better Business Bureau membership, and I question the value of membership, and its use as a ranking criterion. What percentage of Web sites have BBB seals? I’d imagine that it is pretty small. What percentage of business sites have SSL on their pages? What do you do with sites that use third-party shopping systems that use SSL on different domains along with their shopping cart?

    If we can’t associate a name and address with the web site, we don’t consider it a legitimate business. This is consistent with the law in several major jurisdictions.

    Perhaps it should be consistent with the jurisdiction where the business is located, and not just “several major jurisdictions”?

    Relying upon DMOZ information is questionable, too. The directory isn’t clearly delineated into commercial/noncommercial listings, for one thing. For another, editors will only list a limited number of sites in each category, and will not list many sites that could easily be listed, based upon not wanting to list too many sites that are involved in the same purpose. Another issue is that you are using data collected for one purpose to be used for another purpose, which isn’t always a safe practice.

  6. I see the problem. We don’t recognize

    Bill Slawski – Newark, Delaware 19711 – SEO by the SEA uses WordPress and has had (Stats Disabled) unique visitors

    as a valid mailing address.

    Neither would the Postal Service.

    All we ask is an address that would work on a mailing envelope.

  7. Hi John,

    That’s a good point. In response, I would say that it’s unlikely that I would write that on an envelope and mail it. 🙂

    The address does appear elsewhere on every page of the site – the middle column. The way it is presented would probably work fine upon an envelope:

    Postal Info
    Bill Slawski
    28 Choate Street
    Newark Delaware
    19711 USA

    Please don’t take the points that I’ve raised in the comments to this post as an attack. They were meant as constructive criticism. I do appreciate that you’ve kept the dialogue going. Thanks.

  8. Ah. Here’s the problem.

    Bill Slawski
    28 Choate Street
    Newark Delaware
    19711 USA

    isn’t a valid address according to the USPS Addressing Standards, which are very specific about the line with the ZIP code. That’s what their address scanners look for. Try:

    Bill Slawski
    28 Choate Street
    Newark Delaware 19711

    SiteTruth will pick that up. Since we’re extracting addresses from unstructured web pages, we have to have some idea what an address is supposed to look like. So we follow USPS and Universal Postal Union rules. Right now, we recognize valid UK and US addresses, plus some addresses in Universal Postal Union international format.
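    Since we’re matching free text, the heart of it is pattern matching on that “last line.” Here is a toy sketch of that kind of pattern in Python – far simpler than what SiteTruth actually runs, and the city/state/ZIP regex is purely illustrative:

```python
import re

# Illustrative "last line" matcher: city, then a two-letter state
# abbreviation or a capitalized state name, then ZIP or ZIP+4, all on
# one line. Real address extraction handles many more cases than this.
LAST_LINE = re.compile(
    r"^(?P<city>[A-Za-z .'-]+?)[,\s]+"
    r"(?P<state>[A-Z]{2}|[A-Z][a-z]+)\s+"
    r"(?P<zip>\d{5}(?:-\d{4})?)$"
)

for line in ["Newark Delaware 19711", "Newark DE 19711", "19711 USA"]:
    print(line, "->", bool(LAST_LINE.match(line)))
# Newark Delaware 19711 -> True
# Newark DE 19711 -> True
# 19711 USA -> False
```

    The third line shows why the original template failed: a ZIP code split away from its city and state doesn’t look like a last line to a pattern like this, or to a USPS scanner.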

    We currently recheck sites every few weeks, so rerating may take a while. We’ll soon put in a means to request quick rerating, as a courtesy to webmasters.

    SiteTruth is still in alpha test, and these comments are very useful to us. Thanks.

    That’s a technical issue, of course. The more general issue of legitimacy metrics is more difficult. Google’s patent application verges on a “pay for placement” system, which nobody, including the Federal Trade Commission, likes.

    User-provided ratings aren’t all that helpful. If anyone can rate, the rating system will be spammed. Our local video rental store offers a discount if you give them a good rating in Yahoo Local. Even on eBay, where you have to deal with a seller to rate them, there’s much complaint about deceptive ratings. User ratings only work well where the number of raters is much larger than the number of things being rated, as with movies and TV shows. Joe’s Plumbing down the street will never get enough raters to be statistically significant.

  9. Hi John,

    I have changed the format in my template, so we’ll see if the change does get picked up.

    I understand the difficulties of trying to impose structure on semi-structured and unstructured data. Truth is, I’m not mailing the website anywhere, and neither are millions of other site owners, so there’s little impetus for us to follow a strict mailing format as defined by the post office. 🙂 I’m sure that I’m not the only webmaster who hasn’t. By applying physical-world formats and standards to the Web, you’re going to miss addresses that you probably don’t want to miss.

    There is the potential for user ratings to be spammed, but here they are being used as just one of a number of signals that may be looked at to help determine whether the product or service in question is legitimate, or is a scam or part of a phishing attempt. By using multiple sources of information, each of which can be given different weights (possibly even differing from one type of product or service to another), it’s possible that no one signal dominates the others.

  10. I’ve seen the Yahoo Research paper. Microsoft has also worked some in this area, and you can see a little of their efforts in their paper Detecting Online Commercial Intention (OCI).

    Unfortunately, like you say, looking at a domain suffix isn’t really a good indicator because there isn’t a strict requirement that someone registering a .org address use it in a nonprofit or noncommercial setting in the same manner that a .edu is limited to accredited schools. DMOZ categories are also suspect and limited in coverage.

    The unfortunate thing with green/yellow/red indicators is that people won’t usually look past the colors to the information collected, and make a decision for themselves.

  11. Thanks. I forced a rerating, and SiteTruth picked up your address immediately. If you click on the rating icon, you can see the address information. It’s now rated as “No rating”, because, based on Open Directory information, you’re probably not selling anything. This places the search result below “green checkmark” sites where we have third-party verification of identity, on a par with “yellow question mark” sites, and above “red do-not-enter” sites.

    We try to rate only sites with “commercial intent”. Right now, in alpha test, we look at the domain suffix (“.com” is usually commercial, “.org” usually isn’t), check the Open Directory category, and check for advertising links. This is simple and understandable, but not as accurate as we’d like. Blogs with ads are usually treated as commercial sites, and tend to get low ratings. “Non-commercial” sites are never rated “red do-not-enter”, unless we find negative information about them, like a phishing report in PhishTank.

    Yahoo Research has done some work in this area (see Web Spam Detection via Commercial Intent Analysis). They trained an ordinary Bayesian spam filter on web pages, with “commercial” and “non-commercial” training sets. That’s a promising idea, but has a transparency problem. Right now, everything SiteTruth does is easily explained to webmasters. That’s not true of Bayesian classifiers, which may do the wrong thing for statistical reasons.
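    For readers curious what that approach looks like, here is a toy naive Bayes classifier of the kind described. The two tiny training sets are invented for the example; a real system would train on many thousands of labeled pages:

```python
import math
from collections import Counter

# Toy illustration of naive Bayes "commercial intent" classification:
# count words in labeled example pages, then score new text against each
# class with Laplace smoothing. Training data here is made up.
commercial = ["buy now free shipping add to cart checkout price discount",
              "order today sale best price shop cart shipping"]
noncommercial = ["research paper results method analysis discussion",
                 "community forum discussion volunteer project history"]

def train(docs):
    counts = Counter(w for d in docs for w in d.split())
    return counts, sum(counts.values())

c_counts, c_total = train(commercial)
n_counts, n_total = train(noncommercial)
vocab = set(c_counts) | set(n_counts)

def log_prob(words, counts, total):
    # Laplace-smoothed log-likelihood of the words under one class.
    return sum(math.log((counts[w] + 1) / (total + len(vocab)))
               for w in words)

def is_commercial(text):
    words = [w for w in text.lower().split() if w in vocab]
    return log_prob(words, c_counts, c_total) > log_prob(words, n_counts, n_total)

print(is_commercial("free shipping checkout now order today"))   # prints True
print(is_commercial("research paper analysis discussion"))       # prints False
```

    The transparency problem is visible even at this scale: the verdict comes from summed log-probabilities over the whole vocabulary, which is much harder to explain to a webmaster than “we found no mailing address.”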

  12. Yes, most users won’t look beyond the icons. The details are provided mainly for webmasters.

    We post on SEO-related blogs because we’re working to make proper disclosure of business identity part of search engine optimization. We want webmasters and people in the SEO field to understand what SiteTruth is doing, and have a chance to get ready for it, before SiteTruth launches in a form much more visible to consumers.

Comments are closed.