How Google May Rank Web Sites Based on Quality Ratings

Google was granted a patent this week that describes how web sites might be given quality ratings, based upon a model that looks at human ratings for a sample set of sites, and web site signals from those sites.

The patent tells us that the advantage of such an approach would be to:

  • Provide greater user satisfaction with search engines
  • Return sites having a quality rating higher than a certain threshold
  • Rank sites appearing in search results based upon quality
  • Identify quality sites without having a human view the site first

This patent was originally filed in 2008, and the use of quality signals sounds similar to what Google has shared with us regarding the Panda Update. It’s more of a search quality “improvement” than a web spam penalty.

The patent uses blogs as a type of site that it can be applied to within its claims and description section. One of the inventors, Christopher C. Pennock, was a Senior Software Engineer on Google Blog Search, according to an early 2009 SMX Session with him which discusses ranking signals in Blog Search.

One aspect of this ranking approach is to have human raters rate the quality of pages of a site (all pages), scoring each on a scale of 1 to 5, with 1 being low and 5 being high quality, and aggregating those together for the site as a whole. Those ratings are augmented with factors from the web site such as:

  • Originality of the arguments or information on the site
  • Amount of original content versus copied content
  • Layout of the site
  • Correctness of grammar and spelling of the text on the web pages
  • Whether obscene or otherwise inappropriate material is presented
  • Whether the websites have blank or incomplete pages
  • Other factors that would affect the quality of the site

These signals are very similar to ones that were published in the Google Webmaster Central blog post, More Guidance on Building High-Quality Sites from May of 2011. The post was aimed at explaining “how Google searches for high-quality sites,” by providing 23 sets of questions that someone at Google might ask themselves as they attempt to “write algorithms that attempt to assess site quality.”

This patent definitely doesn’t explain exactly how Google’s Panda update works, but the concepts behind it are similar in a number of ways. As Google notes in that blog post:

Of course, we aren’t disclosing the actual ranking signals used in our algorithms because we don’t want folks to game our search results; but if you want to step into Google’s mindset, the questions below provide some guidance on how we’ve been looking at the issue

That might be the best way to approach this patent (and many other patents), enabling people to view the issue of presenting quality pages higher in search results from Google’s perspective. The patent is:

Website quality signal generation
Invented by Christopher C. Pennock, Jeremy Hylton, Corinna Cortes
Assigned to Google
United States Patent 8,442,984
Granted May 14, 2013
Filed March 31, 2008

Abstract

Systems and methods relating to website quality rating are disclosed. Websites are rated, relationships between ratings and website signals are identified, models are generated and modeled ratings are assigned to unrated websites by applying the models to the website signals of the unrated websites.
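To make the abstract a little more concrete, here is a minimal sketch of the general idea: learn a relationship between human ratings and a website signal from a rated sample, then apply it to an unrated site. The patent does not say what kind of model is used, so this is an assumption on my part: a plain least-squares line over a single, made-up signal. The function names `fit_line` and `modeled_rating`, the sample numbers, and the clamping to the 1-to-5 scale are all hypothetical.

```python
def fit_line(signals, ratings):
    """Ordinary least squares for one signal: rating ~ slope * signal + intercept.

    Assumption: one signal and a linear relationship. The patent describes
    models over many signals and does not commit to a model type.
    """
    n = len(signals)
    if n != len(ratings) or n < 2:
        raise ValueError("need at least two (signal, rating) pairs of equal length")
    mean_x = sum(signals) / n
    mean_y = sum(ratings) / n
    var_x = sum((x - mean_x) ** 2 for x in signals)
    if var_x == 0:
        raise ValueError("signal values must not all be identical")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(signals, ratings))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def modeled_rating(signal, slope, intercept):
    """Apply the fitted line to an unrated site, clamped to the 1-5 scale.

    The clamping is an illustrative choice, not something the patent states.
    """
    return max(1.0, min(5.0, slope * signal + intercept))


# Hypothetical sample set: one signal value and one human rating per rated site.
rated_signals = [0.0, 0.25, 0.5, 1.0]
human_ratings = [1.0, 2.0, 3.0, 5.0]

slope, intercept = fit_line(rated_signals, human_ratings)
print(modeled_rating(0.75, slope, intercept))  # modeled rating for an unrated site
```

With this made-up sample the fitted line is rating = 4 × signal + 1, so an unrated site with a signal value of 0.75 gets a modeled rating of 4. A real system would presumably combine many signals and a more capable model.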

Other Human Rater Actions

When human raters look at pages, they also perform some actions other than rating pages from 1 to 5.

One of those is to skip over some sites completely when URLs on the site show objectionable content such as spam or pornography or because pages from the site don’t load. These sites might be determined to be “invalid” for rating. In part, this categorization as “invalid” by raters can filter some sites from the rating process because there might be a bias on the part of the rater to negatively rank pages they personally find objectionable.

Another is to select a viewing appeal for the websites.

Broad appeal – if the content of the site appeals to a broad segment of the population such as a website related to high profile national or world news events.

Niche appeal – if the content of the site appeals to a very narrow subset of the population such as a website dedicated to electromagnetism.

The viewing appeal might be used as a factor to rank or filter sites presented in response to a search request. (The patent doesn’t tell us if “viewing appeal” is a positive or negative ranking signal, though.)
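The rater actions described in this section could be captured in a small record per site, something like the sketch below. It is only an illustration: the `SiteReview` layout and the `site_rating` function are hypothetical names, the example domains are placeholders, and averaging the page scores is my assumption, since the patent says the page ratings are aggregated for the site without my having a specific formula to point to.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional


@dataclass
class SiteReview:
    """One rater's output for a site (hypothetical record layout)."""
    site: str
    page_ratings: List[int] = field(default_factory=list)  # 1 (low) to 5 (high)
    invalid: bool = False         # spam, pornography, or pages that fail to load
    appeal: Optional[str] = None  # "broad" or "niche"


def site_rating(review):
    """Return the site's aggregate rating, or None if the site was skipped.

    Assumption: a simple mean of the page ratings.
    """
    if review.invalid or not review.page_ratings:
        return None
    return mean(review.page_ratings)


reviews = [
    SiteReview("news.example", [4, 5, 4], appeal="broad"),
    SiteReview("physics.example", [5, 4, 5, 4], appeal="niche"),
    SiteReview("spam.example", invalid=True),
]

# Sites marked "invalid" never reach the sample set used to build the model.
training_set = {}
for review in reviews:
    rating = site_rating(review)
    if rating is not None:
        training_set[review.site] = rating

print(training_set)
```

Keeping the "invalid" flag separate from the 1-to-5 score mirrors the point above: a site a rater finds objectionable is dropped from the sample rather than given a low rating.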

Applying Quality Signals to Blogs

The patent claims section does call out blogs as the types of sites that would be covered by this patent, but with the removal of a couple of short sentences, those claims could easily be applied to any kind of web site.

It’s quite possible that there’s a very similar patent filed by Google at the USPTO that explores how quality signals could be applied to non-blog sites.

Google specifically points out things like click rate, blog subscription rate, and PageRank scores as web site signals that can be associated with a blog.

Click Rate - There might be two different click rates used here – the first involving how often a URL for the site was clicked upon when it appeared in general search engine results, and the second is the number of times a URL from the site was clicked upon in a blog search. The patent tells us:

The click rate is a blog popularity indicator and therefore a potential quality indicator.

Instead of a raw number of clicks, these click rates might be defined as a ratio of the number of clicks a page receives to the number of times it’s shown in search results. Those may also be normalized based upon the position that the page was at in results, since a page at the top of results will probably be clicked upon more than one at the bottom of results.
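As a rough sketch of what that could look like, the snippet below computes a click rate as clicks divided by impressions, and then divides it by a typical click rate for the position where the page was shown. The patent doesn't spell out how the normalization is done, so dividing by a per-position baseline is my assumption, and the baseline numbers and function names here are invented purely for illustration.

```python
# Hypothetical baseline: typical click rate at each result position.
# These numbers are made up for the example, not taken from the patent.
EXPECTED_CLICK_RATE = {1: 0.30, 2: 0.15, 3: 0.10}


def click_rate(clicks, impressions):
    """Clicks divided by the number of times the URL was shown in results."""
    if impressions <= 0:
        return 0.0
    return clicks / impressions


def normalized_click_rate(clicks, impressions, position, baseline=EXPECTED_CLICK_RATE):
    """Click rate relative to what is typical for that result position.

    A value above 1.0 means the URL attracted more clicks than a typical
    result shown at the same position.
    """
    if position not in baseline:
        raise KeyError(f"no baseline click rate for position {position}")
    return click_rate(clicks, impressions) / baseline[position]


# Same raw click rate (20 clicks per 100 impressions), different positions.
print(normalized_click_rate(20, 100, position=1))  # below the position-1 baseline
print(normalized_click_rate(20, 100, position=3))  # above the position-3 baseline
```

The point of the normalization is visible in the two calls at the end: the same raw click rate looks weak for a page shown first and strong for a page shown third.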

Blog Subscription Rate – Funny to see Google Reader listed as one possible source of information like this, though the patent tells us that Google might extract information like that from other sources as well. The importance of this information is explained here:

Blog subscription rate is indicative of the quality of the blog because it is a measure of readership. A higher readership is indicative of a higher quality blog.

PageRank Score – This score is another signal that might be used for blogs, and it would likely play a very similar role in building a quality rating as it may in ranking other types of pages on the Web.

Take Aways

The patent provides more details on how human ratings and signals from a website might be used to create a model for quality rating that can help determine how pages are ranked in search results, using a machine learning approach to generate ratings for pages based upon the sample set of pages actually rated.

One thing I found really interesting was the patent’s description of when pages might be re-rated, or reclassified.

One possibility might be that pages would be re-rated on a periodic basis. That seems to have been what had been happening with Panda updates.

More interesting, though, is a different option, where re-rating a page or site might be triggered by some pre-defined change in web site signals:

For example, if the PageRank score associated with a website varies by a defined percentage (e.g., 10%) then the process can be triggered to update the model that characterizes the relationships between the website signals and the website quality rating.
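The trigger in that passage is simple enough to sketch. This is only an illustration of the idea: the function name `should_retrigger` is hypothetical, the 10% default comes from the patent's own example, and treating both increases and decreases as triggers is my reading of "varies by".

```python
def should_retrigger(old_score, new_score, threshold=0.10):
    """Return True if a signal moved by at least `threshold` (as a fraction).

    Assumption: the change is measured relative to the old value, and moves
    in either direction count.
    """
    if old_score == 0:
        # Any move away from zero is treated as a change worth a re-check.
        return new_score != 0
    relative_change = abs(new_score - old_score) / abs(old_score)
    return relative_change >= threshold


print(should_retrigger(5.0, 5.6))  # 12% change
print(should_retrigger(5.0, 5.2))  # 4% change
```

A check like this is cheap compared with re-analyzing a whole site, which may be part of why a signal threshold is offered as a trigger.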

With Panda updates in the past, Google was providing warnings of when data might be “refreshed,” presumably meaning that sites impacted might be re-classified based upon periodic updates. In March, Google stated that Panda updates would happen in an ongoing process instead.

Does Panda work like the process described in this patent for determining quality rankings for pages? Does this mean that those Panda updates may now be triggered by something like a certain level of improvement in a quality signal, like PageRank, that might set off an update for a site?

If so, a site that’s been negatively impacted by an update such as Panda might have to improve in terms of quality signals such as PageRank above a certain threshold to be rated again in a way that might improve its quality rating.

39 thoughts on “How Google May Rank Web Sites Based on Quality Ratings”

  1. hey Bill! This was real easy to follow right up to where you talked about the takeaways and what might trigger a re-classification of a Pandalized site. What confused me was your thinking that the link metric (PageRank) would cause a Panda re-classification. I was thinking things like removing items that were making pages/site ambiguous, moving text up the page, consolidating content (which if URL changes …. pretty much removes Panda and would force re-classification) etc. IE: was thinking more along the lines of onpage rather than off page metrics would trigger a Panda re-classification.

  2. Awesome post Bill, I think people spend too much time looking for link opportunities and not enough time optimizing their sites for better performance and quality.

  3. Hi Terry,

    The PageRank example was from the patent itself, and PageRank is one of many “quality” signals that could change over time, and could possibly be used to trigger a re-evaluation of a rating. Regardless of the fact that a link analysis approach is used to calculate PageRank, that doesn’t mean that PageRank can’t be a “quality” signal by itself, and the patent goes through a lot of effort pointing that out.

    Chances are that it’s a factor in a Panda quality ranking as well. In both cases, the quality of a page/site isn’t going to improve if the kinds of actions you mention aren’t undertaken. A change in PageRank at a certain threshold level seems like a good triggering event because it doesn’t have to involve doing a more involved analysis of the site by the search engine. It’s also not too likely that the pages of a site are going to increase in PageRank if changes like you mention aren’t made. The assumption in the patent seems to be that it’s a lot easier to earn links to higher quality pages than to earn links to lower quality pages. If PageRank increases significantly, other quality changes must have taken place. That’s not an automatic review, but just a triggering event that will bring a new review. :)

    Of course, other signals might be used as well, and the patent threw that one out as an example.

  4. Thanks, Mark.

    Finding ways to attract and acquire links is something that people should keep in mind with their sites, but I think you need to give people a reason to want to even link to your site in the first place. :)

  5. Nice Article Bill. You have summarized all important points in one post, but I disagree with your point regarding page rank, There are many black hat techniques to get higher page rank even we can gain PR7+ easily, PR is important for links buyer and sellers.

  6. Hello Bill,

    I am really thinking (and somehow in doubt)if Google is capable to deliver the valuable information to the reader after I have seen the “quality signs”. Even a junior in the online business knows that most of these signs could be manipulated from the evil point of view (and the online world is not full of angels). The click rate, the newsletter subscription and the pagerank (in a lesser measure) could be editable (in an automated manner).
    There is a risk (waiting for your opinion here) that a precious information should be resting far from the top of the SERP. Did it happen to you to search for something and to find only at the 4th or 5th page, and the top to be full of crap ? Cause it happened to me more than once.

  7. I know G doesn’t use all of their patents but I believe this is one that would be implemented. The ranking factors mentioned make sense from everyone’s point of view.

    The fact that click rate is part of their list really increases the significance of having microdata, strong title and meta description.

    Additionally, I think social media would be a part of this had the patent been filed later. Considering they axed G reader, social data would be the next best venue to obtaining human interaction with a brand/site. Unfortunately, I feel like its just as gameable.

    Always a pleasure to read your patent analyses Bill.

    Cheers,
    Oleg

  8. Hi Bill,

    Great article, you’ve really got to the nitty gritty of the matter. To be perfectly honest the whole notion of the quality score scares me somewhat, as I’ve got a feeling it leaves the door wide open for abuse by a new wave of ‘negative SEO’ companies. Obviously I’m sure Google will have measures to counteract fake ratings but it’s still a worry.

    I think it’s a good idea to restrict it to certain types of sites (eg: blogs) as it’s probably not relevant (or even appropriate) to rank corporate sites by a human quality score. For example if your company produces toilet rolls, you’re hardly likely to have the most inspirational set of content on your site and it would be pretty tricky to get people to engage with your site.

    Keep up the good work!
    Chris

  9. Could the ‘Panda-like’ periodic re-ranking be to do with differentiating between evergreen content and content of a transient/topical nature?

    My worry about

    is that it may dampen originality and the whole web would gradually conform to Google’s model of what they think a website should look like.

  10. Haven’t quite got the hang of the html tags yet, there should have been a quote of “models are generated and modeled ratings are assigned to unrated websites by applying the models to the website signals of the unrated websites” between the 1st and 2nd parts of the above post

  11. Google has been doing this with independent contractors and/or employees they are trained for some time now. This looks to be a method of automating the task that is currently being done by individuals. It probably will be more uniform than the human method of the same process. The bottom line looks to be that Google wants to make sure that their results are showing the best and most relevant content to the end-user. A daunting task at best. I do agree with the comment above that for those serious about ranking on Google, Google’s program will certainly damper originality in an effort to try to please the Google algorithm that’s currently in effect.

  12. Hi Bill – large companies apply for a lot of patents – how, if, when they use them is a different matter. We know that Google has a trend of devaluing all links apart from editorial links so I don’t see they would open the door that much to something that can be easily manipulated. As always though it’s fun to speculate!

  13. Hi Simon,

    Google’s Panda Update is a quality rating approach. So far, it doesn’t look like Google has opened the door to something that can be easily manipulated with Panda.

  14. Hi Chris,

    I don’t see people abusing and manipulating Google’s Panda update. I would suspect that any input from human quality raters might be such a small part of training a seed set, if used at all, that it’s probably not worried about. Regardless of what a site is about, one of the challenges is to make it interesting and engaging.

  15. Hi Oleg,

    Thanks. Panda is a document classification system that depends upon improving the quality of sites that show up in search results, from what sounds like a quality rating approach. It may or may not have a human rating aspect like this patent describes, but it has shown itself so far to be hard to game.

  16. Hi Alex,

    The patent includes a few “quality” signals that might be used. When Google elaborated on some questions that might be aimed at improving “quality” for pages that might be impacted by Panda, they stated clearly there that they wouldn’t be sharing the “signals” they might use to re-rank results based upon quality signals. They are pretty much unlikely to expose those in a patent like this one in detail, as well. But some of the ones that they do point out, like originality of content about the content, and uniqueness of content on the web, are the kinds of things that would make sense to include. Not sure how easy it would be to manipulate those, except to try to use original and interesting content.

    There’s always a risk that precious information might be far from the top of Search Results. There are a lot of potential reasons for that, which might partially involve a site such as a lack of PageRank, or a focus upon other keywords, or a lack of optimization completely.

  17. Hi Bilal,

    One of the reasons why Google’s been working on adding additional ranking signals, such as knowledge base results, and using quality signals into the mix is so that search results are less prone to get manipulated. Google does have a long history of devaluating sites that are engaged in link buying and selling, and has been very active in the last year in removing PageRank from private blog networks and link directories

  18. I presume that a new slew of “links are dead” articles will soon appear on SEO blogs. This is an interesting patent because it focuses on things inbound marketing has already been focusing on for years.

    I’m off to read the official patent. Wish me luck.

    -Bryant

  19. Hi Bryant

    Thanks.

    Nothing in the patent says that PageRank is going away, or that links will stop working. Also note that the “filed” date on this patent is 2008, so it’s quite possible that Google has incorporated some kind of quality ratings into both blogs and websites at this point. The Panda update is one that looks for quality signals, and that’s been around for a while.

    Many marketers and SEOs have been focusing upon these types of quality signals for as long as there have been search engines, and even before. :)

  20. Hi Bill,

    If in the long run this helps to score down the spammy sites and the sites with little viable information, then i’m all for it.

    Thanks for the great post.

    Best Regards
    George

  21. It may be worth noting that Google’s Panda algorithm incorporated new, previously unused signals so while this patent might suggest some of the signals they had available to them you cannot use it as a guide to reverse-engineer Panda downgrades.

    Also, Blogsearch was at one time a separate search index (or algorithm set). I’m not sure how separate it is now. They could have borrowed ideas from team to team (and probably did on many occasions) without exactly replicating each team’s specific implementations.

  22. I think that link building is a task as important as creating quality content. Remember that the popularity and reputation of a blog is based on the links you get from other blogs. Great post, Bill.

  23. How interesting that grammar and spelling would factor in ratings! I agree that these 2 things certainly add to the quality of a site. When I see blatant mistakes like these on a page, it always makes me question the site owner’s authority – for example, should I trust real estate advice from someone who describes a “seperate dinning room” in a house? If you’re terrible at grammar and spelling, get someone to read over your work before you post it on your website!

  24. Be interesting to see whether Google can factor in all authority reviews and rating websites from all across the web into determining a website’s overall quality score.

  25. Great article Bill! I think Pagerank will still be at play but then again Google has way more than factors today applied in their algos, aiming to game the search engine will prove to be futile…It will still be possible but it will be very hard to the point that it’s not worth anymore, at least if a site is looking towards positioning itself as a leader in its industry.

  26. And still no-one is talking about the best metric to measure the integrity of a site/blog: The absence of advertising. Would I trust an ‘advertising-free’ blog more than one that offers ‘promoted posts’? I sure would. But how do I actually know?

    All of private blogs don’t have advertising on them. I don’t do advertising. Mainly because I don’t need to make money from blogging [and I also can’t be bothered]. No Adwords, no stupid blinking banners, no sneaky in-content links.. it’s a question of integrity and probably it’s also me saying “You know what? You CAN’T buy everything. Put that in your pipe”. It’s also a logical response to the frequent paid link requests that I get, the great majority of them very arrogant and impersonal. Why would I do business with you if you make me feel like your possible servant right from the start?

    But I do have some do-follow text links in my side bar that point to friend’s websites. Now, the problem is that no algorithm in the world could tell if these links are paid for or given by merit. You just can’t tell. Now, do I need to no-follow these links to make it crystal clear to Google that I am ‘clean’? No way. Why would I do that? It’s my site and I give links by merit.

    I noticed that during the last PageRank update my blog went from PR2 to PR3. Did I acquire lots of new valuable links during the previous 6 months? No I didn’t. Plus Google UK dropped my blog for about 3 month down to #48 for my exact-match brand keyphrase. When it finally returned to the top of the UK SERP for my brand name it had a PR of 3.

    But ‘clever’ Google also started ranking a site that owns the .com version of my brand name right behind me. It’s a site that sells fake handbags and has all possible spam links pointing to it. As long as I see that site ranking directly behind me, I know that Google still can’t [or won’t] drop low quality EMDs.

    Google must be desperate to get away from links as main quality and authority signals because nothing is easier to game than a link-based algorithm. Always has been, always will be. It just got a bit harder and more expensive.

    My guess: Google is preparing to collect the majority of its ranking and quality signals from Google+, Blogger, Google Analytics, Google Friend Connect, Google Glass. Not now, but maybe in 3-5 years. They are in it for the long run. This is just the beginning.

  27. I am pretty sure Google Analytics and factors like Bounce rate, previous clicks etc are already being used to personalize the results already but using those + other factors mentioned in article will make a lot of sense going forward.

    So called SEOs need to up their game!

  28. Great article, Bill. What I see is that Google has dramatically changed the way they rank pages. What worked in the past does not work now. I agree with Mark when he says: “I think people spend too much time looking for link opportunities and not enough time optimizing their sites for better performance and quality.”. I could add “don’t be over-aggressive with link building.”

  29. Even 10 years ago, the basic principle was the same – everything need to be perfect from planning to implementation. Sure, nowadays there are many more factors that matters, but again same evergreen strategies apply and mostly it is always related to proper business model and understanding the audience.

  30. Wow, this is another confirmation on a hunch a friend and I have on what the latest Penguin/Panda updates are doing. I think many SEOs haven’t figured out how important the overall content quality score of your site is. Google has been hinting at it but hasn’t been 100% straight forward but of course they don’t want to give away the secret sauce.

  31. Most of the things do remain the same but the aspect of having a good layout and user friendly site has become the key things in my eyes. So apart from posting good quality content, we need to focus on creating a good lay out for our blogs as well.

  32. Great article, once more on the money Bill. I like the idea of the quality as it should mean a level playing field for all high quality sites. Ratings I am not so sure about, but content rich sites which exude quality, have correct grammar and actually inform and engage can only be a good thing.
    We might see the end of non-validating, spammy sites and this can only be a good thing in my opinion.
    I would rank all sites this way, as long as there are no anomalies (as this normally happens in Google-Land, in my experience).

    Keep up the good work!
    Chris

  33. Obviously, quality is what people want in their search results. The problem is that quality in many websites is as subjective as quality in a painting. Two people can look at the same “work of art” and one may love it the other may hate it. To some degree, website quality is in the eye of the beholder.

  34. Hi Bill,

    Something that people should consider as well, before this, people were focused mostly on building links and not so much on the content end. With this, now if people wanna be seen on Google they have to work their butts off to be seen, quality is important in my point of view. I see it as a challenge to better my own content, great post by the way!

  35. “Hi Bill!

    I really didn’t know that when human raters look at pages, they also perform some other actions other than rating pages. It is so nice to know that one of those actions is to skip over some sites when URLs on the site show objectionable content such as spam or pornography.”

  36. Thanks for a great post Bill, I agree with others comments on how much Google has dramatically changed the way in which they rank pages, I’m seeing a trend of more brand/authorship inbound links being more powerful, be interesting to see how things play out in the SERPS over the next year or two.
