How Google May Calculate Site Quality Scores (from Navneet Panda)

This post is about a Google patent, co-invented by a well-known Google engineer, that describes how an internet search engine such as Google might rank results for search queries.

Google aims to identify resources of different types, such as web pages, images, text documents, and multimedia content, that may be relevant to a searcher’s situational and information needs, and to present them in a way that is as useful as possible to that searcher. It does this while responding to queries that searchers submit.

One of the inventors of this patent carries one of the most well-known names at Google: the surname “Panda,” which became famous because a Google update launched in February 2011 was named after him.

That update focused on the quality of the sites it targeted, and Navneet Panda specializes in site quality at Google. When I saw that a patent listing him as an inventor was granted this week, I looked forward to reading it to see how it might define site quality, and how that definition might be used to rank search results.

This recently granted patent did provide a way to measure the quality of a web site, and that measure could influence how well or poorly a site might rank in search results for a particular query.

The patent tells us explicitly what features it was looking for in a site that might seem to indicate that the site was a quality site. It tells us:

The score is determined from quantities indicating user actions of seeking out and preferring particular sites and the resources found in particular sites. A site quality score for a particular site can be determined by computing a ratio of a numerator that represents user interest in the site as reflected in user queries directed to the site and a denominator that represents user interest in the resources found in the site as responses to queries of all kinds. The site quality score for a site can be used as a signal to rank resources, or to rank search results that identify resources, that are found in one site relative to resources found in another site.

That patent is:

Site quality score
Inventors: April R. Lehman and Navneet Panda
Assigned to Google
US Patent 9,031,929
Granted May 12, 2015
Filed: June 27, 2012

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining a first count of unique queries, received by a search engine, that are categorized as referring to a particular site; determining a second count of unique queries, received by the search engine, that are associated with the particular site, wherein a query is associated with the particular site when the query is followed by a user selection of a search result that (a) was presented, by the search engine, in response to the query and (b) identifies a resource in the particular site; and determining, based on the first and second counts, a site quality score for the particular site.

Site Queries

This patent is about a search system that includes a site scoring engine that generates site quality scores for sites.

The site quality score looks at queries submitted to the system by users. One type of query that it seems to favor is a query that includes a reference to a particular site. These queries are referred to in the patent as “site queries.”

Queries can be categorized as ones that refer to a particular site in a number of ways. A query can be categorized as referring to a particular site if it includes a site label that identifies the particular site.

(1) One site label identifying a particular site can be specified using an operator, e.g., a “site:” operator, followed by a name, e.g., a domain name, for the particular site.

Queries that refer to a particular site can be used to request resources that are in the particular site. For example, a query “named entities site:www.seobythesea.com” can be used to request resources responsive to the query “named entities” that are in the site www.seobythesea.com.

(2) A query can also be categorized as referring to a particular site if it includes a term that has been determined to be a term that refers to the particular site.

For example, if the search system has data indicating that the terms “example sf” and “esf” are commonly used by users to refer to a site “sf.example.com,” queries that contain the terms “example sf” or “esf”, e.g., the queries “example sf news” and “esf restaurant reviews,” can be counted as queries that refer to the site “sf.example.com.”

(3) A query can also be categorized as referring to a particular site when it has been determined to be a navigational query to that site. From a searcher’s point of view, a navigational query is one submitted in order to get to a single, particular website or web page of a particular entity. For example, I usually type the four letters ESPN into the search box to visit the ESPN website. The patent tells us that a search system may determine that a query is a navigational query to a particular site “when a search result linked to the particular site has received at least a threshold percentage of the user selections that were received for all search results that are responsive to the query.”
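
The three tests above can be sketched in Python. This is only an illustration under stated assumptions, not Google’s implementation: the term-to-site table, the click-count log, and the 50% navigational threshold are all hypothetical, since the patent only speaks of “a threshold percentage.”

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional

# Matches a "site:" operator followed by a site name, e.g. "site:www.seobythesea.com".
SITE_OPERATOR = re.compile(r"\bsite:(\S+)")


@dataclass
class SiteReferenceData:
    # Hypothetical table of terms known to refer to a site, e.g. {"esf": "sf.example.com"}.
    site_terms: Dict[str, str] = field(default_factory=dict)
    # Hypothetical click log: query -> {site: number of result selections}.
    clicks: Dict[str, Dict[str, int]] = field(default_factory=dict)
    # Assumed threshold share of selections for a navigational query.
    nav_threshold: float = 0.5


def referred_site(query: str, data: SiteReferenceData) -> Optional[str]:
    """Return the site a query refers to, or None if it refers to no site."""
    q = query.lower().strip()

    # (1) An explicit site label given with the "site:" operator.
    match = SITE_OPERATOR.search(q)
    if match:
        return match.group(1)

    # (2) A term that has been determined to refer to a particular site.
    for term, site in data.site_terms.items():
        if re.search(r"\b" + re.escape(term) + r"\b", q):
            return site

    # (3) A navigational query: one site received at least the threshold
    # share of all selections made on results for this query.
    selections = data.clicks.get(q, {})
    total = sum(selections.values())
    if total:
        site, count = max(selections.items(), key=lambda item: item[1])
        if count / total >= data.nav_threshold:
            return site
    return None


data = SiteReferenceData(
    site_terms={"example sf": "sf.example.com", "esf": "sf.example.com"},
    clicks={"espn": {"espn.com": 90, "en.wikipedia.org": 10}},
)
print(referred_site("named entities site:www.seobythesea.com", data))  # www.seobythesea.com
print(referred_site("esf restaurant reviews", data))  # sf.example.com
print(referred_site("ESPN", data))  # espn.com
```

The tests are checked in the order the post lists them, so an explicit “site:” label wins over a known term, and the click-based navigational test is only a fallback.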

Determining a site quality score

These are the steps outlined in the patent to determine a site quality score.

1. Unique queries that are categorized as referring to a particular site are counted over some period, such as all queries received by the system in the preceding day, two days, week, or month, or over all of the query data available to the system.

2. In some approaches, all queries containing the same query terms, regardless of their order, are counted as a single unique query. So, multiple queries of “san francisco site:example.com” and “francisco san site:example.com” are counted as one unique query.

3. In some other approaches, the patent tells us, order does matter. Multiple queries for “san francisco site:example.com” would be counted as one unique query, while multiple queries for “francisco san site:example.com” would be counted as a different unique query. The system may ignore where the site label is placed within the query itself.

4. Additionally, the system may use user information or user device information when counting these unique queries. The same query might be counted as two unique queries if it could be identified as having been submitted by two different people. In this context, whether a query is the same query can be determined with or without regard to the order of the query terms, as described above. Different users can be identified by their being logged into user accounts, by the different Internet Protocol addresses associated with the devices they use, or by information provided by the user device, such as an Internet cookie.

5. Distinct from queries that refer to a site are queries that are merely associated with that site. The system may associate a particular query with a particular site when a search result that was presented in response to the particular query identifies a resource in the particular site, where the search result has received a user selection. For instance, a particular query “example restaurant reviews” can be associated with a particular site “example.com” when a user selects a search result that was presented in response to the particular query, where the search result identifies a resource in the particular site, e.g., “http://example.com/resource”.

6. The system then determines a site quality score for the particular site. The score might be determined by computing a ratio of a numerator and a denominator, where the numerator is based on the count of unique queries that are categorized as referring to the particular site, and the denominator is based on the count of unique queries that are associated with the particular site, whether or not those queries refer to it.
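
Steps 1 through 6 can be sketched as a small Python function. This is a hypothetical illustration, not code from the patent: the `QueryEvent` log record and its field names are assumptions, each query is assumed to have already been categorized as referring (or not) to a site, and the score is the plain unweighted ratio of the two counts, although the patent describes other ways the counts might be combined.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class QueryEvent(NamedTuple):
    # Hypothetical log record; field names are illustrative only.
    user_id: str                  # account, IP address, or cookie identifier
    query: str
    referred_site: Optional[str]  # site the query was categorized as referring to
    selected_site: Optional[str]  # site of the search result the user selected


def query_key(query: str, ignore_order: bool) -> Tuple[str, ...]:
    """Normalize a query so that equivalent queries count as one unique query."""
    terms = query.lower().split()
    # Site labels are pulled out, so their placement in the query is ignored.
    labels = sorted(t for t in terms if t.startswith("site:"))
    others = [t for t in terms if not t.startswith("site:")]
    if ignore_order:
        others.sort()
    return tuple(labels) + tuple(others)


def site_quality_score(events: Iterable[QueryEvent], site: str,
                       ignore_order: bool = True,
                       per_user: bool = False) -> Optional[float]:
    """Ratio of unique queries referring to `site` to unique queries associated with it."""
    referring = set()
    associated = set()
    for event in events:
        key = query_key(event.query, ignore_order)
        if per_user:
            # The same query from different users counts as different unique queries.
            key = (event.user_id,) + key
        if event.referred_site == site:
            referring.add(key)
        if event.selected_site == site:
            associated.add(key)
    if not associated:
        return None  # no associated queries, so no basis for a score
    return len(referring) / len(associated)


log = [
    QueryEvent("u1", "san francisco site:example.com", "example.com", "example.com"),
    QueryEvent("u2", "francisco san site:example.com", "example.com", "example.com"),
    QueryEvent("u1", "example restaurant reviews", None, "example.com"),
    QueryEvent("u3", "restaurant reviews", None, "example.com"),
    QueryEvent("u3", "restaurant reviews", None, "other.com"),
]
print(site_quality_score(log, "example.com", ignore_order=False))  # 0.5
print(site_quality_score(log, "example.com", per_user=True))  # 0.5
```

With order ignored and no per-user counting, the two “san francisco” queries collapse into one unique query, giving 1 referring query out of 3 associated ones.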

Calculating Site Quality Scores

The patent provides some examples of other ways it can calculate that site quality score using those two different counts.

The approach may also treat any “collection of resources” as a site under this site quality score approach. Such a collection could span multiple domains, or it could be one part of a larger site that is broken down into subdomains or subdirectories.
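
Because both counts are kept per site, it matters how a resource gets assigned to a “site.” The post doesn’t describe how the patent draws those boundaries, so the following is only a hypothetical Python sketch: the granularity names, the naive two-label domain rule (a real system would likely need something like the Public Suffix List), and the alias table for folding several domains into one site are all assumptions.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit


def site_key(url: str, granularity: str = "host",
             aliases: Optional[Dict[str, str]] = None) -> str:
    """Map a resource URL to the (hypothetical) site it is counted under."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if granularity == "domain":
        # Naive: keep the last two labels, so sf.example.com -> example.com.
        # This is wrong for suffixes like .co.uk; a public suffix list would be needed.
        key = ".".join(host.split(".")[-2:])
    elif granularity == "host":
        # Each subdomain is treated as its own site.
        key = host
    elif granularity == "directory":
        # The first path segment splits a host into subdirectory sites.
        first = parts.path.strip("/").split("/")[0]
        key = f"{host}/{first}" if first else host
    else:
        raise ValueError(f"unknown granularity: {granularity}")
    # A hypothetical alias table can fold several domains into one site.
    if aliases:
        key = aliases.get(key, key)
    return key


print(site_key("http://sf.example.com/news/a.html"))            # sf.example.com
print(site_key("http://sf.example.com/news/a.html", "domain"))  # example.com
print(site_key("http://example.com/sf/page.html", "directory"))  # example.com/sf
```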

58 thoughts on “How Google May Calculate Site Quality Scores (from Navneet Panda)”

  1. Have I just digested that a part of this patent (and your translation of it, God bless you for that!) is to draw and provide signals through the sitelinks search box & “more results from” beneath sitelinks?

    So the more times a resource is sought using either trigger could be a signal as to the importance of that resource?

    The reason I ask is mere mortals do not use a site: search operator but the sitelinks search box & “more results from” beneath sitelinks are the only navigational ways outside of advanced searchers.

  2. Great article as usual. Here are a few things I have understood.
    Please correct if I am wrong.
    1. Topical categorization and thematic expansion of the overall website segment is necessary, i.e. a category can be a fat head competitive term, and posts/articles should have all useful information with proper interlinking useful for users.
    2. Low Bounce Rate, Engagement, Avg Time On Site, More than 1 page visits etc. are metrics we should pay attention to. (Are search engines considering these directly from analytics? i.e. Data Hub Activities)
    3. Site Search Feature: Analyse what people are searching for on the website. Think about how we can offer the best content for frequently entered site searches.
    4. Branded searches are important. They show Authority and Trust. (Ref: David Amerland)

    Please correct if I am wrong.
    SAMEER

  3. Bill, this is a fascinating read. I’ve read it three times now and I still feel like I have a lot to grasp here.

    I’m having trouble understanding how “unique queries referring to a particular site” are determined. For most sites, likely no one other than the site owner and their SEO has ever done a site: query. And, unless you are a recognized brand, many sites are not ever searched for by brand name.

    For sites like this, do you know how Google would determine the “queries associated with the site”?

  4. Hi Dean,

    So, the patent seems to place a lot of value on queries that somehow mention and refer to the site when they ask for information.

    I suspect that most searchers don’t use a “site” operator when they perform searches, but I hesitate when putting words into the mouths of the inventors of a patent like this. If they had meant to target the sitelinks search box queries or that “more results” link, they should have specifically mentioned those. I think it’s more likely that people searching for content from a specific site might include some kind of label in their query, like if I wanted more information about the football draft, and I wanted it specifically from ESPN, and I searched for [football draft ESPN], that’s one kind of query I expect the inventors of the patent were anticipating.

  5. Hi Sameer,

    The patent doesn’t really state all of those things. It keeps things much simpler, looking at the queries that people perform when it looks like they are trying to target answers or responses from that website specifically. I guess the assumption we could make of Panda, the patent writer, is that someone who purposefully crafts a query to try to target a site, via a site operator search or adding a well-known term the site is known for to their query, or performing a navigational search, has found quality in that site and wants a response from it specifically. The patent doesn’t mention the word “Brand” and so I’m not going to either, and I think it would be a mistake to do so.

  6. Hm, this mechanism would probably work – as long as it is not known to anyone that this mechanism is used to rank pages. But otherwise … it would be rather easy to cheat.

  7. Hi Marie, In the patent they talk about a number of queries including a “label” indicating a desire to see results from a particular site. The most obvious, and least likely that people will perform is a search with a “site:” operator, but some sites do have words that tend to be tied to them. If I search for [football scores ESPN] I likely want results from the ESPN website. The patent also mentions “navigational query results” as being another indicator that a searcher wants to see a result from a particular site, and it defines those as:

    From the user point of view, a navigational query is a query that is submitted in order to get to a single, particular web site or web page of a particular entity. For a search engine, this is a matter of inference. For example, a search system can determine that a query is a navigational query to a particular site when a search result linked to the particular site has received at least a threshold percentage of the user selections that were received for all search results that are responsive to the query.

    So Google may be looking for queries that people tend to click upon results from a particular site for, when it comes to these navigational queries.

    And the Site Quality Score isn’t about the quality of the content on the pages of the site, but rather about how people tend to search for the site, and what they then tend to click upon after performing those queries.

  8. Thanks Bill it makes sense now, basically 1 step back using the “label” as the trigger to understand the query, as opposed to using a brand query and then using the sitelinks search or “more results from”.

  9. Good find, as usual, Bill. I think it’s interesting that you have covered at least one other patent that refers to “site quality score” in a past article, but this patent is only just now coming up. I cannot yet find it in the Google Patents archive. It may be worthwhile to revisit the concept of “site quality score” to compare methods for computing such a score, with no need to infer any possible connections but rather to look at how Navneet Panda is developing algorithms based on ratios in data.

  10. Hi Dean,

    Right, or the label may be something intended to trigger a navigational result, or be a term that the site is well known for. When I’m trying to find a patent that I’ve written about in the past, I’ll often search for [Patent topic seobythesea], and including my domain name in the query inspires Google to return results from my site. That seems to be the kind of label that the patent is talking about.

  11. Hi Mattias,

    I’m not sure that cheating is necessary, if the point of the metric is to build pages that people strive to search for.

  12. Thanks for the reply Bill. The way you have described this now makes it sound to me like this patent could be used to determine how authoritative a site is, or in other words, how likely is it that people view this site as a brand.

    For example, let’s say I am searching for “buy bird feeders” and I click on a site called something like, “best-cheap-bird-feeders.com”. I immediately see that the site is not high quality and I don’t engage with it. Next, I click on urbannaturestore.ca and I navigate throughout the site and make a purchase.

    This type of activity could tell Google that Urban Nature Store is more likely to be a high quality site for people who are searching for buying bird feeders.

    Or maybe that’s too much of a simplification?

  13. Hi Marie,

    Thank you. That does seem like a good example. I hadn’t ever been to urbannaturestore.ca but it’s definitely a store that focuses upon birds and offers a lot of bird feeders. I could see people purposefully targeting queries at urbannaturestore to see more from their pages, and this patent is aimed at trying to learn things like that from the searching behavior of individuals. I hesitate at using the word “brand” because I’d hate to build the association that Brand = Authority, which I’m not sure would be doing that patent justice.

  14. Thanks for sharing your findings and interpretation of the patent. It’s content like this that keeps your blog at the top of my must reads for the day. Keep up the great work Bill!

  15. Good post!
    I still doubt the influence of CTR on the rankings of specific keywords, but as you said every modification in query does matter. Bounce ratio, CTR, time spent on website etc. won’t matter for Google for “static phrases” as it would affect small companies, which exist for specific clients. Maybe it matters a little bit, but that’s what I think. Specific queries – “long tail” are more useful and friendly, as it really depends on group of people and what analytics indicators were. Pretty simplified summary, but that’s how I see it.

  16. I hope Google continues to parse out EMD’s as not being a ‘brand’ or ‘more important’ to satisfy a search query, e.g.: seo services anytown —> and not preferring a domain name “www.seoservicesanytown(dot)com”

  17. Hi Simon,

    Click throughs are one of the easiest things for Google to track, in response to a query. I suspect it plays a fairly large role at Google.

  18. Hi Harekrishna Patel,

    I think focusing upon specific queries can be a really good idea; choosing the right ones for your site, crafting persuasive titles and meta descriptions that might be used as snippets, adding semantic markup that might lead to rich snippets, and so on is probably a good idea.

  19. I think quality score is a great way for content marketers to keep check of the sort of content they are building and curating.

    If another Panda Style update does come about, it will no doubt clean up all the over optimised, thinly built sites.

    It is good to see he is somewhat ‘acknowledged’ for his work, even if not 100% transparent.

    Another Great Article and well found patent Bill

  20. Thank you for your attention to detail and great writing style. Your professionalism shows in your article. I like your interesting views and appreciate your unique ideas. This is quality.

  21. Hi Bill, it’s nice and really helpful for new bloggers like me. Now I can easily figure out what I have to look after for good ranking, and I think it depends more on search queries. So thanks once again.

  22. Great post Bill, thanks for sharing quality score calculation. I personally feel that we need to focus more on search queries.

  23. Pretty nice post. I simply stumbled upon your blog and wanted to say that I have really loved browsing your blog posts. After all, I’ll be subscribing to your RSS feed and I hope you write again soon!

  24. Hi Sunny,

    Paying attention to search queries is likely a good idea, especially if Google is paying attention to the ones that may be used to find the pages of your site.

  25. Such great info. Google is actually making itself more user friendly with all its algorithms. I’m surprised to know that there are so many technical aspects involved in ranking in Google search.

  26. Great article. I have to say that I haven’t had enough time to think about Google’s algorithm, but I assume that it is an important subject.

  27. Quite informative, Bill. It might be useful for a blogger like me. Google is trying to offer us good search results. However, it sometimes makes me wonder: with all these algorithms, there have to be some slip-ups with these systems. This article helps me better understand how Google works. I’ll be looking forward to your next post. Thanks!

  28. Hey Bill… greetings from your old stomping grounds.

    *I’m* going to use the word ‘brand’ 🙂

    Not in the traditional sense, rather as an entity of trust, recognition and topical expertise.

    I think in this instance, the signals from query types, content and CTR (along with dwell, query type, query modification, query chains etc.), help Google identify trusted authorities which many equate to the attributes of a brand.

    Clarifying that point from a chicken – cart – chicken perspective, I think that Google is both reinforcing existing brands (known topic experts) and creating new ‘brands’ from a search perspective, building ‘brands’ within their database (i.e. trusted entities around specific topics) through user query and interaction. Wrap this in a better understanding of user intent / context, and it’s a very powerful set of data points for Google to leverage to assess quality as it relates to specific queries to site content (and sites as a whole).

    Exciting stuff and, as always, great coverage.

    Cheers

    Grant

  29. Hi Grant,

    Good to see you!

    It does seem like this patent focuses upon things that people might associate with signals that brands send, and referring to those as “trusted authorities” is probably a better choice of terms. It does show the kind of things that Google believes to be indications of quality; and that’s a good thing to know as you build web pages that you might want to be perceived as trusted authorities by Google.

  30. Hi Bill, I always follow your valuable insight articles. Today, I saw a new Google algorithm update which, according to Google’s John Mueller, is not related to Panda.

    A lot of sites start getting affected, any new reviews from you? Cheers!

  31. Bill,

    Thanks for the read! I was actually thinking right in line with Marie Haynes. After Penguin hit, it was clear that quantity of backlinks was no longer an intelligent SEO strategy. People in the industry quickly realized that Google appeared to give preference to larger brands, which led me to think, “What makes Google determine a website is a large brand, if not the quantity of links?” One likely metric was the quality of links, but I was looking for a more definitive answer.

    I was heavily engaged in manual action recovery at the time due to the practices of the agency I had worked for, so I had the fortune of reviewing hundreds of domains afflicted by either Penguin or a manual action. One thing they all had in common was little to no mentions of brand name within their anchor text profile. I specifically remember reviewing backlink profiles for numerous high ranking domains at the time, and I saw there definitely appeared to be a correlation between “Penguin-winners” vs. “Penguin-losers”: brand mentions in anchor text.

    I don’t have any of the data anymore, but I do recall seeing that more often than not, high ranking domains had a larger amount of naked URLs, brand names or compound keywords (i.e. brand + target keyword) within their site’s top 10 anchor text variations. Essentially, I believe the focus on brand name within anchor text was one of the key indicators for Google (at least at the time) of a brand’s authority.

    If my hypothesis was on point, I don’t think it’s too far-fetched to believe that the same logic for determining link quality could also be applied to determining site quality. But rather than gauging the quantity and # of variations of brand-related anchor text, they’re measuring the quantity and # of variations of search queries associated with a site to determine its potential quality.

    I do recall reading recently that Gary Illyes confirmed that user experience is only a ranking signal for mobile as of yet; sounds like we might not be too far away from the same being applied to desktop?

  32. Hi Matt,

    There’s nothing quite like rolling up your sleeves and experiencing those things first hand, to get an idea of what is going on. That is an interesting hypothesis, and seems to fit with the idea behind this patent, that when people search for a site, if their query looks like it is targeted and finding information from that site itself, it’s a sign of high quality. A link with a brand name in it is different, but seems to be set along the same lines.

  33. So, search queries are one of the bigger aspects Google uses to find out quality score. How about the time spent on the webpage by the user? This might also be counted to check the quality score of the website. Anyway, it’s good to find out how Google’s site quality score works. Thank you for sharing this article.

  34. Hi Sulabh,

    That may be another way for Google to measure the quality of pages, but wasn’t described in the patent I wrote about as being one.

  35. This is really awesome. Great insights. Though I am still not able to figure out how to make it actionable for a layman. Thanks for the great information.

  36. Hi Dan,

    Thank you. I’m not convinced that there is an actionable take-away for a layperson, except it got me thinking about the computer systems already built into cars a little more, and the sensors already built into those.

  37. Yo Bill thanx for a great discussion on the issue of site quality scores. really appreciated and truly enjoyed how you framed these issues.

    Shared again through my G+ page…

  38. Hi Frank,

    Thanks. There seems to be a lot of talk from Google about things like quality Scores, so when a patent says it involves the topic, I tend to pay attention, to see what I can learn from it.

  39. Thanks, Bill, for sharing a good quality score calculation. I think that we want to focus on search queries.

  40. Thanks for sharing. I haven’t put much thought into how Google may be judging the quality and authority of websites based on queries specifically directed at the site.

    My two takeaways are that you want a memorable brand and some really solid pieces of content people will want to come back to again and again. This way they perform a search to reference it again and you become more of an authority on the topic since people are actively seeking out YOUR content over others. I imagine visibility for similar keywords without specifying the “brand” or site will rise with these searches.

    These ideas don’t seem groundbreaking, but it’s definitely something I will take into consideration now.

  41. Very interesting article; with QS being such an important factor, it’s always great to hear new takes.
