Google’s Quality Score Patent: The Birth of Panda?

In 2005, Google’s John Lamping gave a presentation to a class at Berkeley on the Quality of Information, titled On the internet, nobody knows you’re a dog (pdf). In his talk he raised questions such as:

  • Why is the Daily Californian advertising German pages?
  • How much can the spam industry make by spamming search engines?

One of John Lamping's slides from his Quality of Information presentation showing two different paragraphs where madlib style keyword insertion has been performed on the content.

During his speech, he pointed out ways that people have attempted to manipulate search results, such as mad libs-like insertions of keywords into templates for pages like in his slide above, cloaking and other spam approaches to optimizing pages, and paid links and comment spamming. In addition to talking to academic audiences about search quality, he has been working on doing something to increase the quality of search results.

I wrote about one patent John Lamping co-invented in a post titled How Searchers’ Queries Might Influence Customized Google Search Results. My post describes a 2003 patent exploring how to improve search rankings by looking at results for related queries. In 2005, he was a co-inventor with Mark Pearson on a Google patent granted today, presenting a way to create and use quality signals for documents and sites based upon results from search queries.

Returning the “best” search results

Most patents include a section describing the problems they intend to address. In this new patent, we’re told the intent is to return the best results for a search, based upon a measure of the quality of documents, by adding an additional quality score for at least some queries.

When the patent was written in 2005, the quality of pages returned for a particular query was measured with an Information Retrieval (IR) score calculated by how relevant a document might be to a query, and a score based upon links pointing to pages.

That IR score might be created by looking at matches between queries and the words on a web page. Matching words in a query and a page’s title might score higher than a match between the query and words in the page footer. If matching text is found in fonts that are larger or bolded or italicized, that text might count more than words in normal text.

A page that includes all of the terms in a query might also have a higher IR score than a page that only includes one or some of the terms.

These and other similar signals might be combined to create an IR score for a page, which helps determine the “quality” of the pages returned for a search.
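To make the idea concrete, here is a minimal sketch of an IR score along the lines described above. It is only an illustration: the field names, the weights, and the bonus for matching every query term are assumptions of mine, not numbers from the patent or from Google.

```python
from dataclasses import dataclass

# Hypothetical weights: a title match counts more than a footer match,
# and emphasized (large, bold, or italic) text counts more than body text.
FIELD_WEIGHTS = {"title": 3.0, "emphasized": 2.0, "body": 1.0, "footer": 0.5}
ALL_TERMS_BONUS = 2.0  # hypothetical bonus when every query term appears


@dataclass
class Page:
    title: str = ""
    emphasized: str = ""
    body: str = ""
    footer: str = ""


def ir_score(query: str, page: Page) -> float:
    """Toy IR score: weighted term matches per field, plus an all-terms bonus."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    score = 0.0
    matched = set()
    for field_name, weight in FIELD_WEIGHTS.items():
        words = set(getattr(page, field_name).lower().split())
        for term in terms:
            if term in words:
                score += weight
                matched.add(term)
    if matched == set(terms):
        score += ALL_TERMS_BONUS
    return score
```

A page with both query terms in its title would outscore a page that only mentions one of them in its footer, which is the ordering the paragraphs above describe.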

In addition to using an IR score for pages, a search engine might look at the link structure between pages to rank documents, though we are told that there are times when that link structure might be “unavailable, unreliable, or limited in scope,” which would limit its use and value.

One solution to the limitations of a score based upon Information Retrieval (IR) and Link Analysis is for the search engine to analyze other “associations” between queries and pages found in search results for those queries, to create a “quality score” for those pages. That quality score might be created by looking at:

  • The different queries a page might be found for,
  • What anchor text is pointed at that page,
  • How prominent the text in a query might be on that page, and
  • How frequently people select certain pages in response to particular queries.

Does it seem like when someone enters a query into the search engine that they are specifically asking for, or requesting a page that they already have in mind? Something we often refer to as a navigational query these days? If so, that might be a signal of the quality of that page. For example, if I search for [ESPN], chances are that I’m looking for the ESPN home page. My search for [ESPN] and my selection of the ESPN home page might be considered a quality signal by Google.

When you compare those query terms with anchor text in links pointing to a page from the search results for that query, is the text in those links often similar to, or the same as, the query terms? Does that page tend to have more links to it using those words than other pages in the same search results? Again, that’s something that could be seen as a signal of quality for that page. If you google [ESPN], do one or two pages tend to have more links pointing to them that include “ESPN” than the other pages in the search results? If so, that’s a positive indication of “quality” for those pages.

If multiple searchers use a certain query or a similar one, and tend to select a certain page, that’s another signal that can raise the quality score of that page. If most people searching for [ESPN] tend to select the ESPN home page, that’s another quality signal that Google might track.

The Google patent on quality scores granted today is:

Deriving and using document and site quality signals from search query streams
Invented by John Lamping and Mark Pearson
Assigned to Google
US Patent 7,962,462
Granted June 14, 2011
Filed May 31, 2005

Abstract

A system analyzes one or more search streams to detect one or more associations between a document and one or more queries in the one or more search streams. The system further derives a value for the document based on the detected associations and uses the derived value in evaluating a quality of the document with respect to one or more subsequent queries.

Quality Scores and the Panda Updates

On February 24th of this year, Google’s Matt Cutts and Amit Singhal co-published a blog post at the Official Google Blog titled Finding more high-quality sites in search, which described a significant change in the way that Google ranks pages in search results, which would impact almost 12% of all search queries. We were told that the new approach would reduce rankings for lower quality pages, and boost rankings for higher quality pages.

The post provided some hints as to what Google considered high quality and low quality pages, and it was followed up by more statements from Cutts and Singhal, including a joint interview with the two on March 3, 2011, TED 2011: The ‘Panda’ That Hates Farms: A Q&A With Google’s Top Search Engineers where we learned more about the update, including the fact that it was named after a Google engineer named Panda.

I read the interview and went hunting for more information about that engineer, hoping to find something he had written that might shed more light on the update. Later that day I wrote Searching Google for Big Panda and Finding Decision Trees.

It appears that I might have found the right engineer when I ran across Biswanath Panda, who was involved with research on how to efficiently and effectively use a certain kind of machine learning approach on very large data sets, like Google’s web index, to compare and classify pages based upon certain features about those pages against a known set of pages to determine the quality of those pages.

In the TED 2011 interview linked in the paragraph above, Matt Cutts tells us:

And we actually came up with a classifier to say, okay, IRS or Wikipedia or New York Times is over on this side, and the low-quality sites are over on this side. And you can really see mathematical reasons…

We were also told that the features considered as potential signals of quality were based upon a series of questions about pages such as whether or not you would trust a site with your credit card information. Amit Singhal published another post about the update on May 6th that included a number of the kinds of questions that inspired the update.

Here are the first five of the 23 listed:

  • Would you trust the information presented in this article?
  • Is this article written by an expert or enthusiast who knows the topic well, or is it more shallow in nature?
  • Does the site have duplicate, overlapping, or redundant articles on the same or similar topics with slightly different keyword variations?
  • Would you be comfortable giving your credit card information to this site?
  • Does this article have spelling, stylistic, or factual errors?

The questions cover a wide range of topics, from trust and credibility, to the depth of content, to problems involving site structures and grammar and spelling. It didn’t provide detailed descriptions of the kinds of features that might be used to determine the quality of pages and sites.

The approach described in Biswanath Panda’s paper was tested in Google’s sponsored search to see if features found in advertisements and queries and landing pages could predict bounce rates from the landing pages those advertisements pointed to. That test is described in the paper, Predicting Bounce Rates in Sponsored Search Advertisements.

It’s quite possible that approach could also be used to classify features on pages and sites to provide quality scores for them which might boost or lower their rankings in search results.

The process of assigning quality scores to pages and sites is something Google had been exploring for a while before the Panda updates, as we can see in this recently granted Google patent on document and site quality signals from search query streams.

Panda may look at other features to determine quality scores for different kinds of queries, but the idea of defining “quality” with a score to add to the IR and Link Analysis scores in ranking pages may have gotten its start with this patent.

Looking a little deeper at the signals from the quality score patent, there’s one main question that it seems focused upon.

Is a query asking for a specific page?

The patent tells us that it might assign points to a page if a specific search query is deemed to “ask” for one or more specific pages. This sounds somewhat like how Google might respond to a query that they believe is navigational in nature, and where a specific site is likely to be an authoritative page for a particular query.

Another Google patent that describes other ways that Google might identify a page or site that might be “authoritative” for a particular query is Propagating useful information among related web pages, such as web pages of a website, which I wrote about in a 2007 post titled Google Determining Search Authority Pages and Propagating Authority to Related Pages.

That Authority Pages patent looks at a number of features, both on-page and off, and even some offline evidence to determine if a particular page or site might be authoritative for a particular query. Interestingly, John Lamping was also one of the co-inventors listed on that patent.

A search query might be said to ask for one or more pages if:

  • The pages have text similar to the query in places like the page title, prominent text on the page, or the URL of the page.
  • More of the links found on other pages of the Web that use text similar to the query, possibly a majority, point to that page or pages than to others.
  • People using the same query, or a closely related one, tend to pick that page or pages from the search results.

Points may be assigned to that page or pages by the search engine when it’s found that the search query is “asking” for those pages.

For example, assume two queries are somewhat popular – [London Hotels] and [Ritz Carlton]. Most links using the text “Ritz Carlton” likely point to an official Ritz Carlton page on the Web, so it receives points for the query when people search for it. On a search for “London Hotels,” links using that text tend to point to a wide range of diverse sites. So, there probably aren’t any pages that gain points on a search for “London Hotels.”

The patent tells us that it might look for a majority of links pointing to a particular page in that situation, or it might consider a certain threshold, such as 20 links pointing to a page, as sufficient for a page to be given points towards a quality score.
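The anchor text test can be sketched in a few lines. This is a rough illustration under my own assumptions: the function name, the exact-match comparison of anchor text to the query, and the example URLs are all hypothetical, and only the "majority" rule and the threshold of 20 links come from the description above.

```python
from collections import Counter


def pages_asked_for_by_anchors(query, links, rule="majority", min_links=20):
    """Return pages a query seems to "ask" for, judged by anchor text.

    links is an iterable of (anchor_text, target_page) pairs.
    rule="majority": a page needs more than half of the matching links.
    rule="threshold": a page needs at least min_links matching links.
    """
    wanted = query.strip().lower()
    counts = Counter(
        target for anchor, target in links if anchor.strip().lower() == wanted
    )
    total = sum(counts.values())
    if rule == "majority":
        return {page for page, n in counts.items() if 2 * n > total}
    if rule == "threshold":
        return {page for page, n in counts.items() if n >= min_links}
    raise ValueError(f"unknown rule: {rule!r}")


# Made-up link data mirroring the example above.
links = [("Ritz Carlton", "ritz.example")] * 30
links += [("Ritz Carlton", "blog.example")] * 5
links += [("London Hotels", f"site{i}.example") for i in range(5) for _ in range(4)]
```

With this made-up data, [Ritz Carlton] singles out one page under either rule, while the links for [London Hotels] are spread too thinly across sites for any page to earn points.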

Another way to accumulate quality points, as an indication that a particular query might be “asking” for a particular page, relies upon other people who use that query selecting the same page. The number of previous searchers selecting the same page might need to reach a specified minimum number, a specified percentage, or possibly a preponderance of searchers.
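A small sketch of the selection test, covering the three kinds of cutoff mentioned above. The function and parameter names are hypothetical, and I am assuming "preponderance" means more than half of the selections, which the patent does not spell out.

```python
from collections import Counter


def pages_asked_for_by_clicks(selections, min_count=None, min_share=None):
    """Return pages that searchers tend to select for one query.

    selections is a list of the pages picked by past searchers.
    If min_count is given, a page needs at least that many selections.
    Otherwise, if min_share is given, it needs that fraction of selections.
    Otherwise a simple preponderance (more than half) is required.
    """
    counts = Counter(selections)
    total = len(selections)
    asked = set()
    for page, n in counts.items():
        if min_count is not None:
            qualifies = n >= min_count
        elif min_share is not None:
            qualifies = n / total >= min_share
        else:
            qualifies = 2 * n > total
        if qualifies:
            asked.add(page)
    return asked
```

For example, if 80 of 100 searchers for [ESPN] pick one page, it qualifies under the preponderance rule but would not under a 90% share requirement.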

Those points might then be used as a quality signal for each respective page for subsequent searches performed at the search engine, regardless of whether or not those followup searches include the same query terms. There are a few different ways that the points from this quality score might be implemented, as described in the patent, including possibly influencing the PageRank of pages:

The assigned points may be used, for example, in any type of subsequently executed document scoring/ranking algorithm. In one implementation, the assigned points may be used as an input in a subsequent PageRank computation.

In another implementation, a combination of the points assigned consistent with aspects of the invention and results of a PageRank computation may be used to derive a quality signal. This combination may either be mathematical (e.g., an average) or otherwise (e.g., using different signals at different places or times in a ranking/scoring algorithm).

The points assigned, consistent with aspects of the invention, may generally be used as a signal of document quality, and can be used in many different ways in any scoring/ranking algorithm, or for deriving other quality signals that are used in a ranking/scoring algorithm.
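The "mathematical (e.g., an average)" combination mentioned in the patent could be as simple as the sketch below. The scaling is my own assumption: I treat PageRank as a value between 0 and 1 and cap the points at a hypothetical maximum so the two inputs are comparable.

```python
def quality_signal(points, pagerank, max_points=100.0):
    """Average of normalized quality points and a PageRank value in [0, 1].

    max_points is a hypothetical cap used only to put points on a 0-1 scale.
    """
    normalized_points = min(points, max_points) / max_points
    return (normalized_points + pagerank) / 2
```

The patent leaves the actual combination open, so this is just one of many ways the two signals might be blended.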

Losing or Limiting Quality Points

Under the approach in this patent, a page might accrue points towards a quality score when the search engine believes that the query “asked” for a particular page as I described above.

There are some limiting and even negative factors involved in this method of accruing points.

Certain pre-designated search queries might result in no assignment of points to the one or more pages found in the search results. We’re not given an example or explanation of what those particular queries might be.

If it appears that searchers are searching for a particular query “solely to attempt to amass points for a specific document,” then the search engine might subtract points assigned to a page. We’re also not told exactly what that might mean, but it does sound like hiring a number of people to search for a particular query and having them click on a certain result might not be welcomed.

Funny, but I’m reminded a little of Google’s Bing Sting from earlier this year.

A page might only be able to accrue a certain number of points for a particular query regardless of who is doing the searching. The number of points from a single user to a particular document, or to a range of queries for different documents might be limited as well. The number of points from the same Internet Protocol (IP) address might be limited to a certain number per day or per week, too.
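Here is a rough sketch of how limits like these might be enforced. Everything about it is an assumption for illustration: the class name, the cap values, and the choice to award one point per selection are mine, and the patent gives no specific numbers.

```python
from collections import defaultdict


class PointLedger:
    """Toy ledger that awards one point per selection, subject to caps."""

    def __init__(self, per_query_cap=1000, per_user_cap=5, per_ip_daily_cap=10):
        self.per_query_cap = per_query_cap        # max points per (page, query)
        self.per_user_cap = per_user_cap          # max points per (user, page)
        self.per_ip_daily_cap = per_ip_daily_cap  # max points per (IP, day)
        self.points = defaultdict(int)            # page -> total points
        self._by_query = defaultdict(int)
        self._by_user = defaultdict(int)
        self._by_ip_day = defaultdict(int)

    def award(self, page, query, user, ip, day):
        """Try to award one point; return True if it was counted."""
        if self._by_query[(page, query)] >= self.per_query_cap:
            return False
        if self._by_user[(user, page)] >= self.per_user_cap:
            return False
        if self._by_ip_day[(ip, day)] >= self.per_ip_daily_cap:
            return False
        self._by_query[(page, query)] += 1
        self._by_user[(user, page)] += 1
        self._by_ip_day[(ip, day)] += 1
        self.points[page] += 1
        return True
```

Under caps like these, one person repeatedly searching and clicking, or many clicks arriving from the same IP address in a day, would stop adding points after a while.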

Site Wide Implications

While the patent describes how pages might be assigned quality points based upon whether or not a query appears to be “asking” for that page, the conclusion to the description of the patent tells us that these quality points might be either “additionally or alternatively” assigned to the site that hosts the documents.

A site, under the patent, is broadly defined as documents that are “under common control,” such as pages:

  • Associated with an organization
  • Under a particular domain name
  • Under a particular host name
  • Created by the same person or group of persons

More broadly, a site might be considered to include:

  • A group of documents about a topic
  • A group of documents in a particular language
  • A group of documents hosted in a particular country
  • A group of documents written in a particular writing style

A point assigned to a particular page may also be considered a “vote” for the site associated with that page. Points assigned at different levels, both the document level and different site levels, may be combined in some manner, such as when scoring and ranking pages.
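As a sketch of how page points might roll up into site "votes," the example below groups pages by host name, which is only one of the site definitions listed above. The function names, the example URLs, and the 0.5 weight used to blend page and site totals are assumptions of mine; the patent only says the levels may be combined "in some manner."

```python
from collections import defaultdict
from urllib.parse import urlsplit


def site_points(page_points):
    """Sum page-level points into totals per host name."""
    totals = defaultdict(int)
    for url, points in page_points.items():
        host = urlsplit(url).hostname or ""
        totals[host] += points
    return dict(totals)


def combined_score(url, page_points, site_weight=0.5):
    """Blend a page's own points with its site's total (hypothetical weight)."""
    host = urlsplit(url).hostname or ""
    site_total = site_points(page_points).get(host, 0)
    return page_points.get(url, 0) + site_weight * site_total


page_points = {
    "http://espn.example/": 5,
    "http://espn.example/nba": 2,
    "http://other.example/": 1,
}
```

In this toy data, a page on the host with more accumulated points gets a lift from its neighbors, which is the kind of site-wide effect the patent hints at.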

Conclusion

The process described in this newly granted Google patent appears to be best suited to identify navigational search results – pages that searchers already know about and seem to want to find when they issue their query.

One of the ongoing mysteries surrounding the Panda updates was the statement, in the initial announcement about them, that the update “noticeably impacts 11.8% of our queries.” In 2006, I wrote a post at Search Engine Land titled Why Do People Google Google? Understanding User Data to Measure Searcher Intent, about a presentation from Google research scientist Dan Russell, who provided a breakdown of the different types of queries that Google received by whether they were navigational, informational, or transactional.

At the time, navigational queries amounted to around 15% of the searches they received, transactional about 22%, and informational accounted for the final 63%. I haven’t seen an update to those percentages since then, and there are other ways to classify queries, but it’s possible that people are performing fewer navigational queries and more of the other types. Are “navigational” queries the types that we were told would be “noticeably impacted”?

What’s really interesting about the process described in the patent is that it sets up a framework for assigning quality points to particular pages and sites to be used to determine a quality score for those pages, and those quality scores might potentially influence search results for unrelated queries that might not be navigational in nature.

It doesn’t cover the wide range of features that might be assessed under the Panda upgrades based upon the questions Google presented to webmasters about how to improve the quality of their pages. The Berkeley presentation from John Lamping, around the time that this patent was filed, does tell us that many of those quality features were a concern to Google’s search engineers at the time.

Did the “quality scores” in this patent lead to the Panda updates? Maybe.

56 thoughts on “Google’s Quality Score Patent: The Birth of Panda?”

  1. Thanks for the heads up about quality score patent and really a worth-to-read article.

  2. Superb article as usual, Bill. However, I think the 11.8% (and subsequent 2% and undisclosed percent) queries encompass a variety of search types. Navigational queries might be in the broad spectrum of Panda-affected queries but I have seen many of the queries for several Panda-affected sites and they are NOT navigational.

    At least, they are not SITE navigational queries.

    You could characterize many of these queries as something akin to “Knowledge Navigational” queries, maybe. I don’t think Panda is about knowledge queries, though. I just think that the signals Panda evaluates are probably most drastically altered by aggressive search optimization in knowledge-seeking queries.

    The queries themselves create a pattern that is very consistent in the data to which I have been given access. I can’t say this is necessarily a universal effect.

  3. @Bill and Michael

    It is almost like the Panda update counter shifted the importance of SEO from what has seemingly been mostly off-site back to mostly on-site…the way it used to be.

    Personally, I like that better. I feel that those who write for people instead of simply generating “spider food” will benefit from this.

    Additionally, the “grapevine” has been telling me that a lot of lean sites have been taking it on the chin since the “Panda” update.

    Google really seems to be liking fresh, relevant content with no keyword stuffing or link building campaigns…maybe…

    Mark

  4. Mark: What Google really likes isn’t just a static, never-updated billboard advertising your business. It should be a living page that changes, grows, and engages your target audience on a regular basis. I like the idea, but it may be a bit too much, as some really good “static sites” may become less relevant for big G.

  5. Panda may be my arch nemesis. Right now I have no idea what it’s doing to my websites sometimes. My small sites seem to be doing fine while some of my authority sites are falling in rank. I always thought if a major update was going to happen, it would be the opposite.

  6. Some of the sites we manage lost rankings after Feb update. Over the last couple of months we managed to regain a significant portion of the traffic with a few changes.
    1. we removed similar-topic pages
    2. reduced the number of internal links
    3. in WordPress blog sites where people use yarpp plugin, we reduced the number of words showing the excerpt after the anchor text.
    4. we started increasing the size of content on each page – from 250 words to 400 words minimum.
    5. eliminated pages that had shallow content (lots of links and less real content) and no visits from Google SERPS for the last 1 month. In other words, Removed what Google bot considered junk.

  7. It is a pleasure to read a serious, thoughtful article on the subject. Like Michael Martinez, I do not think that Google takes knowledge queries into account. But other factors must be taken into account: site bounce rate versus sector bounce rate, the quality of the content versus the sector…

  8. Hi Nishant,

    Looking back on how the search engines have started to become more preoccupied with “quality” has become both interesting and a little frustrating. I know that the Panda upgrades have impacted a good number of sites and site owners both positively and negatively. Panda doesn’t quite have the simplicity of an SEO where you do keyword research, place keywords in the right places on a page, attract some links and you find yourself ranking well. Then again, it’s been a long time since SEO may have been that simple.

  9. Hi Michael,

    Thanks. Your thoughts are always appreciated.

    I’m not sure that I’ve seen a search engine announce a change to how they rank pages, and give us such an exact number regarding the percentage of queries or searches that it might impact.

    This patent itself does seem to focus upon navigational queries, and how Google might identify sites that are most responsive to queries that seem to be intent on finding a known resource, so that they can rank those resources well. It doesn’t describe the Panda that we see today, but its preoccupation with “quality” and defining a quality score for specific documents or sites does seem to be one of the roots of the Panda updates.

    Panda does seem to focus more upon “knowledge” queries as you note – finding the “best” resources to fulfill an informational need. It seems to favor sites that aren’t oversaturated with advertising, that appear to be original resources with writers who have a fair amount of expertise on a subject, and that exhibit indications of credibility and trust.

    If this patent, and the Authority Pages patent I refer to in the post focus upon finding authority pages for navigational queries, Panda seems to be attempting to do the same thing with certain informational type queries.

  10. Hi Mark,

    I agree that there are potentially a lot of things that someone can do on the pages of a site that has been impacted by Panda that can help, but there may be some off page elements of Panda that aren’t necessarily intuitive or obvious. For instance, in this patent, links that have certain anchor text in them may be weighted differently than links with other anchor text.

    For example, I’ve also seen a few Panda sites where technical site architecture problems may potentially be one reason why those sites saw drops in rankings, such as problems with relative anchor text that might have spawned many copies of the same pages at different URLs. When your 3,000 page website has tens or hundreds of thousands of potential URLs that could be indexed by a search engine, but many of those are exact or near copies of the same pages, that’s not a good thing under any circumstance. That technical error might give the appearance of an intentional attempt to manipulate PageRank and search results.

  11. Hi Michael,

    For some queries and in some contexts, I believe that Google may favor some pages that don’t change much, and for other queries and contexts, a page that contains timely and topical information, and is changed regularly may be more valuable than others.

  12. Hi Mark Anthony,

    I think you will find that you have a lot of company. One of the things that is interesting about this patent is that in many ways it attempts to find “authority” sites for navigational type queries. Since you described some of your sites as “authority” sites, how would you define authority when it comes to content, to design, and to site structure. What makes those pages of yours authoritative? To a degree, I think that’s what Google is working at defining themselves with the Panda updates.

  13. Hi Santosh,

    It’s good to hear that you’ve been able to effect some positive changes with your sites and your rankings in Google. Thanks for sharing the steps you’ve taken and your experience.

    Since Panda appears to look at a wide range of features, it’s possible that what works well for one site might be different for another site. I do think that taking steps like improving your information architecture and site architecture, improving the quality of content on a site, as well as visitor usability are positive steps to take regardless of Panda’s impact.

    For most sites, it’s also not a bad idea to find positive ways to attract the right visitors to your pages in other ways as well, that aren’t so much reliant upon the search engines.

  14. Hi Aurélien,

    Thank you. There are a lot of potential factors that may play a role in how Panda influences the rankings of a website that go far beyond whether or not people click on a page they see in search results, or what number or percentage of links with anchor text containing certain words might be pointed to different sites, or how well represented the words of a query are on a particular page.

    Spending a fair amount of time attempting to answer the questions that Amit Singhal posted in the Google Webmaster Central blog might suggest a lot of those, and hopefully that’s something that a lot of people who were impacted by Panda are doing.

  15. WOW! This article is more like a book on Google Panda to me. Well researched information man.

  16. “When your 3,000 page website has tens or hundreds of thousands of potential URLs that could be indexed by a search engine, but many of those are exact or near copies of the same pages, that’s not a good thing under any circumstance. That technical error might give the appearance of an intentional attempt to manipulate PageRank and search results.”

    True, a robot is a robot. Bill, have you seen any such site come back after fixing it? Was it full comeback, partial or what if it happened.

  17. Another stellar article!
    One of the things that made this update is that it seemed so arbitrary. We had a few websites that had done nothing wrong but were punished. I mean setting Wikipedia aside seems so random, because that site can be just as full of poor content and spammed links as any other.

    Also, the site wide implications were very disappointing. One page that might have been copied by scrapers suddenly impacts on the whole website?

  18. Hi Jamie,

    That is an issue that I’ve now seen on a few sites that have been impacted by Panda, though there were many other issues to resolve as well. They are still pending implementation of recommendations.

  19. Hi Robert,

    Thanks. I’ve seen more than a couple of websites now that you wouldn’t suspect were impacted by Panda, with the appearance of very high quality content and great user experience, but there were a number of reasons why they might have been targeted by the upgrade.

    The patent above notes that it could apply to sites as well as pages, and it provides a very broad definition of sites to possibly include all domains under common control of a person or organization, or even all sites in a specific language or from a certain country.

  20. Pingback: The Silverback Marketing Round-up for June 13-17, 2011
  21. These are all great points when trying to get to the top of that renowned search engine (obviously Google). It’s a dog eat dog world out there and everyone’s just trying to make it to the top of the business and search engine world. Thanks for the insight!

  22. Hi Jamie,

    Thanks. One of the interesting things about improving the quality of your pages is that it can not only potentially help you with the rankings of your pages in Google, but it can also possibly lead to more readers, more links and bookmarks pointed to your pages, more referrals from other people, and more conversions if you provide goods and services.

  23. Hi John,

    Google has been defining “quality” in its rankings for years, and as a search engine, that’s a large part of what it does. If it just returned a list of all of the pages that it found that contain certain words in them, in no particular order, it wouldn’t be very useful.

    Regardless of how subjective or objective we might believe quality to be, Google is defining quality, and if we are publishing something on the Web that we hope people will find through Google, it can really help to have an idea and understanding of how they might be defining it.

  24. Hi Guys
    I cannot see how google can patent a “quality control” as quality is very personal and what some people might find as quality other might find useless info!

  25. Hi Bill great post but i have noticed that google seems to have pushed named brand websites higher and less well known lower in the rankings. While i understand this could be due to the quality of the sites i can’t help but feel it would make it harder for a non named brand i.e a blogger to get higher in google rankings.

  26. Hi Michael,

    The Panda updates didn’t impact any of my sites, but Panda was aimed at impacting approximately 12% of all of the queries that Google receives. There were some sites that saw improved rankings, and some that saw traffic to their sites drop, sometimes up to 80% or so.

    One Google employee started a thread at the Google Webmaster Central Help group, which now has 4600 replies. Many of those posting have very little idea of why their sites might have seen an impact from the updates, believing that their sites were high quality to begin with. The thread is at:

    Think you’re affected by the recent algorithm change? Post here.

    I’ve had a chance to spend some significant time with a few sites that saw traffic drops attributable to Panda, and they were some pretty good sites, with some great content and business models.

    I also know of companies that had layoffs because of the loss of traffic to their pages.

    So, for many people the Panda update has been hard to bear.

  27. Hi Craig,

    Many of the things that you see that benefit brands are part of a wider attempt by Google to associate named entities with specific websites. See my post, Not Brands but Entities: The Influence of Named Entities on Google and Yahoo Search Results.

    Google has been attempting to find authoritative sites for specific queries, and part of that does benefit many brands and other named entities. The patent that I write about with this post describes a specific approach to identifying those authority sites, by assigning them quality points for the features that I’ve mentioned in the post.

    How do bloggers compete? One way is to make themselves brands, like a techcrunch or gawker.

  28. Excellent article as usual. I agree with Craig that brands do seem to be getting preferential treatment from the big “G”. I just hate the randomness with the Panda update. Some very high quality sites (some of mine included) have been hit while I see others ranking in the top 5 that I definitely wouldn’t give my credit card to. It’s almost as if one small thing triggers it and you’re off to the sandbox for ages. As always, thanks for sharing.

    Jeff

  29. Hi Jeff,

    Thanks. It is odd that some sites that aren’t very high “quality” continue to rank well under Panda, while others that appear to be much higher quality suffer. I’ve seen that more than a few times at this point.

    If the Panda updates follow a decision tree ensemble approach, then there is a mix of different metrics that may cause a site to be reduced in rankings, and it could potentially be one small area that triggers it. The Panda quality score process also seems to be computationally expensive enough that it isn’t run on an ongoing basis, but rather at longer intervals.

    I do have my eyes on a few websites that I’ve spent a fair amount of time with at this point, and I’m seeing what impact different changes might have upon those sites. But the biggest lesson that Panda may have brought to many is to diversify our sources of traffic and not rely too much upon any one, even Google.

  30. Good stuff.

    I found these two facts most interesting:

    1) “Panda was just one of roughly 500 search improvements we expect to roll out to search this year. In fact, since we launched Panda, we’ve rolled out over a dozen additional tweaks to our ranking algorithms, and some sites have incorrectly assumed that changes in their rankings were related to Panda.”

    2) “One other specific piece of guidance we’ve offered is that low-quality content on some parts of a website can impact the whole site’s rankings, and thus removing low quality pages, merging or improving the content of individual shallow pages into more useful pages, or moving low quality pages to a different domain could eventually help the rankings of your higher-quality content.”

    Source: http://googlewebmastercentral.blogspot.com/2011/05/more-guidance-on-building-high-quality.html

  31. Hi Dan,

    Yes, I’ve been seeing a lot of references from people at Google on how many changes that they make. I’m sure that some of them are on the fairly minor side, but I suspect that there are some significant ones as well that really haven’t gotten the kind of press that Panda has.

    The second point is an interesting one as well, but it’s something that I’ve been trying to follow as a regular part of SEO for years anyway. It’s not enough to focus upon one or a handful of pages on a site when doing SEO – it really helps to see what’s going on everywhere, to fix architectural problems that might cause duplicate content, to improve low quality pages, and to improve the quality of a site as a whole.

  32. Wouldn’t you say that the more navigational queries you get, the more powerful your branding is? I’m not sure whether sites that are not branded sites will get so many navigational queries. Or are they sites that people know they will find in top positions for a particular search term? Please enlighten me, Bill. Thanks!

  33. Hi Lisa,

    I’m not sure that it’s a matter of your branding being more “powerful,” so much as it is of it being more prominent, or more well known within a smaller niche or market.

    A small site or business may be associated with navigational queries if their name or the name of their product line is pretty unique, and doesn’t have much else to compete with on the Web, but their efforts to brand their business might not be on the same level as much bigger brands that are known to a much wider audience.

  34. Thanks for the heads-up on the great quality score patent. I had to read this whole thing. The Panda update is big news in the web world and has changed the internet altogether! I think that the update has really helped Google searchers find the content they need and that the update was definitely necessary for Google. I just wonder how they came up with the Panda name?

  35. With Panda updates becoming the norm these days, people are now speculating about what Google has up its sleeve next. Recently I’ve read that many top SEOs are inclined to believe that the overlooked partial match keyword (any anchor text that contains at least one of your keyword phrases) is the one, and that Google will start to devalue exact match anchors as its algorithm evolves post Panda. Personally I’m not sure if this will happen, but I know that if it does, it will cause lots of unhappiness to almost everyone I know. And yes, that includes yours truly.

  36. Hi Dan,

    It’s quite possible that Google will do more to make exact match domains, at least for commercial queries, not have as much of a benefit as they may have in the past. I wrote a post about that in October:

    Google’s Exact Match Domain Name Patent (Detecting Commercial Queries)

    I really haven’t relied on exact match domains that much, and if you have one and optimize it as if it were any domain name, it really shouldn’t be that much of a problem.

  37. Dear Bill,
    thank you for these insights. Panda was a disaster for some of my clients’ pages and I always wanted to know why. Since I do web design and hosting, I do not have a real inside view on SEO topics. However, now several things have become clearer to me. Kind regards, Michael.

  38. Hi Michael,

    Thank you. Panda affected a good number of sites negatively, and many of the people who worked on those sites are asking some of the same questions. It’s pretty clear from the statements of Matt Cutts and Amit Singhal that some kind of document classification system is responsible, one that develops quality scores for pages in order to rank or rerank them.

    I do appreciate when designers and developers try to take the time to learn about SEO issues like these, and that’s part of the reason why I try to spend so much time writing about them. Good to hear that this post was helpful to you.

  39. Exact match domains are out of control at the moment. Since Panda doesn’t have rolling updates, new sites are ranking quicker than existing sites are recovering, so we’re in a period of EMD spam for a lot of the queries I monitor.

    From the article it makes sense that domains seem to get positive points for certain elements then negative points for others – overall this should give a domain quality score which determines the impact Panda has. I’d really like to see Panda be able to tackle this at a page level rather than domain level as it ends up hitting webmasters too hard.

  40. Hi Stuart,

    I haven’t been paying too much attention to EMDs that I see in the search results because it’s just too hard to see enough of them to notice whether the rate is increasing or declining – just not enough data to notice.

    Interesting that you are seeing more of them for the queries that you tend to look at.

    It’s almost impossible to tell whether an EMD is ranking well because it’s an exact match domain or because of other ranking considerations, but you can sometimes see a page from one that ranks well despite not having much in the way of higher quality links or many features on the page that make it more relevant than other pages. The few like that which I’ve spent some time looking at more closely were for terms that you might not consider that commercial.

  41. Hey Bill, first off just wanted to say thanks for another helpful and insightful post. Your analysis on this site is absolutely fantastic.

    I had a few questions that I hope you don’t mind giving an opinion on.

    If this was related to navigational searches, would that be a reason why affiliates may have been hit hard with this update? For example, a person searches for buy “item x” online. They then go to an affiliate site, see the top list of places to buy that product and head straight off to another site. I can understand how those two signals combined with the “manipulative” anchor text signal might be used to determine whether a site is seen as spammy.

    However, in terms of the affiliate model, I’m trying to work out how one could get around that while still making an income. If we remove all ads from above the fold, then that is obviously going to cut hugely into revenues. If we keep ads above the fold then natural visitor behaviour will be to click one of the outgoing links, thus creating a higher bounce rate. And so on.

    While I understand that user engagement and how visitors behave on your site are potentially important signals, I also think that many affiliate sites that work with some form of recommendation type system are exactly what people are searching for when they input certain search phrases into Google.

    I had two sites hit by Panda. One was a 3 year old site with a solid backlink profile and well written content (quite a high bounce rate on the homepage, as it was an action search where I was sending traffic off site via affiliate links). That site lost 90% of traffic.

    The second site that was hit lost about 60% of traffic. This site had extremely well written content. Extremely in depth. Lower bounce rate and very few ads throughout the site. The content was written by a number of experts in the area and the site had received (and still does receive) small numbers of natural backlinks. Interestingly with that site, the homepage was worst affected, with the site no longer ranking for its keyword phrase (but it ranks for the domain name + keyword phrase). I think that was in the Panda 2.2.5 iteration, while the above site was hit in one of the earlier Panda iterations.

    Anyway, I think that got a bit ranty but would love to hear your insights.

    Thanks.

  42. Hi Ian,

    Thank you.

    The patent in the post above describes how quality scores might be used to identify when a page is a good match for a query from a navigational stance. That is, when someone performs the query, is it as if they were asking for that particular page? I pointed out the similarity to Panda in that it also calculates quality scores to determine how well suited pages might be for particular queries.

    Panda isn’t explicitly related to navigational results. But the idea that someone searching for a particular query might be looking for an original source of information or the primary source for particular goods or services as opposed to an affiliate does seem to lean that way.

    If sports betting were legal, and I decided to run an affiliate website for sports betting, I wouldn’t try to rank #1 for sports betting. I would create a website about the Super Bowl that provided engaging and informative content about every Super Bowl ever played.

    Each year the game was played would have its own main page which described the game in detail. Off of each of those main pages would be (1) a page about the season itself that led up to the game and the playoffs, (2) a page giving detailed statistics about the game, (3) a page about the stadium and audience and media coverage for the game, (4) a page looking at the advertisements that were shown during the game. The pages would include some fairly non-intrusive affiliate links, but those would be presented as tastefully as possible. I would want to create a site that sports writers, football fans, advertisers, and the communities that hosted those games would link to freely. Every new season would have blog posts about the games leading up to the Super Bowl, historic looks back at previous seasons and playoffs, and so on. To use Google’s parlance, these would be the thickest affiliate sites you could ever find. And I’d avoid terms like “sports betting.”
