Around the beginning of 2009, several site owners noticed a change in Google rankings that seemed to favor big brands, and that caused many websites to lose traffic.
Google’s head of Web spam, Matt Cutts, published a video in response, answering the question Is Google putting more weight on brands in rankings?, where he referred to the Google internal name for the change as the “Vince Update,” named after one of the Google search engineers who worked on the project.
A week ago, on February 24th, a post at the Official Google Blog titled Finding more high-quality sites in search, written by Matt Cutts and Google Fellow Amit Singhal, announced a significant change in Google’s rankings of Web pages in search results. In the post, we were told that the change would “noticeably impact 11.8% of Google’s queries.”
The purpose of the change was to:
…reduce rankings for low-quality sites — sites which are low-value add for users, copy content from other websites or sites that are just not very useful.
At the same time, it will provide better rankings for high-quality sites — sites with original content and information such as research, in-depth reports, thoughtful analysis and so on.
Earlier today, Wired published an interview with Matt Cutts and Amit Singhal titled The ‘Panda’ That Hates Farms: A Q&A With Google’s Top Search Engineers.
In that interview, we were told that the focus of the update was to rank higher-quality sites above lower-quality pages in Google’s search rankings.
While many writing about the update have been referring to it as the “Farmer Update,” since it seemed to target content-farm websites, Matt Cutts shared the internal Google code name for the update, telling us that it was named “Big Panda,” after one of the key engineers involved in the update, whose name is Panda.
It appears that the update involved classifying websites on the basis of a number of questions about the sites, such as:
- Do you consider this site to be authoritative?
- Would it be okay if this was in a magazine?
- Does this site have excessive ads?
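One way to think about questions like these is as inputs to a classifier: each answer, whether it comes from human raters or from algorithmic approximations, becomes a column in a feature vector. A minimal sketch (the question names and yes/no encoding here are my own invention, not anything Google has confirmed):

```python
# Hypothetical encoding of rater-style quality questions as classifier
# features. The question names are illustrative assumptions, not Google's.

QUESTIONS = [
    "is_authoritative",   # Do you consider this site to be authoritative?
    "magazine_worthy",    # Would it be okay if this was in a magazine?
    "excessive_ads",      # Does this site have excessive ads?
]

def encode_answers(answers):
    """Turn yes/no answers into a numeric feature vector (1.0 = yes)."""
    return [1.0 if answers.get(q, False) else 0.0 for q in QUESTIONS]

answers = {"is_authoritative": True, "magazine_worthy": True,
           "excessive_ads": False}
features = encode_answers(answers)
```

A vector like this is exactly the kind of input a learned classifier could consume at scale.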
So, I went to Google and searched for Panda.
And I found Biswanath Panda.
One of the papers that Biswanath Panda and several other Googlers published in 2009 described an experiment Google performed on its advertising system, to see whether they could learn about the quality of ads and landing pages from the bounce rates associated with clicks on those ads.
The focus of the paper wasn’t so much the effectiveness of the ads in the experiment as the ability of the machine learning system to work on a very large set of data.
The paper is:
“PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce” (pdf), by Biswanath Panda, Joshua S. Herbach, Sugato Basu, and Roberto J. Bayardo, which was originally published in the Proceedings of the 35th International Conference on Very Large Data Bases (VLDB-2009).
We’re told in the conclusion section of the paper that while the authors focus upon problems in sponsored search in their experimentation, they expect to achieve similarly effective results on other large-scale learning problems.
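To make the tree-ensemble idea concrete, here is a toy, single-machine sketch of an ensemble of one-split decision trees (“stumps”) voting on page quality. PLANET’s actual contribution is distributing tree induction over MapReduce; the features and thresholds below are assumptions for illustration and do not reflect anything Google has disclosed:

```python
# Toy ensemble of decision stumps voting on page quality.
# Features and thresholds are made up for illustration.

def make_stump(feature_index, threshold):
    """A one-split decision tree: returns 1 (high quality) or 0 (low)."""
    def stump(x):
        return 1 if x[feature_index] >= threshold else 0
    return stump

# Hypothetical features:
# [original_content_ratio, words_per_ad_block, inbound_quality_score]
ensemble = [
    make_stump(0, 0.5),    # at least half the content is original
    make_stump(1, 100.0),  # at least 100 words of content per ad block
    make_stump(2, 0.3),    # some minimum link-based quality
]

def predict_quality(x):
    """Majority vote across the ensemble."""
    votes = sum(stump(x) for stump in ensemble)
    return 1 if votes * 2 > len(ensemble) else 0

thin_page = [0.1, 20.0, 0.1]   # scraped, ad-heavy page
deep_page = [0.9, 400.0, 0.6]  # original, lightly monetized page
```

A real system would learn the splits from labeled data rather than hand-pick them; the point is only that many weak trees, combined, can classify a very large number of pages cheaply.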
The Farmer/Panda update does appear to be one where a large number of websites were classified based upon the quality of content on the pages within those sites. The process described in the Tree Ensemble paper is one potential candidate for the change in rankings, resulting in a reranking of search results based upon answers to the kinds of questions above that could be used to determine the quality of pages.
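A reranking of that kind might look something like the following sketch, where a base relevance ranking is adjusted by a quality classification. The scores, multipliers, and labels are illustrative assumptions, not Google’s formula:

```python
# Hedged sketch of reranking: compute a base ranking as before, then boost
# or demote each result by a quality label. Multipliers are assumptions.

def rerank(results, quality, boost=1.25, demotion=0.6):
    """results: list of (url, base_score); quality: url -> 'high' | 'low'."""
    adjusted = []
    for url, score in results:
        label = quality.get(url)
        factor = boost if label == "high" else (
            demotion if label == "low" else 1.0)
        adjusted.append((url, score * factor))
    return sorted(adjusted, key=lambda item: item[1], reverse=True)

base = [("farm.example", 0.80), ("original.example", 0.70)]
quality = {"farm.example": "low", "original.example": "high"}
reranked = rerank(base, quality)
```

Note that the base scores (and the links feeding into them) are left untouched; only the final ordering changes, which matches the reranking reading of the update.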
In Document Level Classifiers and Google Spam Identification last month, I provided an example of a patent that described how Google might classify web pages based upon many characteristics of those pages.
While the patent I focused upon in that post gave us some hints about how Google might determine the language used in a web page, the main idea behind my post was that Google might pose some questions about a page to determine and classify whether or not the page could be considered Web spam.
I mentioned several things that Google might look for in the Document Level Classifiers post, and there could be other factors involved as well, including the number and placement of advertisements found on pages, how much novel or duplicate content might be found on the pages, and more.
Is Biswanath Panda the “Panda” that Matt Cutts referred to in the Wired article? Did Google use an approach like the one described in the Tree Ensembles page?
I’m not sure that it matters.
What does matter is that the update focuses upon boosting sites in search results that Google considers to be higher quality, and demoting pages that are lower quality.
The takeaway from this update from Google seems pretty obvious – rankings in search results are now more closely tied to the quality of those pages.
How does Google define “quality?”
That’s the challenge facing site owners after this update.
108 thoughts on “Searching Google for Big Panda and Finding Decision Trees”
I think another point that’s missed out from this post is that while “rankings in search results are now more closely tied to the quality of those pages”, it’s also true – perhaps even more pertinent – that rankings in search results are now less closely tied to links on bad quality pages.
I say this because article submission sites, a key (slightly spammy) technique in gaining links, are some of the most heavily penalised pages of all on the web. See here:
I think part of Google’s objective with this update is to penalise (or lessen the effect of) spammy linking techniques as well as penalising poor quality sites/pages themselves.
Hm. I still think the engineer named Panda is probably Navneet Panda, whose background is in machine learning. Perhaps we should prevail upon Google to bring the Panda to SMX West next week.
Googlers have published at least 165 papers on Natural Language Processing and they have been working to apply these techniques to Mobile, Voice, and Web search from what I can tell — perhaps even more platforms.
However, I think there is a flaw in their statistical-based approach (or else they have not yet divulged how they have addressed a serious issue). The Web is not really static but an index of the Web may pass through non-dynamic states (I’m not sure I want to call them static states).
Language, however, is a living mechanism — well, living languages are. That is, a living language is a vector and the Web actually incorporates multiple data points from that vector into its content. English as it was written 160 years ago is not quite the same as English as it is written today, and yet you can find millions of documents written in 1800s English (from both sides of the Pond).
I wonder how the algorithm adapts to all the subtle variations in idiom and lexicon. I’d love to spend an afternoon chatting with the Panda. I’d have a thousand questions for him (or her).
“So, I went to Google and searched for Panda.
And I found Biswanath Panda”
Haha, that’s so funny. That’s exactly what I was doing. That was my exact thinking: “Look for that guy and try to figure out what he published.”
I guess you’ll let us know, if you find anything else on that topic 🙂
all the best
Just like when the president wants to know “Where are the carriers?” when something bad happens, my first thought last week was: “Where is Rand Fishkin, he has the resources to throw at this!”.
After Rand published SEOMoz’s analysis that linking was not the key factor, I moved on to thinking about machine learning and classification and was thinking “Where is Bill Slawski?”
Good detective work and thinking.
My take on this is, if it’s truly a site-level “feature” they are classifying, this will force people to pull up camp and start new sites. If it’s a page-level “feature” they are classifying, there is some hope for existing sites to address this. The interview in Wired makes it sound as if it’s site-level but I’m not entirely convinced yet.
P.S. Bill, I suspect the results from that paper you mentioned might be those that were presented in another paper called “Predicting Bounce Rate in Sponsored Search” which really should have been titled “Quality Score, if there’s no history built up yet, equals relatedness”. An interesting read if you’ve not seen it.
I’m not sure that I would call this a penalization of spammy links or content, but rather a method to rerank results to try to get higher quality pages to appear above lower quality pages.
As a reranking method, boosting some pages and reducing the rankings of other pages, it may look at a number of features associated with pages to classify them at a certain level of quality. A page’s initial ranking may still depend upon measures of relevance and quality, with links playing a role in that ranking. Many sites that contain lower-quality content do depend upon links pointed towards them to rank well in search results, and the methods used in the Panda update may help negate some of the influence of those links.
Ted: I’m pretty sure it’s site-wide. I have detailed data on 3 sites I know intimately and all 3 have been hit site-wide across different page types. Long tail phrases seem largely untouched.
Interesting thing is that all three of these sites are based on very detailed original articles (often 2-3,000 words long with deep quality links) written by experts.
Collateral damage? Maybe, but even then there must be a reason they have been hit. Possible variables (from these sites) include lots of ads, low average page read (those long articles are often not what a searcher wants) and (as well as the quality content) there are a lot of dynamic category pages. Reminder: all the different types of page got hit.
When I came across the word “Big” to describe Panda in the Wired article I should have known to search for more Pandas.
I read an interview yesterday afternoon with the Oakland Athletics GM Billy Beane, and the interviewer asked him about the time he was on the same team with someone else who shared his name. (It was somewhat of an irreverent and edgy interview.) Billy Beane mentioned that he was referred to as “Big” Billy, and the other Billy Beane was called “Little” Billy.
Both Navneet Panda and Biswanath Panda are working on machine learning approaches, and chances are that either could potentially be the Panda mentioned in the Wired article. The description of the algorithm in question does have the earmarks of a machine learning approach where a seed set of good sites with certain qualities and features is identified and compared against other sites.
I’d love a chance to spend some time talking with (the right) Panda as well. Actually I think a conversation with either Panda would be pretty interesting.
Thanks. I was pretty happy when I saw that Matt gave us a few hints about the update, including the name Panda.
I’ll definitely be keeping my eyes open, and doing a lot more reading.
Hi Ted and Mark,
We have heard language from Matt Cutts about this update involving a “document level classifier,” but that doesn’t necessarily mean that the impact wouldn’t be on a site-wide level.
Thanks for sharing some of your data. Interesting that long-tail terms don’t seem to have been affected. On the sites that you have data on, are the different features associated with pages very similar from one page to the next on the same sites? In terms of things like layout, use of advertising, reading level, length of content, size of image files, speed of sites, and other features?
I hadn’t seen that paper before, but it looks like the paper you’re referring to is included in the footnotes in the Tree Ensembles paper. Here’s the quote and the citation:
 D. Sculley, R. Malkin, S. Basu, and R. J. Bayardo. Predicting bounce rates in sponsored search advertisements. In SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pages 1325–1334, 2009.
Sugato Basu and Roberto J. Bayardo are co-authors on both papers as well. Interesting set of features used in the experiment.
Hope you’re well?
With a title like “Searching Google for Big Panda and Finding Decision Trees” I just had to comment! 🙂 I guess this fine-tuning of the web is inevitable and ultimately only a good thing. It’s frustrating as a user when you’re bombarded with “spam-like” websites that know how to play the SEO game in order to gain rank but, at the end of the day, offer nothing in return.
Looking at your notes on the Google criteria for a quality site, it’s all rather straightforward, isn’t it? I think Google’s definition of quality is much the same as ours; we all know what constitutes good content, don’t we? Or am I just not seeing the bigger picture here….
There’s additional interesting information on this subject on Jeff’s Search Engine Caffe at
Doing alright. Hope things are well with you, too.
I think this is a step in the right direction, though there seems to be reports of collateral damage where sites that have original and unique quality content may be seeing some harm.
This update seems to be taking a different view of the quality of pages in web search than Google may have been in the past. Instead of focusing just upon the content of pages themselves, it seems to be looking at other things as well, such as the number and placement of advertising and other features. Google seems to have gone from how well a page might meet the intent of a searcher to include how good of an experience pages returned in results might be. A page with 50 words of actual content, and 10 ad blocks might not rank as highly as one that shows 500 relevant words, and only an ad or two. Of course, there are probably a lot more features being viewed than that.
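The 50-words-and-10-ad-blocks example can be turned into a crude feature: how much visible content a page offers relative to its advertising. The 50-words-per-ad threshold below is a made-up illustration, not a known Google value:

```python
# Crude content-to-advertising feature. The threshold is an assumption
# for illustration, not a figure Google has published.

def words_per_ad_block(word_count, ad_blocks):
    """Content words per ad unit; all content counts if there are no ads."""
    return word_count / ad_blocks if ad_blocks else float(word_count)

def looks_ad_heavy(word_count, ad_blocks, min_words_per_ad=50):
    """Flag pages that offer too little content per advertisement."""
    return words_per_ad_block(word_count, ad_blocks) < min_words_per_ad

# The example from the discussion: 50 words with 10 ad blocks versus
# 500 relevant words with only 2 ads.
thin = looks_ad_heavy(50, 10)
rich = looks_ad_heavy(500, 2)
```

In a real classifier this would be just one feature among many, weighed against things like originality and layout rather than used as a hard cutoff.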
I stop by and check out Jeff’s Search Engine Caffe from time to time, and he usually has something interesting to say. In this case though, it looks like Jeff didn’t go much beyond what was covered in the Wired Article.
I think it’s great that Google is cleaning up the SERPs of low quality sites and giving better rankings to sites with high quality, but I’m one of those site owners who wants to know how is Google going to define quality.
How does Google define “quality”? For example, if you find two sites with the same type of content, how do you decide which site is the one producing copied content and which site is the one with the original content?
I am all for Google weeding out the bad sites, I just hope they know who the real suspect sites are before they penalize them. Thanks for sharing as always Bill.
Thanks for the extra info. Yes, I guess that’s the main concern: whether Google can (correctly) identify original and unique content. I can’t imagine how frustrating it would be if your site was penalised for no apparent reason, or for the wrong reason. It’s not like you can drop Google an email to correct the error. Plus, I suppose if your site is pushed down (or up) the rankings, then someone else’s site takes its place; in turn, that site will also gain or lose out. It’s a knock-on effect… which I guess is fine, as long as it’s for the right reasons.
I’ll keep an eye on this and hope to learn more as the info comes out, thanks for keeping us informed Bill.
Cheers for now
I bet Google, even though they downplay the influence of bounce rate as a SERP metric, is working furiously to include it in the algorithm in some form or fashion, if it hasn’t been included already. Honestly, what other way is there to measure whether or not a document is relevant to the human mind? However, SEOs will figure out a way to manipulate any reformed version of this metric too.
“The perfect search engine,” says co-founder Larry Page, “would understand exactly what you mean and give back exactly what you want.” – Google’s Philosophy opening statement
Seems Google has done exactly what it should, following its stated goal for a search engine. “Panda” seems to have created an artificial intelligence model to (sometimes) correctly identify websites with 50% or less quality content. With mass-content websites, although they sometimes provide great content, most of the content is spammy links to sell products, and contains information that does not answer a question, or provides few “answers” to the question you may be asking.
We are sure Google has a neat little algorithm determining many key factors. I even believe domains like “some-keyword-with-dashes.com” are being evaluated by sniffing the href, or by whether a site is using redirects (mass redirects mean poor quality in almost all cases), while still including the page content as a highly weighted factor. After measuring that content as they did before, as Bill mentions above: “A page with 50 words of actual content, and 10 ad blocks might not rank as highly as one that shows 500 relevant words, and only an ad or two.” I truly hope this is being evaluated; users don’t want to be bombarded by advertisements. A user’s main focus is to find an answer to their question. If they happen to see an advertisement they are genuinely interested in, that will change users’ behaviour with ads and increase the ROI for contextual advertisers, since they followed an actual personal or business interest. It seems that if Google can raise the ROI of advertisers, they would earn more in the long run and be able to raise costs or gain more advertisers.
Google’s great philosophies:
1. “Focus on the user and all else will follow.” and 2. “It’s best to do one thing really, really well.” Both answer why this new algorithm update has happened, and if you read all 10 amazing business philosophies you will know the future of Google as it was meant to be and how search should be. #4 has lost credibility over the years, but is still a factor in some industries; not all websites require #4, though.
As for their philosophy that “Democracy on the web works,” we slightly disagree, depending on the project. We have many websites which started with a single link, gained very high SERPs, and still do extremely well in search engines without any link-building services or plans. Users are the ones who suddenly start providing links to the website using “natural methods,” which reflects Google’s #1 philosophy.
Mark, you ask, “Honestly, what other way is there to measure whether or not a document is relevant to the human mind?” I can think of hundreds of factors which can prove relevancy and answers. Search 100 websites for the same question; you can identify each one, and simply write these identifications on paper, one by one, as you identify them. You will be amazed. While browsing, identify the pros and cons of each website you visit in that industry. Look at all the factors you can think of, including finding your answer, and remove any biased thoughts (choose an industry you know nothing about). Measure your eye movements as best you can. Measure everything from the doctype to the last line of code, and you will see some extremely regular factors in poor content, and regular factors in good content. “Good,” defined simply, is “the answer you are looking for” in your search.
Measure Google philosophies #1, #2, #3, #5 and #7, if you don’t know SEO through developer eyes, and you will see much failure in each of these philosophical areas. Also read Google’s user experience guidelines. Put extremely simply, Google gives us all the answers about what we should do, and what they will fix in the future if it is broken! They have much more information to read; these just give a good idea.
Mark, I don’t believe bounce rate should be used for all industries. Some clients want the customer to “just call, or contact us.” If you provide everything correctly, and encourage that call or contact (which follows one of Google’s philosophies), and you provide all this on the first page the user lands on, you can have a pretty high bounce rate. If you provide the answer they are looking for right away, you can get a high bounce rate. It’s whether the user returns, or you gain a high readership and returning visitors, that matters; sites using Google Analytics would give Google these answers.
If a user searches Google, clicks, finds no answer, goes back to Google within a short time period, and searches again for roughly the same query, I believe this tells Google the bounce was bad. How to determine that? With tons of information gathered over 10 years, I’m sure every query by every user could in some way, shape, or form be evaluated. Knowing how a user searched your site, which we know Google does, and knowing what they searched for, tells us whether they found their answer. That is how bounce rate should be calculated.
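That heuristic (a click followed by a quick return to the search page and a near-identical query) can be sketched from a click log. The event format, the 30-second window, and the crude query-similarity test below are all my own assumptions, not anything Google has described:

```python
# Hedged sketch of detecting "bounce back and search again" behaviour.
# Log format, time window, and similarity test are illustrative assumptions.

def similar_queries(q1, q2):
    """Crude overlap test: the queries share all but at most one term."""
    a, b = set(q1.split()), set(q2.split())
    return len(a & b) >= max(len(a), len(b)) - 1

def count_dissatisfied_clicks(events, window=30):
    """events: list of (timestamp_seconds, action, query) sorted by time,
    where action is 'search' or 'click'. Counts clicks followed within
    `window` seconds by a re-search on a near-identical query."""
    bad = 0
    for i, (t, action, query) in enumerate(events):
        if action != "click":
            continue
        for t2, action2, query2 in events[i + 1:]:
            if t2 - t > window:
                break
            if action2 == "search" and similar_queries(query, query2):
                bad += 1
                break
    return bad

log = [
    (0, "search", "fix leaky faucet"),
    (5, "click", "fix leaky faucet"),
    (20, "search", "fix leaky faucet washer"),  # quick, refined re-search
    (300, "search", "best pizza dough"),
    (310, "click", "best pizza dough"),
]
```

Here only the first click counts as dissatisfied: the user came back within seconds and refined the same query, while the second click was never followed by a re-search.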
A few years back I started a campaign against “virtual blight” and presented the basic opinion that websites, like neighborhoods, can become blighted. I suggested back then that both the quantity and the TYPE of advertising on a site played a significant role in our perception of the credibility we gave a site. The questions Google offers up as guidelines seem to go directly to this idea of blight.
I haven’t taken the time to analyze the types of ads on the sites that got hit the hardest, but my gut impression is that almost all of the losers here are actually blighted, filled with ads for scam offers, opportunity seekers, rebills, etc.
I was wondering what you had to say about this big change. I was really confused and hit hard by this latest update; I have a fairly big authority site, and it was hit pretty hard in the rankings. I was really confused by it because I used nothing but original content and never backlinked it with spammy links. I then realized that I used too many ads. I always had it in the back of my mind that there were too many ads, and this change proved it. Well, I got rid of a bunch of ads and now my site is back up and actually doing a little better. So, as you noted, those three questions are huge.
Do you consider this site to be authoritative?
Would it be okay if this was in a magazine?
Does this site have excessive ads?
I really used the first two before but never paid much attention to the last one. Well, it was a big wake-up call, and I’m glad I found out about this. Thanks for dissecting things like you do; it really helps us right-brain-dominant people for sure.
I wonder if anyone can confirm that only sites using some form of advertising model have been affected by Panda?
In some ways it’s a little bit surprising that we need hints about this update. In Germany we don’t have the rollout of the update at the moment. But if you have good websites with quality content you don’t need to worry about this update, though it’s worth keeping its effects in mind.
It is only right for Google to implement this change. There are a lot of spammers out there who outrank other sites that do not resort to black-hat techniques. This move makes it fair for all of us who spend so much time and effort following the proper rules and guidelines in marketing our sites.
That was one of the few times that Google actually gave details about an update to its algorithm. Personally, I believe they shouldn’t make announcements, and should just do what they have to do about giving “quality” results. I don’t mind if quality is a term that is only defined by Google, because that makes SEO services higher quality as well.
Bill – although, like you, I’m not sure whether it matters if Biswanath Panda is “Panda,” this is good, solid investigative journalism, and the more connections we can make between a specific algorithm update, the people responsible for the update, and their papers and patents, the more informed we will become about what happens behind the curtain. This is also one of the reasons I think your patent posts are important. I don’t see others focusing on those. Thanx, Glenn.
What I want to know is, if Google implements a change like this, does that then discount the value of links FROM these content farms?
Google will definitely be implementing changes like these as they are growing tired of low quality spam filled sites. Keyword stuffed sites will also be impacted as Google are going to tighten everything up and begin to reward the sites that provide the viewer with more information. So keep in mind that quality is everything at the moment with Google.
In response to Lakis’ quote, “I don’t mind if quality is a term that is only defined by Google, because that makes SEO services higher quality as well.”
Absolutely brilliant way to state this update. I love it! :)
Simon, quality has been the focus of Google since the beginning. It’s developers who find ways to intrude on the quality of results. Google, being as good as they are, will allow it to happen so they can sniff out the bad quality in large numbers rather than fixing one error, and allowing newcomers to push in on a consistent basis.
If they resolved one problem at a time rather than many (1 site vs. 1,000), they would not be fixing anything. If you allow 100,000 websites to use the same fishy techniques for a period of time, all of which the webmasters know are against Google’s policies, then you can clear those websites out in large numbers with future updates. Hopefully all the websites which provide terrible quality will remain low in Google’s SERPs. If they tricked the system once, they will try to do it again.
It’s brilliant as far as I am concerned. At least this is the way I would do it.
I guess the lesson here for SEO professionals is not to depend on search engines for the bulk of a website’s traffic.
Ed, that is not a lesson for SEO professionals. Anyone in the industry who is actually a professional and an experienced SEO service provider would already have been aware of your statement. SEO is a viable marketing strategy for start-ups or established businesses; it is just not your only strategy, as stated in our Twitter bio. There is a ton that goes into the strategy, including what to do with the traffic from all sources of marketing. It seems everyone has a misunderstanding of the industry, caused by all the book-educated, so-called SEO experts who don’t actually know what they are doing. I would say our definition ( http://www.facebook.com/#!/topic.php?uid=122108587667&topic=16689 ) is a much better-understood explanation of the industry and what an SEO service should provide.
Real experts in the industry could foresee this Google update, and were not only expecting it but waiting for it, so all the trash websites could be removed. Just a matter of time 🙂 even our industry requires patience. Proper SEO services ensured clients were up in the rankings when this update rolled out, instead of suffering the effect many experienced from following false and misleading education on the rules of optimization.
There has been an incredible increase in the quality of websites on page 1 of Google results. Every time Google makes an update it is for the better, and we love how it dumps those “over-optimized” websites.
At this point, that definition of quality is one of the big question marks, and it raises a number of questions by itself.
What other questions would you ask to determine the quality of pages?
There’s always the possibility of algorithms that don’t quite get things right, and that exists here too.
Identifying original and unique content does seem to be part of this update, but think about sites that tend to copy and scrape content and how they use that content. Often that’s done to quickly create content that can be monetized by advertising. One aspect of this algorithm is that it seems to look at how advertisements are used on pages – and if advertising is done in a way that focuses more upon people clicking on ads than upon the content of a page, that might be a way to distinguish between an original and a copy. Of course, some sites with original content may be heavy on advertising as well.
Bounce rate isn’t the greatest measure of relevance or quality of web pages that are informational in nature. If the purpose of a page is to get someone to buy something, or get them to move to a page to learn more about goods and services, then bounce rate has some value.
But, if a page is informational in nature, if it’s been written to make it easy for visitors to find the information they are looking for (with headings and subheadings, lists, meaningful images, and more), then people may find what they are looking for and not move to other pages on the same site. That’s often true with news articles, with blog posts, and with other pages that provide answers to specific informational needs.
The purpose behind those types of pages isn’t necessarily to get people to view other pages on the same site, but might rather be to educate people, to advocate specific views, to build the reputation of the writer, to get people to bookmark or subscribe to a newsletter or RSS feed, to have them return to see newer stories or posts, and others. Someone visits, and instead of going to other pages, they bounce back to the search engine. The pages may be of such high quality, that someone visiting them may find what they are looking for immediately, and the page may have been the perfect page for a search engine to deliver someone to based upon the query used. Because of that, it might not be a good idea to use “bounce rate” as an indication of quality.
Bounce rate is a much better signal when it comes to people clicking on ads and visiting a landing page from that ad, and then bouncing instead of visiting a page to perform a transaction.
Thanks for your detailed and thoughtful comments and responses to other comments. Much appreciated. I think I agree with pretty much everything you’ve written.
I’m not sure that we can apply the label “artificial intelligence” in this situation, since it seems instead to be a machine learning approach that looks for and rates a large number of features on a site, though it does seem to be doing a good job of simulating artificial intelligence.
Amongst other things, it’s possible that Google is looking for some signs that might seem to statistically indicate the presence of spam, as described in the Microsoft paper Spam, Damn Spam, and Statistics. Your mention of domain names with multiple hyphens reminded me of that paper, and the use of multiple-word hyphenated host names as a frequent indication of spam pages.
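As a toy version of that statistical signal, one could simply count hyphens in the hostname. The threshold of two hyphens below is an illustrative assumption, not a figure from the paper:

```python
# Toy hostname-hyphen signal: hostnames with many hyphenated words are
# disproportionately spammy. The threshold is an illustrative assumption.
from urllib.parse import urlparse

def hyphen_count(url):
    """Count hyphens in the hostname only, ignoring the path."""
    host = urlparse(url).hostname or ""
    return host.count("-")

def flag_hyphen_heavy(urls, max_hyphens=2):
    """Return the URLs whose hostnames exceed the hyphen threshold."""
    return [u for u in urls if hyphen_count(u) > max_hyphens]

urls = [
    "http://example.com/page",
    "http://cheap-payday-loans-now-fast.example/offer",
]
flagged = flag_hyphen_heavy(urls)
```

On its own this would misfire on plenty of legitimate hyphenated domains, which is exactly why such signals are combined statistically rather than used as standalone rules.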
Thanks for pointing to Google’s Ten Things page as well. It’s definitely worth a read to people who hadn’t run across it before.
Agree completely with your response to Ed as well. It’s never a good idea to rely too much on any one site to deliver traffic to your pages, regardless of whether it’s Google or Facebook or Bing or anyone else.
There’s an element of the Broken Windows Theory to your concept of virtual blight on the Web that I think works well as an analogy. I know that I’ve seen many sites where I found the information I was looking for, but didn’t like the ads shown, the places where they were shown, and the amount of advertising. That was often enough for me to not really return to those pages.
Thanks, Santa Cruz
Good to hear that your removal of some ads made a significant difference. Thank you for sharing your experience.
I know that there’s a temptation even on pages that contain unique and high quality content to include more ads than maybe you should. If people visit your pages because of the great content you provide, and you display ads that are also relevant and helpful to visitors, ideally that shouldn’t be a problem. If advertising is the primary means of generating revenue on those pages, then a few more ads shouldn’t hurt if they don’t take away from a great user experience. But, it looks like Google might be telling us that it’s possible to push that a little too far.
Should that be a role that Google is taking on? I’m not completely sure, but it does seem to be something they are now doing.
I guess starting with sites in the US provides Google with a chance to experiment, test, and gather a large amount of data in a short period of time. Regardless of where you’re located though, I don’t think that it’s ever a bad idea to consider the quality of your pages and see if you can make improvements.
Not sure that anyone has verified that it’s only sites that show advertising that are affected.
We’ve been told by Google in the official Blog post that this update impacts 11.8% of the queries performed at Google. We aren’t told much about those 11.8% of queries other than that. I’ve seen a large number of breakdowns of queries into specific categories such as informational, transactional, navigational, commercial intent, geographical intent, local intent, etc. I don’t remember seeing any of those set at approximately 12% anywhere. So just what does that 11.8% signify?
I actually really appreciate when Google gives us details about algorithm changes.
A number of people from Google have told us over the past couple of years that the search engine has been averaging about 400 changes to their web search algorithm a year, so they do seem to be pretty quiet about most of the changes they make. Maybe they thought that this impacted enough queries that they should mention it, just like they told us about the synonym update.
Thanks. I read the Wired article and then searched around to see if anyone had tried to identify the “Big Panda” mentioned in the interview and didn’t see anything. Fortunately I found one, and a paper he co-authored that seems to provide ways to implement the changes we saw with this update. It might be a different Panda, and Google may be doing something slightly different, but I think the paper points out some of the challenges they faced in making this change, like the very large amount of data that would be involved and the difficulties of addressing that challenge.
It does feel good to see a Google update that focuses upon rewarding good content and pushes down lower quality pages, including duplicated and scraped content.
That’s an interesting question. Would this update have multiple impacts including a devaluation of links from lower quality pages? If it doesn’t at this stage (and it might not), it’s possible that it could in the future.
The approach seems to be a reranking approach, where rankings would be calculated as they were in the past, and then some measure of quality would be used to boost some sites and lower others. That wouldn’t impact links by itself, but that quality measure might possibly play into direct rankings at some point. It’s also possible that pages boosted in rankings may end up getting more links in the future than pages that have been demoted – so there could be an indirect impact on links and rankings.
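A minimal sketch of that kind of reranking step (the tuple layout and the simple multiplicative adjustment are my own assumptions; Google’s actual quality measure is unknown):

```python
def rerank(results):
    """Rerank already-scored results by a per-site quality multiplier.

    Each result is a (url, base_score, quality) tuple. The base score is
    computed as before; a quality value above 1.0 boosts a page and a
    value below 1.0 demotes it, changing only the final ordering.
    """
    return sorted(results, key=lambda r: r[1] * r[2], reverse=True)

serp = [
    ("http://farm.example/page", 0.90, 0.5),      # strong base score, low quality
    ("http://original.example/post", 0.80, 1.2),  # weaker base score, boosted
]
reranked = rerank(serp)  # the higher-quality page now ranks first
```

The point of the sketch is that links and base relevance are left untouched; only the final ordering changes.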
There have been a number of recent highly critical and highly visible news articles and blog posts about the quality of pages found near the top of Google searches. If this update is Google’s response, it couldn’t have come at a better time. We know that Google has been working on this process for more than a year, based upon comments from one of Google’s engineers at Hacker News.
Glad to hear that you’re seeing more relevant results in Google.
I’m a little skeptical when it comes to big changes like this – there’s always the possibility that someone who shouldn’t have been impacted negatively by it may end up hurt.
I haven’t had a chance to sit down and read the Google Webmaster Help thread started by a Googler, titled Think you’re affected by the recent algorithm change? Post here, but there are over 500 responses in that thread so far. Skimming through the first 4-5 so far, it does look like there are some site owners who were negatively impacted who feel that they shouldn’t have been.
How can Google define “quality content”? I can understand if, like they say, they give less importance to sites with, say, duplicate content, but saying the site is not useful? How do they know that it’s not useful? Google are smart, but are they experts in all fields? I don’t think so. This way I see that they could be punishing someone who writes only original content which is of some use to someone!!
That is definitely one of the issues that is worth thinking about. How do you define quality? How might a computer algorithm determine whether one page is higher quality than another?
There probably is some element involving duplicate content in this approach, but what other signals should Google look at?
It’s possible that one involves advertising, and whether or not the amount and location of advertising makes it difficult to read the original content on a page. I linked to the Google Webmaster Help Central thread (in my reply to George above) for people who believe they were harmed by the downgrade even though they believe they shouldn’t have been, and I’m slowly reading through it. There are a lot of people who responded. I think a lot of people are asking the same thing that you are.
Another change I saw today is that the PANDA is eating symbols in front of website titles (which many people used to catch the eyes of the searcher).
Hi Bill, I’ve actually just taken the time to watch Matt Cutts’ video on the so-called “Vince update”, where he mentions that nothing really has changed, which I don’t think is true at all, because I’ve noticed a change in some of my sites, many of which are small niche sites to which I have taken time to add what I think is quality original content. He also states that if someone searches for X then Google will give the best results for info on X, which takes me back to my original point: who the hell are they to say what is the best info on X if they don’t know anything about it, especially if it’s a speciality niche in which I’ve worked all my life? Seems to me it’s far too automated now and some innocent webmasters are taking some bad hits!!
Great post by the way… I could moan all day about this one!!! lol
That’s very interesting. I don’t know if that is something that is related to the Panda update or not. It sounds like something that could be independent from that particular algorithm though. Is there a particular reason why you think the two might be linked together?
“You’ve probably had the experience where you’ve clicked a result and it wasn’t quite what you were looking for. Many times you’ll head right back to Google.” from http://googleblog.blogspot.com/2011/03/hide-sites-to-find-more-of-what-you.html
This is what I was referring to in my response to Mark above: “If a user searches Google, clicks and no answer found, goes back to Google within a time period, searches again for relatively the same query – I believe this tells Google the bounce rate is bad.”
This goes to show that the information is used in some way. With the new “blocking the sites you don’t want to see,” and with Google also saying “In addition, while we’re not currently using the domains people block as a signal in ranking, we’ll ___look at the data and see whether it would be useful___ as we continue to evaluate and improve our search results in the future,” it in some way proves that with a great amount of user feedback the information will prove to be useful, and I am hoping Google uses this as a guideline for finding poor quality websites in algorithms :)!
I think there is too much talk about the latest Google algorithm update.
Just focus on good quality original content and don’t worry about SERPs.
Oh my god, I am not on page 1 for my main keyword anymore.
Bu hu hu. Try to provide good content and you will see how you can rank for almost everything. Ok, this is a long term strategy, but the best strategy.
Win-win situation. Good for readers, good for spiders and crawlers.
Hiya Bill. Sorry for the delay in replying to your questions. I wrote up some of the answers as a Google Panda Survival Guide http://bit.ly/fWlewJ Perhaps the main points you’ll be interested in are:
• You can have as much quality content as you like, but if there is a lot of other content that G might define as low quality (thin/dupe/ad-loaded) then Panda might get you. This other content could be as ‘innocent’ as relevant category pages or genuine Q&A pages for users.
• It is site-wide (as Google have now said) but not exclusively so (there are exceptions, but I don’t see any clear patterns to those exceptions).
• Removing ads alone is likely not enough to lift the penalty.
First off, great article. Nice to finally read something beyond all the fluff surrounding this subject. Secondly, it always pisses me off when people talk about quality sites vs. low quality sites as if the Google algorithm actually measures such things. Say it like it is… they are TRYING to separate the two. The analysis of the quality of a page is nowhere near reality. It is based on INDICATORS of quality, not quality itself. Great article.
Hunting for the evil Panda 🙂 By the way, Google is never to be trusted in any way. I lost 30% of my earnings after the latest hit on gadget product reviews. Yeah, I have unique content mostly, but no authority, and I post rarely. It is the problem for small blogs and websites.
I think there’s no escaping the issue that you raise, which is that in any topic or niche, there are pages that experts would say are the most important and relevant, and it’s likely that an indexing program like a search engine may have difficulty finding those based upon algorithms like PageRank, or even this new update, which attempts to rerank search results based upon a quality score.
The best page on a subject isn’t always the one with the most links to it, or even the most important links to it. That’s a failing of trying to imitate a citation analysis approach to indexing pages. Often the most popular page, from a linking perspective, is the one that attempts to present a topic to a mainstream audience. For example, a paper on dwarf stars orbiting around clusters of black holes may be explained extremely well in an article in an obscure scientific journal, while the highest ranked page about the topic might be a heavily linked-to article on a science site written for a much larger audience.
The Panda update doesn’t address that problem, but then neither did the algorithms that came before it.
Bounce rate is such a noisy signal that I wonder if it can ever really be determined to be useful. It’s possible that someone found something useful on a page that they visited, such as a phone number or the email address of someone to contact, or they bookmarked or saved the URL of a page, but they are comparing other sites that might offer similar services or goods, so they return to search for more.
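To make that ambiguity concrete, here is a toy sketch of how a “quick return to the search page with a similar query” event might be detected (the 30-second window and the word-overlap similarity rule are purely my assumptions, not anything Google has described):

```python
def is_pogo_stick(click_time, return_time, first_query, second_query,
                  window_seconds=30):
    """Guess whether a searcher clicked a result, came straight back to
    the results page, and reissued a similar query. Both the time window
    and the similarity threshold are illustrative assumptions."""
    quick_return = (return_time - click_time) <= window_seconds
    a = set(first_query.lower().split())
    b = set(second_query.lower().split())
    similar = len(a & b) / max(len(a | b), 1) >= 0.5
    return quick_return and similar
```

Even a positive detection is ambiguous, as noted above: the visitor may simply have found a phone number and moved on.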
There are 739 comments in the Google Webmaster Central thread right now, many from site owners who have been building very high quality sites, who have lost a considerable amount of traffic to their pages. Pretty much proof that just focusing upon building high quality content isn’t enough.
Thanks for providing a link to your page on surviving Panda. I think it’s very useful. Some interesting discussion in the comments as well.
It’s interesting to see some sites that I would probably consider to be “low quality” seem to have weathered this update just fine, while others that I would consider high quality find themselves to be collaterally damaged in rankings and traffic. It is about “indicators” of quality rather than an absolute judgment of quality itself.
I agree, and I’ll take that a step further: It’s never a good idea to rely almost completely on any one site for traffic to your pages, regardless of whether it’s Google or Facebook or anywhere else.
I am sure that you’ve probably read SEOmoz’s The Next Generation of Ranking Signals (your posts are even mentioned in it!), but because it is also really helpful, here are some of the key points from the post about thriving through these algorithm changes, for your visitors.
- Does the site have people working for them?
- Does the site own their own social pages (i.e., have links from LinkedIn)?
- Is the site a brand, and does it get branded search volume?
- Is there a physical address on the page, and is the site listed in authoritative business directories?
- Is there a comprehensive ‘About Us’ page?
- Does the site run offline marketing?
- Is there any feedback from human quality raters?
- What does the clickstream data reveal? After clicking from SERPs, are users engaged, or are they immediately bouncing back and clicking on other listings?
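None of the questions above are confirmed ranking factors, but as a thought experiment the checklist could be scored as a simple tally (the signal names below just restate those questions as booleans):

```python
def quality_tally(signals):
    """Count how many quality indicators a site satisfies.

    `signals` maps an indicator name to True/False. A higher tally
    suggests more brand-like, trustworthy signals; this is a heuristic
    checklist, not a real ranking formula.
    """
    return sum(1 for present in signals.values() if present)

site = {
    "has_employees": True,
    "owns_social_profiles": True,
    "gets_branded_searches": False,
    "has_physical_address": True,
    "has_about_page": True,
    "runs_offline_marketing": False,
}
score = quality_tally(site)  # 4 of the 6 indicators are present
```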
Content has been king for a while now, but can I ask: what do we do for sites that are product-based, where there is a limit on what you can write about one product day after day?
I’m probably missing something here.
Bill, your response to Peter and your response to me (James) on bounce rate are, in my opinion, answered in this comment. As for your response to Peter:
“There are 739 comments in the Google Webmaster Central thread right now, many from site owners who have been building very high quality sites, who have lost a considerable amount of traffic to their pages. Pretty much proof that just focusing upon building high quality content isn’t enough.”
I can’t say I completely agree with this statement, simply because if those webmasters factor in all the fancy stuff they are doing to get traffic (blog links, comment linking, keyword linking, other bad link building programs everyone is so dedicated to, and others, etc.) and many more factors, all of which have a negative effect on website ranking. Also, who are they linking to, is their content in one category, etc., etc. I can think of a hundred reasons any one website could have been affected, and history plus experience tells me every one of those 739 comments is because of providing quality content without following the negative SEO seen all over the place and sold to un-educated consumers. This algo was not just to target content farms 😉
I have many websites, all with what I consider quality content, and not one was affected by the algorithm update in any negative way. 7 of those websites, with no effect, have over 1 million pages of content. The difference: I never did any of the SEO you find in books, or the traffic building you read in articles across the web, or bad SEO as I call it. All of which I argue about daily, to push business and new websites away from these services sold worldwide, which in the end will fail. If we were to analyze all 739 comments – probably 400 website owners – they can show the negative impact on traffic. I can assure you, the traffic they are missing is missing for reasons that are expected. Possibly even losing link love and other ranking factors because the websites they once relied on to pass that link love were given penalties.
Link love, since I can remember, and more backlinks, etc., have NEVER been my focus, nor provided the best results or the highest ranking in search engines, or received the best traffic. I want to give some examples, which I am going to do in my new blog soon with some real live examples. One example will be a 4-month-old website, unaffected by the algo update, that receives more than 300,000 uniques from search engines monthly, over 10K a day. It’s new, built in October 2010, with average positions of 1-10 in Google for more than 10,000 queries, and yet has no links or “link love” to the website itself. It’s one topic, and has 298,000 indexed pages of content. Quality content does matter; it truly is how you define quality content.
So, my argument is content is still king, as long as it is quality, and on one subject; also the architecture, bounce rate, average website viewing time, and many others ALL are factors in rankings and continued rankings. EzineArticles, although good content 20% of the time – I bet the bounce rate was tremendous (50%+), which, if viewed from a statistical point of view, is not providing quality or a service to users. Although users are shoppers and may go back to Google to search, undetermined, but my best guess says that number is minuscule. Similar to the “nofollow” links, which are less than 2% of links on the web.
I don’t link to any of my projects, never did link building for even my own websites, anywhere, not even in comments, not on forums, never really. So my new blog will be the first time in 10 years linking to a website that receives loads of traffic without the process of personal or business link building. I bet, from this test I will do and the new post coming up, we will see a negative effect, which should change the view of thousands of SEO providers and specialists everywhere 😉 I will use my new website; since it’s new I am not too worried to lose its rankings.
Quality, is king, content is king… imagine the royalty of quality content. – “tweeted by You Go Media”
Just figured out what I am calling the new post –
Thanks for including information about the SEOmoz post. When the Panda update started being presented as one that looked at quality signals on webpages, I couldn’t help but think about a post I wrote a few years back involving quality signals that Google might be looking at for paid search advertisements – How Google Rejects Annoying Advertisements and Pages.
Of course, Google is going to look for a different set of signals to use to determine the quality of pages, which might involve things that it finds on pages themselves, interactions with those pages, site structure, uniqueness of content, and possibly much more. Hopefully anyone who may be potentially impacted by the update (pretty much anyone who publishes content on a website and is concerned about traffic from the search engines), is asking lots of questions about how a search engine might gauge the quality of their sites, and are taking steps to improve them.
My gut instinct every time I see someone write or hear someone say “Content is King” is to reply with “context is king.” In other words, the right content at the right time, in response to the informational and transactional needs of visitors to your pages, is what helps you meet the business objectives of your pages. Great content is wonderful, and you’ll find very few circumstances where I will argue against it. But sometimes someone comes to a site already willing and able to make a purchase, and you don’t want to impose barriers that will keep them from doing that.
Writing about products is tough, and if you only sell one product, then you may need to write about some other things, like the lifestyle associated with that product, the different benefits that it can bring, the technology around it, the options that it might offer in the future, the unique ways that some people are using it, and so on.
If I sell basketballs, and only basketballs, and I want to run a blog about my business, I’m not going to write about basketballs everyday. Instead I’m going to write about the game itself, at a professional level, at a collegiate level, and in high schools. I’m going to write about midnight basketball programs (if those are still around), and community centers that are planning on adding basketball courts for their neighborhoods. I’m going to share an enthusiasm for the sport, and be an evangelist for it, and an expert on it. I’m going to embed YouTube videos that show great shots, miracle comebacks, and terrific players. Get involved in helping to bring basketball courts to needy communities, and you may have people helping you sell your basketballs as a result.
If you can become the site that people go to if they want to learn about basketball, and news about basketball, you can also be the first place people think about to buy basketballs.
Chances are that you’ll probably never see me advocate against creating great content for a site, and the majority of sites that I run into on the web can often do a great deal to improve the quality of their content, and the user experiences on their pages.
I pointed out the Google thread because there are a number of pages that I visited in person, and found the content of those pages to be pretty strong. We don’t know if those sites were engaged in activities like “blog links, comment linking, keyword linking, other bad linkbuilding,” but there were a number of them that had legitimate links from great resources. I’m sure that if I spent some time analyzing many of them, I could find a good number of things that each could do to improve their content, and the experiences that visitors had on their pages.
Chances are that with your knowledge of search engines, you probably aren’t making a lot of the same mistakes that those sites are, but often quality content by itself isn’t enough.
The focus of this update seems to include the quality of content, but user experience and things that indicate a good user experience seem to be a strong part of it as well.
Very much appreciative of the efforts to clean up the supposedly continually dwindling quality of search engine results. But what are people’s thoughts on the fact that one of the biggest ‘content farms’, eHow, has not lost rankings as a result of this update, and is this in some way connected to Google’s close ties to Demand Media? If so, it’s hard to believe that Google are still acting in the best interests of the end user…
Bill, just a word to say thanks for your detailed reply; I have a clear picture of what actions I can take.
It’s interesting that eHow is still continuing to rank well, and rather than being upset about that, I think it’s an opportunity to compare eHow with other sites that have been negatively affected, and get a better understanding of the changes that Google’s new algorithm has brought about.
I agree with JB, but not just about eHow…there are many other sites out there that actually seem to have increased in rankings after the update and yet seem like content farms.
Personally, I was hit pretty hard with the update for the first week, but now traffic is higher than before the update. I guess it takes time for these algorithms to really take effect.
I still see lots of sites that are stealing my content ranking higher – that’s annoying.
Good to hear that your site rebounded from the effects of the algorithm change. It is annoying to see a site stealing your content, and ranking well because of it. I’ve contacted a number of sites and asked them to remove my content, and also ended up switching my RSS feeds from full to excerpts because of people scraping my RSS feed.
I think there may be some things to learn from the content-farm-like sites that benefitted rather than being harmed by the update. What things distinguished them from sites that did end up with lower rankings and traffic?
I have recently posted an article on my blog with my view on the Panda update. To cut a long story short, I would like to say that if you are the owner of a website and always post 100% original content, you have nothing to worry about!
Bill, sometimes I find my post ranking in Google’s SERPs lower than a ‘post’ from an auto-generated-content website. I am not sure whether the ‘post’ from the auto-generated-content website is defined as ‘higher quality’ or ‘lower quality’ by Google.
There are a lot of aspects to Panda that are still being explored these days, especially in light of Google’s recent spread of the update to all English speaking users of Google search.
Original content may have an important role in how pages are being treated by the update, but I’ve been hearing reports that a number of sites that appear to have been affected by the update are ones that do create all original content, and have been copied by a good number of other sites.
I would suspect that in many cases, the auto-generation of content puts a site more at risk in a system that attempts to rerank search results based upon quality.
What I am seeing on a lot of sites that were supposedly impacted by Panda is that they are making a lot of mistakes when it comes to some very simple, basic approaches to SEO, regardless of the originality or quality of their content.
Nice article. Good research on who Panda is, and whether he is the same Biswanath Panda. But with this update, Google is making its intentions clear that black hat SEO will be almost eliminated in the future. Content farms were the ones targeted in this update.
It would have been great if you had elaborated on what after-effects this algorithm would have.
Of course Google would want to see spam and low quality pages kept from ranking well in search results. That’s the purpose behind upgrades like Panda.
Nice post. I would like to know: will Google Panda affect websites sharing the same type of content with different titles? Every website shares its work along with others. Consider this: if a publisher publishes a theme on one website and shares the same theme on another website, will those websites come under the Panda effect?
I usually do not subscribe to threads such as this one, but I must say I have enjoyed the daily reads in my inbox since I first landed on this. Thank you for a good, consistent read and valuable information being passed around, which I find is much needed across the web in terms of SEO and understanding of SEO.
It’s possible that if the same content is published on different sites that Google’s duplicate content filters might show one page and filter the other out, regardless of the Panda upgrade from Google. That’s true regardless of whether the title is changed or not.
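One common way such duplicate-content filters are built, at least in the information-retrieval literature (I’m not claiming this is Google’s exact method), is shingling: compare the sets of overlapping word n-grams in two documents and measure their Jaccard overlap. A small sketch:

```python
def shingles(text, n=3):
    """Return the set of overlapping n-word shingles in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}

def jaccard_similarity(a, b):
    """Jaccard overlap of two texts' shingle sets; values near 1.0
    suggest near-duplicate content."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / max(len(sa | sb), 1)
```

Note that changing only a page’s title leaves almost all of its shingles intact, which is consistent with the point that a different title alone wouldn’t avoid a duplicate-content filter.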
Thank you, James.
There’s so much misinformation on the Web about so many subjects, and that’s true about SEO as well. That’s partially why I like looking at sources like patent filings, because at least they are directly from the search engines themselves.
If I was Google, I would spill the name of some engineer who has nothing to do with that algorithm just to keep the SEOs of the world busy for the next couple of years. It’s not like Matt Cutts could not imagine that everyone is going to hunt down that Panda and former documents connected with that name. My best bet would be to look out for granted patents that could support the whole quality issue.
It’s possible that people might provide information in an interview as a red herring to divert attention away from a topic, or to intentionally mislead people who might be interested in digging deeply into the details behind an upgrade like Panda.
Fortunately, some of the things that Biswanath Panda has written do seem to be a good fit for other details that we’ve heard about the upgrade. But that doesn’t mean that I’m not searching elsewhere for more information, like in patent filings and whitepapers. I haven’t seen any directly on point, but if Google started working on the specific approach described in their blog posts in 2008 or 2009, it’s possible that we might see a relevant pending patent published at some point in the near future.
I have written a few newer posts on how Google might classify pages for navigational searches, and how Google might classify queries when deciding which data center to show results from that share a number of similarities with the Panda upgrade. The Panda upgrades were definitely the work of more people than just one guy with the name Panda.
Thanks for your response. You are absolutely right – Mr. Panda and the linked documents seem like a perfect match. But if Google is only half as good at preparing a red herring as they are at all the other things, chances are that they set up that match as well.
Enough of conspiracy theory. You are right about another thing: good old journalists investigate in several directions. And as far as I can judge by reading SEO by the Sea, I think you are one of them. So keep up the good work. Looking forward to reading more of everything.
Thanks. Biswanath Panda and the Panda updates do seem like a good fit, but as you note, it really doesn’t hurt to look elsewhere either. I’ll be keeping my eyes open for more clues.
As a business owner who takes care of his own SEO, I find the latest update quite interesting for several reasons; bear in mind that over in Australia we only got the update a few months after the rest of the world, so I was able to watch and take a few preventative measures to ensure all my content was up to scratch before the transition took place.
Essentially, I think this update is a small part of what’s to come from Google in regards to how they choose to display search results. As the social web becomes a larger and larger part of our lives, I imagine that over the next couple of years this will become the primary factor of search-based rankings. Google already introduced +1 and their own social thing (can’t even remember the name of it now, not a good sign) a few weeks ago. I think having missed out on capitalising on the early integration of social media, Google is gearing up to take the next step in how our personal lives can and will be integrated into their contextual search results.
With regards to the Panda update, at the end of the day, anything that helps strip out the majority of spam from the internet is a good thing, and let’s not forget, if spamming techniques are no longer effective, they will no longer be utilised (win, win). So expecting higher quality content to be ranked higher should always be a priority for Google, and the results, in theory, should speak for themselves.
Choosing to base your sole form of income on the whims of a large, unpredictable multi-billion dollar company is never going to be smooth sailing, and if you’re not 100% on the ball, then with updates such as Panda you can easily find yourself in a very difficult position.
Thank you for your thoughtful comment. It’s good to hear that you were able to avoid any problems with the Panda updates.
I don’t think that Panda is so much about fighting web spam as it is about Google attempting to show higher quality pages near the top of search results, by looking at a wider range of features associated with pages and sites than they had in the past. The Panda updates are a mix of different metrics that focus upon different aspects of quality associated with a site. Some of those, like copying the content of another site without permission or license in an effort to be indexed by the search engine for that content, are spammy. But Panda also doesn’t like duplicated content that might happen because of flaws in a site’s architecture that cause the same content to be duplicated on multiple pages, or even content that may be duplicated with permission, such as on sites that legally republish syndicated or wire sources, or that only use product descriptions taken directly from publishers, manufacturers, or distributors, especially if those pages don’t add anything new in any way.
I suspect that Google will continue to increase the range of features that they do consider, including using social signals, or at least ones that they might have more control over, like Google+ interactions, where there’s a Google account associated with every plus post that Google can analyze any way that they want. I suspect that they will be creating some kind of reputation or author rank to go with each person who participates in Google+.
It’s never a good idea to focus too much upon any one channel to deliver consumers to your door or inbox or phone, but I agree that if you’re going to make a search engine like Google a strong part of how you market your business, then you do need to learn as much as you can about how the search engine works or get some help to do so.
I took a look at Dr. Andy Williams’ Panda reports and I think he is on to something there, where he really talks about theming. But I also think it’s about getting all the variables right for Google: proper use of h tags, good coding, trusted backlinks and more. I have been using theming, or LSI, for the last year or so, and none of my websites have been hit by Panda thus far.
I looked, and the latest newsletter report by Dr. Andy Williams that mentions Panda really has nothing at all to do with Panda. It’s pretty much just a list of related words on the top 5 pages for two different queries. Yes, the pages include related terms, which could easily be explained any number of ways, including the possibility that Google might be using a phrase-based indexing approach.
But the idea that a marketer who has no access to the whole Google search index corpus could produce an “LSI” friendly document is pure and total snake oil.