How credible is your website? How likely are people to believe what they find on your pages, or contact you to learn more about what you offer, or conduct a transaction on your site? Would you consider your site to contain high-quality content? How do you measure the quality of the content on your pages?
Search engines seem to be placing more emphasis on web site quality, as with Google's recent Panda updates, described in a couple of posts on the Official Google Blog:
- Finding more high-quality sites in search
- High-quality sites algorithm goes global, incorporates user feedback
If Google is now looking at the quality of content on pages as part of what they consider when showing pages in search results, just how do they calculate the quality of pages?
Measuring Quality: A Work in Progress
From 2005 through 2009, academic and industry researchers held a yearly workshop focusing upon webspam or Adversarial Information Retrieval on the Web. The workshop series was known as AIRWeb.
For one reason or another, a workshop wasn’t held in 2010, but the workshop reemerged in 2011 at the 20th International World Wide Web Conference in Hyderabad, India, held jointly with WICOW (the Workshop on Information Credibility on the Web).
Held on March 28, 2011, the Joint WICOW/AIRWeb Workshop on Web Quality (WebQuality 2011) covered a broad range of topics, which are listed on the workshop’s website. The main themes and topics from this web site quality conference included:
Assessing the credibility of content and people on the web and social media
- Measuring quality of web content
- Uncovering distorted and biased content
- Modeling author identity, trust, and reputation
- Role of groups and communities
- Multimedia content credibility
Fighting spam, abuse, and plagiarism on the Web and social media
- Reducing web spam
- Reducing abuses of electronic messaging systems
- Detecting abuses in internet advertising
- Uncovering plagiarism and multiple-identity issues
- Promoting cooperative behavior in social networks
- Security issues with online communication
There are many papers listed and linked to on the workshop webpage, but they don’t cover the whole range of topics listed on the agenda for the workshop. The topics and subtopics listed are ones worth considering and raising questions about if you own a website and participate in social networks and other places on the Web.
Google’s Web Site Quality Guidelines
One of the resources that I like to point people towards when talking about the credibility of websites is the Stanford Persuasive Technology Lab, which published the Stanford Credibility Guidelines in 2002. The guidelines are based upon a joint study conducted with Consumer Reports WebWatch, titled How Do People Evaluate a Web Site’s Credibility?
While I think there’s considerable value to both those guidelines and the study behind them, they are almost a decade old, and they focus primarily upon credibility rather than quality. Credibility is an important aspect of the quality of a web site, but there’s more to quality than how credible people find a set of web pages.
Google has provided information on its pages about what they look at when they consider the quality of what they see online. That includes what they look for in advertisements and landing pages, and their Landing page and site quality guidelines are worth spending some time on, even if you don’t advertise with Google. There are three main aspects to those guidelines:
- Relevant and Original Content
- Transparency
- Navigability
Google’s Webmaster Guidelines also provide a set of things to consider when putting together a site, and tell us that, “Following these guidelines will help Google find, index, and rank your site.” Many of the problems that I see on websites can be resolved by paying attention to the guidelines that Google lists upon this page.
A few years back, I published a post titled How Google Rejects Annoying Advertisements and Pages, which described a Google patent that provided a way for the search engine to programmatically assess the quality of advertisements and landing pages. In that post, about halfway down, I bolded one thought I had after spending some time with the patent:
This system could be used to assess other pages on the Web other than just advertisements, such as web page content.
With the Panda update, it appears that Google has come up with a way to automate assessment of the quality of web pages, as it did for advertisements.
I wrote some more thoughts about the Panda update in Searching Google for Big Panda and Finding Decision Trees. One of the questions that I raised in the conclusion to that post was “How does Google define ‘quality?'”
We get some substantial hints from the Webmaster and landing page guidelines from Google. We could also look at sites online that we consider to be quality sites, and see what they do to create that impression.
Other Resources on Credibility and Quality
But, I think it’s worth going beyond those guidelines, and beyond emulating other sites that might be seen as being quality sites. So, I’ve also been looking for some other resources and information on the Web about credibility and quality, and thought that the following were interesting:
- Augmenting Web Pages and Search Results to Help People Find Trustworthy Information Online (pdf)
- Crowdsourcing credibility: The impact of audience feedback on Web page credibility (pdf)
- Customer loyalty in e-commerce: an exploration of its antecedents and consequences (pdf)
- Questioners’ credibility judgments of answers in a social question and answer site
- TIME: A Method of Detecting the Dynamic Variances of Trust (pdf)
- Explaining and Predicting the Impact of Branding Alliances and Web Site Quality on Initial Consumer Trust of E-Commerce Web Sites (pdf)
- Trust Online: Young Adults’ Evaluation of Web Content (pdf)
Web Site Quality Conclusion
Webspam isn’t a solved problem, and it likely won’t be in the foreseeable future, but the search engines (especially Google) have been receiving a considerable amount of criticism lately for the quality of content that appears in their top results for many queries.
While combatting spam still seems like an important aspect of what they do, Google seems to have broadened how they rank pages to include consideration of quality signals.
Much of what they consider may help answer the question that I raised at the start of this post, “How likely are people to believe what they find on your pages, or contact you to learn more about what you offer, or conduct a transaction on your site?”
78 thoughts on “Just What is Web Site Quality?”
As always, a quality post.
One thing that I am a bit curious and at the same time worried about is this:
“With the Panda update, it appears that Google has come up with a way to automate assessment on the quality of web pages”
Automating the quality of a web page based on a set rule(s) is a dangerous game.
As you said, “How does Google define ‘quality?’”
What is a quality page to my readers need not be a quality page in the eyes of Google (the automatic bot).
A page that has received more than 40 to 50 valid comments from my visitors can still be flagged as a low quality page. But that does not make sense when you take into account the 40-odd valid comments my users have put in. So in the eyes of my audience, whom I am targeting, that page is a high quality page with relevant information.
For the first time in the last three years, I am starting to question the way Google goes about these algo changes and defines quality, which in turn affects genuine businesses.
All said and done, i still find a lot of spam pages in the top 5 and brand websites who have gone out to milk big G purely because of their domain history and brand value in the name of creating quality junk pages.
Pretty interesting article. In a perfect world, only sites that deliver value to users would be ranking. The good thing is that all these resources are free education to improve site performance for valuable sites. Unfortunately, spammers will always find their way around; hopefully Google will catch up, because I see lots of crap in SERPs here in Canada…
I always define quality in relation to something, like another website. You have to compare.
Google not only wants to deliver good search quality, but also good ads, so they must have their own definition of ‘quality’.
I agree with your point that web spam is not solved yet, and pretending it was would be stupid. It’s sad that some do.
Kind regards, s1ck
I think we’re a long way from an algorithm being able to truly identify quality content. I think Google is on the right track, but I still find numerous searches full of spam even after the Panda update. Rand’s recent example of searches on Propecia returning spam in the Top 10 results is a case in point. I think it’s going to take a lot more research into semantics and getting machines to understand meaning before search engines can really identify quality. I mean, think about it: how can you discern the quality of a page without really understanding the meaning of the words that comprise it?
Personally, I believe that relevancy is key to quality. I think that this is one of the main reasons that Wikipedia is so high in so many searches.
I know that Google has their hands full in trying to weed out spammy sites. My hat is off to their efforts…and this post. Nice writeup as always, Bill.
A quality website or search result is one that is relevant, useful and has original content. A person looking out for information would not like to read a scraped paragraph of the original or a rehashed or rewritten copy of the original stuff. With the recent update I think that more quality stuff will rise to the top.
The issue I have with these guidelines is with the notion of “original” content. Very little content is truly original, nor should it have to be. Relevancy, transparency, and navigability, sure. But originality? Most content – whether on the Internet or in other media – is not original. Lack of originality doesn’t necessarily make content less useful or valuable. In fact, the opposite is often – if not usually – true. It’s not the originality of the content that matters to me, but the quality. I much prefer quality, accuracy, and clarity over originality that may have none of these – or any other – attributes.
I suppose that Google is measuring different types of quality.
I think you have at least:
1. the quality of the article (unique)
2. the quality of the context (links in and links out)
3. the quality of the grammar (Google knows a lot with translate.google.com)
4. the technical quality of the website (performance, validate, used tags, etcetera)
I think this is a layered model. When your article is excellent but the performance of your website is unacceptable, the total quality will be low.
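That layered model can be sketched as a "weakest link" rule, where the lowest-scoring layer caps the total, so an excellent article on a site with unacceptable performance still scores low. This is purely illustrative: the 0-to-1 layer scores and the `min` rule are my own assumption for this comment's idea, not anything Google has published.

```python
def layered_quality(scores):
    """Combine per-layer quality scores (each in [0, 1]) so that
    the weakest layer caps the overall quality of the page."""
    if not scores or any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("scores must be non-empty values in [0, 1]")
    return min(scores)

# Article 0.95, context 0.9, grammar 0.85, but site performance 0.2:
# the excellent article still comes out as low overall quality.
print(layered_quality([0.95, 0.9, 0.85, 0.2]))  # → 0.2
```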
Great post. Looking at those 2002 Stanford guidelines, it’s interesting that those are the same points that keep coming up in every post-Panda discussion.
Thanks for the list of additional resources – off to do some reading!
Thanks for sharing the information Bill, especially for transparency. I believe it will evolve the credibility, not only for the blog/website but also for its contents.
The main problem is that people still don’t think about how SERPs come together, or what makes a website rank better than another website. Most searchers mix up the position of a website in the search results with quality, and Google’s latest updates will definitely help exactly these people get scammed less often. I mean, if you Google “make money” in Germany, the second result is a website advising you to test the latest roulette strategy on some scammy online casinos… hopefully, Google will kill these sites in the future!
Quality is very subjective, even for us humans. During my days as an SEO for Suite101.com our editorial team prided itself on high quality content in comparison to WiseGeek.com or eHow.com. But in the end, Google deemed the site as a low quality site which resulted in a -90% drop in search traffic.
So if humans can’t agree on what quality is, how can a bot?
Great article, definitely an interesting read. I do believe that ‘quality’ can be very subjective. Everyone can perceive it as something different from anyone else.
Thanks for taking us back to the foundation of why we should be online in the first place: providing value via quality content. I look forward to reading through the links you mentioned.
Out of curiosity, would you happen to know how many people there are on the Google search quality team?
I think the whole hangup on original content is a distraction too. To massively generalise, most written content falls into two camps; fact and opinion. For instance “iPhone 5 just released” (fact) and “iPhone 5 is great but I’d rather use Android because it’s powered by lazer kittens”. Aggregating facts and adding opinion (e.g. HuffPo) seems to have been green-lighted by Google, however aggregating facts and wrapping them into a really useful interface (e.g. price comparison websites) seems to have fallen foul of Panda. I suspect the latter is deemed lower quality because Google think they are the best at aggregating and repackaging data (Shopping, Places, News).
Um, guys, you do know what Adversarial Information Retrieval (AIR) is about? It’s not really about “quality” – I always took it to be about SIGINT. Don’t forget where Matt C interned.
Good to hear from you. Thanks.
We don’t know how set those rules are, and whether there’s one set of rules for one type of site, and another set of rules for other sites. The interviews with Matt Cutts and Amit Singhal describe a process where a number of known high quality sites were explored and features on those, likely plus a number of other rules about quality were sampled to fuel an automated process that determines some kind of quality score for a page. It looks like that score was then used to rerank search results for different queries.
Google and the other search engines have been using automated algorithms to score pages based upon combinations of relevance and popularity/importance for years. Adding a “quality” aspect to that evaluation isn’t that big of a stretch. As long as there are mechanisms in place for evaluating the quality of results, even some involving manual input, adding a quality measure shouldn’t be more dangerous than anything else Google has been doing in the past.
Comparing quality with relevance might not be a bad idea. We speak pretty broadly when we talk about relevance, but if we look closer at the term, there are a number of different ways to define it. Relevance could be matching the keywords found in a query with keywords either found on a page or in anchor text pointing to that page. Relevance could be defined as a matching not so much of words themselves, but rather the concepts or categories behind them. Relevance could also be defined as providing an answer that meets a specific informational or transactional need. When I search for “ESPN” in a search box, I’m probably not looking for information about the network, but rather a link to the home page of the site. Relevance has many different facets to it.
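The narrowest of those definitions of relevance, matching query keywords against page keywords, can be made concrete with a small sketch. This is a toy illustration only; real relevance scoring layers in anchor text, concepts, and intent, as noted above.

```python
def keyword_relevance(query, page_text):
    """Fraction of distinct query terms that also appear in the page."""
    q_terms = set(query.lower().split())
    page_terms = set(page_text.lower().split())
    if not q_terms:
        return 0.0
    return len(q_terms & page_terms) / len(q_terms)

# Every query term appears in the page, so the match is perfect by
# this narrow measure, whatever the page's actual usefulness.
print(keyword_relevance("pizza delivery", "best pizza delivery in town"))
```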
When we talk about quality, we need to take into consideration the objectives behind a site, the audience the site is intended for, the appropriateness of different elements that might appear on a site, and the types of needs that site might fulfill. An ecommerce site selling books is going to have different indicia of quality than a site aimed at helping preschoolers learn English as a second language. A site providing financial news or another offering banking services are going to be very different than the book store or the educational pages. Reading levels are going to differ, use of SSL or trust badges may be appropriate on one and not the other. An address on every page of one site may be important, and not as meaningful on a different one.
Interesting point about social signals on blogs and sites that might incorporate user-generated content into what they display. Is a blog better because it has a lot of comments? In many cases yes, but then again there are some very well-known blogs that don’t allow comments. (Seth Godin’s site allows trackbacks, for instance, but no commenting.)
A rational skepticism is healthy and good for you. 🙂 Question what Google is doing, try to understand why they might do some of the things they do. Algorithms aren’t perfect, and they are based on (often) too human assumptions.
As long as there are efforts to do things like rank web sites, and an economic or other value to the effort, there will likely be people trying to manipulate those efforts. Spam and “low quality” pages aren’t always the same thing, and the Panda updates aren’t focused as much on fighting spam as they are on trying to get higher quality pages to rank better.
Thanks. When you perform a search for a specific query, chances are that there are a good number of sites that may be relevant for that query, and you may see an estimate from Google that there are thousands or even millions of results. The challenge to the search engines is in trying to deliver the best sites they can at the top of that list, and that really isn’t easy, even if there weren’t some people attempting to spam or manipulate search results.
When I wrote about Query breadth a short time ago, Google’s patent on the topic told us that for some queries the differences between the information retrieval scores for the top 10 or 100 or even a greater number of pages might not be all that great, and that Google might decrease the value of the “popularity scores” that might be given to some of the top ranking sites so that those don’t get a tremendous benefit from being listed first. This way sites that might improve in terms of relevance have a better chance of moving to the top of results.
Here’s the rub though. The most relevant page for a search result might not be the best. A page with the single word “pizza” on it might be the most relevant result on a search for “pizza,” if you think about relevance as a matching of keywords to query terms. Not a very useful page, but a very relevant one. Add to that notions like looking at links between pages, on the assumption that important pages tend to link to important pages. That’s the thought behind PageRank, and in some ways it can help find more useful pages than a page that might be extremely relevant and extremely useless.
But, a page that has links to it from other important pages and is relevant for a query isn’t always going to the best page for that search. If other signals can be considered that might rerank results to bring us higher quality pages, that may not necessarily be a bad thing, at least from the perspective of a searcher.
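The "important pages tend to link to important pages" idea behind PageRank can be illustrated with a toy power-iteration sketch. The four-page graph is invented, and the 0.85 damping factor is the value from the original PageRank paper; Google's production ranking is, of course, far more involved than this.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # A dangling page spreads its rank evenly everywhere.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
# "c" ranks highest here: three pages link to it, including "b",
# whose own accumulated rank flows into it.
print(pagerank(graph))
```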
I do think that some comparison is useful and essential. But I also think that there are some elements that a search engine might consider when looking for “signs” of quality where comparisons between sites aren’t necessarily helpful.
Unfortunately, web spam exists, and while the search engines are working on ways to remove it from search results, spam pages do show up. But, the Panda Algorithm seems to focus upon certain quality signals to rerank search results. That means that on a set of search results for a specific query, some of the pages that appear in those results might move up a few spots, and others might move down. This appears to be based upon the search engine looking at quality signals. For instance, does a page have a very large amount of advertising and a very small amount of actual content? It might or might not be spam, but that ratio of ads to content might be seen as a signal of lower quality.
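That ad-to-content ratio signal can be sketched in a few lines. The 0.6 threshold and the character-count inputs are invented for illustration; Google has never published the values it uses, or confirmed exactly this computation.

```python
def ad_content_ratio(ad_chars, content_chars):
    """Share of the page's characters taken up by advertising markup."""
    total = ad_chars + content_chars
    return ad_chars / total if total else 0.0

def looks_low_quality(ad_chars, content_chars, threshold=0.6):
    # Flag pages where ads dominate the content by a (made-up) margin.
    return ad_content_ratio(ad_chars, content_chars) > threshold

# 5,000 characters of ads against 1,000 of content gets flagged;
# the reverse does not.
print(looks_low_quality(5000, 1000), looks_low_quality(1000, 5000))
```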
Thanks. Unfortunately, a page that’s very relevant for a query may not be all that useful or important or of high quality (see my response to Charles, above). I think taking a step towards looking at quality signals is a move in the right direction.
The aim is definitely to get higher quality content to move closer to the top of search results.
I think there’s also some room in there to look for signals involving credibility and trustworthiness. I’ve come across articles on the web that were original, relevant and useful and gone off in search of other pages that I felt that I could rely upon better and trust more.
I think some of the language that we’ve seen from Google about originality of content comes from how easy it is for someone to take content from one site without permission or license, and reuse it elsewhere. The search engines don’t want to deliver a set of search results that all contain the same content. There are also sites that aggregate content from other sources solely to rank well for terms contained in those sources, purely so that people will click upon advertising once they arrive at those pages.
It’s also important that the creator of copyrighted content be properly attributed as the author or developer of that content, and if a search engine places a site that copies another (without permission or license) above the site being copied, I think that seems to be considered by many to be a point of failure on the part of the search engine.
One important aspect of “quality” is the ability to rely upon information that you find on the Web, and being able to learn about the expertise and trustworthiness of the person or people who created that content can be essential when it comes to credibility. I’m not sure that we can separate credibility from quality.
Nice breakdown of different aspects of quality that a search engine might be considering. I would suspect that different features that Google might be considering may carry different weights with them, and that some signals of quality might be more important for some types of sites and some types of queries than others.
Just to play devil’s advocate a little, should transparency always be an essential part of “quality”? What about the anonymous blogger who writes great articles, uses compelling source material, persuades people to comment and act and get involved, who attracts tremendous amounts of links and discussion on other blogs and social networking sites and link curation sources?
Should the lack of a real identity for that anonymous blogger be considered when it comes to quality?
I really do like those Stanford credibility guidelines, and I think they’ve helped a lot of people to build more trustworthy and informative pages. The studies behind them, like the “How Do People Evaluate a Web Site’s Credibility” contain a lot of great ideas as well.
At least a couple of the papers that I linked to at the bottom of the post mention that people will sometimes consider a page ranking well in search engines to be a signal of the credibility of that page. Unfortunately, that’s putting things backwards. Regardless of how well a site might rank at Google, the credibility and quality of a page should be considered independently of its ranking.
If you were to look at a lot of websites, and start making a list of the things you found on them that made them more reliable, more useful, more credible, I’d imagine that you would start seeing certain things appearing over and over.
I remember an English class that I took in college that was a mixed course with undergraduate and graduate students (deconstruction of literature), and the professor asked us to raise our hands if we didn’t think that his test questions were objective. I raised my hand, and then looked around to see how many of the other 30 or so students in the class had their hands raised as well. I was the only one. 🙁
The professor smiled, and asked me to explain why I thought the way that I did, and his smile got larger as I answered. I said that he was determining what was important, and what wasn’t. What was essential for us to learn, and think about, and how to approach it. His approach might be different from other professors teaching similar classes in other schools, and that there’s always an element of subjectivity to any objective approach. I think he liked my answer. 🙂
Google has given us some guidelines on what they believe to be quality when it comes to websites, to landing pages, and to advertisements. Some of those are pretty broad, but it’s a start. Those may not be “objective” in that Google is building those guidelines on their own assumptions, but many of them are based upon creating a good (or better) experience for people using the Web.
Part of why I included some additional resources in the post is because there are other ways that may be used to create quality content.
There is subjectivity in assessing quality, but chances are that there are also things that we can look for that make some pages higher quality than others. What things do you think might indicate that one page is of higher quality than another?
Hi Chris R,
Not sure how many people are on Google’s search quality team. I know that team includes the Web Spam team that Matt Cutts has been heading, and during a visit to the Googleplex (at one of the Google Dances they used to hold), there was a table of about 8-9 of the webspam team available to answer questions for us. Amit Singhal was (and may still be) the head of the Search Quality Team (though he’s usually referred to as a “Google Fellow” these days), and that team includes a number of other people as well.
Interesting thoughts. I know that Google is interested in finding and indexing unique and novel content, and adding opinion to fact is one way to generate that kind of content. A white paper from Google that I found interesting is Detecting the Origin of Text Segments Efficiently (pdf) because it tries to describe not only how interested Google is in finding full copies, but also partial copies, where someone may quote something from elsewhere and Google wants to know what the original source was.
One of the reasons that I mentioned AIRWeb in this post is because it was held in conjunction with WICOW (Workshop on Information Credibility on the Web), in a joint workshop called WebQuality 2011. I found that interesting because I think it shows an overall interest by academics and search engines in finding ways to uncover higher quality pages on the Web.
“Uncovering distorted and biased content”
What does that actually mean? I hope it doesn’t in any way mean that people don’t get credit (no points, so to speak) for stating their own opinion, or for taking a particular stance on something without giving equal coverage to something they don’t stand behind.
It always amused me that on Woorank, having a “reading level” higher than “kid” was thought to be a bad thing – surely not, Google! Surely you’re not playing to the sub-clinical illiterate masses and encouraging lowbrow use of language? Thankfully it is called Woorank and not Googlerank 😉
I agree at an overarching level there are really easy to pick up signals of poor quality. Blinking texts and on-load midis are on top of my list 🙂
The areas where it starts going down a slippery slope is when most everyone addresses the basic signals of quality and getting down into the text.
But one thing is for sure, if Google requires quality content above a third grade level, you’ll be sure to start seeing SEOs using spell-check a lot more 🙂
That’s one of the topics from the WebQuality 2011 workshop that I linked to above. What I found interesting about the workshop was really their choice of a name for it – WebQuality 2011. It’s an odd and interesting name because it joins together two previously distinct series of workshops on online credibility and web spam. The organizers of the workshop include people from Google, Yahoo, and Microsoft and a number of Universities.
They do have a list of subtopics under that heading, but they don’t spell them out in much more detail:
Uncovering distorted and biased content
– Detecting disagreement and conflicting opinions
– Detecting disputed or controversial claims
– Uncovering distorted or biased, inaccurate or false information
– Uncovering common misconceptions and false beliefs
– Search models and applications for finding factually correct information on the Web
– Comparing and evaluating online reviews, product or service testimonials
You might want to be able to understand if certain things, like reviews, have differing opinions. For example, on Amazon.com when you look at a review page, they will show one of the most positive reviews on one side of the top of the page, and one of the most negative reviews on the other side. Being able to show you the most positive and the most negative together like that makes them appear much more credible.
There are also people who sometimes pay money to have biased reviews published in different places, and researchers might want to discuss ways to detect whether or not something like that is going on. That would be an example of biased and/or distorted content.
Yes. 🙂 Using the same reading level to gauge the quality of different pages might not be a good idea because those pages may be geared towards audiences where those reading levels should definitely be different. But then again, a site for young children shouldn’t be at a college level and a site on financial news shouldn’t probably be at an elementary school level.
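For what a "reading level" signal might look like mechanically, here is a rough Flesch-Kincaid grade-level sketch. The vowel-group syllable counter is a crude heuristic, and the whole thing is an assumption for illustration; how Google actually gauges reading level is not public.

```python
import re

def count_syllables(word):
    # Crude heuristic: count runs of vowels as syllables.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text):
    """Approximate Flesch-Kincaid grade level of a passage."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    n_syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (n_words / sentences)
            + 11.8 * (n_syllables / n_words) - 15.59)

# Short words and sentences score far lower than dense jargon.
print(fk_grade("The cat sat. The dog ran."))
print(fk_grade("Sophisticated institutional methodologies "
               "necessitate comprehensive organizational considerations."))
```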
Those are good points, and I think one of the biggest causes of confusion over the Panda updates. How do you define quality? Once you’ve defined it in a meaningful way, how do you decide which things are signs of good quality or bad quality?
For example, I think Google would be hard-pressed to say that advertisements on a page are a sign of low quality. But, once you end up with pages on a site that have substantially more advertisements than content, those pages might not be very high quality. How do you define the point where those ads go from being a positive to becoming a negative?
The search engines don’t go too much out of their way to tell us which things might make one page rank more highly in terms of relevance. They do give us some guidelines when it comes to “quality” on the Webmaster Guidelines page, but not too much there either. So, if we focus on creating great user experiences, building very credible sites, and avoiding the appearance of things that might be considered spam, we’re probably a long way towards creating high quality pages.
I generally regard content as low quality if I can produce the same level of content fast and cheap by outsourcing, typically offshore.
I still come across plenty of that type of content in the Google search results.
“But then again, a site for young children shouldn’t be at a college level and a site on financial news shouldn’t probably be at an elementary school level.”
Thus the safe bet would be that the Ivy League PhD’s over at the Googleplex have somewhat more sophisticated ranking signals re: text readability level than the Community College boys over @ Woorank? 🙂
I would like to thank you for writing on a subject which is very vague but still needs to be constantly addressed. Quality is what Google is after; whether Google can see what quality is today, or whether real quality might be better understood some day in the future, is less important. What is important is that website owners need to understand and follow the consensus quality guidelines, how they change, and how they might change according to some logic.
I do believe that the concept of good website navigability is not so tough to understand; Google gives strict guidance in its Webmaster Guidelines. What is interesting to me is finding sources that suggest how search engines test and evaluate genuine and authentic content.
I personally think that a machine at the moment cannot determine for sure whether content is quality or not. A machine cannot recognise whether an article might be fun for users to read or contains new and exciting facts!
It never stops amazing me how Google manages to come up with new inventions to combat the problems they face. I would suggest that if Google wants to cut down on bad-quality sites, then they may want to consider what they advertise. Many dodgy debt companies advertise with Google, and those sites are of no quality to someone in need of debt advice.
I agree with Craig; it seems counter-intuitive to go to all the effort of eliminating content farms when many of the ads readily available alongside your “high quality” search results are so sketchy…
Bill, as always very insightful stuff. I do have a general question though – you dedicate much of your efforts to superior content (as we see daily); is the choice to go with the url structure: https://www.seobythesea.com/2011/04/just-what-is-web-site-quality/ (that’s to say the numerics after the dot com forward slash) a conscious choice? Keep on keeping on!
I think you hit the nail on the head with mentioning Google’s broadening of the factors they use to rank websites. This is key as if there are only a couple of main factors, it’s more easily gamed. If they can intertwine factors, broaden them, and enforce them, we’ll see much more webspam. It seems as though that’s what they’re doing. It looks as though this is going to be a perpetual fight as each update someone will find a new way to game the system. I like their implementation of allowing users to report webspam. Definitely a help.
Very comprehensive. I agree with Marc that “Quality is very subjective, even for us humans”. Thanks for the share. I look forward to reading through the links you mentioned. Loved the read.
Interesting indeed. Automated quality identification will get better as time passes, and I’m sure Google is investing a lot. This is what will make a difference. Semantics play a role, and AI (artificial intelligence) is also part of the equation. And the task is so complicated, since:
a. we are talking about many languages
b. Quality is not always measurable
I think (b) is the most difficult part.
If you cannot measure it, it is difficult to tackle automatically.
IMHO, when creating content and web sites, human quality (even as subjective) must be accomplished. Machines will follow.
It’s starting to look like the way that content is presented on a page may play as big a role as the quality of that content itself.
The Woorank score includes things that are pretty immaterial to SEO, such as Dublin Core meta tags and meta keyword tags. I disagree strongly with a number of the other signals that they include, and their reasons for including them. I would say that the things Google considers are probably on a much more sophisticated level.
You’re welcome. I spend a lot of time looking at patents and whitepapers from the search engines because they tend to be the best source of information on topics like “how search engines test and evaluate genuine and authentic content.”
A machine may not be able to do the things that you suggest, but it might be able to find indications of how human visitors to pages respond to those, including how much time might be spent on any one page, whether or not visitors view more than one page, print pages out, link to them, bookmark them, how far they might scroll down a page and more. That can be in response to specific queries that the search engines may track, sessions of multiple queries, links followed to those pages, and more.
Chances are that a search engine might be making some assumptions about how people respond to pages that they visit to make value judgments about the quality of content found on those pages. They may also be making assumptions about features found on pages that are similar to other pages where the same features appear, and where they have more data about how people respond to those pages.
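To make that idea concrete, here’s a toy sketch of how behavior signals like dwell time, pages per visit, and bookmark rate might be normalized and combined into a single quality estimate. The signal names, caps, and weights are entirely hypothetical; nothing here comes from anything Google has published:

```python
def behavior_quality_score(dwell_seconds, pages_per_visit, bookmark_rate):
    """Combine normalized user-behavior signals into a rough 0-1
    quality estimate. All caps and weights are made-up illustrations."""
    # Normalize each raw signal to roughly the 0-1 range.
    dwell = min(dwell_seconds / 120.0, 1.0)   # cap at two minutes
    depth = min(pages_per_visit / 5.0, 1.0)   # cap at five pages
    saved = min(bookmark_rate / 0.05, 1.0)    # cap at a 5% save rate
    # Hypothetical weights; a real system would learn these from data.
    return 0.5 * dwell + 0.3 * depth + 0.2 * saved

print(round(behavior_quality_score(90, 3, 0.01), 3))  # → 0.595
```

A real system would of course use far more signals, learned weights, and query- and session-level context rather than fixed caps, but the shape of the idea, inferring quality from how people respond rather than from the content alone, is the same.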
Hi Craig and Kentaro,
I agree with you that some of the sites that use Google advertising aren’t necessarily of the highest quality. I think that’s also an area where Google is striving to improve. In my blog post that I mentioned in the post above, How Google Rejects Annoying Advertisements and Pages, I described a patent filing from Google which told us about some of the things that they were looking for when it came to advertisements and landing pages to assess the quality of those. Google has also been working upon quality scores for landing pages that mean that sites with lower quality scores pay more to advertise.
Google is fighting a battle on more than one front here.
Woorank doesn’t give any weight to meta keywords (i.e., having them vs. not doesn’t change your score); they are merely “there”, kinda like they are in most CMSs….
Dublin Core doesn’t raise one’s Woorank score much either… I find it a useful general oversight tool for spotting “gaping wounds” on new clients’ sites…
My URL Structure is a conscious decision for a number of reasons. I do believe that the search engines can and do use words found in URLs of some value, and may be using them as a signal when doing a quick and dirty classification of pages, especially for purposes of determining what ads to show on those pages. See: Purely URL-based Topic Classification.
I also believe that it’s possible that Google may use keywords found in URLs in part of its ranking algorithm. But, with more than 200 signals that it might use, I’m not convinced that it is one of the stronger ranking signals. I believe that a lot more value is given to signals that are found on the pages themselves, such as the use of keywords in title elements, headings, main content sections of pages, in alt text, in image captions, and more.
I know that there’s potentially some gain in including keywords in URLs, but I’d honestly rather spend my time focusing upon creating engaging and interesting new content for my blog, targeting new terms and developing new topics.
I can see the use of woorank as a triage service, but there are a lot of other signals that I look at that don’t seem to be part of what they include.
Good points. With every ranking signal that might be used in an algorithm to rank pages, there’s a possibility of an attack on that algorithm as well as noise that might not be helpful when viewed alone. The challenge that search engines face is in both anticipating potential attacks before they use a certain signal, and in recognizing that there’s a possible problem with one that they are using. Not every ranking signal is given the same weight, and there’s a good chance that a wider range of ranking signals may produce better results than a set that’s more limited.
Thanks. Objectivity vs. subjectivity is tough, and the highest quality results for one set of topics may be much better than the highest set of results for another topic. Still, I think there may be some baseline level of quality that might be identifiable that could help a search engine display higher quality search results.
Those are very good points. Thanks.
I’m not sure that it’s possible to provide an exact definition of what quality is, but I do think that it is possible to come up with signals that are more likely than not to indicate some level of quality.
Have to agree with Marc as well. Quality is a very subjective concept. Ultimately it is the end user who will decide if a site meets their quality guidelines. Algorithms go a long way but they will never be perfect.
Not sure that a search engine could ever be able to define “quality” itself in an automated fashion. Instead, what we may see is humans identifying a large set of “quality” sites, and then an algorithm used to identify features of those sites in common that seemed to indicate quality. How well would something like that work?
There would definitely need to be a lot of feedback and manual review of the “quality” of sites identified in an automated manner. Chances are also good that those quality scores given to sites would be compared to user-behavior data associated with the sites to see if sites identified as quality sites were ones that people seemed to be using, bookmarking, printing, returning to on a regular basis, and so on.
I could very easily see dwell time being a key factor in determining web-page quality, along with bounce rates. Personally, I have scaled back my keyword research and started writing more from the heart, as well as adding as many resourceful, quality outbound links to my blog posts as possible.
Thanks for the reminder, Bill. We need these from time to time…:)
Quality is always an impossible-to-define metric regardless of the field. There are always obvious outliers on the bell curve that are clearly superb quality or clearly spam. It is the 90% of sites caught in the middle of the bell curve where quality scores get very subjective. Certainly the number of words, the types of words and phrases, words that pertain to the page subject, links, etc. can be scored, but the weight given to each area is extremely important. An SEO teacher once told me to develop from your heart. If the page or approach feels wrong, then do not do it. The rest (visits, rank, etc.) will come if you are right.
Perhaps this quote from Google webmaster blog is pertinent…
“One other specific piece of guidance we’ve offered is that low-quality content on some parts of a website can impact the whole site’s rankings, and thus removing low quality pages, merging or improving the content of individual shallow pages into more useful pages, or moving low quality pages to a different domain could eventually help the rankings of your higher-quality content.”
To state the bleeding obvious…we really do need to understand what Google sees as quality and what isn’t.
Thank you, Mark. I still think there’s a role for keyword research when you are creating content for the Web, but it’s just as important to write something that people will find engaging, interesting, and useful. That they will refer others to, and will bookmark or save, and will return to see what else has been published.
Great thoughts. Thanks for sharing them with us. How does someone define value or quality, and don’t those change somewhat from one visitor to another? Is it possible to come up with scores for something such as likeability? How would you measure or define it? I haven’t a clue. 🙂 Passion for what you’re doing, and empathy for your visitors, are the kinds of things that might get people coming to your pages over and over again.
To a degree, I’m not sure that it hurts to get a sense of the measuring stick that Google uses when it ranks pages, especially when it tells you that it’s using a quality signal, and yet pages that you’ve created that are very readable and engaging, and written by experts in the topics they cover, lose a high percentage of traffic. I’ve seen this happen with some great sites, and having some sense of what Google is basing its traffic-decreasing decisions upon is important.
In the past I was taught to focus on keyword density as a main priority, getting the right tags, and building a site that would be ridden with targeted content and keywords. Then over time I learned that this sort of approach can potentially ruin the user experience (UX) someone may have on my site.
More recently I’ve come to terms with the idea that quality is the driving factor in how we look at websites. While making appropriate use of the right keywords is still a prominent benchmark in developing content, having a creative mindset and a good use of language is now equally important.
Keyword density always seemed to me to be more of a myth promulgated by people who made SEO tools than anything else. Of course it helps to include the keywords that you want to target on the pages being optimized for them, and the frequency with which they appear on those pages can play a role in how relevant a search engine might find a page for those terms. But many toolmakers tried to define a “keyword density” by looking at high-ranking sites for terms and determining a “density” of those keywords compared to other words on those pages. What those tools often ignored was the value of those terms in links pointing to those pages, and the value of the PageRank of those pages as well.
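For what it’s worth, the calculation those tools perform is trivial, which is part of why the number says so little on its own. A minimal sketch of how a typical density tool arrives at its figure (the tokenization rule here is my own simplification, not any particular tool’s method):

```python
import re

def keyword_density(text, term):
    """Occurrences of a term divided by total words -- the classic
    'keyword density' figure reported by many SEO tools."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word == term.lower())
    return hits / len(words)

sample = "Quality content beats keyword stuffing; quality wins."
print(f"{keyword_density(sample, 'quality'):.2%}")  # → 28.57%
```

Nothing in that ratio reflects anchor text, PageRank, or any of the off-page signals mentioned above, which is exactly the blind spot those tools had.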
Google has been using the term “quality score” for more than a few years now to mean the overall mix of relevance, importance, and other factors that determine the rankings of pages in search results, and with things like Panda looking even more closely at other signals, quality becomes even more important. As you note, a “creative mindset and a good use of language” are more important than ever.
What a double whammy of a thread!! Great post followed (naturally!) by great discussion. We can see Bill practicing what he preaches: unique, practical, and passionate content draws eyeballs and attention more than the overrated tools do.
God Bless Bill Slawski.
I have been studying SEO and have had lots of training. Literally everyone says that content is king with Google. However, I found a site at the top of page one with NO content. It is a WP blog someone set up and then left with no content. They must have done it as an experiment to see if they could get a page 1 Google ranking without content. Interesting. Everyone has a different opinion about what Google wants.
I try to stay away from cliches like “content is king,” because it’s just not true. Google looks at a wide range of signals when ranking pages.
It is possible for a page to rank well based solely upon links and anchor text if it has enough. But it can be a lot easier if you also have some content on your pages. 🙂
The issue I have with these guidelines is with the notion of “original” content. Very little content is truly original, nor should it have to be. Relevancy, transparency, and navigability, sure. But originality? Most content, whether on the Internet or in other media, is not original. Lack of originality doesn’t necessarily make content less useful or valuable. In fact, the opposite is often, if not usually, true. It’s not the originality of the content that matters to me, but the quality. I much prefer quality, accuracy, and clarity over originality that may have none of these, or any other, attributes.
Comments are closed.