Duplicate Content Issues and Search Engines

There are a number of reasons why pages don’t show up in search engine results.

One common cause is content that appears at more than one web address, or URL, and looks substantially similar to search engines at each of the locations where they find it.

Some duplicate content may cause pages to be filtered out at the time search engines serve results, with no guarantee as to which version of a page will show in results and which versions won’t. Duplicate content may also lead to some sites and some pages not being indexed by search engines at all, or may cause a search engine’s crawling program to stop indexing the pages of a site because it finds too many copies of the same pages under different URLs.

There are a few different reasons why search engines dislike duplicate content. One is that they don’t want to show the same page more than once in their search results. Another is that they don’t want to spend resources indexing pages that are substantially similar to pages they have already indexed.
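To give a sense of how a search engine might measure how similar two pages are, here is a generic sketch of the "shingling" idea discussed in the research literature on near-duplicate detection — not a description of any particular engine's actual method. It compares the sets of overlapping word sequences from two documents:

```python
def shingles(text, w=4):
    """All overlapping w-word sequences ("shingles") in a document's text."""
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(max(len(words) - w + 1, 0))}

def resemblance(a, b, w=4):
    """Jaccard overlap of two documents' shingle sets.

    Values close to 1.0 suggest the pages are near-duplicates;
    values close to 0.0 suggest they share little text.
    """
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)
```

A production system would hash the shingles and sample them for efficiency rather than comparing raw text; this only illustrates the underlying idea.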

I’ve listed some areas where duplicate content exists on the web, or seems to exist from the perspective of search engine crawling and indexing programs. I’ve also included a list of some patents and some papers that discuss duplicate content issues on the web.

Where search engines see duplicate content

1. Product descriptions from manufacturers, publishers, and producers reproduced by a number of different distributors in large ecommerce sites

When more than one site sells the same products, they often use text from the manufacturer or producer of the product as the product description on their pages. Add to that the fact that the name of the product and the name of the creator, manufacturer, writer, or recording artist may also appear on the page, and a considerable amount of identical content can show up on pages that aren’t related to each other but offer the same products.

2. Alternative print pages

Many sites offer the same content on separate pages that are formatted for printers. If the site owner doesn’t use a robots.txt disallow statement or a meta “noindex” tag on these pages to keep search engines from indexing them, they may appear in search engine indexes.
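For example, if the printer-friendly pages all lived under a directory of their own (the /print/ path here is just a hypothetical layout), the robots.txt disallow might look like:

```text
User-agent: *
Disallow: /print/
```

Alternatively, a <meta name="robots" content="noindex"> tag in the head section of each print page keeps those pages out of the index while still allowing them to be crawled.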

3. Pages that reproduce syndicated RSS feeds through a server side script

When RSS feeds from sites are shown on pages other than those where they originally appear, and that text is displayed using a server-side include that presents the information as HTML on the pages, it could appear as duplicate content on those other pages. When feeds are shown using client-side includes, such as JavaScript, there is much less likelihood that a search engine will pick up that content and index it.
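A client-side include along these lines (the script name and element id are made-up placeholders) keeps the feed text out of the HTML that the server sends:

```html
<!-- The feed is fetched and written into the page by the browser,
     so crawlers that don't run scripts never see the feed text. -->
<div id="syndicated-feed"></div>
<script src="show-feed.js"></script>
```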

4. Canonicalization issues, where a search engine may see the same page as different pages with different URLs

Because search engines index URLs rather than pages, it’s possible for them to index the same page when it is presented in different ways. A “canonical URL” is the one determined to be the “best” URL for a page, but search engines don’t always recognize that the same page is being presented multiple ways. For example, the following URLs may all point to the same page:

http://www.example.com

https://www.example.com

http://www.example.com/index.htm

https://www.example.com/index.htm

http://example.com

https://example.com

http://example.com/index.htm

https://example.com/index.htm
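One common remedy is a set of 301 (permanent) redirects that funnel every variation to a single preferred version. Here is a minimal sketch using Apache mod_rewrite in an .htaccess file, assuming https://www.example.com/ is the version you want indexed:

```apache
RewriteEngine On

# Send the bare domain (and plain http) to the https www host
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC,OR]
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]

# Collapse /index.htm onto the root URL
RewriteRule ^index\.htm$ https://www.example.com/ [R=301,L]
```

The exact rules depend on your server setup; the point is that every variation should answer with a redirect to one canonical URL rather than serving the same page.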

5. Pages that serve session IDs to search engines, so that they try to crawl and index the same page under different URLs

Some sites serve information in their URLs to track visitors as they go through the pages of a site. If this type of tracking information is provided to search engine crawling programs, then those programs may index the same page under different URLs, repeatedly. See, for instance, http://www.sears.com

As the Google Webmaster guidelines tell us:

Allow search bots to crawl your sites without session IDs or arguments that track their path through the site. These techniques are useful for tracking individual user behavior, but the access pattern of bots is entirely different. Using these techniques may result in incomplete indexing of your site, as bots may not be able to eliminate URLs that look different but actually point to the same page.

6. Pages that serve multiple data variables through URLs, so that search engines crawl and index the same page under different URLs

Some sites pass many data variables through their URLs. An example shows this well:

http://www3.jcpenney.com/jcp/Products.aspx?
DeptID=469
&CatID=29841
&CatTyp=DEP
&ItemTyp=G
&GrpTyp=SIZ
&ItemID=0e273be
&ProdSeq=2
&Cat=tees+%26+tanks
&Dep=
&PCat=
&PCatID=28237
&RefPage=ProductList
&Sale=
&ProdCount=32
&RecPtr=
&ShowMenu=
&TTYP=
&ShopBy=0
&RefPageName=CategoryAll.aspx
&RefCatID=28237
&RefDeptID=469
&Page=1
&CmCatId=469|28237|29841

It’s possible for a search engine to try to index the page above with all of those data variables in different orders.
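If you have control over how your site builds and handles its URLs, one approach to both the session ID and data variable problems above is to normalize requests so that the same set of variables always produces one URL. A rough sketch (the parameter names treated as tracking values here are assumptions, not any standard list):

```python
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

# Hypothetical tracking/session parameters to drop entirely.
TRACKING_PARAMS = {"sessionid", "sid", "refpage", "refpagename"}

def canonicalize(url):
    """Return a single canonical form of a URL.

    Drops known tracking parameters and sorts the rest, so that the
    same page requested with parameters in any order, or with a
    session ID attached, maps to exactly one URL.
    """
    scheme, netloc, path, query, _fragment = urlsplit(url)
    params = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True)
              if k.lower() not in TRACKING_PARAMS]
    params.sort()
    return urlunsplit((scheme, netloc.lower(), path, urlencode(params), ""))

# The same page under two different parameter orders maps to one URL:
a = canonicalize("http://www.example.com/Products.aspx?CatID=29841&DeptID=469")
b = canonicalize("http://www.example.com/Products.aspx?DeptID=469&CatID=29841")
assert a == b
```

This is only an illustration of the idea; a real site would apply the same normalization server-side and redirect any non-canonical request to its canonical form.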

7. Pages that share too many common elements, or where those elements are very similar from one page to another, including titles, meta descriptions, headings, navigation, and text that is shared globally.

This is a frequent problem for large ecommerce sites that insist on having their brand name, and information about that brand in every title on every page of their site, and use content management systems that don’t allow them to have distinct meta description tags for each page of their site.

8. Copyright infringement

When someone duplicates the content on your site, it may cause your pages to be filtered out of search results. A site like Copyscape may help you find some of these pages. Searching for unique strings of text from your pages, in quotation marks, may help uncover others.

9. Use of the same or very similar pages on different subdomains or different country top level domains (TLDs)

Using different subdomains and different top level domains for the pages of your organization may be a nice way to create different brands, or focus upon different kinds of content, services, or products. But duplicating content from one to another may create the risk that some of your pages don’t get indexed by search engines, or are filtered out of search results. Again, from the Google Webmaster Guidelines:

Don’t create multiple pages, subdomains, or domains with substantially duplicate content.

10. Article syndication

Many people create articles, and offer them to others as long as a link and attribution to the original source is made. The risk here is that the search engines may filter out the original article and show one of the syndicated copies.

11. Mirrored sites

Mirrors of sites used to be very popular, for when a site became so busy that people could use an alternative source to get to the same information or content. Larger sites that might have used mirrors in the past often use multiple servers and load balancing these days, but mirrors do still exist (and Wikipedia has a nice article about mirrors explaining why). Search engines may be able to recognize the duplicated URL structures of mirrored sites, and may ignore some mirrored sites that they find.

Patents Involving Duplicate Content

Papers About Duplicate Content

Conclusion

If you are having difficulties with the pages of your site showing up in search engines, or if they show up as supplemental results or seem to be disappearing from a search engine’s index, duplicate content is one of the areas to explore fully to see if it is harming the pages of your site.


157 thoughts on “Duplicate Content Issues and Search Engines”

  1. Excellent post Bill, one of my clients’ sites has seen its number of indexed pages drop from 20,000 to 50. The reason? I realized that he was using the same manufacturers’ content on another site.

    I recommend that every ecommerce store avoid the texts provided by their manufacturers; it’s obvious that dozens of other online stores are lazy enough to use the same ones.

    Instead you should make sure to write your own product descriptions, get customer reviews, write product comparisons, etc. Google is getting so good at noticing duplicate content that it’s now almost impossible to rank well if your page doesn’t provide unique content.

  2. Thanks, Nadir,

    It can be a lot of work to come up with unique product descriptions, especially on a large site, but a failure to do so can be costly.

    Customer reviews and other user generated content can not only be helpful to a site owner, but also other visitors to the site.

  3. Great Information,

    I find that number 6 is a difficult one to fix because it goes back to the structure of the database (and normalization). It also requires site owners to re-trace every path to see where the duplicate content issues reside.

    Search engines and their crawlers are smarter than ever before so I hear about this type of penalty all the time.

  4. Click the link for June 11th in the calendar on the upper right of this blog then click on the archives links of any month in the sidebar. You can remove these duplicate elements from a WordPress blog and put them all on a sitemap, sitemaps can make sense to humans and bots if done right. Think minimal, love this post and topic, thanks!

  5. Hi Aaron,

    I had considered removing the calendar to avoid duplicate content problems, but decided that people might actually use it to navigate around the site.

    I used a disallow in the robots.txt file on those types of pages instead. Was curious as to how the search engines would treat the pages of the site by doing that.

  6. Yes, I’d like to know that too. What if you do have duplicated content for a good reason, such as syndicating it? Can you disallow all but one copy of this content if it all resides on your server?

    Thanks for contributing this article.

  7. Hi Allen,

    You’re welcome. It’s a little hard to tell exactly what you are asking without too much of the context spelled out, but I’ll give it a shot.

    You have some content on pages of your site that you are also using xml or RSS to show on either other pages of the same site, or subdomains with the same domain, or under other domain names entirely, but they are all on the same server.

    Depending upon how your folders on your site might be set up, you could possibly disallow pages in those folders without too many problems.

    Keep in mind though that doing so would be telling search engines not to visit the pages in those folders, and index the content upon them.

    Todd mentions a few other possible approaches at the link that I posted in comment number 4, such as converting text to images. Again, those solutions would keep the content on pages from being indexed, though maybe just the content that is being syndicated, and not all the content on those pages.

    You could also use JavaScript to show that syndicated content instead, which search engines shouldn’t index.

    You may want to make sure that there is some indexable and unique content on each of the pages that you are displaying syndicated content upon, if you take steps like that, so that search engines do index those pages.

    If the only content appearing on the pages where the syndicated content exists is the syndicated content, and common navigation and header and footer features, you may still have issues with duplicated content.

    If this syndicated content appears on subdomains or new domains, and there is a fair amount of other unique content on other pages upon those domains and subdomains, and the linking structures of those sites are different from each other, it’s possible that you may more likely see content filtered from search results rather than pages not being indexed at all.

    But, you may not have any control over which pages appear in response to a search, and which pages are filtered.

  8. Pingback: Monthly meta-post for June » Search optimisation and standards compliancy » highly-visible
  9. Pingback: Andrew Halliday - UK SEO and Web Application Builder » Blog Archive » 301 Redirects and Multiple Domains
  10. Very useful and informative information.
    My problem is that I run a site on taxation matters where most of the information consists of government notifications, circulars, and court judgments. There are about 9 to 10 sites using the same resources, and this data cannot be changed. (http://www.indiantaxsolutions.com)

    I would appreciate it if anybody could give me some insight into the issue.

  11. Hi Tarun,

    That information may not be unique, but chances are that your interpretations of them will be. That’s one of the best ways to provide value to the readers of your site.

  12. I want my URLs without trailing slashes, like “www.example.com/category”, to be rewritten to include one: “www.example.com/category/” with the / at the end, because right now Google is indexing my content two times, e.g.:

    “www.example.com/mypost”
    “www.example.com/mypost/”

    I have this code in my .htaccess now, but have had no results…

    RewriteEngine On
    RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
    RewriteRule ^(.*)$ http://www.example.com/$1 [R,L]
    RewriteBase /
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]

    Can anybody help me with this issue? I’m using WordPress.

    Thanks.

  13. The duplicate-content-on-the-same-site issue is sometimes completely valid. As site search technology gets better, this dup content thing will need to be fixed.

    For example…take software like Endeca/Mercado etc that do guided navigation. The user can pick any data point to filter by, in any order. This means you naturally have different urls with the same content.

    /search/brand/Compaq/color/red/
    /search/color/red/brand/Compaq/

    The user can take either path, and get to the same data. Should the site be penalized for using friendly urls here? Absolutely not.

    /search?brand=Compaq&color=red
    /search?color=red&brand=Compaq

    This is even worse since search engines hate query strings for the most part.

    In either case, no malice has taken place and there’s no reason a site should be banned from an index just because it’s using friendly urls and good search tech.

  14. Hi Chris,

    It’s not so much a matter of malice or banning or penalizing anyone as much as it is the allocation of resources by the search engines to try to index as many unique pages as possible. Besides, it’s much more likely that a search spider will just not index as deeply as it could when it comes across pages like that – it’s not a ban on a site, but rather the following of a number of importance metrics when it comes to indexing pages that often determines whether or not many pages will get indexed by a search engine.

    The ideal situation regarding URLs like that is to try to have only one URL per page even if people can enter through different ways. The search engines are trying to solve the problem. See:

    http://www.seobythesea.com/2006/09/solving-different-urls-with-similar-text-dust/

    for some good examples. One of the authors of that paper is now at Google.

    But, I’m not a big fan of waiting for search engines to try to figure something like that out. If you have control over your site, and can use some kind of middleware or data bridge between your data and your web pages that makes sure that any request for those URLs only sees the data variables in one order, that really is the best approach.
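    A very rough sketch of that kind of data bridge follows; the function name is my own, and alphabetical order is just an arbitrary choice of a single canonical order:

```python
from urllib.parse import parse_qsl, urlencode

def enforce_parameter_order(path, query):
    """Return a redirect target if the query string's parameters are
    not in the site's single canonical (alphabetical) order.

    Returns None when the request already uses the canonical order,
    so the server can respond with a 301 redirect only when needed.
    """
    params = parse_qsl(query, keep_blank_values=True)
    canonical = urlencode(sorted(params))
    if canonical != query:
        return f"{path}?{canonical}"  # target for a 301 redirect
    return None

# /search?color=red&brand=Compaq redirects to /search?brand=Compaq&color=red;
# the latter, already canonical, is served directly.
```

    However it's implemented, the effect is the same: every ordering of the same data variables collapses to one URL that search engines can index.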

  15. Thanks for the comprehensive information. I was thinking of duplicating my content to a new subdomain with the intention of moving it there eventually. So, I should not do that. I should redirect the whole subdomain instead, but my control panel (cPanel) does not handle this properly.

  16. Yes, I actually think that duplicate content can do some harm to your site. So can too much content. One of the best things to do is ensure your site is categorised well, and ‘block off’ certain parts of the site from Googlebot’s and other search engine spiders’ scanning process. That way, you optimize only certain pages, because those are the pages you want the search engines to pick up. The rest you can focus on the user even more (and those pages you should block off from the search engine spiders).

  17. Pingback: Road-Entrepreneur.com » Blog Archive » Yahoo vs Ask vs Google - How Search Engines Work
  18. I am planning to create an affiliate program where same content will be replicated by many websites.

    Now, will it help if I create a page to be embedded within other sites with a logo/link: Provided by Website.com?

    The above link will, in a way, tell search engines/users that this content comes from website.com.

    Will this help?

    Rajat

  19. hi,

    I was wondering if changing a blog’s permalink structure can affect duplication (and bring a penalty, if I may add). I used a plugin to do a permanent redirect for pages with the old permalink structure, but I see that some pages are duplicated in Google search results. Please send me an email with your reply if you have time.

    Thank you so much!

  20. Hi Jessie,

    The use of a permanent redirect should be helpful in telling Google which pages are the ones that you want indexed, so that’s a good step.

    If you’ve done that correctly, there probably shouldn’t be a concern about a “penalty” of any type. It’s probably not likely that Google would penalize your site, though you probably want to mitigate the risk of any potential problems when the search engines index your pages.

    You may want to do that by checking the pages of your site using something like Xenu Link Sleuth to make sure that you don’t have any links on your own site that are using the old versions of the links, and change those around to the new versions also.

    And, if there are links on the Web that you have some control over, like in directories where you can change the URLs, if you have any there that are pointing to the wrong versions of the URLs, you may want to change them to the new versions.

  21. Hi Bill, I have been checking a site’s structure and have found duplicate content issues related to a CMS that displays the same content for different URLs.

    Can the site command in Google, site:www….., be used as proof of a site’s health? I have noticed on the site I’m working on that Google displays some results and at the end displays the following:

    “In order to show you the most relevant results, we have omitted some entries very similar to the 367 already displayed.
    If you like, you can repeat the search with the omitted results included.”

    The total number of pages reported by Google is 2270, but it only shows 367 results.

  22. The site command may possibly be a good first indication that there may be problems.

    You may want to look to see all of the URLs that Google lists as being in its index, to see if they are capturing some pages (at different URLs) more than once, and missing other pages.

    You can do the same thing at Yahoo, checking with the Yahoo Site Explorer.

    If they are, you may want to take steps so that only one version of pages at different URLs are being indexed.

  23. Hi William,

    I have two questions:

    1) If we define header, sidebars and footer as “template” and the rest in the middle as “content” what should the word count ratio between both be like in order to prevent duplicate content issues?
    Is 80% template : 20% content still OK (a problem I have in an online shop with some product descriptions which are too short…)?

    2) Is it still true that the googlebot dislikes pages that use many data variables in their URLs, like e.g.

    “www.example.com/?var1=abc&var2=def&var3=ghi”

    Is there a maximum number of variables one should not exceed?

  24. Hi Dirk,

    I’m not sure that I’ve really seen any guidelines anywhere that expressly provided some type of ratio between original content and template or boilerplate.

    I have seen search engines cease crawling pages of sites where there hasn’t been much unique content from one page to another – maybe a step up from keyword insertion, but not too much.

    Short product descriptions can be a problem, as you point out. Some product search engines that crawl and extract information may be looking for the sites that provide the most information about different aspects of products, so lengthening the descriptions can help there.

    It’s tough experimenting to determine some kind of ratio when you have a site that contains a lot of products, too.

    About data variables, the search engines are claiming that they are getting better at handling URLs that contain a lot of data variables, but search engine representatives at conferences have been tending to caution people to limit the number of data variables in URLs as much as possible, with only two or three at the most. I try to avoid relying upon the search engines getting it right as much as possible.

  25. Hi Bill,

    I run an ecommerce business primarily based on eBay. I’ve been searching for a reason as to why my traffic has been lackluster compared to the numbers my competition has been seeing. Recently, I’ve really come to see the importance of SEO and I’ve been researching various articles about what steps I can take to make my store more search engine-friendly.

    Number 7 above seems to be one reason for my dwindling numbers. I say this because eBay has its own branding on every page, and because I run off a template when I create my listings — it describes shipping and payment policies as well as ways to get in touch with me. Only a few lines of the template are changed for each listing. Would number 7 encompass product pages located within the same URL (ebay.com), and could a template used across 600 or so listings result in some penalties when I’m being indexed by search engines?

    Thanks!

  26. Hi Chris,

    It’s good to hear that you are learning more about SEO and search engines – I think that’s helpful for anyone who is involved in ecommerce on the Web, and it’s the same reason that I first got into SEO.

    There are three main reasons why traffic from search engines might change – either you’ve changed something about your site, or your competitors have made changes, or the search engines have changed something. I know that might sound obvious, but it always helps to take a step back and look around before picking any one thing that might be a cause.

    If you haven’t made any significant changes, and it doesn’t appear that your competitors have either, then what you are seeing might have to do with a change at the search engines.

    There may be a number of reasons why you may not be getting as much traffic from Google as before. If you haven’t created a Google account at the Google Webmaster tools pages, that might be something that you could do as a first step. If you can follow the steps that they provide to “verify” your site and add an xml sitemap, they might provide you with some details about any errors that they find on your site that might make a difference.

    If you can add more unique content to those pages, and unique page titles and meta descriptions, that could help you avoid problems with duplicate content.

    If you want me to take a quick look, send me your URL through my contact form here, and I’ll see if there might be some other reasons why you might be having problems.

    Bill

  27. Really great article! From a high level perspective I knew a bit about duplicate content and mirror domains. But this really digs into the issue so much more and provides excellent insight.

    Thanks for this info!

  28. This is great information, I was really wondering about what problems would happen if I wrote articles and used the content that I was using on my site.

  29. Hi Brent,

    It is possible that if you write an article, place it on your site, and syndicate it out to other places, the search engines might only show one of the copies of the article in search results, with the others filtered out and no guarantee as to which copy would be the one showing.

    I try to avoid doing that, but if you do, it might not hurt to include a link back to your site to the original in the syndicated versions, so that people who view the article can find your pages to see more from you.

  30. Bill,

    very nice article…
    I hope you don’t mind me linking to this one from my blog with full reference.

    If you accept pingbacks, you will probably notice.

    I am very keen on unique content and search engines; I was writing my own article and, halfway through, found myself thinking about duplicate content vs. search engines, and your page intrigued me to read.

    Fantastic read.

    Thanks for sharing.

  31. Hi Bill. You sound like a great guy. I love this article. I suspected this all along with our site. We have product descriptions that come from the manufacturer. Would you suggest we use the disallow tag in the robots.txt file during the time we re-write our thousand product descriptions? I wish I had known about this issue from the start! Thanks to your article though I’m going to get back to writing! Great site too I’ll be back often! :)

    – John

  32. Hi Linn,

    Thank you. It doesn’t hurt to consider the problems that you might have because the content you are using also appears in other places on the Web.

    Hi John,

    Thank you, too.

    It’s difficult to tell whether or not you should disallow those pages from being crawled. While you might be using product descriptions that many others are using, since they are straight from the manufacturer, it’s possible that your pages may be filtered out of search results rather than any other harm happening to them, if the search engines think that they are near enough duplicates of each other. I would recommend trying to rewrite those descriptions fairly quickly either way – and don’t forget unique titles and meta descriptions as well.

  33. Pingback: The State of Duplicate Content | The Organic SEO
  34. Hi,
    First thanks for the article. I was looking for information about duplicate content issues and stumbled upon this post. It was very easy to understand and at least I got answers to my question.
    Can you use both robots.txt and robot tag?

  35. Hi Paul,

    You’re welcome. I’m happy to hear that this post was helpful to you.

    You can use robots.txt to disallow a search engine spider from crawling a page, and also use a robots tag, but if the search engine is obeying the robots.txt disallow statement, then it won’t crawl the page and see the robots meta tag.

    There may be a reason for using both when you first decide to disallow a page in robots.txt. Instead of checking the robots.txt file for a web site every time a search engine crawling program visits a URL at a site, it will usually make a copy of the robots.txt file for the site, and then update that copy on a regular basis.

    So, if you add a page to your site, and add links to it from other pages, and disallow it in your robots.txt file, and the search engine has already come by and looked at the robots.txt file that day, it may look at your new page and start indexing it before it updates its copy of your robots.txt file. If you also have the robots meta tag in the head section of your page like this:

    <meta name="robots" content="noindex, nofollow">

    It won’t index the contents of the page nor follow links from that page to other pages. Chances are that the major search engines will get a new copy of the robots.txt file at some point (even within the same day or a day later), and know that it shouldn’t have crawled that page, but it may have sent the new page to be indexed already.

  36. Ya, I have had major problems with duplicate content while trying to bulk up my site with syndicated articles in the past. Now I make sure that all of my content is 100% unique! Wish I would have read this post before I found out the hard way that duplicate content is horrible!

  37. Hi Justin,

    I’ve seen duplicate content cause problems on a lot of sites. Wish you had found out the potential risks before it caused problems, too.

  38. I’ve been a regular visitor to your site for about a month now – did a quick search on Google for information on dupe content and landed here AGAIN! Thanks for the write up, it cleared up a few questions. Merry Christmas :-)

  39. The new canonical link tag released at SXSW in Feb ’09 may be helpful to someone who has multiple versions of the same page on their site (e.g. screen version, print version etc.). Add <link rel="canonical" href="http://www.example.com/page/" /> pointing at your preferred URL. Remember the final / is important – leave it in!

  40. Thanks very much, Jas.

    All things considered, I would rather fix duplicate content problems with the URLs themselves than use the canonical tag. Often there are more problems going on with pages than just canonical issues, so it’s worth addressing all of them. It is definitely worth mentioning here, though, and pointing to what the search engines have said about it.

  41. It’s becoming quite a plague; I see thousands of sites created purely for the purpose of building links to their customers’ websites. My client had 600 links that I had accrued for him through natural processes; now he has paid one of the aforementioned companies, which gets its customers links by making loads of similar/duplicate sites. He now has over 10,000 links, but has vanished from Google for many of his target keyterms.

  42. Hi Andrea,

    It’s sad to see that people are creating duplicate and near duplicate pages and sites like that solely for the purpose of trying to increase rankings. It does appear that the search engines are actively trying to keep duplicate sites like that from having an impact upon rankings, and in the case that you describe, using methods like that could be harmful rather than helpful.

    Do you think that there could be other reasons why his rankings may have disappeared from search results? Perhaps pages on his site that engage in a reciprocal linking exchange or something similar?

    One concern that I have about that kind of reaction from the search engines is that someone unassociated with a site, perhaps even a competitor, could engage in creating links to a site from duplicate and near duplicate pages in an effort to harm the rankings of that site. That practice was known as Google Bowling a few years ago, and it was thought that sites linking to you like that couldn’t harm you because of the potential for abuse by others unaffiliated with a site in any way.

  43. This can also be a problem if you run a forum and people cross-post. Another problem is if you run a wiki and the history pages are indexed. This is why making proper use of robots.txt is very important for any webmaster. My website was banned from Google for a month because of these issues, which felt unfair. Luckily, fixing and addressing the problems does work, and I was brought back in a month. Properly blocking session IDs and other long-string URLs using robots.txt is encouraged, and making use of mod_rewrite to redirect problem URLs is also a good tactic.

    I don’t see any way around duplicate content when it comes to product descriptions, other than if you have staff, you can have them paraphrase the descriptions manually. But if you have a store with 10,000+ descriptions, this would be nearly impossible to do. Make sure you have a somewhat authoritative site if you plan on selling thousands of products.

  44. Hi Young Composers,

    Smart use of robots.txt, meta noindex/nofollow tags, the use of the new canonical value link elements, and other technical approaches can help stop problems with the indexing of pages on a site. Sorry to hear about the banning, but it’s good to hear that you were able to fix the problems that you had. I’m not sure that I’ve heard of many sites that were actually banned based upon duplicate content, but I hope that others who might read these comments take your story seriously.

    Using manufacturers’ or producers’ content for product descriptions on a site that contains a large number of products is a real problem, and it can be difficult to overcome. While visitors’ product reviews might help a little to provide some unique content, not every kind of product generates enough interest to have people comment and review products regularly. Regardless, it can help to introduce as much unique content as possible. And it might not be a bad idea to have fewer products that are more easily found than to have many products that no one can locate. Definitely something to think about.

  45. Hi Answer Blip,

    Thanks for your thoughtful comment.

    Search engines do look for duplicate content at every stage of what they do, from crawling pages, to indexing content from the pages, to displaying links to pages in search results.

    Filtering is something that usually happens at the display stage, when a search engine decides which version of content that it will show. There are sometimes problems with that choice, unfortunately.

    A search engine might also decide at the crawling stage or the indexing stage that content it comes across is duplicate, and may decide not to crawl or index pages that it doesn’t believe are unique enough. If the link structure and content on a site are virtually identical, as in mirrored sites, a search engine may decide not to include one version in its index at all, and not to waste time and resources crawling it. That’s not a “penalty” per se – in fact it can be worse than a penalty, especially when it’s the originator, or copyright holder, of the content whose pages are left out of crawling or indexing.

    Duplicate content issues may expand, but one of Google’s recent patent filings described a “duplicate content search” that it might develop to allow site owners to help themselves find when others are duplicating their content without permission or license. Another patent filing from Google within the past couple of years described how Google might allow authors to use authenticated IDs and special meta data to indicate when they have syndicated content, and where they’ve syndicated it to. Will Google implement these things? It’s possible.

    Competing with yourself for rankings by syndicating your content probably isn’t a good idea, and I wouldn’t recommend it. Writing articles for syndication that link back to your site, and that you don’t publish on your own site, can be a better idea if you want to do some linkbuilding. It is possible that links from such articles may be discounted if the articles are too prevalent on the Web, but it’s better to create a unique article than to syndicate one from your own pages.

  46. I would say too that duplicate content is not a penalty, although it can feel like one; it’s simply a filter used by the search engines to help them bring back relevant, unique, and helpful results for a search query.

    I also think that duplicate content issues are going to become more of a problem as there are more and more web pages created, and the search engines will be more strict as time goes on when it comes to doing things like syndication of content.

    Most syndication filtering happens when a website syndicates to a more trusted website, so be careful. I personally don’t like syndicating content for SEO purposes because it creates competition for your content where there might not have been any, and I don’t think you should give up a primary advantage that you have, which is your own content.

    So let’s say you syndicate and put a link back to the original on your site – I have seen this time and time again used for link building. There are two issues with this. The first is that most people try to track the link and put a tracking ID on it, which will most likely trigger the search engines to treat it like a marketing or affiliate link and discount it. The second is that syndication is usually done en masse with the same anchor text, and a site getting 10,000 links at one time could also trip a red flag, be seen as a marketing campaign, and therefore have those links discounted.

    So in short, it’s hard to implement a syndication strategy that is effective for SEO without hurting yourself in the long run. Any thoughts? Ok, taking off my SEO hat and putting one of my other hats back on ;)

  47. Hi Biz,

    I haven’t used those particular services, and I don’t really use article publication sites, so I can’t discuss them specifically.

    If your articles will be seen in those locations, and bring visitors to your site, and possibly even a few links, it might not be a bad idea to submit a few articles.

    I would hesitate to submit the same articles to an article site that I might also publish on my own pages, to avoid the possibility of having one or the other filtered out of search results. There’s nothing wrong with writing some closely related articles that might cover the same or very similar topics, that may also share a few keyword phrases, and that might both show up in some of the same search results.

  48. Hello, i have a similar problem.
    1. I have a domain called http://www.ventdepot.com and this domain contains the USA, Mexico and Peru store. (this is good for a global search)
    2. I want to ONLY duplicate the USA store content to “http://www.ventdepot.us”; ONLY the Mexico store content to “http://www.ventdepot.com.mx,” and ONLY the Peru store content to “http://www.ventdepot.com.pe” (This is good for a local search)

    My question: is this spam?

  49. Hi Julio,

    Ideally in that situation, if you feel that you need to use different URLs for different locations, you would create unique content at each of the URLs, fueled by cultural and linguistic differences.

    I don’t think that I would duplicate the content and linking structure at each of the sites, even under different URLs. It’s possible that some of them might not be crawled by search engines since they might be seen as mirrors of each other (near duplicate content and linking structures).

    If they are substantially similar, and are crawled and indexed, it may be possible that the search engines will filter out one or more of the different sites in search results and only show one. It probably wouldn’t be possible for you to dictate which version is shown to which visitors.

    I don’t know if search engines would perceive the different sites as spam, but it’s quite possible that they would see them as duplicate content.

    I’m also not a fan of using a “.us” TLD in an attempt to target people from the USA. I don’t believe that most people in the US recognize the URL as one that is specifically intended for people from the US. So many United States based URLs use “.com” that it’s often what people from the US expect to see. Given a choice between visiting a domain with a “.us” or with a “.com,” I would expect most people from the US would choose the “.com” version.

  50. Hi Aaron,

    Those are good steps to take to avoid some duplicate content issues. Many site owners don’t, and that can cause a potential problem.

    Sites that aggregate full RSS feeds and repost them without permission or licensing the content are violating the author’s copyright. Publishing an RSS feed doesn’t mean that you have given people license to republish that content, and sites that do that may see DMCA takedown notices sent to their hosts, or to the search engines.

    If a site is using RSS feeds in that manner, it’s possible that Google might penalize the site not on the basis of duplicate content, but possibly rather as web spam.

  51. This is why the first thing I do, via htaccess or config files, is redirect all non-www traffic to the full www. domain via 301. I also make sure all directories end in a slash or 301 to that, and use the canonical tag to help as well. The easier you make it, and the fewer identical pages involved, the better the experience is both SE-wise and visitor-wise. Search engines are getting pickier about relevancy and similarities between pages these days, though… what are your thoughts on sites that aggregate RSS feeds and repost them – is that considered duplicate to Google, and if so, if someone is doing it without your permission, is it your fault and should the original content owner be penalized?

  52. Hi,
    Thanks for the post. I am pretty new to SEO and I am not an IT expert, but I have worked out how to rewrite and upload all of my HTML page descriptions and titles so that each page is different.
    The only problem is that Google is not recognizing the new content. The diagnostics in Google Webmaster Tools are still listing the old duplicates even though it says it was updated 2 days ago?

    Any ideas?
    Thanks a lot.
    Martin

  53. Hi Martin,

    The Google Webmaster Tools can sometimes take a fair amount of time to reflect changes to a site. That time can partially depend upon how frequently Google returns to crawl and index your pages, though I’ve seen some changes take a number of weeks (and longer) to appear, even though I’m sure that Google’s crawler has visited a good number of pages, and the Webmaster Tools indicated updates a number of times during those weeks.

    The good news was that I saw my changes reflected in search results from Google long before those changes were shown in Google’s webmaster tools. That’s where you should be looking first.

  54. Thanks Bill, will keep an eye out on Google. Interestingly I’ve flown up Yahoo to page no.1 for my chosen keywords. No moves on Google as yet though… although I did jump 6 pages by simply removing a flash page!

    Thanks a lot.

  55. You’re welcome, Martin.

    Nice – the Yahoo results are encouraging. Patience is one of those traits that goes well with SEO – changes in search results don’t always happen as quickly as you might hope or expect them to. While you’re waiting, it isn’t a bad idea to think about and work on other things on your site, like working to improve your content, your site’s usability, links to your pages, and so on, or learning more about SEO and those other topics. Gaining a little knowledge every day starts adding up after a while.

  56. A long time ago, duplicates on one domain were added to Google’s supplemental index. Today the supplemental index doesn’t exist, but Google has a similar mechanism, invisible to normal users.

    The worth of duplicate content on many domains depends on links. If our content is better, then more people copy our text, but it isn’t harmful when our site is stronger than theirs.

    Sorry for my English ;)

  57. Hi Adi,

    Google has never said that the Supplemental index was not going to be used anymore, but instead that they would stop showing whether or not a search result was from the supplemental index. In the same statement, they also told us that they were going to be refreshing what was in the supplemental index much more frequently than it had been in the past.

    It’s possible that if pages that were duplicates were in the supplemental index, it wasn’t necessarily because they were duplicates, but rather because they may not have had much in the way of links pointing to them, or much PageRank. It’s possible that some duplicates are in Google’s main database – for example copies of important historical documents that have been published on the Web and are on high quality sites. Duplicates are often filtered out of search results when being displayed to searchers, but the display of search results is independent of the crawling or the indexing of results.

  58. I have a question regarding duplicate content and URL canonicalization. How can I redirect every single non-www version of my web site to the www one? I am using the Joomla CMS. I did this for the home page. I also know about the canonical tag and all the other ways to eliminate duplicate content, but this one is troubling me a lot more since I tried fixing it through cPanel and it did not work. Please advise if you have any suggestions.

  59. Hi Stefanos,

    I’m going to guess that you are probably on a Linux server running Apache. The easiest way to do that is often an htaccess directive that creates a 301 redirect from non-www pages to www pages. But I don’t know what your hosting setup is like, or what you’re seeing in your cPanel, so your approach to setting up a redirect may depend upon that. You should be able to set up a redirect for all of the pages, instead of just the home page. I’d recommend sending a quick email to your hosting support and asking them for help.
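As a sketch of that htaccess approach, assuming an Apache server with mod_rewrite enabled and a placeholder domain:

```apache
# 301 every non-www request to the www version of the same URL
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

A quick way to verify it works is to fetch a non-www URL with `curl -I` and check for a `301` status and a `Location:` header pointing at the www version.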

  60. Bill, duplicate content is a difficult thing, more difficult than one would think, as your pages can even compete with each other if they are about the same topic, duplicate content not even being a factor. So, my question would be about home pages. I have seen some sites get away with no content on their home page yet rank very highly. This would seem to be the best approach as it allows you to not give away your best ideas and get bogged down, yet on the other hand I have seen sites rank high with a ton of content on their home page, beating out the guys that make you wait.

    My question, do you think it is a good idea to have a shorter homepage and save the meat for your deeper pages or to get a little dirty on your homepage? I look forward to reading your response!

  61. Hi Jeremy,

    A couple of things to keep in mind regarding a home page is that:

    1) Search engines can pretty much deliver a visitor to any page on a site, so I think there’s value in providing something important to visitors on any page they might potentially land upon on your site.

    2) Some rankings for pages are a result of things like anchor text pointing to your pages from other pages and sites, so that’s why you might see some home pages ranking well for a query, even though there isn’t much in terms of content on those pages.

    3) People might arrive at your site through your home page, or through another page on your site, but chances are that if they arrive on another page on your site, and they are interested in what they see there, that they might visit your home page. Regardless of how much content you decide to include on your home page, it should work to give visitors an idea of what else they might find on your pages, and engage them in a manner that convinces them to visit other pages.

    4) There shouldn’t be a problem with having your pages compete with each other for some terms, as long as you avoid using content that is too similar.

    Some sites and some topics may benefit from a shorter home page, and others may ideally be served by a longer homepage. How do you tell the difference? Try some different things and test. Make sure that you include enough on your home page so that the different visitors who might come to your site recognize that they might find what they are looking for on your pages.

  62. @jeremy and @bill … what would you say about many of the highly ranking SEO / web design websites that use a HUGE amount of content on their homepage..

    Would this increase their chance to rank for a head term, or is it because they are concentrating their link efforts on just one page, so it makes sense to have everything in one place?

  63. Hi Scott,

    Sometimes including a lot of content on one page can be a good strategy, but I would caution that constructing an intelligent information architecture for a site might be a strategy worth pursuing, after learning about the objectives behind a site, the intended audience for that site, and the tasks that one might help those audience members perform on that site.

    Chances are that multiple pages might work better than trying to place everything all on one page.

  64. Hey Bill,

    We have a client with a PageRank of 4, and they had a content writer pull duplicate content from other websites; probably 70% of the content is duplicate or stolen. If they rewrite all of the website’s content, will Google accept the repairs and re-value the website (if this is the only penalty)?

    They previously ranked great for some nationwide terms, and then after this happened they fell off the search engines altogether. Let me know your thoughts.

  65. Hi Joel,

    It is possible that the duplicated content may be causing a problem, such as pages from the site being filtered out of search results. It may also be possible that Google may have considered the use of that content as spam, and may have applied a possible penalty or penalties (though Google is often quick to state that they don’t penalize sites for duplicate content, they do penalize sites for what they consider web spam).

    Without more information about the site, it’s hard to tell whether that added content is the cause for the loss of rankings. I don’t know how much content was added to the site, as well as other changes that may have been made on the site, but it’s possible that there may be other reasons for the loss of rankings as well.

    Rewriting the added content so that it is unique sounds like it could be a start in the right direction, though without some additional information about the site, it’s difficult to tell with any certainty.

  66. Am trying to do some SEO housekeeping on my website and came here through a Google search on ways to reduce duplicate content. Very thorough list and some excellent comments as well. Unfortunately, it looks like we’ve got our work cut out for us.

  67. Hi Daniel,

    Thank you. Getting rid of duplicate content from within a site can definitely be worth the effort, especially when it affects things like the PageRank of your pages. I may have a newer post on duplicate content sometime soon, covering some different aspects of it.

  68. Unfortunately my comment got deleted. But I’m still interested in an answer. Is there a tool to compare if pages could be considered as containing duplicate content (e.g. enter a URL and search the web for possible dupes or entering the URLs of two pages and compare them)? Thanks in advance.

  69. Hi Ben,

    One site that you can use to try to see if there is duplicate content on more than one site is Copyscape. But I can’t tell you if the duplicate content detection that it uses is the same method that a search engine might use.

    It also isn’t useful for comparing pages of a single site, to see if having duplicate content at different URLs on that site might be a problem. For example, the home page of a site might be seen as existing at more than one URL, like the following:

    “http://www.example.com/”
    “http://example.com/”
    “http://www.example.com/default.php”
    “http://example.com/default.php”

    There are a fair number of methods that a search engine such as Google might use to try to determine the existence of duplicate content that a tool like Copyscape might not capture, as well. I link to a number of white papers from the search engines which describe a few in my post:

    New Google Process for Detecting Near Duplicate Content

    Google did publish a patent application for a tool that might be even more useful, if they develop and release it. It’s a “duplicate content search engine,” and it might be helpful for site owners in detecting duplicate content that the search engine knows about. I wrote about it in this post, but it isn’t available yet, as far as I know:

    Google to Help Content Creators Find Unauthorized Duplicated Text, Images, Audio, and Video?
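The white papers and patents on near-duplicate detection mentioned in this thread often build on word "shingling" and set similarity. Here is a minimal Python sketch of that general idea; it illustrates the technique, not the actual method used by Google or Copyscape:

```python
def shingles(text, k=4):
    """Break text into overlapping k-word sequences ("shingles")."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Two near-duplicate descriptions and one unrelated page (made-up text)
page_a = "the quick brown fox jumps over the lazy dog near the river bank"
page_b = "the quick brown fox jumps over the lazy dog near the old mill"
page_c = "an entirely different article about auto insurance rates in texas"

sim_ab = jaccard(shingles(page_a), shingles(page_b))
sim_ac = jaccard(shingles(page_a), shingles(page_c))
print(f"near-duplicates: {sim_ab:.2f}")  # high similarity
print(f"unrelated pages: {sim_ac:.2f}")  # low similarity
```

A search engine would compare compact fingerprints (hashes) of shingles at web scale rather than raw word tuples, but the underlying set-overlap idea is the same.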

  70. Thanks for the answer. I did a quick try with Copyscape and I got results (the title and first sentence of an article were found in a blog directory). In your experience, how deep does the Copyscape spider dig – meaning, how likely is it that Copyscape will actually detect existing dupe content on the net?
    I had a short look at the described methods but I did not have the nerve to dig deeper. ;-)
    The Google tool could be really interesting (if it uses the same detection mechanism as the general search engine).

  71. Hi Ben,

    I’ve used Copyscape in the past, but haven’t studied it in detail to see how effective it might be at uncovering all existing duplicate content on the Web. That is an interesting question.

    The Google tool would be very interesting if it used the same detection mechanism as the general search engine. I expect that Google uses more than one method, though – they look for duplicate content during the crawling of web pages, the indexing of pages, and the display of pages – and it’s quite likely that the processes used in each stage are different.

  72. Hey Bill,

    I haven’t read your blog in a week or two. Been quite busy with websites! I found this post through Google because I am having problems with one of my sites.

    The site is still indexed when I did a site:url search, but I am nowhere to be found for non-competitive keywords. I looked in Google Webmaster Tools and didn’t see a notice that my site has been punished. I am well aware that the site had a lot of duplicate blog posts (to try to rank high for various keywords), so I am assuming that is the problem. Would my site start ranking again if I deleted all the duplicate content?

    Thanks,
    Kai Lo

  73. Hi Bill,
    So this might be my third comment today..
    I was so surprised when I saw the date this post was published. It’s nearly 4 years ago.
    I thought the duplicate content issue was a newer thing.
    Wew.. I really wish I spoke English well so I could learn a lot from your site.
    Thank you very much for the list of places where search engines see duplicate content.

  74. Hi Kai Lo,

    While it’s good to see you back, I’m sorry that it was because of a problem with one of your sites.

    It is possible that your pages may be filtered out of search results for some terms. I don’t believe that Google considers such filtering to be a penalty, but rather a way of making sure that searchers don’t see search results filled with duplicated content. It is possible that replacing the duplicate content with more unique content, or removing it may be helpful.

  75. Hi dekguss99,

    You’re welcome. Duplicate content has been a problem for the search engines for years, and the search engines have been coming up with more and more approaches towards identifying duplicates.

  76. Thanks for the comprehensive information. I was thinking of duplicating my content to a new subdomain with the intention of moving it eventually.

  77. Hi Piluz.

    You’re welcome. If you do make such a move, make sure you plan carefully. You want to make sure that you set up 301 redirects so bookmarks and links and search engines pointing to your old address are redirected to your new ones. You may also want to plan to build some new links to the new address as well.
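A minimal sketch of such a move on an Apache server, using placeholder domains (your hosting setup may differ):

```apache
# In the old location's .htaccess: permanently redirect every path
# to the same path at the new address
Redirect 301 / http://new.example.com/
```

Because `Redirect 301 /` is prefix-based, a request for /old-page would be sent to http://new.example.com/old-page, preserving bookmarks, links, and the redirect signal to search engines.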

  78. Hi Piluz, I think this will be useful for you: dailyblogtips.com/how-to-setup-a-301-redirect/

  79. Yeah.. and use a 301 checker to make sure it’s right.. there are several of them out there

  80. Hi Bill,
    I thought the duplication of content within a website was not a new thing anymore.
    I really wish I spoke English very well so I could learn a lot from your site.
    Thank you very much for the list of places in which search engines see duplicate content. Once again, there are a lot of ways to create unique and interesting content for search engines.

  81. Hi sibudy,

    The problem of duplicate content on websites is one that has been known about for a fairly long amount of time, but it’s not a problem that has been solved by any means. The search engines seem to come out with new approaches on a regular basis. But honestly, there are a lot of things that webmasters can do to solve many of those problems on their own.

  82. Very interesting article. A search for duplicate content on one of my own sites brings up lots of directory listings, as they pull from descriptions or we pasted home page content in. I wonder whether this adversely affects SEO.

  83. Hi Dan and Randy,

    A snippet from your site on a directory page shouldn’t impact whether or not your page might get filtered out of search results, but I have seen some problems in the past when high authority sites might include a page that contains a number of snippets from pages on the same site.

    I experienced a problem like that a few years ago, when Bloglines was publishing consecutive snippets on a single page from blog posts from another blog that I had, and they were allowing those pages to be indexed. Even though my home page had a higher PageRank than the Bloglines page, it was appearing in results and my blog was being filtered out of those results. I took that up with Bloglines, and once they decided to noindex those types of pages, my blog reappeared.

  85. Ya, I have had major problems with duplicate content while trying to bulk up my site with syndicated articles in the past. Now I make sure that all of my content is 100% unique! I wish I had read this post before I found out the hard way how horrible duplicate content can be!

  86. Hi posicionamiento web,

    Yes, it really can help to explore what the costs might be of doing things like adding syndicated content to your pages, or dig into whether adding content from multiple sources in a kind of mashup might really be effective.

  87. I have several different top level domains, each with a different design, different IP, different hosting, different images, and different titles and meta descriptions, but the same content. Do you think these will be considered duplicate content in the eyes of a search engine?

    Thanks.

    Paul

  88. Hi Paul,

    I think that it’s possible that a search engine might ignore things such as different design, IP address, hosting, images, and other factors when trying to decide whether the main content of a page is duplicated elsewhere. Search engines focus not only upon finding the same content at different URLs on the same site, but also upon finding the same content on different sites controlled by different people.

    Search engines try to avoid returning the same content to searchers in response to a query when that content appears on different sites – otherwise we might see search results filled with links to the same article at different domains.

    The content at different sites may not be exact duplicates, but search engineers often refer to what you describe as “near duplicate” content, and they have been working for years to identify it.

  89. Hi
    I started my website in Feb 2009. I added some copied content.

    But now I am adding all self-written posts.

    But then also, around a year has passed (after writing self-written posts) and I don’t get more than 200-300 daily uniques from Google.

    Could you please help me in gaining good traffic from Google.

    Please, if you can, have a look at my website and explain in detail.

    Thanks A Lot In Advance’
    Atul

  90. Hi Atul,

    It was probably a good idea to stop using copied content and start creating your own unique content.

    It does look like you’ve left the site open for others to submit and have their articles published on the site, with links to their pages. I’m not sure that type of approach leads to the highest quality of content that you might publish on the pages of your site, and it may not always lead to great choices of keywords to attract visitors. Do you try to make sure that articles published on the site aren’t duplicates?

  91. Hi Matt,

    That rumor originated from a comment made by Google’s John Mueller in the official Google Webmaster Central forum, at:

    http://www.google.com/support/forum/p/Webmasters/thread?tid=7909f77b85c3e53b&hl=en

    John noted the following in that thread as part of one of his answers:

    It looks like some of the pages from your site have been returning empty HTML pages with almost no content. In general, when this happens, we may assume that you’re trying to return this content on purpose, and if it’s identical with other content that we’ve found on your site, we might assume that they’re duplicates.

    He followed up with a post mentioning that there might be a server issue causing some pages that would otherwise have content on them to be returned as empty or near-empty pages, and that if Google removed those pages from its index, it would likely return them to the index as they were before once those server problems were fixed.

    Technically, they are “duplicates,” but if those pages are showing up as blank, then there really isn’t much to index on them anyway.

  92. Hi Bob,

    One of the problems or risks involved in submitting an article to a number of article directories is that if you also publish the article on your own site, it’s possible that the copy on one of those other directories might outrank the article on your own site and the one on your site might be filtered out of search results.

    I don’t think that the search engines would consider an article submitted to a number of article sites to be spamming, but I’m also not sure that you would really get a whole lot of value from the links in them. Many articles on article sites don’t end up with a lot of links pointing to them, and they may not end up passing along a lot of PageRank or link equity.

  93. Hi Wim,

    Nicely written article. Having the same content available in your posts, your dated archives, in your paginated pages, and in your categories can cause the kinds of problems that you experienced. I know something like that is a little shocking when you experience it. Good to see that you’ve solved the problem. Making sure that Google doesn’t index some of those other pages is the way to avoid that problem, whether through a noindex/follow meta or a disallow statement.

  94. Good article, Bill. You covered almost all duplicate content aspects. I’ve been working with SEO for a while and I am amazed how poorly some of my clients’ sites are optimized. You should not forget that the canonical tag should preferably be present on each page, refer to the page itself, and strip off any variables added when the page is linked to through external links, i.e. affiliate links etc.

    See article by Rand Fishkin
    http://www.seomoz.org/blog/canonical-url-tag-the-most-important-advancement-in-seo-practices-since-sitemaps

  95. Hi Lana,

    Thank you. You do have to keep in mind that I wrote this post in 2006, before the search engines had come up with the canonical tag. While I think the canonical tag is a good idea and has its uses, to some degree it’s often used as a band-aid where there are problems with providing one URL per page. If it’s technically possible to avoid multiple URLs for the same page, that’s often a better approach than using a canonical tag.

  96. Thanks for the great info!

    In regards to #8:

    8. Copyright infringement

    When someone duplicates the content on your site, it may cause your pages to be filtered out of search results. A site like copyscape may help you find some of these pages. Searching for unique strings of text on your pages, in quotation marks, may help uncover others.

    Copyscape is an excellent tool to use to make sure your content is not being stolen. It is also a great idea to use Copyscape if you are outsourcing article writing. There’s no need to pay $5 to $10 an article only to find out later that the author simply stole the information from other websites. If you make sure that your content is unique and of good quality, you will be rewarded for it in the SERPs.

  97. Hi Bill,

    Thanks for your post; 4 years later it is still useful despite the changes in the Internet and the search engines.

    Gerard

  98. Hi Gerard,

    You’re welcome. The duplicate content problems that I listed in this post were ones that I had seen over and over again on a very large number of sites, and still see frequently today, even after the passage of 4 years. Glad to hear that you found it useful.

  99. Would I be right in suggesting that this is more true now than ever before because of Panda? In the past Google would filter duplicate content on websites, but now it punishes it by dropping all pages within the site down the rankings. Am I right in thinking this?

  100. Hi Craig,

    Panda does seem to be interested in sniffing out duplicate content. Many of the points that I make in the post involve duplicated content on the same site, and that’s something that I’d recommend everyone address regardless of whether their rankings have been impacted by Panda. Chances are that the same content located at different URLs might be seen as different pages, and it also might dilute the PageRank that those pages receive. I’m not sure that exact duplicates are something that Panda might punish a site for, but pages that are near duplicates, with slight differences such as keywords inserted into some areas, or containing very little novel content, might have a problem based upon Panda.

    Content that’s duplicated across multiple websites may be considered a sign that it is content that’s been scraped from one or more other sources on the Web, even if the publisher is the one that’s been scraped by others. If so, it might be time to either rewrite that content, send out emails and DMCA notices, or take other steps that might help Google identify the page as the original source of that content.

  101. Hello Bill,
    Would this work for a WordPress site, say? I am using canonical tags in an SEO plugin that I have installed. I am also using a sitemap, so how would the two conflict with each other?

  102. Hi Phillip,

    An XML sitemap, an HTML sitemap and canonical tags on your WordPress blog should work together well without conflicting with each other, as long as you only include the canonical versions of URLs to pages in both of your sitemaps.
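One way to keep the two consistent is a small audit script that confirms every URL listed in the XML sitemap matches the canonical URL its page declares. A sketch using only the Python standard library; the sitemap and canonical data below are made-up example data, as a crawl of your own site might collect them:

```python
import xml.etree.ElementTree as ET

# A tiny example XML sitemap (hypothetical URLs).
SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/widgets</loc></url>
  <url><loc>https://example.com/widgets?sort=price</loc></url>
</urlset>"""

# Page URL -> the canonical URL that page declares in its <link> tag.
CANONICALS = {
    "https://example.com/widgets": "https://example.com/widgets",
    "https://example.com/widgets?sort=price": "https://example.com/widgets",
}

def non_canonical_sitemap_urls(sitemap_xml, canonicals):
    """Return sitemap URLs that differ from the canonical their page declares."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(sitemap_xml)
    locs = [loc.text for loc in root.findall("sm:url/sm:loc", ns)]
    return [u for u in locs if canonicals.get(u) != u]

print(non_canonical_sitemap_urls(SITEMAP, CANONICALS))
# ['https://example.com/widgets?sort=price']
```

Running a check like this periodically is one way to catch sitemap entries that point at non-canonical URLs before the search engines do.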

  103. What do you do when you offer a service that is searched for by state or city, but the service is the same everywhere? For example, if you wanted to create a site that offers auto insurance and then created FloridaAutoIns, TexasAutoIns, etc. Each site has the same content with the state or city name changed. There are only so many ways you can describe the product without duplicating yourself. But you want to score high for each separate state or city.

  104. Hi Marty,

    I think that it can really help to learn more about the differences as they apply to each state, and then focus upon those differences.

    For example, if you were trying to build a site or sites about insurance laws in different states, chances are that each state uses different phrases to refer to things like uninsured driver protection or personal injury protection. Learn what the differences are from one state to another, because there’s a good chance that people from those states will search for those terms.

    The content from one state to another shouldn’t be the same because the insurance requirements in each state are different.

    You could potentially use a similar format for each state, including things like:

    1. A section about the requirement to carry insurance, including the different types of insurance
    2. A section about the state departments that create regulations involving driving and insurance
    3. Some of the things that your coverage includes, including things that are unique to each state
    4. Vehicle inspection requirements and the inspection and reinspection process for the particular state.

    Every state is different enough in the laws and regulations it has created, and in the terms it uses, that while your sections would be similar, the terms you use would be different.

  105. I am sorry, I should have been more specific. The product I am looking at does not vary from state to state. It is exactly the same. However, it is often searched for with the state preceding or following the search term. That is the dilemma. The only thing that is different is the state or city name.

  106. Hi Marty,

    Ok, you’re making things a little difficult for me. First it was a service, and now it’s a product. And a product that people usually search for using some kind of geographic term associated with it.

    Without knowing what the product is, it’s difficult to offer you much in the area of helpful advice.

    If it’s a product that’s regulated in some manner by each state, then there are different offices with different names that regulate it, regardless of whether the product is the same in each state. You could include information about those offices if it’s appropriate and helpful to put that kind of information on the site or sites.

    It’s also possible that there are different local terms or names for the product from one state to another that might be used in testimonials from customers in each of those states, or by the people who perform those services. If it’s called exactly the same thing from one state to another, and you create a testimonial page for each of the sites, and use the first name, last initial, city and/or city & state for the people leaving the testimonials, then you’re creating unique content that’s also geographically specific.

    If the product is available offline at specific locations within each state, listing the places it’s available is another idea.

  107. It is PPO Dental plans. It’s the same plan in each state, and no regulations regarding the product differ from state to state. The description, product name, etc. are all exactly the same. I do like the idea about the testimonials, although it is not a product that people write them for. I can rearrange the text a bit, but after 20 or 25 states you can change it only so much.

    This then makes me wonder: if the text were essentially the same with only the state or city name changed, would it even show up?

  108. Hi Marty,

    I’ve been in your shoes, and it is challenging to write something different from one state to another, regardless of whether the content you’re creating is on one website or fifty of them.

    You definitely don’t want to create a template to use for each state, where you just insert the different names of the states, because that might potentially end up having your pages filtered out of search results as near duplicate content.

  109. At what point is content considered duplicate? In other words, how much duplication is permitted?

    If I described a product or service in a paragraph or two, and then listed the providers or locations of where the product or service was available, would this be enough? The providers/locations would be different of course for each city or state. A page could have a considerable amount of this information following the more or less duplicated description. I could easily have more non-duplicate info than duplicated. But, will the algorithms used by the bots still penalize the page?

  110. Hi Marty,

    Google’s published a number of patents and papers on duplicate and near duplicate content. In those, there really isn’t a reference to a strict percentage of duplicate content, and sometimes it really doesn’t have to be very much to be considered duplicate.

    For instance, if a searcher enters a specific query term in Google, and Google finds a number of pages that all return the very same snippet of text, it might filter some or all of those pages out of search results based solely upon the duplication within the snippet. Note that in an instance like that, Google doesn’t consider the filtering of results to be a penalty, but rather a filter.
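Many of the near-duplicate detection papers and patents mentioned above work from word "shingles": break each page’s text into overlapping word sequences and compare the resulting sets. This is only an illustrative sketch of that general idea, not Google’s actual algorithm, and the sample sentences are invented:

```python
def shingles(text, k=3):
    """Break text into the set of overlapping k-word sequences ("shingles")."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(a, b, k=3):
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

# Two near-duplicate sentences, differing only at the end.
original = "our dental plan covers cleanings exams and x-rays in every state"
near_dup = "our dental plan covers cleanings exams and x-rays in all fifty states"
print(round(similarity(original, near_dup), 2))
# 0.58
```

A score near 1.0 means near-duplicate text; two templated state pages that differ only in the state name would score very high under a measure like this, which is why swapping in a state name alone may not be enough.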

    For what you’re describing, it probably would help if your description varied from page to page somewhat. I’m not sure that just inserting a state name would be enough, but having the different lists on each page is helpful.

  111. Thanks Bill. I have a big job ahead of me.

    I have thought about using one site devoted to the product, with sub-pages for certain states and cities. We have had some luck with that for non-competitive search terms. I wonder if I would get picked up and score as well as with a dedicated site for each state?

  112. Hi Marty,

    You’re welcome. One site that I worked with had a similar barrier in front of them, but they had people in each of the states representing them. I suggested that they have the people from those states write up the description of the services that they provided in their own words (which could and would then be edited). The benefit of doing that was that they did get uniquely worded descriptions from different people. There were some differences on a state level that each was able to bring out with their descriptions. I don’t know if you have a similar kind of relationship with your representatives in each state, but if you do, that might be worth trying.

  113. Unfortunately, I am chief cook and bottle washer. I sell products and maintain about 30 sites (for myself). I will have to write all the copy.

    I have come across sites that scored well but used what I would call “dark grey to black hat” techniques. They had a bunch of nonsense paragraphs. Nonsense in that there was no rhyme or reason to them, but each sentence was a legitimate statement with proper keyword placement. They then redirected you after you landed. A bot would see grammatically correct content with a clear focus. I guess they knew when it was a bot that was looking at their page. They ranked highly in separate states using this technique. However, I was under the impression that Google knows the score on this and now punishes them for it.

    I have always used the standard “good copy – good links” method for myself and others. It has worked well for a long time.

  114. Hi Marty,

    I do think you’re much better off with the good copy/good links approach, and I think the value of automated content/redirected content is starting to diminish with updates from Google like Panda.

    I also know how hard it is to try to say something that is substantially the same 50 different times. Maybe consider hiring an intern or two? They might be able to get college credit and some pay, and you can focus upon things that are a higher return on the investment of your time?

  115. @Bill Slawski

    This is such a nice read. I read your entire blog post, but I have one question about my own website.

    How do I fix a duplicate content issue with a manufacturer details paragraph?

    I am struggling with a Google crawling issue. Google has not indexed my product pages yet. I have Googled a lot and read many articles trying to get it done, but I did not find a satisfying answer.

    I just checked my product pages and found that there is one tab with manufacturer details containing one paragraph.

    This same content appears on many product pages from the same manufacturer.

    So, could this be what is stopping my pages from being crawled? If so, how can I fix it?

  116. Hi Bill
    Thanks for such an informative and comprehensive view on duplicate content.
    How do search engines behave if the same content appears in different languages on different websites?
    Do they fall into the same duplicate content category, or are they treated differently?

  117. Hi Hiren

    Many sites on the Web use manufacturers’ descriptions word for word, and that might cause them to be less likely to be found through those pages.

    There are steps to take, such as rewriting those descriptions and adding completely new content to them, and/or including user generated content such as reviews to those pages. For example, take a look at what Amazon.com does on their pages.

  118. Hi Zia,

    Thanks. I believe that I’ve seen statements in the past from Google spokespeople saying that if they run across the same content in different languages, they probably wouldn’t treat it as a duplicate of the original in the other language. I would still suggest considering adding something new, especially since the original content probably makes the most sense to the audience, and in the language, that it was written for.

  119. Hi Bill,

    Thanks for your prompt reply to my question. I have done similar R&D with a few blog posts and a discussion on the SEOmoz help forum. I have 7K+ products, and it is quite hard to develop a unique description for each product. But I can go in that direction by picking specific high-selling categories. What do you think about that? I have integrated Power Reviews on my website, and all reviews are embedded with JavaScript, so it’s hard for a crawler to index that content. I am going to ask my web developers for an alternate solution.

  120. Hi Hiren,

    You’re welcome.

    I think deciding which products to focus upon by both looking at present high selling categories, and by categories that you may potentially see some of the best returns on the use of your time makes a lot of sense.

    I haven’t looked at “Power Reviews” enough to gauge the value of using those in a manner that might be indexable by the search engines. Do they have anything in their terms of service that might prevent you from doing that?

  121. No, that facility is not available with Power Reviews. I was confused after replying about content in Power Reviews, because it may take too much time to gather unique content through Power Reviews.

    For that, each product must pass through a specific channel:

    1. It must be bought by a customer
    2. The customer must be satisfied
    3. An email asking for a review is sent
    4. The customer must check their inbox and follow through
    5. The review is posted on the product page

    Hoooo….

    I think it’s a long process; dinosaurs will be alive again before I get unique Power Reviews on each page. :) What do you think about it?

    I am going to define unique attributes like title, product details, PDF, video, Q & A, blog posts, etc. If you have any more ideas about it, please suggest them.

  122. Hi Hiren,

    It does sound like a labor intensive approach for both you, and for the people who might provide reviews.

    Defining unique attributes as much as possible and adding unique content via blog posts and Q&A are definitely good starts.

    I’d definitely look around at a number of successful ecommerce sites, and see what they are trying to do to make their content stand out and be unique.

  123. Bill
    Thanks a lot for this valuable information on duplicate content,
    how search engines behave toward the same content, and especially for the canonical URLs.
    We followed your guidelines step by step and noticed a significant improvement.

  124. Hi Vormetric,

    You’re welcome. It’s great to hear that you were able to use the information from this post to see improvements for your site. Thanks for letting me know.

  125. As regards canonicalization issues, I understand that it’s, if nothing else, “good practice”, but wouldn’t you say that with all the might of Google etc. they have the technology to include this in their algorithm already? I’m betting that it’s already in there.

    Perhaps not 10 years ago but certainly now.

  126. Hi Amy,

    There are many different approaches and content management systems that people can use to create content on their pages, including creating their own content management or ecommerce platforms. Ideally, Google should be able to look at an individual website and find the best URL for each page when there is more than one that a page can be reached at. But when you think about the scale of the Web and the potential computational cost of Google parsing through every website they find and performing that kind of analysis, it could take much more work than it should for them to do that.

    Since Google’s focus is upon indexing as much unique content as they can, that lends itself to spending more time indexing pages, and less time trying to figure out how every site they come across is set up and organized.

    Making it as easy as possible for the search engine to index your site correctly is your responsibility as a site owner if you want them to index your content and possibly rank some of your pages highly enough in search results so that people visit your pages. It’s not a good idea to believe that you can just leave it to Google to get everything right.

  127. Hi Bill,

    What would Google do if some website/blog copied an article from my website? Is it going to penalize my website too?

    There have been incidents where some blogs have posted articles from my website but gave a backlink to my home page and kept all the links inside the article pointing to my inner pages as well.

    I always thought it would help my website. Should I ask these websites to remove the content?

  128. Hi Max,

    Representatives from Google usually insist that they don’t penalize sites when they see duplicate or near duplicate content. With the Panda updates, I’m not convinced that’s quite true anymore. But regardless, if Google finds duplicate content (or near duplicate content), it might filter one or more pages out of search results, so that it’s only showing one version to searchers.

    The thing is, when Google does this, it doesn’t always show the page from the site that originated the content, or if there’s more than one URL for the same page on the same site, it doesn’t always choose the version that you might want it to.

    There’s fair use, and there’s copying a whole article or blog post, and the two aren’t the same. When I see someone has copied a whole article or blog post from one of my sites, even if they provided a link to my page, I try to contact them and ask them to either remove it, or just use some of the content and add some of their own thoughts or opinions about that content. In a few cases, when they’ve refused, I contacted their hosts and complained about copyright infringement.

    Chances are, on sites that tend to copy and paste their content from other sites, the value of those links doesn’t really help you all that much. If those links had that much power, those sites would also have the power to show up in search results instead of your pages.


Comments are closed.