The Real Skinny On Duplicate Content

By Bryan Clark
15 Comments

The issue of duplicate content is a rather contentious one in the world of search engine optimization. For many years – and perhaps still to this day – myths have floated around about duplicate content penalties.

Essentially, folks were told that if they produced duplicate content they would be punished, and many took this to mean that you could not syndicate content, or post the same article on your own site and on another. Personally, I think this was a ploy by Google to deter potential spammers.

You see, back when search engines first started to become popular, it was easy to rank in the top spots for your keywords by stuffing your website full of keywords. The black hat community used the same article on multiple pages and the search engines gave them good rankings. Thankfully, the algorithms are much more sophisticated these days and so are the users.

In reality, what Google was trying to say is that you could not post the same article over and over again on your own site in an attempt to manipulate your search engine rankings. Duplicate content, in this sense, is specific to one site.

When Plagiarism Isn’t Really Plagiarism

The fact of the matter is that plagiarism exists. It would be nearly impossible to avoid having content ‘duplicated’ from one site to another. While there are ways of reporting people who steal your content, more often than not it is something that goes unpunished.

Life is made harder still by the fact that search engine algorithms do not have a good basis for determining who stole whose content. On top of that, content ‘scrapers’, article spinners and an abundance of black hat internet marketers looking to make a quick buck make content even harder to police.

This is a problem, although there are various web sites – such as Copyscape – that can be used to determine if someone is stealing your content, so that action can be taken.

Unique content plays a big part in the way in which the search engines rank your content, and is therefore a big plus to your SEO efforts. Fresh – that is to say updated or relevant – content is favored massively over content that has been sitting dormant for years. Cue note to self to update some older blog posts with a little more fresh content.

Duplicate content, on the other hand, is content that appears in more than one place. If you are unaware of how your content management system works, you could unwittingly be creating duplicate content, because the same content can appear to the search engines in various places on your own web site.

In terms of your search engine rankings, it is crucial to solve any on-site issues. Failure to do so can result in the search engines not knowing which page to rank for your keywords or, worse still, not ranking your web pages at all.

Assuming that you have minimized the use of similar content within your own site, then how can the search engines still find duplicate content?

The main problem is the way that the search engines read your Uniform Resource Locator, or “URL”. For example, “http://google.com”, “www.google.com” and “google.com” all look the same to a visitor, but to the search engines they can be three separate pages. The problem is magnified when you look further at the architecture of your site: dynamic pages, categories, print-friendly pages, session IDs and even capitalization of letters can all have an impact.
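To make the idea concrete, here is a minimal Python sketch of how three spellings of one address can collapse into a single preferred form. The `normalize` function and the choice of the www version as the preferred host are assumptions made for illustration; real search engines use far more elaborate logic than this.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption for this sketch: the www form is the preferred version.
PREFERRED_HOST = "www.google.com"
BARE_HOST = "google.com"

def normalize(url):
    """Map common spellings of a URL onto one preferred form."""
    # Bare hosts such as "google.com" carry no scheme, so add one before parsing.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Treat the bare domain and the www subdomain as the same site.
    if host == BARE_HOST:
        host = PREFERRED_HOST
    # An empty path and "/" both mean the home page.
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))

variants = ["http://google.com", "www.google.com", "google.com"]
unique_raw = set(variants)                            # three distinct strings
unique_normalized = {normalize(u) for u in variants}  # one preferred URL
```

Compared as raw strings the three variants are all different, which is roughly how a crawler first sees them. After normalization only one URL remains.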

According to Google, examples of non-malicious duplicate content could include:

Bouncing Balls And Infinite Domain Configurations

The solution to this issue is best explained with a working example, so consider this:

A web site about bouncing balls has a page about green bouncing balls, located at –

http://bouncingballs.com/greenbouncingballs

The content of which can also be found by the search engines via the products page –

http://bouncingballs.com/products/greenbouncingballs

It is the same page, but because of the way the URL is generated, to the search engines it appears to be two pages within the same site that are hosting the same content. In other words, duplicate content!
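If you want to spot this kind of duplication on your own site, one rough approach is to hash the body of each page and group the URLs that share a hash. The sketch below assumes you already have the page bodies in hand; the `pages` dictionary, the red bouncing balls URL and the `find_duplicates` helper are all made up for illustration, and exact hashing only catches pages that are identical byte for byte.

```python
import hashlib
from collections import defaultdict

# Hypothetical crawl results: URL -> page body.
pages = {
    "http://bouncingballs.com/greenbouncingballs": "<h1>Green Bouncing Balls</h1>",
    "http://bouncingballs.com/products/greenbouncingballs": "<h1>Green Bouncing Balls</h1>",
    "http://bouncingballs.com/redbouncingballs": "<h1>Red Bouncing Balls</h1>",
}

def find_duplicates(pages):
    """Return groups of URLs whose bodies are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for url, body in pages.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        by_digest[digest].append(url)
    # Only groups with more than one URL are duplicates.
    return [sorted(urls) for urls in by_digest.values() if len(urls) > 1]

duplicates = find_duplicates(pages)
```

Here the two green bouncing ball URLs end up in the same group, while the red page stands alone.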

To confuse matters further still, the search engines would find further content issues when crawling “www.bouncingballs.com/greenbouncingballs”. If only there were some way to inform the search engines that this is an issue with the URL as opposed to a malicious duplicate content issue. But wait… there is.

Domain Redirects

There are a number of ‘work-around’ ways to solve the problem of on-site duplicate content; however, the best way is to make things right permanently. In the case of the working example this is a two-step procedure.

First of all, you need to solve the www versus non-www issue and avoid confusion as to whether your site is located at “http://bouncingballs.com” or “www.bouncingballs.com”.

A “301 redirect” in your .htaccess file is the answer (example below). This is a permanent instruction to your web server that it should always redirect “http://bouncingballs.com” to “http://www.bouncingballs.com”. Simply choose which version you would like to use, and then STICK WITH IT. It is important that you are consistent with your internal linking, so ensure that all your links go to that version.

The following code can be used to 301 redirect your site. Just replace “yoursite” with your own domain, and then paste it into your .htaccess file.

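# Turn on the rewrite engine
RewriteEngine on
# Match requests for the bare domain only ([NC] = case-insensitive)
RewriteCond %{HTTP_HOST} ^yoursite\.com$ [NC]
# Permanently (301) redirect to the same path on the www version
RewriteRule ^(.*)$ http://www.yoursite.com/$1 [R=301,L]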
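If the rule looks cryptic, it may help to see the same logic written out as ordinary code. The Python sketch below only imitates what the rule does and is not how Apache works internally. The `apply_redirect` function is made up for illustration, “yoursite” is still a placeholder, and the host pattern escapes the dot so that it matches a literal period.

```python
import re

# Host check from the RewriteCond line; [NC] means case-insensitive.
HOST_PATTERN = re.compile(r"^yoursite\.com$", re.IGNORECASE)
# Path capture from the RewriteRule line.
PATH_PATTERN = re.compile(r"^(.*)$")

def apply_redirect(host, path):
    """Return (status, location) if the rule fires, else None."""
    if not HOST_PATTERN.match(host):
        return None  # condition failed, so the request is served as-is
    # In an .htaccess file the leading slash is removed before matching.
    relative = path[1:] if path.startswith("/") else path
    captured = PATH_PATTERN.match(relative).group(1)
    return 301, "http://www.yoursite.com/" + captured

bare = apply_redirect("yoursite.com", "/greenbouncingballs")
already_www = apply_redirect("www.yoursite.com", "/greenbouncingballs")
```

A request for the bare domain is sent to the same path on the www version with a 301 status, while a request that already uses www is left alone, so there is no redirect loop.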

The second step to solve the working example would be to inform the search engines that you are aware of the URL issues within your site. Known as “canonicalization”, this is an SEO best practice for identifying your preferred URL to the search engines and users alike.

A simple HTML link tag can be used within your pages to let the search engines know “http://bouncingballs.com/products/greenbouncingballs” is indeed the same as “http://bouncingballs.com/greenbouncingballs” and that all credit should go to your preferred version.

By placing a rel="canonical" tag in the head section of your duplicate page, you tell the search engines which URL to credit, so the duplicate will not count against you. For example, the code <link href="http://www.bouncingballs.com/greenbouncingballs/" rel="canonical" /> would be placed in the head section of the page at “http://bouncingballs.com/products/greenbouncingballs”, meaning that the search engines would credit your preferred URL with the content.
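Once the tag is in place, it is worth checking that your pages really do declare the URL you intended. Here is a small Python sketch, using only the standard library, that pulls the canonical URL out of a page's HTML. The `CanonicalFinder` class, the `find_canonical` helper and the sample `page` are assumptions made for illustration; in practice you would feed in the HTML of your own page.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # rel can hold several space-separated values.
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attrs.get("href")

def find_canonical(html):
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical

# Sample markup for the duplicate products page (made up for this sketch).
page = """<html><head>
<title>Green Bouncing Balls</title>
<link href="http://www.bouncingballs.com/greenbouncingballs/" rel="canonical" />
</head><body>...</body></html>"""

canonical_url = find_canonical(page)
```

If `find_canonical` returns `None` for one of your duplicate pages, the tag is missing and the search engines are left to guess which version you prefer.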

If rel="canonical" is applied to all instances of on-site content duplication, then you can expect to see great changes in your search engine rankings.

More information can be found on the Google Webmaster pages. Watch the video – it will really help you understand the basics, and the reasons for doing this. Also, if you are using WordPress, there are several plugins that will assist you in your quest to cut down your on-site duplicate content issues.

Bryan

Photo courtesy of Horia Varlan

About Bryan Clark

Bryan Clark is a professional writer, blog editor and evangelist. He has contributed to leading news properties and blogs in tech, entrepreneurship, finance, and the digital lifestyle. Bryan has earned features on Problogger, Entrepreneurs-Journey and USA Today. Bryan works with Growth Partner, a venture fund and startup platform for web businesses.

15 Comments

  • Thanks for a great post Bryan. I was just looking at a client’s webmaster tools this morning and discovered canonicalization was disabled. I’m now working to straighten her WordPress site up as you describe.

    Thanks,
    Steinar

  • Very interesting. Thanks for sharing. I look forward to reading more articles from you Bryan.

  • After checking my files it seems as if they are correct by not including “www” for my blog. A while back, I switched blogging platforms and I did not use a redirect to my new platform. I guess I just got lucky (I hope) or I am missing something within my files database. Thanks for the great info, it’s helped a lot!

  • Thanks for the info Bryan. I will probably need to read it again to really understand how to implement the steps you mentioned. It takes me a while but I’m usually able to do the techie stuff. Thanks for sharing.

  • Thanks for the info, Bryan. Especially the “Canonicalization” issue, I wasn’t aware of that before.

  • What if all my blog’s info was being duplicated by a site like “topsynews”, would my site be penalized then?

  • Raj

    These days, search engines are quite adept at finding out if the same content has been posted in a category, home page and archive pages (for example), and do not penalize a site for it. This problem was quite evident earlier, though. One fix that worked for me was to submit an XML sitemap to webmaster tools.

  • Great article, Bryan. I was having issues deciding if I should spring for an article spinner or just post PLR to my niche sites and I tried without spinning. I haven’t had any problems from Google or anyone else, so far. But just to be safe I’m going to start spinning rather than duplicating. Thanks and keep it up.

  • Thanks for sharing! I have been wondering about duplicate content for a while and it’s very good to get some light on the subject.

  • I don’t think duplicate content is as much of a concern as people think it is. Basically the search engines look for the first occurrence, or the most authoritative occurrence, of a piece of content, and give credit to it. However, it’s always a good idea to make sure that we are not unwittingly creating duplicate content on our own site. Thanks for the clear tutorial.

  • Duplicate content has always been a grey idea. I agree with Barbra’s comment “it’s always a good idea to make sure that we are not unwittingly creating duplicate content”.

    I try to create my sites with a clear hierarchical structure that visitors/google bots can logically navigate without having the same content on different pages (as you said with the bouncing balls example)

    I find not only does it help with users and the bots, but it also helps you manage and organise your site for maximum SEO optimisation (as well as identifying which areas of the site need improving).

    You see from a lot of older sites, strange url structures e.g. /product/green-1 that have no relation to the actual content of the page and therefore not helping in terms of seo, user-ability and bots – A site restructure can be a pain, but correctly using 301 redirects will help solve this.

    Good article!

  • The introduction of a mobile site has caused massive duplicate content issues for us. You would think that Google would be able to identify a mobile site (versus the main site), but instead they couldn’t (or wouldn’t).

    We introduced redirects between the sites because Google was sending mobile users to our main site, and vice versa, but this just seemed to make Google more and more angry with us. As a result our traffic dropped even more.

    Duplicate content, never an easy fix!

  • Nice post Bryan. I’m learning and understanding ‘Canonicalization’ for the first time. Well, I don’t think I have to go through all those hectic processes because this function already exists in the ‘All in One SEO Pack’ plugin, which I’m using on my blogs. I believe that can also solve the problem, right?

    On the duplicate content thingy, I’d always thought SEs (esp. Google) frowned on it, until now. However, I’d love to produce new and unique content on my blog so that my visitors will have new and interesting posts to read each time they come around. I wouldn’t like a situation whereby a visitor gets to my blog and the next thing he does is make a comment saying… “I’ve read this before on so so so blog”. It would look embarrassing, at least to me.

  • Thanks for the info Bryan. I will probably need to read it again to really understand how to implement the steps you mentioned. Since my site de-indexed by Google
