Should We Care About People Stealing Content Via RSS Feeds?
I’ve had full text RSS feeds on my blogs from day one. I’ve justified why and I stand by that justification still. Recently there has been a lot of discussion circulating around the blogosphere about people stealing content from RSS feeds. Since feeds are so easy to use it was only a matter of time before someone came up with a way to automatically replicate blogs by using RSS.
For a while now I’ve watched my Technorati referral statistics and it’s becoming more and more common to find blogs replicating some of my articles or even replicating all of my articles. They don’t ask for permission and use the content either by copying and pasting or by using an RSS feed scraper which automatically republishes content from RSS feeds of other blogs.
Initially I was a bit surprised and angered, especially when I came across a site which blatantly republished all of my articles, creating a near-duplicate of my recent content. I’ve shot off a few emails to such sites asking them to remove my content, but somehow I don’t think an email will be enough to get my desired response.
How This Hurts
My biggest concern, and I’ve already noticed this once, is that my search engine rankings will be affected. While I’m confident that it won’t be me that gets the duplicate content penalty since I have so many authority links coming in to my site (it should be the rip-off artists that get the penalty, but you can never be sure), I have noticed at least one instance where a clone of my site has ranked just below my own page in the search engine results page for a particular high-value keyword.
But Does It Really Hurt?
A few weeks ago I woke up to the usual list of emails telling me about each new person who has subscribed to my blog promotion tips newsletter. I check each of these emails to see which sites people are joining up from. This particular morning one of the referrers was from a site that was not mine. The blog clone had recruited a newsletter subscriber for me!
Of course it depends exactly how other people are ripping off your content, but in this case, when the culprit is using an automatic RSS publishing tool, it republishes my content EXACTLY how it appears, which means it includes all my affiliate links, links back to my site and links to my newsletter. This duplicate site may be stealing my content, but it’s also helping to make sales for me, sign up newsletter subscribers and bring new visitors back to my site. It’s hard to complain about that.
Overall of course I don’t like people using my content without asking for permission first, but at least it’s not all bad.
Yaro Starak
Blog Content Thief Victim
Comments
Trackbacks
[...] This morning as I was scanning my FeedDemon categories before heading out to Sunset beach, I saw what has to be about the 30th post I’ve read on content theft and copyright violation in the last month. The one I read today was by Ausie blogger Yaro Starak. [...]
I’ve had the same thing happen.
I actually feel flattered and not that bothered, since as you say, the search engines will take note of the authority links coming into my site and there’s not much danger of being ranked lower than the content thieves.
The sites ripping from RSS also include links back to the original site too, which is fair enough to me. What I worry about are those that grab the raw text of the content w/ no links or attribution back to me. They’re the real danger because they could garner authority links themselves on the back of my content.
Hi Yaro,
I’ve noticed this trend go up significantly in the last two months. I’ve even put one guy on the front page of my blog and emailed everyone I knew he was stealing from because he was using the program Autoblog.
Now, they’re not even bothering to steal whole pieces . . . some of them start in mid-paragraph. I suppose I wouldn’t mind if they asked and weren’t just using the content as ad fodder.
Liz
Hey Yaro!
The key is, I think, to make sure all the links on your site include the domain name. While some scrapers will strip all the links, the majority seem to just use the content as is, which means they will feed all the traffic back if the links are rooted.
Have you noticed that Gray Wolf recently added a copyright statement, pointing out what the correct URL should be, to his feed? Another way to approach it.
As for the ‘dup content’ issue – I’ve never come across a suggestion that authority matters, I’ve only seen age discussed, i.e. the engines think the oldest reference to the material is the one that matters. Makes you wonder if we shouldn’t hold our feeds for 24 hours or so, just to let the bots come through us first… (and, really, who knows what rules the engines use!)
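For anyone who wants to try the "rooted links" idea from the comment above, here is a minimal sketch in Python. It assumes you can run your post HTML through a filter before it goes into the feed; the function name `absolutize_links` and the example.com addresses are made up for illustration, and a simple regular expression like this will not cope with every kind of markup.

```python
import re
from urllib.parse import urljoin

# Matches href="..." or src="..." attributes, double- or single-quoted.
_ATTR = re.compile(
    r'''(?P<attr>\b(?:href|src))=(?P<q>["'])(?P<url>.*?)(?P=q)''',
    re.IGNORECASE,
)


def absolutize_links(html: str, base_url: str) -> str:
    """Rewrite relative href/src values so they include the full domain.

    base_url is the address of the post the HTML came from (hypothetical
    here). Links that are already absolute are left unchanged by urljoin.
    """
    def fix(match: re.Match) -> str:
        absolute = urljoin(base_url, match.group("url"))
        quote = match.group("q")
        return f'{match.group("attr")}={quote}{absolute}{quote}'

    return _ATTR.sub(fix, html)


if __name__ == "__main__":
    post = '<a href="/newsletter">Join</a>'
    print(absolutize_links(post, "http://www.example.com/blog/post"))
    # prints: <a href="http://www.example.com/newsletter">Join</a>
```

If a scraper republishes the item as is, every link still points at the original domain. It does nothing against scrapers that strip links entirely.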
Lots of good points everyone. It’s really hard to say regarding all the search engine issues since we don’t exactly know how the duplicate content penalties apply.
I’d like to think that authority links and age of content protect us from the thieves but I still find it disheartening when I see my article on someone else’s site ranking well in the search engines.
I think in a lot of ways we just have to rely on Google et al to make ripping off content useless. After all, most of these content rippers are using AdSense as their monetization strategy, so if their pages don’t rank well they don’t make money.
Hi Yaro.
I also see the trend that some people seem to believe every published RSS feed and every photo that is published on Flickr is free for any use.
Even if they provide links and subscribers: When they don’t ask and get permission, go after them by all means necessary.
There was a great post a long time ago on the osC forum about stealing content and how to stop it. A company in Europe put up some sites at several different hosts and then sent the hosting companies emails (from hotmail accounts) stating that the sites were using copyrighted material without authorization and wanted the content removed or they would sue. Only one hosting company questioned the validity of the request, all the others just took the sites down, no questions asked.
If the content is on a person’s web site, just contact the host. For $8 per month a host could not be bothered to figure out if it is legit or not. They just don’t want to get sued over one customer. For other blogging sites I am sure they have legal departments you can contact and ask that the content be removed.
I read your posts all the time on Search Engine Feeds. I am most certainly not the only one who does, but I have noticed sites listed on SEF got a boost in rankings over the past few months, so I am not sure RSS syndication will (at least from that site) hinder your SEO efforts. From what I have seen so far, they help ranking efforts and drive more targeted traffic. That’s the point of offering a feed, so people can subscribe to your site by any means they like and unsubscribe easily. Where it gets touchy is when a whole site is driven off feeds from other sites. That is exactly why SEF has included a Featured Feed Sources section on the site to link back and promote all the feed sources whose content is being used and syndicated for others to read.
Have you noticed referrers from Search Engine Feeds? Might that be the place you got an email subscriber from? Just wanting some feedback.
We also used to provide full feeds figuring “how could it hurt?” Well … Yaro, you are absolutely correct with your first concern on search engine ranking. The duplicate content if it gets bad enough WILL impact your SEO, both directly and indirectly.
Directly due to the duplicate content verbatim. Indirectly, in cases where scraped content does not include a link back to your site, it’s actually creating competition for your targeted keywords. In addition — even if there are links going back to your domain, in most cases the referring sites are not relevant to your topic so the inbound link has little to no positive impact on your ranking.
Plus, just as importantly, some of the sites where your scraped content appears can be damaging to your reputation. We’ve had scraped content appear on sites with references to pornography and other junk we don’t want to be associated with.
For these and other reasons, we now use summary feeds.
On the legal front, going straight to the offender’s host is a great suggestion. A few other things that you can do are:
* Step up the legal response by working with an attorney
* Put a strong copyright tag at the end of your original content posts (not Creative Commons)
* If you insist on using full feeds, insert a copyright message in your RSS feed
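As a rough illustration of the last suggestion, here is a minimal Python sketch that appends a copyright notice, with a link to the original article, to the HTML of each feed item. It assumes you control how the item body is built before it is written into the feed; the function name `add_copyright_footer`, the wording of the notice, and the example values are all hypothetical.

```python
from html import escape


def add_copyright_footer(item_html: str, permalink: str, author: str, year: int) -> str:
    """Return the feed item HTML with a copyright notice appended.

    The notice names the author and links to the original URL, so a
    verbatim copy on another site still points readers to the source.
    """
    notice = (
        f"<p>Copyright {year} {escape(author)}. "
        f"This article was originally published at "
        f'<a href="{escape(permalink, quote=True)}">{escape(permalink)}</a>.</p>'
    )
    return item_html + "\n" + notice


if __name__ == "__main__":
    body = "<p>Full text of the post goes here.</p>"
    print(add_copyright_footer(body, "http://www.example.com/my-post", "Example Author", 2006))
```

A scraper that republishes the feed unchanged will carry the notice along with it. One that strips markup or rewrites text will not, so this is a deterrent and a signpost rather than protection.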
Since I’m already long here, and want to respect your comment area Yaro, I’ve expanded on these and other points over at Advanced Business Blogging. Your post was the catalyst for a two part series “Six Simple Steps to Combat Blog Content Theft, Ranking Degradation, and Damage to Your Reputation“. (Part one just went up.)
Thanks for the discussion! It’s time for all content theft victims to fight back.
Hi Yaro.
The above poster makes some valid points but I believe folks get a bit too paranoid at times with what the big G is or isn’t doing, thinking, or planning. The search engines also have a way of establishing a certain reputation with sites that have been around a while and I’m sure penalties, de-rankings and the like are not just dished out willy nilly, based on all the inevitable and automated comings and goings.
As my site MrRoomfinder is mainly Bangkok based, I’ve got back links coming in from some real dark and seedy forums from around the globe, including the world sex guide and all manner of weird places which cater for ladyboys bar girls and red light entertainment. Roomfinder doesn’t seem to have suffered as a result of this. You see, the thing about non-reciprocal links pointing to your web pages is that you often have no control over who’s linking to you and of course the SE’s know this.
I also think there’s a little too much concern over duplicated content and penalties. If you have your site cloned then that’s a different story. It’s the duplicated sites not duplicate content that gets a kick up the butt.
Think about it, Article Minor and the Ezine Articles directories plus all the other article directories out there, give out tens of thousands of the same articles every month. Many of these articles are used as valuable content on multiple websites. It’s the nature of their legit business.
However, these sites do not have the same page layout, they are not of equal bytes, and they do not use the same fonts, font sizes, font colours, headers, graphics and all the rest of it. All they do is publish a bit of content available through the public domain.
If Google and the other SE’s wanted to go around giving penalties for every website that displayed an article that was picked up at a different location, they’d have very little time for anything else.
Unfortunately, I’ve heard of stories where some folks have their entire sites ripped off and Google throws their penalties at the wrong webmaster leaving the thief in-tact-and-ranked with his virtual swag. It’s just hearsay but sadly cloning does go on.
Aitch
At least we don’t generally have to worry about a company stealing our entire company’s identity, unlike NEC!
Should be interesting to see how this is controlled in the future. The main problem is that the “Content Originators” have all the rights to their work and no way of fighting against this sort of thing. Sure, it could be ‘flattering’ to some and ‘fraud’ to others. A lot of people big in the blogging community have faced this, like Steve Pavlina, Guy, and others, AND of course Yaro.
This and identity theft are among the few things that I have always been concerned about. This is also one of the main reasons why I’m undecided if I should blog under my real name or a pseudonym.
As they say the moment you put something online, it is open season, and nothing you can do can control the spread, except leaving it off line.
At least if content thieves are using automated systems, then it’s not so bad. It could be outright plagiarism by a human.
As long as the “leeching” site also copies ALL LINKS VERBATIM, I have no problems with this. It will become problematic if they RELINK or STRIP my links and place their own.
That’s the problem with RSS (and XML in general – including XHTML); while it’s great for structuring data and sending it to your readers in a convenient format, it also makes it very easy indeed to harvest the content for your own purposes.
DRM on RSS feeds anyone?
I would not be so sure Google has a handle on this. Matt Cutts has said that their BigDaddy Datacenter will improve canonical issues, but if you visit Google Groups you learn that those with few backlinks (and no 301 redirect in place) are still getting hosed.
A site called hitslog has been aggregating my posts for months from seobuzzbox.com and still has not gone supplemental. In fact, before I got a few backlinks my posts went supplemental. What does that tell you? You got it: if you have enough backlinks and authority you can make it; weenie bloggers are often collateral damage in today’s search.
Smart webmasters do not wait around for the primitive search engines to get it right.
Yaro: We discovered that one of our popular sites was being beaten in the SERPs by an RSS aggregator site that we had subscribed to!
Now, although we had signed up legitimately (when this was a smaller service), they had grown in prominence and eventually were getting search engine hits for our keywords.
Our first line of attack was to contact the company and ask them to withdraw our content. They were slow to respond, so we cut down our feeds to avoid this continuing.
That’s a scary story Gerard. It’s certainly a problem that is getting worse.
I’ve noticed strange things happen now when really popular blogs link to me: a day later all the clone blogs link to me too, as they replicate the story from the big blog.
One thing that has pissed me off is seeing people publish my newsletter articles without permission.
Yep, I have had the same thing happen to me on some of my websites, but I didn’t see any change in rankings. I think it’s a great way to build one-way links and I plan on using it for my new site at:
http://www.ewebtvworld.com
Cheers!
Actually, some people want you to use their content on your site. However, you have to provide a backlink to their site. Nonetheless, duplicate content might be a problem for your SEO. However, there is a debate about the seriousness of duplicate content in SEO.