Many site owners, bloggers and forum admins will end up dealing with RSS-scraping content theft at some point, since it is one of the most common forms of content theft. A scraper can easily grab content from almost any site and pass it off as their own for search engine gains, which makes it one of the most despised types of spam.
What Is RSS Scraping?
RSS scraping is when a spammer or other third party grabs RSS content and republishes it on another site. In that respect, automatic scrapers work similarly to a feed reader such as Google Reader, which also pulls your website content and displays it elsewhere. The difference is that with Google Reader the content sits behind a login that only the subscriber can access, whereas scrapers publish it on public sites that anyone can view.
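To make the mechanism concrete, here is a rough Python sketch of the step a scraper automates: pulling every item out of a public RSS feed. The feed URL is a placeholder, and the point is only to show how little effort the copying side takes, not to recommend doing it.

```python
# Minimal illustration of how trivially an RSS feed's content can be pulled
# programmatically -- the same step a scraper automates before republishing.
# The feed URL below is a placeholder, not a real site.
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://example.com/feed.xml"  # hypothetical feed

with urllib.request.urlopen(FEED_URL) as resp:
    tree = ET.parse(resp)

# Standard RSS 2.0 layout: <rss><channel><item>...</item></channel></rss>
for item in tree.getroot().iter("item"):
    title = item.findtext("title", default="")
    link = item.findtext("link", default="")
    description = item.findtext("description", default="")
    print(title, link, f"({len(description)} characters of content)")
```

A scraper then simply drops those titles and descriptions into its own pages, which is why full-text feeds get lifted so often.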
What RSS Scrapers Have To Say
People who scrape RSS feeds will argue that the content is up for grabs: because it is published in an RSS feed to begin with, it supposedly comes with an implied license to republish it elsewhere. This is not the case. RSS scraping is still considered copyright infringement, since the content is being copied and used without permission.
Another common argument from scrapers is that republishing RSS feed content falls under fair use. Again, this is not the case. Fair use is judged against a set of established factors, which include:
- the purpose and character of the use, including whether the content is being used commercially or for educational purposes
- the nature of the copyrighted work itself
- how much of the copyrighted work was actually used
- the effect of the use on the potential market for, or value of, the copyrighted work
Going by these factors, it is safe to say that scraping is a leading example of something that is not fair use. Scraping grabs all the content, republishes it, uses it for profit, usually without attribution, and adds no educational value, criticism or commentary. Because it is simply a duplicate of content already out there, it damages the market for the original as well.
How To Stop RSS Scraping
So, how can a site owner, blogger or other content owner stop their RSS feeds from being scraped and receiving duplication penalties, or worse, seeing the scraped content rank higher in search engines than their own? Some level of RSS scraping will always happen, and it is almost impossible to go after every content thief with a DMCA takedown request. There are still some things you can do.
1. Find Scraped Content
The first thing you need to do is find your scraped content. The simplest way is to search for your post titles in Google inside quotation marks. You might find various sites that syndicate only your headlines, which can be ignored; what you are looking for are the sites that have republished your content. There are also decent theft-checker tools that can help find stolen content, such as CopyScape Premium, a plagiarism detector.
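If you publish frequently, the quoted-title search can be semi-automated. The sketch below, with placeholder feed URL and domain, reads your own feed and prints a quoted Google search URL for each post title, excluding your own site from the results.

```python
# Rough sketch: turn your own feed's post titles into quoted Google searches
# so you can spot-check for republished copies. Feed URL and domain are
# placeholders -- substitute your own.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

MY_FEED = "https://example.com/feed.xml"   # hypothetical: your own feed
MY_DOMAIN = "example.com"                  # hypothetical: your own domain

with urllib.request.urlopen(MY_FEED) as resp:
    tree = ET.parse(resp)

for item in tree.getroot().iter("item"):
    title = item.findtext("title", default="").strip()
    if not title:
        continue
    # Quote the exact title and exclude your own site from the results.
    query = f'"{title}" -site:{MY_DOMAIN}'
    print("https://www.google.com/search?q=" + urllib.parse.quote_plus(query))
```

Opening a handful of those links once a week is usually enough to catch the autoblogs that republish everything you post.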
2. Have the Site Shut Down
Spam and stolen content are taken very seriously by web hosting companies, and if they suspect any illegal hosting activity, they will take prompt action. If you find your content on another site, you can contact that site's web hosting company and report it. The hosting company will take action if it finds your request is genuine.
3. Anti-Scraping Services
There are services, such as ScrapeSentry's anti-web-scraping service, that help block scraping altogether. This type of service monitors the traffic going into the protected websites for suspect behavior, anomalies and other signs of misuse. Normal users will not even notice it, as it is completely invisible and does not affect the website's user experience. When malicious traffic is detected, the service blocks it automatically, or sends an alert to a security operator when the risk is low or obscure, depending on the customer's particular Incident Response Plan. This ensures accurate detection without imposing any issues on legitimate users.
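For a sense of what "monitoring traffic for anomalies" can mean at its simplest, here is a rough Python sketch of rate-based flagging. It is not how ScrapeSentry or any particular product works; the log format, feed path and threshold are all assumptions, and a real service combines many more signals than request volume.

```python
# Very small sketch of the core idea: watch per-client request rates and flag
# clients that hit the feed far more often than a normal reader would.
# Threshold, log format and feed path are assumptions for illustration only.
from collections import Counter

WINDOW_REQUEST_LIMIT = 120  # assumed threshold: max feed hits per hour per IP

def flag_suspect_ips(log_lines):
    """log_lines: iterable of 'ip path' strings covering one hour of traffic."""
    hits = Counter()
    for line in log_lines:
        ip, _, path = line.partition(" ")
        if path.strip().startswith("/feed"):   # assumed feed path
            hits[ip] += 1
    return [ip for ip, count in hits.items() if count > WINDOW_REQUEST_LIMIT]

# Example: a well-behaved reader versus a client hammering the feed.
sample = ["10.0.0.5 /feed"] * 3 + ["203.0.113.9 /feed"] * 500
print(flag_suspect_ips(sample))   # -> ['203.0.113.9']
```

Commercial services layer reputation data, header analysis and behavioral patterns on top of this kind of counting, which is what lets them block quietly without tripping up real visitors.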
RSS scraping is a big deal and can hurt a genuine site owner, so taking precautions against it matters, as does knowing the legalities of scraping. Overall, preventing this type of spam from happening in the first place is your best weapon.
Nolan is a network security expert currently based in Washington. He also has a passion for writing technology blogs and always tries to write something that helps his readers keep their antivirus software updated for computer security.
1 Comment
Jane
October 22, 2013 at 4:37 pm
RSS has lost its whole purpose these days. It was introduced to make things easy and handy. Now it has become a primary tool for content lifters to steal and republish content – that too very easily. Autoblogs still do exist and these content lifters simply set up their stealing strategy once and forget it forever; the lazy way of stealing. But I do wonder if they get any good from those auto blogs!
Thanks for sharing this Nolan. I do agree with you!