Many site owners, bloggers and forum admins will end up dealing with RSS-scraping content theft at some point, since it is one of the most common forms of content theft. A scraper can easily grab content from almost any site and pass it off as their own for search engine gains, which makes it one of the most despised types of spam.
What Is RSS Scraping?
RSS scraping is when a spammer or other third party grabs RSS content and republishes it on another site. In that respect, automatic scrapers work similarly to a feed reader such as Google Reader, which also pulls your website content and displays it elsewhere. The difference is that with Google Reader the content sits behind a login that only the subscriber can access, whereas scrapers publish it on public sites that anyone can view.
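To make the mechanism concrete, here is a rough Python sketch of the step a scraper automates: pulling every item out of a public RSS feed. The feed URL is a placeholder, and the point is only to show how little effort the copying side takes, not to recommend doing it.

```python
# Minimal illustration of how trivially an RSS feed's content can be pulled
# programmatically -- the same step a scraper automates before republishing.
# The feed URL below is a placeholder, not a real site.
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://example.com/feed.xml"  # hypothetical feed

with urllib.request.urlopen(FEED_URL) as resp:
    tree = ET.parse(resp)

# Standard RSS 2.0 layout: <rss><channel><item>...</item></channel></rss>
for item in tree.getroot().iter("item"):
    title = item.findtext("title", default="")
    link = item.findtext("link", default="")
    description = item.findtext("description", default="")
    print(title, link, f"({len(description)} characters of content)")
```

A scraper then simply drops those titles and descriptions into its own pages, which is why full-text feeds get lifted so often.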
What RSS Scrapers Have To Say
People who scrape RSS feeds will argue that the content is up for grabs: because it is published in an RSS feed to begin with, it supposedly comes with an implied license to republish it elsewhere. This is not the case. RSS scraping is still considered copyright infringement, since the content is being copied and used without permission.
Another common argument from scrapers is that republishing RSS feed content falls under fair use. Again, this is not the case. Fair use is judged against a set of established factors, which include:
- the purpose and character of the use, including whether the content is being used commercially or for educational purposes
- the nature of the copyrighted work itself
- how much of the copyrighted work was actually used
- the effect of the use on the potential market for, or value of, the copyrighted work
Going by these factors, it is safe to say that scraping is a leading example of something that is not fair use. Scraping grabs all the content, republishes it, uses it for profit, usually without attribution, and adds no educational value, criticism or commentary. Because it is simply a duplicate of content already out there, it damages the market for the original as well.
How To Stop RSS Scraping
So, how can a site owner, blogger or other content owner stop their RSS feeds from being scraped and receiving duplication penalties, or worse, seeing the scraped content rank higher in search engines than their own? Some level of RSS scraping will always happen, and it is almost impossible to go after every content thief with a DMCA takedown request. There are still some things you can do.
1. Find Scraped Content
The first thing you need to do is find your scraped content. The simplest way is to search for your post titles in Google inside quotation marks. You might find various sites that syndicate only your headlines, which can be ignored; what you are looking for are the sites that have republished your content. There are also decent theft-checker tools that can help find stolen content, such as CopyScape Premium, a plagiarism detector.
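If you publish frequently, the quoted-title search can be semi-automated. The sketch below, with placeholder feed URL and domain, reads your own feed and prints a quoted Google search URL for each post title, excluding your own site from the results.

```python
# Rough sketch: turn your own feed's post titles into quoted Google searches
# so you can spot-check for republished copies. Feed URL and domain are
# placeholders -- substitute your own.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

MY_FEED = "https://example.com/feed.xml"   # hypothetical: your own feed
MY_DOMAIN = "example.com"                  # hypothetical: your own domain

with urllib.request.urlopen(MY_FEED) as resp:
    tree = ET.parse(resp)

for item in tree.getroot().iter("item"):
    title = item.findtext("title", default="").strip()
    if not title:
        continue
    # Quote the exact title and exclude your own site from the results.
    query = f'"{title}" -site:{MY_DOMAIN}'
    print("https://www.google.com/search?q=" + urllib.parse.quote_plus(query))
```

Opening a handful of those links once a week is usually enough to catch the autoblogs that republish everything you post.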
2. Have the Site Shut Down
Spam and stolen content are taken very seriously by web hosting companies, and if they suspect any illegal hosting activity, they will take prompt action. If you find your content on another site, you can contact that site's web hosting company and report it. The hosting company will take action if it finds your request is genuine.
3. Anti-Scraping Services
There are services, such as ScrapeSentry's anti-web-scraping service, that help block scraping altogether. This type of service monitors the traffic going into the protected websites for suspect behavior, anomalies and other signs of misuse. Normal users will not even notice it, as it is completely invisible and does not affect the website's user experience. When malicious traffic is detected, the service blocks it automatically, or sends an alert to a security operator when the risk is low or obscure, depending on the customer's particular Incident Response Plan. This ensures accurate detection without imposing any issues on legitimate users.
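For a sense of what "monitoring traffic for anomalies" can mean at its simplest, here is a rough Python sketch of rate-based flagging. It is not how ScrapeSentry or any particular product works; the log format, feed path and threshold are all assumptions, and a real service combines many more signals than request volume.

```python
# Very small sketch of the core idea: watch per-client request rates and flag
# clients that hit the feed far more often than a normal reader would.
# Threshold, log format and feed path are assumptions for illustration only.
from collections import Counter

WINDOW_REQUEST_LIMIT = 120  # assumed threshold: max feed hits per hour per IP

def flag_suspect_ips(log_lines):
    """log_lines: iterable of 'ip path' strings covering one hour of traffic."""
    hits = Counter()
    for line in log_lines:
        ip, _, path = line.partition(" ")
        if path.strip().startswith("/feed"):   # assumed feed path
            hits[ip] += 1
    return [ip for ip, count in hits.items() if count > WINDOW_REQUEST_LIMIT]

# Example: a well-behaved reader versus a client hammering the feed.
sample = ["10.0.0.5 /feed"] * 3 + ["203.0.113.9 /feed"] * 500
print(flag_suspect_ips(sample))   # -> ['203.0.113.9']
```

Commercial services layer reputation data, header analysis and behavioral patterns on top of this kind of counting, which is what lets them block quietly without tripping up real visitors.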
RSS scraping is a big deal and can hurt a genuine site owner, so taking precautions against it matters, as does knowing the legalities of scraping. Overall, preventing this type of spam from happening in the first place is your best weapon.
Nolan is a network security expert currently based in Washington. He also has a passion for writing technology blogs and always tries to write something that helps his readers keep their antivirus software updated for computer security.
1 Comment
Jane
October 22, 2013 at 4:37 pm
RSS has lost its whole purpose these days. It was introduced to make things easy and handy. Now it has become a primary tool for content lifters to steal and republish content – that too very easily. Autoblogs still do exist and these content lifters simply set up their stealing strategy once and forget it forever; the lazy way of stealing. But I do wonder if they get any good from those auto blogs!
Thanks for sharing this Nolan. I do agree with you!