Saturday, August 4, 2012

Dealing with Blog Scrapers Getting Indexed Quicker Than the Original Site

This is not something to worry about too much: it happens often, and it is actually a sign that you are growing. I would suggest that you keep publishing quality content on your site and not worry about scrapers copying your articles. Google does a pretty good job of killing spam blogs. They generally gain traction for a month or so and then disappear completely.
Sometimes, though, a spam blog gets indexed quicker than the original site, especially when the original site is fairly new, and that can temporarily hold back your organic traffic growth. In that case, you can deal with scrapers by delaying your feeds for a certain amount of time, since these scrapers work by pulling articles from your feeds and republishing them on their own sites.

 

Delay publishing of WordPress Feeds:

Here is a snippet, which can go in your theme's functions.php, that delays your feeds (by, let's say, 15 minutes):
/**
 * Publish the content in the feed 15 minutes later.
 * $where is the default WHERE clause variable in WordPress (wp-includes/query.php).
 * This function appends SQL syntax to it.
 */
function publish_later_on_feed($where)
{
    global $wpdb;
    if ( is_feed() )
    {
        // timestamp in WP format
        $now = gmdate('Y-m-d H:i:s');
        // value for wait; + unit
        $wait = '15'; // integer
        // http://dev.mysql.com/doc/refman/5.0/en/date-and-time-functions.html#function_timestampdiff
        $unit = 'MINUTE'; // MINUTE, HOUR, DAY, WEEK, MONTH, YEAR
        // add SQL syntax to default $where
        $where .= " AND TIMESTAMPDIFF($unit, $wpdb->posts.post_date_gmt, '$now') > $wait ";
    }
    return $where;
}
add_filter('posts_where', 'publish_later_on_feed');
This delays the feed by 15 minutes (the $wait value on line 14 of the code) before any new article appears in it. It is a very good approach for killing those automated blogs. But sometimes the scrapers are not automated: humans manually copy and paste articles from various sources. In that case, what you can do is make your blog ping the crawl bots so that your chances of getting indexed first are maximised.
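Returning to the feed delay for a moment: MySQL's TIMESTAMPDIFF counts only complete units, and the snippet compares with a strict "greater than", so a post actually shows up in the feed once 16 full minutes have passed. The sketch below mirrors that condition in plain PHP so you can check the boundary without a database. The function name is_visible_in_feed and the sample timestamps are made up for illustration and are not part of WordPress.

```php
<?php
// Hypothetical helper (not a WordPress function): mirrors the SQL condition
//   TIMESTAMPDIFF(MINUTE, post_date_gmt, now) > $wait
function is_visible_in_feed(string $postDateGmt, string $nowGmt, int $waitMinutes = 15): bool
{
    $utc    = new DateTimeZone('UTC');
    $posted = new DateTimeImmutable($postDateGmt, $utc);
    $now    = new DateTimeImmutable($nowGmt, $utc);

    // Like TIMESTAMPDIFF, count only complete minutes; partial minutes are dropped.
    $elapsedMinutes = intdiv($now->getTimestamp() - $posted->getTimestamp(), 60);

    return $elapsedMinutes > $waitMinutes;
}

// 15 minutes 59 seconds after publishing: still hidden from the feed.
var_dump(is_visible_in_feed('2012-08-04 10:00:00', '2012-08-04 10:15:59')); // bool(false)
// 16 minutes after publishing: now visible.
var_dump(is_visible_in_feed('2012-08-04 10:00:00', '2012-08-04 10:16:00')); // bool(true)
```

If you want the post to appear at exactly 15 minutes, one option is to change the comparison in the SQL from > to >=.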

Checklist for fast indexing:

  • Submit a sitemap to Google Webmaster Tools.
  • Use the PushPress and RSS Cloud WordPress plugins.
  • Use the WordPress Update Services option (Settings → Writing) and add several ping services there (less effective now, but doing it won’t harm).
  • Delay your feeds for a few minutes (scrapers won’t be manually monitoring your site every minute).
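For the ping services item, here is a rough sketch of adding services in code rather than through the settings screen. It assumes WordPress keeps the Update Services list in the ping_sites option as one URL per line. The helper merge_ping_sites and the ping.example.com address are placeholders for illustration, not real recommendations.

```php
<?php
// Hypothetical helper: merge extra ping URLs into a newline-separated list
// (the format assumed here for the WordPress "ping_sites" option),
// dropping blanks and duplicates while keeping the original order.
function merge_ping_sites(string $current, array $extra): string
{
    $existing = preg_split('/\s+/', trim($current), -1, PREG_SPLIT_NO_EMPTY);
    $merged   = array_merge($existing, array_map('trim', $extra));
    $unique   = array_values(array_unique(array_filter($merged, 'strlen')));

    return implode("\n", $unique);
}

$current = "http://rpc.pingomatic.com/";
$extra   = ['http://ping.example.com/', 'http://rpc.pingomatic.com/']; // placeholder URLs

echo merge_ping_sites($current, $extra), "\n";

// Inside WordPress (assumption: run once from a plugin or functions.php):
// update_option('ping_sites', merge_ping_sites(get_option('ping_sites', ''), $extra));
```

The merge is kept separate from the WordPress calls so that you can try it on its own before touching the stored option.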
Hope that helps you defeat those blood-sucking scrapers. If you have any questions or tips, feel free to leave them in the comments below.
