WordPress spam…

At the start of the year, I noticed that about 8% of my stats were SPAM. Yuck. Then something insane happened this week: it jumped to 28%.

So I'd crossed the point where something had to be done!

I’ve already installed software to detect the spam, and it does a good job overall. But I wanted to take it to the next level and block all traffic from the spammers! Anyone who spams is probably engaged in other nonsense that makes me not want their traffic.

Thankfully, in this brave new era of Google, I quickly found that someone had already done 99% of the legwork for me right here! Thanks to Sakis’s hard work, I was able to add some minor tweaks, generate a full iptables config, flush & add the new rules, then have cron run it every few minutes.

Pretty cool stuff if I do say so myself!

Since the original site is now offline, I’ve updated the link to point to archive.org. For what it’s worth, here is the meat of the article in question:

Dodging WordPress comment spammers

I admit it: allowing anyone to post comments is bad practice. Still, I have my reasons for standing my ground. Many times I have read something on a blog and had something to add, whether sharing my own experience to help the author or future visitors, or asking a question about one of my own problems. Guess what? I am so lazy that I rarely go through a registration procedure just to be able to post a comment.

I am one of those who insist that dialogue and discussion are always constructive, as long as both sides are willing to engage. I do not want to lose the opinions and comments of visitors who happen to stop by just because I want a “safe” setup that runs on its own. But there are “buts”. My blog is only one month old, yet it already receives more than 300 spam comments per day on average, and I have even seen 1,000 in a single day.

Thank god, WordPress provides blacklist features based on both IP addresses and comment content. And they really do a good job: after working through your recent “spam” for a while, you can easily end up with a list that accurately detects non-constructive comments. However, that doesn’t solve all your problems:

  • New comments still come. They are just automatically rated as spam.
  • Your database fills with garbage.
  • Your web traffic statistics are spoiled.
  • You waste bandwidth.
  • You waste CPU time.
  • If your spammer ever stops selling drugs and starts advertising flesh, all your content-matching rules become useless.
  • If your spammer loses interest in blog spam and switches to port scanning, you will receive that too.

How about refusing them even a spare TCP socket? Besides, you don’t even want to know them. All their connection attempts will end up in the void. Time for some “iptables” magic.

WordPress has already stored their IP addresses in its database. Consult the wp-config.php file you edited when you first installed WordPress to refresh your memory on your database name, username, and password. Mine are:


$ grep "DB_" wp-config.php
define('DB_NAME', 'mywordpress');
define('DB_USER', 'sakis');
define('DB_PASSWORD', 'myextrastrongpassword');
define('DB_HOST', 'localhost');
define('DB_CHARSET', 'utf8');
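If you would rather not copy these credentials into scripts by hand, a small helper can read them straight from wp-config.php. This is a minimal sketch that assumes each setting sits on its own line in the single-quoted define('NAME', 'value'); form shown above; the function name wp_db_setting is hypothetical, and it will not cope with double quotes, escaped quotes, or values built from PHP expressions.

```shell
# wp_db_setting KEY [FILE]
# Hypothetical helper: prints the value from a line like
#   define('KEY', 'value');
# Assumes single quotes and one define per line; prints nothing if KEY is absent.
wp_db_setting() {
    local key="$1" file="${2:-wp-config.php}"
    sed -n "s/^[[:space:]]*define([[:space:]]*'${key}'[[:space:]]*,[[:space:]]*'\([^']*\)'[[:space:]]*);.*/\1/p" "${file}" | head -n 1
}
```

For example, `--password="$(wp_db_setting DB_PASSWORD /path/to/wp-config.php)"` would pass the password to mysql without hard-coding it (the path is a placeholder). The password still shows up in the process list while mysql runs, just as it does with the original command.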

Now use that information to construct a one-line command like mine:

$ mysql -f -p --user=sakis mywordpress <<<"select distinct CONCAT('iptables -A INPUT -s ',comment_author_IP,'/32 -j DROP') from wp_comments where comment_approved='spam' order by 1 asc" | grep -v "^CONCAT" >> THEY_BOTHER_ME
Enter password:
$ head THEY_BOTHER_ME
iptables -A INPUT -s 113.161.128.232/32 -j DROP
iptables -A INPUT -s 117.121.208.254/32 -j DROP
iptables -A INPUT -s 118.141.141.7/32 -j DROP
iptables -A INPUT -s 118.194.1.157/32 -j DROP
iptables -A INPUT -s 119.235.27.100/32 -j DROP
...
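The query above appends “/32” to every address, which only makes sense for IPv4. WordPress also stores IPv6 addresses in comment_author_IP, and iptables will refuse those. Below is a minimal sketch that assumes you select the bare addresses instead (for example with `mysql -N ... <<<"select distinct comment_author_IP from wp_comments where comment_approved='spam'"`, where -N drops the column header). It routes each address to iptables or ip6tables and skips anything that does not look like an address. The function names are hypothetical, and the IPv6 test is only a rough character check, not full validation.

```shell
# is_ipv4: true if $1 is a dotted quad with every octet in 0..255
is_ipv4() {
    local re='^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$' o
    [[ $1 =~ $re ]] || return 1
    for o in "${BASH_REMATCH[@]:1}"; do
        (( 10#$o <= 255 )) || return 1      # reject octets such as 999
    done
}

# is_ipv6ish: rough check only (contains a colon, hex digits/colons/dots only)
is_ipv6ish() {
    local re='^[0-9A-Fa-f:.]+$'
    [[ $1 == *:* && $1 =~ $re ]]
}

# ips_to_rules: one address per line on stdin -> one DROP rule per distinct address
ips_to_rules() {
    local ip
    while IFS= read -r ip || [[ -n $ip ]]; do
        [[ -z $ip ]] && continue            # ignore blank lines
        if is_ipv4 "$ip"; then
            echo "iptables -A INPUT -s ${ip}/32 -j DROP"
        elif is_ipv6ish "$ip"; then
            echo "ip6tables -A INPUT -s ${ip}/128 -j DROP"
        else
            echo "skipping malformed address: ${ip}" >&2
        fi
    done | sort -u
}
```

The output has the same one-command-per-line shape as THEY_BOTHER_ME, so it can be appended to that file and sourced in the same way.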

You now have a simple recipe, named “THEY_BOTHER_ME”, ready to be executed (as root):

$ su

# . ./THEY_BOTHER_ME

Make sure you hook “THEY_BOTHER_ME” into your system’s start-up procedure, and set up a cron (or at) job to refresh it periodically.
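Sourcing the file starts one iptables process per address, which gets slow once the list grows into the thousands. One alternative for the start-up hook, sketched here under the assumption that every line has the exact `iptables -A ...` form produced above, is to convert the list into iptables-restore input and load it in a single call. The function name to_restore_format is hypothetical.

```shell
# to_restore_format: turn "iptables -A ..." lines on stdin into
# iptables-restore input for the filter table. Hypothetical helper.
to_restore_format() {
    echo "*filter"
    # Keep only well-formed rule lines and strip the leading "iptables "
    sed -n 's/^iptables \(-A .*\)$/\1/p'
    echo "COMMIT"
}
```

At boot, `to_restore_format < /etc/THEY_BOTHER_ME | iptables-restore --noflush` would load the whole list at once. The `--noflush` option matters: without it, iptables-restore first wipes every existing rule in the filter table.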

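I’ve created an executable file named /etc/cron.daily/update_spammers.sh with the following contents (on Debian-based systems, run-parts skips file names containing a dot, so there it would need to be called update_spammers):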

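#!/bin/bash
# Must run under bash, not plain sh: the <<< here-string below is a bashism.

fileloc="/etc/THEY_BOTHER_ME"
touch "${fileloc}"    # make sure the list exists on the very first run

old=$(mktemp) || exit 1
new=$(mktemp) || exit 1

# Rules already in the list (deduplicated)
sort -u "${fileloc}" > "${old}"
before=$(wc -l < "${old}")

# Merge them with one DROP rule per distinct spam IP found by WordPress
mysql -f --user=sakis --password=myextrastrongpassword mywordpress \
    <<<"select distinct CONCAT('iptables -A INPUT -s ',comment_author_IP,'/32 -j DROP') from wp_comments where comment_approved='spam' order by 1 asc" \
    | grep -v "^CONCAT" | cat "${old}" - | sort -u > "${new}"

# Load only the rules that are new since the last run; re-running every line
# with -A would append a duplicate copy of each rule to INPUT on every run
comm -13 "${old}" "${new}" | sh

cp "${new}" "${fileloc}"
after=$(wc -l < "${fileloc}")
rm -f "${old}" "${new}"

printf "[%s] Spammers updated. Added %d new spammer(s) (Before: %d, After: %d)\n" "$(date)" $((after - before)) ${before} ${after}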

Sadly, his original script is now offline, but this should be enough for anyone to get going on this exciting spam adventure…

4 thoughts on “WordPress spam…

  1. Be careful with blocking legacy IPv4 addresses… The use of CGNAT is extremely widespread so there might be many thousands of real users behind a single address. One customer of the ISP becomes infected with malware which starts sending out spam, and suddenly all customers are now blacklisted and can’t visit various sites. And from a user perspective, the site will look to be down.
    In some cases you don’t even need any malicious behaviour: a large number of real users behind a single address is enough to trigger a block.

    I frequently have trouble accessing legacy IPv4 sites because of this. With more modern sites that have implemented IPv6, I never have a problem.

    • It’s been about a decade since I went through this, and I ended up using the Akismet plugin to deal with the spam. It was a much better way to deal with the noise, and it still let me access the blog from within China when I was there.
