This document is available on the Internet at:  http://urbanmainframe.com/folders/blog/20041115/folders/blog/20041115/

Blocking the Referrer Spammer

Date:  15th November, 2004

[Image: concrete spheres]

I have mostly passed under the radar of the Referrer Spammers, but my invisibility spell has gradually depleted. This past weekend (13-14 Nov, 2004) my access logs recorded 187 hits from referrer spammers. I'll admit that 187 hits isn't a lot, but I usually only see three or four of these per week.

I don't know why I've suddenly attracted this attention, but I do know that I'm not going to endure it...

"fu·til·i·ty: The quality of having no useful result; uselessness." - The American Heritage® Dictionary of the English Language, Fourth Edition.

Referrer spamming urbanmainframe.com is an exercise in futility. I don't publish lists of referrers anywhere on this website. I am the only person who sees the referrers index and I don't click through to those domains that are obviously spamming.

I have already defended against those who try to steal my resources to send spam, those who spam my inbox and those who attack my comments handler. Now it's time to turn my attention to this new irritant.

Introducing My Personal Superhero: mod_rewrite

Fortunately, dealing with the referrer spammer is ridiculously easy. I have used the Apache module mod_rewrite, the Swiss Army Knife of URL manipulation, to keep the spammers off my website and out of my referrer logs.

With mod_rewrite installed on the web server, rewrite rules can be deployed in either Apache's httpd.conf or .htaccess files [1]:

RewriteEngine on
RewriteCond %{HTTP_REFERER} ^http://example\.com$ [OR]
RewriteCond %{HTTP_REFERER} ^http://bad-referrer\.com$
RewriteRule .* - [F,L]

The first line is, I hope, self-explanatory. The next two lines (RewriteCond) describe conditions that, when matched by an incoming request, trigger the rule described in the fourth line (RewriteRule).

The RewriteRule tells Apache to return a 403 (forbidden) error code if any of the preceding conditions (RewriteCond) are met. 403 errors are not recorded in the referrer log, so the spammer is effectively rendered impotent.
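
One detail worth making explicit: consecutive RewriteCond lines are ANDed together unless they carry the [OR] flag, which is why every condition except the last one is flagged. A minimal sketch of a longer blocklist follows. The domain names are placeholders, and the [NC] flag (case-insensitive matching) is an addition that the rules above do not use.

```apache
RewriteEngine on

# Every condition except the last carries [OR]. Without it,
# mod_rewrite would require ALL of the conditions to match at once.
# [NC] makes each match case-insensitive.
RewriteCond %{HTTP_REFERER} ^http://(www\.)?example\.com/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?example\.net/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?example\.org/ [NC]

# "-" means "no substitution"; [F] answers with 403 Forbidden
# and stops further rule processing for this request.
RewriteRule .* - [F]
```

Forgetting [OR] on one of the middle lines is an easy mistake to make: the ruleset still loads, but it will then almost never match, because a single referrer would have to satisfy several different domain patterns at the same time.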

Rewrite Conditions

Whilst there is no single, magic RewriteCond that will keep every referrer spammer out, mod_rewrite does allow you to use regular expressions for pattern matching. So we have a lot of power in our hands:

  • RewriteCond %{HTTP_REFERER} ^http://(www\.)?example\.com/$ - matches if the referrer is "http://example.com/" or "http://www.example.com/"
  • RewriteCond %{HTTP_REFERER} ^http://(.*)?viagra(.*)?\.(com|net|org)/(.*)?$ - matches if the referring domain contains the string "viagra" and has a TLD of "com", "net" or "org"
  • RewriteCond %{HTTP_REFERER} ^http://192\.0\.34\.166/$ - matches if the referrer is "http://192.0.34.166/"
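
The second pattern above uses (.*), which is free to run across slashes and into the path part of the referrer. If the intent is to look only at the host name, a tighter variant might use [^/]* instead. This is an untested sketch rather than a rule taken from the live ruleset; the keyword is simply the one from the example above.

```apache
# [^/]* cannot cross a "/", so "viagra" has to appear in the host name
# itself, not somewhere in the path. The dot before the TLD is escaped
# so that it matches a literal "." only.
RewriteCond %{HTTP_REFERER} ^http://[^/]*viagra[^/]*\.(com|net|org)(/|$) [NC]
RewriteRule .* - [F]
```

With this condition, a referrer of "http://www.got-viagra.com/" matches, while "http://yourblog.com/viagra/" does not, because the keyword there sits after the first slash.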

Testing the Ruleset

We can use "GET" from Perl's LWP bundle to test our rewriting conditions:

# GET -ds http://urbanmainframe.com/
200 OK

# GET -ds -H 'Referer: http://yourblog.com/viagra/' http://urbanmainframe.com/
200 OK

# GET -ds -H 'Referer: http://www.got-viagra.com/' http://urbanmainframe.com/
403 Forbidden

Credits

Thank you to Aaron Logan for the image entitled "Balls", which is kindly provided under the Creative Commons "Attribution 1.0" License.

Footnotes

[1] For performance and efficiency, httpd.conf is preferred. Apache parses the .htaccess file on every request, whereas httpd.conf is parsed only once, at server startup.
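
As a rough illustration of the footnote above, the same kind of rule could live in httpd.conf inside the site's virtual host block. The `<VirtualHost *:80>` address and the ServerName value are assumptions about the server's setup, not details of the site's actual configuration.

```apache
<VirtualHost *:80>
    # Assumed host name, for illustration only.
    ServerName urbanmainframe.com

    # Rules placed here are read once at startup rather than on every request.
    RewriteEngine on
    RewriteCond %{HTTP_REFERER} ^http://(www\.)?example\.com/ [NC]
    RewriteRule .* - [F]
</VirtualHost>
```

The trade-off is that changes to httpd.conf only take effect after Apache is restarted or reloaded, whereas an edited .htaccess file applies from the next request. Setting AllowOverride None in the relevant <Directory> block stops Apache from looking for .htaccess files at all.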