Blocking the Referrer Spammer
Tags: Apache, CMS, LAMP, Open Source, Perl, Programming, Software, Spam, SysAdmin, Urban Mainframe, Website Development
I have mostly passed under the radar of the Referrer Spammers, but my invisibility spell has gradually depleted. This past weekend (13-14 Nov, 2004) my access logs recorded 187 hits from referrer spammers. I'll admit that 187 hits isn't a lot, but I usually only see three or four of these per week.
I don't know why I've suddenly attracted this attention, but I do know that I'm not going to endure it...
“referrer spamming this website is an exercise in futility”
"fu·til·i·ty: The quality of having no useful result; uselessness." - The American Heritage® Dictionary of the English Language, Fourth Edition.
Referrer spamming urbanmainframe.com is an exercise in futility. I don't publish lists of referrers anywhere on this website. I am the only person who sees the referrers index and I don't click through to those domains that are obviously spamming.
I have already defended against those who try to steal my resources to send spam, those who spam my inbox and those who attack my comments handler. Now it's time to turn my attention to this new irritant.
Introducing My Personal Superhero: mod_rewrite
Fortunately, dealing with the referrer spammer is ridiculously easy. I have used the Apache module "mod_rewrite", the Swiss Army Knife of URL manipulation, to keep the spammers off my website and out of my referrer logs.
With mod_rewrite installed on the web-server, rewriting rules can be deployed in either Apache's httpd.conf or .htaccess files [1]:
RewriteEngine on
RewriteCond %{HTTP_REFERER} ^http://example\.com$ [OR]
RewriteCond %{HTTP_REFERER} ^http://bad-referrer\.com$
RewriteRule .* - [F,L]
The first line is, I hope, self-explanatory. The next two lines (RewriteCond) describe conditions that, when matched by an incoming request, will trigger the rule described in the fourth line (RewriteRule).
The RewriteRule tells Apache to return a 403 (forbidden) error code if any of the preceding conditions (RewriteCond) are met. The [OR] flag is what makes this "any": without it, consecutive conditions must all match. 403 errors are not recorded in the referrer log, so the spammer is effectively rendered impotent.
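To make the control flow concrete, here is a minimal Python sketch of how those four lines behave. It only models the logic and is not how Apache implements it; the names `BLOCKED_REFERRERS` and `status_for`, and the 200 returned for everything else, are assumptions made for illustration.

```python
import re

# Patterns mirroring the two RewriteCond lines, with dots escaped so that
# "." matches only a literal dot.
BLOCKED_REFERRERS = [
    re.compile(r"^http://example\.com$"),
    re.compile(r"^http://bad-referrer\.com$"),
]


def status_for(referrer: str) -> int:
    """Return 403 if any condition matches (the [OR] chain), else 200."""
    if any(pattern.search(referrer) for pattern in BLOCKED_REFERRERS):
        return 403  # RewriteRule .* - [F]: the request is refused
    return 200      # no condition matched, so the request proceeds as normal


print(status_for("http://example.com"))       # 403
print(status_for("http://bad-referrer.com"))  # 403
print(status_for("http://example.com/page"))  # 200
```

The last call is worth noting: because each pattern is anchored with both ^ and $, a referrer that carries a path after the domain is not caught by these exact-match conditions.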
Rewrite Conditions
Whilst there is no single, magic RewriteCond that will keep every referrer spammer out, mod_rewrite does allow you to use regular expressions for pattern matching. So we have a lot of power in our hands:
RewriteCond %{HTTP_REFERER} ^http://(www\.)?example\.com/$
- matches if the referrer is "http://example.com/" or "http://www.example.com/"
RewriteCond %{HTTP_REFERER} ^http://(.*)?viagra(.*)?\.(com|net|org)/(.*)?$
- matches if the referring domain contains the string "viagra" and has a TLD of "com", "net" or "org"
RewriteCond %{HTTP_REFERER} ^http://192\.0\.34\.166/$
- matches if the referrer is "http://192.0.34.166/"
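Patterns like these can be tried out before they go anywhere near a live server. The sketch below uses Python's `re` module as a stand-in for Apache's regex engine, an assumption that should hold for simple patterns such as these. The `PATTERNS` table, its labels and the `matching_rules` helper are hypothetical names, and the dots are escaped so they match only a literal dot.

```python
import re

# The three conditions from the list above, keyed by an arbitrary label.
PATTERNS = {
    "example": r"^http://(www\.)?example\.com/$",
    "viagra": r"^http://(.*)?viagra(.*)?\.(com|net|org)/(.*)?$",
    "ip": r"^http://192\.0\.34\.166/$",
}


def matching_rules(referrer):
    """Return the labels of every pattern that matches the referrer."""
    return [
        label
        for label, pattern in PATTERNS.items()
        if re.search(pattern, referrer)
    ]


for referrer in [
    "http://www.example.com/",
    "http://www.got-viagra.com/",
    "http://yourblog.com/viagra/",
    "http://192.0.34.166/",
    # ".*" also matches "/", so this path-only occurrence is caught too.
    "http://yourblog.com/viagra-news.com/",
]:
    print(referrer, matching_rules(referrer))
```

The final sample shows a side effect of using `.*`: it happily crosses the "/" that ends the domain, so the "viagra" pattern can also fire on an innocent domain whose path happens to look like a spam domain.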
Testing the Ruleset
We can use "GET" from Perl's LWP bundle to test our rewriting conditions:
# GET -ds http://urbanmainframe.com/
200 OK
# GET -ds -H 'Referer: http://yourblog.com/viagra/' http://urbanmainframe.com/
200 OK
# GET -ds -H 'Referer: http://www.got-viagra.com/' http://urbanmainframe.com/
403 Forbidden
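If LWP is not to hand, the same check can be scripted. The following is a rough Python equivalent that uses only the standard library; `build_request` and `fetch_status` are hypothetical helper names, and nothing here is part of my actual setup.

```python
import urllib.error
import urllib.request


def build_request(url, referrer=None):
    """Create a GET request, optionally carrying a Referer header."""
    request = urllib.request.Request(url)
    if referrer is not None:
        # HTTP spells the header name "Referer", with a single "r".
        request.add_header("Referer", referrer)
    return request


def fetch_status(url, referrer=None, timeout=10):
    """Send the request and return the HTTP status code of the response."""
    try:
        with urllib.request.urlopen(
            build_request(url, referrer), timeout=timeout
        ) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the status code is still
        # available on the exception, and it is all we are after.
        return error.code
```

With the ruleset active, `fetch_status("http://urbanmainframe.com/", "http://www.got-viagra.com/")` should come back as 403, while the same call without a referrer should return 200.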
Further Reading
- A User's Guide to URL Rewriting with the Apache Webserver
- mod_rewrite.com
- Mastering Regular Expressions, by Jeffrey E. F. Friedl
- Apache: The Definitive Guide, by Ben and Peter Laurie
- libwww-perl
- Perl & LWP, by Sean M. Burke
Credits
Thank you to Aaron Logan for the image entitled "Balls", which is kindly provided under the Creative Commons "Attribution 1.0" License.
Footnotes
[1] For performance and efficiency, httpd.conf is preferred. Apache parses the .htaccess file on every request, whereas httpd.conf is parsed only once, at server startup.