URBAN Mainframe

User Comments

(for: Blocking the Referrer Spammer)
1 | Posted by: Aankhen (Guest) | ~ 1 year, 11 months ago |

RewriteCond %{HTTP_REFERER} ^http://(.*)?viagra(.*)?.(com|net|org)/(.*)?$

You seem to have a fair number of extra question marks in there. The question mark means ‘zero or one of the previous atom’, while ‘*’ means ‘zero or more of the previous atom’, so, for example, (.*)? breaks down to ‘zero or one of (any number of characters)’, which matches exactly what a plain .* matches.
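A quick way to check the point about redundant question marks is to run both forms through a regex engine. The sketch below uses Python's `re` module as a stand-in for Apache's regex engine (an assumption: the two behave the same for simple constructs like these), and the sample URLs are made up.

```python
import re

# "(.*)?" is redundant: .* can already match zero characters,
# so adding "?" accepts no extra strings.
with_q = re.compile(r"^http://(.*)?viagra(.*)?")
without_q = re.compile(r"^http://.*viagra.*")

# Hypothetical referrers for illustration only.
samples = [
    "http://viagra.com/",            # nothing between http:// and viagra
    "http://www.cheap-viagra.net/",  # some characters before viagra
    "http://example.org/",           # no viagra at all
]

for url in samples:
    # Both columns should always agree.
    print(f"{url:32} {bool(with_q.match(url))} {bool(without_q.match(url))}")
```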

Aside from that, you haven’t escaped the dot before (com|net|org), so even .icom or .enet or .asdffhfhorg could match.
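The effect of the unescaped dot can be seen with the same kind of check. Again Python's `re` stands in for Apache (an assumption), and the hostnames are invented to mirror the examples above.

```python
import re

# Same rule, with and without the backslash before the TLD dot.
unescaped = re.compile(r"^http://.*viagra.*.(com|net|org)/")
escaped = re.compile(r"^http://.*viagra.*\.(com|net|org)/")

# Hypothetical referrers: only the last one really ends in .com.
urls = [
    "http://viagra.icom/",
    "http://viagra.enet/",
    "http://viagra.asdffhfhorg/",
    "http://viagra.com/",
]

for url in urls:
    # An unescaped "." matches any character, so "icom" passes as ".com".
    print(url, bool(unescaped.match(url)), bool(escaped.match(url)))
```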

Finally, since the capturing brackets aren’t used, they can probably be done away with altogether.

The rule could be rewritten as:

RewriteCond %{HTTP_REFERER} ^http://.*viagra.*\.(com|net|org)/.*$

Hrm… having no brackets does make it harder to read. Here it is with them:

RewriteCond %{HTTP_REFERER} ^http://(.*)viagra(.*)\.(com|net|org)/(.*)$
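As a sanity check on the corrected rule, the sketch below runs the bracket-free and bracketed versions against a few hypothetical referrer strings, using Python's `re` as a stand-in for Apache's engine. They should agree, since the brackets only group and capture.

```python
import re

# Corrected pattern without brackets, and the same pattern with
# (unused) capturing groups added back purely for readability.
plain = re.compile(r"^http://.*viagra.*\.(com|net|org)/.*$")
bracketed = re.compile(r"^http://(.*)viagra(.*)\.(com|net|org)/(.*)$")

# Hypothetical referrer values for illustration only.
referrers = [
    "http://www.buy-viagra-now.com/index.html",
    "http://viagra.example.net/",
    "http://www.example.org/articles/",
    "http://viagra.com",  # no trailing slash, so neither pattern matches
]

for ref in referrers:
    print(ref, bool(plain.match(ref)), bool(bracketed.match(ref)))
```

Note that both patterns require a `/` after the TLD, so a referrer with no path at all, such as `http://viagra.com`, slips through.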

2 | Posted by: DarkBlue (Registered User) | ~ 1 year, 11 months ago |

Aankhen, unfortunately my CMS has not handled your reply very well.

I have uploaded an unformatted ASCII version of your message so that others can see your reply as you intended.

You are quite right, it seems the question marks are superfluous. For some reason, I always think of dot-star as meaning “1 or more of” - so I use the question mark to make it optional. But you’re right, it actually means “0 or more of”.

In my httpd.conf file, the dot before “(com|net|org)” is escaped - but I didn’t escape the escape when I posted to my CMS, so it was lost!

I must make an effort to proof-read more carefully in future.

Thanks for your comments and corrections. They are really appreciated.

3 | Posted by: Gabriel Mihalache (Registered User) | ~ 1 year, 11 months ago |

You know what’s really fun? The fact that people actually buy viagra through these kinds of links and sites (otherwise, these tactics wouldn’t be so successful; there was once a survey about spam-based purchases, but I’m drifting off subject again)…

Well, if people find this crap useful, and you want to best serve your visitors, it is you who should link to these sites and referrer-spam them. That is, if your audience is representative of the population’s average. :-)

4 | Posted by: Scott Johnson (Guest) | ~ 1 year, 11 months ago |

This type of referrer spamming really puzzles me, too. My site gets over a hundred spams per month from each of at least three different sources. And I’m the only person who ever sees those links in my stats pages. Perhaps there is a tool out there somewhere that just spams random URLs all day long. Who knows!?

5 | Posted by: DarkBlue (Registered User) | ~ 1 year, 11 months ago |

I’ve adjusted the referrer spam handler a little. In the original post, the following rewrite rule is described:

 RewriteRule .* - [F,L]

I’ve changed the process; the rewrite rule I am now using reads:

 RewriteRule .* %{HTTP_REFERER}

The former rule, while keeping the spammer out of my referrer logs, wasted my server’s bandwidth because it served my 403 Forbidden page.

The new rule bounces the spammer back to his/her own server so, apart from the small overhead of the redirect, the only bandwidth wasted is the spammer’s.
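mod_rewrite treats an absolute URL in the substitution that points at another host as an external redirect (302 by default), which is how the rule above sends the client back to its own referrer. For readers not running Apache, the sketch below shows the same idea as Python WSGI middleware. It illustrates the technique under that assumption and is not part of the author's setup; `SPAM_REFERRER`, `hello_app` and `call` are hypothetical names.

```python
import re
from typing import Callable, Dict, Iterable, List, Tuple

# Assumed blocklist: the corrected pattern from earlier in the thread.
SPAM_REFERRER = re.compile(r"^http://.*viagra.*\.(com|net|org)/.*$")


def bounce_referrer_spam(app: Callable) -> Callable:
    """WSGI middleware: redirect matching referrers back to themselves."""
    def middleware(environ: Dict, start_response: Callable) -> Iterable[bytes]:
        referrer = environ.get("HTTP_REFERER", "")
        if SPAM_REFERRER.match(referrer):
            # An empty-bodied 302 costs only a few header bytes; if the
            # client follows it, the spammer fetches its own URL.
            start_response("302 Found", [("Location", referrer),
                                         ("Content-Length", "0")])
            return [b""]
        return app(environ, start_response)
    return middleware


def hello_app(environ: Dict, start_response: Callable) -> Iterable[bytes]:
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


def call(app: Callable, environ: Dict) -> Tuple[str, Dict[str, str], bytes]:
    """Invoke a WSGI app directly and collect status, headers and body."""
    captured: Dict = {}

    def start_response(status: str, headers: List[Tuple[str, str]]) -> None:
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


site = bounce_referrer_spam(hello_app)
```

For example, `call(site, {"HTTP_REFERER": "http://www.buy-viagra-now.com/"})` yields a `302 Found` whose `Location` is that same URL, while a request with no referrer reaches `hello_app` normally.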

I prefer it this way! ;-)

6 | Posted by: DarkBlue (Registered User) | ~ 1 year, 11 months ago |

Perhaps there is a tool out there somewhere that just spams random URLs all day long.

There are many such tools, Scott.

7 | Posted by: DarkBlue (Registered User) | ~ 1 year, 10 months ago |

Referrer Spam Should Be a Crime - enough said.

I’m now using Linux’ iptables along with mod_rewrite to keep these pathetic SOBs off this website:

    iptables -I INPUT -s xxx.xxx.xxx.xxx -j DROP

Where “xxx.xxx.xxx.xxx” is the IP address I want to deny.
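If the blocked addresses are collected by a script, it helps to validate each one before building the command. This Python helper (`iptables_drop_command` is a hypothetical name) only assembles the same `iptables` invocation shown above; it does not run it.

```python
import ipaddress
from typing import List


def iptables_drop_command(addr: str) -> List[str]:
    """Build the argument list for: iptables -I INPUT -s <addr> -j DROP."""
    # ip_address() raises ValueError for placeholders such as
    # "xxx.xxx.xxx.xxx" and other typos, before anything reaches the firewall.
    ip = ipaddress.ip_address(addr.strip())
    if ip.version != 4:
        # iptables filters IPv4; IPv6 addresses go through ip6tables.
        raise ValueError(f"{addr} is not an IPv4 address")
    return ["iptables", "-I", "INPUT", "-s", str(ip), "-j", "DROP"]


print(" ".join(iptables_drop_command("192.0.2.10")))
```

To apply the rule, the returned list could be passed to `subprocess.run(cmd, check=True)` from a process with root privileges.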

Works like a charm.

8 | Posted by: DarkBlue (Registered User) | ~ 1 year, 10 months ago |

I just had to update this thread: I now have a new weapon in my anti-referrer-spammer armoury, a weapon that is an ICBM beside the water pistol of my previous defences (based on mod_rewrite). I need to test some more, but it’s looking really good at the moment.

I’ll post details once I’m sure that it’s working as expected!

9 | Posted by: mr strauss (Guest) | ~ 1 year, 9 months ago |

Thank you for providing a solution to this totally irritating problem. I have to admit I’m a little skittish about trying it, since I’ve never made this type of modification before. But it seems easy enough. Even if I don’t end up trying it, I appreciate that you have taken the time to post a solution. Decent web citizens like you are necessary to fight against the evil of spam in all its forms.

mr strauss pop goes lethal

10 | Posted by: DarkBlue (Registered User) | ~ 1 year, 9 months ago |

I’m glad the information was useful to you, Mr. Strauss. I understand why you are a little hesitant to employ this solution, but I can assure you that, when you use mod_rewrite in this manner, you can’t do anything that can’t be undone.

Having said that, I will soon be releasing an Apache module specifically designed to keep the referrer spammer at bay. I have been testing this module for several weeks now and it has matured into a very reliable defence. As an Apache module, it’s easier to get to grips with than mod_rewrite and, significantly, it requires no ongoing maintenance (with the mod_rewrite solution one has to constantly revise the rewrite rules as new spammers appear).

Watch this space for news of its release.

11 | Posted by: mr strauss (Guest) | ~ 1 year, 9 months ago |

Re:

For performance and efficiency, httpd.conf is preferred. Apache parses the .htaccess file on every request, whereas httpd.conf is parsed only once, at server start up.

Can you please explain this better for me? Here’s what I’m not clear on. Does this mean that Apache, with .htaccess, checks against a “blocklist” for every single request? Say my page is made of sliced images: would it check against the list for every one of those image requests, as if it doesn’t know that it just checked?

And the second type, httpd.conf: it seems like you are saying that it’s a RAM/hard disk thing, like the server sort of “keeps it in memory.”

Am I interpreting this correctly, or am I way off? Keep in mind that I’m kind of a beginner at this, so please dumb down your answer if possible.

Thanks a lot for your help, I appreciate that your time and energy are valuable.

mr strauss

12 | Posted by: DarkBlue (Registered User) | ~ 1 year, 9 months ago |

Mr. Strauss, your interpretation is correct.

Apache will process the “.htaccess” file for every single request, regardless of the file type that is requested.

On the other hand, “httpd.conf” is loaded when the Apache “httpd” process is started (that is, when the Apache web-server starts). Once it has been processed, Apache “remembers” the configuration profile by keeping it in RAM.

It’s a little more complicated than that, but that’s the simplest explanation.

It is obvious which is the more efficient mechanism. If performance is a concern, then “httpd.conf” should always be used.
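To put numbers on Mr. Strauss's sliced-images example, here is a deliberately simplified toy model in Python. It is not how Apache is implemented: when `AllowOverride` is enabled, Apache looks for `.htaccess` in every directory along the requested path, so the real count can be higher. It only counts configuration reads under the one-read-per-request description above, and the 20-slice page is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ToyServer:
    """Toy model (not Apache itself) counting configuration reads."""
    use_htaccess: bool
    reads: int = 1  # httpd.conf: read once, when the server starts

    def handle_request(self, path: str) -> None:
        if self.use_htaccess:
            # .htaccess: re-read for every request, whatever the file type
            self.reads += 1


# Hypothetical page: one HTML file plus 20 sliced images = 21 requests.
page = ["/index.html"] + [f"/images/slice{i}.gif" for i in range(20)]

with_htaccess = ToyServer(use_htaccess=True)
conf_only = ToyServer(use_htaccess=False)
for path in page:
    with_htaccess.handle_request(path)
    conf_only.handle_request(path)

print(with_htaccess.reads)  # 22: one start-up read plus one per request
print(conf_only.reads)      # 1: the start-up read only
```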

13 | Posted by: Keef (Guest) | ~ 1 year, 3 months ago |

Has anyone been able to get mod_rewrite to work for /cgi-bin?

I’ve ended up using a .htaccess file for my www root and an iptables rule to handle the direct /cgi-bin requests.

TIA - Keef
