URBAN
Mainframe

No More Spam!

Date: Tue, 16th-Mar-2004

Tags: Commentary, Software, Spam

Anyone who has had an Internet account for longer than 0.00004 nanoseconds will be familiar with spam: Pre-approval for a $10,000 platinum credit card; the world's biggest online casino; the X10 camera; Viagra; penis enlargement; bulk cigarettes; mortgages; debt consolidation; Paris Hilton; amazing farm girls (and boys); getting a degree; getting ordained as a minister; everything I ever wanted to know about anybody - ever...

The breadth of spam topics is amazing and sometimes disturbing. Spam saturates my inbox, consumes expensive bandwidth, steals my precious time, saps my energy and so on. All these negatives and no obvious benefits.

I knew I had to take action when I found myself receiving, on average, 200 emails a day - the vast majority of which were spam. This article discusses spam, includes avoidance information and goes on to describe how I completely eliminated the inconvenience of spam from my inbox...

What is Spam?

(Also known as UCE or Unsolicited Commercial Email).

The State of California, USA has a definition that is nicely summarized by FindLaw in this article (PDF):

The statute defines "unsolicited e-mail documents" as "any e-mailed document or documents consisting of advertising material for the lease, sale, rental, gift offer, or other disposition of any realty, goods, services, or extension of credit" when the documents (a) are addressed to recipients who do not have existing business or personal relationships with the initiator and (b) were not sent at the request of or with the consent of the recipient. (§ 17538.4, subd. (e).)

Vint Cerf, the "Father of the Internet", offers his own definition:

"Spamming is the scourge of electronic-mail and newsgroups on the Internet. It can seriously interfere with the operation of public services, to say nothing of the effect it may have on any individual's e-mail mail system. ... Spammers are, in effect, taking resources away from users and service suppliers without compensation and without authorization."

Why is Spam Annoying?

I've often considered this question. Why is spam so reviled? After all, spam is just advertising - somebody on the other end is just trying to earn a crust. How hard is it to click delete anyway?

I think that Spam means different things to different people. Personally, I find spam annoying because:

  • It's unsolicited, with no (legitimate) way to opt out.
  • The majority of products and services offered are of no interest to me.
  • There is no demographic targeting: the first time I received an email offering breast enlargement I was quite amused, but the amusement soon dies when I continue to receive the same.
  • Similarly, there is no geographical relevance: I live in the United Kingdom (united?) Great Britain (great?) England, so what's the point in offering me discount vouchers for Kmart?
  • And the single, greatest annoyance of bulk email marketing is the deceit: the extraordinary lengths that the marketer will go to in order to hide his identity, the subject lines that are unrelated to the email content, the web-bugs, the obfuscated URLs, the anti-spam filter trickery and the legendary "opt-out" or "unsubscribe" options whose sole purpose is to validate live email accounts. The devious methods used to harvest email addresses in the first place just compound the treachery.

Could it be Worse?

Bulk snail mail is usually personalised. Every day I give thanks to the God of the Digerati for the fact that spam isn't. Imagine how disturbing that would be:

Dear Jonathan,

We believe you would benefit from "DonkeyRod", our amazing, clinically proven, non-surgical, penis enlargement device...

Or, even more frightening:

Jonathan,

Your girlfriend thinks you need generic Viagra...

Basic Anti-Spam Measures

The new email user can take some basic steps to limit the influx of spam:

  • Never, under any circumstances, follow an unsubscribe or opt out link in relation to email marketing
  • Don't use your permanent email address when posting to news groups
  • Don't use your permanent email address when posting to web forums (or weblog comments, etc)
  • Don't publish your permanent email address on your website, even in obfuscated form
  • Don't use your permanent email address on any website that doesn't publish a privacy policy that you're comfortable with
  • Don't post to any mailing lists that offer online archives within which authors' email addresses are published in clear text
  • Don't subscribe to mailing lists that don't verify subscribers' email addresses
  • Don't buy anything advertised by spammers
  • Use a spam filter

By "permanent email address" I mean the one that you are trying to protect. This might be an address based on your own domain name, or one that has been assigned to you by your ISP or employer (et cetera).

I appreciate that many web services (forums, chat rooms, etc) require that you register with an email address before you can participate. In these cases, I recommend that you use temporary email addresses (ones that you can quickly and inconsequentially close down) such as those provided by web-mail services like Hotmail, Bigfoot, and so on.

Tracing the Spammer

Forget it. Sorry, but unless you have a lot of time to spare and incredible investigative skills, you are not going to be able to locate a spammer. They know and use every trick in the book to prevent you doing just that. It simply isn't worth the effort.

Tracing the Origin

For a couple of years now, I have been following best practices for spam control (as described above). Yet still the email came. I decided to end it all and committed virtual suicide - by shutting down my existing email account completely.

I bought my own domain name and set up a new permanent email address. I followed the best practices religiously. Despite these extreme measures, it wasn't long before the spam incursion started anew. My desperate measures had failed. Worse still, the volume of spam I was receiving was steadily increasing.

However, I had adjusted my best practices by the addition of one subtle innovation. Whenever I registered on a website, purchased online, posted to a mailing list, or subscribed to a forum, I always used an email address unique to and therefore identifying the website or service in question.

For example, when registering at amazon.co.uk, I used the email address amazon.co.uk@my_email_domain.com. When I post a comment on Dunstan's Blog, I use the email address 1976design.com@my_email_domain.com and so on [*].

Therefore, I am able to trace which unscrupulous website operators are selling my email address (or simply losing it through poor security) simply by examining the email address that a rogue email is addressed to. I am then able to pursue the organisation involved (not the spammer - since this is pointless) as I deem appropriate.
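The per-site address scheme is simple enough to sketch in a few lines of Python. This is only an illustration of the idea, not a tool I actually run: the function names address_for and leaked_by are invented for the example, and my_email_domain.com is the same placeholder domain used above.

```python
from typing import Optional

MY_DOMAIN = "my_email_domain.com"  # placeholder domain, as used in the article


def address_for(site: str, my_domain: str = MY_DOMAIN) -> str:
    """Build the address to hand to one site, e.g. amazon.co.uk@my_email_domain.com."""
    return f"{site.strip().lower()}@{my_domain}"


def leaked_by(recipient: str, my_domain: str = MY_DOMAIN) -> Optional[str]:
    """Return the site a recipient address was issued to.

    Returns None when the address is malformed or is not at my_domain.
    """
    local, sep, domain = recipient.strip().lower().rpartition("@")
    if not sep or not local or domain != my_domain:
        return None
    return local


# The address given to a site, and the site recovered from a rogue email's recipient.
issued = address_for("amazon.co.uk")
culprit = leaked_by("1976design.com@my_email_domain.com")
```

Because the local part of the address is the site's own domain name, no lookup table is needed: the recipient address of a rogue email names the party it was given to.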

It is easy to see how much leverage this device gives. I could imagine receiving groveling emails literally begging for forgiveness and discretion from organisations whom I might challenge as a result of this mechanism - organisations that might be operating in breach of their own, published privacy policies. At the time of writing, none of my "honey trap" addresses has been used in this way though, which is extremely gratifying in itself.

Of course, had a honey trap been triggered, the compromised email address would already be out in the wild, so to speak. Therefore, I would subsequently block the compromised email address at the mail server itself, effectively cutting off that line of communication.

This is an approach I heartily recommend if you have the means.

* NOTE: Neither amazon.co.uk nor Dunstan Orchard has been the origin of any spam I have received. I have used their domain names simply to illustrate my point, not to incriminate them in any way.

Building My Defences - The Local Filter

Even with best practices, I found that I was still seeing upwards of 200 emails each day, only a handful of which were messages that I actually wanted to receive. How could that be? Spammers often create lists of common names / words (and sometimes random strings of characters) and use them to "brute force" an email through to an unsuspecting recipient. Because I didn't want to have to set up a new email account every time I used a new identifying email address, I had configured my mail server with just one, catch-all address for my email domain. Thus, any email sent to any address at that domain would be automatically forwarded to my permanent email address. So the randomly-addressed spam still made it through!
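To see why the catch-all lets guessed addresses through, here is a rough Python model of the routing decision. It is a sketch of the behaviour described above, not real mail server configuration: the permanent address, the blocked address and the route function are all made up for the illustration.

```python
MY_DOMAIN = "my_email_domain.com"   # placeholder domain from the article
PERMANENT = "me@" + MY_DOMAIN       # hypothetical permanent address

# Hypothetical honey-trap address that has leaked and is now blocked.
BLOCKED = {"leaky-shop.example@" + MY_DOMAIN}


def route(recipient: str) -> str:
    """Model a catch-all domain with a block list.

    Returns "relay-denied" for other domains, "reject" for blocked
    addresses, and the permanent address for everything else.
    """
    rcpt = recipient.strip().lower()
    if not rcpt.endswith("@" + MY_DOMAIN):
        return "relay-denied"
    if rcpt in BLOCKED:
        return "reject"
    # Catch-all: any other local part, real or guessed, is forwarded.
    return PERMANENT


print(route("amazon.co.uk@my_email_domain.com"))       # forwarded
print(route("qzxvbn@my_email_domain.com"))             # a random guess, also forwarded
print(route("leaky-shop.example@my_email_domain.com")) # blocked
```

The second call is the problem in miniature: the server has no way of telling a guessed local part from one I actually handed out.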

Clearly, I needed a solution.

I didn't need to Google far before I came across the work of Paul Graham. Graham has invested an awful lot of time and energy in the pursuit of the perfect anti-spam solution. His work is far-reaching and extremely useful for anyone who has even a modicum of interest in the problem.

Graham's essays directed my thoughts towards the works of the Reverend Thomas Bayes (1702 - 1761). Now what, you are probably wondering, has an 18th-century cleric got to do with the problem of unsolicited email? I'm glad you asked. You see, Bayes worked on the problem of calculating probabilities and his theorem leads to the mathematical formula used in many of today's most advanced spam filtering programs.

After studying Graham's essays and Bayesian Probabilities, I decided that such a filter was just what the doctor ordered for my own spam management campaign.
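For the curious, the core of a Bayesian filter fits in a screenful of Python. What follows is a generic textbook naive Bayes classifier with add-one smoothing, written as a minimal sketch. It is not Graham's exact formula and not the code of any particular product; the class name, the bucket names and the training messages are all invented for the example.

```python
import math
from collections import Counter


class NaiveBayes:
    """Tiny word-based naive Bayes classifier (illustrative sketch only)."""

    def __init__(self):
        self.word_counts = {}        # bucket -> Counter of words seen in training
        self.doc_counts = Counter()  # bucket -> number of training messages

    def train(self, bucket, text):
        self.word_counts.setdefault(bucket, Counter()).update(text.lower().split())
        self.doc_counts[bucket] += 1

    def classify(self, text):
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        best_bucket, best_score = None, -math.inf
        for bucket, counts in self.word_counts.items():
            # log P(bucket) + sum of log P(word | bucket), add-one smoothed
            score = math.log(self.doc_counts[bucket] / total_docs)
            total_words = sum(counts.values())
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            if score > best_score:
                best_bucket, best_score = bucket, score
        return best_bucket


nb = NaiveBayes()
nb.train("spam", "cheap viagra casino offer")
nb.train("spam", "win casino money now")
nb.train("personal", "lunch on friday with the family")
nb.train("personal", "see you at the meeting friday")
print(nb.classify("casino offer now"))
print(nb.classify("family lunch friday"))
```

Each training message shifts the word counts for its bucket, which is why such a filter improves as you correct its mistakes.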

A little more Googling and I had downloaded POPFile and installed it in front of my beleaguered email client. I was immediately rewarded. POPFile is one of the most accurate email filters out there. It works by analysing each email and classifying it into user-specified "buckets" (or folders). Each email is [optionally] tagged in its subject line with the name of the bucket and a web-interface provides for configuration, management, training and reporting.

The email client connects to POPFile (rather than the POP3 server) and retrieves the incoming email. Then the client's own filtering mechanism takes over and performs actions based on the POPFile prefixed subject line. For example, on my computer, POPFile prefixes the subject of any email classified as spam with "[spam]". My email client is configured to automatically move any email with such a subject line straight into the trashcan and mark it as "read".
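The division of labour between the filter and the email client can be modelled like this. It is a sketch of the behaviour just described rather than POPFile's or any email client's real code; tag_subject, client_action and the folder names are assumptions made for the illustration.

```python
def tag_subject(bucket: str, subject: str) -> str:
    """Filter side: prefix the subject with the bucket name, e.g. "[spam] ..."."""
    return f"[{bucket}] {subject}"


def client_action(subject: str, tag: str = "[spam]"):
    """Client side: return (folder, mark_as_read) for a tagged subject line."""
    if subject.strip().lower().startswith(tag):
        return ("Trash", True)
    return ("Inbox", False)


print(client_action(tag_subject("spam", "Cheap pills")))
print(client_action(tag_subject("personal", "Lunch on Friday?")))
```

The client needs no knowledge of how the classification was made; it only matches on the subject prefix.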

The program takes a little while to "train" but, once the first few emails have been classified, it learns at an exponential rate. It is a very powerful ally.

POPFile quickly began to justify itself. Just before I added a remote filter (more on this later), I looked at POPFile's logs to assess its efficiency. The results were stunning:

Messages Classified: 33,722
Classification Errors: 27
Accuracy: 99.91%

The scope of my spam problem was clearly illustrated too:

Spam: 28,361 (84.10%)
Mailing List: 2,048 (6.07%)
Virus: 1,913 (5.67%)
Personal: 1,400 (4.15%)

Thus, from 33,722 emails, I only wanted to see 3,448 of them. The rest were all junk.
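The arithmetic behind these figures is easy to check. The short Python script below recomputes them from the raw counts quoted above; the variable names are mine. Note that rounding the accuracy to two decimal places gives 99.92%, while cutting it off after two places gives the reported 99.91%, so my guess (and it is only a guess) is that the report truncates rather than rounds.

```python
import math

classified = 33_722
errors = 27
buckets = {
    "Spam": 28_361,
    "Mailing List": 2_048,
    "Virus": 1_913,
    "Personal": 1_400,
}

# Accuracy is the share of messages that were classified correctly.
accuracy = 100 * (classified - errors) / classified
truncated = math.floor(accuracy * 100) / 100   # cut off after two decimal places
rounded = round(accuracy, 2)                   # conventional rounding
print(f"accuracy: truncated {truncated}%, rounded {rounded}%")

# Each bucket as a percentage of all classified messages.
for name, count in buckets.items():
    print(f"{name}: {count:,} ({100 * count / classified:.2f}%)")

# The messages I actually wanted: mailing lists plus personal email.
wanted = buckets["Mailing List"] + buckets["Personal"]
print(f"wanted: {wanted:,} of {classified:,}")
```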

If I am a typical user, then you don't have to extrapolate far to appreciate the enormous burden that spam places on email servers and bandwidth. Not to mention the countless man-hours that are wasted in dealing with this electronic noise.

The Remote Filter

POPFile was nothing short of a revelation. I quickly grew accustomed to having only legitimate emails in my inbox (POPFile does occasionally miss a spam missive, but these false-negatives are extremely rare as you can see from the statistics above).

However, there are two big issues with local filtering - even when those filters are as good as POPFile.

  • The false-positive. This occurs when POPFile mistakenly classifies a genuine email as spam. This is unavoidable due to the nature of the Bayesian scoring. It's a rare occurrence but, because there is a small chance of misidentification, one is still inclined to perform at least a cursory exploration of the "deleted" folder - just to make sure that there's nothing important in there. This can be time-consuming and tedious.
  • Resource consumption. Incoming email has to be processed by my mail server, POPFile and my email client, consuming valuable processor cycles that could be better employed elsewhere.

Then I discovered Spam Arrest. To quote from the website, "Spam Arrest uses a combination of unique technology and a human touch to defeat the exponentially increasing problem of spam. Spam Arrest will save you time and money - and end the frustration of receiving constant unwanted email."

Doesn't it sound grand? It also works!

I subscribed to Spam Arrest about two weeks ago. $34.95 (USD) bought me a one-year subscription to the service and I got two additional months free! The system stops spam at my mail server, so it never even gets to my inbox. It employs several mechanisms (a white-list, a black-list and a challenge/response mechanism) and, for my purposes, is the Nirvana of email filtering.
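In general terms, a white-list / black-list / challenge-response filter can be pictured as the decision flow below. This is my own minimal Python sketch of the general technique, in which an unknown sender's message is held until the sender answers a challenge. It makes no claim about how Spam Arrest is actually implemented; the class, method and address names are invented.

```python
from enum import Enum


class Verdict(Enum):
    DELIVER = "deliver"
    DISCARD = "discard"
    CHALLENGE = "challenge"


class ChallengeResponseFilter:
    """Illustrative model only; not any vendor's implementation."""

    def __init__(self, whitelist=(), blacklist=()):
        self.whitelist = {a.lower() for a in whitelist}
        self.blacklist = {a.lower() for a in blacklist}
        self.held = {}  # sender -> messages awaiting a challenge response

    def receive(self, sender, message):
        sender = sender.lower()
        if sender in self.blacklist:
            return Verdict.DISCARD
        if sender in self.whitelist:
            return Verdict.DELIVER
        # Unknown sender: hold the message and ask them to prove they are human.
        self.held.setdefault(sender, []).append(message)
        return Verdict.CHALLENGE

    def challenge_answered(self, sender):
        """White-list the sender and release any messages held for them."""
        sender = sender.lower()
        self.whitelist.add(sender)
        return self.held.pop(sender, [])


f = ChallengeResponseFilter(whitelist=["mum@family.example"],
                            blacklist=["offers@bulk.example"])
print(f.receive("mum@family.example", "Sunday lunch?"))
print(f.receive("offers@bulk.example", "Platinum card!"))
print(f.receive("new.client@business.example", "Quote request"))
print(f.challenge_answered("new.client@business.example"))
```

A bulk mailer that never answers the challenge never reaches the inbox, while a genuine correspondent only has to respond once.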

In the short time I've been a subscriber, I haven't seen a single unsolicited email. Yet email from family and friends is getting through, my mailing list email gets through, my opt-in email gets through and I've had no problems with business email either (even my most technically-challenged acquaintances seem to have coped with the challenge / response mechanism). Furthermore, many of my peers have been so seduced by the obvious benefits of the service that they too have subscribed (hey Spam Arrest - how about a referral programme?).

Okay, you have to pay. But, to my mind at least, $35/annum is a small price to pay for a protected inbox.

I'm pretty happy with my email service now. Spam Arrest seems to work well and, if ever a rogue email slips through the net, there's always POPFile serving as a last line of defence before my inbox - hopefully it won't have much work to do in the future.

Fighting Back

While researching this article, I came across an interesting concept from E-Scrub Technologies, Inc (I kid you not), called WPOISON. WPOISON has a single, noble purpose in life: to frustrate the spammer.

One of the ways spammers acquire email addresses is by harvesting them from websites. The spammer uses a piece of software that starts at a predetermined page then scours the HTML source for any email addresses contained within. When the page has been scanned, the software then recursively follows all the links on that page and repeats the process with each subsequent page it finds. The harvesting process builds up a database of addresses that should, assuming well-maintained websites, be valid.

WPOISON aims to upset this process by offering bogus email addresses and links to the harvester. The outcome of which should be that the spammer wastes time and money acquiring perfectly formed email addresses that are basically worthless.
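The poisoning idea can be sketched as a page generator. The Python below is my own illustration of the concept and not WPOISON's code: the function names and the /poison/ link path are made up, and the addresses deliberately use the reserved .invalid top-level domain so that the example can never produce an address belonging to a real person.

```python
import random
import string


def random_word(rng, low=5, high=10):
    """Return a random lowercase string between low and high characters long."""
    return "".join(rng.choice(string.ascii_lowercase)
                   for _ in range(rng.randint(low, high)))


def bogus_address(rng):
    """Well-formed but worthless address at a reserved, non-existent domain."""
    return f"{random_word(rng)}@{random_word(rng)}.invalid"


def poison_page(n_addresses=5, n_links=3, seed=None):
    """Build an HTML page of bogus mailto links plus links to further poison pages."""
    rng = random.Random(seed)
    lines = ["<html><body>"]
    for _ in range(n_addresses):
        addr = bogus_address(rng)
        lines.append(f'<a href="mailto:{addr}">{addr}</a><br>')
    for _ in range(n_links):
        # Hypothetical path; each link would lead to another generated page.
        lines.append(f'<a href="/poison/{random_word(rng, 8, 8)}">more</a><br>')
    lines.append("</body></html>")
    return "\n".join(lines)


print(poison_page(seed=2004))
```

Because every generated page links to more generated pages, a harvester that follows links recursively keeps filling its database with junk.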

In theory, if enough webmasters deploy WPOISON, email address harvesting will be rendered useless due to simple economies of scale.

Only time will tell if the effort is successful.

Links

Coalition Against Unsolicited Commercial Email
