[SpamCop-Digest] Using SpamAssassin to detect and sort spam.

Pete Stephenson pete at heypete.com
Wed Jul 23 03:51:22 EDT 2003


Greetings again,

In this edition of the SpamCop Digest, I will discuss the use of the 
free spam filter called SpamAssassin[1] to detect and sort spam.

I believe it's safe to say that all subscribers of this mailing
list are quite frustrated about receiving spam, and that most
report most, if not all, of the spam they receive via SpamCop. Overall,
I believe that anti-spammers will win, and that spam will no longer
be tolerated anywhere on the internet. However, while we're waiting
for that day, we still have to deal with the huge volume of spam we
receive daily (I generally receive anywhere between 50 and 200 spams
per day, and the number varies unpredictably from day to day).

Many of us get spam at our work email accounts in addition to our
personal accounts, and many businesses are reluctant to implement
server-side filtering because they fear it might produce
false positives and block legitimate mail, which, for a
business, would be intolerable (and reasonably so). Since employers
hesitate to block spam and implement filters, our accounts receive
more spam, and more of our work time is spent dealing with
spam[2].

Enter SpamAssassin. In its native form, it's a free, open-source
UNIX program, though it's been ported to Windows and other platforms,
and people have created commercial versions[3]. It performs a wide
variety of heuristic checks on the headers and body text of mail in
order to detect spam.

Many filters out there aren't very effective, as they either have
inflexible filter rules (e.g. blocking based on a specific sender
address, or specific words in the text[4]), or generate too many false
positives (e.g. requiring that your address be in the TO/CC fields
blocks a lot of legitimate mailing lists). SpamAssassin doesn't work
that way -- it assigns a "point" value to specific spam-like words
and phrases in a message. When the total number of "points" exceeds a
user-configurable threshold (in my case, 5 points), the message is
sorted to a separate folder for perusal at a later time.
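
To make the point system concrete, here is a minimal sketch in Python
of how such scoring could work. It is not SpamAssassin's actual code
or rule set: the rule names, the regular expressions, and the helper
names (Message, Rule, score, is_spam) are all made up for
illustration. Only the idea of adding up points per matched rule and
comparing the total to a threshold comes from the description above.

```python
import re
from typing import Callable, List, NamedTuple, Tuple


class Message(NamedTuple):
    subject: str
    sender: str
    body: str


class Rule(NamedTuple):
    name: str                           # hypothetical rule name
    points: float                       # illustrative weight
    matches: Callable[[Message], bool]  # test applied to a message


# Hypothetical rules; real filters ship with far more of them.
RULES = [
    Rule("FREE_IN_SUBJECT", 0.4,
         lambda m: re.search(r"\bfree\b", m.subject, re.I) is not None),
    # Interpreted here as: the part before the "@" ends in a digit.
    Rule("FROM_ENDS_IN_NUMBERS", 0.6,
         lambda m: re.search(r"\d@", m.sender) is not None),
    Rule("CLICK_BELOW", 0.5,
         lambda m: re.search(r"click\s+below", m.body, re.I) is not None),
]


def score(message: Message, rules=RULES) -> Tuple[float, List[str]]:
    """Return the total points and the names of the rules that matched."""
    hits = [rule for rule in rules if rule.matches(message)]
    total = round(sum(rule.points for rule in hits), 2)
    return total, [rule.name for rule in hits]


def is_spam(total: float, threshold: float = 5.0) -> bool:
    """Flag a message whose total exceeds the user-chosen threshold."""
    return total > threshold


if __name__ == "__main__":
    msg = Message("Get your FREE golf club",
                  "offers123@example.com",
                  "Click below!")
    total, hits = score(msg)
    print(total, hits, is_spam(total))
```

With only these three toy rules, no message can reach a threshold of
5, which is rather the point: no single weak hint is enough on its
own, and a message is only sorted away when many hints pile up.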

For example, a recent spam I got advertising a free golf club
received 10.70 points. It received 0.4 points for the word "free"
being in the subject, 0.6 points for the "from" address ending in
numbers, 0.5 points for asking the reader to "click below!", etc.
Other criteria carry more weight: the message forged its sending mail
system as the Internet Mail Service (IMS) software without containing
the other IMS identifying marks, which earned it 4.3 points. I
left off several other items, as this was merely an example.
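
For anyone who wants to check the arithmetic, here is a small Python
sketch using only the figures quoted above. The variable names are
my own, and the 5-point threshold is simply the setting I mentioned
earlier. Decimal is used so that the tenths add up exactly.

```python
from decimal import Decimal

REPORTED_TOTAL = Decimal("10.70")  # total score quoted for the message
THRESHOLD = Decimal("5")           # my own threshold setting

# The four items listed in the text; the rest were left off.
listed_items = {
    '"free" in the subject': Decimal("0.4"),
    '"from" address ends in numbers': Decimal("0.6"),
    'asks the reader to "click below!"': Decimal("0.5"),
    "forged Internet Mail Service headers": Decimal("4.3"),
}

listed_sum = sum(listed_items.values(), Decimal("0"))
unlisted = REPORTED_TOTAL - listed_sum

if __name__ == "__main__":
    print("listed items:", listed_sum)
    print("unlisted items:", unlisted)
    print("listed items alone exceed threshold:", listed_sum > THRESHOLD)
```

The four listed items come to 5.8 points, so they would have pushed
the message past a 5-point threshold by themselves; the items I left
off account for the remaining 4.9 points.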

SpamAssassin is quite effective at detecting spam -- it has filtered
nearly 300 spam messages I've received, let about 15 through, and not
flagged a single legitimate email. I'm using the Eudora plug-in
Spamnix[3], which does not currently include the Bayesian filter
that the regular version of SpamAssassin contains. That filter
"learns" and adapts to what you determine to be spam or not; it
increases the probability of catching spam to above 95%, and becomes
more accurate the more mail it processes.

SpamCop has been experimenting[5] with using SpamAssassin as an 
optional filter for paying subscribers. JT's still in the process of 
testing it out, and it's defaulting "off" on all accounts, though you 
can turn it on if you wish via the webmail interface. When it's out 
of testing, and placed fully into production, there'll be an 
announcement.

For those of you who are frustrated with having to wade through gobs
of spam in order to pick out only a few legitimate mails, consider
using some sort of client-side filter like SpamAssassin to aid you in
sorting your mail. This way, you can read your legitimate mail at
your leisure, and deal with your spam when you have sufficient time.
There are many filter programs out there, but I've yet to find one
nearly as effective as SpamAssassin. Note that I do not suggest using
ONLY SpamAssassin -- SA's client-side filtering, combined with
reporting your spam (either manually or via SpamCop), makes for an
effective way of dealing with spam.

Also note that I am not associated with SpamCop or SpamAssassin in
any way, other than being a satisfied user of SA's product and SC's
service. I'm also not compensated in any way.

It may be a few weeks until I next write to the Digest, because I will
be vacationing in Ireland from July 25th until August 11th. If there
are any anti-spammers in Dublin or the surrounding area who would
like to get a drink or something, let me know, and I'll see if I can
work it into my schedule.

Cheers!

[1] SpamAssassin is available at, unsurprisingly, http://www.spamassassin.org/
[2] Greatly decreasing worker efficiency and productivity, increasing 
frustration, and costing the company money because they need to pay 
people to spend time looking through spam, rather than doing 
productive work.
[3] Such as http://www.spamnix.com/, a Mac OS X and Windows Eudora plug-in.
[4] For instance, "sex" or "breast". These words can be, and 
frequently are, used in perfectly legitimate, non-porn-related email. 
I, for instance, am a medical professional...I may need to email 
someone regarding "breast cancer", which may be caught by inflexible 
filters that don't take context into consideration.
[5] http://news.spamcop.net/pipermail/spamcop-list/2003-July/049644.html
-- 
Pete Stephenson
HeyPete.com

