[SpamCop-Digest] Using SpamAssassin to detect and sort spam.
Pete Stephenson
pete at heypete.com
Wed Jul 23 03:51:22 EDT 2003
Greetings again,
In this edition of the SpamCop Digest, I will discuss the use of the
free spam filter called SpamAssassin[1] to detect and sort spam.
I believe it's safe to say that all subscribers of this mailing
list are quite frustrated about receiving spam, and that most
report most if not all of the spam they receive via SpamCop. Overall,
I believe that anti-spammers will win, and that spam will no longer
be tolerated anywhere on the internet. However, while we're waiting
for that day, we still have to deal with the huge volume of spam we
receive daily (I generally receive anywhere between 50 and 200 spams
per day, with it varying randomly every day).
Many of us get spam at our work email accounts in addition to our
personal accounts, and many businesses are fearful of implementing
server-side filtering because they fear it might result in
false-positives and blocking of legitimate mail, which, for a
business, would be intolerable (and reasonably so). Since employers
hesitate to block spam and implement filters, our accounts receive
more spam, and more of our work time is being spent dealing with
spam[2].
Enter SpamAssassin. In its native form, it's a free, open-source
UNIX program, though it's been ported to Windows and other platforms,
and people have created commercial versions[3]. It performs a wide
variety of heuristic checks on the headers and body text of mail in
order to detect spam.
Many filters out there aren't very effective, as they either have
inflexible filter rules (e.g. blocking based on a specific sender
address, or specific words in the text[4]), or generate too many false
positives (e.g. requiring that your address be in the TO/CC fields
blocks a lot of legitimate mailing lists). SpamAssassin doesn't work
that way -- it assigns a "point" value to each spam-like word, phrase,
or header trait it finds in a message. When the total number of
"points" exceeds a user-configurable threshold (in my case, 5 points),
the message is sorted to a separate folder for perusal at a later time.
For example, a recent spam I got advertising a free golf club
received 10.70 points. It received 0.4 points for the word "free"
being in the subject, 0.6 points for the "from" address ending in
numbers, 0.5 points for asking the reader to "click below!", etc.
Other criteria carry more weight: the message claimed to have been
sent by the Internet Mail Service (IMS) software without containing
the other identifying marks of IMS, which earned it 4.3 points. I
left off several other items, as this was merely an example.
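
To make the scoring idea concrete, here is a minimal sketch in Python.
It is not SpamAssassin's actual code or rule set: the rule names, the
Message fields, and the matching logic are all made up for
illustration. Only the point values and the 5-point threshold are
taken from the golf club example above.

```python
import re
from typing import Callable, List, NamedTuple, Tuple


class Message(NamedTuple):
    sender: str
    subject: str
    body: str
    x_mailer: str               # what the message claims sent it
    has_other_ims_marks: bool   # hypothetical stand-in for a real header check


class Rule(NamedTuple):
    name: str                   # hypothetical names, not real SpamAssassin rules
    points: float
    matches: Callable[[Message], bool]


# Point values copied from the golf club example; the tests are guesses.
RULES = [
    Rule("SUBJECT_HAS_FREE", 0.4,
         lambda m: re.search(r"\bfree\b", m.subject, re.I) is not None),
    Rule("FROM_ENDS_IN_NUMBERS", 0.6,
         lambda m: re.search(r"\d@", m.sender) is not None),
    Rule("CLICK_BELOW", 0.5,
         lambda m: "click below" in m.body.lower()),
    Rule("FORGED_IMS", 4.3,
         lambda m: "Internet Mail Service" in m.x_mailer
         and not m.has_other_ims_marks),
]

THRESHOLD = 5.0  # user-configurable; 5 points in my case


def score(message: Message) -> Tuple[float, List[str]]:
    """Return the total points and the names of the rules that matched."""
    hits = [rule for rule in RULES if rule.matches(message)]
    return sum(rule.points for rule in hits), [rule.name for rule in hits]


def folder_for(message: Message) -> str:
    """Sort to the spam folder once the total exceeds the threshold."""
    total, _ = score(message)
    return "Spam" if total > THRESHOLD else "Inbox"
```

Note that no single word condemns a message here. The four items I
listed add up to 5.8 points, which is already over my threshold; the
remaining 4.9 of the 10.70 points came from the items I left off.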
SpamAssassin is quite effective at detecting spam -- it's filtered
nearly 300 spam messages I've received, let about 15 through, and not
flagged a single legitimate email. I'm using the Eudora plug-in
Spamnix[3], which does not currently include the Bayesian filter
found in the regular version of SpamAssassin. That filter "learns"
and adapts to what you determine to be spam or not, which increases
the probability of catching spam to above 95% and makes it more
accurate the more mail it processes.
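
For the curious, the "learning" idea can be sketched in a few lines of
Python. This is a plain naive Bayes toy of my own, not the algorithm
SpamAssassin actually ships; the class name, the tokenizer, and the
smoothing are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Toy word-based learner; a sketch only, not SpamAssassin's filter."""

    def __init__(self):
        # How many messages of each kind contained each word.
        self.tokens = {"spam": Counter(), "ham": Counter()}
        # How many messages of each kind have been learned.
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Lowercase words; each word counts once per message.
        return set(re.findall(r"[a-z']+", text.lower()))

    def learn(self, text, label):
        """Record a message you have judged to be 'spam' or 'ham'."""
        self.messages[label] += 1
        self.tokens[label].update(self.tokenize(text))

    def spam_probability(self, text):
        """Combine per-word evidence as log-odds, with add-one smoothing."""
        log_odds = math.log(
            (self.messages["spam"] + 1) / (self.messages["ham"] + 1)
        )
        for tok in self.tokenize(text):
            p_spam = (self.tokens["spam"][tok] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.tokens["ham"][tok] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))
```

Each call to learn() shifts the word counts, so the estimate for a
given word depends on the mail you yourself have sorted rather than on
a fixed rule list.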
SpamCop has been experimenting[5] with using SpamAssassin as an
optional filter for paying subscribers. JT's still in the process of
testing it out, and it's defaulting "off" on all accounts, though you
can turn it on if you wish via the webmail interface. When it's out
of testing, and placed fully into production, there'll be an
announcement.
For those of you who are frustrated with having to wade through gobs
of spam in order to pick out only a few legitimate mails, consider
using some sort of client-side filter like SpamAssassin to aid you in
sorting your mail. This way, you can read your legitimate mail at
your leisure, and deal with your spam when you have sufficient time.
There are many filter programs out there, but I've yet to find one
nearly as effective as SpamAssassin. Note that I do not suggest using
ONLY SpamAssassin -- SA's client-side filtering, combined with
reporting your spam either manually or via SpamCop, makes an
effective way of dealing with spam.
Also note that I am not associated with SpamCop or SpamAssassin in
any way, other than being a satisfied user of SA's product and SC's
service. I'm also not compensated in any way.
It may be a few weeks until I next write to the Digest, because I will
be vacationing in Ireland from July 25th until August 11th. If there
are any anti-spammers in Dublin or the surrounding area who would
like to get a drink or something, let me know, and I'll see if I can
work it into my schedule.
Cheers!
[1] SpamAssassin is available at, unsurprisingly, http://www.spamassassin.org/
[2] Greatly decreasing worker efficiency and productivity, increasing
frustration, and costing the company money because they need to pay
people to spend time looking through spam, rather than doing
productive work.
[3] Such as http://www.spamnix.com/, a Mac OS X and Windows Eudora plug-in.
[4] For instance, "sex" or "breast". These words can be, and
frequently are, used in perfectly legitimate, non-porn-related email.
I, for instance, am a medical professional...I may need to email
someone regarding "breast cancer", which may be caught by inflexible
filters that don't take context into consideration.
[5] http://news.spamcop.net/pipermail/spamcop-list/2003-July/049644.html
--
Pete Stephenson
HeyPete.com