E-mail: Spam Filtering
This page offers some tips and tricks for the SpamAssassin spam filter,
which we sometimes call our content-based filter to distinguish it from
the challenge-response filter.
False Negatives and False Positives
The content filter is not 100% accurate. It will sometimes mistake spam
for a legit message (a false negative), and on rare occasions a legit
message will be marked as spam (a false positive).
False positives are usually a sign of a configuration error, such
as a mail server which has been blacklisted, or mail which has been
forwarded through a 3rd party (thus breaking SPF checks).
False negatives are what spammers strive to achieve, and some
types of spam are hard to identify. Graphical spam falls into
this category and is increasing.
Tuning the SpamAssassin Content Filter - Basic settings
The basic configuration screen has 3 fields: a threshold, a whitelist,
and a blacklist. The threshold is the score at or above which a message
is classified as spam; anything scoring below it is treated as legit
(ham). If you increase the threshold, more messages will be considered
legit and you reduce the chance of false positives. If you lower the
threshold, more messages will be considered spam. The whitelist and
blacklist fields are lists of sender addresses that should be classed
as ham (whitelist) or spam (blacklist). The addresses can include
'wildcards' like *@example.com, which would pass or block all messages
from senders at the example.com domain.
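As a rough illustration of how the three fields interact, here is a minimal Python sketch. It is not SpamAssassin's code: the names `classify` and `matches` are made up for the example, the wildcard matching is approximated with Python's shell-style `fnmatchcase`, and the -100/+100 adjustments for list hits are assumed values.

```python
from fnmatch import fnmatchcase

WHITELIST_SCORE = -100.0  # assumed adjustment for a whitelist hit
BLACKLIST_SCORE = 100.0   # assumed adjustment for a blacklist hit


def matches(sender, patterns):
    """Return True if the sender matches any pattern such as *@example.com."""
    sender = sender.lower()
    return any(fnmatchcase(sender, p.lower()) for p in patterns)


def classify(sender, rule_score, threshold=5.0, whitelist=(), blacklist=()):
    """Return (final_score, label) for one message."""
    score = rule_score
    if matches(sender, whitelist):
        score += WHITELIST_SCORE
    if matches(sender, blacklist):
        score += BLACKLIST_SCORE
    # At or above the threshold is spam; below it is ham.
    label = "spam" if score >= threshold else "ham"
    return score, label


# Same message, three different configurations.
print(classify("bob@example.com", 6.5))
print(classify("bob@example.com", 6.5, threshold=8.0))
print(classify("bob@example.com", 6.5, whitelist=["*@example.com"]))
```

The first call classes the message as spam, the second lets it through because the threshold was raised, and the third lets it through because the wildcard whitelist entry swamps whatever the rules decided.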
Tuning notes (basic).
The SpamAssassin rules are designed around the premise that the
threshold will be 5. Setting the threshold below 5 _will_ cause
false positives. This might be OK for a casual user. However, I would suggest
that it is NEVER appropriate to lower the threshold below 5 for a
business contact address. (Doing so will cause the loss of some legit mail.)
Whitelisting your own address or domain is generally a bad idea. This is
because spammers often forge spam to you from you, or from an address
at your domain to lots of addresses at your domain.
I often see client configurations with long whitelists and blacklists.
I cringe when I see this because these crude lists override all
the complex logic in SpamAssassin. Occasionally I'll see scores
in the logs like -75. That means SpamAssassin assigned a score of 25
(blatant spam) to a message but a whitelist entry (-100) overrode it.
(The bad whitelists also make our job as sys-admins more difficult, as
the logs will show ham hitting rules that should _never_ be hit by
anything but spam. When we investigate to find out what happened, it
often turns out to be a bad whitelist entry.)
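The arithmetic behind a log score like -75 can be undone to spot these cases. A small Python sketch, assuming the whitelist hit shows up in the list of rules as USER_IN_WHITELIST (the name may differ on your version) and is worth the -100 mentioned above. The function names and SOME_SPAM_RULE are hypothetical.

```python
WHITELIST_RULE = "USER_IN_WHITELIST"  # assumed rule name for a whitelist hit
WHITELIST_SCORE = -100.0              # the -100 adjustment described in the text


def score_without_whitelist(final_score, rules_hit):
    """Return what the message would have scored with no whitelist entry."""
    if WHITELIST_RULE in rules_hit:
        return final_score - WHITELIST_SCORE
    return final_score


def masked_spam(final_score, rules_hit, threshold=5.0):
    """True when only the whitelist entry kept the message below the threshold."""
    underlying = score_without_whitelist(final_score, rules_hit)
    return final_score < threshold <= underlying


# A logged score of -75 with a whitelist hit was really a 25.
print(score_without_whitelist(-75.0, ["USER_IN_WHITELIST", "SOME_SPAM_RULE"]))  # 25.0
print(masked_spam(-75.0, ["USER_IN_WHITELIST", "SOME_SPAM_RULE"]))              # True
```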
I also see resellers that go around
whitelisting themselves in their clients' configurations. This works
and isn't such a bad idea, but it seems to me that a simpler solution
would be to write messages that don't look like spam!
(This is simple enough when you have access
to the system. Set up a mailbox with a low threshold and set it
to mark spam instead of bouncing it, then send your message to that mailbox.
You should end up with a message that includes a report of all the
rules your message hit and their respective scores.)
Advanced Tuning Notes.
There are some occasions where you may want to do more advanced tuning
than the threshold and white/black lists allow. We built an interface
that was supposed to allow adding your own SpamAssassin rules, but due
to the way "spamd" works it cannot be used to add rules. However,
it _does_ still work for changing the score of a particular rule.
There are configurations that cause false positives. The most
obvious one is forwarding mail from an outside address to a baremetal
hosted mailbox. This breaks the SPF checks as the server hosting
the outside address is effectively relaying mail. The fix for this
is to disable the SPF rules by adding the following two lines to
your advanced configuration:
score SPF_HELO_SOFTFAIL 0
score SPF_SOFTFAIL 0
Alternatively, you could increase your score threshold, but zeroing the
SPF rules targets the specific problem and doesn't cripple SpamAssassin as badly.
This approach should be a general solution if you find that there are
rules that are consistently causing you trouble. Keep in mind that
there isn't much point in worrying about low score rules
like HTML_MESSAGE.
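To see why zeroing a couple of rules is gentler than raising the threshold, here is a minimal Python sketch of the scoring. The score values and the rule HYPOTHETICAL_RULE are invented for illustration (real scores depend on the installed rule set), and `total_score` is not a SpamAssassin function.

```python
def total_score(rules_hit, default_scores, overrides=None):
    """Sum the scores of the rules a message hit, applying per-rule overrides."""
    overrides = overrides or {}
    return sum(overrides.get(rule, default_scores[rule]) for rule in rules_hit)


# Illustrative scores only; not taken from any real rule set.
default_scores = {
    "SPF_SOFTFAIL": 1.5,
    "SPF_HELO_SOFTFAIL": 1.0,
    "HTML_MESSAGE": 0.25,
    "HYPOTHETICAL_RULE": 2.5,
}

# Equivalent of the two "score ... 0" lines in the advanced configuration.
overrides = {"SPF_HELO_SOFTFAIL": 0.0, "SPF_SOFTFAIL": 0.0}

# A legit message forwarded from an outside address.
forwarded = ["SPF_SOFTFAIL", "SPF_HELO_SOFTFAIL", "HTML_MESSAGE", "HYPOTHETICAL_RULE"]

threshold = 5.0
before = total_score(forwarded, default_scores)
after = total_score(forwarded, default_scores, overrides)
print(before, before >= threshold)  # 5.25 True  (false positive)
print(after, after >= threshold)    # 2.75 False (delivered as ham)
```

Only the two SPF rules stop counting; every other rule still scores as before, so blatant spam is still caught at the normal threshold of 5.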
-Tom