Craig M. Buchek

St. Louis UNIX Users Group

July 14, 2004


"To promote the progress of science and useful arts, by securing for limited times to
authors and inventors the exclusive right to their respective writings and discoveries"
                                -- US Constitution, Article 1, Section 8

What Is SpamAssassin?

Why SpamAssassin?

  • Not much configuration required
  • Combined methods of detection are hard to beat
  • Marks messages with their scores, so users can decide what to do with them
    • Sort messages with high scores into a folder
    • Delete messages with very high scores
  • Works with a wide variety of MTAs, MDAs, and MUAs
    • Sendmail, qmail, Postfix, Exim, Exchange
    • MIMEdefang, amavisd
    • KMail, Outlook, Eudora
  • Does relatively well in comparison tests against other filters

Features

  • Several spam detection methods
    • Header checks
    • Body content
    • Bayesian filtering
    • DNS lists (MAPS, ORBS)
    • Checksums (DCC, Pyzor, Razor2)
  • Lots of pre-configured rules
    • Uses a genetic algorithm to determine weights for each rule
  • Marks emails with their score
    • User can decide what to do with the message
  • Written in (mostly) Perl
  • Commercial (and Windows) support available
    • McAfee SpamKiller
    • InboxCop

Installation

  • Installs via CPAN (as root)
    • perl -MCPAN -e shell
    • o conf prerequisites_policy ask
    • install Mail::SpamAssassin
    • quit
  • Also available as tarballs and RPMs
    • But they have a lot of prerequisites
  • Set character set if using Perl 5.8 to avoid Unicode slow-down:
    • export LANG=en_US

How It Works

  • Always-running daemon: spamd
    • Preloaded to reduce program-startup penalty
    • Listens on a socket for incoming messages
    • Processes the message, adds info about what it found
    • Option to use only local tests (no DNS, RBL, CRC lookups): -L, --local
  • Quick-starting client: spamc
    • Accepts a message on stdin, feeds it to spamd, results go to stdout
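
The stdin-to-stdout contract of spamc means any program can use it as a filter. A minimal Python sketch follows, assuming spamc is on the PATH and spamd is running. The function name filter_message and its command parameter are hypothetical, chosen for illustration only.

```python
import subprocess


def filter_message(raw_message: bytes, command=("spamc",)) -> bytes:
    """Pipe a raw message through a filter command and return its stdout.

    The default command is plain `spamc`, which reads the message on
    stdin and writes the processed message to stdout. The `command`
    parameter is an illustrative hook so another filter can be swapped in.
    """
    result = subprocess.run(
        list(command),
        input=raw_message,       # message goes to the filter's stdin
        stdout=subprocess.PIPE,  # processed message comes back on stdout
        check=True,              # raise if the filter exits non-zero
    )
    return result.stdout
```

Calling filter_message(raw) with the raw bytes of a message should return the same message with whatever headers spamd added.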

Configuration

  • Sitewide configuration: /etc/mail/spamassassin/local.cf
  • Per-user configuration: ~/.spamassassin/user_prefs
  • Example entries:
    • whitelist_to     thinkgeek@buchek.com
    • whitelist_from   *@socket.net
    • blacklist_to     winfreestuff@craigbuchek.com
  • Internet sites are available that will help you choose a configuration
    • http://www.yrex.com/spam/spamconfig.php
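
The whitelist and blacklist entries above use file-glob style patterns such as *@socket.net. The sketch below shows roughly how such entries could be read and matched. It is an illustration only, not SpamAssassin's own parser. The helper names glob_to_regex, parse_prefs, and matches_any are hypothetical, and only the * and ? wildcards are handled.

```python
import re


def glob_to_regex(pattern):
    """Turn a glob like '*@socket.net' into a case-insensitive regex."""
    escaped = re.escape(pattern)
    # re.escape turns '*' into '\*' and '?' into '\?'; map them to wildcards.
    escaped = escaped.replace(r"\*", ".*").replace(r"\?", ".")
    return re.compile(escaped, re.IGNORECASE)


def parse_prefs(text):
    """Collect 'directive value...' lines into {directive: [values]}."""
    prefs = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        parts = line.split()
        prefs.setdefault(parts[0], []).extend(parts[1:])
    return prefs


def matches_any(address, patterns):
    """True if the whole address matches at least one glob pattern."""
    return any(glob_to_regex(p).fullmatch(address) for p in patterns)
```

With the example entries, matches_any("joe@socket.net", prefs["whitelist_from"]) is true, while an address at any other domain does not match.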

Rules

  • /usr/share/spamassassin/ (demo)
  • You can change the score from the default in your config files:
    • score NAME_OF_TEST 3.0
  • Always run spamassassin -D --lint before making your rule changes live
  • Internet sites are available that will help you write rulesets
    • Guide to writing your own add-on rules: http://mywebpages.comcast.net/mkettler/sa/SA-rules-howto.txt
    • SpamAssassin Rules Emporium (SARE): http://www.rulesemporium.com/
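
To make the effect of a score override concrete, here is a small sketch of how per-rule scores add up to a message total. The rule names and default scores are invented for illustration (NAME_OF_TEST is the placeholder from the slide above), and the threshold of 5.0 is an assumption standing in for whatever is configured.

```python
# Hypothetical default scores; real rules and weights ship with SpamAssassin.
DEFAULT_SCORES = {
    "NAME_OF_TEST": 1.0,
    "OTHER_TEST": 2.5,
}


def total_score(hits, default_scores, overrides=None):
    """Sum the scores of the rules that hit, letting overrides win."""
    scores = {**default_scores, **(overrides or {})}
    return sum(scores[name] for name in hits)


def is_spam(score, required=5.0):
    """A message is tagged once its total reaches the threshold (assumed 5.0)."""
    return score >= required


hits = ["NAME_OF_TEST", "OTHER_TEST"]
before = total_score(hits, DEFAULT_SCORES)                         # 1.0 + 2.5
after = total_score(hits, DEFAULT_SCORES, {"NAME_OF_TEST": 3.0})   # 3.0 + 2.5
```

In this made-up case the same two hits stay under the threshold with the defaults and cross it once "score NAME_OF_TEST 3.0" is applied.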

Bayesian Filtering

  • Estimates the likelihood that a message is spam
    • By comparing the message to other spam and non-spam messages
    • Uses frequency of words within spam and non-spam messages
  • Learns about your spam by looking at the words it contains
  • Teach it what's spam and what's not:
    • sa-learn --spam ~/Mail/saved-spam-folder/cur
    • sa-learn --ham --mbox ~/Mail/other-nonspam-folder.mbx
  • Preferably run sa-learn from a cron job
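
The word-frequency idea can be shown with a toy classifier. This is a plain naive Bayes sketch with add-one smoothing, not SpamAssassin's actual Bayes implementation. The class name WordFrequencyFilter and the tokenizer are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


class WordFrequencyFilter:
    def __init__(self):
        # Per label: how many learned messages contained each word.
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def learn(self, text, label):
        """Record one message as 'spam' or 'ham'."""
        self.counts[label].update(set(tokenize(text)))
        self.messages[label] += 1

    def spam_probability(self, text):
        """Combine per-word evidence into a probability between 0 and 1."""
        n_spam, n_ham = self.messages["spam"], self.messages["ham"]
        log_odds = math.log((n_spam + 1) / (n_ham + 1))  # smoothed prior
        for word in set(tokenize(text)):
            p_spam = (self.counts["spam"][word] + 1) / (n_spam + 2)
            p_ham = (self.counts["ham"][word] + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-700.0, min(700.0, log_odds))  # keep exp() in range
        return 1 / (1 + math.exp(-log_odds))
```

Each call to learn() plays the role of one message fed to sa-learn; words seen mostly in spam push the probability up, and words seen mostly in ham pull it down.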

Procmail

# Lock file ensures only 1 invocation happens at a time, to reduce load
# Only filter messages smaller than 250 kB
:0fw: spamassassin.lock
* < 256000
| spamc

# Mails with a score of 15 or higher are almost certainly spam
:0:
* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
mail/Spam

# All mail tagged as spam (i.e. with a score at or above the set threshold)
:0:
* ^X-Spam-Status: Yes
mail/Maybe-Spam

# Work around procmail bug: stderr may cause the "F" in "From" to be dropped.
:0
* ^^rom[ ]
{
  :0 fhw
  | sed -e '1s/^/F/'
}
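
For readers who do not use procmail, the same sorting decision can be sketched in Python. It mirrors the two delivery recipes above: 15 or more stars in X-Spam-Level goes to mail/Spam, and any other message tagged "Yes" goes to mail/Maybe-Spam. The function name choose_folder and the certain_level parameter are hypothetical.

```python
from email import message_from_string


def choose_folder(raw_message, certain_level=15):
    """Return the folder for a message, or None for normal delivery."""
    msg = message_from_string(raw_message)
    level = (msg.get("X-Spam-Level") or "").strip()
    status = (msg.get("X-Spam-Status") or "").strip()

    # Count the leading '*' characters; one star per whole point of score.
    stars = len(level) - len(level.lstrip("*"))

    if stars >= certain_level:
        return "mail/Spam"        # almost certainly spam
    if status.startswith("Yes"):
        return "mail/Maybe-Spam"  # tagged, but worth a second look
    return None                   # not tagged: deliver as usual
```

The order of the checks matters for the same reason the order of the recipes does: the stricter test has to run before the general one.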

KMail

  • Trick is to create 2 filters (Settings -> Configure Filters)
    • First filter runs the message through SpamAssassin
      • Uncheck "if this filter matches, stop processing here"
      • Put after any filters where you are sure there won't be spam
    • Second filter decides what to do with the message
      • Based on headers SpamAssassin added
  • http://www.tomchance.org.uk/research/random/kmail
  • http://www.softwaredesign.co.uk/Information.SpamFilters.html

KMail - Filter 1

KMail - Filter 2

Other Email Programs

  • Outlook - http://apps.carleton.ca/ccs/web/spam/outlookspam.asp
  • Exchange - http://www.spamblogging.com/archives/000069.html
  • Exchange - http://www.christopherlewis.com/ExchangeSpamAssassin.htm

SpamAssassin 3.0

  • SpamAssassin 3.0 will be released within the next few weeks
  • Now an official Apache Software Foundation project
  • License has been changed
    • 2.x: Perl Artistic License OR GPL
    • 3.x: Apache License 2.0
  • Much more modular design
    • Allows third-party plug-ins to be created more easily
  • Checking of URLs against online databases of known spam URLs
  • Support for new anti-spam initiatives, such as SPF and HashCash

Other Spam Filters

  • DSPAM
  • Bogofilter
  • CRM114
  • SpamBayes

Presentation Info

 

  • Recommended Reading: SpamAssassin by Alan Schwartz (O'Reilly)