How to Easily Reclassify Mail with CRM114 Mailfilter

You may ask yourself why you shouldn't just forward the misclassified mail to yourself. I can think of at least three reasons to use the message reclassification method described below instead.

  1. The misclassified message is relearned exactly as it arrived in your mailbox. You don't need to worry about your mail client adding or removing headers, which greatly improves learning accuracy.
  2. Retraining is super-simple. All you need to do is move the misclassified message to the Reclassify folder, and the Perl script does the rest. You also don't need to remove any CRM114 headers or type "command password [non]spam".
  3. Because the message is relearned in the same form as it made it through CRM114 in the first place (no headers are added or removed), you should never receive the dreaded "LEARN AS [NON]SPAM UNNECESSARY - ALREADY CLASSIFIED CORRECTLY - NO ACTION TAKEN" message.

The following steps will guide you in getting CRM114 working with mbox-style IMAP folders and a nifty retraining method. If you don't already know what CRM114 is, you need to mosey on over to its project page. A brief description would be that it is a kick-butt spam filter, but it's much more than that.

  1. Download, install, and set up CRM114.

  2. You will need an IMAP server that supports the Mbox format.

    I use the IMAP server provided by the University of Washington. If you want to use an IMAP server that supports the Maildir format instead, look at the original Perl reclassification script.

  3. Create two new IMAP folders.

    I use Spam and Reclassify for my folders, but you can use whatever names you like.

  4. Make sure you have procmail installed on your system.

    I have the following lines at the bottom of my /userhome/.procmailrc file.

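    # Pass every message through CRM114 as a filter (f = filter, w = wait
    # for the exit code), serialized with the local lockfile .msgid.lock.
    :0fw: .msgid.lock
    | /usr/bin/crm --fileprefix=/userhome/crm/ -u /userhome/crm mailfilter.crm

    # Deliver anything CRM114 marked as spam to the Spam folder; the
    # trailing colon asks procmail to lock the mailbox while writing.
    :0:
    * ^X-CRM114-Status: SPAM.*
    Spam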

    The procmail recipe looks for the X-CRM114-Status: SPAM string in the message header. When that string is found, it puts the message in the Spam folder.
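
To check by hand which folder a saved message would land in, the header test in the recipe can be approximated in shell. This is only a rough sketch and not part of the original setup: the function name route_message and the INBOX fallback are assumptions made for illustration. It imitates procmail's default behavior of matching the header block only, without regard to case.

```shell
#!/bin/sh
# Print the folder a message read from stdin would be filed into.
# Only the header block (everything up to the first blank line) is
# examined, and the match ignores case, as procmail does by default.
route_message() {
    if sed '/^$/q' | grep -qi '^X-CRM114-Status: SPAM'; then
        echo "Spam"
    else
        echo "INBOX"   # hypothetical name for the default mailbox
    fi
}
```

Running something like route_message < saved-message.txt (a hypothetical file name) prints the folder name for that one message.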

  5. Download and modify the Perl script that handles the reclassification. - Updated 4/17/2004

    The configuration for this script is pretty straightforward. You will obviously need Perl, but you will also need the Mail::Box CPAN module. If you don't have it, you should be able to run perl -MCPAN -e 'install Mail::Box' to download and install it. If you aren't sure whether you have the module, run perl -c fix-spam-classification.pl and look for an error similar to this: "Can't locate Mail/Box/Manager.pm". Once the script has been configured, add a cron entry something like this:

    */2 * * * * /usr/bin/fix-spam-classification.pl > /tmp/fix-spam-classification.log 2>&1

    This runs the Perl script every two minutes to look for messages in the Reclassify folder. Notice that I placed the Perl script in /usr/bin, but you can put it anywhere you like. You can also redirect output from the cron entry to /dev/null if you don't want to keep a log file.
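
Before relying on the cron job, it can help to confirm that the Mail::Box module really loads, since cron will otherwise fail quietly into the log file. The following is a small sketch; the helper name have_perl_module is made up for this example.

```shell
#!/bin/sh
# Succeed if the named Perl module can be loaded, fail otherwise.
# -M<module> makes perl "use" the module before running the no-op -e 1.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

if have_perl_module Mail::Box::Manager; then
    echo "Mail::Box is installed"
else
    echo "Mail::Box is missing; try: perl -MCPAN -e 'install Mail::Box'"
fi
```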

  6. Using CRM114 and the Perl reclassification script.

    Ahhh, the easiest part. If a message is misclassified, just move it to your Reclassify folder.
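
To make the idea concrete, here is a rough shell sketch of the per-message decision a reclassification job has to make. It is not the Perl script itself. It assumes that your mailfilter.crm accepts the --learnspam and --learnnonspam options, that anything not marked SPAM should be retrained as spam, and it reuses the crm command line from the procmail recipe. The function names and the CRM_CMD override are hypothetical.

```shell
#!/bin/sh
# Decide which way to retrain one message read from stdin.
# Prints "nonspam" if CRM114 had marked it SPAM, "spam" otherwise
# (assumption: every message that was not marked SPAM is retrained as spam).
learn_direction() {
    if sed '/^$/q' | grep -qi '^X-CRM114-Status: SPAM'; then
        echo "nonspam"
    else
        echo "spam"
    fi
}

# Retrain one message read from stdin, unmodified, in the opposite direction.
# CRM_CMD is a hypothetical override so the sketch can be dry-run.
relearn_one() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"                       # keep a copy; it is read twice
    direction=$(learn_direction < "$tmp")
    ${CRM_CMD:-/usr/bin/crm} --fileprefix=/userhome/crm/ -u /userhome/crm \
        mailfilter.crm "--learn$direction" < "$tmp"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Something like relearn_one < saved-message.txt (a hypothetical file name) would retrain a single saved message. Splitting the Reclassify mbox into individual messages and emptying it afterwards, with proper locking, is the part the Perl script and Mail::Box take care of, and it is not shown here.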

Thanks go to Dr. Mike Chudobiak, who had the original idea. I took his original Perl program, which uses Maildir-style IMAP folders, and adapted it to mbox-style IMAP folders.

Email John Johnston with any feedback or questions.
