Corpus-Feed

Create MBOX files from your current mailbox.
Separate the good from the bad email during this step, so that you end up with two MBOX files: one containing your spam and one containing your legitimate mail.

Feed them into dspam to have a good starting point.
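
Before feeding, it can help to check how many messages each file actually holds. The following is a minimal sketch, not part of the original setup. It assumes classic mbox format, where every message starts with a "From " separator line and body lines beginning with "From " are escaped. The function name count_mbox_messages is made up for this example.

```shell
#!/bin/bash
# Count the messages in an mbox file by counting "From " separator lines.
# Assumption: body lines that start with "From " are escaped (e.g. ">From "),
# as in classic mbox files; otherwise the count will be too high.
count_mbox_messages() {
    grep -c '^From ' -- "$1"
}

# Example calls (file names as used in the feed commands of this post):
#   count_mbox_messages ./my_spamfeed.mbox
#   count_mbox_messages ./my_nospamfeed.mbox
```

If one of the two numbers looks far off from what you expect, the spam/ham separation is worth a second look before training.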

Feed Spam

I do not have separate profiles for different users, so I’m building a global profile under the user ID ‘amavis’:

cat ./my_spamfeed.mbox | dspam --mode=teft --source=corpus --class=spam --feature=chained,noise --user amavis

Feed Ham

Do the same for your legitimate (ham) MBOX file:

cat ./my_nospamfeed.mbox | dspam --mode=teft --source=corpus --class=innocent --feature=chained,noise --user amavis
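
If you would rather feed the messages one at a time (for example to see which message dspam chokes on), the MBOX file can be split first. This is only a sketch under assumptions: the function name split_mbox and the msg.NNNNNN file naming are invented here, and a "From " line is treated as a separator only when it is the first line or follows a blank line.

```shell
#!/bin/bash
# Split an mbox file into one file per message inside an output directory.
# The "From " separator line itself is dropped, since it is not part of
# the message. Assumption: classic mbox layout with a blank line before
# each separator.
split_mbox() {
    local mbox=$1 outdir=$2
    mkdir -p "$outdir" || return 1
    awk -v dir="$outdir" '
        /^From / && (NR == 1 || prev == "") {
            if (out != "") close(out)       # finish the previous message
            n++
            out = sprintf("%s/msg.%06d", dir, n)
            prev = $0
            next
        }
        out != "" { print > out }           # copy message lines
        { prev = $0 }
    ' "$mbox"
}

# Possible use, reusing the spam command from above for each message:
#   split_mbox ./my_spamfeed.mbox /tmp/spam_split
#   for m in /tmp/spam_split/msg.*; do
#       dspam --mode=teft --source=corpus --class=spam \
#             --feature=chained,noise --user amavis < "$m"
#   done
```

The same loop with --class=innocent would cover the ham file.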

Retrain false positives/negatives

The dspam documentation describes several ways to achieve that.
In my case, I run a script every 5 minutes that inspects two special folders within each mailbox: ‘dspam_trainspam’ for false negatives and ‘dspam_trainham’ for false positives.

So, whenever a user moves or copies an email into one of these folders, my script grabs it, retrains dspam with it, and removes it from the dspam_* folder.
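
The two folders have to exist before users can drop mail into them. Here is a minimal sketch for creating them by hand. It assumes the Maildir++ layout that the retrain script also expects (folders stored as .dspam_trainspam and .dspam_trainham inside the Maildir); the function name make_train_folders is hypothetical.

```shell
#!/bin/bash
# Create the two training folders inside one Maildir.
# Assumption: Maildir++ layout, where a folder "name" is stored as the
# directory ".name" containing cur, new and tmp.
make_train_folders() {
    local maildir=$1 name
    for name in dspam_trainspam dspam_trainham; do
        mkdir -p "$maildir/.$name/cur" \
                 "$maildir/.$name/new" \
                 "$maildir/.$name/tmp" || return 1
    done
}

# Example call for a single user (path is illustrative):
#   make_train_folders /home/alice/Maildir
```

Run it as the mailbox owner, or fix the ownership afterwards. Users may still need to subscribe to the new folders in their mail client, or they can simply create the folders from the client instead.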

Retrain script

This script is written for use with the dovecot IMAP server.
It leaves all special files owned or created by the dovecot server untouched. See this line in the first two for-loops:

if [ -z "$(echo $foo1 | grep dovecot)" ]; then

If you are using some other IMAP/POP3 server that creates its own files within the mailbox folders, you need to modify this script to leave those files untouched!
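
One way to adapt that check for another server is to move it into a small function with a list of file name patterns. This is a sketch only: is_server_file is a made-up name, the courierimap* and maildirfolder patterns are assumptions about what Courier and Maildir++ tooling leave behind, and unlike the grep line above it matches on the file name alone rather than the whole path.

```shell
#!/bin/bash
# Return 0 if the given path looks like a server-owned bookkeeping file,
# 1 if it looks like a regular message file.
is_server_file() {
    case "${1##*/}" in                       # look at the file name only
        dovecot*)      return 0 ;;           # dovecot-uidlist, dovecot.index*, ...
        courierimap*)  return 0 ;;           # assumption: Courier IMAP files
        maildirfolder) return 0 ;;           # assumption: Maildir++ marker file
        *)             return 1 ;;
    esac
}

# Inside the for-loops the test would then read:
#   if ! is_server_file "$foo1"; then ... fi
```

Another option is to let find look only in the cur and new subdirectories, where Maildir keeps the actual messages, so server files in the folder root are never seen at all.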

#!/bin/bash
#
# grab missed spam from dspam_trainspam folders
# grab legit mails from dspam_trainham folders
#
#set -x

BASEPATHES="/home/*/Maildir"

# create temp dirs
tempDir=/tmp/dspam_trainspam
mkdir -p "$tempDir"
tempHam=/tmp/dspam_trainham
mkdir -p "$tempHam"

# get train spam
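# $BASEPATHES is left unquoted on purpose so that the glob expands
for BP in $BASEPATHES; do
        trDir=${BP}/.dspam_trainspam
        if [ -d "$trDir" ]; then
                # skip this mailbox if cd fails, so find/rm never run elsewhere
                cd "$trDir" || continue
                /usr/bin/find . -type f | while IFS= read -r foo1; do
                        if [ -z "$(echo $foo1 | grep dovecot)" ]; then
                                # only remove the original once the copy succeeded
                                cp "$foo1" "$tempDir/" && rm "$foo1"
                        fi
                done
        fi
done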
unset foo1

# get train ham
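# $BASEPATHES is left unquoted on purpose so that the glob expands
for BP in $BASEPATHES; do
        trDir=${BP}/.dspam_trainham
        if [ -d "$trDir" ]; then
                # skip this mailbox if cd fails, so find/rm never run elsewhere
                cd "$trDir" || continue
                /usr/bin/find . -type f | while IFS= read -r foo1; do
                        if [ -z "$(echo $foo1 | grep dovecot)" ]; then
                                # only remove the original once the copy succeeded
                                cp "$foo1" "$tempHam/" && rm "$foo1"
                        fi
                done
        fi
done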

chown -R amavis "$tempDir" "$tempHam"

# retrain spam
/usr/bin/find "$tempDir" -type f | while IFS= read -r foo2; do
        su -s /bin/bash -c "/usr/bin/dspam --client --user amavis --class=spam --source=error < '${foo2}'" amavis
done

# retrain ham
/usr/bin/find "$tempHam" -type f | while IFS= read -r foo3; do
        su -s /bin/bash -c "/usr/bin/dspam --client --user amavis --class=innocent --source=error < '${foo3}'" amavis
done

# cleanup
rm -rf "$tempDir" "$tempHam"
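
The script works with fixed directory names under /tmp. If that is a concern on a shared machine, a randomly named directory from mktemp is an alternative. This is a minimal sketch, not what the script above does; the helper name with_temp_dir and the dspam_train prefix are invented for illustration.

```shell
#!/bin/bash
# Run a command with a freshly created private temp directory as its last
# argument, then remove the directory again and pass on the exit status.
with_temp_dir() {
    local tmp status
    tmp=$(mktemp -d "${TMPDIR:-/tmp}/dspam_train.XXXXXX") || return 1
    "$@" "$tmp"
    status=$?
    rm -rf -- "$tmp"
    return "$status"
}

# Example: a hypothetical function collect_and_train would receive the
# directory as "$1" and use it in place of the fixed /tmp paths:
#   with_temp_dir collect_and_train
```

mktemp -d creates the directory readable only by the calling user, so the chown to amavis from the script would still be needed before dspam reads the files.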
