[clug] Spambayes

Rob Weir rweir at ertius.org
Fri May 23 01:02:55 EST 2003


On Thu, May 22, 2003 at 12:29:33PM +1000, Nemo -earth native- wrote:
> On Wed, May 21, 2003 at 12:15:19PM +1000, David Gibson did utter:
> > > 
> > > Personally, I'd be interested to see a comparison of the relative
> > > effectiveness of bogofilter, spambayes, spamassassin, etc... for
> > > example, train each system on the same block of spam and ham, and then
> > > test each on a new block of email known to contain both ham and spam... 
> 
> I'm increasingly finding I'm putting thought into this... SO, straw poll
> time. 
> 
> Q: What spamfiltering system do people use?

SpamAssassin, version 2.53.

> Q: Is it trainable?

Yes.

> Q: If it is, do you update it with ALL messages once classified as
> spam/ham, or only with false pos/neg results? (or not at all? Or
> something else?)

I carefully weeded out some mailing list folders and built a fairly
large spam folder (~2500 messages) plus maybe triple that much ham.  I ran
that through sa-learn (which took aaaages), and then just let SpamAssassin
start sorting my mail.  It actually did a fairly good job right off the
bat (this was just after 2.50 was released, I think), with no false
positives and very few false negatives.
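To put numbers on "no false positives, very few false negatives", or on the side-by-side comparison suggested earlier in the thread, one could tally each filter's verdicts on a hand-sorted test block. This is a minimal Python sketch, assuming you have already collected (true label, filter verdict) pairs yourself; the function name score_filter and the 'spam'/'ham' strings are illustrative choices, not part of SpamAssassin or any other filter's tooling.

```python
from typing import Dict, Iterable, Tuple

LABELS = ("spam", "ham")


def score_filter(results: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Tally (true_label, verdict) pairs for one filter on one test block.

    A false positive is ham the filter called spam; a false negative is
    spam the filter let through as ham.  Rates are taken over the number
    of ham and spam messages respectively.
    """
    n_spam = n_ham = fp = fn = 0
    for truth, verdict in results:
        if truth not in LABELS or verdict not in LABELS:
            raise ValueError(f"unexpected label pair: {truth!r}, {verdict!r}")
        if truth == "spam":
            n_spam += 1
            if verdict == "ham":
                fn += 1  # spam that slipped through
        else:
            n_ham += 1
            if verdict == "spam":
                fp += 1  # good mail wrongly binned
    return {
        "false_positives": fp,
        "false_negatives": fn,
        "fp_rate": fp / n_ham if n_ham else 0.0,
        "fn_rate": fn / n_spam if n_spam else 0.0,
    }


if __name__ == "__main__":
    sample = [("spam", "spam"), ("spam", "ham"), ("ham", "ham"), ("ham", "ham")]
    print(score_filter(sample))
    # prints {'false_positives': 0, 'false_negatives': 1, 'fp_rate': 0.0, 'fn_rate': 0.5}
```

Running the same labelled block through each filter and comparing the resulting dictionaries gives the kind of like-for-like comparison the thread was after; false positives usually matter more, since a lost real message costs more than a stray spam.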

To keep training it, I have these key bindings for mutt: 'y' in the
index pipes a message to sa-learn as spam (in the background) and moves
it to =spam/generic-spam/ (my spam goes into several different folders
depending on what marked it as such).  'Y' in the pager feeds a message
to sa-learn as ham.  That one is mostly for symmetry, and also a quick
fix if I'm too fast with 'y'.

macro index 'y' "<enter-command>unset wait_key\n<pipe-entry>sa-learn --no-rebuild --single --spam > /dev/null 2>&1 &\n<enter-command>set wait_key\n<save-entry>=spam/generic-spam/\n\n"

macro pager 'Y' "<enter-command>unset wait_key\n<pipe-entry>sa-learn --no-rebuild --single --ham > /dev/null 2>&1\n<enter-command>set wait_key\n"

-- 
Rob Weir <rweir at ertius.org>  |   mlspam at ertius.org   |   http://www.ertius.org/
GPG keys: 1024D/1E73B7CD, 4096R/3ABDE5EC     |      Do I look like I want a CC?
Words of the day:       India INS blackjack bluebird infowar wire transfer BCCI


