[clug] Spambayes

David Gibson david at gibson.dropbear.id.au
Sun May 25 19:52:12 EST 2003


On Sat, May 24, 2003 at 02:59:29PM +1000, Mark Triggs wrote:
> David Gibson <david at gibson.dropbear.id.au> writes:
> 
> > On Fri, May 23, 2003 at 12:49:26PM +1000, Mark Triggs wrote:
> >
> >> Once you're confident that it is unlikely to catch false positives, you
> >> can have it update its lists of spam/ham words automatically as it
> >> receives new messages, so little manual intervention is required.
> >
> > Again, that's something you want to be really cautious about:  if it
> > ever does start getting the wrong idea, it will reinforce itself and
> > drift even further off track
> 
> Absolutely - I haven't done this myself as yet. Bogofilter and gnus
> integrate quite nicely, so I have all spam going to a separate group
> which I can inspect before the messages are analysed and added to the
> spam word list. I'm sure other mail clients would offer similar
> functionality.

The trouble with this approach is that you really don't want to have
to look through the spam folder - otherwise, why use a spam filter in
the first place?  Worse, unless the filter is crap, the false
positives are likely to be things that aren't obvious to spot - doubly
so if you're also using sender whitelists.  So to be confident you've
reclassified things correctly, you'll need to give the spam folder a
thorough rather than cursory examination.

It's not obvious how to deal with this problem.  One approach I've
thought of, but which I haven't seen implemented, is to randomly let
through some percentage of mails without applying the filter.  Then
base further training on the user's classification of these mails only.
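
A rough sketch of that idea in Python, to make it concrete.  Everything
here is an assumption for illustration: `route_message`, the `classify`
callable (returns True for spam) and the 5% holdout rate are made-up
names and numbers, not part of Bogofilter, Spambayes or any other
existing filter.

```python
import random


def route_message(message, classify, holdout_rate=0.05, rng=random):
    """Decide where a message goes and whether it may be used for training.

    Returns (folder, use_for_training).  `classify` is a hypothetical
    callable that returns True when the filter thinks the message is spam.
    """
    if rng.random() < holdout_rate:
        # Holdout: skip the filter entirely and deliver to the inbox.
        # Whatever the user later does with this message (keeps it or
        # marks it as spam) is the only feedback fed back into training.
        return "inbox", True

    # Normal path: trust the filter, but never train on its own verdict,
    # so a wrong idea cannot reinforce itself.
    folder = "spam" if classify(message) else "inbox"
    return folder, False
```

The cost is that roughly `holdout_rate` of incoming spam lands in the
inbox unfiltered; in exchange, the training set is an unfiltered sample
labelled by the user rather than by the filter.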

-- 
David Gibson			| For every complex problem there is a
david at gibson.dropbear.id.au	| solution which is simple, neat and
				| wrong.
http://www.ozlabs.org/people/dgibson


