[clug] Spambayes
David Gibson
david at gibson.dropbear.id.au
Sun May 25 19:52:12 EST 2003
On Sat, May 24, 2003 at 02:59:29PM +1000, Mark Triggs wrote:
> David Gibson <david at gibson.dropbear.id.au> writes:
>
> > On Fri, May 23, 2003 at 12:49:26PM +1000, Mark Triggs wrote:
> >
> >> Once you're confident that it is unlikely to catch false positives, you
> >> can have it update its lists of spam/ham words automatically as it
> >> receives new messages, so little manual intervention is required.
> >
> > Again, that's something you want to be really cautious about: if it
> > ever does start getting the wrong idea, it will reinforce itself and
> > drift even further off track
>
> Absolutely - I haven't done this myself as yet. Bogofilter and gnus
> integrate quite nicely, so I have all spam going to a separate group
> which I can inspect before the messages are analysed and added to the
> spam word list. I'm sure other mail clients would offer similar
> functionality.
The trouble with this approach is that you really don't want to have
to look through the spam folder - otherwise, why use a spam filter in
the first place? Worse, unless the filter is crap, the false
positives are likely to be things that aren't easy to spot - doubly
so if you're also using sender whitelists. So to be confident you've
reclassified things correctly, you'll need to give the spam folder a
thorough rather than cursory examination.
It's not obvious how to deal with this problem. One approach I've
thought of, but haven't seen implemented, is to randomly let
through some percentage of mails without applying the filter, then
base further training on the user's classification of those mails only.
Since those mails never went through the filter, its own mistakes
can't feed back into the training.
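
Roughly, the random-sampling idea might look like the sketch below.
This is only an illustration in Python, not how Spambayes or
Bogofilter actually work: route_message, collect_training_set, the
is_spam callback and the 5% sample_rate are all made-up names and
values for the example.

```python
import random


def route_message(msg, is_spam, sample_rate=0.05, rng=random):
    """Decide where a message goes and whether it may be used for training.

    is_spam is a hypothetical callback wrapping whatever filter is in use.
    Returns a (folder, trainable) tuple.
    """
    if rng.random() < sample_rate:
        # Control sample: delivered without consulting the filter, so the
        # user's later verdict is not biased by the filter's own opinion.
        return "inbox", True
    # Normal path: the filter decides, and the result is never fed back.
    return ("spam" if is_spam(msg) else "inbox"), False


def collect_training_set(routed, user_labels):
    """Pick out the (msg_id, label) pairs that are safe to train on.

    routed      -- list of (msg_id, folder, trainable) tuples
    user_labels -- dict mapping msg_id to "spam" or "ham", as judged by the user
    """
    return [
        (msg_id, user_labels[msg_id])
        for msg_id, _folder, trainable in routed
        if trainable and msg_id in user_labels
    ]
```

The cost is that roughly sample_rate of incoming spam lands in the
inbox unfiltered, so the rate trades training quality against
annoyance.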
--
David Gibson | For every complex problem there is a
david at gibson.dropbear.id.au | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson