[clug] Spambayes

Wed May 21 12:15:19 EST 2003

On Wed, May 21, 2003 at 12:02:14PM +1000, Nemo -earth native- wrote:
> On Wed, May 21, 2003 at 11:40:10AM +1000, Rob Shugg did utter:
> > How does it cope with the spam we have been getting through clug? I am
> > concerned that if I mark this stuff it will treat all clug messages as spam.
> 
> I'm currently on spamassassin with its own Bayesian system turned on,
> and it's successfully caught every spam sent to the clug list (and some
> other lists I'm on), without generating any false positives.
> 
> Personally, I'd be interested to see a comparison of the relative
> effectiveness of bogofilter, spambayes, spamassassin, etc... for
> example, train each system on the same block of spam and ham, and then
> test each on a new block of email known to contain both ham and spam... 
> 
> Say, 3 months of email logged for the training, and then the next three
> months of email as the testing... with a collection of email, such a
> test could be accomplished relatively easily I imagine... sounds like it
> could make a good clug meet talk/demo/something... if an appropriate
> pseudo-delivery system could be set up, some rbl lists and other spam
> filter methods could be similarly tested.
> (http://pserver.samba.org/cgi-bin/cvsweb/junkcode/spamsum/ for example)

I've actually been thinking about doing something like this for a
while.  I already have the training and testing sets:  a complete
archive of mail for nearly three years, with hand-sorted spam for all
that time as well.

Now, in my copious free time...
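
A minimal sketch of how the train-then-test comparison proposed above
might be scripted, assuming each message is available as a
(date, text, is_spam) tuple with an ISO-format date string.  The names
split_by_date, ToyBayes and evaluate are hypothetical, and ToyBayes is
only a stand-in word-count classifier, not the algorithm used by
spambayes, bogofilter or spamassassin.  A real comparison would wrap
each filter behind the same train()/is_spam() interface.

```python
import math
from collections import Counter


def split_by_date(messages, cutoff):
    """Split (date, text, is_spam) tuples into train/test sets.

    Dates are assumed to be ISO strings (YYYY-MM-DD), so plain string
    comparison orders them chronologically.
    """
    train = [m for m in messages if m[0] < cutoff]
    test = [m for m in messages if m[0] >= cutoff]
    return train, test


class ToyBayes:
    """Stand-in naive Bayes filter (hypothetical, for illustration only)."""

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}
        self.totals = {True: 0, False: 0}

    def train(self, text, is_spam):
        for word in text.lower().split():
            self.counts[is_spam][word] += 1
            self.totals[is_spam] += 1

    def is_spam(self, text):
        # Laplace smoothing so unseen words do not zero out a class.
        vocab = len(set(self.counts[True]) | set(self.counts[False])) or 1
        score = 0.0
        for word in text.lower().split():
            p_spam = (self.counts[True][word] + 1) / (self.totals[True] + vocab)
            p_ham = (self.counts[False][word] + 1) / (self.totals[False] + vocab)
            score += math.log(p_spam / p_ham)
        return score > 0


def evaluate(spam_filter, train, test):
    """Train on one block of mail, then count errors on a later block."""
    for _, text, is_spam in train:
        spam_filter.train(text, is_spam)
    result = {"false_positives": 0, "false_negatives": 0, "ham": 0, "spam": 0}
    for _, text, is_spam in test:
        verdict = spam_filter.is_spam(text)
        result["spam" if is_spam else "ham"] += 1
        if verdict and not is_spam:
            result["false_positives"] += 1
        elif is_spam and not verdict:
            result["false_negatives"] += 1
    return result
```

Running evaluate() once per filter on the same train/test split gives
directly comparable false-positive and false-negative counts.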

-- 
David Gibson			| For every complex problem there is a
david at gibson.dropbear.id.au	| solution which is simple, neat and
				| wrong.
http://www.ozlabs.org/people/dgibson