[clug] Spambayes

Bill Clarke llib at computer.org
Fri May 23 09:14:27 EST 2003


Nemo -earth native- wrote, On 22/05/03 12:29:
> Q: What spamfiltering system do people use?

spamassassin, v 2.54

> Q: Is it trainable?

yes

> Q: If it is, do you update it with ALL messages once classified as
> spam/ham, or only with false pos/neg results? (or not at all? Or
> something else?)

i initially trained on a few months of previously saved spam and my
archives of ham.  all new spam and misclassified ham is fed back to
sa-learn and archived, so i can re-train if necessary.  occasionally i
will feed high-scored ham (or those with high bayes score) back to
sa-learn so it knows they're ham.  i recently changed spamassassin to
use DB_File (recommended, but not required) which sped up the learning
considerably; but i had to reinitialise the learner.  no remote tests
are done.  i also lowered my spam score limit to 3; this has increased
my false positives but i clear my low-scoring spam every couple of days.
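
the feedback policy above could be sketched roughly as follows.  this is
only an illustration in python, not anything from my setup: the function
names and NEAR_MISS_MARGIN are made up, "occasionally" is reduced to a
fixed margin below the limit, and the header is assumed to look like the
X-Spam-Status line that 2.5x writes ("Yes, hits=7.3 required=3.0 ...").

```python
import re

SPAM_THRESHOLD = 3.0    # the lowered spam score limit mentioned in the post
NEAR_MISS_MARGIN = 1.0  # hypothetical: how close to the limit counts as "high-scored"

# Assumed X-Spam-Status shape: "Yes, hits=7.3 required=3.0 tests=..."
# ("score=" is accepted too, since later releases use that word).
_STATUS = re.compile(r"^(Yes|No),\s+(?:hits|score)=(-?\d+(?:\.\d+)?)")


def parse_status(header):
    """Return (flagged_as_spam, score), or None if the header is unrecognised."""
    match = _STATUS.match(header.strip())
    if match is None:
        return None
    return match.group(1) == "Yes", float(match.group(2))


def feedback_action(actual, status_header):
    """Decide what to feed back to the learner: 'spam', 'ham', or None.

    `actual` is the human verdict for the message, either 'spam' or 'ham'.
    """
    if actual == "spam":
        return "spam"  # all new spam is fed back
    parsed = parse_status(status_header)
    if parsed is None:
        return None  # cannot tell how the filter scored it
    flagged, score = parsed
    if flagged:
        return "ham"  # misclassified ham (false positive)
    if score >= SPAM_THRESHOLD - NEAR_MISS_MARGIN:
        return "ham"  # ham that scored uncomfortably high
    return None  # ordinary ham is left alone
```

the point of the last branch is that ordinary, low-scoring ham never
reaches the learner; only mistakes and near misses do.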

however, unlike rob weir, i use imap from a different server than my
mailer (usually moz 1.4b).  this makes it more difficult to feed back to
sa-learn.  so what i do is forward (as attachments) my set of spam (or
ham) to a special email address (e.g.,
my_account+this_is_spam at my_server).  i wrote a perl script
(sa-learn-attach) which takes a message of attachments and learns from
each of the attachments; it doesn't call sa-learn --- that would be
quite inefficient --- instead it links directly into the SpamAssassin
learner.  an early procmail filter links the two.  if anyone is
interested in it, an early version is in the spamassassin mail archives,
or i can email it to you.  it requires MIME::Tools.
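
for anyone who wants the idea without the perl, here is a minimal python
sketch of just the unpacking step.  it is not sa-learn-attach: it uses
python's standard email package instead of MIME::Tools, it does not touch
the SpamAssassin learner at all, and it assumes the plus-address shows up
in the To header.  the "this_is_ham" tag and all function names are made
up for illustration; only "this_is_spam" is from the example above.

```python
import email
from email.utils import parseaddr

# Only "this_is_spam" appears in the post; "this_is_ham" is a guess.
TAGS = {"this_is_spam": "spam", "this_is_ham": "ham"}


def label_from_recipient(address):
    """Map 'user+tag@host' to 'spam' or 'ham'; None if there is no known tag."""
    local_part = parseaddr(address)[1].partition("@")[0]
    tag = local_part.partition("+")[2]
    return TAGS.get(tag)


def attached_messages(part):
    """Yield each message/rfc822 attachment, without descending into it."""
    if part.get_content_type() == "message/rfc822":
        # The parser stores an attached message as a one-element payload list.
        yield part.get_payload(0)
    elif part.is_multipart():
        for sub in part.get_payload():
            yield from attached_messages(sub)


def split_forwarded(raw_bytes):
    """Return (label, [attached messages as bytes]) for one forwarded mail."""
    outer = email.message_from_bytes(raw_bytes)
    label = label_from_recipient(outer.get("To", ""))
    return label, [m.as_bytes() for m in attached_messages(outer)]
```

each element of the returned list is one original message, ready to be
handed to whatever learner you use; a label of None means the mail did
not arrive at one of the training addresses and should be left alone.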

cheers,
/lib
-- 
/lib BillClarke PostdoctoralFellow CompSci ANU cs.anu.edu.au/CC-NUMA
http://llib.cjb.net llib at computer.org  tel:+61-2-6125x5687 fax:x0010
PGPid:B381EE7DB7D3E58F17248C672E2DA124ADADF444 GNU unix LaTeX XPilot
Buffy DrWho Goodies StarTrek XFiles Origami SML SMP MPI mozilla tcsh
Asimov Bear Clarke Donaldson Volleyball Ultimate Cricket emacs C++ X
Jordan Kay Lackey Martin Stasheff DeepPurple H&C KLF Queen PinkFloyd