[clug] spamsum usage in the real world

Tue Nov 14 09:51:13 GMT 2006

Based on a cursory glance at the spampot mailbox (which has actually
been active for a little while now, in preparation for this
possibility), there is a fair amount of spam which is purely duplicated:
most obviously in the subject line, but by visual scanning also in the
entire body (dunno about attachments though). Judging by the subject
lines of caught spam alone, spammers appear to send blocks of the same
mail over a day or two before changing.
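
To act on that observation, a spampot signature list could simply forget
entries after a few days. A minimal Python sketch, assuming signatures
are plain strings added in time order; `SignatureWindow` and the 3-day
default are hypothetical choices of mine, not anything from spamsum:

```python
import time
from collections import deque
from typing import Deque, List, Optional, Tuple

SECONDS_PER_DAY = 86400


class SignatureWindow:
    """Keep spampot signatures only for the last `max_age_days` days.

    Assumption: campaigns reuse near-identical mail for a day or two, so
    older signatures mostly add comparison cost rather than catches.
    Entries must be added in time order (oldest first).
    """

    def __init__(self, max_age_days: float = 3.0):
        self.max_age = max_age_days * SECONDS_PER_DAY
        self._entries: Deque[Tuple[float, str]] = deque()

    def add(self, signature: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._entries.append((now, signature))
        self.prune(now)

    def prune(self, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        # Oldest entries sit at the left, so stop at the first young one.
        while self._entries and now - self._entries[0][0] > self.max_age:
            self._entries.popleft()

    def signatures(self) -> List[str]:
        return [sig for _, sig in self._entries]
```

Three days leaves a bit of slack over the day-or-two campaign length;
it's a guess, not a measured figure.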

Just quietly, I also ran a test of spamsum earlier today, and with
a couple of thousand signatures from the spampot only, it was capturing
roughly 6 messages for every message going into the spampot (and thus
performing on par with what spamassassin was catching at a score of
15+).

Note that's only a few hours of testing, though.
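
A pre-filter of that kind could look roughly like this. A rough Python
sketch only: `PreFilter`, the `signature_of` and `fallback` hooks (the
latter standing in for a call out to spamc) and the threshold of 80 are
all hypothetical, and difflib's ratio is swapped in for spamsum's own
scoring:

```python
import difflib
from typing import Callable, Iterable


def similarity(a: str, b: str) -> int:
    """0-100 score. Stand-in only: difflib's ratio, not spamsum's scoring."""
    return round(100 * difflib.SequenceMatcher(None, a, b).ratio())


class PreFilter:
    """Cheap signature check in front of an expensive classifier (hypothetical)."""

    def __init__(self, signatures: Iterable[str],
                 signature_of: Callable[[str], str],
                 fallback: Callable[[str], bool],
                 threshold: int = 80):
        self.signatures = list(signatures)
        self.signature_of = signature_of  # e.g. a fuzzy hash function
        self.fallback = fallback          # e.g. a wrapper around spamc
        self.threshold = threshold
        self.caught = 0       # messages stopped by a signature match
        self.passed_on = 0    # messages handed to the fallback

    def is_spam(self, message: str) -> bool:
        sig = self.signature_of(message)
        if any(similarity(sig, s) >= self.threshold for s in self.signatures):
            self.caught += 1
            return True  # the expensive classifier never sees this one
        self.passed_on += 1
        return self.fallback(message)
```

Comparing `caught` against the number of messages arriving in the
spampot over the same period gives the kind of ratio quoted above.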

And in the long run, I definitely agree: spam will trend towards being
more unique over time. Spamsum finds "similarness" rather than exact
matches, though, so it still seems quite effective.
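
For anyone wondering how it finds similarness: spamsum is a
context-triggered piecewise hash. A rolling hash picks chunk boundaries
from the content itself, each chunk contributes one character to the
signature, and signatures are compared by edit distance. The Python
below is a simplified sketch of that idea under my own assumptions
(fixed block size, a plain 7-byte rolling sum, FNV-1 chunk hashes, an
unweighted Levenshtein score); it won't produce the same signatures as
tridge's spamsum:

```python
import string
from collections import deque

B64 = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
WINDOW = 7                 # bytes in the rolling window (assumed)
FNV_PRIME = 0x01000193     # 32-bit FNV prime
FNV_OFFSET = 0x811C9DC5    # 32-bit FNV offset basis


def fuzzy_hash(data: bytes, blocksize: int = 64) -> str:
    """Emit one base64 character per content-defined chunk of `data`."""
    sig = []
    window = deque()
    roll = 0
    piece = FNV_OFFSET
    for byte in data:
        # Rolling sum over the last WINDOW bytes decides where chunks end,
        # so an edit only disturbs the chunks near it.
        window.append(byte)
        roll += byte
        if len(window) > WINDOW:
            roll -= window.popleft()
        # FNV-1 hash of the bytes in the current chunk.
        piece = ((piece * FNV_PRIME) & 0xFFFFFFFF) ^ byte
        if roll % blocksize == blocksize - 1:
            sig.append(B64[piece % 64])
            piece = FNV_OFFSET
    sig.append(B64[piece % 64])  # trailing partial chunk
    return "".join(sig)


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> int:
    """0-100 score: 100 means identical signatures."""
    if not a and not b:
        return 100
    return round(100 * (1 - edit_distance(a, b) / max(len(a), len(b))))
```

Two messages are only comparable here if both are hashed with the same
block size. Real spamsum picks the block size from the message length
and stores signatures at two block sizes, so messages of somewhat
different lengths can still be compared.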

dspam was my next consideration :)

On Tue, Nov 14, 2006 at 07:45:46PM +1100, Kim Holburn did utter:
> Currently spammers use botnets and have huge computational  
> resources.  Many spams are generated individually on the fly and  
> filled with random text trawled from the net or images with random  
> dots and animated layers to foil gocr.  I don't see that any kind of  
> signature generation will really help.  I could be wrong.
> 
> What about dspam?
> 
> 
> http://www.eweek.com/article2/0,1895,2051950,00.asp
> 
> >The Spammers Strike Back
> >
> >
> >But the most interesting factor, also from the "anti-anti-spam  
> >spam" department, is advances in image spam. We've seen image spam  
> >for years, but it's gotten much fancier. Incidentally, Borderware  
> >is expected to announce new technology next week related to  
> >fighting image spam.
> >
> >You can't just block images in e-mail. (Well, technically you can,  
> >but people wouldn't stand for it.) So you have to figure out,  
> >somehow, which are the bad guys. There have been two basic methods  
> >employed: fingerprinting and OCR.
> >
> >With fingerprinting you try to identify a specific graphic through  
> >some set of characteristics, perhaps as simple as a CRC of its  
> >contents, perhaps some more complicated pattern recognition. You  
> >can determine, even offline by human examination, if the graphic is  
> >spam, and then when another graphic with the same fingerprint shows  
> >up you block it.
> >
> >The counter to this technique is actually pretty obvious: By  
> >modifying relatively few pixels in the graphic, say by changing the  
> >color slightly in every 10th pixel, you distort the fingerprint.  
> >Now introduce some randomization into that, by randomizing the  
> >pattern of changed pixels and the color shift, and fingerprinting  
> >becomes far more difficult. There are vendors working on solutions  
> >though, as we'll hear in the near future.
> >
> >OCR works by attempting to "read" the characters out of the  
> >graphics. It's an old idea; a friend of mine got an OCR patent back  
> >in the mid-'80s that's already expired. OCR works pretty well under  
> >stable, simple circumstances, like black block characters on white  
> >paper, but it's not hard to make life difficult for an OCR algorithm.
> >
> >Random dark clumps of pixels on the image create the analog of dirt  
> >on the paper. Or how about "speckling," in which patterns and color  
> >changes are inserted into the character drawing in the graphic?
> >
> >It's not unlike a "captcha," one of those Turing tests you have to  
> >pass in order to sign up for a Yahoo Mail account or similar  
> >things. They take a dirty graphic and draw characters on it, often  
> >wavy and distorted characters. You can read them, but they are hard  
> >for a program to read.
> >
> >But wait, it gets worse. Spammers are taking advantage of GIF file  
> >features to make things even harder for anti-spam tools. The first  
> >technique is to use an animated GIF and to put the spam message on  
> >a second or subsequent and last image. There's a good chance that  
> >anti-spam software will only examine the first image. There are  
> >also layered GIFs that allow you to place different characters of a  
> >message in layers and appear to the user to be a single flat image.  
> >But software that examines the image will not easily see the  
> >picture the user sees.
> >
> >It's interesting that this latest flare-up in the spam war is  
> >costing both spammers and anti-spammers dearly. It's probably less  
> >of an issue for spammers because their costs are largely fixed and  
> >they offload much of the bandwidth and processing costs on to  
> >unsuspecting infected bot users. But anti-spammers are incurring  
> >much greater costs in processing and bandwidth.
> >
> >I've always been a fan of outsourced services like Postini for mail  
> >security, and it's times like this that they really save your butt.  
> >Are your network and your mail security infrastructure ready for a  
> >60 percent increase in spam? I suspect a lot of companies with  
> >underpowered appliances are losing mail from overloaded hardware  
> >these days. But Postini can handle it, and to the extent that it is  
> >effective in blocking the spam, you don't even see the increase in  
> >bandwidth, let alone the spam.
> >
> >Of course, Postini is not perfect in blocking spam; nobody is,
> >which means that it has to be going up for everyone. If your
> >anti-spam system blocks 97 percent of spam and you get 1,000 spam
> >messages a day, a 60 percent increase means 18 more spam messages
> >getting through than before.
> >
> >Some time ago I asked in a column, rhetorically, whether people  
> >would put up with spam as it approached 100 percent of the corpus  
> >of e-mail. According to Postini we're at 91 percent and rising, and  
> >I have to ask again: How bad does it have to get? The truth is that  
> >it can get a lot worse than it is now before enough people  
> >contemplate really serious measures.
> >
> >Security Center Editor Larry Seltzer has worked in and written  
> >about the computer industry since 1983.
> 
> 
> On 2006/Nov/14, at 6:52 PM, Nemo wrote:
> 
> >Hey all
> >
> >I've been tackling a small problem of spam (ironically, some of which
> >claims to be able to get rid of any small problems I might have ;) of
> >late.
> >
> >Currently we're just running spamassassin (via spamd/spamc) over all
> >messages (many customers, we don't want to teach them how to train
> >a bayesian filter), but this is a chunky performance hit.
> >
> >So I was thinking of throwing tridge's spamsum into the mix - ie, find
> >some old addresses on the domain which gather a heap of email that can
> >be assumed to be receiving spam only, and use those to generate spamsum
> >signatures - which can then be used as a check on all incoming messages
> >before the resource-hungry spamassassin gets to see them.
> >
> >Has anyone else done anything like this - or indeed, heard of spamsum
> >being used in the wild at all?  A search of the electric google showed
> >up many spamsum references, but nothing about people actually  
> >*using* it.
> >
> >For what I'm planning, this seems to be the rundown of pros and cons...
> >
> >Pro:
> >* avoid the spamassassin overhead on any message caught by spamsum,
> >which is a much lighter-weight utility
> >
> >Con:
> >* For any email spamsum doesn't recognise, it's an extra process
> >spawned, in part negating the resource-saving aspect of running
> >spamsum at all
> >* The spam honeypot (spampot) addresses are assumed to only receive
> >spam. However, they could also receive:
> >    [a] personal email
> >    [b] chainmail funnies
> >    [c] newsletters
> >    ...[a] won't matter since its spamsum signature is unlikely to
> >    match anything else anyway. [b] and [c] could be a potential
> >    source of false positives, however.
> >* Receiving the spampot messages takes up bandwidth and resources that
> >could be saved by rejecting them earlier. It also takes drive space,
> >since I'd want to keep them for a few days minimum.
> >
> >Have I missed anything? My thoughts are that the pros will outweigh the
> >cons and make for more email happiness all around.
> >
> >Thoughts appreciated. I feel like I'm heading into some unknown
> >territories here :)
> >
> >.../Nemo
> >-- 
> >linux mailing list
> >linux at lists.samba.org
> >https://lists.samba.org/mailman/listinfo/linux
> 
> --
> Kim Holburn
> IT Network & Security Consultant
> Ph: +61 2 61258620 M: +61 417820641  F: +61 2 6230 6121
> mailto:kim at holburn.net  aim://kimholburn
> skype://kholburn - PGP Public Key on request
> Cacert Root Cert: http://www.cacert.org/cacert.crt
> Aust. Spam Act: To stop receiving mail from me: reply and let me know.
> Use ISO 8601 dates [YYYY-MM-DD] http://www.saqqara.demon.co.uk/datefmt.htm
> 
> Democracy imposed from without is the severest form of tyranny.
>                           -- Lloyd Biggle, Jr. Analog, Apr 1961
> 

-- 
  ------------------------------------------ --------------------------
                                                    earth native