[distcc] LZO compression

Martin Pool mbp at samba.org
Mon Dec 16 09:50:00 GMT 2002


On 16 Dec 2002, Stephen White <swhite at decisionsoft.com> wrote:
> > From Martin Pool <mbp at samba.org>
> > Date: Monday, 16 Dec 2002, 00:52
> >
> > This might be a good time to add a message digest check to the
> > protocol to protect against transmission errors.  That might allow use
> > of the fast/dangerous lzo decompressor, although doing so would depend
> > on calculating and knowing the digest before sending the body, which
> > conflicts with the idea of not loading it all into memory.
> 
> Having just looked at the OpenSSL libraries, its MD5sum code allows the
> data to be passed a chunk at the time .. so the md5sum could be
> calculated as the file is compressed or decompressed and the value sent
> after the body.

I probably said MD5 before, but actually MD4 would be a better
choice.  It's not strong against a malicious attack, but it is very
strong against random errors introduced by an (only apparently
malicious :-) network card.  It's also substantially cheaper to
compute than MD5.

There may be some other algorithm which is even better.
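
As a rough sketch of the chunk-at-a-time idea quoted above (not distcc
code): the digest is updated as each chunk is compressed, and is only
written after the body, so nothing has to be held in memory whole.
Everything here is an assumption made for illustration: the names
send_body and receive_body, the chunk size, zlib standing in for LZO,
and MD5 standing in for MD4 (Python's hashlib may not offer MD4 on
current OpenSSL builds).

```python
import hashlib
import io
import zlib

CHUNK = 64 * 1024  # hypothetical chunk size, not taken from distcc


def send_body(src, dst, chunk_size=CHUNK):
    """Compress src into dst chunk by chunk, then append a digest trailer."""
    digest = hashlib.md5()     # stand-in for MD4
    comp = zlib.compressobj()  # stand-in for LZO
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)   # digest covers the uncompressed bytes
        dst.write(comp.compress(chunk))
    dst.write(comp.flush())
    dst.write(digest.digest())  # 16-byte trailer, sent after the body


def receive_body(src, dst, chunk_size=CHUNK):
    """Decompress src into dst; return True if the trailing digest matches."""
    digest = hashlib.md5()
    decomp = zlib.decompressobj()
    trailer = b""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        if decomp.eof:
            # The compressed stream has ended; the rest is trailer bytes.
            trailer += chunk
            continue
        plain = decomp.decompress(chunk)
        digest.update(plain)
        dst.write(plain)
        # Empty until the end of the compressed stream is reached.
        trailer += decomp.unused_data
    return decomp.eof and trailer == digest.digest()


if __name__ == "__main__":
    data = b"int main(void) { return 0; }\n" * 1000
    wire = io.BytesIO()
    send_body(io.BytesIO(data), wire)
    wire.seek(0)
    out = io.BytesIO()
    print(receive_body(wire, out), out.getvalue() == data)
```

The receiver here relies on the zlib stream marking its own end, so it
can tell body from trailer.  A real protocol would more likely carry an
explicit length or token for that.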

> This calculation might want adding to the non-compressing read/write
> routines too, though TCP/IP already includes a basic checksum and in
> this circumstance there isn't really anywhere else for it to get
> corrupted.

Errors not caught by TCP are pretty rare.  (People use HTTP and FTP
with no checksums all the time and rarely see errors.)  However, they
can happen.  If it's not too expensive, it would be good to do
checksums on all transmissions.  
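
To illustrate the kind of error a simple checksum lets through, here
is a small sketch.  It uses a 16-bit ones' complement sum in the style
of RFC 1071; the real TCP checksum also covers the header and a
pseudo-header, so this only shows the arithmetic property.  The
function name and sample bytes are made up for the example, and MD5
again stands in for MD4.

```python
import hashlib


def internet_checksum(data):
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        # Fold carries back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = b"\x12\x34\xab\xcd"
swapped = b"\xab\xcd\x12\x34"  # same 16-bit words, in a different order

# Addition is order-independent, so the simple sum cannot see the swap.
same_sum = internet_checksum(original) == internet_checksum(swapped)
# A message digest depends on byte order, so it does.
same_md5 = hashlib.md5(original).digest() == hashlib.md5(swapped).digest()

print(same_sum, same_md5)
```

Reordered 16-bit words are only one class of error that slips past
such a sum, but they show why a digest over the body adds protection.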

This probably requires, or at least deserves, a protocol version bump,
but I think that's no big deal.

You might like to look at the performance test suite I'm doing in the
bench/ directory in CVS.  It's still rough, but it should give a way
of checking the impact of changes like turning on compression, and
also of easily checking for changes that break some project.

-- 
Martin 


