[distcc] Re: Using distcc for other tasks (distributed "filtering")

Ben Elliston bje at wasabisystems.com
Sun Feb 15 20:33:11 GMT 2004


Christian Leber <christian at leber.de> writes:

> I really enjoy using distcc for compilations; now I'm searching for
> a simple way to distribute data compression over a network
> (compressing Knoppix takes about 100 minutes every time).  I just
> need to send 64 KB blocks to the other boxes, compress them there and
> get them back (for example, 30000 blocks of 64 KB), so it's basically
> the same as compiling.

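gzip has a nice property:
        cat A B | gzip > foo.gz

is functionally equivalent to:
        (gzip < A && gzip < B) > bar.gz

The two files are not byte-for-byte identical, but gunzip turns either
one back into the contents of A followed by B: a .gz file may hold
several compressed members back to back, and gunzip decompresses them
one after another.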
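A minimal sketch for checking this property on a local machine, assuming any standard gzip and a POSIX shell. A and B are throwaway sample files the script creates, and foo.gz / bar.gz mirror the names in the commands above.

```shell
#!/bin/sh
# Check that two back-to-back gzip members decompress to the same data
# as a single member made from the concatenated input.
# A and B are sample files created only for this demonstration.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf 'first piece of data\n'  > "$tmp/A"
printf 'second piece of data\n' > "$tmp/B"

cat "$tmp/A" "$tmp/B" | gzip > "$tmp/foo.gz"            # one member
(gzip < "$tmp/A" && gzip < "$tmp/B") > "$tmp/bar.gz"    # two members

gzip -dc "$tmp/foo.gz" > "$tmp/foo.out"
gzip -dc "$tmp/bar.gz" > "$tmp/bar.out"

# The compressed files differ byte for byte; the decompressed data does not.
if cmp -s "$tmp/foo.out" "$tmp/bar.out"; then
    echo "same data after decompression"
else
    echo "decompressed data differs"
fi
```

bar.gz comes out a little larger, since each member carries its own header and trailer and starts compressing with an empty dictionary.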

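The best way to parallelise your compression work would be to divide
your workload into N pieces, where N is the number of machines you
have.  Use split(1) to break the input into N pieces, copy one piece
to each host and gzip it there, then bring the compressed pieces back
and "cat" them together in their original order (split's default
names, xaa, xab, ..., already sort in that order).  The result is a
single valid .gz file.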
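The recipe can be sketched as a small POSIX shell function. par_gzip and HOSTS are hypothetical names, and reaching the boxes over ssh is an assumption (passwordless ssh, with gzip installed on each host). With HOSTS empty the function falls back to local background jobs, which is also a convenient way to try it on one machine.

```shell
#!/bin/sh
# par_gzip INPUT OUTPUT [N]
# Split INPUT into at most N pieces, gzip the pieces in parallel and
# concatenate the compressed members, in order, into OUTPUT.
# HOSTS (hypothetical) is a space-separated list of machines reachable
# with ssh; if it is unset or empty, local background gzips are used.
par_gzip() {
    input=$1 output=$2 n=${3:-4}

    # Chunk size in bytes, rounded up so split(1) makes at most n pieces.
    total=$(wc -c < "$input") || return 1
    size=$(( (total + n - 1) / n ))
    [ "$size" -gt 0 ] || size=1

    work=$(mktemp -d) || return 1
    split -b "$size" "$input" "$work/part." || { rm -rf "$work"; return 1; }

    set -- ${HOSTS:-}          # unquoted on purpose: one word per host
    pids=
    for chunk in "$work"/part.*; do
        if [ $# -gt 0 ]; then
            host=$1; shift; set -- "$@" "$host"   # round-robin over HOSTS
            ssh "$host" gzip -c < "$chunk" > "$chunk.gz" &
        else
            gzip -c < "$chunk" > "$chunk.gz" &
        fi
        pids="$pids $!"
    done

    # Wait for each job individually so that a failed chunk is noticed.
    status=0
    for pid in $pids; do
        wait "$pid" || status=1
    done

    # split names the pieces part.aa, part.ab, ..., so the glob keeps
    # the original order of the input.
    if [ "$status" -eq 0 ]; then
        cat "$work"/part.*.gz > "$output" || status=1
    fi
    rm -rf "$work"
    return "$status"
}
```

For example, after `HOSTS="box1 box2 box3"; export HOSTS` (hypothetical machine names), `par_gzip knoppix.img knoppix.img.gz 3` compresses one third of the file on each box, and `gzip -t knoppix.img.gz` checks the result. Using N large pieces rather than thousands of 64 KB blocks keeps the number of ssh round trips and gzip member headers small; the trade-off is that the whole job finishes only when the slowest host does, and the uncompressed data still has to cross the network once.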

Ben



