[distcc] Re: Using distcc for other tasks (distributed "filtering")

Christian Leber christian at leber.de
Sun Feb 15 21:45:29 GMT 2004


On Mon, Feb 16, 2004 at 07:33:11AM +1100, Ben Elliston wrote:
> gzip has a nice property that:
>         cat A B | gzip > foo.gz
> 
> is functionally equivalent to:
>         (gzip < A && gzip < B) > bar.gz
> 
> The best way to parallelise your compression work would be to divide
> your workload into N pieces, where N is the number of machines you
> have.  Use split(1) to break the input into N pieces and use each host
> to gzip one chunk.  At the end, "cat" the result together again.

Exactly. I have about 30000 pieces; my problem is that I don't know how
to get distcc to pipe them through gzip(*) on the remote boxes.

Christian Leber

(*) In fact it's the gzip-compatible compression from 7z, which takes
about 15x the time; for plain gzip this would not be worth it. The
output is still decompressible with normal gzip.
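
For reference, the split/compress/concatenate approach quoted above can be
tried on one machine with a short shell sketch. This is only an
illustration under assumptions: local background jobs stand in for the
remote hosts, the scratch directory and the piece.* file names are made up
for the example, plain gzip is used rather than the 7z compressor, and
nothing here involves distcc itself.

```shell
#!/bin/sh
# Sketch only: local background jobs stand in for the remote boxes.
set -e
work=$(mktemp -d)   # scratch directory (hypothetical); remove it when done

# Build some sample input, a few hundred kilobytes of text.
i=0
while [ "$i" -lt 20000 ]; do
    echo "line $i of sample data"
    i=$((i + 1))
done > "$work/input"

# Break the input into fixed-size pieces: piece.aa, piece.ab, ...
split -b 65536 "$work/input" "$work/piece."

# Compress every piece independently; each job could run on another host.
for p in "$work"/piece.*; do
    gzip "$p" &
done
wait

# Concatenate the compressed pieces in order (the glob sorts by name).
cat "$work"/piece.*.gz > "$work/output.gz"

# The concatenated members decompress to the original input.
gunzip -c "$work/output.gz" | cmp - "$work/input" && echo "round trip OK"
```

The order of concatenation matters: the pieces must be joined in the same
order in which split(1) produced them, which the sorted glob takes care of
here.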

-- 
  "Omnis enim res, quae dando non deficit, dum habetur et non datur,
   nondum habetur, quomodo habenda est."       (Aurelius Augustinus)
  Translation: <http://gnuhh.org/work/fsf-europe/augustinus.html>


