[distcc] Preprocessing limit

Martin Pool mbp at samba.org
Sun Feb 22 23:44:14 GMT 2004


On 20 Feb 2004, john moser <bluefoxicy at linux.net> wrote:
> try reading it again, from the beginning.
>
> Let me try it like this.
> 
> DISTRIBUTED COMPUTING NETWORK: 3700 NODES
> NODE X:  1.5 GHz Processor
> TIME TO PROCESS 5 PARALLEL PREPROCESSINGS:  45 SECONDS
>   NODES USED:  5
> TIME TO PROCESS 10 PARALLEL PREPROCESSINGS:  130 SECONDS
>   NODES USED:  10
> TIME TO PROCESS 3700 PARALLEL PREPROCESSINGS:  49 DAYS, 6 HOURS, 15 MINUTES, 27 SECONDS
>   NODES USED:  about 1-2 at a time, as the preprocessings slowly finish on those last 3 days and get sent out at random times.
> 
> You can NOT do as many preprocesses in parallel as you have nodes sometimes.
> To MAXIMIZE efficiency, you need to specify -j$NUMBEROFNODES and LOCK the
> number of parallel preprocessing operations to a lower number.  Then, WHILE
> one node is compiling a complex source file, you can preprocess AND send out
> another job, possibly BEFORE that one finishes.
> 
> Simple enough?  The idea is to get the job OFF the box ASAP so it can come
> back FINISHED ASAP.
> 
> Now, THNK this time, before you incur my wrath again.

Very funny.

Remember kids, THNK first!

> 
> On 19 Feb 2004, john moser <bluefoxicy at linux.net> wrote:
> 
> > distcc needs a way to limit how many preprocessing jobs it can run at once.
> > It may be advantageous to have, say, 150,000 make jobs (if you have a 100000
> > node computing network, for example; let's say HP decides it wants to wait
> > 2 minutes to compile a new operating system and all its tools).  Running
> > 150,000 parallel preprocessings will take hours.  After maybe 80% of that time,
> > a few jobs will trickle out to the compiling nodes.
> >
> > Instead, one could limit how many preprocessings can occur.  The distcc would
> > sleep until there's a free local preprocessing slot, and then run that
> > preprocessing, then ship out to a free node.  In this way, the actual
> > efficiency will more effectively approach the theoretical efficiency.
> > 
> > Think about it.  Need you wait 10 minutes with 50 jobs before shoving them out
> > the network?  Are you always going to have enough processing power to get close
> > to theoretical values?  What's the best way to get off the box
> > ASAP?
> 
> That's what the -j level is for.
> 
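
The proposal quoted above (a high -j level combined with a cap on how many
preprocessing jobs run locally at once) amounts to a counting semaphore. The
Python below is only a rough sketch of that idea, not distcc code: run_build,
cpp_slots, and the sleep durations are hypothetical names and values chosen
for illustration, and the sleeps merely stand in for preprocessing and remote
compilation without modeling real CPU contention.

```python
import threading
import time


def run_build(num_jobs, cpp_slots, cpp_seconds=0.01, remote_seconds=0.03):
    """Simulate `make -j num_jobs` where at most cpp_slots jobs preprocess at once.

    Returns (peak concurrent preprocessing jobs, number of finished jobs).
    All names and timings are illustrative assumptions, not distcc internals.
    """
    slots = threading.BoundedSemaphore(cpp_slots)  # local preprocessing slots
    lock = threading.Lock()
    state = {"active": 0, "peak": 0, "done": 0}

    def job():
        # Sleep until a local preprocessing slot is free, then preprocess.
        with slots:
            with lock:
                state["active"] += 1
                state["peak"] = max(state["peak"], state["active"])
            time.sleep(cpp_seconds)  # stand-in for running the preprocessor
            with lock:
                state["active"] -= 1
        # The slot is released before the job is "shipped out", so another
        # job can start preprocessing while this one compiles remotely.
        time.sleep(remote_seconds)  # stand-in for the remote compile
        with lock:
            state["done"] += 1

    threads = [threading.Thread(target=job) for _ in range(num_jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"], state["done"]


if __name__ == "__main__":
    peak, done = run_build(num_jobs=20, cpp_slots=4)
    print(f"jobs finished: {done}, peak concurrent preprocessing: {peak}")
```

With num_jobs=20 and cpp_slots=4, all 20 jobs are started together (the -j
level), but no more than 4 are ever in the preprocessing step at the same
moment; the rest wait for a slot and are sent out as soon as they finish it.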


-- 
Martin 


