[distcc] speedup with >2 hosts

Harold L Hunt II huntharo at msu.edu
Tue Apr 6 02:32:35 GMT 2004


Stefan,

SCHMID Stefan wrote:
> hi,
> 
> i'm using a system of five dual-CPU distcc-hosts for distributed 
> compilation. with all my projects, one additional host is very useful 
> but the third to fifth hardly contribute anything at all. moreover, 
> there is quite some variation in the execution time although all hosts 
> do nothing else besides compilation (maybe because of the sleeping times 
> when hosts are locked?). it is therefore also hard to tell which 
> -j-option is the best for a given number of hosts...
> 
> all in all, with 5 hosts (10 cpu's) i hardly get any speedups greater 
> than 2 (compared to one host, i.e. 2 cpu's). do you have an idea where i 
> do something wrong?

Try not putting 'localhost' in DISTCC_HOSTS. You are likely reaching
the point where the controlling host cannot preprocess, distribute, and
reassemble files any faster while also compiling a file locally. This
is always the limiting factor on Cygwin; in fact, on Cygwin you hardly
get any CPU usage on any remote host, because 75% of compilation time
under Cygwin is spent forking processes and preprocessing files.
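To see why the controlling host caps the speedup, here is a rough Amdahl's-law style estimate in Python. It is a simplification, not a measurement: it assumes the quoted 75% (forking and preprocessing) runs serially on the controlling host, and that the remaining 25% spreads perfectly across n_cpus remote CPUs. The function name max_speedup is hypothetical.

```python
def max_speedup(serial_fraction: float, n_cpus: int) -> float:
    """Amdahl's law bound on speedup.

    Assumption: serial_fraction of the work (forking + preprocessing) happens
    only on the controlling host; the rest is spread evenly over n_cpus.
    """
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cpus)


if __name__ == "__main__":
    # Cygwin case from the text: roughly 75% of the time is serial work.
    for cpus in (2, 10, 1000):
        print(cpus, round(max_speedup(0.75, cpus), 2))
```

Under these assumptions, even with 1000 CPUs the bound stays below 1 / 0.75, about 1.33, which is why adding more remote hosts barely helps once the controlling host is saturated.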

I would try a "-j n" value of around 1.5 * (number of CPUs on the build
hosts). For example, for 5 dual-CPU hosts, I would use "-j 15" as a test.
The reason is that you want the central host to have a file ready to be
compiled by each host as soon as that host's last file is finished. If
you used "-j 2", you wouldn't have files ready when the build hosts
become available. If you used "-j 1000", you'd be swamping the build host
with useless work that would actually prevent it from processing current
requests in a timely fashion. This is why you want a job value of
somewhere between 1 and 2 times the number of CPUs that are performing
compilations.
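The rule of thumb above can be sketched in Python as a small helper that picks a -j value and builds a DISTCC_HOSTS list without localhost. This is only an illustration of the heuristic: the function names, the 1.5 default factor, rounding up with math.ceil, and the host names build1..build5 are assumptions, not part of distcc.

```python
import math


def suggested_jobs(cpus_per_host: list[int], factor: float = 1.5) -> int:
    """Heuristic -j value: about factor * total CPUs on the build hosts.

    Assumption: factor should stay between 1 and 2; we round up so the
    central host tends to have a spare job ready rather than one too few.
    """
    total_cpus = sum(cpus_per_host)
    return max(1, math.ceil(factor * total_cpus))


def distcc_hosts(hosts: list[str]) -> str:
    """Space-separated DISTCC_HOSTS value with 'localhost' left out,
    so the controlling host only preprocesses and distributes."""
    return " ".join(h for h in hosts if h != "localhost")


if __name__ == "__main__":
    # Hypothetical setup: five dual-CPU build hosts.
    hosts = ["localhost", "build1", "build2", "build3", "build4", "build5"]
    jobs = suggested_jobs([2] * 5)
    print(f"DISTCC_HOSTS='{distcc_hosts(hosts)}' make -j {jobs}")
```

For five dual-CPU hosts this yields "-j 15", matching the example in the text; treat it as a starting point for timing experiments rather than a fixed answer.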

If one of your hosts is faster than the others, even marginally, use it
as the machine that runs the compilation. It will be better able to keep
the other machines busy without becoming saturated itself.

Hope that helps. Let us know if you are able to improve the
performance at all by not including localhost in DISTCC_HOSTS.

Harold