[distcc] homogeneous environments

Fergus Henderson fergus.henderson at gmail.com
Thu Apr 30 02:57:42 GMT 2009


On Wed, Apr 29, 2009 at 5:41 PM, Robert W. Anderson <
anderson110 at poptop.llnl.gov> wrote:

> I just disabled the client check in the source and everything seemed to
> work from that point.


That's OK if you're on a trusted network behind a firewall, or if you're not
concerned about security.
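
If you do want to keep some control over who can submit jobs, distccd's
--allow option restricts which client addresses the daemon will accept.
Something like the following (an untested sketch; the subnet is just an
example you'd replace with your own) keeps it limited to your cluster:

    # Example only: accept compile jobs from the 192.168.1.0/24 subnet,
    # running distccd in the background on its default TCP port (3632).
    distccd --daemon --allow 192.168.1.0/24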

> What's possibly more interesting is the performance results.  I'm compiling
> about 1300 source files on nodes that have 16 CPUs each.
>
> Single node (plain GNU make, no distcc):
>
> -j2 8m 19s
> -j4 5m 46s
> -j5 6m 39s
> -j8 10m 35s
>
> I don't understand this, but it is repeatable.  Any ideas on that one?


It looks like your machine probably has 4 CPUs, with each job using nearly
100% CPU.
Or possibly it has 2 CPUs, with each job using about 50% CPU, and spending
the remaining time waiting for I/O (but I think this possibility is less
likely).
Either way, "-j4" already gives you the maximum amount of parallelism that
you can benefit from, with each CPU running close to 100% utilization.
Higher -j levels can't push CPU utilization beyond 100%, but they do add
overhead, due to a larger working set, more task switching, and worse
locality.  So higher -j levels only increase the overall build time.
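
As a sanity check, something like this (just a sketch; getconf
_NPROCESSORS_ONLN is a widely supported way to ask, and nproc from newer
GNU coreutils also works) will tell you how many CPUs the machine really
has, so you can match -j to it:

    # Count the CPUs the kernel reports as online, then build with that many jobs.
    cpus=$(getconf _NPROCESSORS_ONLN)
    make -j"$cpus"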


> I'm not sure how to effectively profile this. All the sources are on NFS.
>
> So I then went multi-node w/ 4 jobs per node.  Using localhost as a server
> only seems to slow things down, incidentally.
>
> 1 node,  -j4:  5m 28s (using distcc and 1 remote node)
> 2 nodes, -j8:  2m 57s
> 3 nodes, -j12: 2m 16s
> 4 nodes, -j16: 1m 58s
> 5 nodes, -j20: 2m 7s
>
> Scaling seems to break down around the 4 node mark.  Our link step is only
> 5-6 seconds, so we are not getting bound by that.  Messing with -j further
> doesn't seem to help.  Any ideas for profiling this to find any final
> bottlenecks?


First, try running "top" during the build to determine the CPU usage on your
local host.  If it stays near 100%, then the bottleneck is local jobs such
as linking and/or include scanning, and top will show you which jobs are
using the CPU most.  That's quite likely to be the limiting factor if you
have a large number of nodes.
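
It can also help to watch distcc's own view of what's in flight while top
is running; distccmon-text ships with distcc, so something like this (run
as the same user doing the build) should work:

    # Refresh the list of outstanding distcc jobs, and which host each one
    # was sent to, once per second.
    distccmon-text 1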

Another possibility is lack of parallelism in your Makefile; you may have
1300 source files, but the dependencies in your Makefile probably mean that
you can't actually run 1300 compiles in parallel.  Maybe your Makefile only
allows about 16 compiles to run in parallel on average.
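
A rough way to check that (only a sketch, counting client processes rather
than doing exact accounting) is to sample how many distcc compile jobs are
actually running while the build is going; if the count rarely gets near
your -j value, the Makefile's dependency structure is the limit rather than
distcc:

    # Sample the number of concurrent distcc client processes once a second
    # while the build runs in another terminal.
    while sleep 1; do
        echo "$(date +%T)  distcc jobs in flight: $(pgrep -x distcc | wc -l)"
    done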

-- 
Fergus Henderson <fergus.henderson at gmail.com>