[distcc] homogeneous environments

Robert W. Anderson anderson110 at poptop.llnl.gov
Thu Apr 30 17:58:03 GMT 2009


Fergus Henderson wrote:
>     What's possibly more interesting is the performance results.  I'm
>     compiling about 1300 source files on nodes that have 16 cpu's each.
> 
>     Single node (plain GNU make, no distcc):
> 
>     -j2 8m 19s
>     -j4 5m 46s
>     -j5 6m 39s
>     -j8 10m 35s
> 
>     I don't understand this, but it is repeatable.  Any ideas on that one?
> 
> 
> It looks like your machine probably has 4 CPUs, with each job using 
> nearly 100% CPU.

My node actually has four quad-core processors.

make -j4 CPU utilization, according to top:

Cpu0  :  0.7%us,  4.0%sy,  0.0%ni, 95.3%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu1  : 13.6%us, 29.7%sy,  0.0%ni, 56.8%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu2  :  0.7%us,  1.8%sy,  0.0%ni, 97.4%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu3  :  0.7%us,  0.7%sy,  0.0%ni, 98.5%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu4  : 26.8%us, 22.4%sy,  0.0%ni, 49.6%id,  0.0%wa,  0.0%hi,  1.1%si
Cpu5  :  7.0%us, 15.8%sy,  0.0%ni, 77.3%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu6  :  2.9%us,  2.9%sy,  0.0%ni, 94.2%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu7  :  2.6%us,  0.7%sy,  0.0%ni, 96.7%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu8  :  0.7%us,  3.7%sy,  0.0%ni, 95.6%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu9  :  9.5%us, 15.8%sy,  0.0%ni, 74.7%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu10 :  0.7%us,  2.9%sy,  0.0%ni, 96.4%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu11 :  0.4%us,  0.4%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu12 :  1.5%us,  5.8%sy,  0.0%ni, 92.7%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu13 :  5.1%us,  5.1%sy,  0.0%ni, 89.8%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu14 :  6.6%us,  5.1%sy,  0.0%ni, 88.3%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu15 : 27.5%us, 30.0%sy,  0.0%ni, 41.8%id,  0.0%wa,  0.0%hi,  0.7%si

It bounces around quite a bit, but this is not an atypical snapshot.
The machine is mostly idle, according to the fourth (%id) column.  Is
this somehow a major resource contention issue?  Disk access, maybe?
Note that I have tried building on a local /tmp partition, and while
overall performance improves a bit, the scaling with increasing -j
does not: -j5 is still slower than -j4.
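
If it is contention, it should show up as time spent somewhere other
than user CPU.  A rough way to check (just a sketch, assuming GNU time
is installed as /usr/bin/time and the usual "clean" target) is to time
each -j level and compare wall clock against user+system CPU time:

    # If wall time grows at higher -j while user+sys stays roughly
    # flat, the extra jobs are blocking on something other than the
    # CPU -- disk or NFS, for example.
    for j in 2 4 5 8; do
        make clean > /dev/null
        /usr/bin/time -f "-j$j: wall=%E user=%U sys=%S" make -j$j > /dev/null
    done

Running "vmstat 1" in another window during the -j8 pass should also
say whether the lost time shows up as I/O wait (the "wa" column).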

A make -j8 run barely uses any more CPU.  This is with a fully local
disk, too:

Cpu0  : 20.0%us, 38.2%sy,  0.0%ni, 41.8%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu1  :  5.4%us, 30.4%sy,  0.0%ni, 64.3%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu2  : 14.5%us, 21.8%sy,  0.0%ni, 63.6%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu3  : 12.7%us, 41.8%sy,  0.0%ni, 45.5%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu4  : 20.0%us, 27.3%sy,  0.0%ni, 52.7%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu5  :  3.7%us, 29.6%sy,  0.0%ni, 66.7%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu6  :  1.8%us, 32.7%sy,  0.0%ni, 65.5%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu7  :  1.8%us,  3.5%sy,  0.0%ni, 94.7%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu8  :  1.8%us, 30.9%sy,  0.0%ni, 67.3%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu9  :  5.4%us, 23.2%sy,  0.0%ni, 71.4%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu10 :  3.6%us, 12.7%sy,  0.0%ni, 83.6%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu11 :  0.0%us,  7.3%sy,  0.0%ni, 92.7%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu12 :  3.6%us, 32.7%sy,  0.0%ni, 63.6%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu13 :  5.4%us, 19.6%sy,  0.0%ni, 75.0%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu14 :  0.0%us,  5.5%sy,  0.0%ni, 94.5%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu15 :  3.5%us, 28.1%sy,  0.0%ni, 68.4%id,  0.0%wa,  0.0%hi,  0.0%si

Recall that -j8 results in a 10m build whereas -j4 results in a 5m build.
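
One clue in both snapshots is that what little CPU does get used is
mostly system time (the %sy column) rather than user time, which
suggests kernel-side contention rather than slow compiles.  A way to
see where that kernel time goes (a sketch only; strace adds plenty of
overhead of its own):

    # Aggregate syscall counts and times across make and all of the
    # compiler processes it forks; the summary lands in the -o file.
    strace -c -f -o /tmp/make-j8.summary make -j8 > /dev/null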

>     I'm not sure how to effectively profile this. All the sources are on
>     NFS.
> 
>     So I then went multi-node w/ 4 jobs per node.  Using localhost as a
>     server only seems to slow things down, incidentally.
> 
>     1 node,  -j4:  5m 28s (using distcc and 1 remote node)
>     2 nodes, -j8:  2m 57s
>     3 nodes, -j12: 2m 16s
>     4 nodes, -j16: 1m 58s
>     5 nodes, -j20: 2m 7s
> 
>     Scaling seems to break down around the 4 node mark.  Our link step
>     is only 5-6 seconds, so we are not getting bound by that.  Messing
>     with -j further doesn't seem to help.  Any ideas for profiling this
>     to find any final bottlenecks?
> 
> 
> First, try running "top" during the build to determine the CPU usage on 
> your local host.  If it stays near 100%, then the bottleneck is local 
> jobs such as linking and/or include scanning, and top will show you 
> which jobs are using the CPU most.  That's quite likely to be the 
> limiting factor if you have a large number of nodes.

Not surprisingly (now), the localhost CPU is mostly idle as well during 
a multi-node build.  A snapshot:

Cpu0  :  0.0%us, 11.5%sy,  0.0%ni, 88.5%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu1  :  1.9%us, 26.9%sy,  0.0%ni, 71.2%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu2  :  1.9%us, 29.6%sy,  0.0%ni, 68.5%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu3  :  1.9%us, 17.0%sy,  0.0%ni, 81.1%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu4  :  9.4%us, 43.4%sy,  0.0%ni, 43.4%id,  0.0%wa,  0.0%hi,  3.8%si
Cpu5  :  3.8%us, 28.3%sy,  0.0%ni, 67.9%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu6  :  1.9%us, 18.9%sy,  0.0%ni, 79.2%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu7  :  1.9%us, 28.8%sy,  0.0%ni, 69.2%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu8  :  1.9%us, 11.3%sy,  0.0%ni, 86.8%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu9  :  3.7%us, 37.0%sy,  0.0%ni, 59.3%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu10 :  1.9%us, 26.4%sy,  0.0%ni, 71.7%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu11 :  0.0%us, 11.3%sy,  0.0%ni, 88.7%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu12 :  1.9%us, 15.4%sy,  0.0%ni, 82.7%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu13 :  1.9%us, 30.2%sy,  0.0%ni, 67.9%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu14 :  3.8%us, 22.6%sy,  0.0%ni, 73.6%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu15 :  1.9%us, 24.5%sy,  0.0%ni, 73.6%id,  0.0%wa,  0.0%hi,  0.0%si
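
For the distributed side itself, distcc ships a text-mode monitor that
shows how many jobs are in flight per host and what phase each is in.
Watching it during a build might say whether jobs pile up in the
network phases:

    # Redraw the job table every second for builds run by the same
    # user.  Long stretches in Send/Receive would point at the
    # network; long Preprocess stretches at local cpp (and NFS).
    distccmon-text 1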


> Another possibility is lack of parallelism in your Makefile; you may 
> have 1300 source files, but the dependencies in your Makefile probably 
> mean that you can't actually run 1300 compiles in parallel.  Maybe your 
> Makefile only allows about 16 compiles to run in parallel on average.

I believe I have fixed my makefiles so that, after a couple of short
initial serial steps, the source compiles fully in parallel, both
across directories and across individual source files.  I do see
directories being interleaved in my output, along with big bursts of
files from the same directory being launched.
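
Rather than trusting my reading of the makefiles, I could verify this
by wrapping the compiler in a small logging script (cc-log.sh is a
made-up name here, and gcc stands in for whatever $(CC) normally
expands to) and counting how many compiles actually overlap:

    #!/bin/sh
    # cc-log.sh: timestamp each compile so the log shows how many
    # jobs truly run concurrently.
    echo "START $$ `date +%s` $*" >> /tmp/cc.log
    gcc "$@"
    rc=$?
    echo "END   $$ `date +%s` $*" >> /tmp/cc.log
    exit $rc

invoked as, e.g., make -j16 CC=./cc-log.sh.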


-- 
Robert W. Anderson
Center for Applied Scientific Computing
Email: anderson110 at llnl.gov
Tel: 925-424-2858  Fax: 925-423-8704

