[distcc] homogeneous environments
Robert W. Anderson
anderson110 at poptop.llnl.gov
Thu Apr 30 17:58:03 GMT 2009
Fergus Henderson wrote:
>> What's possibly more interesting is the performance results. I'm
>> compiling about 1300 source files on nodes that have 16 CPUs each.
>>
>> Single node (plain GNU make, no distcc):
>>
>> -j2   8m 19s
>> -j4   5m 46s
>> -j5   6m 39s
>> -j8  10m 35s
>>
>> I don't understand this, but it is repeatable. Any ideas on that one?
>
> It looks like your machine probably has 4 CPUs, with each job using
> nearly 100% CPU.
My node actually has four quad-core processors.
make -j4 CPU utilization, according to top:
Cpu0 : 0.7%us, 4.0%sy, 0.0%ni, 95.3%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu1 : 13.6%us, 29.7%sy, 0.0%ni, 56.8%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu2 : 0.7%us, 1.8%sy, 0.0%ni, 97.4%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu3 : 0.7%us, 0.7%sy, 0.0%ni, 98.5%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu4 : 26.8%us, 22.4%sy, 0.0%ni, 49.6%id, 0.0%wa, 0.0%hi, 1.1%si
Cpu5 : 7.0%us, 15.8%sy, 0.0%ni, 77.3%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu6 : 2.9%us, 2.9%sy, 0.0%ni, 94.2%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu7 : 2.6%us, 0.7%sy, 0.0%ni, 96.7%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu8 : 0.7%us, 3.7%sy, 0.0%ni, 95.6%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu9 : 9.5%us, 15.8%sy, 0.0%ni, 74.7%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu10 : 0.7%us, 2.9%sy, 0.0%ni, 96.4%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu11 : 0.4%us, 0.4%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu12 : 1.5%us, 5.8%sy, 0.0%ni, 92.7%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu13 : 5.1%us, 5.1%sy, 0.0%ni, 89.8%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu14 : 6.6%us, 5.1%sy, 0.0%ni, 88.3%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu15 : 27.5%us, 30.0%sy, 0.0%ni, 41.8%id, 0.0%wa, 0.0%hi, 0.7%si
The numbers bounce around, but this is not an atypical snapshot: the
machine is mostly idle, according to the %id column. Is this somehow
a major resource contention issue? Disk access, maybe? Note that I
have tried building on a local /tmp partition, and while overall
performance improves a bit, the scaling with increasing -j does not:
-j5 is still slower than -j4.
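For anyone who wants to reproduce this kind of -jN comparison without a
real source tree, here is a minimal, hypothetical harness (not from the
thread): it generates a Makefile of independent dummy targets, standing
in for the compiles, and times make at a few -j levels. The target count
and sleep length are made up; only the timing loop matters.

```shell
#!/bin/sh
# Hypothetical -jN timing harness. Generates 12 independent dummy
# "compile" targets and times make at several parallelism levels.
set -e
dir=$(mktemp -d)
cd "$dir"

{
  printf 'all:'
  for i in $(seq 1 12); do printf ' obj%s' "$i"; done
  printf '\n'
  for i in $(seq 1 12); do
    # Each target sleeps briefly to stand in for a compile.
    printf 'obj%s:\n\tsleep 0.1; touch obj%s\n' "$i" "$i"
  done
} > Makefile

for j in 1 4 8; do
  rm -f obj*
  start=$(date +%s)
  make -j"$j" >/dev/null
  echo "-j$j: $(( $(date +%s) - start ))s"
done
```

With fully independent targets the wall time should drop roughly
linearly up to the core count; if a tree like this scales but the real
one doesn't, the slowdown is coming from somewhere other than make's
job scheduling.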
A make -j8 run barely eats any more CPU. This is with fully local disk,
too:
Cpu0 : 20.0%us, 38.2%sy, 0.0%ni, 41.8%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu1 : 5.4%us, 30.4%sy, 0.0%ni, 64.3%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu2 : 14.5%us, 21.8%sy, 0.0%ni, 63.6%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu3 : 12.7%us, 41.8%sy, 0.0%ni, 45.5%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu4 : 20.0%us, 27.3%sy, 0.0%ni, 52.7%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu5 : 3.7%us, 29.6%sy, 0.0%ni, 66.7%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu6 : 1.8%us, 32.7%sy, 0.0%ni, 65.5%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu7 : 1.8%us, 3.5%sy, 0.0%ni, 94.7%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu8 : 1.8%us, 30.9%sy, 0.0%ni, 67.3%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu9 : 5.4%us, 23.2%sy, 0.0%ni, 71.4%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu10 : 3.6%us, 12.7%sy, 0.0%ni, 83.6%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu11 : 0.0%us, 7.3%sy, 0.0%ni, 92.7%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu12 : 3.6%us, 32.7%sy, 0.0%ni, 63.6%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu13 : 5.4%us, 19.6%sy, 0.0%ni, 75.0%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu14 : 0.0%us, 5.5%sy, 0.0%ni, 94.5%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu15 : 3.5%us, 28.1%sy, 0.0%ni, 68.4%id, 0.0%wa, 0.0%hi, 0.0%si
Recall that -j8 results in a 10m build whereas -j4 results in a 5m build.
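One crude way to watch the user/system/idle split during a build,
without eyeballing top, is to sample /proc/stat directly. This is a
Linux-specific sketch, not something from the thread:

```shell
#!/bin/sh
# Sample the aggregate cpu line from /proc/stat twice, one second
# apart; the deltas are jiffies spent in user/nice/system/idle time,
# i.e. the same split that top's %us/%sy/%id columns are derived from.
read _ u1 n1 s1 i1 rest < /proc/stat
sleep 1
read _ u2 n2 s2 i2 rest < /proc/stat
echo "user=$((u2 - u1)) nice=$((n2 - n1)) sys=$((s2 - s1)) idle=$((i2 - i1))"
```

Run in a loop during the -j8 build: a high sys-to-user ratio combined
with lots of idle time would be consistent with the jobs spending their
time on kernel-side I/O (e.g. NFS round trips) rather than compiling.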
>> I'm not sure how to effectively profile this. All the sources are on
>> NFS.
>>
>> So I then went multi-node with 4 jobs per node. Using localhost as a
>> server only seems to slow things down, incidentally.
>>
>> 1 node,  -j4:  5m 28s (using distcc and 1 remote node)
>> 2 nodes, -j8:  2m 57s
>> 3 nodes, -j12: 2m 16s
>> 4 nodes, -j16: 1m 58s
>> 5 nodes, -j20: 2m 7s
>>
>> Scaling seems to break down around the 4-node mark. Our link step
>> is only 5-6 seconds, so we are not getting bound by that. Messing
>> with -j further doesn't seem to help. Any ideas for profiling this
>> to find any final bottlenecks?
>
> First, try running "top" during the build to determine the CPU usage on
> your local host. If it stays near 100%, then the bottleneck is local
> jobs such as linking and/or include scanning, and top will show you
> which jobs are using the CPU most. That's quite likely to be the
> limiting factor if you have a large number of nodes.
Not surprisingly (now), the localhost CPU is mostly idle as well during
a multi-node build. A snapshot:
Cpu0 : 0.0%us, 11.5%sy, 0.0%ni, 88.5%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu1 : 1.9%us, 26.9%sy, 0.0%ni, 71.2%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu2 : 1.9%us, 29.6%sy, 0.0%ni, 68.5%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu3 : 1.9%us, 17.0%sy, 0.0%ni, 81.1%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu4 : 9.4%us, 43.4%sy, 0.0%ni, 43.4%id, 0.0%wa, 0.0%hi, 3.8%si
Cpu5 : 3.8%us, 28.3%sy, 0.0%ni, 67.9%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu6 : 1.9%us, 18.9%sy, 0.0%ni, 79.2%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu7 : 1.9%us, 28.8%sy, 0.0%ni, 69.2%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu8 : 1.9%us, 11.3%sy, 0.0%ni, 86.8%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu9 : 3.7%us, 37.0%sy, 0.0%ni, 59.3%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu10 : 1.9%us, 26.4%sy, 0.0%ni, 71.7%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu11 : 0.0%us, 11.3%sy, 0.0%ni, 88.7%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu12 : 1.9%us, 15.4%sy, 0.0%ni, 82.7%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu13 : 1.9%us, 30.2%sy, 0.0%ni, 67.9%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu14 : 3.8%us, 22.6%sy, 0.0%ni, 73.6%id, 0.0%wa, 0.0%hi, 0.0%si
Cpu15 : 1.9%us, 24.5%sy, 0.0%ni, 73.6%id, 0.0%wa, 0.0%hi, 0.0%si
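Given that using localhost as a server only slowed things down, it may
be worth taking localhost out of the host list entirely and capping
per-node slots. A sketch of the environment setup, with invented node
names (node1 through node4), using distcc's host-list syntax:

```shell
# Hypothetical DISTCC_HOSTS for a 4-node run: four compile slots per
# remote node, with localhost deliberately left out so it only does
# preprocessing, linking, and make's own work.
export DISTCC_HOSTS="node1/4 node2/4 node3/4 node4/4"

# The number of local preprocessing slots can be capped separately
# with distcc's --localslots_cpp option in the host list:
export DISTCC_HOSTS="--localslots_cpp=16 $DISTCC_HOSTS"
```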
> Another possibility is lack of parallelism in your Makefile; you may
> have 1300 source files, but the dependencies in your Makefile probably
> mean that you can't actually run 1300 compiles in parallel. Maybe your
> Makefile only allows about 16 compiles to run in parallel on average.
I believe I have fixed my makefiles so that, after a couple of short
initial serial steps, compilation is fully parallel, both across
directories and across source files. I do see directories being
interleaved in my output, and also big bursts of files from the same
directory being launched.
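A quick, self-contained way to confirm that make really does run jobs
concurrently (again a sketch, not from the thread): have each dummy job
log its start time and check that the starts overlap.

```shell
#!/bin/sh
# Log a timestamp as each job starts; with -j4 and four independent
# targets, all four start times should land in the same second or two.
set -e
dir=$(mktemp -d)
cd "$dir"

{
  printf 'all: a b c d\n'
  for t in a b c d; do
    printf '%s:\n\tdate +%%s >> starts.log; sleep 1; touch %s\n' "$t" "$t"
  done
} > Makefile

make -j4 >/dev/null
# Identical (or near-identical) start times mean the jobs really ran
# in parallel; a spread of several seconds means something serialized
# them, e.g. hidden dependencies or a recursive-make bottleneck.
sort starts.log | uniq -c
```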
--
Robert W. Anderson
Center for Applied Scientific Computing
Email: anderson110 at llnl.gov
Tel: 925-424-2858 Fax: 925-423-8704