[distcc] homogeneous environments
Fergus Henderson
fergus at google.com
Thu Apr 30 18:08:25 GMT 2009
The include server could be the bottleneck. What's the CPU usage for the
include server process?
Or it could be disk I/O. Try iostat or vmstat to profile that.
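A minimal sketch of what that profiling could look like (the build command and log file names are placeholders; assumes vmstat from procps and iostat from sysstat are installed):

```shell
# Sample system-wide stats for the duration of the build, then read the
# logs afterwards. The redirections create the log files even if the
# tools are missing, so check the logs for errors too.
vmstat 2 > vmstat.log 2>&1 &  vm_pid=$!
iostat -x 2 > iostat.log 2>&1 &  io_pid=$!

sleep 2   # placeholder: replace with the real build, e.g. "make -j4"

kill "$vm_pid" "$io_pid" 2>/dev/null || true
# In vmstat.log, a consistently high "wa" column means the CPUs are
# stalled on disk; in iostat.log, %util near 100 on one device points
# at an I/O bottleneck.
```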
On Thu, Apr 30, 2009 at 1:58 PM, Robert W. Anderson <
anderson110 at poptop.llnl.gov> wrote:
> Fergus Henderson wrote:
>
>> What's possibly more interesting are the performance results. I'm
>> compiling about 1300 source files on nodes that have 16 CPUs each.
>>
>> Single node (plain GNU make, no distcc):
>>
>> -j2 8m 19s
>> -j4 5m 46s
>> -j5 6m 39s
>> -j8 10m 35s
>>
>> I don't understand this, but it is repeatable. Any ideas on that one?
>>
>>
>> It looks like your machine probably has 4 CPUs, with each job using nearly
>> 100% CPU.
>>
>
> My node actually has four quad-core processors.
>
> make -j4 cpu utilization, according to top:
>
> Cpu0 : 0.7%us, 4.0%sy, 0.0%ni, 95.3%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu1 : 13.6%us, 29.7%sy, 0.0%ni, 56.8%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu2 : 0.7%us, 1.8%sy, 0.0%ni, 97.4%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu3 : 0.7%us, 0.7%sy, 0.0%ni, 98.5%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu4 : 26.8%us, 22.4%sy, 0.0%ni, 49.6%id, 0.0%wa, 0.0%hi, 1.1%si
> Cpu5 : 7.0%us, 15.8%sy, 0.0%ni, 77.3%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu6 : 2.9%us, 2.9%sy, 0.0%ni, 94.2%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu7 : 2.6%us, 0.7%sy, 0.0%ni, 96.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu8 : 0.7%us, 3.7%sy, 0.0%ni, 95.6%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu9 : 9.5%us, 15.8%sy, 0.0%ni, 74.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu10 : 0.7%us, 2.9%sy, 0.0%ni, 96.4%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu11 : 0.4%us, 0.4%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu12 : 1.5%us, 5.8%sy, 0.0%ni, 92.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu13 : 5.1%us, 5.1%sy, 0.0%ni, 89.8%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu14 : 6.6%us, 5.1%sy, 0.0%ni, 88.3%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu15 : 27.5%us, 30.0%sy, 0.0%ni, 41.8%id, 0.0%wa, 0.0%hi, 0.7%si
>
> It bounces all over the place but this is not an atypical snapshot. The
> machine is mostly idle, according to the fourth column. Is this somehow a
> major resource-contention issue? Disk access, maybe? Note that I have
> tried building on a local /tmp partition, and while overall performance
> improves a bit, the scaling with increasing -j does not. -j5 is still slower
> than -j4.
>
> A make -j8 run barely uses any more CPU. This is with fully local disk,
> too:
>
> Cpu0 : 20.0%us, 38.2%sy, 0.0%ni, 41.8%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu1 : 5.4%us, 30.4%sy, 0.0%ni, 64.3%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu2 : 14.5%us, 21.8%sy, 0.0%ni, 63.6%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu3 : 12.7%us, 41.8%sy, 0.0%ni, 45.5%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu4 : 20.0%us, 27.3%sy, 0.0%ni, 52.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu5 : 3.7%us, 29.6%sy, 0.0%ni, 66.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu6 : 1.8%us, 32.7%sy, 0.0%ni, 65.5%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu7 : 1.8%us, 3.5%sy, 0.0%ni, 94.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu8 : 1.8%us, 30.9%sy, 0.0%ni, 67.3%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu9 : 5.4%us, 23.2%sy, 0.0%ni, 71.4%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu10 : 3.6%us, 12.7%sy, 0.0%ni, 83.6%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu11 : 0.0%us, 7.3%sy, 0.0%ni, 92.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu12 : 3.6%us, 32.7%sy, 0.0%ni, 63.6%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu13 : 5.4%us, 19.6%sy, 0.0%ni, 75.0%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu14 : 0.0%us, 5.5%sy, 0.0%ni, 94.5%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu15 : 3.5%us, 28.1%sy, 0.0%ni, 68.4%id, 0.0%wa, 0.0%hi, 0.0%si
>
> Recall that -j8 results in a 10m build whereas -j4 results in a 5m build.
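As an aside, those per-CPU lines can be reduced to a single utilization number. A small sketch (the two sample lines in the heredoc are stand-ins; in practice you would pipe real `top -bn1` output into the awk program instead):

```shell
# Average the %id (idle) column across the per-CPU lines that
# `top -bn1` prints; in practice: top -bn1 | awk -F'[ ,:]+' '...'
avg_idle=$(awk -F'[ ,:]+' '
  /^Cpu[0-9]+/ {
    for (i = 1; i <= NF; i++)
      if ($i ~ /%id$/) { sub(/%id/, "", $i); sum += $i; n++ }
  }
  END { printf "%.1f", sum / n }
' <<'EOF'
Cpu0  :  1.0%us,  4.0%sy,  0.0%ni, 95.0%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu1  : 13.0%us, 30.0%sy,  0.0%ni, 57.0%id,  0.0%wa,  0.0%hi,  0.0%si
EOF
)
echo "average idle: ${avg_idle}%"   # prints: average idle: 76.0%
```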
>
> I'm not sure how to effectively profile this.
>
>> All the sources are on NFS.
>>
>> So I then went multi-node w/ 4 jobs per node. Using localhost as a
>> server only seems to slow things down, incidentally.
>>
>> 1 node, -j4: 5m 28s (using distcc and 1 remote node)
>> 2 nodes, -j8: 2m 57s
>> 3 nodes, -j12: 2m 16s
>> 4 nodes, -j16: 1m 58s
>> 5 nodes, -j20: 2m 7s
>>
>> Scaling seems to break down around the 4 node mark. Our link step
>> is only 5-6 seconds, so we are not getting bound by that. Messing
>> with -j further doesn't seem to help. Any ideas for profiling this
>> to find any final bottlenecks?
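For what it's worth, when scaling flattens out it can also be worth experimenting with the DISTCC_HOSTS ordering and per-host job limits: distcc prefers hosts listed earlier, and a "/N" suffix caps concurrent jobs per host. A sketch (the node names are placeholders; localhost is simply left out, matching the observation above):

```shell
# Hypothetical node names; "/4" limits distcc to 4 concurrent jobs on
# each host. localhost is omitted because using it as a compile server
# was observed to slow the build down.
export DISTCC_HOSTS="node1/4 node2/4 node3/4 node4/4"
echo "$DISTCC_HOSTS"
# then run, e.g.: make -j16 CC="distcc gcc"
```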
>>
>>
>> First, try running "top" during the build to determine the CPU usage on
>> your local host. If it stays near 100%, then the bottleneck is local jobs
>> such as linking and/or include scanning, and top will show you which jobs
>> are using the CPU most. That's quite likely to be the limiting factor if
>> you have a large number of nodes.
>>
>
> Not surprisingly (now), the localhost CPU is mostly idle as well during a
> multi-node build. A snapshot:
>
> Cpu0 : 0.0%us, 11.5%sy, 0.0%ni, 88.5%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu1 : 1.9%us, 26.9%sy, 0.0%ni, 71.2%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu2 : 1.9%us, 29.6%sy, 0.0%ni, 68.5%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu3 : 1.9%us, 17.0%sy, 0.0%ni, 81.1%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu4 : 9.4%us, 43.4%sy, 0.0%ni, 43.4%id, 0.0%wa, 0.0%hi, 3.8%si
> Cpu5 : 3.8%us, 28.3%sy, 0.0%ni, 67.9%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu6 : 1.9%us, 18.9%sy, 0.0%ni, 79.2%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu7 : 1.9%us, 28.8%sy, 0.0%ni, 69.2%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu8 : 1.9%us, 11.3%sy, 0.0%ni, 86.8%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu9 : 3.7%us, 37.0%sy, 0.0%ni, 59.3%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu10 : 1.9%us, 26.4%sy, 0.0%ni, 71.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu11 : 0.0%us, 11.3%sy, 0.0%ni, 88.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu12 : 1.9%us, 15.4%sy, 0.0%ni, 82.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu13 : 1.9%us, 30.2%sy, 0.0%ni, 67.9%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu14 : 3.8%us, 22.6%sy, 0.0%ni, 73.6%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu15 : 1.9%us, 24.5%sy, 0.0%ni, 73.6%id, 0.0%wa, 0.0%hi, 0.0%si
>
>
> Another possibility is lack of parallelism in your Makefile; you may have
>> 1300 source files, but the dependencies in your Makefile probably mean that
>> you can't actually run 1300 compiles in parallel. Maybe your Makefile only
>> allows about 16 compiles to run in parallel on average.
>>
>
> I believe I fixed my makefiles so that, after a couple of short initial
> serial steps, compilation of the source is fully parallel, both per
> directory and per source file. I do see the directories being interleaved
> in my output, and also big bursts of files from the same directory being
> launched.
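For reference, the fully parallel shape described above boils down to per-file rules like the following sketch (paths and variable names are hypothetical), where only the final link serializes:

```make
# Hypothetical layout: each object depends only on its own source (plus
# headers tracked via -MMD), so `make -jN` can schedule up to N compiles
# at once; only the final link step serializes.
SRCS := $(wildcard src/*.c)
OBJS := $(SRCS:.c=.o)

app: $(OBJS)
	$(CC) -o $@ $^

%.o: %.c
	$(CC) -MMD -c -o $@ $<

-include $(OBJS:.o=.d)
```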
>
>
> --
> Robert W. Anderson
> Center for Applied Scientific Computing
> Email: anderson110 at llnl.gov
> Tel: 925-424-2858 Fax: 925-423-8704
> __ distcc mailing list http://distcc.samba.org/
> To unsubscribe or change options:
> https://lists.samba.org/mailman/listinfo/distcc
>
--
Fergus Henderson <fergus at google.com>