[distcc] (fwd from peter@hawkins.emu.id.au) Bug#181152: distcc host selection algorithm is too naive

Martin Pool mbp at samba.org
Mon Feb 17 01:16:34 GMT 2003


----- Forwarded message from Peter Hawkins <peter at hawkins.emu.id.au> -----

From: peter at hawkins.emu.id.au (Peter Hawkins)
Subject: Bug#181152: distcc host selection algorithm is too naive
Date: Sun, 16 Feb 2003 12:09:17 +1100
To: submit at bugs.debian.org
User-Agent: Mutt/1.5.3i

Package: distcc
Version: 1.1-1
Severity: wishlist 

Hi...

Thanks for packaging distcc.

The algorithm distcc uses to select the host a job should be built on
is too naive. The relevant code in src/where.c tends to favour placing
a job on the first machine in the DISTCC_HOSTS list (since the
algorithm basically amounts to 'pick the first machine in the list
with a free execution slot').
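
As a rough illustration of that first-fit behaviour (a minimal sketch,
not the actual code in src/where.c; the struct host type, its
free_slots field and the function name are hypothetical):

```c
#include <stddef.h>

/* Hypothetical host record; not distcc's real data structure. */
struct host {
    const char *name;
    int free_slots;   /* number of idle execution slots on this host */
};

/*
 * First-fit selection: scan the list from the front and return the
 * index of the first host with a free slot, or -1 if every host is busy.
 * Hosts early in the list are always preferred.
 */
int pick_first_free(const struct host *hosts, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (hosts[i].free_slots > 0)
            return (int)i;
    }
    return -1;
}
```

Because the scan always starts at index 0, a later host only receives
work while every earlier host is full.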

With my experimental setup of 4 nearly identical machines, three of them
diskless (NFS root) 2.0GHz P4 workstations with 512MB of RAM, and one master
machine with the same characteristics as well as an IDE disk drive, I
found that nearly all jobs were being sent to the first machine, to the
exclusion of the others (as shown by top, ps and the load averages of
the machines while the build was running).

The test I was using was the Linux 2.5.61 kernel source tree built with
the commands:

export DISTCC_HOSTS="machine1 machine2 machine3 localhost"
time make -j20 CC='distcc'

Before:
real    10m58.661s
user    2m1.590s
sys     0m30.380s

I then did a quick hack of the selection algorithm so that it picks
a random starting point in the list, and then walks along the list by a
small prime increment (say 3) until a free slot is found or all hosts
have been checked (wrapping back to the beginning of the list when
necessary).
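
The idea can be sketched roughly as follows. This is not the patch
itself; struct candidate, pick_strided and pick_random_start are
hypothetical names, and the fallback to a stride of 1 is an added
assumption: a stride that shares a factor with the list length (for
example 3 with 6 hosts) would otherwise never visit some hosts.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical host record; not distcc's real data structure. */
struct candidate {
    const char *name;
    int free_slots;   /* number of idle execution slots on this host */
};

/* Greatest common divisor (Euclid's algorithm). */
static size_t gcd(size_t a, size_t b)
{
    while (b != 0) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/*
 * Begin at index `start` and step by `stride`, wrapping around, until a
 * host with a free slot is found or n hosts have been probed.
 * Returns the chosen index, or -1 if no host has a free slot.
 */
int pick_strided(const struct candidate *hosts, size_t n,
                 size_t start, size_t stride)
{
    if (n == 0)
        return -1;

    /* Assumption: fall back to stride 1 when the stride is not coprime
     * to n, so that every host is still visited exactly once. */
    if (gcd(stride % n, n) != 1)
        stride = 1;

    size_t i = start % n;
    for (size_t probes = 0; probes < n; probes++) {
        if (hosts[i].free_slots > 0)
            return (int)i;
        i = (i + stride) % n;
    }
    return -1;
}

/* Random starting point with a stride of 3, as described above. */
int pick_random_start(const struct candidate *hosts, size_t n)
{
    if (n == 0)
        return -1;
    return pick_strided(hosts, n, (size_t)rand() % n, 3);
}
```

With the 4 hosts used here, a stride of 3 visits every host once
before returning to the starting point.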

After:
real    5m41.631s
user    3m0.200s
sys     0m32.750s

Caching should not have played a big role, since although I performed
both builds consecutively, I performed the 'before' test after a
previous build had been done (so roughly similar things should have been
cached). I can investigate this further if it seems worthwhile. I also
intend to scale the test up to a set of 20 machines once I get time.

It seems there are substantial gains to be made from using a better queueing
algorithm: wall-clock time fell from about 11 minutes to under 6, a
speedup of roughly 1.9x.

=)
Peter


----- End forwarded message -----
-- 
Martin 

