[distcc] Re: (fwd from peter@hawkins.emu.id.au) Bug#181152: distcc host selection algorithm is too naive

Martin Pool mbp at samba.org
Mon Feb 17 01:44:07 GMT 2003


On 17 Feb 2003, Martin Pool <mbp> wrote:

> The algorithm distcc uses to perform selection of which host a job
> should be built on is too naive.

OK.  It seems to work for me (spreading load reasonably), but I have
only personally tested up to three machines.

> The relevant code in src/where.c tends to favour placing a job on
> the first machine in the DISTCC_HOSTS list (since the algorithm
> basically equates to 'pick the first machine in the list with a free
> execution slot').

It's meant to be slightly biased towards the earlier machines, so that
people can prefer faster ones.  Perhaps it is too naive, but I am
surprised that the results are as bad as you report.
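
As a rough illustration of that bias (this is a sketch, not the code in src/where.c; the host names, slot counts and the pick_slot helper are all made up for the example), a "first free slot in list order" policy looks something like this:

```python
def pick_slot(hosts, busy):
    """Return the first (host, slot) pair not in `busy`, or None if all are taken."""
    for name, n_slots in hosts:          # list order is the preference order
        for slot in range(n_slots):
            if (name, slot) not in busy:
                return (name, slot)
    return None


# Hypothetical host list: earlier entries are preferred.
hosts = [("fast", 2), ("medium", 2), ("slow", 2)]

busy = set()
placed = []
for _ in range(5):                       # five jobs in flight at the same time
    choice = pick_slot(hosts, busy)
    busy.add(choice)                     # the slot stays busy while the job runs
    placed.append(choice[0])

print(placed)
```

Under these assumptions the earlier hosts fill first, and later hosts only receive work once every earlier slot is held by a running job.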

Is it perhaps the case that the state directory is stored on an NFS
disk where locking is not working properly?  In a verbose trace, can
you see it finding any locks to be busy?

If locking were failing altogether, then I'd expect exactly the kind of
speedup you report from simply distributing jobs at random.
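
To make that reasoning concrete, here is a toy model (an assumption-laden sketch, not distcc's actual locking code; place_jobs, make_working_lock and broken_lock are hypothetical names). It compares a lock that correctly reports "busy" with one that always appears to succeed, which is one way locking might misbehave on a filesystem that does not enforce it:

```python
def place_jobs(hosts, n_jobs, try_lock):
    """Place n_jobs concurrent jobs, each taking the first slot whose lock it obtains."""
    counts = {name: 0 for name, _ in hosts}
    for _ in range(n_jobs):
        for name, slot in ((h, s) for h, n in hosts for s in range(n)):
            if try_lock(name, slot):
                counts[name] += 1
                break
    return counts


def make_working_lock():
    held = set()

    def try_lock(name, slot):
        if (name, slot) in held:
            return False        # another job holds this slot: report busy
        held.add((name, slot))
        return True

    return try_lock


def broken_lock(name, slot):
    return True                 # never reports busy, as if locks were not enforced


# Hypothetical setup: four hosts with two slots each, eight jobs at once.
hosts = [("a", 2), ("b", 2), ("c", 2), ("d", 2)]
working = place_jobs(hosts, 8, make_working_lock())
broken = place_jobs(hosts, 8, broken_lock)
print(working)
print(broken)
```

In this model the working lock spreads the eight jobs evenly, while the broken lock sends all of them to the first host, which is the pattern a verbose trace showing no busy locks would be consistent with.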

> With my experimental setup of 4 nearly identical machines, three of them
> diskless (NFS root) 2.0GHz P4 workstations with 512MB of RAM, and one master
> machine with the same characteristics as well as an IDE disk drive, I
> found that predominantly all jobs were being sent to the first machine
> to the exclusion of nearly all the others (as demonstrated by the use of
> top, ps and looking at the load averages of the machines as the build
> was occurring).

I can't understand at the moment why this would happen.  Surely if
many jobs are running on the first machine then at least some of its
slots would be marked as busy and jobs would flow to the next one.

Please post a representative subset of the verbose log from the
client.  Say the first 3000 lines.

-- 
Martin 

