[distcc] distcc scalability with # of users?

Fri Apr 16 05:08:57 GMT 2004

Martin Pool wrote:
>>1. Since the list of hosts read from $prefix/etc/distcc/hosts is
>>the same for all workstations, every workstation will
>>issue large compile jobs to itself sometimes even though it'd be better
>>off only handling preprocessing and linking (right?)
> 
> Linking counts against jobs running on the local machine too.  If it's
> linking in parallel with compiling then it should try to do the
> compiles remotely.

Ah, so it notices that "localhost" is the same machine as "zytor"
when running on zytor?

> It would probably be good to finish off support for DNS multi-A
> records, and use that to spread work across machines.  I don't think
> much more needs to be done.

That's just a direct replacement for the hosts file, though, isn't it?
I'm not sure I want IT to be involved in this; it's a lot easier for
me to modify distcc/hosts than it is to create DNS entries!
[BTW x at xman dot org is working on the Rendezvous patch for distcc on Linux,
I hear.  I'm not interested in that myself, but maybe others are.]

>>2. Distcc won't currently check the load average of each compile server,
>>so workstations busy with non-distcc jobs will get slammed with
>>distcc jobs, negatively impacting normal use of the workstations.
> 
> If the workstations have a reasonable amount of memory then running a
> couple of low-priority daemons should not hurt too much.  Remember it
> will only accept about 2*NCPUS.

Yes, but that means the compile jobs (which could run faster on
some other compile server that is ready and waiting) will run
more slowly.

> We could check the load average before accepting jobs but that is
> actually a pretty poor measure for modern machines.

Oh, I dunno, the number of processes in 'R' state seems
like it'd be a pretty good measure of load if there's
plenty of RAM and the distcc job wouldn't cause any disk I/O.
Or the number of processes in 'R' or 'D' state if jobs do tend
to do disk I/O, maybe.
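
To make the 'R' / 'D' idea concrete, here is a rough sketch that counts
processes by scheduler state on Linux by reading /proc/<pid>/stat.  This is
only an illustration, not anything distcc does today; the function name
count_by_state and its proc_root parameter are made up for the example.

```python
import os

def count_by_state(states=("R",), proc_root="/proc"):
    """Count processes whose state letter (third field of stat) is in `states`.

    Hypothetical helper for illustration. Note that the process doing the
    scan is itself runnable, so it normally contributes 1 to an 'R' count.
    """
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # process exited while we were scanning
        # Format is "pid (comm) state ...". comm may contain spaces or
        # parentheses, so split after the LAST closing parenthesis.
        fields = stat[stat.rfind(")") + 1:].split()
        if fields and fields[0] in states:
            count += 1
    return count
```

Calling count_by_state(("R",)) gives the runnable count, and
count_by_state(("R", "D")) adds processes in uninterruptible sleep, which
is usually disk I/O.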

>>3. If more than one user is issuing distcc jobs, their distcc's
>>will sometimes issue jobs to the same machine by chance
>>(fairly often, if distcc assigns jobs in order of the etc/distcc/hosts 
>>file).
> 
> Right, so those jobs will just stall for a bit.

Again, reducing the performance of the cluster.
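
A toy model of the collision worry, under the assumption (not verified
here) that every user's distcc walks the same hosts list from the top.
It ignores per-host job limits and everything else a real distcc does;
peak_load and its parameters are invented for the example.

```python
import random
from collections import Counter

def peak_load(hosts, n_users, jobs_per_user, shuffle, seed=0):
    """Return the largest number of jobs that land on any single host.

    Each simulated user sends one job to each of the first `jobs_per_user`
    hosts in their own view of the list. With shuffle=False every user
    sees the same order; with shuffle=True each user gets a random order.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    load = Counter()
    for _ in range(n_users):
        order = list(hosts)
        if shuffle:
            rng.shuffle(order)
        for host in order[:jobs_per_user]:
            load[host] += 1
    return max(load.values(), default=0)
```

With six hosts, three users and two jobs each, the unshuffled case puts
three jobs on each of the first two hosts and none on the other four;
shuffling per user can only match or improve on that peak.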

>>Has anyone looked at these issues?   I suppose a first step I
>>might take if nobody else has might be to run a few benchmarks
>>to see if these potential problems actually happen in the real
>>world.
> 
> That would be good.

OK, benchmarks coming up... I only have six machines in the cluster at the
moment, but I can probably coax a few more coworkers into joining for the
good of science :-)  Maybe we'll try running 1 to N/2 copies of the same
job on an N-machine cluster (simulating various numbers of different users
doing normal work) and plot the compile time for each.
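
The benchmark loop could look something like this sketch.  It just
launches k copies of a command at once and records the wall-clock time
until all of them finish, for k from 1 to N/2.  The function names are
made up, and giving each copy its own build tree is left to the caller.

```python
import subprocess
import time

def time_concurrent(cmd, copies):
    """Start `copies` instances of cmd together; return (seconds, exit codes)."""
    start = time.monotonic()
    procs = [subprocess.Popen(cmd) for _ in range(copies)]
    codes = [p.wait() for p in procs]  # wait for every copy to finish
    return time.monotonic() - start, codes

def sweep(cmd, n_machines):
    """Map k -> wall-clock seconds for k = 1 .. n_machines // 2 copies."""
    results = {}
    for k in range(1, n_machines // 2 + 1):
        elapsed, codes = time_concurrent(cmd, k)
        if any(codes):
            raise RuntimeError(f"a copy failed with {k} concurrent jobs")
        results[k] = elapsed
    return results
```

For a six-machine cluster, sweep(cmd, 6) returns timings for 1, 2 and 3
simultaneous builds, where cmd is whatever build command is being
measured, given as an argument list.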
- Dan

-- 
My technical stuff: http://kegel.com
My politics: see http://www.misleader.org for examples of why I'm for regime change