[distcc] distcc scalability with # of users?

Daniel Kegel dkegel at google.com
Sat Apr 17 01:42:13 GMT 2004

Dan Kegel wrote:
> I'm interested in using distcc in a group of 16 programmers
> where all 16 workstations are equally fast, all 16
> workstations are both compile servers and compile clients,
> and all access a shared copy of distcc over NFS or SMB.
> Several potential problems come to mind when thinking about this:
> 1. Since the list of hosts read from $prefix/etc/distcc/hosts is
> the same for all workstations, every workstation will
> issue large compile jobs to itself sometimes even though it'd be better
> off only handling preprocessing and linking (right?)

We couldn't demonstrate this with our little synthetic benchmark.
(Maybe it happens in the real world; no idea.)

> 2. Distcc won't currently check the load average of each compile server,
> so workstations busy with non-distcc jobs will get slammed with
> distcc jobs, negatively impacting normal use of the workstations.
> 3. If more than one user is issuing distcc jobs, their distcc clients
> will sometimes send jobs to the same machine by chance
> (fairly often, if distcc assigns jobs in order of the etc/distcc/hosts 
> file).

We just verified these two (items 2 and 3) using a trivial synthetic benchmark.

It'd probably be 'easy' to make the distcc server check the load
average and drop the connection if it exceeded some configurable threshold.
As a @todo comment in dparent.c says:

  * @todo Quite soon we need load management.  Basically when we think
  * we're "too busy" we should stop accepting connections.  This could
  * be because of the load average, or because too many jobs are
  * running, or perhaps just because of a signal from the administrator
  * of this machine.

So maybe we'll try to implement this todo note.
- Dan
