[distcc] load management

Martin Pool mbp at samba.org
Fri Apr 30 01:08:43 GMT 2004


On 29 Apr 2004, Daniel Kegel <dank at kegel.com> wrote:
> Martin Pool wrote:
> >Please look at the notes on this in the TODO and protocol-3.txt
> >documents in the source distribution.
> 
> Unfortunately, protocol-3.txt isn't in the current release,
> and you don't have a web gateway set up so we can look at
> the arch repository yet.
> 
> Is this a clever ploy to get us to install the arch client?  :-)

Ha ha.  Not really, though if I really want to fulfil my Arch sales
quota I suppose it should be.

The nub of it is that the protocol should change so that there is a
HELO-style handshake after the client connects before it starts to
send the job.  This can specify the highest protocol level understood
by each end.

  C: Hello, I speak protocol 3
  S: Welcome, I speak protocol 3.  Go ahead.

This way the client knows that the server process has actually picked
up the connection, rather than the remote kernel merely having
accepted it and put it in the listen queue.
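
A minimal sketch of such a handshake in Python, for illustration only.
The line format "DISTCC-HELLO <n>" and every function name here are
hypothetical and are not taken from protocol-3.txt.  It also assumes
that both ends settle on the lower of the two announced levels.

```python
import socket

# Hypothetical wire format: one ASCII line "DISTCC-HELLO <max_version>\n".
MY_MAX_PROTOCOL = 3


def format_hello(max_version: int) -> bytes:
    """Build the greeting line announcing our highest protocol level."""
    return b"DISTCC-HELLO %d\n" % max_version


def parse_hello(line: bytes) -> int:
    """Return the protocol level announced in a greeting line."""
    word, _, version = line.strip().partition(b" ")
    if word != b"DISTCC-HELLO" or not version.isdigit():
        raise ValueError("not a greeting: %r" % line)
    return int(version)


def read_line(sock: socket.socket) -> bytes:
    """Read up to and including the first newline (one byte at a time,
    which is slow but keeps the sketch short)."""
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("peer closed before greeting")
        buf += chunk
    return buf


def client_handshake(sock: socket.socket, my_max: int = MY_MAX_PROTOCOL) -> int:
    """Client speaks first, then waits for the server's reply."""
    sock.sendall(format_hello(my_max))
    server_max = parse_hello(read_line(sock))
    return min(my_max, server_max)  # assumed rule: use the lower level


def server_handshake(sock: socket.socket, my_max: int = MY_MAX_PROTOCOL) -> int:
    """Server waits for the client's greeting, then answers."""
    client_max = parse_hello(read_line(sock))
    sock.sendall(format_hello(my_max))
    return min(my_max, client_max)
```

The client only returns from client_handshake() once a server process
has read the greeting and answered it, which a connection sitting in
the kernel's queue cannot do.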

(Incidentally this might make handling of mismatched or downlevel
versions a little better; we can find out what the server supports and
then talk that, or at least give a nice error.)

Having done this, we change the client so that it opens connections
to N different servers in parallel.  The client sends its request to
whichever server responds first and drops the other connections.

So servers which are too loaded can just not respond, and the client
will go somewhere else.  It probably also gives a slight skew towards
servers which are faster/closer, but I'm not really counting on that.
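
The race could be sketched roughly as follows, here with Python's
asyncio rather than anything in distcc itself.  The greeting bytes,
the function names and the timeout are all hypothetical; the sketch
only shows "greet everyone, keep the first server that answers, drop
the rest".

```python
import asyncio

HELLO = b"HELLO 3\n"  # hypothetical greeting, not the real wire format


async def greet(host, port):
    """Connect, send our hello, and wait for the server's reply line."""
    reader, writer = await asyncio.open_connection(host, port)
    try:
        writer.write(HELLO)
        await writer.drain()
        reply = await reader.readline()
        if not reply:
            raise ConnectionError("server closed without replying")
        return (host, port), reader, writer, reply
    except BaseException:
        writer.close()  # also runs on cancellation, dropping the connection
        raise


async def first_responder(servers, timeout=5.0):
    """Race greet() against every server.  Return the first successful
    (address, reader, writer, reply), or None if nobody answers in time."""
    loop = asyncio.get_running_loop()
    tasks = {asyncio.ensure_future(greet(h, p)) for h, p in servers}
    deadline = loop.time() + timeout
    winner = None
    try:
        while tasks and winner is None:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break
            done, tasks = await asyncio.wait(
                tasks, timeout=remaining, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is not None:
                    continue  # refused or hung up: try the others
                if winner is None:
                    winner = task.result()
                else:
                    task.result()[2].close()  # tie: drop the extra one
    finally:
        for task in tasks:
            task.cancel()  # drop connections to servers that never replied
    return winner
```

A server that is listening but too busy to read the greeting never
completes its greet() task, so it simply loses the race and its
connection is closed when the task is cancelled.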

This can be done purely over TCP.

(This could be refined a bit to better handle hoarded SSH connections;
in particular we might not drop the request but rather just send
nothing, or send a "never mind" packet.)

I think it's enough for servers to be either "too busy" or "not too
busy".

> For what it's worth, I suspect that two simple changes --
> randomizing the host list and (at least with the forking server)
> dropping connection on high load average --
> would measurably improve performance on a shared cluster
> of workstations doubling as distcc servers
> without any protocol changes.  If Josh and I get a chance, we'll
> give those little tweaks a try.

Why only on the forking server?

> Since some people rely on the host list not being randomized,
> this would have to be optional somehow, I suppose.

> Likewise, the load average check should probably be
> controlled by a server commandline option.

We already have code to determine the number of CPUs.  I think it
would be OK to say by default that more than 2*NCPUS is too busy.
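
As a rough illustration of that default, assuming that "more than
2*NCPUS" refers to the 1-minute load average (the text above does not
say which figure is meant), a check might look like the Python sketch
below.  The function name and the factor parameter are hypothetical;
this is not distcc's own CPU-counting code.

```python
import os


def too_busy(load_1min=None, ncpus=None, factor=2):
    """Return True if the 1-minute load average exceeds factor * ncpus.

    Assumption: the threshold applies to the load average.  Both inputs
    can be passed explicitly, which makes the rule easy to test.
    """
    if ncpus is None:
        ncpus = os.cpu_count() or 1  # cpu_count() may return None
    if load_1min is None:
        load_1min = os.getloadavg()[0]  # available on Unix only
    return load_1min > factor * ncpus
```

A server using this rule would just not answer new connections while
too_busy() is true, so clients go elsewhere.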

-- 
Martin 