[distcc] load management

Wed Apr 28 00:41:55 GMT 2004

Wayne Davison wrote:
> Here's an idea I had been kicking around for a while but never got to.
> All systems that want to make distcc requests need at least one local
> server process.  This server would be responsible for talking with the
> other servers that were around (configured in some manner), keeping
> track of each one's load (they would have either an open socket or get
> regular UDP updates) and also how fast they have executed jobs (per K)
> at various loads.  The idea is that this current status is maintained on
> each local server (so no central server going down is a single point of
> failure), but not by every client when it starts up. 

Yes, that's the sort of thing I've been thinking about as the next
step past "just drop connection if load too high".
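
As a rough sketch of that kind of per-host bookkeeping (Python for
brevity; nothing here is existing distcc code), the local server could
keep the last reported load and an observed compile speed per K for each
host, then pick the cheapest host that is not overloaded. The names
HostStatus and pick_host, the max_load cutoff, and the (1 + load) cost
scaling are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class HostStatus:
    load: float        # last load figure reported by the host
    secs_per_k: float  # observed compile seconds per K of preprocessed input


def pick_host(hosts: Dict[str, HostStatus], job_kbytes: float,
              max_load: float = 4.0) -> Optional[str]:
    """Return the host with the lowest estimated compile time, or None.

    Hosts above max_load are skipped entirely (overload protection).
    The cost model is a placeholder assumption: time per K, scaled by
    (1 + load) to penalize busy machines.
    """
    best_name = None
    best_cost = None
    for name, status in hosts.items():
        if status.load > max_load:
            continue
        cost = status.secs_per_k * job_kbytes * (1.0 + status.load)
        if best_cost is None or cost < best_cost:
            best_name, best_cost = name, cost
    return best_name
```

If every host is over the cutoff, pick_host returns None and the caller
would have to decide whether to wait or run the job locally anyway.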

> I imagined this
> server being the same one that handles the job requests from other
> machines (if configured to accept remote jobs).  So, the local distcc
> executable would just connect to the server at localhost, ask it to run
> a job, and send it the output of the preprocessed input.  The local
> server would be free to either send the job off to another host, or run
> it locally.

I don't think that's needed.  In fact, it's probably better if
distcc connects to the local server via a Unix domain socket.
That's slightly faster and more secure.
Also, it lets us do tricky things like passing an open
socket from the local server to the distcc program,
so the bytes don't have to get relayed through the local server.
(I've been waiting ten years for a reason to use fd passing!)
It also lets us pass credentials, so the local server could
know for sure which Unix user was making the request;
that could come in handy if we want to restrict status info
about jobs to the user who submitted them.
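
To make the fd-passing and credentials ideas concrete, here is a minimal
sketch in Python (3.9 or later, for socket.send_fds and socket.recv_fds).
It is not distcc code: the helper names are invented for illustration,
the b"fd" payload is an arbitrary placeholder, and the credential lookup
assumes Linux's SO_PEERCRED option, returning None where that option is
not available.

```python
import socket
import struct


def send_connection(local_sock: socket.socket, conn: socket.socket) -> None:
    """Local server side: hand the open descriptor for conn to the client."""
    # At least one byte of ordinary data must accompany the descriptor.
    socket.send_fds(local_sock, [b"fd"], [conn.fileno()])


def recv_connection(local_sock: socket.socket) -> socket.socket:
    """Client side: receive one descriptor and wrap it as a socket object."""
    _msg, fds, _flags, _addr = socket.recv_fds(local_sock, 16, 1)
    return socket.socket(fileno=fds[0])


def peer_uid(local_sock: socket.socket):
    """Return the uid of the process at the other end, or None.

    Assumes Linux: SO_PEERCRED yields a struct of three ints (pid, uid, gid).
    """
    if not hasattr(socket, "SO_PEERCRED"):
        return None
    size = struct.calcsize("3i")
    creds = local_sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, size)
    _pid, uid, _gid = struct.unpack("3i", creds)
    return uid


def demo() -> bytes:
    """Single-process demo; socketpairs stand in for the real connections."""
    # Stand-in for the connection the local server opened to a remote host.
    remote_end, server_side = socket.socketpair()
    # Stand-in for the Unix domain socket between local server and distcc.
    server, client = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        send_connection(server, server_side)
        passed = recv_connection(client)
        # The client now writes to the remote host directly, with no relay.
        passed.sendall(b"job")
        received = remote_end.recv(3)
        passed.close()
        return received
    finally:
        for s in (remote_end, server_side, server, client):
            s.close()
```

In a real setup the local server would call peer_uid on each accepted
connection before deciding what status information to reveal.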

> The stats kept could start simple (with job balancing and overloading
> protection) and get as complicated as desired.  For instance, a future
> complicated algorithm could take into account transfer time in addition
> to compile speed per-K and current load.  (I imagine every job that a
> remote server performs coming back with the elapsed compile time as
> measured on the remote system so that the sending server could figure
> out how fast/slow the transfer part of the equation was.  I forget if
> distcc already has this or not.)
> 
> To get status, a status tool would connect to the local server and ask
> for a summary.

Absolutely.
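
The transfer-time estimate Wayne describes is just a subtraction, if the
remote server reports its own elapsed compile time with each result. A
small sketch, assuming a hypothetical JobTiming record (whether distcc
already returns the remote compile time is left open above, so that
field is an assumption too):

```python
from dataclasses import dataclass


@dataclass
class JobTiming:
    kbytes: float            # size of the preprocessed input, in K
    round_trip_s: float      # wall-clock time measured by the sender
    remote_compile_s: float  # compile time as measured on the remote host


def transfer_seconds(t: JobTiming) -> float:
    """Time not spent compiling, attributed to transfer and overhead."""
    # Clamp at zero in case the two clocks disagree slightly.
    return max(0.0, t.round_trip_s - t.remote_compile_s)


def compile_seconds_per_k(t: JobTiming) -> float:
    """Compile speed per K of input for this one job."""
    return t.remote_compile_s / t.kbytes
```

For example, a 200 K job that takes 3.0 s end to end with 2.0 s of
remote compile time gives 1.0 s of transfer and 0.01 s per K.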

- Dan