[distcc] Re: distcc task limitation

Wed Sep 11 12:56:00 GMT 2002

On 22 Aug 2002, Andreas Granig <andreas.granig at infonova.com> wrote:

> My first modification was to resolve multiple A records, so that the
> IPs of the buildhosts are held centrally on an internal nameserver and
> the client machines export DISTCC_HOSTS="localhost distcchosts".
> Adding or removing a buildhost no longer requires changes on the
> client machines.

That sounds good.  I was also thinking about adding support for the
new DNS SRV record type.

> I've moved gethostbyname() from dcc_open_socket_out(...) to
> dcc_pick_buildhost(...) so that multiple IPs for one hostname are
> included in the round-robin selection of the buildhosts. The struct
> dcc_hostdef now also holds a struct in_addr with the selected IP.
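
As a rough illustration of the quoted idea (not the actual patch), the
expansion of a multi-A-record name into a round-robin list might look
like the sketch below. The function names, the injectable `resolve`
parameter, and the addresses in the usage note are all hypothetical.

```python
import itertools
import socket


def expand_hosts(host_list, resolve=None):
    """Return (hostname, ip) pairs, one per address of each listed host."""
    if resolve is None:
        # gethostbyname_ex returns (canonical_name, aliases, ip_list).
        resolve = lambda name: socket.gethostbyname_ex(name)[2]
    pairs = []
    for name in host_list.split():
        for ip in resolve(name):
            pairs.append((name, ip))
    return pairs


def round_robin(pairs):
    """Cycle endlessly over the expanded (hostname, ip) pairs."""
    return itertools.cycle(pairs)
```

With a name such as "distcchosts" carrying two A records, the list
"localhost distcchosts" would expand to three entries, and each
compile job would take the next one in turn.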

> The second modification was limiting concurrent tasks.
> This means that the client has to know whether a buildhost is accepting
> further jobs. To keep network traffic low, the client opens a
> connection to the buildhost while picking a suitable one; it locks
> the lockfile for that host, sends the header information ("DIST" and
> PROTO_VER) and then receives an integer indicating the number of
> jobs the buildhost can still accept. If that number is 0, it closes the
> connection and tries the next host. If the number is >1, it releases the
> lockfile so that other processes on that client can send jobs to
> that buildhost. Then everything is sent as before: ARGC, ARGV, the
> file, and so on.

OK, that is a reasonable way to do it.
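
A minimal sketch of that query, to make the turnaround concrete.  The
wire layout here is an assumption: the quoted text only says that
"DIST" and PROTO_VER are sent and an integer comes back, so the
4-byte big-endian encoding, the value of PROTO_VER, and the function
names are all made up for illustration.

```python
import socket
import struct

PROTO_VER = 1  # assumed value, for illustration only


def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf += chunk
    return buf


def query_free_slots(sock):
    """Client side: send the header, return the advertised job count."""
    sock.sendall(b"DIST" + struct.pack("!I", PROTO_VER))
    (slots,) = struct.unpack("!I", recv_exact(sock, 4))
    return slots


def answer_query(sock, free_slots):
    """Server side: check the header and reply with the free job count."""
    header = recv_exact(sock, 8)
    if header[:4] != b"DIST":
        raise ValueError("bad magic")
    sock.sendall(struct.pack("!I", free_slots))
```

A client would move on to the next host when query_free_slots()
returns 0, and otherwise carry on sending ARGC, ARGV and the file on
the same connection.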

I like having just a single request and reply, though it's not a big
deal.

What I had in mind was that the server would accept jobs whenever they
are sent, but would not actually run the compiler until it had a
sufficiently low number of tasks running.  This could be regulated using
a lockfile system as on the client, which would work in either daemon
or inetd mode.  It could also check load average or whatever else.
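
One way such a lockfile scheme could work, as a sketch only: keep one
lock file per permitted task and have each server process try to take
any free one before starting the compiler.  This assumes a Unix host
with flock(); the directory layout and function names are
hypothetical.

```python
import fcntl
import os


def acquire_slot(lock_dir, max_tasks):
    """Try each slot file in turn.

    Returns an open file descriptor holding the lock, or None if all
    max_tasks slots are currently taken.
    """
    for i in range(max_tasks):
        path = os.path.join(lock_dir, "slot_%d" % i)
        fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
        try:
            # Non-blocking: fail at once if another holder has this slot.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd
        except BlockingIOError:
            os.close(fd)
    return None


def release_slot(fd):
    """Closing the descriptor drops the flock and frees the slot."""
    os.close(fd)
```

A server process that gets None back would hold the received job and
retry later rather than run the compiler.  Because the lock lives in
the filesystem, it does not matter whether the processes were started
by a daemon or by inetd.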

This may do better at keeping the network evenly loaded.

This approach by itself will do poorly when one machine has many more
CPUs than another.  The slow machine will end up with many jobs queued
up and waiting.

Rather than an explicit protocol turnaround to indicate whether a job
can be accepted, I would like to look at just having the network
connection block until the server is ready.  Then the client might just
make sure it has only one attempt to connect to any given server in
progress at any time.
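
The client-side half of that could be sketched with a per-server lock
held for the duration of the connect, so that other compile processes
on the same client wait their turn instead of piling up connections.
Again this assumes flock() on a Unix host, and the names are
hypothetical.

```python
import contextlib
import fcntl
import os


def lock_path(lock_dir, host):
    return os.path.join(lock_dir, "connect_" + host)


@contextlib.contextmanager
def connect_lock(lock_dir, host):
    """Hold an exclusive lock while one connect attempt to host is made."""
    fd = os.open(lock_path(lock_dir, host), os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no other attempt is active
        yield fd
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def connect_in_progress(lock_dir, host):
    """Return True if some process currently holds the connect lock."""
    fd = os.open(lock_path(lock_dir, host), os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return False
    except BlockingIOError:
        return True
    finally:
        os.close(fd)
```

The caller would open its socket inside the "with connect_lock(...)"
block and leave the block once the server has accepted the
connection.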

-- 
Martin