[distcc] small redesign...

Łukasz Tasz lukasz at tasz.eu
Thu Oct 16 01:43:02 MDT 2014


Hi Martin,

Let's assume that you can trigger more compilation tasks than you have
executors. In this scenario the cluster is saturated.
When such a compilation is triggered by two developers, or two CI
(e.g. Jenkins) jobs, the cluster is saturated twice over...

The default behaviour is to lock a slot locally and try to connect three
times; if that fails, fall back, and if fallback is disabled the CI
build fails (fallback is not an option here, since the local machine
cannot handle -j $(distcc -j)).

Consider a scenario: I have 1000 objects and 500 executors,
- a clean build on one machine takes
  1000 * 20 sec (one obj) = 20000 sec / 16 processors = 1250 sec,
- on the cluster: (1000 / 500) * 20 sec = 40 sec
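For clarity, the back-of-the-envelope numbers above can be written out as two
small helpers (a sketch only, assuming each object takes 20 sec and jobs
divide evenly across cores/executors):

```c
/* Back-of-the-envelope build times from the scenario above:
 * 1000 objects, 20 sec per object, 16 local cores vs 500 executors. */

/* Clean build on one machine: all objects share the local cores. */
int local_build_seconds(int objects, int sec_per_obj, int cores)
{
    return objects * sec_per_obj / cores;        /* 1000*20/16 = 1250 s */
}

/* On the cluster: objects are spread evenly over the executors. */
int cluster_build_seconds(int objects, int sec_per_obj, int executors)
{
    return (objects / executors) * sec_per_obj;  /* (1000/500)*20 = 40 s */
}
```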

Saturating the cluster was impossible without pump mode, but now, after
the "warm up" effect, pump mode can dispatch many tasks at once, and I
have faced situations where a saturated cluster destroys almost every
compilation.

My expectation is that the cluster won't reject my connection, or that a
rejection will be handled, either by the client or by the server.

By the server:
- accept every connection,
- fork a child if the connection is not accepted by an existing child,
- in pump mode, prepare the local dir structure and receive the headers,
- --critical section starts here-- a multi-value semaphore with value maxchild:
  - execute the job,
- release the semaphore.
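The steps above could be sketched roughly like this for the per-connection
child (a sketch only: child_main, receive_headers and run_compiler are
hypothetical names, not actual distccd functions, and the semaphore is
assumed to be initialised to maxchild and shared across fork()):

```c
/* Per-connection child under the proposed scheme: the server accepts
 * every connection and forks, but a multi-value semaphore (initialised
 * to maxchild) gates the compile itself, so surplus jobs queue instead
 * of being rejected.  All names here are illustrative. */
#include <semaphore.h>
#include <unistd.h>

void child_main(int conn, sem_t *slots)
{
    /* pump mode: prepare the local dir structure and receive the
     * headers BEFORE competing for a slot, so the network transfer
     * overlaps with other children's compiles */
    /* receive_headers(conn); */

    sem_wait(slots);            /* --critical section starts here-- */
    /* run_compiler(conn); */
    sem_post(slots);            /* release the semaphore */

    close(conn);
}
```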


Also, what you suggested may be an even better solution, since the
client would pick the first available executor instead of entering a
queue; distcc could make the connection already in the function
dcc_lock_one().
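On the client side, that suggestion could look something like the sketch
below (an assumption-laden illustration, not real distcc code): start a
non-blocking connect() to every candidate executor and take whichever
becomes writable (i.e. connected) first, rather than retrying one host.
Error handling is elided for brevity.

```c
/* Hypothetical client-side "ask all executors" sketch: race
 * non-blocking connect()s and keep the first one that succeeds. */
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try every candidate in parallel; return a connected fd, or -1. */
int pick_first_available(const struct sockaddr_in *hosts, int nhosts,
                         int timeout_ms)
{
    struct pollfd pfds[16];
    if (nhosts > 16)
        nhosts = 16;                      /* sketch: fixed-size table */

    for (int i = 0; i < nhosts; i++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        fcntl(fd, F_SETFL, O_NONBLOCK);
        /* EINPROGRESS is the expected result of a non-blocking connect */
        connect(fd, (const struct sockaddr *)&hosts[i], sizeof hosts[i]);
        pfds[i].fd = fd;
        pfds[i].events = POLLOUT;         /* writable == connected */
    }

    int winner = -1;
    if (poll(pfds, (nfds_t)nhosts, timeout_ms) > 0) {
        for (int i = 0; i < nhosts; i++) {
            int err = 0;
            socklen_t len = sizeof err;
            if ((pfds[i].revents & POLLOUT) &&
                getsockopt(pfds[i].fd, SOL_SOCKET, SO_ERROR,
                           &err, &len) == 0 &&
                err == 0 && winner == -1)
                winner = pfds[i].fd;      /* first successful connect */
        }
    }

    for (int i = 0; i < nhosts; i++)      /* close the losers */
        if (pfds[i].fd != winner)
            close(pfds[i].fd);
    return winner;
}
```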

I already tried setting DISTCC_DIR to a common NFS share, but when you
are triggering so many jobs this becomes a bottleneck... not to mention
locking on NFS, or the scenario where somebody takes a lock on NFS and
the machine crashes - that will not work by design :)

I know this scenario does not happen very often, and it has a more or
less peaky characteristic, but we should be happy that the distcc
cluster is saturated, and this case should be handled.

Hope it's clearer now!
br
LT

Łukasz Tasz


2014-10-16 1:39 GMT+02:00 Martin Pool <mbp at sourcefrog.net>:
> Can you try to explain more clearly what difference in queueing behavior you
> expect from this change?
>
> I think probably the main change that's needed is for the client to ask all
> masters if they have space, to avoid needing to effectively poll by
> retrying, or getting stuck waiting for a particular server.
>
> On Wed, Oct 15, 2014 at 12:53 PM, Łukasz Tasz <lukasz at tasz.eu> wrote:
>>
>> Hi Guys,
>>
>> please correct me if I'm wrong,
>> - currently distcc tries to connect to the server 3 times, with a small delay,
>> - the server forks x children and all of them try to accept incoming
>> connections.
>> If the server runs out of children (all of them are busy), the client
>> will fall back, and within the next 60 sec will not try this machine.
>>
>> What do you think about redesigning distcc in such a way that the master
>> server will always accept an incoming connection and fork a child, but
>> at the same time only x of them will be able to enter the compilation
>> task (dcc_spawn_child)? (Maybe preforking could still be used?)
>>
>> This may create a kind of queue; the client can always decide on its
>> own whether it can wait some time (the maximum is DISTCC_IO_TIMEOUT),
>> but it is still faster to wait than to fall back to the local machine,
>> since on the cluster side it is probably just a peak of saturation.
>>
>> Currently I'm facing a situation where many jobs fall back, and the
>> local machine is being killed by make's -j calculated for distccd...
>>
>> Another trick may be to pick a different machine if the current one is
>> busy, but that may be much more complex in my opinion.
>>
>> what do you think?
>> regards
>> Łukasz Tasz
>> __
>> distcc mailing list            http://distcc.samba.org/
>> To unsubscribe or change options:
>> https://lists.samba.org/mailman/listinfo/distcc
>
>
>
>
> --
> Martin

