[distcc] small redesign...

Martin Pool mbp at sourcefrog.net
Thu Oct 23 18:42:09 MDT 2014


It seems like, if there's nowhere to execute the job, we want the client
program to just pause, before consuming too many resources, until it is
dequeued by a server ready to do the job (or by a local slot becoming
available).

On Thu Oct 16 2014 at 2:43:35 AM Łukasz Tasz <lukasz at tasz.eu> wrote:

> Hi Martin,
>
> Let's assume that you can trigger more compilation tasks than you have
> executors.
> In this scenario you are facing a saturated cluster.
> When such a compilation is triggered by two developers, or two CI jobs
> (e.g. Jenkins), the cluster is saturated twice over...
>
> The default behaviour is to lock a slot locally and try to connect three
> times; if that fails, fall back, and if fallback is disabled the CI gets
> a failed build (fallback is not an option anyway, since the local machine
> cannot handle -j $(distcc -j)).
>
> Consider a scenario where I have 1000 objects and 500 executors:
> - a clean build on one machine takes
>   1000 * 20 sec (one obj) = 20000 sec / 16 processors = 1250 sec,
> - on the cluster: (1000/500) * 20 sec = 40 sec
>
> Saturating the cluster was impossible without pump mode, but now, after
> pump mode's "warm-up" effect, pump can dispatch many tasks, and I have
> faced situations where a saturated cluster breaks almost every
> compilation.
>
> My expectation is that the cluster won't reject my connection, or that a
> rejection will be handled, either by the client or by the server.
>
> By the server:
> - accept every connection,
> - fork a child if the connection was not accepted by an existing child,
> - in the case of pump mode, prepare the local dir structure and receive
>   headers,
> - --critical section starts here-- a multi-value semaphore with value
>   maxchild:
>   - execute the job,
> - release the semaphore.
>
>
> Also, what you suggested may be an even better solution, since the client
> will pick the first available executor instead of entering a queue, so
> distcc could make the connection already in dcc_lock_one().
>
> I already tried setting DISTCC_DIR on a common NFS share, but when you
> trigger that many jobs it becomes a bottleneck... I won't even go into
> locking on NFS, or the scenario where somebody takes a lock on NFS and
> the machine then crashes - that will not work by design :)
>
> I know this scenario does not happen very often, and it is more or less
> bursty in nature, but we should be happy that the distcc cluster is
> saturated, and this case should be handled.
>
> I hope it's clearer now!
> br
> LT
>
> Łukasz Tasz
>
>
> 2014-10-16 1:39 GMT+02:00 Martin Pool <mbp at sourcefrog.net>:
> > Can you try to explain more clearly what difference in queueing
> > behavior you expect from this change?
> >
> > I think probably the main change that's needed is for the client to ask
> > all masters if they have space, to avoid needing to effectively poll by
> > retrying, or getting stuck waiting for a particular server.
> >
> > On Wed, Oct 15, 2014 at 12:53 PM, Łukasz Tasz <lukasz at tasz.eu> wrote:
> >>
> >> Hi Guys,
> >>
> >> please correct me if I'm wrong,
> >> - currently distcc tries to connect to the server 3 times, with a
> >> small delay,
> >> - the server forks x children, and all of them try to accept incoming
> >> connections.
> >> If the server runs out of children (all of them busy), the client will
> >> fall back, and will not try this machine again for the next 60 sec.
> >>
> >> What do you think about redesigning distcc so that the master server
> >> always accepts an incoming connection and forks a child, but at the
> >> same time only x of them can enter the compilation task
> >> (dcc_spawn_child)? (Maybe preforking could still be used?)
> >>
> >> This would create a kind of queue; the client can always decide on its
> >> own how long it can wait (the maximum being DISTCC_IO_TIMEOUT), but it
> >> is still faster to wait than to fall back to the local machine, since
> >> on the cluster side it is probably just a peak of saturation.
> >>
> >> Currently I'm facing a situation where many jobs fall back, and the
> >> local machine is being killed by make's -j, which was calculated for
> >> distccd...
> >>
> >> Another trick might be to pick a different machine if the current one
> >> is busy, but that may be much more complex in my opinion.
> >>
> >> what do you think?
> >> regards
> >> Łukasz Tasz
> >> __
> >> distcc mailing list            http://distcc.samba.org/
> >> To unsubscribe or change options:
> >> https://lists.samba.org/mailman/listinfo/distcc
> >
> > --
> > Martin
>