[distcc] Remote Fallback

Thomas Walker Thomas.Walker at morganstanley.com
Thu May 22 03:44:18 GMT 2003

Martin Pool wrote:

> On 21 May 2003, Thomas Walker <Thomas.Walker at morganstanley.com> wrote:
> > You are correct about the reasoning... I am going to have a very
> > large number of backend servers in the list that may, on occasion,
> > not be available.  I was looking at just carefully managing the list
> > so that I make sure that everything in it is good but that adds a
> > reasonably significant amount of overhead.
> Another thing you can do is have some kind of script that updates
> /etc/distcc/hosts depending on the network state.  (Needing a separate
> program to do it is perhaps ugly, but it allows some flexibility.)

Unfortunately this doesn't really work in our environment (updating files in
/etc on thousands of potential client machines is decidedly non-trivial).  I
already have a Perl hack that selects hosts based on basic entitlement
principles, but I haven't taken it as far as checking whether a potential server
is available (that involves too much overhead to do on every execution, and I,
like you, have so far been shying away from a daemon that handles server
allocation).

> > I've seen people on this list wondering whether distcc could be made
> > to try other remote hosts before falling back but never saw a
> > response.
> > I was looking at the 2.3 code yesterday and it seemed to require a
> > reasonably intrusive patch (make the hostlist static and reorder how
> > some things are done in distcc.c to make the retry logic possible
> > without repeating some things unnecessarily) but it sounds like you
> > may have already made some of the changes I would need in 2.4.  I'll
> > take a look at the new code and see what I can do...
> I suspect remembering unreachable hosts may give decent performance in
> this situation while still keeping things conceptually simple.  Let us
> know.

I was looking at making the host list static and simply removing a failed server
for that execution but, with the addition of the backoff, I will probably just
mark the server as bad via the method you've provided.  This could, however,
cause a problem if it takes more than <backoff time> (currently 60 sec) to go
through the list completely, since a server marked bad early on would become
eligible again before the pass finishes.  Is there a reason why you picked 60
seconds, aside from convenience?  I don't know about your environment, but in
mine, if I lose a machine, it takes a lot more than a minute for it to return;
even a panic followed by an automatic reboot takes at least 2 min to come back
completely.  Maybe you were thinking of other sorts of temporary problems.
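
To make the backoff concern concrete, here is a minimal sketch in Python (not
distcc's C, and not its actual code).  The names HostList, mark_bad, is_usable
and pick are hypothetical, and the state is kept in memory purely for
illustration; since each compile runs a separate client process, a real client
would have to persist it between invocations.

```python
import time

BACKOFF_SECONDS = 60  # the backoff period discussed above


class HostList:
    """Remembers recently failed hosts (hypothetical helper, not distcc code)."""

    def __init__(self, hosts, backoff=BACKOFF_SECONDS, clock=time.monotonic):
        self.hosts = list(hosts)
        self.backoff = backoff
        self.clock = clock          # injectable so the timing can be simulated
        self.failed_at = {}         # host -> time of its most recent failure

    def mark_bad(self, host):
        """Record that a host just failed."""
        self.failed_at[host] = self.clock()

    def is_usable(self, host):
        """A host is usable if it never failed or its backoff has expired."""
        failed = self.failed_at.get(host)
        return failed is None or self.clock() - failed >= self.backoff

    def pick(self, try_host):
        """Try each usable host in order.

        Returns the first host for which try_host(host) is true, or None,
        which the caller would treat as "fall back to a local compile".
        """
        for host in self.hosts:
            if not self.is_usable(host):
                continue
            if try_host(host):
                return host
            self.mark_bad(host)
        return None
```

With a simulated clock where every failed attempt costs 40 seconds, a pass over
three dead hosts takes 120 seconds, so the first host is already past its
60-second backoff by the time the pass ends.  That is the situation the
paragraph above worries about.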

> In 3.x I'd like the client to simultaneously open connections to
> several hosts and use whichever one is ready first.  That should also
> help with this situation.  I don't think it would be worthwhile to
> change it in 2.x, but of course you can try if you want.

A la DHCP... that would be a very nice idea, but definitely a non-trivial hack
to the current code.
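
For what it's worth, the "connect to several hosts and take whichever is ready
first" idea can be sketched in a few lines of Python.  This is only an
illustration under stated assumptions: it uses one thread per host rather than
whatever mechanism a 3.x client would actually use, and connect_first and its
timeout parameter are made-up names.

```python
import socket
from concurrent.futures import ThreadPoolExecutor, as_completed


def _attempt(addr, timeout):
    """Open one TCP connection; raises OSError if the host is unreachable."""
    return addr, socket.create_connection(addr, timeout=timeout)


def connect_first(addresses, timeout=5.0):
    """Connect to all (host, port) pairs at once.

    Returns (address, socket) for the first connection that succeeds, or
    None if there are no addresses or every attempt fails.
    """
    addresses = list(addresses)
    if not addresses:
        return None

    pool = ThreadPoolExecutor(max_workers=len(addresses))
    futures = [pool.submit(_attempt, addr, timeout) for addr in addresses]

    winner = None
    for fut in as_completed(futures):
        if fut.exception() is None:
            winner = fut
            break

    def close_loser(done):
        # Any connection that completes after the winner is simply closed.
        if not done.cancelled() and done.exception() is None:
            done.result()[1].close()

    for fut in futures:
        if fut is not winner:
            fut.add_done_callback(close_loser)

    pool.shutdown(wait=False)  # do not block on the slower hosts
    return winner.result() if winner is not None else None
```

A caller would pass something like the list of volunteer hosts and fall back to
a local compile when the result is None.  The cost is one extra connection per
losing host, which is presumably part of why it is not a trivial change.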

> --
> Martin
> linux.conf.au 2004: Adelaide, Australia         http://lca2004.linux.org.au/
