[distcc] timeouts in 2.17

Martin Pool mbp at sourcefrog.net
Sun Aug 8 23:53:20 GMT 2004


On  8 Aug 2004, Jean Delvare <khali at linux-fr.org> wrote:
> Hi all,
> 
> The announcement for 2.17 mentioned the addition of timeouts on network
> operations. This sounds like a very interesting feature. To tell the
> truth, this was the only feature that was still missing before I could
> propose distcc at my workplace as a replacement for our homemade, bogus
> distributed compilation system. I just gave it a try and it seems to work
> very well :) Congratulations!

Hi,

I'm glad it's useful to you.  I finally got around to writing it
because I was travelling with my laptop and I got sick of distcc
blocking while it looked for servers.

> I do, however, have some questions. I couldn't find any information about
> this in the man page of distccd. Did I miss something, or is it really
> not there?

Excellent questions.  It's not documented because it's meant to "just
work" but it probably should be mentioned.

> My questions:
> 
> What is the default timeout?

There are several timeouts for different phases of work.  (These
correspond to the strip colors in the GUI monitor.)

I did this because it might take a long time to build the file, but if
we can't open the connection in a few seconds then the server is
either not there, or too slow to be useful.

On the client:

5s: Resolving the hostname and opening the TCP connection.

30s: Sending the request.

180s: Compilation.

On the server:

30s: Establishing the incoming connection and getting the headers.

30s: Receiving the job.

240s: Compilation and sending results back.
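
For illustration only, the two lists above can be written down as a
small table, with the connect timeout applied to a socket.  This is a
Python sketch, not distcc's code (distcc is written in C); the names
CLIENT_TIMEOUTS, SERVER_TIMEOUTS and open_connection are made up for
the example.

```python
import socket

# Timeouts in seconds, copied from the lists above.
# The dictionary keys are hypothetical labels, not distcc identifiers.
CLIENT_TIMEOUTS = {"connect": 5, "send_request": 30, "compile": 180}
SERVER_TIMEOUTS = {"accept_and_headers": 30, "receive_job": 30,
                   "compile_and_reply": 240}


def open_connection(host, port, timeout=CLIENT_TIMEOUTS["connect"]):
    """Open a TCP connection, or return None if it fails or times out.

    Note: in Python the timeout bounds each connect attempt; the name
    lookup done inside create_connection is not bounded by it.
    """
    try:
        return socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, no route, DNS failure and timeout.
        return None
```

A caller would treat a None result the same way as any other
connection failure and fall back to another host or to a local build.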

> How frequently are hosts retried after a timeout (by default)? After a
> connection failure (by default)?

In either case, the host is "disliked" for 60s: it is removed from
consideration for scheduling during that time.
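
The "dislike" idea can be sketched roughly as below.  This is a
Python illustration under assumptions, not how distcc stores its
state; the HostList class and its methods are hypothetical, and the
clock is injectable only so the behaviour is easy to test.

```python
import time

DISLIKE_SECONDS = 60  # from the text above


class HostList:
    """Track which hosts are currently eligible for scheduling."""

    def __init__(self, hosts, clock=time.monotonic):
        self._hosts = list(hosts)
        self._disliked_until = {}  # host -> time at which it is eligible again
        self._clock = clock

    def dislike(self, host):
        """Call after a timeout or connection failure on this host."""
        self._disliked_until[host] = self._clock() + DISLIKE_SECONDS

    def candidates(self):
        """Return hosts that are not currently disliked, in original order."""
        now = self._clock()
        return [h for h in self._hosts
                if self._disliked_until.get(h, now) <= now]
```

With two hosts, calling dislike("a") leaves only "b" in candidates()
until 60 seconds have passed on the clock, after which "a" is offered
again.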

A common case is to have no servers reachable because the client is
disconnected.  Sometimes the client will know straight away that they
are unreachable, because there is no route.  In that case you just get
a warning for each host approximately once per minute.  

A worse case is that there is a route, but it doesn't get to the
machines.  In that case each server will delay one job for five
seconds about once per minute, and that job will then run locally.
It seems to work pretty well.  It turns out that the delay does not
hurt too much because the client is busy running everything locally
anyway.
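
As a back-of-the-envelope check of that worst case, here is a small
Python sketch.  The function name is hypothetical, and the result is
only an upper bound on job-seconds spent waiting: with a parallel
build the stalled jobs can overlap, so the wall-clock cost may be
lower.

```python
CONNECT_TIMEOUT = 5   # seconds, client connect timeout from above
DISLIKE_SECONDS = 60  # seconds a failed host is skipped


def worst_case_stall_per_minute(n_servers):
    """Upper bound on seconds of job time lost to connect timeouts
    per minute, if every server stalls one job once per minute."""
    return n_servers * CONNECT_TIMEOUT


# Example: four unreachable servers.
stall = worst_case_stall_per_minute(4)
print(f"{stall}s of waiting per ~{DISLIKE_SECONDS}s of building")
```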

At the moment it does not retry jobs remotely, but it probably should.

> Is there a way to control these values?

There is not.  If there's any case where the defaults don't work I'd
rather fix the defaults if possible.

-- 
Martin 