[distcc] Re: distcc 0.10.1 - bugfixes for Solaris

Thu Sep 19 01:25:09 GMT 2002

On 18 Sep 2002, Gerhard Kutzelnigg <gerry at kutzelnigg.de> wrote:

> Hi Martin,
> 
> oops, one error was me. ;-) The W_EXITCODE is actually correctly handled. I 
> was using distcc-0.9 just two days ago and then fixed the "bug" in 0.10.1 as 
> well without testing if it was already fixed... ;-)
> 
> Sorry to bother, but I've got some comments on EINTR etc.
> 
> I am not sure if you think that it is strange that one must handle EINTR. It 
> looks to me that you think it should not happen during a read/write/sendfile 
> or whatever. Well, this is an error condition which really can and will 
> occur. You can use siginterrupt() to tell the system to interrupt / to not 
> interrupt system calls, if you want to ignore it, but it's not a bad 
> behaviour if a call is interrupted. 

Yes, I understand EINTR.  Previous to 0.10, we ignored SIGCHLD and
therefore there was no need to resume from a nonfatal signal.  (Well,
I suppose SIGSTOP and so on might have caused it, but anyhow.)  I
agree that read and write need to handle this.  It is merely a bug
that they do not.
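
For illustration only, a read wrapper that resumes after a nonfatal
signal might look roughly like the sketch below.  The name read_retry
is made up for this example and is not taken from distcc's io.c; it
assumes a blocking descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not distcc code): read up to len bytes from fd,
 * restarting the call if a signal interrupts it before any data has
 * been transferred. */
static ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL);
    do {
        n = read(fd, buf, len);
    } while (n == -1 && errno == EINTR);

    /* n > 0: bytes read (possibly fewer than len); 0: end of file;
     * -1: an error other than EINTR, with errno set. */
    return n;
}
```

A caller that needs exactly len bytes still has to loop on short
reads; this wrapper only hides the EINTR case.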

> In fact it is very useful if, for example, you have signal handlers telling 
> you about a child termination or a timer signal etc. You can set a flag then 
> within this signal handler. But of course your main program needs to be 
> interrupted so you can see that some condition occurred. From within the main 
> program you can now check this flag and do whatever you like ;-)
> 
> You have a notice message in io.c when calling sendfile() telling if a file 
> transfer was not complete in one go ("partial transmission, retrying x 
> bytes.."). This was quite confusing because when I read it I thought that 
> something really was wrong. I just could not imagine why a retransmission 
> should happen on a tcp socket... Looking in the code I noticed that it's 
> actually a notice message for a condition which is neither an error nor 
> something "worth" to tell about... Things like this can just happen (and 
> sometimes this is intended).

It's not an error, but it is an interesting condition.  Therefore a
low-priority trace message.
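
As a hedged sketch of the same idea for plain write() (write_all is a
hypothetical name, and this is not the sendfile() path in io.c), a
partial transfer is simply retried with the remaining bytes, and EINTR
is treated the same way:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not distcc code): write all len bytes to fd.
 * A short write is not an error; the loop continues with whatever is
 * left.  Returns 0 on success, -1 on a real error with errno set. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted before any data: retry */
            return -1;          /* genuine failure */
        }
        /* Partial transmission: advance past the n bytes accepted. */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

The point where n is smaller than len is where a trace message such
as the one discussed above would naturally go.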

> Just think about a server which has to handle lots of connections. It can not 
> assure that every read/write always succeeds, maybe it is only possible to 
> only send a single byte.

I think that would cause read() to return 1, rather than raising EINTR.

> But if you are EINTRupted, you can take care of the 
> other connections.... (Okay in this example you would use select() and 
> non-blocking IO but it's just an example)
> 
> Again sorry to bother, maybe you already know about this, then it's okay. 
> Otherwise I hope I could give you a small explanation... :-)
> 
> One other thing I was thinking about was to do something with the distribution 
> mechanism in distcc. The locking stuff (for-loops counting to 50, creating 
> lockfiles etc.) works, but is not really nice...

> So if I hopefully find some 
> time I might write a scheduler daemon which handles this all. It would work 
> like this: Every volunteer distccd connects to this scheduler 
> (DISTCC_HOSTS will not be needed any more). This scheduler knows about all 
> connected volunteers. Whenever a compilation is requested, the distcc will 
> ask the scheduler about the host where to compile and after successful 
> compilation send some statistical job information (execution time, lines / 
> second etc.) back to the scheduler. 

There is already an FAQ about this and a lot of discussion in the
mailing list.  Rather than explaining it to everybody I have answered
it there; if you have something to add please let me know.

> This scheduler will then internally build a list containing the locks which 
> job is running where and sort this list so that faster computers will more 
> often get compile requests.

That is an emergent property of the current leaky bucket system.

> You will never need again to adjust the 
> DISTCC_HOSTS on every single compile station because a new volunteer is ready 
> or some other is offline or whatever.
> 
> So please tell me if you're already planning something like this, then I have 
> some more spare time ;-)) Otherwise I'll try to catch some free evenings to 
> code this scheduler. I intend to implement it as a generic class which can be 
> used by every program which needs to distribute jobs anywhere...
> 
> Have a nice day,
> 
> bye
> gerry

-- 
Martin