[distcc] Re: distcc 0.10.1 - bugfixes for Solaris
Martin Pool
mbp at samba.org
Thu Sep 19 01:25:09 GMT 2002
On 18 Sep 2002, Gerhard Kutzelnigg <gerry at kutzelnigg.de> wrote:
> Hi Martin,
>
> oops, one error was me. ;-) The W_EXITCODE is actually correctly handled. I
> was using distcc-0.9 just two days ago and then fixed the "bug" in 0.10.1 as
> well without testing if it was already fixed... ;-)
>
> Sorry to bother, but I've got some comments on EINTR etc.
>
> I am not sure if you think that it is strange that one must handle EINTR. It
> looks to me that you think it should not happen during a read/write/sendfile
> or whatever. Well, this is an error condition which really can and will
> occur. You can use siginterrupt() to tell the system to interrupt / to not
> interrupt system calls, if you want to ignore it, but it's not a bad
> behaviour if a call is interrupted.
Yes, I understand EINTR. Previous to 0.10, we ignored SIGCHLD and
therefore there was no need to resume from a nonfatal signal. (Well,
I suppose SIGSTOP and so on might have caused it, but anyhow.) I
agree that read and write need to handle this. It is merely a bug
that they do not.
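For illustration, here is a minimal sketch of a read that resumes after a nonfatal signal. This is not the code in distcc's io.c; the helper name read_fully and its return convention are assumptions made up for this example. The same pattern would apply to write.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not distcc's actual code): read exactly len bytes.
 * Resumes after EINTR and after short reads.
 * Returns 0 on success, -1 on a real error or on EOF before len bytes. */
int read_fully(int fd, void *buf, size_t len)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: just retry */
            return -1;          /* genuine I/O error */
        }
        if (n == 0)
            return -1;          /* peer closed before we got everything */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```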
> In fact it is very useful if, for example, you have signal handlers telling
> you about a child termination or a timer signal etc. You can set a flag then
> within this signal handler. But of course your main program needs to be
> interrupted so you can see that some condition occurred. From within the main
> program you can now check this flag and do whatever you like ;-)
>
> You have a notice message in io.c when calling sendfile() telling if a file
> transfer was not complete in one go ("partial transmission, retrying x
> bytes.."). This was quite confusing because when I read it I thought that
> something really was wrong. I just could not imagine why a retransmission
> should happen on a tcp socket... Looking in the code I noticed that it's
> actually a notice message for a condition which is neither an error nor
> something "worth" to tell about... Things like this can just happen (and
> sometimes this is intended).
It's not an error, but it is an interesting condition. Therefore a
low-priority trace message.
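Roughly, the idea is a loop like the sketch below. It uses write() rather than sendfile() to stay portable, and the function name send_with_trace, the partials counter and the exact message text are assumptions for this example, not the real io.c code.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical sketch: send len bytes, retrying after partial writes.
 * A partial write is not an error; it is only noted in a trace message
 * and counted in *partials. Returns 0 on success, -1 on a real error. */
int send_with_trace(int fd, const void *buf, size_t len, int *partials)
{
    const char *p = buf;

    *partials = 0;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data was sent */
            return -1;
        }
        p += n;
        len -= (size_t)n;
        if (len > 0) {
            /* interesting, but harmless: report at low priority only */
            (*partials)++;
            fprintf(stderr, "trace: partial transmission, retrying %lu bytes\n",
                    (unsigned long)len);
        }
    }
    return 0;
}
```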
> Just think about a server which has to handle lots of connections. It can not
> assure that every read/write always succeeds, maybe it is only possible to
> only send a single byte.
I think that would cause read() or write() to return 1, rather than
failing with EINTR.
> But if you are EINTRupted, you can take care of the
> other connections.... (Okay in this example you would use select() and
> non-blocking IO but it's just an example)
>
> Again sorry to bother, maybe you already know about this, then it's okay.
> Otherwise I hope I could give you a small explanation... :-)
>
> One other thing I was thinking about was to do something with the distribution
> mechanism in distcc. The locking stuff (for-loops counting to 50, creating
> lockfiles etc.) works, but is not really nice...
> So if I hopefully find some
> time I might write a scheduler daemon which handles this all. It would work
> like this: Every volunteer distccd connects to this scheduler
> (DISTCC_HOSTS will not be needed any more). This scheduler knows about all
> connected volunteers. Whenever a compilation is requested, the distcc will
> ask the scheduler about the host where to compile and after successful
> compilation send some statistical job information (execution time, lines /
> second etc.) back to the scheduler.
There is already an FAQ about this and a lot of discussion in the
mailing list. Rather than explaining it to everybody I have answered
it there; if you have something to add please let me know.
> This scheduler will then internally build a list containing the locks which
> job is running where and sort this list so that faster computers will more
> often get compile requests.
That is an emergent property of the current leaky bucket system.
> You will never need again to adjust the
> DISTCC_HOSTS on every single compile station because a new volunteer is ready
> or some other is offline or whatever.
>
> So please tell me if you're already planning something like this, then I have
> some more spare time ;-)) Otherwise I'll try to catch some free evenings to
> code this scheduler. I intend to implement it as a generic class which can be
> used by every program which needs to distribute jobs anywhere...
>
> Have a nice day,
>
> bye
> gerry
--
Martin