[distcc] Aborting jobs in status "Connect"

Christian Breimann distccGroupEmail.20.cbreimann at spamgourmet.com
Wed Oct 1 07:26:56 GMT 2003


Hello,

I am using distcc to distribute jobs on several computers in a university 
network. Some of them usually have problems such as a very high load caused 
by X sessions that were not closed properly or by strange cups daemon 
processes. Because of this, some machines allow normal users like me to 
connect but not to log in, which results in "hanging" ssh or rsh sessions. 
Only root is able to log in and terminate such high-load processes or reboot 
the machine. I cannot ask my system administrator to do this several times a 
day.

However, in these cases, distcc seems to have a similar problem. The 
graphical monitor shows that distcc stays in "Connect" status for several 
seconds or even minutes without anything happening for that job. All other 
machines get their jobs, finish them and get new jobs; only this one 
machine hangs. After everything else has been completed, I can terminate the 
make run using CTRL-C and start it again, so that the last jobs get 
finished, this time on another machine.

So I wonder whether distcc can do the following for me:
If a distributed job remains in "Connect" status for a certain amount of 
time, perhaps a user-defined number of seconds or a default of 10 seconds, 
distcc should kill this job, mark the machine as not available and 
redistribute the job in the same way as if the machine were not reachable at 
all.
If "Send" status is reached before this time limit, everything should be 
processed as before.
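
To make the request more concrete, here is a minimal sketch in C of the 
check I have in mind. It is not taken from the distcc sources: the names 
job_phase, job_state, should_abandon_host and DEFAULT_CONNECT_LIMIT_SECS are 
all hypothetical, and I am only assuming that the client knows which phase a 
job is in and when it entered "Connect".

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical job phases, loosely following what the monitor displays. */
enum job_phase { PHASE_CONNECT, PHASE_SEND, PHASE_COMPILE, PHASE_RECEIVE };

/* Hypothetical per-job bookkeeping; not distcc's real data structure. */
struct job_state {
    enum job_phase phase;
    time_t connect_started;   /* when the job entered PHASE_CONNECT */
};

/* Assumed default from the request above: 10 seconds. */
#define DEFAULT_CONNECT_LIMIT_SECS 10

/*
 * Return true if the job has been stuck in "Connect" for at least
 * limit_secs seconds. A limit of zero or less selects the default.
 * Jobs that already reached "Send" or a later phase are never abandoned.
 */
bool should_abandon_host(const struct job_state *job, time_t now,
                         int limit_secs)
{
    if (limit_secs <= 0)
        limit_secs = DEFAULT_CONNECT_LIMIT_SECS;

    if (job->phase != PHASE_CONNECT)
        return false;

    return difftime(now, job->connect_started) >= (double)limit_secs;
}
```

When this returns true, the caller would kill the job, mark the machine as 
not available and hand the job to another host, exactly as for an 
unreachable machine. How that fits into the real code I cannot judge.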

Is this easy to integrate, or would it be a lot of work?

Best regards,

Christian Breimann
