Different problem (Re: [distcc] Problems with distcc hanging on large compiles (Patch not effective))
mbp at samba.org
Sat Sep 14 10:00:01 GMT 2002
On 30 Aug 2002, Andreas Granig <andreas.granig at infonova.com> wrote:
> if I got it correctly, your problem is a hang on the client side? So
> I've the same problem here. The strange thing is that it only occurs
> when distributing a job to a specific machine (client is Debian
> unstable on 2.4.18, daemon is Debian stable on 2.2.20), all other
> machines (Debian stable/unstable, RedHat, SuSe) run fine :o/
> It happens that the client is blocking in io.c - dcc_pump_readwrite(...)
> while read()ing the successfully compiled .o-file. "wanted" is e.g. 150000
> bytes, but I only read 149050 and then read() blocks. It seems that in
> some circumstances either "wanted" is calculated wrong on daemon side or
> some bytes of the .o-file get lost in some way...
> Little more info:
> *** client ***
> ** netstat **
> [agranig at azrael:agranig]$ netstat -pnat|grep distcc
> tcp 0 0 10.1.13.83:34845 10.1.7.24:4200 ESTABLISHED 13575/distcc
> *** daemon ***
> ** netstat **
> [agranig at corelli:agranig]$ netstat -pnat|grep distccd
> tcp 0 0 0.0.0.0:4200 0.0.0.0:* LISTEN 22789/distccd
> ** ps **
> [agranig at corelli:agranig]$ ps auxw|grep distccd
> agranig 22789 0.0 0.0 1380 136 ? SN Aug28 0:00 src/distccd --concurrent 1 --nice 5 --log-file=/home/agranig/distccd_corelli.log --verbose
It looks like the server task has terminated, but the client is still
trying to read. What I really need to see is the netstat information
about the server socket for this client. In this case, it would be
"10.1.13.83:34845 - 10.1.7.24:4200". Your grep command didn't find it
because the distccd process has presumably already terminated. Please
run "netstat -potentate" and grep for the client port number to get
the relevant fields.
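
To illustrate the symptom, here is a minimal sketch of a loop that reads
exactly "wanted" bytes. It is not the code in io.c; read_exactly() and
its return convention are hypothetical names made up for this example.
If the peer closes the socket early, read() returns 0 and the loop can
report a short transfer. If the peer has sent fewer bytes than promised
but the connection is still open from the client's point of view, the
same read() simply blocks, which is why the socket state on both ends
matters.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper, not distcc's actual dcc_pump_readwrite().
 * Reads exactly `wanted` bytes from fd into buf.
 * Returns 0 on success, 1 if the peer closed the connection before
 * `wanted` bytes arrived, and -1 on a read error. *got always holds
 * the number of bytes actually read. */
static int read_exactly(int fd, char *buf, size_t wanted, size_t *got)
{
    assert(buf != NULL && got != NULL);
    *got = 0;
    while (*got < wanted) {
        ssize_t n = read(fd, buf + *got, wanted - *got);
        if (n > 0) {
            *got += (size_t)n;      /* short reads are normal on sockets */
        } else if (n == 0) {
            return 1;               /* EOF: peer closed early */
        } else if (errno != EINTR) {
            return -1;              /* real error; EINTR is retried */
        }
    }
    return 0;
}
```

With wanted = 150000 and only 149050 bytes delivered, this loop returns
1 with *got == 149050 once the peer's close is seen; until then it sits
in read().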
At a guess, turning off corks on the server would fix it, but please
send the info anyhow. Also a tcpdump would be very useful.
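
For reference, a cork at the socket level is the Linux TCP_CORK option.
The sketch below shows how a program could switch it on or off for one
socket; set_cork() is a hypothetical helper for illustration, not a
function from the distcc source, and it says nothing about how distccd
itself exposes this setting.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable (on != 0) or disable (on == 0) TCP_CORK
 * on a TCP socket. Returns 0 on success, -1 on error. On systems that
 * do not define TCP_CORK it does nothing and returns 0. */
static int set_cork(int fd, int on)
{
    assert(fd >= 0);
#ifdef TCP_CORK
    int val = on ? 1 : 0;
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &val, sizeof val);
#else
    (void)on;
    return 0;
#endif
}
```

While the cork is set the kernel holds back partial frames; clearing it
with set_cork(fd, 0) flushes whatever is queued.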
> Btw, Martin, what about that idea/patch about task limitation I sent
> you by mail last week? Have you already had a look at it?
You really have to send unified diffs ("diff -u") to open source
projects. Plain diffs are useless for a CVS project because they carry
no context lines, so they can't be merged after the tree has changed.
I thought I replied but maybe not: basically I agree that task
limitation is useful, but adding extra protocol interchanges is bad
for performance and simplicity. I think the scheduler will do
adequately if the server just throttles itself to an acceptable load.
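
As a rough sketch of what "throttles itself" could mean, the server
might check the one-minute load average before taking another job. The
function names and the max_load threshold below are assumptions for
illustration only; nothing here is taken from distccd.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>

/* Hypothetical policy: allow a new job only while the load is below
 * the configured ceiling. */
static int load_allows_job(double load, double max_load)
{
    assert(max_load > 0.0);
    return load < max_load;
}

/* Hypothetical check a server could run before accepting a connection.
 * Uses getloadavg() (glibc/BSD) for the one-minute load average. */
static int server_may_accept(double max_load)
{
    double avg[1];
    if (getloadavg(avg, 1) != 1)
        return 1;   /* load unknown: do not refuse work */
    return load_allows_job(avg[0], max_load);
}
```

A server that refuses or delays work this way needs no extra protocol
exchange; the client just sees a slower or unavailable host.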