Different problem (Re: [distcc] Problems with distcc hanging on large compiles (Patch not effective))

Martin Pool mbp at samba.org
Sat Sep 14 10:00:01 GMT 2002


On 30 Aug 2002, Andreas Granig <andreas.granig at infonova.com> wrote:
> Hi,
> 
> if I understood correctly, your problem is a hang on the client side? Then
> I have the same problem here. The strange thing is that it only occurs
> when distributing a job to a specific machine (client is Debian
> unstable on 2.4.18, daemon is Debian stable on 2.2.20), all other
> machines (Debian stable/unstable, RedHat, SuSE) run fine :o/
> 
> It happens that the client is blocking in io.c - dcc_pump_readwrite(...)
> while read()ing the successfully compiled .o-file. "wanted" is e.g. 150000
> bytes, but I only read 149050 and then read() blocks. It seems that in
> some circumstances either "wanted" is calculated wrongly on the daemon side
> or some bytes of the .o-file get lost in some way...
> 
> A little more info:
> 
> 
> *** client ***
> 
>  ** netstat **
> [agranig at azrael:agranig]$ netstat -pnat|grep distcc
>  tcp        0      0 10.1.13.83:34845        10.1.7.24:4200  ESTABLISHED 13575/distcc
> 
> *** daemon ***
> 
>  ** netstat **
> [agranig at corelli:agranig]$ netstat -pnat|grep distccd
> tcp        0      0 0.0.0.0:4200            0.0.0.0:*               LISTEN      22789/distccd
> 
>  ** ps **
> [agranig at corelli:agranig]$ ps auxw|grep distccd
> agranig  22789  0.0  0.0  1380  136 ?        SN   Aug28   0:00 src/distccd --concurrent 1 --nice 5 --log-file=/home/agranig/distccd_corelli.log --verbose

It looks like the server task has terminated, but the client is still
trying to read.  What I really need to see is the netstat information
for the server's end of this client's connection.  In this case, that is
"10.1.13.83:34845 - 10.1.7.24:4200".  Your grep command didn't find it
because the distccd process has presumably already terminated, so the
socket no longer has a program name attached.  Please run
"netstat -potentate" on the server and grep for the client port number
(34845) to get the relevant fields.
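
As a rough sketch of that check, something like the following could be
run on the daemon machine.  It assumes the net-tools netstat is
installed there, and find_port is only an illustrative helper name, not
anything shipped with distcc:

```shell
# find_port is a hypothetical helper: it keeps only the lines of
# netstat-style input that mention ":PORT " in an address column.
find_port() {
    grep -F ":$1 "
}

# 34845 is the client's source port from the netstat output quoted above.
# Grepping for the port rather than "distccd" still matches a socket
# whose owning process has already gone away.
if command -v netstat >/dev/null 2>&1; then
    netstat -potentate 2>/dev/null | find_port 34845 \
        || echo "no socket found for port 34845"
fi
```

The state, queue and timer columns of whatever line comes back are
probably the interesting part.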

At a guess, turning off corks on the server would fix it, but please
send the info anyhow.  Also a tcpdump would be very useful.
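
For the capture, a minimal sketch of a suitable tcpdump invocation is
below.  The addresses and port come from the netstat output quoted
above; the capture file name is just a placeholder, and the script only
prints the command because tcpdump normally needs root:

```shell
# Client address and daemon port as shown in the quoted netstat output.
CLIENT=10.1.13.83
PORT=4200

# Capture filter: only traffic between this client and the distccd port.
FILTER="host $CLIENT and tcp port $PORT"

# -n: no name lookups, -s 0: capture whole packets, -w: write raw
# packets to a file (distcc-hang.pcap is an arbitrary, assumed name).
echo "tcpdump -n -s 0 -w distcc-hang.pcap $FILTER"
```

Start the printed command as root on either machine before kicking off
the compile that hangs, and stop it once the client is stuck.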

> Btw, Martin, what about that idea/patch about task limitation I sent
> you by mail last week? Have you had a look at it yet?

You really have to send unified diffs (diff -u) to open source
projects.  Plain diffs carry no context lines, so they are close to
useless for a CVS project: they can't be reliably merged once the tree
has changed.
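
To illustrate the difference, here is a small self-contained sketch
that produces a unified diff between two throwaway files.  The file
names and contents are made up for the example; in a CVS checkout the
equivalent would be "cvs diff -u":

```shell
# Create two scratch files that differ in one line (invented content).
tmp=$(mktemp -d)
printf 'int limit = 1;\n' > "$tmp/old.c"
printf 'int limit = 4;\n' > "$tmp/new.c"

# diff exits with status 1 when the files differ, which is expected
# here, so "|| true" keeps the script going.
patch_text=$(diff -u "$tmp/old.c" "$tmp/new.c" || true)

# The result has ---/+++ file headers and an @@ hunk header, followed
# by the removed (-) and added (+) lines.
printf '%s\n' "$patch_text"
rm -rf "$tmp"
```

Larger changes also get unchanged context lines around each hunk, which
is what lets patch find the right place after the tree has moved on.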

I thought I replied but maybe not: basically I agree that task
limitation is useful, but adding extra protocol interchanges is bad
for performance and simplicity.  I think the scheduler will do
adequately if the server just throttles itself to an acceptable load.

-- 
Martin 


