[distcc] Problems with distcc hanging on large compiles (More info)

Hien D. Ngo hien at moses.xp.com
Fri Aug 16 07:48:01 GMT 2002


A little more data:

One of the volunteer machines is in this state (netstat -an):

Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0     93 192.168.0.103:4200     192.168.0.252:55003     FIN_WAIT1   

The master machine shows this corresponding connection:

tcp        0      0 192.168.0.252:55003     192.168.0.103:4200     ESTABLISHED 

Left on its own, it will just sit in this stuck state forever.
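
For reference, FIN_WAIT1 means that side has closed the socket and is waiting for its FIN to be acknowledged, and a non-zero Send-Q means there are bytes the peer has not yet acknowledged. A minimal Python sketch for spotting such sockets in saved `netstat -an` output follows; the function name `find_stuck_closes` is made up for illustration, and it assumes the six-column Linux layout shown above.

```python
def find_stuck_closes(netstat_output):
    """Return (local, foreign, send_q) for TCP sockets in FIN_WAIT1
    that still have unacknowledged bytes queued (Send-Q > 0)."""
    stuck = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: Proto Recv-Q Send-Q Local Foreign State.
        # The header line splits into more than six fields and is skipped.
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        proto, recv_q, send_q, local, foreign, state = fields
        if state == "FIN_WAIT1" and send_q.isdigit() and int(send_q) > 0:
            stuck.append((local, foreign, int(send_q)))
    return stuck


# The two connection lines quoted in this message.
sample = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0     93 192.168.0.103:4200     192.168.0.252:55003     FIN_WAIT1
tcp        0      0 192.168.0.252:55003     192.168.0.103:4200     ESTABLISHED
"""

for local, foreign, send_q in find_stuck_closes(sample):
    print(local, "->", foreign, "unacknowledged bytes:", send_q)
```

Run against the output of each volunteer, this would list only the half-closed connections with data still pending, which is the pattern seen here.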

Hien

---- Original Message ----
From:		Hien D. Ngo
Date:		Fri 8/16/02 10:32
To:		distcc at lists.samba.org
Subject:	[distcc] Problems with distcc hanging on large compiles


First off, I have to say that this is a fantastic piece of software.

When I've gotten a complete run through without a hang, my compiles have sped up by a 
factor of 2-3.  I'll publish concrete numbers once everything is fixed up.

Here's my environment/problem:

* Compiling on a mixed Red Hat Linux environment running 2.2.x and 2.4.x (all with the 
gcc-2.95.2 compiler).
* Machines range from dual P3 850s to dual P3 1.4 GHz (6 machines total in my test 
bed).
* Very large C/C++ codebase.
* Compiling with 'make -j15', averaging 3-4 concurrent compiles per machine.
* Network is 100 Mb/s full duplex, though the machines are spread across many different 
segments.

Here are the log entries I get in /var/log/messages on all machines:

Aug 16 11:02:46 foobar distccd[32644]: (dcc_readx) CRITICAL! unexpected eof on fd5
Aug 16 11:02:46 foobar distccd[32644]: (dcc_expect_token) ERROR: read failed while waiting for token "DOTI"
Aug 16 11:08:06 foobar
Aug 16 11:08:06 foobar syslogd: Cannot glue message parts together
Aug 16 11:08:17 foobar distccd[1314]: (dcc_readx) CRITICAL! unexpected eof on fd5
Aug 16 11:08:17 foobar distccd[1314]: (dcc_expect_token) ERROR: read failed while waiting for token "DOTI"
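
To compare how often each failure shows up across machines, the entries above could be tallied with something like the following Python sketch. The regular expression is only an assumption based on the line format seen in this log (function name in parentheses, then CRITICAL! or ERROR:); other severity labels may exist and would not be counted. The name `tally_distccd_errors` is hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this message only.
LINE_RE = re.compile(r"distccd\[(\d+)\]: \((\w+)\) (CRITICAL!|ERROR:) (.*)")


def tally_distccd_errors(lines):
    """Count distccd log entries by (reporting function, severity)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            pid, func, level, message = match.groups()
            counts[(func, level.rstrip("!:"))] += 1
    return counts


# The distccd and syslogd lines quoted in this message.
sample_log = [
    "Aug 16 11:02:46 foobar distccd[32644]: (dcc_readx) CRITICAL! unexpected eof on fd5",
    'Aug 16 11:02:46 foobar distccd[32644]: (dcc_expect_token) ERROR: read failed while waiting for token "DOTI"',
    "Aug 16 11:08:06 foobar syslogd: Cannot glue message parts together",
    "Aug 16 11:08:17 foobar distccd[1314]: (dcc_readx) CRITICAL! unexpected eof on fd5",
    'Aug 16 11:08:17 foobar distccd[1314]: (dcc_expect_token) ERROR: read failed while waiting for token "DOTI"',
]

for (func, level), count in sorted(tally_distccd_errors(sample_log).items()):
    print(func, level, count)
```

In practice the input would be the lines of /var/log/messages from each machine rather than a hard-coded list.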

I also usually have several defunct distccd processes on each of my servers.  When 
the compile hangs, the last bit of output that I see tends to be "Leaving 
directory ..." so I presume it's likely a problem with the socket close/cleanup.
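
A defunct (zombie) process is one that has exited but whose parent has not yet collected its exit status with wait(), so the parent PID shows which process is failing to reap. A rough Python sketch for pulling those out of `ps -eo pid,ppid,stat,comm` output is below. The function name `find_defunct` is hypothetical, and the sample listing is invented purely to exercise the parser; it is not output from my machines.

```python
def find_defunct(ps_output, name="distccd"):
    """Return (pid, ppid) for zombie processes whose command is `name`.

    Assumes four columns: PID, PPID, STAT, COMMAND, with a header row.
    """
    zombies = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue
        pid, ppid, stat, comm = fields
        # Zombies carry a STAT beginning with "Z"; ps may append "<defunct>".
        if stat.startswith("Z") and comm.split()[0] == name:
            zombies.append((int(pid), int(ppid)))
    return zombies


# Invented sample listing, for illustration only.
sample_ps = """\
  PID  PPID STAT COMMAND
 2000     1 S    distccd
 2001  2000 Z    distccd <defunct>
 2002  2000 S    distccd
 2003     1 Z    other <defunct>
"""

print(find_defunct(sample_ps))
```

If every zombie reports the same parent, that parent daemon is the one not reaping its children.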

Any help would be greatly appreciated.  Thanks,

Hien

_______________________________________________
distcc mailing list
distcc at lists.samba.org
http://lists.samba.org/cgi-bin/mailman/listinfo/distcc



