Different problem (Re: [distcc] Problems with distcc hanging on large compiles (Patch not effective))

Andreas Granig andreas.granig at infonova.com
Thu Aug 29 23:47:00 GMT 2002


If I understand correctly, your problem is a hang on the client side? Then
I have the same problem here. The strange thing is that it only occurs
when distributing a job to one specific machine (client is Debian
unstable on 2.4.18, daemon is Debian stable on 2.2.20); all other
machines (Debian stable/unstable, RedHat, SuSE) run fine :o/

The client blocks in io.c, in dcc_pump_readwrite(...),
while read()ing the successfully compiled .o file. "wanted" is e.g. 150000
bytes, but I only read 149050 and then read() blocks. It seems that in
some circumstances either "wanted" is calculated wrongly on the daemon side or
some bytes of the .o file get lost somehow...
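
To illustrate the failure mode, here is a minimal sketch of a read loop that
expects "wanted" bytes but gives up instead of blocking forever when the peer
delivers fewer. This is not distcc's actual dcc_pump_readwrite(); the function
name read_exact_timeout, the use of poll(), and the timeout parameter are my
own assumptions, purely for illustration.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (NOT distcc code): read exactly `wanted` bytes from
 * fd, but stop waiting if no data arrives within timeout_ms.
 * Returns 0 on success, -1 on error, timeout, or early EOF.
 * *got always holds the number of bytes actually received, so the caller
 * can log "wanted 150000, got 149050" instead of hanging. */
static int read_exact_timeout(int fd, char *buf, size_t wanted,
                              int timeout_ms, size_t *got)
{
    *got = 0;
    while (*got < wanted) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int ready = poll(&pfd, 1, timeout_ms);

        if (ready < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            return -1;
        }
        if (ready == 0)
            return -1;          /* timeout: fewer bytes than promised */

        ssize_t n = read(fd, buf + *got, wanted - *got);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1;          /* EOF before `wanted` bytes arrived */
        *got += (size_t)n;
    }
    return 0;
}
```

With a plain blocking read() loop the short transfer looks like a hang; with
a bounded wait the client could at least report how many bytes were missing.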

A little more info:

*** client ***

 ** netstat **
[agranig@azrael:agranig]$ netstat -pnat|grep distcc
 tcp        0      0
 ESTABLISHED 13575/distcc

 ** gdb **
[agranig@azrael:agranig]$ gdb /usr/local/bin/distcc
GNU gdb 2002-08-18-cvs
(gdb) attach 13575
Attaching to program: /usr/local/bin/distcc, process 13575
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libpopt.so.0...done.
Loaded symbols for /lib/libpopt.so.0
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libnss_dns.so.2...done.
Loaded symbols for /lib/libnss_dns.so.2
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
0x4010abb4 in read () from /lib/libc.so.6
(gdb) backtrace full
#0  0x4010abb4 in read () from /lib/libc.so.6
No symbol table info available.
#1  0x4015ddd0 in __check_rhosts_file () from /lib/libc.so.6
No symbol table info available.
#2  0x0804d7d5 in dcc_r_fd (ifd=5, ofd=6, token=0x804e307 "DOTO", size_out=0x0)
    at bulk.c:198
        len = 206736
#3  0x0804d6a5 in dcc_r_file (ifd=5, filename=0xbffff609 "src/.debug/IString.o",
    token=0x804e307 "DOTO", size_out=0x0) at bulk.c:170
        ofd = 6
        ret = 388167
#4  0x080495b2 in dcc_compile_remote (argv=0x8050ed8,
    cpp_fname=0x8052178 "/tmp/distcc_000003e8/cppout_0000013575.i",
    output_fname=0xbffff609 "src/.debug/IString.o", cpp_pid=13605,
    host=0x8050f58, status=0xbffff398) at distcc.c:179
        stime_usec = 10000
        utime_usec = 90000
#5  0x08049860 in dcc_build_somewhere (argv=0x8050ed8, status=0xbffff398)
    at distcc.c:312
        input_fname = 0xbffff5f6 "src/IString.cxx"
        output_fname = 0xbffff609 "src/.debug/IString.o"
        cpp_fname = 0x8052178 "/tmp/distcc_000003e8/cppout_0000013575.i"
        cpp_pid = 13605
        ret = 0
        host = (struct dcc_hostdef *) 0x8050f58
#6  0x08049a56 in main (argc=14, argv=0xbffff404) at distcc.c:374
        status = 0

*** daemon ***

 ** netstat **
[agranig@corelli:agranig]$ netstat -pnat|grep distccd
tcp        0      0  *               LISTEN      22789/distccd

 ** ps **
[agranig@corelli:agranig]$ ps auxw|grep distccd
agranig  22789  0.0  0.0  1380  136 ?        SN   Aug28   0:00 src/distccd --concurrent 1 --nice 5 --log-file=/home/agranig/distccd_corelli.log --verbose

The logfile doesn't look very interesting, the process terminated

Btw, Martin, what about that idea/patch for task limitation I sent
you by mail last week? Have you had a look at it yet?


