[distcc] upgrading from 2.16 to 2.17: compiler crashes

Dimitri Papadopoulos-Orfanos papadopo at www.NOSPAM.fr
Tue Aug 24 09:53:01 GMT 2004


Hi,

I've updated distcc from 2.16 to 2.17 and I'm experiencing crashes.

The crashes seem to be related to changes in the client part of distcc, 
because:
- I can reproduce the crashes with
   * server distccd 2.16 + client distcc 2.17
   * server distccd 2.17 + client distcc 2.17
- I can't reproduce the crashes with
   * server distccd 2.16 + client distcc 2.16
   * server distccd 2.17 + client distcc 2.16
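
The four combinations above can be checked mechanically. The following is 
only a minimal sketch: the names observed, outcomes_by and 
determines_outcome are made up for illustration, and the data is just the 
list above written out as a table.

```python
# Crash observations from the list above, keyed by (server, client) version.
# True means the crash could be reproduced with that combination.
observed = {
    ("2.16", "2.17"): True,
    ("2.17", "2.17"): True,
    ("2.16", "2.16"): False,
    ("2.17", "2.16"): False,
}


def outcomes_by(position, results):
    """Group outcomes by the version at `position` (0 = server, 1 = client)."""
    grouped = {}
    for versions, crashed in results.items():
        grouped.setdefault(versions[position], set()).add(crashed)
    return grouped


def determines_outcome(position, results):
    """True if every version at `position` always gives the same outcome."""
    return all(len(seen) == 1 for seen in outcomes_by(position, results).values())


print("server version determines outcome:", determines_outcome(0, observed))
print("client version determines outcome:", determines_outcome(1, observed))
```

With these four observations only the client version separates the 
crashing builds from the working ones, which is why the client is the 
suspect here.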

Our compilation farm runs Red Hat Linux 9 machines. Many of these 
machines are currently offline, which means the new timeout code may be 
triggered often.


I get errors like these:

$ make -j20
[...lots of compilations succeed...]
distcc[20324] (dcc_connect_by_addr) ERROR: failed to connect to 
100.100.4.142:3632: No route to host
distcc[20326] (dcc_connect_by_addr) ERROR: failed to connect to 
100.100.4.129:3632: No route to host
distcc[20324] Warning: failed to distribute 
/home/username/aimsalgo/src/public/information/pdf.cc to linux022, 
running locally instead
distcc[20326] Warning: failed to distribute 
/home/username/aimsalgo/src/public/math/gaussj.cc to linux037, running 
locally instead
[...lots of compilations succeed...]
g++ -c  -Wall -W -D_REENTRANT -DCARTO_DEBUGMODE=\"default\" -DAIMS 
-I/home/username/aimsalgo-linux-default/include 
-I/home/username/aimsdata-linux-default/include 
-I/home/username/graph-linux-default/include 
-I/home/username/cartobase-linux-default/include 
-I/home/username/ecat+-linux-default/include 
-I/home/username/ecat-linux-default/include 
-I/home/username/vidaIO-linux-default/include -I/usr/X11R6/include -o 
mesh/voxel2facet.o /home/username/aimsalgo/src/aimsalgo/mesh/voxel2facet.cc
g++ -c  -Wall -W -D_REENTRANT -DCARTO_DEBUGMODE=\"default\" -DAIMS 
-I/home/username/aimsalgo-linux-default/include 
-I/home/username/aimsdata-linux-default/include 
-I/home/username/graph-linux-default/include 
-I/home/username/cartobase-linux-default/include 
-I/home/username/ecat+-linux-default/include 
-I/home/username/ecat-linux-default/include 
-I/home/username/vidaIO-linux-default/include -I/usr/X11R6/include -o 
mesh/meshMerge.o /home/username/aimsalgo/src/aimsalgo/mesh/meshMerge.cc
distcc[20711] ERROR: Connect timeout
distcc[20723] ERROR: Connect timeout
distcc[20724] ERROR: Connect timeout
distcc[20731] ERROR: Connect timeout
distcc[20727] ERROR: Connect timeout
distcc[20736] ERROR: Connect timeout
distcc[20730] ERROR: Connect timeout
distcc[20728] ERROR: Connect timeout
distcc[20729] ERROR: Connect timeout
distcc[20740] ERROR: Connect timeout
make[3]: *** [mesh/mesher.o] Segmentation fault (core dumped)
make[3]: *** Waiting for unfinished jobs....
make[3]: *** [mesh/reducedNeigh.o] Segmentation fault (core dumped)
make[3]: *** [mesh/vertices.o] Segmentation fault (core dumped)
make[3]: *** [mesh/splitting.o] Segmentation fault (core dumped)
make[3]: *** [mesh/surface.o] Segmentation fault (core dumped)
make[3]: *** [mesh/surf2facet.o] Segmentation fault (core dumped)
make[3]: *** [mesh/triangles.o] Segmentation fault (core dumped)
make[3]: *** [mesh/voxel2facet.o] Segmentation fault (core dumped)
make[3]: *** [mesh/meshMerge.o] Segmentation fault (core dumped)
make[3]: *** [mesh/smoothing.o] Segmentation fault (core dumped)
make[2]: *** [all] Error 2
make[2]: Leaving directory 
`/home/username/aimsalgo-linux-default/src/aimsalgo'
make[1]: *** [subdirs] Error 1
make[1]: Leaving directory `/home/username/aimsalgo-linux-default/src'
make: *** [all] Error 2
$


How can I debug this? I'm not even sure whether it is distcc or g++ that 
is segfaulting. Since I don't run out of memory and a build with distcc 
2.16 succeeds, I guess it is distcc that is failing.

I do get core files in the build directory, but I'm not sure what to do 
with them since the debugger thinks they're not from distcc:

$ gdb /usr/local/distcc/bin/distcc 
aimsalgo-linux-default/src/aimsalgo/core.20711
[...]
warning: core file may not match specified executable file.
Core was generated by `g++ -c -Wall -W -D_REENTRANT 
-DCARTO_DEBUGMODE="default" -DAIMS -I/home/usernam'.
Program terminated with signal 11, Segmentation fault.
[...]
#0  0xbffff0c0 in ?? ()
(gdb) bt
#0  0xbffff0c0 in ?? ()
Cannot access memory at address 0x50ea
(gdb)
$
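
One detail that can be cross-checked: the core file is named core.20711, 
and the build output contains "distcc[20711] ERROR: Connect timeout". The 
sketch below matches core file names against the PIDs that logged a 
connect timeout. It assumes that the numeric suffix of a core file is the 
PID of the process that crashed (as when the kernel appends the PID to 
core file names), which this post does not confirm. The function names and 
the shortened LOG sample are hypothetical; the log lines are copied from 
the output above with the mail wrapping removed.

```python
import re

# A few lines copied from the build output above (unwrapped), plus the
# core file name passed to gdb. Replace both with the real log and the
# real directory listing when trying this out.
LOG = """\
distcc[20324] (dcc_connect_by_addr) ERROR: failed to connect to 100.100.4.142:3632: No route to host
distcc[20711] ERROR: Connect timeout
distcc[20723] ERROR: Connect timeout
make[3]: *** [mesh/mesher.o] Segmentation fault (core dumped)
"""
CORE_FILES = ["core.20711"]

TIMEOUT_RE = re.compile(r"^distcc\[(\d+)\] ERROR: Connect timeout", re.MULTILINE)


def timeout_pids(log_text):
    """Return the PIDs of distcc clients that logged 'Connect timeout'."""
    return {int(m.group(1)) for m in TIMEOUT_RE.finditer(log_text)}


def core_pid(name):
    """Return the numeric suffix of a 'core.<number>' name, or None."""
    m = re.fullmatch(r"core\.(\d+)", name)
    return int(m.group(1)) if m else None


def cores_matching_timeouts(log_text, core_names):
    """List core files whose suffix equals the PID of a timed-out client."""
    pids = timeout_pids(log_text)
    return sorted(name for name in core_names if core_pid(name) in pids)


print(cores_matching_timeouts(LOG, CORE_FILES))
```

If the assumption about the suffix holds, a match would suggest that the 
crashing process is the same one that reported the connect timeout, even 
though gdb shows its command line as g++.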


Any clues? How can I debug this?

Best Regards,
Dimitri


