[distcc] upgrading from 2.16 to 2.17: compiler crashes
Dimitri Papadopoulos-Orfanos
papadopo at www.NOSPAM.fr
Tue Aug 24 09:53:01 GMT 2004
Hi,
I've updated distcc from 2.16 to 2.17 and I'm experiencing crashes.
The crash seems to be related to changes in the client part of distcc
because:
- I can reproduce the crashes with
    * server distccd 2.16 + client distcc 2.17
    * server distccd 2.17 + client distcc 2.17
- I can't reproduce the crashes with
    * server distccd 2.16 + client distcc 2.16
    * server distccd 2.17 + client distcc 2.16
Our compilation farm consists of machines running Red Hat Linux 9. Many
of these machines are currently offline, which means the new timeout
code may be triggered often.
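Because offline machines are the suspected trigger, one way to test that theory would be to probe each host before the build and leave the dead ones out of the host list. The following is only a minimal sketch under stated assumptions: the function names are hypothetical and not part of distcc, the port 3632 is the one shown in the error messages in this report, and the 2-second timeout is an arbitrary choice.

```python
import socket

DISTCC_PORT = 3632  # port that appears in the distcc error messages in this report


def is_reachable(host, port=DISTCC_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "No route to host", "Connection refused", timeouts
        # and name-resolution failures alike.
        return False


def live_hosts(hosts, port=DISTCC_PORT, timeout=2.0):
    """Keep only the hosts that currently accept connections, preserving order."""
    return [h for h in hosts if is_reachable(h, port, timeout)]


# Hypothetical usage: print a space-separated list of the hosts that answered.
# Only linux022 and linux037 are named in this report; the full farm list is not shown.
#   print(" ".join(live_hosts(["linux022", "linux037"])))
```

If the crashes disappear once only reachable hosts are used, that would support the idea that the new connect-timeout path in the 2.17 client is involved, though it would not prove it.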
I get errors like these:
$ make -j20
[...lots of compilations succeed...]
distcc[20324] (dcc_connect_by_addr) ERROR: failed to connect to
100.100.4.142:3632: No route to host
distcc[20326] (dcc_connect_by_addr) ERROR: failed to connect to
100.100.4.129:3632: No route to host
distcc[20324] Warning: failed to distribute
/home/username/aimsalgo/src/public/information/pdf.cc to linux022,
running locally instead
distcc[20326] Warning: failed to distribute
/home/username/aimsalgo/src/public/math/gaussj.cc to linux037, running
locally instead
[...lots of compilations succeed...]
g++ -c -Wall -W -D_REENTRANT -DCARTO_DEBUGMODE=\"default\" -DAIMS
-I/home/username/aimsalgo-linux-default/include
-I/home/username/aimsdata-linux-default/include
-I/home/username/graph-linux-default/include
-I/home/username/cartobase-linux-default/include
-I/home/username/ecat+-linux-default/include
-I/home/username/ecat-linux-default/include
-I/home/username/vidaIO-linux-default/include -I/usr/X11R6/include -o
mesh/voxel2facet.o /home/username/aimsalgo/src/aimsalgo/mesh/voxel2facet.cc
g++ -c -Wall -W -D_REENTRANT -DCARTO_DEBUGMODE=\"default\" -DAIMS
-I/home/username/aimsalgo-linux-default/include
-I/home/username/aimsdata-linux-default/include
-I/home/username/graph-linux-default/include
-I/home/username/cartobase-linux-default/include
-I/home/username/ecat+-linux-default/include
-I/home/username/ecat-linux-default/include
-I/home/username/vidaIO-linux-default/include -I/usr/X11R6/include -o
mesh/meshMerge.o /home/username/aimsalgo/src/aimsalgo/mesh/meshMerge.cc
distcc[20711] ERROR: Connect timeout
distcc[20723] ERROR: Connect timeout
distcc[20724] ERROR: Connect timeout
distcc[20731] ERROR: Connect timeout
distcc[20727] ERROR: Connect timeout
distcc[20736] ERROR: Connect timeout
distcc[20730] ERROR: Connect timeout
distcc[20728] ERROR: Connect timeout
distcc[20729] ERROR: Connect timeout
distcc[20740] ERROR: Connect timeout
make[3]: *** [mesh/mesher.o] Segmentation fault (core dumped)
make[3]: *** Waiting for unfinished jobs....
make[3]: *** [mesh/reducedNeigh.o] Segmentation fault (core dumped)
make[3]: *** [mesh/vertices.o] Segmentation fault (core dumped)
make[3]: *** [mesh/splitting.o] Segmentation fault (core dumped)
make[3]: *** [mesh/surface.o] Segmentation fault (core dumped)
make[3]: *** [mesh/surf2facet.o] Segmentation fault (core dumped)
make[3]: *** [mesh/triangles.o] Segmentation fault (core dumped)
make[3]: *** [mesh/voxel2facet.o] Segmentation fault (core dumped)
make[3]: *** [mesh/meshMerge.o] Segmentation fault (core dumped)
make[3]: *** [mesh/smoothing.o] Segmentation fault (core dumped)
make[2]: *** [all] Error 2
make[2]: Leaving directory
`/home/username/aimsalgo-linux-default/src/aimsalgo'
make[1]: *** [subdirs] Error 1
make[1]: Leaving directory `/home/username/aimsalgo-linux-default/src'
make: *** [all] Error 2
$
How can I debug this? I'm not even sure whether it's distcc or g++ that
is segfaulting. Since I don't run out of memory and a build with distcc
2.16 succeeds, I guess it's distcc that is failing.
I do get core files in the build directory, but I'm not sure what to do
with them since the debugger thinks they're not from distcc:
$ gdb /usr/local/distcc/bin/distcc aimsalgo-linux-default/src/aimsalgo/core.20711
[...]
warning: core file may not match specified executable file.
Core was generated by `g++ -c -Wall -W -D_REENTRANT
-DCARTO_DEBUGMODE="default" -DAIMS -I/home/usernam'.
Program terminated with signal 11, Segmentation fault.
[...]
#0 0xbffff0c0 in ?? ()
(gdb) bt
#0 0xbffff0c0 in ?? ()
Cannot access memory at address 0x50ea
(gdb)
$
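One detail in the output above may help narrow this down: the core file core.20711 carries the same number as the distcc[20711] process that reported "Connect timeout". The sketch below cross-checks core file names against the PIDs of timed-out distcc clients. It assumes the build output has been saved as a list of lines and that core files are named core.<pid>, as they appear here; the function names are hypothetical and not part of distcc.

```python
import re

# Matches lines such as: distcc[20711] ERROR: Connect timeout
TIMEOUT_RE = re.compile(r"^distcc\[(\d+)\] ERROR: Connect timeout")
# Matches file names ending in core.<pid>, e.g. .../core.20711
CORE_RE = re.compile(r"core\.(\d+)$")


def timed_out_pids(log_lines):
    """Collect the PIDs of distcc clients that logged a connect timeout."""
    pids = set()
    for line in log_lines:
        match = TIMEOUT_RE.match(line.strip())
        if match:
            pids.add(int(match.group(1)))
    return pids


def cores_from_timed_out_clients(log_lines, core_names):
    """Return the core files whose PID suffix belongs to a timed-out distcc client."""
    pids = timed_out_pids(log_lines)
    matched = []
    for name in core_names:
        match = CORE_RE.search(name)
        if match and int(match.group(1)) in pids:
            matched.append(name)
    return matched
```

If every core file in the build directory matches a timed-out client, that would suggest the crashing process is the distcc client itself rather than a separately spawned compiler. This is only an indication, since PIDs alone do not identify the executable.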
Any clues? How can I debug this?
Best Regards,
Dimitri