[distcc] Re: upgrading from 2.16 to 2.17: compiler crashes

Thu Aug 26 09:41:48 GMT 2004

Hi,

This is a follow-up on the distcc 2.17 crashes under heavy load.

I can't get any meaningful information from the core files; the stack
seems to be garbled. So I tried running distcc under Valgrind instead,
replacing distcc with a wrapper script that runs it under Valgrind:

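#!/bin/sh
# Wrapper installed in place of distcc: runs the debug build of distcc
# under Valgrind's Memcheck tool, passing all arguments through to g++.
DISTCC_HOME=/usr/local/distcc-2.17-debug
# Valgrind reports on stderr, so each process gets its own log file
# (this also captures distcc's and the compiler's own error messages).
exec valgrind --tool=memcheck \
	"$DISTCC_HOME"/bin/distcc g++ ${1+"$@"} \
	2>/tmp/valgrind-distcc-log.$$
# Only reached if exec itself fails.
exit 1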

Unfortunately Valgrind doesn't seem to be of any help here. Here is the 
log from one of the crashed distcc processes:

==10661== Memcheck, a memory error detector for x86-linux.
==10661== Copyright (C) 2002-2004, and GNU GPL'd, by Julian Seward et al.
==10661== Using valgrind-2.1.2, a program supervision framework for x86-linux.
==10661== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward et al.
==10661== For more details, rerun with: -v
==10661==
distcc[10661] ERROR: Connect timeout
==10661== Invalid read of size 1
==10661==    at 0x52BFE094: ???
==10661==  Address 0x76 is not stack'd, malloc'd or (recently) free'd
==10661==
==10661== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==10661==  Access not within mapped region at address 0x76
==10661==    at 0x52BFE094: ???
==10661==
==10661== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 19 from 1)
==10661== malloc/free: in use at exit: 7453 bytes in 168 blocks.
==10661== malloc/free: 258 allocs, 90 frees, 33067 bytes allocated.
==10661== For a detailed leak analysis,  rerun with: --leak-check=yes
==10661== For counts of detected errors, rerun with: -v

So there seems to be an invalid read in distcc, but Valgrind cannot tell
exactly where it happens: the faulting instruction is only reported as
"???". Like the debugger, it is unable to make sense of the stack.

Any ideas on how to debug this further?

Regards,
Dimitri