[distcc] garbage transferred instead of preprocessed source

Martin Pool mbp at sourcefrog.net
Wed Apr 12 00:13:53 GMT 2006

On 11/04/2006, at 7:27 PM, Zdenek Behan wrote:

> Hi,
> I encountered a very strange problem with distcc. Let me explain:
> I have 2 machines (both gentoo). One is i686 (fast) and the other  
> is ppc (slow). I have a working(tested) ppc crosscompiler on i686  
> and native compiler on ppc [same versions - 3.4.6]
> I emerged exactly the same version of distcc on both (tried  
> multiple versions), and ran them with:
> (slow machine)
> PATH="/usr/powerpc-unknown-linux-gnu/bin:/usr/powerpc-unknown-linux-gnu/gcc-bin/3.4.6/" /usr/bin/distccd -p 55555 -N 10 --allow  
> --listen= --no-detach --user distcc --log-stderr
> (fast machine)
> PATH="/usr/powerpc-unknown-linux-gnu/bin:/usr/powerpc-unknown-linux-gnu/gcc-bin/3.4.6/" /usr/bin/distccd -p 55555 -N 10 --allow  
> --listen= --no-detach --user distcc --log-stderr
> I put both hosts (201, 15) into /etc/distcc/hosts.
> Daemons work fine until i try to actually compile something.
> --
> #include <stdio.h>
> int main( int argc, char ** argv )
> {
>         printf("Hello world!\n");
>         return 1;
> }
> --
> I created simple hello.c to demonstrate. The command used is:
> distcc powerpc-unknown-linux-gnu-gcc -c -o hello.o hello.c
> Now i have 4 variants of using distcc. Fast to Fast (localhost)  
> Fast to Slow, Slow to Fast and Slow to Slow.
> When doing any of the pointless variants (F->S, F->F, S->S),  
> distccd creates /tmp/distccd_key.i on the local machine containing  
> preprocessed source and within a fraction of a second, it's done.  
> Verbose distccd output says something like:

When you say "/tmp/distccd_key.i", I presume the "key" is actually
some random hex characters?

> distccd[12392] (dcc_check_client) connection from
> distccd[12392] compile from hello.c to hello.o
> distccd[12392] (dcc_r_file_timed) 16695 bytes received in  
> 0.001372s, rate 11883kB/s
> distccd[12392] (dcc_collect_child) cc times: user 0.170000s, system  
> 0.040000s, 501 minflt, 1030 majflt
> distccd[12392] powerpc-unknown-linux-gnu-gcc hello.c on localhost  
> completed ok
> distccd[12392] job complete
> In the last variant (Slow -> Fast), it creates /tmp/distccd_key.i  
> as well, however, what it contains can hardly be compared to  
> preprocessed source. It's basically a binary file containing a  
> random dump of some disk data. I have a copy of such a file in case  
> anyone wants to see it, but there's not much to see, really.
> Naturally this fails with a megabytes-long error log that goes as follows:
> /tmp/distccd_237f6a6d.i:122: error: stray '\242' in program
> /tmp/distccd_237f6a6d.i:122: error: stray '\160' in program
> /tmp/distccd_237f6a6d.i:122: error: stray '\195' in program
> /tmp/distccd_237f6a6d.i:122: error: stray '\242' in program
> ...
> Output looks like this:
> distccd[11797] (dcc_check_client) connection from
> distccd[11797] compile from hello.c to hello.o
> distccd[11797] (dcc_r_file_timed) 16695 bytes received in  
> 0.002057s, rate 7926kB/s
> distccd[11797] (dcc_collect_child) cc times: user 0.425935s, system  
> 0.989849s, 889 minflt, 0 majflt
> distccd[11797] powerpc-unknown-linux-gnu-gcc hello.c on localhost  
> failed
> distccd[11797] job complete
> Notice that the file size is actually the same. It's the content  
> that is scrambled, for reasons completely unknown to me. Neither  
> side crashes; they only report the huge error log and then go on.
> Just for the record, distcc is built on both systems natively with  
> native compiler (same version - 3.4.6), glibc versions are not the  
> same, but i can hardly imagine that being a problem.
> Can anyone help me, or at least point me to where I should be  
> looking for the problem? This seems to be a purely distcc issue, as  
> it never gets to actually compiling anything; besides, I believe my  
> crosscompiler setup is correct.
> My first guess was an endianness swap (ppc is big endian), but since  
> there is some totally out of place text mixed up with garbage  
> binary data in the temporary file, i think that's not the solution.  
> So now i'm left with being completely clueless, and any help will  
> be appreciated.

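I suspect you have a kernel bug on the ppc machine which is making it
transmit the wrong data across the network.  To check it, please run
on the ppc host (55555 is the port you gave distccd with -p, and -s 0
keeps the full packet payload rather than just the headers)

   tcpdump -s 0 -w distcc.pcap 'tcp port 55555'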

and compile a file.  Then stop tcpdump and post the capture file to  
me, or have a look at it in ethereal if you like.  I suspect we will  
see garbage in the DOTI field because sendfile isn't working  
properly.  What kernel are you running there?  Do you have a known  
good one you could try?
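
If you would rather check the capture yourself, here is a rough sketch in Python of what to look for. It assumes the request is a run of tokens, each a four-letter name followed by eight hex digits, with ARGV and DOTI followed by that many bytes of data; that is my recollection of the uncompressed protocol, not something taken from the capture, so treat the layout as an assumption. The helper names (parse_request, looks_like_text, make_token) are made up for this example, and getting the TCP payload out of the pcap file is not shown.

```python
import re

# Assumed token layout: 4 uppercase letters + 8 hex digits (12 bytes total).
TOKEN = re.compile(rb"[A-Z]{4}[0-9a-fA-F]{8}")


def make_token(name, param):
    """Build one token, e.g. make_token("ARGC", 2) -> b"ARGC00000002"."""
    return name.encode("ascii") + b"%08x" % param


def parse_request(stream):
    """Split a reassembled request stream into (name, value) pairs.

    For ARGV and DOTI the value is the payload bytes; for every other
    token it is the integer parameter.
    """
    pos = 0
    fields = []
    while pos < len(stream):
        head = stream[pos:pos + 12]
        if not TOKEN.fullmatch(head):
            raise ValueError("bad token at offset %d: %r" % (pos, head))
        name = head[:4].decode("ascii")
        param = int(head[4:], 16)
        pos += 12
        if name in ("ARGV", "DOTI"):
            body = stream[pos:pos + param]
            if len(body) < param:
                raise ValueError("truncated %s payload" % name)
            pos += param
            fields.append((name, body))
        else:
            fields.append((name, param))
    return fields


def looks_like_text(data, threshold=0.95):
    """True if at least `threshold` of the bytes are printable ASCII or whitespace."""
    if not data:
        return True
    printable = sum(1 for b in data if b in (9, 10, 13) or 32 <= b < 127)
    return printable / len(data) >= threshold


def build_request(argv, source):
    """Assemble a request in the assumed layout, for testing the parser."""
    out = make_token("DIST", 1) + make_token("ARGC", len(argv))
    for arg in argv:
        out += make_token("ARGV", len(arg)) + arg
    return out + make_token("DOTI", len(source)) + source


if __name__ == "__main__":
    src = b'int main(void)\n{\n        return 1;\n}\n'
    req = build_request([b"powerpc-unknown-linux-gnu-gcc", b"-c"], src)
    doti = dict(parse_request(req))["DOTI"]
    print("DOTI is text:", looks_like_text(doti))
```

If the DOTI payload has the right length but fails the text check, the data was already wrong when it left the client, which would fit the sendfile theory.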


