[distcc] Using distcc with a new build system

nadim nadim at khemir.net
Sat Dec 18 11:42:06 GMT 2004


There seems to be little traffic on this mailing list and more on the news. If 
this is not the appropriate way to communicate with the distcc community, 
please let me know.

I'll start with some context that should make the rest easier to follow. I 
(Nadim Khemir) and Anders Lindgren have written a "rather" advanced build 
system which is close to cons/scons/cook etc. The system is written in Perl 
and is fun to use (some of you who have been on the cons mailing list might 
remember me).

Because of the lack of "real" threads in Perl, we implemented the 
parallelization of the build with processes which communicate through 
socketpair. We consider the parallelization code to be a prototype, but it's 
short and not very complex, so it "should" be OK.
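
For readers who have not used socketpair this way, here is a minimal sketch 
of the idea. It is not the code from our build system: the helper names 
start_worker and run_commands are made up for this example, and it assumes a 
Unix-like system where fork and AF_UNIX sockets are available. Each worker 
reads one shell command per line and answers with the exit code.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket;       # AF_UNIX, SOCK_STREAM, PF_UNSPEC
use IO::Handle;   # autoflush
use POSIX ();     # _exit

# Hypothetical helper: fork one worker, return [pid, socket to the worker].
# @inherited holds the sockets of earlier workers; the new child must close
# its copies, otherwise those workers would never see EOF.
sub start_worker {
    my (@inherited) = @_;
    socketpair(my $worker_end, my $parent_end, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
        or die "socketpair failed: $!";
    $_->autoflush(1) for $worker_end, $parent_end;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        close $_ for $parent_end, @inherited;
        # Worker loop: one command per line in, one exit code per line out.
        while (defined(my $command = <$worker_end>)) {
            chomp $command;
            my $status = system($command);
            # Simplification: death by signal is not reported separately.
            my $code = $status == -1 ? 255 : $status >> 8;
            print {$worker_end} "$code\n";
        }
        POSIX::_exit(0);
    }
    close $worker_end;
    return [$pid, $parent_end];
}

# Hypothetical helper: run @commands on $worker_count workers, in batches.
# Returns a hash of command => exit code.
sub run_commands {
    my ($worker_count, @commands) = @_;
    my @workers;
    for my $n (1 .. $worker_count) {
        push @workers, start_worker(map { $_->[1] } @workers);
    }

    my %exit_code;
    while (@commands) {
        # Hand one command to each worker, then collect the replies in order.
        my @batch = splice(@commands, 0, $worker_count);
        for my $i (0 .. $#batch) {
            print { $workers[$i][1] } "$batch[$i]\n";
        }
        for my $i (0 .. $#batch) {
            my $reply = readline($workers[$i][1]);
            die "worker $i went away\n" unless defined $reply;
            chomp $reply;
            $exit_code{ $batch[$i] } = $reply;
        }
    }

    for my $worker (@workers) {
        close $worker->[1];          # EOF ends the worker loop
        waitpid($worker->[0], 0);    # reap it so no zombie is left behind
    }
    return %exit_code;
}
```

With distcc, the commands would simply be the compile lines, for example 
run_commands(3, map { "distcc gcc -c $_" } @sources), where @sources is a 
placeholder list of source files. Working in batches keeps the sketch short; 
a real scheduler would hand out a new command as soon as any worker replies.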

My test setup at home is 3 Linux boxes, but I'd like to use the build system 
with distcc at my job, with a compilation farm and a heterogeneous machine 
park. I am, for example, interested to know what to expect when many users 
compile on the farm at the same time. Today we have 300 developers with P4 
2 GHz boxes. They don't compile all the time, but when they do, they'd like to 
have some serious CPU available. I wonder if so many people sharing a compile 
farm of, say, 20-30 boxes will see it as faster or slower than their own 
box. Their own computers run Windows, and installing cygwin (which is 
slow) and gcc locally on 300 computers seems to me like an administrative

A/ We use colorgcc:
We sometimes get this output from distcc:

Node './lib/msm_terminated.o':
distcc -O2 -Wall -Wshadow -Wpointer-arith -Wcast-qual -Wcast-align 
-Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes 
-Wmissing-declarations -Wredundant-decls -Wnested-externs -Winline -D_MODULE  
-g -pg -ffunction-sections  -I/devel/q04c/obigo/msf 
-I/devel/q04c/obigo/msf/msf_lib -I/devel/q04c/obigo/msf/msf_lib/config 
-I/devel/q04c/obigo/msf/msf_lib/export -I/devel/q04c/obigo/msf/msf_lib/intgr  
-o /devel/q04c/obigo/msf/out_nadim_2/lib/msm_terminated.o 
-c /devel/q04c/obigo/msf/lib/msm_terminated.c
cc1: warning: -ffunction-sections disabled; it makes profiling impossible
/home/nakh/tmp/distccd_ae3078de.i:1: error: stray '\33' in program
/home/nakh/tmp/distccd_ae3078de.i:1: error: parse error before '[' token
/home/nakh/tmp/distccd_ae3078de.i:1:3: invalid suffix "m" on integer constant
/home/nakh/tmp/distccd_ae3078de.i:1: error: syntax error at '#' token
/home/nakh/tmp/distccd_ae3078de.i:2: error: stray '\33' in program
/home/nakh/tmp/distccd_ae3078de.i:2:3: invalid suffix "m" on integer constant
/home/nakh/tmp/distccd_ae3078de.i:2: error: syntax error at '#' token
/home/nakh/tmp/distccd_ae3078de.i:3: error: stray '\33' in program
/home/nakh/tmp/distccd_ae3078de.i:3:3: invalid suffix "m" on integer constant

We think this has to do with the color codes 'colorgcc' adds, but we don't 
understand what the problem is here. Is distcc using 'gcc -E' for 
pre-processing? 'colorgcc' happily mixes STDOUT and STDERR. This is a rather 
obvious 'colorgcc' error if the pre-processed code is output to STDOUT. If 
you tell us how the preprocessor is used, we can see how to modify 
'colorgcc' so all distcc users can use it too.
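
The '\33' in the errors above is the octal code of the ESC character, and 
'[' followed by a number and 'm' is the shape of an ANSI color sequence, so 
color codes do seem to end up in the .i file. Assuming the pre-processed code 
is indeed read from STDOUT, one possible fix is a wrapper that colors STDERR 
only and never touches STDOUT. The sketch below is not colorgcc's code; the 
names colorize_diagnostic and run_compiler and the color table are made up 
for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();   # _exit

# Assumed color scheme for this sketch, not colorgcc's configuration.
my %color = (error => "\e[1;31m", warning => "\e[1;33m");
my $reset = "\e[0m";

# Color one diagnostic line; any other line passes through unchanged.
sub colorize_diagnostic {
    my ($line) = @_;
    return $line unless $line =~ /\b(error|warning):/;
    my $code = $color{$1};
    chomp(my $text = $line);
    return $code . $text . $reset . "\n";
}

# Run the compiler with its STDOUT inherited as-is (so 'gcc -E' output stays
# clean) and only its STDERR routed through the colorizer.
sub run_compiler {
    my (@command) = @_;
    pipe(my $reader, my $writer) or die "pipe failed: $!";

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        close $reader;
        open(STDERR, '>&', $writer) or POSIX::_exit(126);
        exec { $command[0] } @command or POSIX::_exit(127);
    }

    close $writer;
    while (defined(my $line = <$reader>)) {
        print STDERR colorize_diagnostic($line);
    }
    close $reader;
    waitpid($pid, 0);
    return $? >> 8;
}
```

A wrapper script would end with something like exit run_compiler('gcc', @ARGV). 
Skipping the coloring when STDERR is not a terminal (-t STDERR) would be a 
further refinement.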

B/ Sometimes distcc/our build system hangs in a few strange ways
(this is a problem I have fixed in our code; still, I wonder why I got these 
messages before)

We get this:

Mon Dec 13 14:18:06 2004
 31080  Receive     hdi_widgettextinputcreate.c      [0]

Mon Dec 13 14:18:07 2004
 31080  Receive     hdi_widgettextinputcreate.c      [0]

Mon Dec 13 14:18:08 2004
 31080  Receive     hdi_widgettextinputcreate.c      [0]
Mon Dec 13 14:18:10 2004

Mon Dec 13 14:18:11 2004

but the build system is still waiting for the command to finish!

Again, this was an error in the build system's communication with the build 
processes. What I don't understand is why distccmon-text writes 'Receive'.

This was also surprising:

[ali at khemir obigo]$ ps aux | grep cc
ali       1740  0.0  0.1  1972  516 pts0     SN   12:55   0:00 distcc -O2 
-Wall -Wshadow -Wpointer-arith -I/devel/q04c/obigo/msf/msf_lib/intgr 
-o /devel/q04c/obigo/projects/ali_grisar_runt/out_ali/msf/lib/hdi_widgetbargetvalues.o 
-c /devel/q04c/obigo/msf/lib/hdi_widgetbargetvalues.c
ali       1741  0.0  0.0     0    0 pts0     ZN   12:55   0:00 [cc] <defunct>
ali       2845  0.0  0.1  1944  668 pts1     SN   12:57   0:00 grep cc

Is distcc waiting for a zombie process here? Our first idea was that 
SIGCHLD wasn't handled properly. But while we were talking about it, 
thinking the build was dead in the water, we were surprised to see the build
After fixing the communication between PBS and the build processes, I made a 
test run on my three boxes, compiling around 200K lines of code in 500 source 
files. It built in 45s with my little cluster (3 GHz + 1 GHz + 700 MHz) versus 
70s on the 3 GHz box alone. I ran the test 200 times while looking at the 
news. I got a surprise though: monitoring the CPU loads, I saw that one of 
the boxes stopped compiling after some time. distcc did the right thing, as 
it would not use that node on the next build. The problem was that distcc had 
filled my $TMP directories with 4000 rti files (those produced by gcc). If I 
removed the files, the compiling just ran smoothly again. This didn't happen 
on the other 2 computers. Does anyone have an idea why this
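
As a stopgap, the leftover files can be found and removed by age. The sketch 
below assumes the leftovers are named distccd_*, as in the error messages 
quoted in section A, and that anything older than the given age is no longer 
in use. The function name stale_distcc_files is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the distccd_* files in $dir whose modification time is more than
# $max_age seconds in the past. Nothing is deleted here.
sub stale_distcc_files {
    my ($dir, $max_age) = @_;
    my $cutoff = time() - $max_age;

    opendir(my $dh, $dir) or die "cannot open $dir: $!";
    my @stale;
    for my $name (sort readdir $dh) {
        next unless $name =~ /^distccd_/;
        my $path  = "$dir/$name";
        my $mtime = (stat $path)[9];
        # Skip entries that vanished meanwhile or are not plain files.
        next unless defined $mtime && -f _;
        push @stale, $path if $mtime < $cutoff;
    }
    closedir $dh;
    return @stale;
}
```

Something like unlink stale_distcc_files($tmp_dir, 3600) would then remove 
everything older than one hour, with $tmp_dir set to the directory the 
daemon writes to.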

Our build system knows how to distribute builds with the help of a distributed 
filesystem. I saw that you had no figures to show (in the FAQ). I could 
provide some if you wish.

Cheers, Nadim and Anders.

PS: Our monitoring tool :-)
#!/usr/bin/perl -w

# Print a timestamp and the distccmon-text output once per second.
while (1) {
        print scalar(gmtime()) . "\n" ;
        print `distccmon-text` ;
        sleep(1) ;
}
