[distcc] Using distcc with a new build system

Martin Pool mbp at sourcefrog.net
Sun Dec 19 03:44:10 GMT 2004


nadim wrote:
> We think this has to do with the coloring codes 'colorgcc' adds but we don't 
> understand what the problem is here. Is distcc using 'gcc -E' for 
> pre-processing? 'colorgcc' happily mixes STDOUT and STDERR. This is a rather 
> obvious 'colorgcc' error if the pre-processed code is output to STDOUT. If 
> you tell us how the preprocessor is used, we can see how to modify 
> 'colorgcc' so all distcc users can use it too.

Yes, I think it is a colorgcc bug relating to -E.  If you set 
DISTCC_VERBOSE=1 you can see how it's being invoked.
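
For illustration only, here is a minimal sketch of the behaviour a wrapper
needs so that -E keeps working: it colors the compiler's stderr and passes
stdout through byte for byte, so preprocessed source is never mixed with
escape codes. It is written in Python rather than colorgcc's Perl, it is not
colorgcc's actual code, and the names run_compiler and main are made up.

```python
import subprocess
import sys

RED = "\033[31m"
RESET = "\033[0m"


def run_compiler(argv):
    """Run the real compiler; colorize stderr only, never stdout.

    Hypothetical helper, not part of colorgcc or distcc.
    Returns (exit_code, stdout_bytes, colored_stderr_text).
    """
    proc = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # stdout may carry preprocessed source (gcc -E), so leave it untouched.
    out = proc.stdout
    err_lines = proc.stderr.decode(errors="replace").splitlines()
    colored = "".join(f"{RED}{line}{RESET}\n" for line in err_lines)
    return proc.returncode, out, colored


def main():
    """Entry point for use as a wrapper script: wrapper.py gcc -E foo.c"""
    code, out, err = run_compiler(sys.argv[1:])
    sys.stdout.buffer.write(out)
    sys.stderr.write(err)
    return code
```

A wrapper script would call main() and exit with its return value. The point
is only that the two streams stay separate; how the diagnostics are colored
is a detail.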

> 
> B/ Sometimes distcc/our build system hangs in a few strange ways
> (this is a problem I have fixed in our code, still I wonder why I got the 
> messages before)
> 
> We get this:
> 
> Mon Dec 13 14:18:06 2004
>  31080  Receive     hdi_widgettextinputcreate.c                172.31.4.103[0]
> 
> Mon Dec 13 14:18:07 2004
>  31080  Receive     hdi_widgettextinputcreate.c                172.31.4.103[0]
> 
> Mon Dec 13 14:18:08 2004
>  31080  Receive     hdi_widgettextinputcreate.c                172.31.4.103[0]
> Mon Dec 13 14:18:10 2004
> 
> Mon Dec 13 14:18:11 2004
> 
> 
> but the build system is still waiting for the command to finish!
> 
> Again this was an error in the build system communication with build 
> processes. What I don't understand is why distccmon-text writes 'receive'.

The distcc client is receiving compilation results from the server.  (Or, 
possibly, it was killed while in that state.)
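
If you want to watch for jobs that sit in one state for a long time, a rough
sketch of parsing those monitor lines could look like this. The column layout
(pid, state, file, host[slot]) is inferred only from the output quoted above,
not from any documented format, and parse_monitor_line is a made-up name.

```python
import re

# Layout assumed from the quoted output: PID, state, file name, host[slot].
LINE_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\[(\d+)\]\s*$")


def parse_monitor_line(line):
    """Return a dict for a job line, or None for timestamps and blank lines."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    pid, state, filename, host, slot = m.groups()
    return {
        "pid": int(pid),
        "state": state,
        "file": filename,
        "host": host,
        "slot": int(slot),
    }
```

Timestamp lines such as "Mon Dec 13 14:18:06 2004" do not match and come
back as None, so they can be used to group the job lines that follow them.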

> 
> This was also surprising:
> 
> [ali at khemir obigo]$ ps aux | grep cc
> ali       1740  0.0  0.1  1972  516 pts0     SN   12:55   0:00 distcc -O2 
> -Wall -Wshadow -Wpointer-arith -I/devel/q04c/obigo/msf/msf_lib/intgr 
> -I/devel/q04c/obigo/msf/lib 
> -o /devel/q04c/obigo/projects/ali_grisar_runt/out_ali/msf/lib/hdi_widgetbargetvalues.o 
> -c /devel/q04c/obigo/msf/lib/hdi_widgetbargetvalues.c
> ali       1741  0.0  0.0     0    0 pts0     ZN   12:55   0:00 [cc] <defunct>
> ali       2845  0.0  0.1  1944  668 pts1     SN   12:57   0:00 grep cc
> 
> Is distcc waiting for a zombie process here? The first idea we had was that 
> SIGCHLD wasn't handled properly. But while we were talking about it, 
> thinking the build was dead in the water, we were surprised to see the build 
> complete!!!!

I don't understand the problem.  What's wrong with having a zombie 
present for a short period of time?
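
To show why a short-lived zombie is normal, here is a small sketch (Unix
only, and not distcc's code) in which a child exits while its parent is busy.
The child stays a zombie until the parent collects its status, and then it
disappears. The function name and the delay are arbitrary choices for the
example.

```python
import os
import time


def reap_after_delay(delay=0.2):
    """Fork a child that exits at once; reap it only after `delay` seconds.

    Returns (was_zombie, exit_status, gone_after_reap).
    """
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately

    time.sleep(delay)  # parent is busy; the exited child is a zombie meanwhile

    # WNOWAIT reads the exit status without reaping, so the zombie entry
    # is still in the process table after this call.
    info = os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    was_zombie = info is not None and info.si_pid == pid

    _, status = os.waitpid(pid, 0)  # this is what actually removes the zombie

    try:
        os.waitpid(pid, 0)
        gone = False
    except ChildProcessError:
        gone = True  # nothing left to wait for
    return was_zombie, os.WEXITSTATUS(status), gone
```

A "defunct" entry in ps only means the parent has not yet reached its wait
call, which is expected if the parent is doing other work at that moment.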

> After fixing the communication between PBS and the build processes, I made a 
> test run on my three boxes compiling around 200K lines of code in 500 source 
> files. It built in 45 s on my little cluster (3 GHz + 1 GHz + 700 MHz) while 
> taking 70 s on the 3 GHz box alone. I ran the test 200 times while looking at 
> the news. I got a surprise though. I monitored the CPU loads and I was 
> surprised to see that one of the boxes stopped compiling after some time. 
> distcc did the right thing, as it would not use that node on the next build. 
> The problem was that distcc had filled my $TMP directories with 4000 rti 
> files (those produced by gcc). If I removed the files, compiling ran smoothly 
> again. This didn't happen on the other 2 computers. Does anyone have an idea 
> why this occurred?

Maybe gcc leaves temp files around if it's interrupted, or maybe you 
were invoking gcc with -save-temps.
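
To check whether leftovers are piling up on a build box, something like the
following sketch could be run between builds. It only lists regular files in
the temp directory that are older than a threshold; the function name, the
one-hour default and the decision about what to do with the list are all
assumptions for illustration.

```python
import os
import tempfile
import time


def stale_temp_files(directory=None, min_age_seconds=3600, now=None):
    """List regular files in `directory` not modified for min_age_seconds."""
    directory = directory or tempfile.gettempdir()
    now = time.time() if now is None else now
    stale = []
    with os.scandir(directory) as entries:
        for entry in entries:
            try:
                if not entry.is_file(follow_symlinks=False):
                    continue
                age = now - entry.stat(follow_symlinks=False).st_mtime
            except OSError:
                continue  # file vanished while scanning
            if age >= min_age_seconds:
                stale.append(entry.path)
    return sorted(stale)
```

If the count keeps growing on one machine only, that points at compiles being
interrupted there rather than at anything in the build options.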


--
Martin

