[distcc] Using distcc with a new build system
Martin Pool
mbp at sourcefrog.net
Sun Dec 19 03:44:10 GMT 2004
nadim wrote:
> We think this has to do with the coloring codes 'colorgcc' adds but we don't
> understand what the problem is here. Is distcc using 'gcc -E' for
> pre-processing? 'colorgcc' happily mixes STDOUT and STDERR. This is a rather
> obvious 'colorgcc' error if the pre-processed code is output to STDOUT. If
> you tell us how the preprocessor is used, we can see how to modify
> 'colorgcc' so all distcc users can use it too.
Yes, I think it is a colorgcc bug relating to -E. If you set
DISTCC_VERBOSE=1 you can see how it's being invoked.
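To illustrate the general hazard nadim describes (this is a toy sketch, not colorgcc's or distcc's actual code), here is a wrapper that folds stderr into stdout next to one that keeps them apart. The child script is a hypothetical stand-in for a compiler run with -E: it writes one diagnostic to stderr and one line of "preprocessed" text to stdout.

```python
import subprocess
import sys

# Hypothetical stand-in for a compiler run with -E: a diagnostic goes to
# stderr, the preprocessed text goes to stdout.
CHILD = (
    "import sys\n"
    "sys.stderr.write('warning: unused variable\\n')\n"
    "sys.stderr.flush()\n"
    "sys.stdout.write('int main(void) { return 0; }\\n')\n"
)


def run_separate():
    """Run the child and keep stdout and stderr in separate pipes."""
    p = subprocess.run([sys.executable, "-c", CHILD],
                       stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                       text=True)
    return p.stdout, p.stderr


def run_merged():
    """Run the child with stderr folded into stdout, as a careless wrapper might."""
    p = subprocess.run([sys.executable, "-c", CHILD],
                       stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                       text=True)
    return p.stdout


if __name__ == "__main__":
    out, err = run_separate()
    print("separate stdout:", repr(out))
    print("separate stderr:", repr(err))
    print("merged stdout:  ", repr(run_merged()))
```

In the merged case the warning ends up inside what a caller would treat as source text, which is the kind of corruption that would break a later compile step.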
>
> B/Sometimes, distcc/our build system hangs in a few strange ways
> (this is a problem I have fixed in our code, still I wonder why I got the
> messages before)
>
> We get this:
>
> Mon Dec 13 14:18:06 2004
> 31080 Receive hdi_widgettextinputcreate.c 172.31.4.103[0]
>
> Mon Dec 13 14:18:07 2004
> 31080 Receive hdi_widgettextinputcreate.c 172.31.4.103[0]
>
> Mon Dec 13 14:18:08 2004
> 31080 Receive hdi_widgettextinputcreate.c 172.31.4.103[0]
> Mon Dec 13 14:18:10 2004
>
> Mon Dec 13 14:18:11 2004
>
>
> but the build system is still waiting for the command to finish!
>
> Again this was an error in the build system communication with build
> processes. What I don't understand is why distccmon-text writes 'receive'.
The compiler is receiving compilation results from the server. (Or,
possibly, it was killed while in that state.)
>
> This was also surprising:
>
> [ali at khemir obigo]$ ps aux | grep cc
> ali 1740 0.0 0.1 1972 516 pts0 SN 12:55 0:00 distcc -O2
> -Wall -Wshadow -Wpointer-arith -I/devel/q04c/obigo/msf/msf_lib/intgr
> -I/devel/q04c/obigo/msf/lib
> -o /devel/q04c/obigo/projects/ali_grisar_runt/out_ali/msf/lib/hdi_widgetbargetvalues.o
> -c /devel/q04c/obigo/msf/lib/hdi_widgetbargetvalues.c
> ali 1741 0.0 0.0 0 0 pts0 ZN 12:55 0:00 [cc] <defunct>
> ali 2845 0.0 0.1 1944 668 pts1 SN 12:57 0:00 grep cc
>
> Is distcc waiting for a zombie process here? The first idea we had was that
> SIGCHLD wasn't handled properly. But while we were talking about it,
> thinking the build was dead in the water, we were surprised to see the build
> complete!!!!
I don't understand the problem. What's wrong with having a zombie
present for a short period of time?
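A small sketch of why a short-lived zombie is normal: a child that has exited stays in the process table as a zombie until its parent collects the exit status. The helper names below are made up for illustration, and the state lookup assumes a Linux-style /proc; elsewhere it simply reports None.

```python
import subprocess
import sys
import time


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if unavailable."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except OSError:
        return None
    # Format is "pid (comm) state ..."; comm may itself contain parentheses,
    # so split on the last ')'.
    return data.rsplit(")", 1)[1].split()[0]


def observe_zombie(timeout=5.0):
    """Start a child that exits at once; report its state before and after reaping."""
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    deadline = time.monotonic() + timeout
    state = proc_state(child.pid)
    # Poll /proc only. Calling child.poll() here would reap the child.
    while state not in (None, "Z") and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(child.pid)
    code = child.wait()  # collects the exit status; the zombie entry goes away
    return state, code, proc_state(child.pid)


if __name__ == "__main__":
    before, code, after = observe_zombie()
    print("state before wait():", before)
    print("exit code:", code)
    print("state after wait(): ", after)
```

The defunct entry only becomes a problem if the parent never gets around to waiting for it.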
> After fixing the communication between PBS and the build processes, I made a
> test run on my three boxes compiling around 200K lines of code in 500 source
> files. It built in 45s with my little cluster (3 GHz + 1 GHz + 700 MHz) while
> taking 70s on the 3 GHz only. I ran the test 200 times while looking at the
> news. I got a surprise though. I monitored the CPU loads and I was surprised
> to see that one of the boxes didn't compile after some time. distcc did the
> right thing as it would not use that node on the next build. The problem was
> that distcc had filled my $TMP directories with 4000 rti files (those produced
> by gcc). If I removed the files, the compiling just ran smoothly again. This
> didn't happen on the other 2 computers. Does anyone have an idea why this
> occurred?
Maybe gcc leaves temp files around if it's interrupted, or maybe you
were invoking gcc with -save-temps.
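If you want to watch for leftovers building up, something like the following might help. It is only a sketch: the function name and the one-hour threshold are arbitrary choices, and it lists old regular files without deleting anything or knowing which program created them.

```python
import os
import tempfile
import time


def stale_files(directory, min_age_seconds, now=None):
    """Return sorted paths of regular files in directory older than min_age_seconds."""
    now = time.time() if now is None else now
    found = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # Skip subdirectories and symlinks; only plain files are counted.
            if not entry.is_file(follow_symlinks=False):
                continue
            age = now - entry.stat(follow_symlinks=False).st_mtime
            if age >= min_age_seconds:
                found.append(entry.path)
    return sorted(found)


if __name__ == "__main__":
    tmp = tempfile.gettempdir()  # honours TMPDIR, TEMP and TMP
    old = stale_files(tmp, min_age_seconds=3600)
    print(f"{len(old)} files older than an hour in {tmp}")
```

Running it on each build machine between test runs would show whether the count grows on only one of them.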
--
Martin