[distcc] PCH Headers and distcc (again!)

Mon May 26 05:39:39 GMT 2008

Hi!

I also spent some time getting PCH to work with distcc and I think I've
got a fairly complete implementation here. We're running a local compile
farm with my PCH patch, and it has been pretty successful so far. The
problem: I still don't have an OK from management to release the code :(
I'll try to resolve this by the end of the week.

Here's a brief description of what I did and what the results were.

First of all, PCHs can be huge (typically more than 100MB for our code),
so caching them on the compile server is required. I did that by
identifying a PCH by its MD5 sum - the PCH name on the server is simply
(MD5-HEX).h.gch. Files are preprocessed locally with -fpch-preprocess;
the distcc client looks for the PCH pragma, computes the MD5 and replaces
the pragma line with a reference to the MD5-named PCH. I also added an
additional protocol ping-pong, in which the server may ask the client to
send over the PCH if it is not already cached on the server. That works
well when client and server are running the same compiler binary.
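A minimal sketch of the client-side pragma rewrite and the server-side cache check described above. It is written in Python for brevity; distcc itself is written in C. The sketch assumes that GCC's -fpch-preprocess output names the PCH in a `#pragma GCC pch_preprocess "..."` line. The helper names md5_hex_of_file, rewrite_pch_pragma and server_needs_pch are hypothetical and are not part of distcc or of the unreleased patch. How the server resolves the bare hash name against its cache directory is left open.

```python
import hashlib
import os
import re

# Assumed form of the line GCC writes for -fpch-preprocess:
#   #pragma GCC pch_preprocess "path/to/header.h.gch"
PCH_PRAGMA = re.compile(r'^#pragma GCC pch_preprocess "[^"]+"\s*$')


def md5_hex_of_file(path: str) -> str:
    """MD5 of a (possibly 100MB+) PCH file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rewrite_pch_pragma(line: str, md5_hex: str) -> str:
    """Hypothetical client step: point the pragma at '<md5_hex>.h.gch'.

    Lines that are not the PCH pragma are returned unchanged; a trailing
    newline on the pragma line is preserved."""
    if not PCH_PRAGMA.match(line):
        return line
    newline = "\n" if line.endswith("\n") else ""
    return f'#pragma GCC pch_preprocess "{md5_hex}.h.gch"{newline}'


def server_needs_pch(cache_dir: str, md5_hex: str) -> bool:
    """Hypothetical server step: request the PCH from the client on a cache miss."""
    return not os.path.isfile(os.path.join(cache_dir, f"{md5_hex}.h.gch"))


if __name__ == "__main__":
    line = '#pragma GCC pch_preprocess "build/all.h.gch"'
    print(rewrite_pch_pragma(line, "0123456789abcdef0123456789abcdef"))
```

Keying the cache on content rather than on the client's path means that two clients with identical PCHs share one cached copy. It also means that a changed PCH never matches a stale entry.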

The next problem (already mentioned by Jamie) is the PCH binary
compatibility check. In our case we're cross-compiling for a target
platform X; the servers run Linux and most clients are Windows machines,
so there is no way to run the same compiler binary on server and client.
(I tried a hack that simply patches the checksum in the PCH; the compiler
crashed, even though both compiler binaries are built from the same
source, just for different host platforms.)

My solution is not nice, but it works. The client enforces that PCHs are
compiled on the server (fallback disabled). When creating the
preprocessed file for the PCH compilation, the option -dD is added and
all #defines and #undefs are captured. The client creates a local proxy
PCH file which contains the PCH pragma and _all_ #defines and #undefs
from the preprocessed PCH, _except_ for those specified on the
command line and built-in macros (these can be identified by looking at
the previous line in the PP output). The proxy PCH file is placed in a
directory that is scanned for includes before the directory containing
the original PCH include file. The C++ files are then compiled without
the -fpch-preprocess option.
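The macro-filtering step could look roughly like the following sketch. It is hypothetical and is not taken from the patch. It assumes the usual `gcc -E -dD` output, in which linemarkers such as `# 1 "<built-in>"` and `# 1 "<command-line>"` precede the predefined and command-line macros. Rather than looking only at the immediately preceding line, it tracks the file named by the most recent linemarker. It also accepts the `<command line>` spelling, on the assumption that older GCC releases used it.

```python
import re

# Linemarkers in `gcc -E` output, e.g.  # 1 "<built-in>"  or  # 12 "foo.h" 2
LINEMARKER = re.compile(r'^# \d+ "(?P<file>[^"]*)"')
MACRO_LINE = re.compile(r"^#\s*(define|undef)\b")
# Pseudo-files for predefined and -D/-U macros ("<command line>" is assumed
# to be the spelling used by older GCC releases).
SKIPPED_SOURCES = {"<built-in>", "<command-line>", "<command line>"}


def proxy_pch_lines(dd_output: str, pch_pragma: str) -> list[str]:
    """Hypothetical: build the proxy header from `gcc -E -dD` output.

    Keeps the PCH pragma plus every #define/#undef whose most recent
    linemarker does not name a built-in or command-line source."""
    lines = [pch_pragma]
    source = None  # file named by the most recent linemarker
    for line in dd_output.splitlines():
        marker = LINEMARKER.match(line)
        if marker:
            source = marker.group("file")
        elif MACRO_LINE.match(line) and source not in SKIPPED_SOURCES:
            lines.append(line)
    return lines


if __name__ == "__main__":
    sample = "\n".join([
        '# 1 "all.h"',
        '# 1 "<built-in>"',
        "#define __GNUC__ 4",
        '# 1 "<command-line>"',
        "#define NDEBUG 1",
        '# 1 "all.h"',
        "#define MY_CONFIG 2",
        "#undef OLD_MACRO",
    ])
    for out in proxy_pch_lines(sample, '#pragma GCC pch_preprocess "abc.h.gch"'):
        print(out)
```

The returned lines would then be written to a header with the same name as the original PCH include. That header would go in a directory that comes earlier on the -I search path, so the compiler picks up the proxy instead of the real header.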

Maybe somebody has a smarter idea of how to handle the PCH binary
compatibility problem.

I found that PCH improves the build performance to some degree. However,
getting this to work in the most general case (where server is platform
A, client is platform B, target is platform C) is really tricky and
requires considerable changes to the distcc code base. Even if I can
make my changes public, I would not be surprised if the distcc
maintainers vote against it.

Kind regards,
Sascha

Jamie Kirkpatrick wrote:
> Hi all
>
> This topic hasn't been raised for some time, but having spent a whole
> weekend playing with distcc, only to be massively disappointed by this
> stumbling block, I think it's worth revisiting.
>
> I spent a couple of days setting up various cross-compilers etc on the
> boxes I have at home to see if I couldn't get some kind of compiler
> farm on the go.  After much playing around I finally got it all
> running, but then spent a further day trying to work out why I could
> not get distcc to improve my compile times at all.
>
> In the end the answer turned out to be that I was using PCH headers,
> which are disabled by the flags distcc uses to preprocess code.
> Bummer.  So next I tried to see if I could get the two to play nice
> together.  There is a suggestion in an earlier post that one should
> use the "-fpch-preprocess" flag to force the inclusion of a special
> pragma that tells GCC how to use the PCH header.  The problem, as also
> previously noted, is that the flag tells GCC to load the PCH file
> when it compiles the preprocessed code, but the PCH file is not
> on the target machine.
>
> I wasn't going to give up there, however.  I know from reading
> elsewhere that Apple allows the use of PCH headers in combination with
> distcc, and I wanted to know how.  So, after some time spent poring
> over their modified version of distcc as well as the GCC code, I know how.
>
> They have altered distcc to transfer the PCH headers to the target
> machine, and then in turn taught GCC how to read the files from
> special temporary locations.  The solution allows you to get the
> benefits of PCH headers as well as using distcc.
>
> The question I have is whether or not we can find some way to
> integrate the code that does the transferring, and find some other way
> to get GCC to read the files.  This would be of massive benefit to
> projects where PCH files are already in use and the idea of giving
> them up to use distcc does not appeal.
>
> The only other snag I've hit is that GCC checksums the PCH files to
> check whether they were compiled with the same compiler version,
> which means you need *exactly* the same binary as the one that
> compiled them.  Again, not ideal but workable for those that need it.
>
> Thoughts would be welcome.  I spent a long time getting to this point
> and it seems a shame to let my findings go to waste: most of the code
> for a fix is already written so it would make sense to use it.
>
> -- 
> Jamie Kirkpatrick