[distcc] avoiding work

Martin Pool mbp at samba.org
Fri Jan 30 00:18:45 GMT 2004


On 29 Jan 2004, "Benjamin S. Scarlet" <bsfdccl0 at greynode.net> wrote:
> It seems to me that it _might_ be possible to get more for less by
> hooking distcc into gcc differently, at the expense of not distributing
> the assembly.
> 
> The actual gcc binary, after all, is just a driver whose only purpose
> is to process arguments.  It, in turn, invokes the individual
> compiler passes based on its arguments.  distcc goes to considerable
> trouble to separate out the compilation and assembly phase from
> preprocessing before it and (the assembly and?) linking after it.  The gcc
> driver is not only better at that argument processing and pass handling
> task, it defines it.
> 
> So, if it would be acceptable not to distribute the assembly pass (how
> much does it cost?):
> What if distcc were hooked in not at the gcc/g++ level, but at the
> cc1/cc1plus level behind it?  It could then invoke cc1 in turn on the
> remote machine, doing for cc1 what it does for gcc now, but with less
> work.

Where exactly would the work be saved?

> *) gcc supports several facilities for such a substitution (either the
> GCC_EXEC_PREFIX or COMPILER_PATH environment variables, or the -B flag).

This might be more intrusive to install than the current masquerade
setup.

> *) If distcc were invoked by gcc in place of cc1, the other passes
> wouldn't be distcc's problem -- it would get precisely the input it
> wants without having to think so much.

No, it would still have to work out whether the compilation could be
run remotely.  For example, invocations which write out profile data
cannot be run remotely.  The amount of argument processing would be
smaller, though.

> *) In this way, much less argument processing would be necessary (I
> think there are only one or two flags at the cc1 level which should
> prohibit distribution -- most of the weird output cases happen at a
> higher level). More compilations would be distributed, like
> gcc -xc++ foo.c -o foo.o
> or
> gcc foo.c -o foo
> *) As a possible side benefit, the number of process invocations per
> file would also decrease: Now, it looks like
> [distcc [gcc [cpp]] -remote-> [gcc [cc1] [gas] ] -local-> ... ]
> after such a change it would look like
> [gcc [cpp] [distcc -remote-> [cc1]] [gas]]

I don't think the process overheads are sufficiently expensive to
matter.  fork is cheap[0].

I don't think the gcc wrapper is very expensive either.  Or have you
measured it?

I agree that this would be another possible way to write it.  You
lose a bit (or maybe a lot) of ease of installation, but the argument
scanner can be simpler.  You might also lose the possibility of using
compilers other than gcc, but I'm not sure that anyone has ever
succeeded in doing that, so it might be moot.

I don't see any reason to rewrite it now though.  Do you think there
is one?

> It seems to me that if assembly is quick enough to do locally, such a
> rearrangement might be a win.

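Assembly can be relatively cheap.  On the other hand, the assembler
input can be larger than the object code, and since under this scheme
it is the assembler text rather than the object file that would have
to come back over the network, in some cases this might be decisive.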

-- 
Martin 

[0] Well, at least on Linux.