[distcc] Re: separating compiler and assembler: benchmarks

Tue Mar 4 00:53:12 GMT 2003

On  3 Mar 2003, "Stuart D. Gathman" <stuart at bmsi.com> wrote:

> Gcc writes assembler output after each toplevel item.  Assembler output
> is produced after each static or global variable (initialized or not),
> after each implemented class for C++ (writes vtbl and auto methods), and
> after each function definition.  Unless your C source consists of one
> gigantic function, there is plenty of pipelining of input / output.

I tried compiling a file under strace with gcc-3.2, and it does in
fact seem to wait right until the end of execution before starting to
write out the assembly code.

I tried with and without -pipe, and I also tried preprocessing
separately.

  strace -f gcc-3.2 -pipe -S distcc.c
  strace -f gcc-3.2 -S distcc.i
  strace -f gcc-3.2 -pipe -S distcc.i

I also tried gradually feeding gcc from a fifo with assembly sent to
stdout, and observed that there was no output until the input file was
closed.  
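
For anyone who wants to repeat the fifo-style test, here is a rough
sketch of one way to do it.  It is not the script used above: the
helper name output_before_eof and the delay value are made up for
illustration.  It feeds a command's stdin in slow pieces and reports
whether anything appeared on stdout before stdin was closed.

```python
import os
import subprocess
import threading
import time


def output_before_eof(cmd, chunks, delay=0.2):
    """Feed chunks slowly to cmd's stdin; return True if any stdout
    bytes arrived before stdin was closed (hypothetical helper)."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    first_byte_at = []  # monotonic time of the first stdout data, if any

    def drain():
        # Read stdout in a separate thread so the child never blocks on a
        # full pipe while we are still writing its input.
        fd = proc.stdout.fileno()
        while True:
            data = os.read(fd, 65536)
            if not data:
                break
            if not first_byte_at:
                first_byte_at.append(time.monotonic())

    reader = threading.Thread(target=drain)
    reader.start()
    for chunk in chunks:
        proc.stdin.write(chunk)
        proc.stdin.flush()
        time.sleep(delay)  # simulate source arriving gradually
    closed_at = time.monotonic()
    proc.stdin.close()
    reader.join()
    proc.stdout.close()
    proc.wait()
    return bool(first_byte_at) and first_byte_at[0] < closed_at


# Possible use against gcc (assumes gcc is on PATH; "-x c" is needed
# because stdin has no file extension, "-o -" sends assembly to stdout):
#   output_before_eof(["gcc", "-x", "c", "-S", "-o", "-", "-"], pieces)
```

A False result only says nothing was seen before end of input.  It
cannot tell a compiler that really waits from one whose output is
sitting in a stdio buffer.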

Perhaps there are buffering effects that just make it look like it's
waiting until the end, when in fact for a larger file it would come
out earlier?

Perhaps it's different in older versions of gcc?

> Which is why I tested it.  There is still plenty of "meat" in
> non-trivial C source.  There is a 2x speedup to be gained (without using
> -j2), and that is with the local assembler on a machine that is 10x
> slower.

By the way, what networking gear are you using?

So there are a few related possibilities:

 1 - Piping source into cc1
   1a - cpp can overlap with network transit
   1b - cc1 can overlap with network transit

 2 - Piping assembly from cc1 into as
   2a - cc1 can overlap with network transit
   2b - as can overlap with network transit

1a and 2a require a protocol change to allow distcc to start sending a
file before the length is known.  Not too hard.
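
To make the idea concrete, here is a minimal sketch of one possible
framing.  It is not distcc's actual wire format: the 4-byte
big-endian length prefix, the zero-length terminator, and the names
send_chunked and recv_chunked are all assumptions made for
illustration.  The sender emits each block as soon as it has it, so
the total length never has to be known up front.

```python
import io
import struct


def send_chunked(src, dst, chunk_size=4096):
    """Copy src to dst as length-prefixed chunks, ending with a
    zero-length chunk (hypothetical framing, not distcc's)."""
    while True:
        block = src.read(chunk_size)
        dst.write(struct.pack("!I", len(block)))
        if not block:
            break  # the zero length just written marks end of file
        dst.write(block)


def recv_chunked(src, dst):
    """Read chunks written by send_chunked until the terminator.
    Assumes src.read(n) returns n bytes unless the stream ends, which
    holds for buffered file objects but not for raw sockets."""
    while True:
        header = src.read(4)
        if len(header) != 4:
            raise EOFError("stream ended inside a chunk header")
        (length,) = struct.unpack("!I", header)
        if length == 0:
            return
        data = src.read(length)
        if len(data) != length:
            raise EOFError("stream ended inside a chunk body")
        dst.write(data)


# Round trip through an in-memory "wire".
wire = io.BytesIO()
send_chunked(io.BytesIO(b"int main(void) { return 0; }\n"), wire, chunk_size=8)
wire.seek(0)
received = io.BytesIO()
recv_chunked(wire, received)
```

The cost is four bytes of header per chunk, which is small next to
any reasonable chunk size.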

1b can be added back in fairly easily, and it would be good to do so.

2 requires a change to do assembly locally, either by hooking into
gcc's path or (perhaps cleaner) just passing the .s file to gcc.
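
The "pass the .s file to gcc" variant could look roughly like the
sketch below.  The helper split_compile is hypothetical and handles
only the simplest command lines (one .c file, optional -o).  It
rewrites a compile command into a step that stops at assembly (-S),
which could run remotely, and a step that assembles the result
locally (-c on the .s file).

```python
import os


def split_compile(argv):
    """Split e.g. `gcc -O2 -c foo.c -o foo.o` into a compile-to-assembly
    command and an assemble command (hypothetical helper)."""
    cc = argv[0]
    src = next(a for a in argv[1:] if a.endswith(".c"))
    if "-o" in argv:
        obj = argv[argv.index("-o") + 1]
    else:
        # gcc's default: object file named after the source, in the cwd
        obj = os.path.splitext(os.path.basename(src))[0] + ".o"
    asm = os.path.splitext(obj)[0] + ".s"

    # Keep every other flag for the compile step; drop -c, -o <file>
    # and the source name, which are re-added explicitly below.
    flags = []
    skip_next = False
    for arg in argv[1:]:
        if skip_next:
            skip_next = False
        elif arg == "-o":
            skip_next = True
        elif arg not in ("-c", src):
            flags.append(arg)

    compile_step = [cc, *flags, "-S", src, "-o", asm]
    assemble_step = [cc, "-c", asm, "-o", obj]
    return compile_step, assemble_step
```

Flags meant for the assembler, such as -Wa,... options, would also
have to be forwarded to the second step.  The sketch ignores that.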

1a and 2b are probably not very important because cpp and as are quite
cheap.

2a is only any good if cc1 starts producing assembly a significant
time before it finishes, which seems not to be the case in my
experiment above but may be true in some cases.

I think Stuart's experiment with hooking cc1 is measuring both 1 and
2.  It might be good to try to measure only 2 before putting it into
distcc.

> Call me picky, but I expect distcc to perform at its best even without 
> parallel make.

Yes, that would be good.  It will only be an improvement when the
client is very slow relative to another machine, but that's not an
uncommon case.

-- 
Martin