[distcc] Re: distcc and wine

Tue Jan 14 03:06:02 GMT 2003

On 14 Jan 2003, Paul Millar <paulm at astro.gla.ac.uk> wrote:

> I've just tried distcc and it's wonderful!

Thanks. :-)

> I haven't benchmarked it yet,
> but subjectively (from watching the nodes' load average) distcc seems to give
> *much* more even loading than just running OpenMOSIX.  If anyone has more
> than one machine at their disposal, I'd strongly recommend investigating
> distcc.
> 
> On Thu, 9 Jan 2003, Martin Pool wrote:
> > I haven't tried it myself but I would have expected the short, intense
> > jobs generated by compilation to be a problem [for OpenMOSIX]
> 
> Yes, I agree.  But I think OM still has a role to play.  Running OM
> underneath distcc should help improve the mean performance (in a
> heterogeneous cluster).  Whenever a faster node is unloaded whilst a
> slower node is busy compiling (and this situation lasts for any length of
> time) OM should migrate that process to the faster node, speeding up
> compilation.  That might occur just before linking, for example.

The argument against it is this: compiler processes have a large
working set (>20MB, say, though it varies), do a lot of IO, and only
run for a few seconds.  On a 100Mbps network, migration of a running
process will take a few seconds, after which time many other processes
may have started and stopped, so the load pattern may be very
different.  I think it may be difficult for OpenMOSIX to react fast
enough to handle the condition you describe.
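To put rough numbers on the transfer cost alone, here is a back-of-envelope sketch.  The 20MB working set and 100Mbps link are the illustrative figures from the paragraph above, not measurements; the function name is made up for this example, and the calculation ignores protocol overhead and other traffic on the link, so a real migration would be slower than this.

```python
# Back-of-envelope estimate: ideal time to copy a process's working set
# over the network during migration.
# Assumptions (illustrative only): decimal megabytes, an otherwise idle
# link, and no protocol overhead.  Real migrations will take longer.

def transfer_seconds(working_set_mb: float, link_mbps: float) -> float:
    """Ideal time to move working_set_mb megabytes over a link_mbps link."""
    bits = working_set_mb * 8 * 1_000_000   # megabytes -> bits
    return bits / (link_mbps * 1_000_000)   # bits / (bits per second)


if __name__ == "__main__":
    t = transfer_seconds(20, 100)
    print(f"{t:.1f} s")   # prints "1.6 s"
```

Even in this best case, 1.6 seconds is a large fraction of a compile job that only runs for a few seconds, which is the heart of the argument.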

I have not personally benchmarked OpenMOSIX for this, so you should
take the above with a pinch of salt.

I'd very much like to work with somebody with a >8 machine cluster to
run distcc benchmarks and comparisons to SSI clusters.

The "grainy" nature of the workload makes it difficult for distcc to
schedule optimally, although it should improve somewhat in the next
few months.

This paper describes good results using MOSIX for software building:

  http://www.mosix.cs.huji.ac.il/ftps/usenix.ps.gz

Though they did spend US$390,000, which is more than I can manage.  :-)

> I guess the improvement will depend strongly on the composition of the
> cluster and the code you're compiling.  The improvement (from running OM)
> might be marginal in certain cases, but I don't think it would make things
> worse.

Well, migration and its associated overhead use network bandwidth and
CPU cycles that might be better spent on cc or distcc.

It's great that there is good free single-image clustering software.
The point of distcc is just that you can distribute the particular
task of compilation with a much simpler and less intrusive program.

-- 
Martin