[distcc] distcc over slow net links

Mon Aug 25 13:00:25 GMT 2003

The slow machines are at the end of the list. When there's a large build
going, all the machines get stuff scheduled. While the fast ones will do
most of the files (say 80%), the slower ones will do 20%.

So it's likely that, on the last batch of files to compile, all the fast
machines will be done and the build system will be waiting to start
linking while one or two objects are still being compiled on the slow
machines.

I realize now that the interesting speedup situation is when there are few
files to rebuild, since that's usually when you are actively working on
the source and want to wait as little as possible. When you start a large
full rebuild, you don't really care about saving an extra 15 secs, you
just go get more java :)

I could work around the problem by using only the local cluster when it's
a small rebuild (therefore avoiding a wait on a slow remote machine), and
throwing the remote cluster in only when I do a full build. That's more
work for the end user though.
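
A rough sketch of that workaround, just to make it concrete. It builds
the value for the DISTCC_HOSTS environment variable (a space-separated
host list) from the number of files that need rebuilding. The function
name, the host names and the threshold of 8 are all made up for
illustration; how you count the stale files is left to the build system.

```python
def distcc_hosts(num_stale, local, remote, threshold=8):
    """Return a DISTCC_HOSTS value for a build of `num_stale` files.

    `local` and `remote` are lists of host names (hypothetical here).
    `threshold` is an assumed cut-off, not a distcc setting.
    """
    hosts = list(local)
    # Only pull in the slow remote machines when the build is large
    # enough that the extra capacity outweighs the wait at the end.
    if num_stale > threshold:
        hosts += remote
    return " ".join(hosts)


# Small rebuild: stay on the local cluster.
small = distcc_hosts(3, ["localhost", "fast1"], ["slow1"])
# Full rebuild: add the remote cluster at the end of the list.
full = distcc_hosts(200, ["localhost", "fast1"], ["slow1"])
```

The result would be exported as DISTCC_HOSTS before running make, which
is exactly the extra step the end user has to remember.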

I agree with Martin that 'in general' distcc should gather information
about the speed of the various machines and use that when doing work
distribution heuristics.
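
To illustrate what such a heuristic might look like (this is not how
distcc schedules today, just a sketch under assumptions): keep a running
speed estimate per host, updated with an exponential moving average
after every finished job, and send each new job to the host with the
lowest estimated completion time. HostStats, pick_host, the "work units"
and the smoothing factor are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HostStats:
    name: str
    speed: float = 1.0   # estimated work units per second (assumed start value)
    queued: float = 0.0  # work units currently assigned to this host

    def record(self, size, seconds, alpha=0.3):
        """Fold one finished job into the speed estimate (moving average)."""
        observed = size / seconds
        self.speed = (1 - alpha) * self.speed + alpha * observed
        self.queued = max(0.0, self.queued - size)

    def eta(self, size):
        """Estimated seconds until a new job of `size` would finish here."""
        return (self.queued + size) / self.speed


def pick_host(hosts, size):
    """Assign a job of `size` to the host with the lowest estimated ETA."""
    best = min(hosts, key=lambda h: h.eta(size))
    best.queued += size
    return best


fast = HostStats("fast", speed=10.0)
slow = HostStats("slow", speed=1.0)
# Five equal jobs: the fast host still wins each time because its queue
# drains ten times quicker than the slow host's.
chosen = [pick_host([fast, slow], 10.0).name for _ in range(5)]
```

With this, a slow host only receives work once the fast hosts are backed
up far enough that waiting for them would take longer, which also covers
the "last one or two objects" case.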

When you send the same compile to several machines, you will likely want
to configure things so that you don't eat resources for other users. But
I'm not sure it's a big issue; it's mostly a social thing between
developers ... afaik it'd be fairly easy to "DoS" a distcc cluster with
the current distcc implementation.
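
A toy simulation of sending the same compile to several machines and
keeping whichever result comes back first, with a cap on the number of
copies so one user doesn't grab every server. Nothing here talks to
distcc: compile_on is a stand-in that just sleeps, and the names and the
max_copies limit are assumptions for the sketch.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def compile_on(host, delay, source):
    """Stand-in for a remote compile; `delay` simulates host/network speed."""
    time.sleep(delay)
    return host, f"object({source})"


def speculative_compile(source, hosts, max_copies=2):
    """Run `source` on up to `max_copies` hosts and keep the first result.

    `hosts` is a list of (name, simulated_delay) pairs.
    """
    chosen = hosts[:max_copies]
    pool = ThreadPoolExecutor(max_workers=len(chosen))
    futures = [pool.submit(compile_on, name, delay, source)
               for name, delay in chosen]
    done, pending = wait(futures, return_when=FIRST_COMPLETED)
    for f in pending:
        # cancel() only stops jobs that have not started; a real
        # implementation would have to tell the remote side to abort.
        f.cancel()
    pool.shutdown(wait=False)
    return next(iter(done)).result()


winner, obj = speculative_compile("foo.i", [("slow", 0.3), ("fast", 0.01)])
```

Note that the losing copy keeps running until it finishes in this
sketch, which is the wasted capacity Martin is worried about below.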

TTimo

On Mon, 25 Aug 2003 22:14:54 +1000
Martin Pool <mbp at sourcefrog.net> wrote:

> On Mon 2003-08-25, Dag Wieers wrote:
> 
> > The way I thought Timothee meant it, was like this:
> > 
> > 	Whenever a host has finished processing jobs and distcc (make)
> > 	is out of jobs but still waiting for results on some jobs, it
> > 	could resend the (already preprocessed) jobs to any idle
> > 	machines and use the result from the fastest machine that can
> > 	deliver it. (and finish the other ones)
> > 
> > Of course this means that the distcc instances have to have some
> > shared knowledge about what jobs are still ongoing and access to
> > preprocessed output.
> > 
> > But I like the idea, especially in environments where some of the
> > servers in your cluster are sometimes used for heavy duty. If you're
> > waiting, you might as well bet on another horse (especially when it is
> > at no extra cost)
> 
> But there is a cost: other jobs which might arrive in the future or
> which might be sent by another user will not be able to use those
> servers.  So we want to do this only when the possible gain is so great
> as to justify the risk.  
> 
> One can imagine a naive algorithm wasting a lot of time.
> 
> To turn the question around: why don't we just schedule the job on the
> nearby machine in the first place?
> 
> -- 
> Martin
>