[distcc] Re: keeping localhost busy

Martin Pool mbp at samba.org
Tue Sep 30 03:43:06 GMT 2003


On 29 Sep 2003, Scott Lystig Fritchie <nospam at snookles.com> wrote:
> >>>>> "mp" == Martin Pool  <mbp at samba.org> writes:
> 
> >> On 26 Sep 2003, Jeff <rustysawdust at yahoo.com> wrote:
> >> Last month there was an interesting thread entitled "distcc over
> >> slow net links". I have a similar problem in that two of the
> >> "animals" on my farm have very old CPUs (although they are on a
> >> local 100baseT network).  [...]
> 
> I have that same situation: distcc servers are heterogeneous hardware,
> some being much slower than others.  AFAIK, the best thing you can do
> is to put the slowest machines at the end of your DISTCC_HOSTS list.

The distcc scheduler could probably do better here: at the moment,
with n hosts and n jobs it will usually run one job on each.  Assume
each host has one CPU, and the first machine is four times faster than
the last.  It would probably be better to run two tasks on the fast
machine and none on the last: even sharing the fast machine, each of
those two jobs gets an effective speed of 2, still twice that of the
slow machine.

We ought to use the slowest machines only when they are faster than
the fractional MIPS we would get from 'overloading' a faster machine,
or when all of the faster machines are already running their
configured maximum number of jobs.
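
To make that concrete, here is a minimal sketch of such a heuristic
(Python for brevity; this is not distcc's actual code, and the
relative speeds, the job limits, and the assumption that a host's
speed is shared equally among its jobs are all mine):

    # Sketch only: pick the host that would give a new job the most
    # effective speed, modelling a host running n jobs as giving each
    # job speed/n.  min_useful_speed implements "leave the job queued
    # rather than use a uselessly slow machine".

    class Host:
        def __init__(self, name, speed, max_jobs):
            self.name = name          # e.g. "box1"
            self.speed = speed        # relative speed, 1.0 = baseline
            self.max_jobs = max_jobs  # the ":4" in "box1:4"
            self.running = 0          # jobs currently assigned here

        def effective_speed(self):
            # Speed a new job would see if added to this host's load.
            return self.speed / (self.running + 1)

    def pick_host(hosts, min_useful_speed=0.0):
        candidates = [h for h in hosts if h.running < h.max_jobs]
        if not candidates:
            return None               # everything is at its limit
        best = max(candidates, key=lambda h: h.effective_speed())
        if best.effective_speed() < min_useful_speed:
            return None               # better to wait for a fast slot
        best.running += 1
        return best

With speeds 4 and 1, the first two jobs both land on the fast machine
(effective speeds 4, then 2), which is exactly the behaviour argued
for above.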

But even then, in some cases it might be better to leave the job
queued until a fast CPU slot is available.  And if the slow machine is
so slow that waiting is usually better than using it, perhaps you
should not use it at all.

Clearly we could schedule heterogeneous machines better than we do at
the moment, at least with perfect knowledge.  But it is not practical
to know what jobs will arrive in the future, or exactly how fast each
machine is, once you allow for competing processes and varying job
difficulty.

So the question is: is there a simple scheduler, practical to build,
that approximates the ideal better than the current one does?
Probably.

I have to say I hesitate to, for example, run redundant jobs as
Timothee and Jeff suggested, at least until the scheduler is otherwise
as smart as it can be.

> Many months ago I wrote a TCP load balancer specifically for use with
> distcc.  It gives jobs to the fastest distcc server that is currently
> idle.  Hosts are configured in fastest-to-slowest order.  Using the
> balancer, several developers in the same office get fair access to
> the fastest distcc servers available at that instant, rather
> than all developers using the same "DISTCC_HOSTS='box1:4 box2:4 box3:2
> ...'"  and grinding box1 into dust while leaving the others idle.  It
> also avoids naive round-robin assignment where a job is given to slow
> "box8" when a faster "box2" is currently idle.
> 
> See http://www.snookles.com/erlang/tcpbalance/ for details.

Thanks, I added a link to the web site.
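
For reference, the selection policy he describes is easy to state:
walk the hosts in fastest-first order and hand the job to the first
one that is idle, queueing if none is.  A sketch (the real tcpbalance
is Erlang; this Python rendering and its names are mine):

    # Illustration only: hosts are ordered fastest to slowest, and a
    # job goes to the first idle one; no host is ever overloaded.
    def pick_idle_host(hosts_fastest_first, busy):
        for host in hosts_fastest_first:
            if host not in busy:
                busy.add(host)
                return host
        return None  # all busy: hold the job until a host frees up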

> mp> Timothee suggested killing the job on B and re-running it on
> mp> localhost, but for at least this case it would be wasteful because
> mp> B is as fast as localhost.  For C++ code, transit time is
> mp> relatively small.
> 
> This is the path to madness ... or the path to a great deal of
> complexity: e.g. keeping track of past compilation times for certain
> files to know whether it would be more profitable to abort the remote
> compilation and restart it locally.  Ick.
> 
> mp> I think the real problem here is that recursive Make is harmful.
> mp> The correct fix would be for Make to start additional jobs while
> mp> it is waiting for B.
> 
> I agree: recursive make is harmful.  That is expressly the reason why
> I have been using SCons, http://www.scons.org/, to replace a *very*
> large recursive Make build scheme (many millions of lines of C & C++
> code).  SCons builds a single dependency tree and can walk it in
> parallel and keep multiple CPUs/distcc backends busy regardless of how
> the source is laid out.  I understand (but have not verified) that
> Boost Jam's syntax is Make-like but creates a single dependency tree.
> You can even create a single dependency tree using plain Make (though
> it's difficult to do well).  IMO, if you don't have a global view of
> dependencies, you won't be able to fully exploit parallelism of "make
> -j", or whatever tool you're using.

Yes, I think that is the best thing to do. 

What I was trying to say before is that if you have several related
projects using Make, you may not have a full dependency graph, but you
may know at a very coarse level where the dependencies are *not*.
Building independent subtrees in parallel is a cheap and nasty way to
reduce the damage caused by recursive make.  

Switching to something like SCons is even better.
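
For the curious, an SConstruct file is just Python, and because SCons
sees one global dependency graph, "scons -j8" can feed compiles from
anywhere in the tree to distcc.  A minimal sketch (the file and target
names are invented):

    # Sketch of an SConstruct; every target below lives in one global
    # dependency graph, so parallel builds need no recursive make.
    env = Environment(CC='distcc gcc', CXX='distcc g++')
    util = env.StaticLibrary('util', ['util/alloc.c', 'util/hash.c'])
    env.Program('server', ['src/main.c', 'src/net.c'], LIBS=[util])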

> -Scott

By the way, I liked your Judy/Erlang paper.

-- 
Martin 


