[distcc] Working on several distcc enhancements (take 2)

Victor Norman vtnpgh at yahoo.com
Mon Nov 21 13:51:30 GMT 2005

Dan, et al., 
  Recall that I wrote a system in python that uses distcc to provide
  1. load balancing, with multiple servers and multiple users, taking into consideration the loads on the server machines,
  2. multi-os support -- use linux, cygwin, solaris, whatever on your servers,
  3. multiple simultaneous users -- when multiple users are using the same server farm, they will not step on each other's toes, will not overuse a single server, etc.,
  4. monitoring -- the system has an application that shows in real time which machines are in use,
  5. auto-discovery of servers.
  I posted this to the list about 12 months ago, in a posting called "A system that uses distcc to do massively parallel compilations". It has been in use here at Marconi for all that time, and works nearly flawlessly. I've also recently rewritten it in C++, so it is now easier to install and maintain. Actually, that rewrite is not quite done, but if there is enough user interest I could finish it and post it soon. The system is now called DMUCS -- the Distributed Multi-User Compilation System.
  Let me know if you are interested in seeing the code posted, in its current form, or only after I've cleaned it up a bit.
  I'm also considering putting it up on freshmeat.org or some other place for OSS.
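
  To give a flavor of items 1 and 3 above, here is a minimal Python sketch of load-aware server selection. It is an illustration only, not the DMUCS code: Server, pick_server and release are hypothetical names, and "spare capacity" is assumed to be CPUs minus the reported load average minus the jobs already handed out.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    cpus: int
    load_avg: float    # most recent load average reported by the host
    assigned: int = 0  # jobs handed out by this scheduler and not yet released


def spare_capacity(server):
    """Estimate how many more jobs the server can take (assumed formula)."""
    return server.cpus - server.load_avg - server.assigned


def pick_server(servers):
    """Return the server with the most spare capacity, or None if all are full."""
    best = max(servers, key=spare_capacity, default=None)
    if best is None or spare_capacity(best) <= 0:
        return None
    best.assigned += 1  # count the job so other users do not pile onto this host
    return best


def release(server):
    """Call when a job finishes so the slot becomes available again."""
    server.assigned = max(0, server.assigned - 1)
```

  Because every user asks the same scheduler, the assigned counter is what keeps several simultaneous builds from overusing a single machine.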

Daniel Kegel <dank at kegel.com> wrote:

I'm working on the following enhancements to distcc,
all motivated by observing shortcomings in real use
in a demanding environment:

1. gcc-2.95.3 sometimes spins on invalid input.  The user eventually
    aborts the build, but distccd does not then kill the compile job.
    Distccd should kill the compile job on timeout (say, 20 minutes)
    or if the client disconnects.

2. Hung servers make users very, very unhappy,
    and unfortunately, distcc servers tend to hang (or appear to hang)
    much more often than one would like, but not often enough to
    be easy to debug.
    To insulate users from hung servers, there should be
    a simple way to prequalify distcc servers before a build
    run.  I am extending the lsdistcc program I posted
    earlier to actually run a trivial compilation on each
    server; it will only list servers which complete the trivial
    compilation by a deadline (say, 1 second).
    (And, of course, lsdistcc lets you autodiscover
    distcc servers listed in DNS, which makes deploying at
    large sites much easier.)

3. There is no ready-made way to monitor a distcc cluster's health.
    There should be a simple way to measure compile latency
    of all machines in a cluster, and an example crontab script
    showing how to use it to trigger email alerts if a machine goes bad.
    Likewise, distccd should keep statistics of its own health and activity,
    and make them available via HTTP for easy remote access.

4. When a distccd server is full up on active jobs, and other nearby
    servers are not, it's a shame that clients which connect to the
    wrong server have to wait.   Perhaps the server should actively
    turn away compile requests, so the client could do a local compilation
    or try another server.  Or perhaps a (set of redundant) load balancers
    would be appropriate.

5. If Alice has already compiled everything on client A, and Bob starts a job
    to compile the same everything on client B, it's a shame that Bob has to wait;
    perhaps distccd (or a load balancer!) should (carefully) cache results.

6. distccd is a known insecure service.  Even with the IP address access control list,
    Bad Guys could potentially use it to subvert a network.  A tighter access
    control scheme might be appropriate for some sites, e.g. using kerberos
    to restrict access to just the people allowed to submit code to the revision
    control system (who can subvert everything anyhow).
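
The timeout half of item 1 can be sketched in a few lines of Python. This is not distccd code (distccd is written in C); run_with_deadline is a hypothetical helper, and it assumes a POSIX system so that the compiler driver and the children it spawns (cc1, as, ...) can be killed together as one process group. Detecting a client disconnect is not covered here.

```python
import os
import signal
import subprocess


def run_with_deadline(argv, timeout_s):
    """Run argv and return its exit code, or None if it had to be killed.

    The child is started in its own session, so its process group id
    equals its pid and the whole group can be signalled at once.
    """
    proc = subprocess.Popen(argv, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)  # kill the driver and its children
        except ProcessLookupError:
            pass  # the group vanished between the timeout and the kill
        proc.wait()  # reap the child so it does not linger as a zombie
        return None
```

A server would call this with the 20-minute limit suggested above, e.g. run_with_deadline(["gcc", "-c", "foo.i"], 20 * 60), and report a failure to the client when it returns None.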

I have preliminary code for the first three, haven't started on the
load-balancing cache yet, and only have a little demo code for
kerberos access control.
I'm being helped on and off by a number of folks, including Thomas
Kho, Jeff Evarts, and Dongmin Zhang.
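
The prequalification idea in item 2 boils down to "probe every server in parallel and keep the ones that answer before a deadline". Here is a rough Python sketch of that pattern; it is not the lsdistcc code, and prequalify and probe are hypothetical names. In real use the probe would run a trivial compilation on the given host and return True on success.

```python
import concurrent.futures as cf


def prequalify(hosts, probe, deadline_s=1.0):
    """Return the hosts whose probe succeeded within deadline_s seconds.

    probe(host) is a caller-supplied function; a host is dropped if its
    probe is still running at the deadline, raises, or returns a false value.
    """
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(hosts)))
    futures = {pool.submit(probe, host): host for host in hosts}
    done, _still_running = cf.wait(futures, timeout=deadline_s)
    pool.shutdown(wait=False)  # do not block on hung probes
    good = {futures[f] for f in done if f.exception() is None and f.result()}
    return [host for host in hosts if host in good]  # keep the input order
```

Note that a hung probe thread is abandoned rather than cancelled, so the probe itself should still carry its own network timeout.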

If I do decide to do a load-balancing cache, I'll probably
start by writing nonblocking versions of the dcc_* networking functions.
Ideally I'd end up with a library that would let you plug
in caching on the client, in the proxy, or on the server.
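
To make "carefully cache results" a little more concrete, here is a small Python sketch of a result cache that could sit on the client, in a proxy, or on the server. None of this exists in distcc; cache_key and ObjectCache are hypothetical names. The assumption is that a result may be reused only when the compiler identity, the arguments, and the preprocessed source are all byte-for-byte identical.

```python
import hashlib


def cache_key(compiler_id, args, preprocessed):
    """Hash everything that can influence the object file (assumed inputs)."""
    h = hashlib.sha256()
    parts = [compiler_id.encode()] + [a.encode() for a in args] + [preprocessed]
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))  # length prefix keeps parts unambiguous
        h.update(part)
    return h.hexdigest()


class ObjectCache:
    """Wraps a compile function with a lookup in a pluggable key/value store."""

    def __init__(self, store=None):
        # Any dict-like object works: in-memory, on-disk, or a network client.
        self.store = {} if store is None else store

    def compile(self, compiler_id, args, preprocessed, do_compile):
        key = cache_key(compiler_id, args, preprocessed)
        if key in self.store:
            return self.store[key]  # Bob reuses what Alice already built
        obj = do_compile(args, preprocessed)
        self.store[key] = obj
        return obj
```

Since distcc already ships preprocessed source to the server, hashing that source sidesteps the problem of tracking header dependencies.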

Just thought I'd post to see if anyone else was using distcc heavily
and was interested in testing any of the above (or even helping code it).
