[distcc] Contributing to distCC / Massively parallel compilations

Daniel Kegel dank at kegel.com
Tue Dec 14 17:58:34 GMT 2004


Victor Norman wrote:
> If you look at that previous posting, you'll see my goals for the project,
> which include (as I recall):
> 
> o having a heterogeneous system (Solaris, Linux, etc.)
> o having the system be load-balancing.
> o being able to add and remove hosts from the compilation according to the
> machine's load average, whether or not it is in use as a desktop, etc.
> o giving out the fastest machines first, so that compilations are the fastest
> possible.
> o supporting many compilations simultaneously from multiple machines.
> 
> These goals have all been met by my implementation.
> 
> Now, here is the crucial part that may interest you, Assaf and friends: the
> code is all written in python, which is a wonderful language, IMO.  But, as Dan
> Kegel has suggested, the system may be more widely used, and easier to install
> if it were written in C.  I agree with him.
> 
> So, perhaps you would like to take this python code and rewrite it as C code. 

Yes, I think something like that would be a good idea!
But rather than rewriting exactly what Victor has done,
we might want to shuffle it around a bit and fit it into the
existing distcc protocol (or make it part of the next version
Martin has been working on).

> In some cases, that will be no small task.  A nice line in python like this:
> 
> random.shuffle(avail_cpu_tiers[t]) 
> 
> would probably require hundreds of lines of C code...

Nah, probably only ten lines :-)
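
To give a feel for the size involved, here is a minimal sketch of an
in-place Fisher-Yates shuffle in C.  The name shuffle_ints and the use of
an int array (say, indices into a host table) are assumptions made for
illustration; they are not taken from Victor's code.  rand() with a modulo
is slightly biased, which should be acceptable for picking build hosts.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: shuffle an array of n ints in place
 * (Fisher-Yates).  Assumes the caller has seeded rand() with srand(). */
void shuffle_ints(int *a, size_t n)
{
    if (n < 2)
        return;                      /* nothing to shuffle */
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);   /* 0 <= j <= i */
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

Call srand() once at startup, then shuffle_ints(tier, n) on each tier's
array of host indices.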

> o host-server: the main server.  This code listens on 4 TCP ports:

IMHO a single port would be better for various reasons.

> Next: how distcc comes into play:
> 
> In your Makefile/SConstruct/Construct file, you replace 
> 
> CC = gcc
> 
> with
> 
> CC = gethost distcc gcc
> 
> "gethost" is a python script that connects to the host-server, gets a host from
> the server, puts it in the DISTCC_HOSTS environment variable, and then runs
> "distcc gcc <args>".  Thus, distcc gets its list of hosts from the environment
> variable.  What I like about this solution is that it required no changes to
> distcc at all.

But given that we have a crew of people ready to work, IMHO it might be
better to incorporate some of the work into distcc itself.
And as I mentioned earlier, to match the distcc paradigm, we might
want to consider the 'central server' really a 'proxy server', and
allow several of them in the system.
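
For comparison, the wrapper step Victor describes (put a host into
DISTCC_HOSTS, then run "distcc gcc <args>") is small in C too.  The sketch
below is only an illustration: the function names export_distcc_host and
run_wrapped are hypothetical, and because the exchange with the
host-server is not described here, the host name is simply passed in as a
parameter.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: export the chosen host so distcc picks it up.
 * Returns 0 on success, -1 on a missing host or setenv() failure. */
int export_distcc_host(const char *host)
{
    if (host == NULL || host[0] == '\0')
        return -1;
    return setenv("DISTCC_HOSTS", host, 1);   /* overwrite any old value */
}

/* Hypothetical helper: set DISTCC_HOSTS, then replace this process with
 * the wrapped command (argv[0] would be "distcc").  Returns only on error. */
int run_wrapped(const char *host, char *const argv[])
{
    if (export_distcc_host(host) != 0)
        return -1;
    execvp(argv[0], argv);
    perror("execvp");                         /* reached only if exec failed */
    return -1;
}
```

A gethost-style main() would obtain the host from the server, then call
run_wrapped(host, argv + 1) so that "gethost distcc gcc -c foo.c" ends up
running "distcc gcc -c foo.c" with DISTCC_HOSTS set.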

> So: summary: Here are some items you might do/research:
> 
> o rewrite the system in C/C++.

For maximal portability, C is probably the right choice.

- Dan

