[distcc] Re: Simple autodiscovery of distcc servers

Jeremy Glazman JGlazman at itsgames.com
Tue Aug 23 17:27:28 GMT 2005

Sure Tobias. I can't post my scripts in full, since I wrote them for my
employer, but I'll gladly explain the process. There's a lot going on, so it
looks complicated, but really it's fairly simple. The goal is a compiler
farm where each developer has access to a group of machines devoted to
speedy compiling, and where several developers using the farm at once won't
step on each other.

Here's the breakdown of my setup:

One machine is designated the central farm 'hub', and its hostname is known
to all of the other machines. Those are 'clients' if they are a developer's
workstation, or 'nodes' if they are only meant to compile (slaves to the
farm, shared by all developers). The farm shares each developer's cache
using samba. Samba also has a messaging feature that lets you run a script
whenever a message arrives, and that is the crux of the setup. Just set this
in your smb.conf:

	message command = /bin/bash -c '/path/to/myscript %s' &

You can read more details about this option in the smb.conf man page, but
I'll explain what's going on. When samba receives a message from another
samba client, it creates a local file and writes the message into it. We in
turn pass the path of this file (substituted for %s, see the man page) to
our script, which interprets the message inside. Samba also gives us access
to all kinds of info like IP addresses and hostnames, so we put some of that
info into our messages too.
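
To make the mechanism concrete, here is a minimal sketch of what such a
handler might look like. It is not the original script: the function name
and the one-line "VERB ROLE IP" message format are assumptions made up for
illustration.

```shell
#!/bin/sh
# Hypothetical handler, run by samba's "message command" with the path of
# the temporary file holding the received message (the %s substitution).
# Assumed message format (not from the original scripts): "VERB ROLE IP",
# for example "WAKE node 192.168.0.12".

handle_message() {
    local msgfile verb role ip
    msgfile=$1
    # Read the first line; accept it even without a trailing newline.
    read -r verb role ip < "$msgfile" || [ -n "$verb" ] || return 1
    case "$verb" in
        WAKE|SLEEP)
            # A real handler would update the nodelist or hosts file here.
            echo "$verb $role $ip"
            ;;
        *)
            echo "IGNORED"
            return 1
            ;;
    esac
}

# When samba invokes the script with a file path, handle it and clean up.
if [ "$#" -ge 1 ]; then
    handle_message "$1"
    rm -f -- "$1"
fi
```

The smb.conf man page notes that samba does not delete the message file
itself, which is why the sketch removes it once the message is handled.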

Here's the process:

When we turn on a 'node' machine (a slave on the farm) it sends a WAKE
message to the hub saying that it is available. The hub stores away its IP
address into a list of available slaves, which I call the 'nodelist'.
Similarly, when the node shuts down it sends a SLEEP message, and the hub
removes it from the list.
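
The hub-side bookkeeping for these two messages could be sketched roughly
as follows. The nodelist path and the function names are hypothetical, and
the list is assumed to hold one IP address per line.

```shell
#!/bin/sh
# Hypothetical hub-side helpers for maintaining the nodelist.
# Assumption: the nodelist is a plain file with one IP address per line.
NODELIST=${NODELIST:-/var/lib/farm/nodelist}   # assumed location

# WAKE: add the node's IP unless it is already listed.
node_wake() {
    local ip
    ip=$1
    touch "$NODELIST"
    grep -qxF -- "$ip" "$NODELIST" || echo "$ip" >> "$NODELIST"
}

# SLEEP: rewrite the list without the node's IP.
node_sleep() {
    local ip tmp
    ip=$1
    [ -f "$NODELIST" ] || return 0
    tmp=$(mktemp "$NODELIST.XXXXXX") || return 1
    # grep exits non-zero when nothing is left to print; that is fine here.
    grep -vxF -- "$ip" "$NODELIST" > "$tmp" || true
    mv -- "$tmp" "$NODELIST"
}
```

On the node side, the message itself can be sent with smbclient's -M
option, which reads the message text from standard input, for example
`echo "WAKE node $ip" | smbclient -M hubname` (the hostname and message
text here are placeholders).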

When we turn our workstation on, its distcc HOSTS file is emptied, and a new
one is created listing first the machine itself and then the hub, so by
default at least two machines are available for compiling (including
itself). Then a WAKE message is sent to the hub. The hub mounts the client's
cache at a unique path so that each developer has their own cache on the
farm. The hub then relays the message to every host in the nodelist, and
each node mounts the cache in the same way, thus distributing the client's
cache among the slaves on the farm. Finally, the hub returns a message to
the client containing all of the hosts in the nodelist. The client appends
this list of available slaves to its distcc HOSTS file, and the farm is
complete!
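
The client side of this exchange might look something like the sketch
below. It assumes the hosts file is distcc's per-user ~/.distcc/hosts with
one host per line, and that the hub's reply has already been parsed into a
list of addresses; the hub name and function names are made up.

```shell
#!/bin/sh
# Hypothetical client-side helpers for building the distcc hosts file.
HOSTS_FILE=${HOSTS_FILE:-$HOME/.distcc/hosts}  # assumed location
HUB=${HUB:-farmhub}                            # hypothetical hub hostname

# At startup: begin a fresh file with this machine first, then the hub.
init_hosts() {
    mkdir -p -- "$(dirname -- "$HOSTS_FILE")"
    printf '%s\n' localhost "$HUB" > "$HOSTS_FILE"
}

# On the hub's reply: append each node address that is not listed yet.
append_nodes() {
    local node
    for node in "$@"; do
        grep -qxF -- "$node" "$HOSTS_FILE" || echo "$node" >> "$HOSTS_FILE"
    done
}
```

A client would call `init_hosts` once at boot, send its WAKE message, and
then call `append_nodes` with the addresses from the hub's reply.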

So the end result is each developer has sole access to their own machine,
but they all share the slaves on the farm (and the hub itself) for
compiling. The -j# flag is set after retrieving the nodelist, simply by
counting the total number of available hosts.
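
Counting the hosts can be done in a line or two. This sketch assumes a
hosts file with one host per line (blank lines and # comments skipped); the
file location and function name are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: derive the -j value from the number of listed hosts.
HOSTS_FILE=${HOSTS_FILE:-$HOME/.distcc/hosts}  # assumed location

count_hosts() {
    # Count non-blank lines that are not comments; print 0 for an empty file.
    awk 'NF && $1 !~ /^#/ { n++ } END { print n + 0 }' "$HOSTS_FILE"
}

# Example use (left commented out so the sketch has no side effects):
# make -j"$(count_hosts)" CC="distcc gcc"
```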

Tobias, I hope this info is useful! Your filtering script and 'stupid
load-balancing' are a good idea. My farm right now is a mess of various
machines around the network, so that might help me sort the nodelist. If
you're interested in more details I can post some portions of my scripts;
sharing the ccache can be tricky.

 - Jeremy
