[distcc] Re: distcc host selection algorithm is too naive

Peter Hawkins peter at hawkins.emu.id.au
Mon Feb 17 03:40:21 GMT 2003

On Mon, Feb 17, 2003 at 12:43:15PM +1100, Martin Pool wrote:
> On 17 Feb 2003, Martin Pool <mbp> wrote:
> Is it perhaps the case that the state directory is stored on an NFS
> disk where locking is not working properly?  In a verbose trace, can
> you see it finding any locks to be busy?
> If locking was failing altogether then I'd expect the kind of speedup
> you report from just doing random distribution.

Ok, turns out I mistook the problem completely. You're right. The
problem is that I am talking to my client machines via an ssh tunnel.
This is done for two reasons - (a) security (the distccds can then
listen on localhost, not a public network interface) and (b) there is
an ethernet transceiver bug that seems to be hit on opening a TCP
connection (but that's a totally separate issue).

i.e. I am using distcc like this (started from a shell script):
ssh -c blowfish -f -N -L 40000:localhost:3632 machine01 
ssh -c blowfish -f -N -L 40001:localhost:3632 machine02 
ssh -c blowfish -f -N -L 40002:localhost:3632 machine03 

export DISTCC_HOSTS="localhost:40000 localhost:40001 localhost:40002"

make -j20 CC=distcc

However, as you correctly observed, the locking isn't done correctly,
because the lock name does not include the port number. Every entry in
the host list has the hostname "localhost", so all of them compete for
the same set of locks (and the first machine wins in almost every case).
A random selection algorithm helped, because it meant that the jobs were
usually spread over 4 machines, not just 1.
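
To illustrate the collision, here is a minimal sketch of the two naming
schemes. The helper names (lock_name_without_port, lock_name_with_port,
ports_collide) and the "/tmp" directory are assumptions made up for this
example; they are not the functions in distcc's lock.c, only the format
strings are taken from the patch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers for illustration only; not distcc's own code. */

/* Old scheme: the name depends only on the hostname and the slot. */
int lock_name_without_port(char *buf, size_t len, const char *dir,
                           const char *host, int slot)
{
    return snprintf(buf, len, "%s/lock_%s_%07d", dir, host, slot);
}

/* Patched scheme: the port number is part of the name. */
int lock_name_with_port(char *buf, size_t len, const char *dir,
                        const char *host, int port, int slot)
{
    return snprintf(buf, len, "%s/lock_%s_%05d_%07d", dir, host, port, slot);
}

/* Returns 1 if two tunnel endpoints on "localhost" would end up with
 * the same lock file name for slot 0, and 0 otherwise.
 * "/tmp" stands in for whatever directory distcc really uses. */
int ports_collide(int use_port, int port_a, int port_b)
{
    char a[256], b[256];

    if (use_port) {
        lock_name_with_port(a, sizeof a, "/tmp", "localhost", port_a, 0);
        lock_name_with_port(b, sizeof b, "/tmp", "localhost", port_b, 0);
    } else {
        lock_name_without_port(a, sizeof a, "/tmp", "localhost", 0);
        lock_name_without_port(b, sizeof b, "/tmp", "localhost", 0);
    }
    return strcmp(a, b) == 0;
}
```

With the old format, ports_collide(0, 40000, 40001) reports a collision,
since both entries map to the same file name; with the port included,
ports_collide(1, 40000, 40001) does not.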

A patch to add the port number to the lock name is attached.

-------------- next part --------------
--- distcc-1.1/src/lock.c	2003-01-28 00:27:30.000000000 +1100
+++ distcc-1.1.new/src/lock.c	2003-02-17 14:25:01.000000000 +1100
@@ -85,16 +85,17 @@
 struct dcc_hostdef *dcc_hostdef_local = &_dcc_local;
-static char * dcc_make_lock_filename(const char *host, int iter)
+static char * dcc_make_lock_filename(const char *host, int port, int iter)
     int need_len;
     char * buf;
     const char *tempdir;
     tempdir = dcc_get_tempdir();
-    need_len = strlen(tempdir) + 6 + strlen(host) + 1 + 7 + 1;
+    need_len = strlen(tempdir) + 6 + strlen(host) + 1 + 5 + 1 + 7 + 1;
     buf = malloc(need_len);
-    if (snprintf(buf, need_len, "%s/lock_%s_%07d", tempdir, host, iter)
+    if (snprintf(buf, need_len, "%s/lock_%s_%05d_%07d", tempdir, host, port,
+                 iter)
         != need_len - 1) {
         rs_fatal("wrong length??");
@@ -138,7 +139,7 @@
     int ret;
     tempdir = dcc_get_tempdir();
-    fname = dcc_make_lock_filename(host->hostname, slot);
+    fname = dcc_make_lock_filename(host->hostname, host->port, slot);
     /* Create if it doesn't exist.  We don't actually do anything with
      * the file except lock it.*/
