[distcc] Job count issue

Jean Delvare khali at linux-fr.org
Sun Jul 25 15:12:52 GMT 2004


Hi all,

I have been investigating an issue with job counts for the last few
days. It may or may not be a problem in distcc; it's still too early to
tell, but it is at least related to distcc.

Here's the problem in detail:

I noticed that sometimes, when I compile something using distcc, I
observe (in distccmon-gnome, or -text, for that matter) fewer jobs than I
asked for. I know there are sometimes not enough jobs to distribute
because of dependencies, but that wasn't the case here. This is a
project I compile very often, and it normally uses 4 jobs when I ask
for 4. Except that, from time to time, I get only 3 for a given
compilation run. Less frequently, only 2. Even more rarely, only 1.

Facts:

* I can only observe the phenomenon when compiling a Linux 2.6 kernel
tree.

* I can only observe the problem when *running* a Linux 2.6 kernel.

* The problem happens randomly. I can do "make CC=distcc -j4" and see 4
jobs, interrupt the compilation, restart it, and have only 3. Or the
other way around.

* Once a job is "missing", it does not come back for the rest of that
compilation run. Likewise, when a run starts with all the requested
jobs, none of them disappear later. This looks like an issue in how make
sets up its jobs at startup (see below).

* I observe the problem on two different machines (both machines in
my farm), both running Slackware 9.1 and a hand-compiled distcc 2.16.

* I could reproduce the problem with DISTCC_HOSTS="localhost" and -j2.
After several tries, one compilation run showed a single job.

* Of course, a test case in which I couldn't see the problem doesn't
mean the problem couldn't have happened there. It may simply be less
frequent, so I wouldn't catch it in a limited number of tries.

Guesses:

I would suspect (GNU) make more than distcc, since the problem is either
there or not there for a whole compilation run. My distribution comes
with make 3.80. I tried building it myself; that didn't change a thing.
I also tried building 3.79.1 myself, which didn't help either. I cannot
try older versions, since Linux 2.6 is said to require 3.79.1.

However, using distcc somehow seems to trigger the problem. I think I
observed it once with a gcc-only compilation, but I am unable to
reproduce it now, so I'm not sure. Of course, the problem is easier to
spot with distcc, because distcc comes with monitors that show the
compilation jobs.

The only respect in which my distribution is not "Linux 2.6 compliant"
is procps. Since neither make nor distcc is linked with libproc, I
suppose that it isn't the problem, but I may try upgrading if someone
thinks it could be.

Questions:

1* Has anyone heard of this problem before?

2* Could someone try to reproduce it? Basically, you have to run Linux
2.6, compile a 2.6 kernel tree using distcc while running
distccmon-gnome, and interrupt the compilation and restart it over and
over again. In my case, there are regularly runs with 3 jobs instead
of the expected 4. The failure frequency varies: sometimes I need a
dozen runs before I see the problem, and sometimes I need several runs
to *not* run into it.

3* Any idea what the problem could be? How should I investigate? I tried
distcc's verbose mode, but I can't see anything relevant in the logs.
Maybe I just don't know what to look for?
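For anyone willing to try question 2, the interrupt-and-restart cycle
can be scripted so it is less tedious. This is only a minimal Python 3
sketch, assuming a POSIX system; run_and_interrupt and repeat_runs are
hypothetical helper names, not part of distcc or make. It automates the
Ctrl-C/restart loop only: the job count still has to be read off
distccmon-gnome or distccmon-text by eye, since the script does not
look at distcc's state at all.

```python
import os
import signal
import subprocess
import time


def _signal_group(pgid, sig):
    """Send sig to every process in the group, ignoring an already-empty group."""
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass  # every process in the group has already exited


def run_and_interrupt(cmd, run_seconds, grace_seconds=30.0):
    """Run cmd, then interrupt it after run_seconds, roughly like Ctrl-C.

    The command is started in a new session, so its PID is also its
    process group ID and SIGINT reaches make and every child it spawned.
    Returns the exit status (a negative value -N means killed by signal N).
    """
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=run_seconds)  # finished on its own
    except subprocess.TimeoutExpired:
        pass
    _signal_group(proc.pid, signal.SIGINT)
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # Still running after the grace period: force the whole group down.
        _signal_group(proc.pid, signal.SIGKILL)
        return proc.wait()


def repeat_runs(cmd, runs, run_seconds, pause_seconds=2.0):
    """Repeat the interrupt/restart cycle; watch distccmon during each run."""
    for i in range(1, runs + 1):
        print(f"run {i}/{runs}: check the job count in distccmon now")
        status = run_and_interrupt(cmd, run_seconds)
        print(f"run {i}/{runs}: exited with status {status}")
        time.sleep(pause_seconds)  # give any leftover children a moment to exit
```

For example, from the top of a 2.6 kernel tree one could call
repeat_runs(["make", "CC=distcc", "-j4"], runs=12, run_seconds=60).
SIGINT goes to the whole process group rather than to make alone
because that is roughly what the terminal does on Ctrl-C, so the distcc
children see the interruption the same way they would by hand.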

Thanks,

-- 
Jean "Khali" Delvare
http://khali.linux-fr.org/
