[distcc] Distcc crashes Linux (Intel e1000 ethernet driver?)

mhuhtala at abo.fi
Tue Aug 3 10:47:27 GMT 2004


This is probably a Linux e1000 driver problem, but I figured I'd ask on
this list whether anyone else has seen it.

We run distcc in a server cluster. Running a large distcc compilation on
4 to 8 cluster nodes via rsh causes about half of the nodes running
distccd to crash, seemingly at random. The entire system goes down; the
crashed nodes do not respond to ping, etc. Sometimes the e1000 network
driver module fails to start upon reboot. A second reboot always brings
the system and the e1000 interface up correctly. The same distcc and OS
version work ok on desktop systems that use fast ethernet and other
network drivers.

Each node is a dual Pentium 4 Xeon with Intel 82544GC Gigabit Ethernet
Controller (rev 02) integrated on the motherboard. The OS is Fedora Core
1 (kernel 2.4.22, gcc 3.3.2, glibc 2.3.2).  We repeated the crashes
several times running distcc versions 2.11 and 2.16 and e1000 driver
versions 5.1.13-k1 (included with the Fedora kernel) and 5.3.19 (the
latest). The Fedora kernel package is the latest update for FC1:
kernel-smp-2.4.22-1.2197.nptl.

Otherwise the e1000 driver, even the older versions included in the
Fedora kernel, has worked without problems on this hardware. Distcc
seems to be the only application triggering a crash (if the problem
really is e1000, that is). I have not tried running a uniprocessor
kernel, so the problem might be specific to SMP. Has anyone
experienced anything like this?

Mikko