[distcc] Lock before preprocess?

Martin Pool mbp at sourcefrog.net
Wed Aug 25 03:03:41 GMT 2004

On 24 Aug 2004, Jake McGuire <jake at boom.net> wrote:
> On Aug 24, 2004, at 6:56 PM, Martin Pool wrote:
> >On 24 Aug 2004, Jake McGuire <jake at boom.net> wrote:
> >>What's the rationale for getting the CPU lock before starting to
> >>preprocess?
> >
> >The rationale is that "3 processes" means "3 processes", not "3
> >compilers."  It covers linking too.
> But the CPU resources are not equivalent.  Preprocessing and linking 
> happen on the local machine, and are limited by the -j argument to 
> make.  Compiling happens on the remote machine, and is limited by the 
> number after the slash in the entry in the DISTCC_DIR/hosts file.  The 
> lock seems much more closely associated with the remote machine's CPU; 
> it's named for the thing, after all.  It just seems that the current 
> locking scheme makes it harder to administer a cluster and also limits 
> total throughput in off-nominal cases, sometimes severely.
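
For readers following along, a hypothetical $DISTCC_DIR/hosts file illustrating the host/limit syntax Jake describes (host names here are made up):

```text
# hypothetical $DISTCC_DIR/hosts; the number after the slash caps
# concurrent jobs sent to that host
localhost/2
buildhost1/4
buildhost2/4
```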

Oh, OK.  I think I understand your question now.

It locks the volunteer before starting the preprocessor because the
choice of remote or local determines whether we should run the
preprocessor at all.  If we decide to run locally, we just run the
whole compiler.

Normally running the preprocessor is so cheap compared to running the
remote compiler that it makes no difference.

Also the preprocessor is overlapped with opening the connection.
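
A minimal sketch of that control flow (in Python, not distcc's actual C code; every function name here is a placeholder, and the connection/preprocessor overlap is shown sequentially for simplicity):

```python
def compile_one(acquire_slot, run_local_cc, run_cpp, open_connection, send_remote):
    """Acquire a host slot first; only then decide whether cpp is needed.

    Illustrative only: if the slot we win is localhost, the whole
    compiler runs locally and a separate preprocessor pass never
    happens.  Only a remote slot needs the source preprocessed.
    """
    host = acquire_slot()          # blocks until some slot is free
    if host == "localhost":
        return run_local_cc()      # whole compile locally; no separate cpp
    conn = open_connection(host)   # overlapped with preprocessing in real
    preprocessed = run_cpp()       # distcc; shown one after the other here
    return send_remote(conn, preprocessed)
```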

> >I think the real issue here is that the poll-based locking is just not
> >very efficient when there is contention.  It was designed using file
> >locks to be simple, portable and robust, and efficient for moderately
> >low contention.  I'm not surprised it dies when 5000 processes contend
> >for a couple of locks.  (SysV semaphores do badly when distcc is
> >abruptly killed, which often happens to compilers.)
> It wasn't -j5000, it was a bunch of clients all doing -j10 (or so), for 
> a net -j200 (or so) with 30 cpus (or so).  170 processes contending, 30 
> files to try to lock -> 5000 locks per second.  Luckily we figured out 
> that it was slow network storage causing the performance problems, 
> because if we'd added another 50 cpus (like we almost did), we'd then 
> be getting nearly 10,000 locks per second ((200 - 80) * 80) for no 
> increase in speed.

When you say "a bunch of clients" I take it you mean a bunch of client
machines, not just a bunch of different make processes on the same
client machine?

I think the real problem there is that they're sharing a single
distcc_dir, and the locks are not qualified by client name. 

I think the immediate fix for you is to give each client its own
distcc_dir on a local (possibly tmpfs) filesystem.
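
For example (a sketch, assuming /tmp is on local or tmpfs-backed storage on your client machines):

```shell
# Hypothetical per-client setup: keep distcc's lock/state directory on
# fast local storage rather than the shared network filesystem.
DISTCC_DIR="/tmp/distcc-$(id -un)"
export DISTCC_DIR
mkdir -p "$DISTCC_DIR"
```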

> >A better design would let clients sleep while waiting.  I have some
> >ideas but suggestions would be welcome.
> Yeah.  Some sort of central lock distribution scheme seems nice - even 
> just have one queue file: when distcc wants a CPU you lock the file and 
> append your PID then go to sleep for n seconds, when you're done 
> compiling you read the first PID and send it a signal to wake it up, 
> then die.  If there was a way to implement counting semaphores with 
> file locks, that would be ideal.
> -jake
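
Jake's queue-file idea might be sketched like this (Python, purely illustrative; the queue path, one-PID-per-line format, and choice of SIGUSR1 are my assumptions, not anything distcc implements):

```python
# Contenders append their PID to a shared queue file under an exclusive
# lock, then sleep in a signal wait instead of re-polling lock files.
# A finishing compiler pops the first PID and wakes it.
import fcntl
import os
import signal

QUEUE = "/tmp/distcc-queue"   # hypothetical shared queue file

def enqueue(pid):
    """Append pid to the queue under an exclusive flock."""
    with open(QUEUE, "a+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        f.write("%d\n" % pid)
        f.flush()

def wake_next():
    """Pop the first queued pid and wake it with SIGUSR1; return it."""
    with open(QUEUE, "a+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        f.seek(0)
        pids = [int(line) for line in f if line.strip()]
        if not pids:
            return None
        first, rest = pids[0], pids[1:]
        f.seek(0)
        f.truncate()                       # rewrite queue without first
        f.writelines("%d\n" % p for p in rest)
        f.flush()
        try:
            os.kill(first, signal.SIGUSR1)
        except ProcessLookupError:
            pass   # waiter died; a robust version would pop the next one
        return first
```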