fcntl spinlock in Linux?

Thu Jan 24 17:16:08 MST 2013

Volker Lendecke <Volker.Lendecke at SerNet.DE> writes:
> On Thu, Jan 24, 2013 at 02:05:00PM +0100, Stefan (metze) Metzmacher wrote:
>> > If that interpretation is right, then I guess with many
>> > cores in a NUMA configuration fcntl locks are just a VERY,
>> > VERY bad idea at all if you want to scale.
>> > 
>> > Please tell me that I am wrong :-)
>> 
>> Reading through
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=Documentation/DocBook/kernel-locking.tmpl
>> indicates to me that this could be solved using RCU to protect the
>> global lock list.
>
> Follow-up question: The process shared robust mutexes, do
> they scale better? Or do we also sit on a central spinlock
> when we really start beating the futex API?

They have a slightly different scalability problem: there is a limited
number of hash buckets (256) that we use for sleeping locks.  But the
fcntl lock design has three major bottlenecks:

1) The single global lock across all locked files.  This simplifies
   deadlock detection, since a chain of blocked waiters can cross from
   one file to another.

2) The single linked list for all offsets within a file.  An augmented
   rbtree would make more sense here for files with many locks at
   different offsets.

3) The linked list includes sleepers as well as lock holders.  This
   means that if we're really backed up, performance falls off a cliff.
   This wants a separate rbtree, but no one wants to add a second pointer
   to the file struct, so we'll need some indirection (perhaps a struct
   with 6 rbtrees, two each for POSIX locks, flock locks and leases).

Note that the locking code is fairly complex, as it handles POSIX locks,
flock locks and leases in the same list.  The rewrite will be nasty.  I
mean, *fun*!

Cheers,
Rusty.