fcntl spinlock in Linux?

J. Bruce Fields bfields at fieldses.org
Tue Jan 29 14:37:03 MST 2013


On Fri, Jan 25, 2013 at 10:46:08AM +1030, Rusty Russell wrote:
> Volker Lendecke <Volker.Lendecke at SerNet.DE> writes:
> > On Thu, Jan 24, 2013 at 02:05:00PM +0100, Stefan (metze) Metzmacher wrote:
> >> > If that interpretation is right, then I guess with many
> >> > cores in a NUMA configuration fcntl locks are just a VERY,
> >> > VERY bad idea at all if you want to scale.
> >> > 
> >> > Please tell me that I am wrong :-)
> >> 
> >> Reading through
> >> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=Documentation/DocBook/kernel-locking.tmpl
> >> indicates to me that this could be solved using RCU to protect the
> >> global lock list.
> >
> > Follow-up question: The process shared robust mutexes, do
> > they scale better? Or do we also sit on a central spinlock
> > when we really start beating the futex API?
> 
> They have a slightly different scalability problem: there are a limited
> number of hashes (256) we use for sleeping locks.  But the fcntl lock
> design has three major bottlenecks:
> 
> 1) The single lock across all locked files.  This simplifies deadlock
>    detection.

Volker said:

	"My assumption was that doing fcntl locks on independent
	files scales."

So I assume this is what he was running into.

It's probably also easiest to fix.
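Point 1, the single lock across all locked files, can be pictured as a toy user-space model. Each file's list of held ranges is protected either by one guard shared by every file (the design Rusty describes) or by a guard of its own. This is a hypothetical sketch with made-up names (`FileLocks`, `make_files`, `add_range`). It does not reflect the kernel's actual data structures, and it ignores deadlock detection entirely.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class FileLocks:
    """Hypothetical per-file state: held byte ranges plus the guard for them."""
    guard: threading.Lock
    held: list = field(default_factory=list)


def make_files(n: int, per_file_guard: bool) -> list:
    """Global design: every file shares one guard. Alternative: one guard each."""
    shared = threading.Lock()
    return [FileLocks(threading.Lock() if per_file_guard else shared)
            for _ in range(n)]


def add_range(f: FileLocks, start: int, end: int) -> None:
    # Only this file's guard is taken, so with per-file guards, threads
    # working on different files never touch the same lock.
    with f.guard:
        f.held.append((start, end))
```

The catch is deadlock detection. Rusty notes that the single lock is what keeps it simple, so a per-file scheme would still need some way to examine blocked waiters across files. This sketch does not address that.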

On a quick skim, the only other global thing I see is the /proc/locks
implementation.
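Volker's expectation that fcntl locks on independent files scale can be probed with a rough micro-benchmark. The sketch forks several processes, gives each one its own file, and has each repeatedly take and drop an exclusive lock. Python's `fcntl.lockf`, which the Python documentation describes as essentially a wrapper around the fcntl() locking calls, stands in here for calling fcntl() from C. The helper names (`hammer`, `run`) and file names are hypothetical, and no timings are claimed. If locking were truly per-file, wall time should stay roughly flat as the process count grows on a machine with enough cores.

```python
import fcntl
import os
import tempfile
import time


def hammer(path: str, iterations: int) -> int:
    """Repeatedly take and drop an exclusive POSIX record lock on `path`."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        for _ in range(iterations):
            fcntl.lockf(fd, fcntl.LOCK_EX)  # blocking request, whole file
            fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
    return iterations


def run(nprocs: int, iterations: int = 10_000) -> float:
    """Fork `nprocs` children, each locking its *own* file; return wall time."""
    with tempfile.TemporaryDirectory() as tmpdir:
        start = time.perf_counter()
        pids = []
        for i in range(nprocs):
            path = os.path.join(tmpdir, f"lock{i}")  # one file per process
            pid = os.fork()
            if pid == 0:  # child
                status = 0
                try:
                    hammer(path, iterations)
                except BaseException:
                    status = 1
                os._exit(status)  # skip the parent's cleanup handlers
            pids.append(pid)
        failed = 0
        for pid in pids:
            _, status = os.waitpid(pid, 0)
            if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
                failed += 1
        elapsed = time.perf_counter() - start
    if failed:
        raise RuntimeError(f"{failed} child process(es) failed")
    return elapsed


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n} processes: {run(n):.3f}s")
```

Interpreter overhead per call is large, so a C version calling fcntl(F_SETLKW) directly would expose kernel-side contention more clearly.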

> 2) The single linked list for all offsets.  An augmented rbtree would
>    make more sense here, for files with many locks in different places.
> 
> 3) The linked list includes sleepers as well as lock holders.  This
>    means if we're really backed up, we fall off a cliff.

I'm not sure exactly what you mean here.  Waiters are on the fl_block
list of the lock that they're waiting on, and on a global list (for
deadlock detection).  They're not on the i_flock list.

Another possible problem is that waiters are all woken up on unlock or
downgrade, when most of them may end up just blocking again immediately.
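A toy model of this wake-everyone-on-unlock behaviour makes the wasted wakeups concrete. The model assumes every waiter wants an exclusive lock on a half-open byte range, that woken waiters retry in list order, and that shared (read) locks never occur. `overlaps` and `wake_all` are hypothetical helpers, not kernel functions.

```python
def overlaps(a, b):
    """True if half-open byte ranges a = [start, end) and b intersect."""
    return a[0] < b[1] and b[0] < a[1]


def wake_all(waiters):
    """Wake every waiter; each retries its exclusive request in list order."""
    granted, reblocked = [], []
    for req in waiters:
        if any(overlaps(req, g) for g in granted):
            reblocked.append(req)   # woke up only to block again
        else:
            granted.append(req)
    return granted, reblocked
```

With 100 waiters all wanting bytes 0-9, one gets the lock and the other 99 are scheduled only to go straight back to sleep. With disjoint ranges, every wakeup is useful.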

--b.

>    This wants
>    a separate rbtree, but no one wants to add a second pointer to the
>    file struct so we'll need some indirection (perhaps a struct with
>    6 rbtrees, two each for posix, flocks and leases).
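Rusty's point 2 proposes an augmented rbtree for a file's lock ranges. The augmentation is an extra field in each node that records the largest end offset in its subtree, so a conflict check can skip any subtree that finishes before the queried range begins. The sketch below illustrates only that idea. It assumes half-open [start, end) ranges and uses an ordinary unbalanced binary search tree, not a red-black tree, so there is no rebalancing and no deletion. `Node`, `insert` and `find_overlap` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    start: int                      # first byte of the locked range
    end: int                        # one past the last byte (half-open)
    max_end: int                    # augmentation: largest `end` in this subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], start: int, end: int) -> Node:
    """Insert [start, end) keyed by start; keep max_end current along the path."""
    if root is None:
        return Node(start, end, end)
    if start < root.start:
        root.left = insert(root.left, start, end)
    else:
        root.right = insert(root.right, start, end)
    root.max_end = max(root.max_end, end)
    return root


def find_overlap(root: Optional[Node], start: int,
                 end: int) -> Optional[Tuple[int, int]]:
    """Return some stored range that overlaps [start, end), or None."""
    node = root
    while node is not None:
        if node.start < end and start < node.end:
            return (node.start, node.end)
        # If something on the left ends after `start`, then any overlap
        # below this node has a counterpart on the left; otherwise the
        # whole left subtree can be skipped.
        if node.left is not None and node.left.max_end > start:
            node = node.left
        else:
            node = node.right
    return None
```

For example, with [0, 10), [20, 30) and [15, 18) stored, `find_overlap(root, 10, 15)` returns `None`, while `find_overlap(root, 5, 6)` returns `(0, 10)`.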
> 
> Note that the locking code is fairly complex, as it handles POSIX locks,
> flock locks and leases in the same list.  The rewrite will be nasty.  I
> mean, *fun*!
> 
> Cheers,
> Rusty.
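Rusty also notes that process-shared (futex-based) locks hash into a limited number of buckets, 256. Unrelated futexes that land in the same bucket share that bucket's wait queue and its lock. The sketch below gives a back-of-the-envelope estimate of how quickly that happens. It takes the figure of 256 at face value and assumes each futex hashes uniformly and independently, which is the classic birthday-problem model and a simplification. The kernel's actual futex key hashing is not modeled, and `shared_bucket_probability` is a hypothetical helper.

```python
def shared_bucket_probability(n_futexes: int, buckets: int = 256) -> float:
    """P(at least two of n distinct futexes land in the same hash bucket).

    Assumes uniform, independent hashing into `buckets` slots.
    """
    if n_futexes > buckets:
        return 1.0  # pigeonhole: a shared bucket is certain
    p_all_distinct = 1.0
    for i in range(n_futexes):
        p_all_distinct *= (buckets - i) / buckets
    return 1.0 - p_all_distinct
```

Under this model, with only 20 independent process-shared mutexes in use, the chance that at least two of them share a bucket is already about 0.53.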


More information about the samba-technical mailing list