fcntl spinlock in Linux?
J. Bruce Fields
bfields at fieldses.org
Tue Jan 29 14:37:03 MST 2013
On Fri, Jan 25, 2013 at 10:46:08AM +1030, Rusty Russell wrote:
> Volker Lendecke <Volker.Lendecke at SerNet.DE> writes:
> > On Thu, Jan 24, 2013 at 02:05:00PM +0100, Stefan (metze) Metzmacher wrote:
> >> > If that interpretation is right, then I guess with many
> >> > cores in a NUMA configuration fcntl locks are just a VERY,
> >> > VERY bad idea at all if you want to scale.
> >> >
> >> > Please tell me that I am wrong :-)
> >>
> >> Reading through
> >> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=Documentation/DocBook/kernel-locking.tmpl
> >> indicates to me that this could be solved using RCU to protect the
> >> global lock list.
> >
> > Follow-up question: The process shared robust mutexes, do
> > they scale better? Or do we also sit on a central spinlock
> > when we really start beating the futex API?
>
> They have a slightly different scalability problem: there are a limited
> number of hashes (256) we use for sleeping locks. But the fcntl lock
> design has three major bottlenecks:
>
> 1) The single lock across all locked files. This simplifies deadlock
> detection.

Volker said:

	"My assumption was that doing fcntl locks on independent
	files scales."

So I assume this is what he was running into.
It's probably also easiest to fix.

On a quick skim, the only other global thing I see is the /proc/locks
implementation.

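For reference, the semantics behind Volker's assumption can be sketched from user space. This is not from the thread: the file names and the helpers child_can_lock() and demo() are made up for illustration. A second process is refused a lock on a file that is already locked, but gets a lock on an unrelated file without any conflict. The global kernel lock discussed above is invisible at this level; it only shows up as contention under load.

```python
import fcntl
import os
import tempfile


def child_can_lock(path):
    """Fork a child and report whether it can take a non-blocking
    exclusive POSIX lock on `path` (hypothetical helper)."""
    pid = os.fork()
    if pid == 0:
        # Child: POSIX locks are owned per process, so this is a new owner.
        status = 1
        try:
            with open(path, "r+b") as f:
                fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
                status = 0
        except OSError:
            status = 1  # EAGAIN/EACCES: somebody else holds the lock
        finally:
            os._exit(status)  # skip the parent's cleanup handlers
    _, wait_status = os.waitpid(pid, 0)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0


def demo():
    """Lock file a in the parent, then probe a and b from a child."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.tdb")  # illustrative names only
        b = os.path.join(d, "b.tdb")
        for p in (a, b):
            with open(p, "wb") as f:
                f.write(b"\0")
        with open(a, "r+b") as held:
            fcntl.lockf(held, fcntl.LOCK_EX)
            same_file = child_can_lock(a)
            other_file = child_can_lock(b)
    return same_file, other_file


if __name__ == "__main__":
    print(demo())
```

The child exits with os._exit() so that it does not run the parent's temporary-directory cleanup.
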
> 2) The single linked list for all offsets. An augmented rbtree would
> make more sense here, for files with many locks in different places.
>
> 3) The linked list includes sleepers as well as lock holders. This
> means if we're really backed up, we fall off a cliff.

I'm not sure exactly what you mean here. Waiters are on the fl_block
list of the lock that they're waiting on, and on a global list (for
deadlock detection). They're not on the i_flock list.

Another possible problem is that waiters are all woken up on unlock or
downgrade, when most of them may end up just blocking again immediately.

--b.
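
A rough way to see the cost of waking every waiter, as a toy model rather than kernel code: assume N waiters all want an exclusive lock on the same range, and count wakeups until each has held the lock once. The function drain() and its wake_all flag are invented for this sketch. Real lock requests mix readers and writers on different ranges, so waking a single waiter is not always correct; the model only illustrates the wasted wakeups.

```python
from collections import deque


def drain(waiters, wake_all):
    """Count wakeups until every waiter has held the lock once.

    Toy model: all waiters want the same exclusive lock, so each
    unlock lets exactly one of them proceed.
    """
    blocked = deque(range(waiters))
    wakeups = 0
    while blocked:
        if wake_all:
            # Unlock wakes everyone on the (modelled) fl_block list.
            woken = list(blocked)
            blocked.clear()
        else:
            # Hypothetical alternative: wake only the first waiter.
            woken = [blocked.popleft()]
        wakeups += len(woken)
        # The first woken waiter gets the lock; the rest block again.
        blocked.extend(woken[1:])
    return wakeups


if __name__ == "__main__":
    for n in (4, 100):
        print(n, drain(n, wake_all=True), drain(n, wake_all=False))
```

Under these assumptions wake-all costs N(N+1)/2 wakeups in total, against N for waking one waiter at a time.
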
> This wants
> a separate rbtree, but no one wants to add a second pointer to the
> file struct so we'll need some indirection (perhaps a struct with
> 6 rbtrees, two each for posix, flocks and leases).
>
> Note that the locking code is fairly complex, as it handles POSIX locks,
> flock locks and leases in the same list. The rewrite will be nasty. I
> mean, *fun*!
>
> Cheers,
> Rusty.
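
As an illustration of the augmented-tree idea in Rusty's point 2, here is a minimal sketch, not the kernel's data structure. Each node stores a byte range and the largest end offset in its subtree, so a conflict search can skip whole subtrees instead of walking every lock on the file. The names Node, insert() and find_conflict() are hypothetical; the tree is an unbalanced binary search tree (a real rbtree would rebalance), and read versus write lock types are left out.

```python
class Node:
    """One lock covering the inclusive byte range [start, end]."""
    __slots__ = ("start", "end", "owner", "left", "right", "max_end")

    def __init__(self, start, end, owner):
        self.start, self.end, self.owner = start, end, owner
        self.left = self.right = None
        self.max_end = end  # augmentation: largest end in this subtree


def insert(root, start, end, owner):
    """Insert a range keyed on its start offset; return the subtree root."""
    if root is None:
        return Node(start, end, owner)
    if start < root.start:
        root.left = insert(root.left, start, end, owner)
    else:
        root.right = insert(root.right, start, end, owner)
    root.max_end = max(root.max_end, end)
    return root


def find_conflict(root, start, end, owner):
    """Return a lock overlapping [start, end] held by another owner, or None."""
    if root is None or root.max_end < start:
        return None  # nothing in this subtree reaches as far as `start`
    hit = find_conflict(root.left, start, end, owner)
    if hit is not None:
        return hit
    if root.start <= end and start <= root.end and root.owner != owner:
        return root
    if root.start > end:
        return None  # every range to the right starts even later
    return find_conflict(root.right, start, end, owner)


if __name__ == "__main__":
    root = None
    for lock in [(0, 9, "A"), (100, 199, "B"), (50, 59, "A")]:
        root = insert(root, *lock)
    hit = find_conflict(root, 55, 120, "A")
    print(hit.start, hit.end, hit.owner)
```

For a file with many locks at scattered offsets, the max_end check is what avoids the linear scan that a single linked list requires.
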