process shared robust mutexes for tdb

Mon Jan 7 00:13:30 MST 2013

On Mon, Jan 07, 2013 at 09:57:56AM +1030, Rusty Russell wrote:
> I like the performance improvements, which implies we're doing *way* too
> much locking; WTF is going on?  If it's due to multiple processes
> locking the same records, which is the worst case for fcntl locks as
> implemented in Linux today, we need mutexes to improve like this.

We have seen a workload where thousands of clients try to
access the same directory simultaneously. Their share modes
do not conflict, so the access succeeds, but Samba still
has to coordinate them through a single locking.tdb record.
We can easily construct artificial tests that run into that
thundering herd.

To coordinate those share modes, we need to do some form of
communication between the smbds, and our current way to do
that is via tdb, thus we do fcntl locks and contend on that
one record. If you have a better idea how to do that
coordination, please tell me! I have already thought about
not doing a blocking lock and instead doing the fallback via
some daemon process.
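
To make the difference between the blocking and the
non-blocking variant concrete, here is a minimal sketch. It
is not tdb's actual code: the function names and the idea of
locking exactly one byte per hash chain at a caller-supplied
offset are assumptions for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Lock or unlock one byte at 'offset'. 'wait' selects F_SETLKW. */
int byte_lock(int fd, off_t offset, short type, int wait)
{
	struct flock fl = {
		.l_type = type,
		.l_whence = SEEK_SET,
		.l_start = offset,
		.l_len = 1,
	};
	int ret;

	assert(fd >= 0);
	do {
		ret = fcntl(fd, wait ? F_SETLKW : F_SETLK, &fl);
	} while (ret == -1 && errno == EINTR);
	return ret;
}

/* Blocking variant: every contender sleeps in the kernel. */
int chain_lock_wait(int fd, off_t chain_offset)
{
	return byte_lock(fd, chain_offset, F_WRLCK, 1);
}

/*
 * Non-blocking variant: returns 0 if locked, 1 if another
 * process holds the lock, -1 on other errors. On 1 the caller
 * would go to some fallback path, which is not shown here.
 */
int chain_trylock(int fd, off_t chain_offset)
{
	if (byte_lock(fd, chain_offset, F_WRLCK, 0) == 0) {
		return 0;
	}
	if (errno == EAGAIN || errno == EACCES) {
		return 1;
	}
	return -1;
}

int chain_unlock(int fd, off_t chain_offset)
{
	return byte_lock(fd, chain_offset, F_UNLCK, 0);
}
```

Note that fcntl locks are per process, so the contended case
only shows up between different processes, never inside one.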

> Implementing the global lock is pretty difficult, especially since Linux
> has an arbitrary limit (2048) on how many locks it will recover.   We
> might group chains such that there are only 2048 locks, then lock all of
> them?  It wouldn't be all that slow in practice, I think, but it risks
> increasing contention (there's still a benefit to having more hash
> buckets, due to shorter chains, but it's not as clear).

The patches I proposed (which are not finished yet) only
cover the hash chains. They don't handle the per-record or
the allrecord lock. This means we can't do transactions on
those tdb files, which is ok for a locking.tdb-like workload. We
also still do some fcntl locks for traverses, but as they
are relatively rare on locking.tdb and also not heavily
contended usually, I would be okay with those as well. Also,
I would expect some increased load when trying to lock 2048
mutexes simultaneously. Next, the number 2048 is pretty
arbitrary and not available as a /proc file, right? So if
the kernel decides to reduce this to something lower, we are
dead and we do not know it.
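
For reference, a rough sketch of what one process-shared
robust mutex per hash chain can look like with plain POSIX
calls. This is not the proposed patch: the struct, the
function names and the anonymous shared mapping are
assumptions for illustration. In tdb the mutexes would have
to live in memory that all processes opening the file can
map.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for MAP_ANONYMOUS with strict -std= flags */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical layout: one mutex per hash chain in shared memory. */
struct chain_mutexes {
	size_t num_chains;
	pthread_mutex_t chain[];
};

/* Map a shared region and init one robust pshared mutex per chain. */
struct chain_mutexes *chain_mutexes_create(size_t num_chains)
{
	size_t len = sizeof(struct chain_mutexes) +
		     num_chains * sizeof(pthread_mutex_t);
	struct chain_mutexes *m;
	pthread_mutexattr_t attr;
	size_t i;

	m = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (m == MAP_FAILED) {
		return NULL;
	}
	if (pthread_mutexattr_init(&attr) != 0) {
		munmap(m, len);
		return NULL;
	}
	if (pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) != 0 ||
	    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST) != 0) {
		goto fail;
	}
	for (i = 0; i < num_chains; i++) {
		if (pthread_mutex_init(&m->chain[i], &attr) != 0) {
			goto fail;
		}
	}
	pthread_mutexattr_destroy(&attr);
	m->num_chains = num_chains;
	return m;
fail:
	pthread_mutexattr_destroy(&attr);
	munmap(m, len);
	return NULL;
}

int chain_lock(struct chain_mutexes *m, size_t chain)
{
	int ret;

	assert(chain < m->num_chains);
	ret = pthread_mutex_lock(&m->chain[chain]);
	if (ret == EOWNERDEAD) {
		/*
		 * The previous owner died holding the lock. We own it
		 * now; real code would have to check or repair the
		 * chain before marking the mutex consistent.
		 */
		ret = pthread_mutex_consistent(&m->chain[chain]);
	}
	return ret;
}

int chain_unlock(struct chain_mutexes *m, size_t chain)
{
	assert(chain < m->num_chains);
	return pthread_mutex_unlock(&m->chain[chain]);
}
```

The EOWNERDEAD path is what replaces the automatic cleanup
that fcntl locks give us when a process dies.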

With best regards,

Volker Lendecke

-- 
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:kontakt at sernet.de