process shared robust mutexes for tdb

Tue Dec 25 10:26:09 MST 2012

On Tue, 2012-12-25 at 11:37 +0100, Volker Lendecke wrote:
> On Mon, Dec 24, 2012 at 02:58:01PM -0500, simo wrote:
> > On Mon, 2012-12-24 at 09:43 -0500, Ira Cooper wrote:
> > > Mutexes over mmap is actually fairly portable.  It is the robust
> > > mutexes over mmap that become less so.  But even then, I know 2
> > > platforms that support them.  The Solaris/illumos family, and Linux.
> > > That's not TOO bad.
> > 
> > Well I do not really care for any other platform than Linux, but we
> > historically cared for having samba run on at least Linux, *BSD and
> > Solaris and some other Unix, although all the others seem to be dead or
> > moribund now.
> 
> For all other platforms we just would not do this
> optimization. The fcntl code is not removed.
> 
> > I believe those numbers are quite real indeed, context switches are a
> > huge slow down with a pattern like the one used in our TDB files.
> 
> It's not only the context switches. It is the one single
> linear list which needs to be mutexed in the kernel, and it
> is the thundering herd that kicks in when a lock is
> released.
> 
> 
> > Shouldn't it just be a matter of replacing every fcntl lock with an
> > equivalent mutex ? What's harder about chain locks ?
> 
> It is the lock range. fcntl is a completely separate
> offset/len space to the data itself. The current patch
> allocates memory after the hash chains for the mutexes. You
> can't do that for every record, on 64-bit linux mutexes are
> 24 bytes in size. You could add that to every record. We
> might need to play with that when the basic code is in.
> Regarding transactions it is more difficult: We need the
> allrecord lock, which is a fcntl lock covering the whole
> file. You can't do that with mutexes, you have to lock all
> individual ones. There a reasonable linux kernel limitation
> kicks in: The robustness code will only clean up 2048
> mutexes when a process dies. That's why I decided to skip
> transactional persistent tdbs for this.

Well, you can simply have one 'global' mutex that represents the whole
file. However, that means you have to take it first, then wait until all
current users have released all other mutexes, before you start doing
any operation. In turn, all other users will always have to check the
global mutex before every operation. That means the global mutex would
see a lot of contention.
We could ease contention a bit by using memory barriers instead for this
kind of global lock. It is unclear, though, that they would necessarily
be faster: they still require synchronization, which can at times cause
the same overhead as a mutex.
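
To make the global mutex idea concrete, here is a rough sketch in C. It
is not taken from the patch: `struct lock_area`, `NUM_CHAINS` and all the
function names are made up for illustration, and the mutexes live in an
anonymous shared mapping rather than in the tdb file. It also assumes a
process holds at most one chain mutex at a time; nested chain locks would
need more care to avoid deadlocks.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

#define NUM_CHAINS 128          /* hypothetical hash size */

/* Hypothetical layout: one global mutex plus one mutex per hash chain. */
struct lock_area {
	pthread_mutex_t global;
	pthread_mutex_t chain[NUM_CHAINS];
};

/* Initialise a process-shared, robust mutex. */
static int robust_init(pthread_mutex_t *m)
{
	pthread_mutexattr_t a;
	int ret = pthread_mutexattr_init(&a);
	if (ret != 0) return ret;
	ret = pthread_mutexattr_setpshared(&a, PTHREAD_PROCESS_SHARED);
	if (ret == 0) ret = pthread_mutexattr_setrobust(&a, PTHREAD_MUTEX_ROBUST);
	if (ret == 0) ret = pthread_mutex_init(m, &a);
	pthread_mutexattr_destroy(&a);
	return ret;
}

/* Lock, and mark the mutex consistent again if the last owner died.
 * A real tdb would have to repair the protected data first. */
static int robust_lock(pthread_mutex_t *m)
{
	int ret = pthread_mutex_lock(m);
	if (ret == EOWNERDEAD) ret = pthread_mutex_consistent(m);
	return ret;
}

/* Map the lock area MAP_SHARED so that forked children share it. */
struct lock_area *lock_area_create(void)
{
	struct lock_area *la = mmap(NULL, sizeof(*la), PROT_READ | PROT_WRITE,
				    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (la == MAP_FAILED) return NULL;
	int ret = robust_init(&la->global);
	for (size_t i = 0; ret == 0 && i < NUM_CHAINS; i++)
		ret = robust_init(&la->chain[i]);
	if (ret != 0) {
		munmap(la, sizeof(*la));
		return NULL;
	}
	return la;
}

/* Normal users pass through the global mutex before taking a chain. */
int chain_lock(struct lock_area *la, size_t i)
{
	assert(i < NUM_CHAINS);
	int ret = robust_lock(&la->global);
	if (ret != 0) return ret;
	ret = robust_lock(&la->chain[i]);
	pthread_mutex_unlock(&la->global);
	return ret;
}

int chain_unlock(struct lock_area *la, size_t i)
{
	assert(i < NUM_CHAINS);
	return pthread_mutex_unlock(&la->chain[i]);
}

/* Take the global mutex, then wait for every current chain holder by
 * locking and immediately unlocking each chain mutex in turn. */
int allrecord_lock(struct lock_area *la)
{
	int ret = robust_lock(&la->global);
	if (ret != 0) return ret;
	for (size_t i = 0; i < NUM_CHAINS; i++) {
		ret = robust_lock(&la->chain[i]);
		if (ret != 0) {
			pthread_mutex_unlock(&la->global);
			return ret;
		}
		pthread_mutex_unlock(&la->chain[i]);
	}
	return 0;               /* global stays held until allrecord_unlock() */
}

int allrecord_unlock(struct lock_area *la)
{
	return pthread_mutex_unlock(&la->global);
}
```

In this sketch allrecord_lock() never holds more than two mutexes at
once, so it should stay well below the 2048 cleanup limit Volker
mentions. The price is exactly the contention described above: every
chain_lock() has to go through the global mutex first.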

Anyway, time to celebrate, talk to you in the next few days :)

Simo.

-- 
Simo Sorce
Samba Team GPL Compliance Officer <simo at samba.org>
Principal Software Engineer at Red Hat, Inc. <simo at redhat.com>