process shared robust mutexes for tdb

Ira Cooper ira at
Mon Dec 24 13:39:17 MST 2012

On Mon, Dec 24, 2012 at 2:58 PM, simo <idra at> wrote:

> On Mon, 2012-12-24 at 09:43 -0500, Ira Cooper wrote:
> > Mutexes over mmap are actually fairly portable.  It is the robust
> > mutexes over mmap that become less so.  But even then, I know of two
> > platforms that support them: the Solaris/illumos family and Linux.
> > That's not TOO bad.
> Well, I do not really care about any platform other than Linux, but we
> historically cared about having samba run on at least Linux, *BSD,
> Solaris and some other Unixes, although all the others seem to be dead or
> moribund now.

We care about them all, but if an optimization works on platforms X, Y and
Z... there's no reason not to use it.  I care about 2 platforms... Linux and
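For readers who have not used them, a robust mutex over mmap looks roughly like this with the POSIX.1-2008 calls (pthread_mutexattr_setpshared, pthread_mutexattr_setrobust, pthread_mutex_consistent). This is a minimal sketch assuming Linux/glibc; struct shared_hdr, hdr_create, hdr_lock and demo_owner_death are hypothetical names, and an anonymous MAP_SHARED mapping stands in for the tdb file. The "robust" part is the EOWNERDEAD path: if a process dies while holding the lock, the next locker is told so instead of blocking forever.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a tdb header: one mutex in shared memory. */
struct shared_hdr {
    pthread_mutex_t lock;
};

/* Map a shared region (a real tdb would mmap its file) and initialise a
 * process-shared, robust mutex inside it. Returns NULL on error. */
static struct shared_hdr *hdr_create(void)
{
    struct shared_hdr *h = mmap(NULL, sizeof(*h), PROT_READ | PROT_WRITE,
                                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (h == MAP_FAILED)
        return NULL;

    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    /* Usable by every process that maps the region. */
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    /* If the owner dies, the next locker gets EOWNERDEAD instead of hanging. */
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    int rc = pthread_mutex_init(&h->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0) {
        munmap(h, sizeof(*h));
        return NULL;
    }
    return h;
}

/* Lock, repairing the mutex if the previous owner died holding it.
 * Returns 1 if recovery was needed, 0 on a normal lock, -1 on error. */
static int hdr_lock(struct shared_hdr *h)
{
    int rc = pthread_mutex_lock(&h->lock);
    if (rc == EOWNERDEAD) {
        /* The protected data may be half-updated; a real caller would
         * check or repair it here before marking the mutex usable. */
        pthread_mutex_consistent(&h->lock);
        return 1;
    }
    return rc == 0 ? 0 : -1;
}

/* Demo: a child takes the lock and exits without unlocking it. */
static int demo_owner_death(void)
{
    struct shared_hdr *h = hdr_create();
    if (h == NULL)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        pthread_mutex_destroy(&h->lock);
        munmap(h, sizeof(*h));
        return -1;
    }
    if (pid == 0) {
        pthread_mutex_lock(&h->lock);
        _exit(0);                       /* dies while holding the lock */
    }
    waitpid(pid, NULL, 0);

    int recovered = hdr_lock(h);        /* parent sees EOWNERDEAD */
    if (recovered >= 0)
        pthread_mutex_unlock(&h->lock);
    pthread_mutex_destroy(&h->lock);
    munmap(h, sizeof(*h));
    return recovered;
}
```

On Linux, demo_owner_death() should return 1, since the child exits while still owning the mutex and the kernel marks it as owner-dead for the parent.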

> > The numbers he's showing are not unrealistic at all.  I've run
> > similar benchmarks.  The ability to decrease the cost of the
> > locking primitives, and potentially avoid the context switch, is pretty
> > darn big.
> I believe those numbers are quite real indeed; context switches are a
> huge slowdown with a pattern like the one used in our TDB files.

Not to mention that when you take "1" lock, the vnode lock on the tdb, and
turn it into N locks, your parallelism skyrockets.  (And for people like us,
who like to throw massive machines at file-serving, it matters!)
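Turning one file-wide lock into N locks can be sketched as a hash table with one mutex per chain: two keys on different chains never wait for each other. This is an in-process sketch under stated assumptions: NUM_CHAINS, key_hash (32-bit FNV-1a), chain_for and demo_independent_chains are hypothetical, and in a shared tdb the mutexes would additionally have to be process-shared and live inside the mapped file.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_CHAINS 131  /* hypothetical number of hash chains */

/* One mutex per hash chain instead of one lock over the whole file. */
struct chain_locks {
    pthread_mutex_t chain[NUM_CHAINS];
};

static void chain_locks_init(struct chain_locks *cl)
{
    for (int i = 0; i < NUM_CHAINS; i++)
        pthread_mutex_init(&cl->chain[i], NULL);
}

static void chain_locks_destroy(struct chain_locks *cl)
{
    for (int i = 0; i < NUM_CHAINS; i++)
        pthread_mutex_destroy(&cl->chain[i]);
}

/* Hypothetical key hash (32-bit FNV-1a), not tdb's own hash function. */
static uint32_t key_hash(const char *key)
{
    uint32_t h = 2166136261u;
    for (; *key; key++) {
        h ^= (unsigned char)*key;
        h *= 16777619u;
    }
    return h;
}

/* The mutex guarding the chain that a key hashes to. */
static pthread_mutex_t *chain_for(struct chain_locks *cl, const char *key)
{
    return &cl->chain[key_hash(key) % NUM_CHAINS];
}

/* Demo: while one chain is held, a key on another chain is still lockable. */
static int demo_independent_chains(void)
{
    struct chain_locks cl;
    chain_locks_init(&cl);

    pthread_mutex_t *a = chain_for(&cl, "user:1");
    pthread_mutex_lock(a);

    /* Same chain: trylock reports EBUSY instead of blocking. */
    int same = pthread_mutex_trylock(a);

    /* Find another key that hashes to a different chain. */
    char key[32];
    pthread_mutex_t *b = a;
    for (int i = 2; b == a; i++) {
        snprintf(key, sizeof key, "user:%d", i);
        b = chain_for(&cl, key);
    }
    int other = pthread_mutex_trylock(b);   /* different chain: not blocked */
    if (other == 0)
        pthread_mutex_unlock(b);

    pthread_mutex_unlock(a);
    chain_locks_destroy(&cl);
    return same == EBUSY && other == 0;
}
```

With a single file-wide lock, the second lookup would have had to wait; with per-chain mutexes only lookups that collide on the same chain contend.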

> > If you are wondering "Why not transactions?", look at my tdb.git
> > and the locking branch.  The work, an attempt to make transactions
> > work, is incomplete and, I'll admit, a mess.  Volker wisely chose
> > against this approach.  (Or he had the benefit of reading my
> > disaster.  One of the two!)
> Transactions require fsyncs, which are inherently orders of magnitude
> slower because they need to hit the platters. It's true that with SSDs and
> other flash tech, disks are not that slow, but I do not think TDB is
> optimized for SSD access patterns, so there will be other slowdown factors.

For some of the data we put in tdbs, ramdisks even work fine ;).

> > My only comment is: Volker, how are you handling "record" locking and
> > not chain locking?  That was one part I never got 100% right.  (And on
> > Solaris using a mutex for everything is going to make someone cry...
> > probably me.)
> Shouldn't it just be a matter of replacing every fcntl lock with an
> equivalent mutex? What's harder about chain locks?

The chainlocks are easier! :)  The problem is that today you can lock an
entire database with one fcntl call, whereas with mutexes you need N
calls, one per mutex.  So there is actually a crossover point where
mutexes can slow you down.  Having to take 10,000 mutexes vs. 1 fcntl
call... well, the fcntl call can win ;).  Locking every record in a
1M-record DB... you see the issue.  The whole concept of a
"semi-transactional" database like TDB is a bit wacky.  Mind you, I see why
each part ended up the way it did... but still.
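The crossover is about call counts: fcntl can cover a whole file with one byte-range lock (l_start = 0 with l_len = 0 extends the lock to the largest possible offset), while per-chain mutexes offer no single "lock everything" call. The sketch below only counts calls and does not time them; the function names and the temporary file are hypothetical, and the 10,000 figure is taken from the mail above rather than from any measurement.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Whole-database write lock with fcntl: one call, whatever the file size. */
static int lock_all_fcntl(int fd)
{
    struct flock fl = {
        .l_type = F_WRLCK,
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 0,   /* 0 = up to the largest possible file offset */
    };
    return fcntl(fd, F_SETLKW, &fl);
}

static int unlock_all_fcntl(int fd)
{
    struct flock fl = {
        .l_type = F_UNLCK,
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 0,
    };
    return fcntl(fd, F_SETLK, &fl);
}

/* With one mutex per chain there is no "lock everything" primitive: each
 * mutex is taken individually, always in the same order to avoid deadlock.
 * Returns the number of lock calls made. */
static size_t lock_all_mutexes(pthread_mutex_t *m, size_t n)
{
    for (size_t i = 0; i < n; i++)
        pthread_mutex_lock(&m[i]);
    return n;
}

static void unlock_all_mutexes(pthread_mutex_t *m, size_t n)
{
    for (size_t i = n; i-- > 0;)
        pthread_mutex_unlock(&m[i]);
}

/* Demo: 1 fcntl call versus n mutex calls for the same "lock it all". */
static int demo_crossover(void)
{
    char path[] = "/tmp/lockdemoXXXXXX";
    int fd = mkstemp(path);              /* opened O_RDWR, so F_WRLCK works */
    if (fd < 0)
        return 0;
    unlink(path);

    int ok = lock_all_fcntl(fd) == 0;    /* one call covers the whole file */
    unlock_all_fcntl(fd);
    close(fd);

    size_t n = 10000;
    pthread_mutex_t *m = malloc(n * sizeof(*m));
    if (m == NULL)
        return 0;
    for (size_t i = 0; i < n; i++)
        pthread_mutex_init(&m[i], NULL);

    size_t calls = lock_all_mutexes(m, n);   /* n separate lock calls */
    ok = ok && calls == n;
    unlock_all_mutexes(m, n);

    for (size_t i = 0; i < n; i++)
        pthread_mutex_destroy(&m[i]);
    free(m);
    return ok;
}
```

Uncontended mutex locks are cheap individually, but the mutex path grows linearly with the number of chains or records, which is where a single fcntl call can come out ahead.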

> > simo: Does LDB use transactions?  I thought it did?
> It does, by default, on every write operation.

So until someone does the work to support transactions and make them fast,
LDB won't see any improvement.  I strongly encourage Volker's initial

Get something in there and working... then improve it.



More information about the samba-technical mailing list