process shared robust mutexes for tdb

simo idra at
Mon Dec 24 16:33:10 MST 2012

On Mon, 2012-12-24 at 15:39 -0500, Ira Cooper wrote:
> On Mon, Dec 24, 2012 at 2:58 PM, simo <idra at> wrote:
> > On Mon, 2012-12-24 at 09:43 -0500, Ira Cooper wrote:
> > > Mutexes over mmap is actually fairly portable.  It is the robust
> > > mutexes over mmap that become less so.  But even then, I know 2
> > > platforms that support them.  The Solaris/illumos family, and Linux.
> > > That's not TOO bad.
> >
> > Well I do not really care for any other platform than Linux, but we
> > historically cared for having samba run on at least Linux, *BSD and
> > Solaris and some other Unix, although all the others seem to be dead or
> > moribund now.
> We care about them all, but if an optimization worked on platforms X,Y and
> Z... no reason not to use it.  I care about 2 platforms... Linux and
> illumos.
> > > The numbers he's showing are not that unrealistic at all.  I've run
> > > similar benchmarks.  The ability to decrease the cost of the
> > > locking primitives, and potentially avoid the context switch is pretty
> > > darn big.
> >
> > I believe those numbers are quite real indeed, context switches are a
> > huge slow down with a pattern like the one used in our TDB files.
> Never mind that you take "1" lock, the vnode lock on the tdb and turn it
> into N locks, your parallelism skyrockets.  (And for people like us, who
> like to throw massive machines at file-serving, it matters!)
> > > If you are wondering about "Why not transactions" look at my tdb.git
> > > and the locking branch.  The work is incomplete, and I'll admit a
> > > mess, in an attempt to make transactions work.  Volker wisely chose
> > > against this approach.  (Or he had the benefit of reading my
> > > disaster.  One of the two!)
> >
> > Transactions require fsyncs, which are inherently orders of magnitude
> > slower as they need to hit platters. It's true that with SSDs and other
> > Flash tech, disks are not too slow, but I do not think TDB is optimized
> > for SSD access patterns, so there will be other slowdown factors there.
> For some of the data we put in tdbs, ramdisks even work fine ;).
> > > My only comment is: Volker, how are you handling "record" locking and
> > > not chain locking?  That was one part I never got 100% right.  (And on
> > > solaris using a mutex for everything is going to make someone cry...
> > > probably me.)
> >
> > Shouldn't it just be a matter of replacing every fcntl lock with an
> > equivalent mutex ? What's harder about chain locks ?
> The chainlocks are easier! :).  The problem is, you can lock all of a
> database in one fcntl call now.  Where with mutexes, you need to use N
> calls to cover each mutex.  So there is actually a crossover point where
> mutexes can slow you down.  Having to take 10,000 mutexes vs. 1 fcntl
> call... well, the fcntl call can win ;).  Locking every record in a 1M
> record DB... You see the issue.  The whole concept of a
> "semi-transactional" database like TDB is a bit wacky.  Mind you I see why
> each part ended up the way it did... but still.

Well, you can clearly take shortcuts if you need to lock the whole DB,
by using a special mutex I guess.
You could actually use a completely separate mutex space to represent
locking data if you really needed to.
Of course locking each record separately allows multiple
readers/writers at the same time, which is what you want on something like
locking.tdb. You do not want to have to always take that one mutex for
every operation or you just serialize everything.
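
To make the trade-off concrete, here is a minimal sketch of per-chain
locking with process-shared robust mutexes living in an mmap'd region.
Everything named here (struct lock_area, NCHAINS, chain_lock, lock_all)
is hypothetical and made up for illustration; it is not Volker's patch
and not the tdb on-disk layout. It only shows that one chain costs one
call, while covering the whole DB without a shortcut costs N calls.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

#define NCHAINS 16 /* hypothetical number of hash chains */

/* Hypothetical lock area; real tdb lays out its data differently. */
struct lock_area {
    pthread_mutex_t chain[NCHAINS];
};

/* Initialise one mutex as process-shared and robust; returns 0 or an errno. */
static int init_shared_robust(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int ret = pthread_mutexattr_init(&attr);
    if (ret != 0)
        return ret;
    ret = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (ret == 0)
        ret = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    if (ret == 0)
        ret = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return ret;
}

/* Map a shared anonymous region and set up one mutex per chain. */
static struct lock_area *lock_area_create(void)
{
    struct lock_area *la = mmap(NULL, sizeof(*la), PROT_READ | PROT_WRITE,
                                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (la == MAP_FAILED)
        return NULL;
    for (size_t i = 0; i < NCHAINS; i++) {
        if (init_shared_robust(&la->chain[i]) != 0) {
            munmap(la, sizeof(*la));
            return NULL;
        }
    }
    return la;
}

/* Lock a mutex; if the previous owner died, mark it consistent again. */
static int robust_lock(pthread_mutex_t *m)
{
    int ret = pthread_mutex_lock(m);
    if (ret == EOWNERDEAD) {
        /* A real database would validate or repair its data here. */
        ret = pthread_mutex_consistent(m);
    }
    return ret;
}

/* One record operation: a single call on a single chain. */
static int chain_lock(struct lock_area *la, size_t chain)
{
    assert(chain < NCHAINS);
    return robust_lock(&la->chain[chain]);
}

static int chain_unlock(struct lock_area *la, size_t chain)
{
    assert(chain < NCHAINS);
    return pthread_mutex_unlock(&la->chain[chain]);
}

/* Whole-DB lock without any shortcut: N calls, taken in ascending order. */
static int lock_all(struct lock_area *la)
{
    for (size_t i = 0; i < NCHAINS; i++) {
        int ret = robust_lock(&la->chain[i]);
        if (ret != 0) {
            while (i-- > 0) /* roll back the chains already taken */
                pthread_mutex_unlock(&la->chain[i]);
            return ret;
        }
    }
    return 0;
}

static int unlock_all(struct lock_area *la)
{
    int first_err = 0;
    for (size_t i = 0; i < NCHAINS; i++) {
        int ret = pthread_mutex_unlock(&la->chain[i]);
        if (ret != 0 && first_err == 0)
            first_err = ret;
    }
    return first_err;
}
```

A caller would do chain_lock(la, h) / chain_unlock(la, h) around a
single record operation. A "special mutex" shortcut for the whole-DB
case would replace the loop in lock_all, but it needs a protocol between
the special mutex and the chain mutexes that this sketch does not
attempt.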

> > simo: Does LDB use transactions?  I thought it did?
> >
> > It does by default on every write operation.
> So until someone does the work to support transactions, and make them fast,
> it wouldn't see any improvements.  I strongly encourage Volker's initial
> approach!

Well the idea with transactions is that you can have only one of them
running at any given time. It turns your code into serializing writes
which means you need a single mutex to represent a global write lock.

Basically the point is that it is an either-or thing. So there isn't
really much to do to 'make transactions faster' as transactions (from the
locking point of view) are a write lock on the whole db.
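
As a rough sketch of that single global write lock, assuming it is one
robust process-shared mutex in shared memory (struct txn_area,
txn_start, txn_commit and crash_in_transaction are hypothetical names,
not tdb API): a writer that dies mid-transaction leaves the lock
recoverable, and the next writer learns about it through EOWNERDEAD.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical shared area holding the one global write lock. */
struct txn_area {
    pthread_mutex_t write_lock;
};

/* Map shared memory and initialise the lock as process-shared and robust. */
static struct txn_area *txn_area_create(void)
{
    pthread_mutexattr_t attr;
    struct txn_area *ta = mmap(NULL, sizeof(*ta), PROT_READ | PROT_WRITE,
                               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (ta == MAP_FAILED)
        return NULL;
    if (pthread_mutexattr_init(&attr) != 0) {
        munmap(ta, sizeof(*ta));
        return NULL;
    }
    int ret = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (ret == 0)
        ret = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    if (ret == 0)
        ret = pthread_mutex_init(&ta->write_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    if (ret != 0) {
        munmap(ta, sizeof(*ta));
        return NULL;
    }
    return ta;
}

/*
 * Start a transaction: only one can run at a time, so this is just the
 * global write lock. *owner_died reports whether the previous holder
 * exited without committing.
 */
static int txn_start(struct txn_area *ta, bool *owner_died)
{
    assert(ta != NULL && owner_died != NULL);
    *owner_died = false;
    int ret = pthread_mutex_lock(&ta->write_lock);
    if (ret == EOWNERDEAD) {
        *owner_died = true;
        /* A real database would roll back the dead writer's changes here. */
        ret = pthread_mutex_consistent(&ta->write_lock);
    }
    return ret;
}

/* Commit: release the global write lock. */
static int txn_commit(struct txn_area *ta)
{
    assert(ta != NULL);
    return pthread_mutex_unlock(&ta->write_lock);
}

/* Demo helper: fork a writer that starts a transaction and exits uncommitted. */
static int crash_in_transaction(struct txn_area *ta)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: take the lock and die while still holding it. */
        _exit(pthread_mutex_lock(&ta->write_lock) == 0 ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Since every writer funnels through the one mutex, the only thing left
to tune is what happens inside the critical section, which is the
either-or point above.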

> Get something in there, and working... then improve it.



Simo Sorce
Samba Team GPL Compliance Officer <simo at>
Principal Software Engineer at Red Hat, Inc. <simo at>
