process shared robust mutexes for tdb
ira at samba.org
Mon Dec 24 13:39:17 MST 2012
On Mon, Dec 24, 2012 at 2:58 PM, simo <idra at samba.org> wrote:
> On Mon, 2012-12-24 at 09:43 -0500, Ira Cooper wrote:
> > Mutexes over mmap is actually fairly portable. It is the robust
> > mutexes over mmap that become less so. But even then, I know 2
> > platforms that support them. The Solaris/illumos family, and Linux.
> > That's not TOO bad.
> Well I do not really care for any other platform than Linux, but we
> historically cared for having samba run on at least Linux, *BSD and
> Solaris and some other Unix, although all the others seem to be dead or
> moribund now.
We care about them all, but if an optimization worked on platforms X, Y and
Z... no reason not to use it. I care about 2 platforms... Linux and
> > The numbers he's showing are not that unrealistic at all. I've run
> > similar benchmarks. The ability to decrease the cost of the
> > locking primitives, and potentially avoid the context switch is pretty
> > darn big.
> I believe those numbers are quite real indeed, context switches are a
> huge slow down with a pattern like the one used in our TDB files.
Never mind that you take "1" lock, the vnode lock on the tdb, and turn it
into N locks, so your parallelism skyrockets. (And for people like us, who
like to throw massive machines at file-serving, it matters!)
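
To make "robust mutexes over mmap" concrete, here is a minimal sketch of the idea, not Volker's actual patch. The helper names chain_mutexes_create() and chain_lock() are made up for illustration, and it assumes an anonymous shared mapping inherited across fork() rather than the tdb file mapping itself.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for MAP_ANONYMOUS and the robust mutex calls on glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map n mutexes into shared memory and initialise
 * each one as process-shared and robust. Returns NULL on failure. */
static pthread_mutex_t *chain_mutexes_create(size_t n)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t *m;
    size_t i;

    assert(n > 0);

    /* Assumption: an anonymous shared mapping, visible to forked children. */
    m = mmap(NULL, n * sizeof(*m), PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED) {
        return NULL;
    }

    if (pthread_mutexattr_init(&attr) != 0) {
        goto fail;
    }
    if (pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) != 0 ||
        pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST) != 0) {
        pthread_mutexattr_destroy(&attr);
        goto fail;
    }
    for (i = 0; i < n; i++) {
        if (pthread_mutex_init(&m[i], &attr) != 0) {
            pthread_mutexattr_destroy(&attr);
            goto fail;
        }
    }
    pthread_mutexattr_destroy(&attr);
    return m;

fail:
    munmap(m, n * sizeof(*m));
    return NULL;
}

/* Hypothetical helper: lock one chain. If the previous owner died while
 * holding the mutex, mark it consistent again so it stays usable. */
static int chain_lock(pthread_mutex_t *m)
{
    int ret = pthread_mutex_lock(m);

    if (ret == EOWNERDEAD) {
        /* A real database would repair the protected data here first. */
        ret = pthread_mutex_consistent(m);
    }
    return ret;
}
```

The EOWNERDEAD path is the "robust" part: without it, a process that dies while holding a chain mutex would leave every other process blocked on that chain. With one mutex per hash chain, two processes touching different chains never contend at all.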
> > If you are wondering about "Why not transactions" look at my tdb.git
> > and the locking branch. The work is incomplete, and I'll admit a
> > mess, in an attempt to make transactions work. Volker wisely chose
> > against this approach. (Or he had the benefit of reading my
> > disaster. One of the two!)
> Transactions require fsyncs, which are inherently orders of magnitude
> slower as they need to hit platters. It's true that with SSDs and other
> Flash tech, disks are not too slow, but I do not think TDB is optimized
> for SSD access patterns, so there will be other slowdown factors there.
For some of the data we put in tdbs, ramdisks even work fine ;).
> > My only comment is: Volker, how are you handling "record" locking and
> > not chain locking? That was one part I never got 100% right. (And on
> > solaris using a mutex for everything is going to make someone cry...
> > probably me.)
> Shouldn't it just be a matter of replacing every fcntl lock with an
> equivalent mutex ? What's harder about chain locks ?
The chainlocks are easier! :). The problem is, you can lock all of a
database in one fcntl call now, whereas with mutexes you need N calls,
one per mutex. So there is actually a crossover point where
mutexes can slow you down. Having to take 10,000 mutexes vs. 1 fcntl
call... well, the fcntl call can win ;). Locking every record in a 1M
record DB... You see the issue. The whole concept of a
"semi-transactional" database like TDB is a bit wacky. Mind you I see why
each part ended up the way it did... but still.
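
A rough sketch of the two "lock everything" paths being compared, under stated assumptions: the function names are hypothetical, the mutexes are plain rather than robust to keep it short, and nothing here is taken from the tdb source.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Whole-file write lock in a single call: l_len == 0 means
 * "from l_start to the end of the file, however large it grows". */
static int lock_all_fcntl(int fd)
{
    struct flock fl = {0};

    assert(fd >= 0);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;
    return fcntl(fd, F_SETLKW, &fl);
}

/* Release the whole-file lock, again in a single call. */
static int unlock_all_fcntl(int fd)
{
    struct flock fl = {0};

    assert(fd >= 0);
    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;
    return fcntl(fd, F_SETLK, &fl);
}

/* The mutex equivalent needs one call per mutex. Taking them in
 * ascending order avoids deadlock between two "lock all" callers;
 * on failure, everything taken so far is released again. */
static int lock_all_mutexes(pthread_mutex_t *m, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        int ret = pthread_mutex_lock(&m[i]);
        if (ret != 0) {
            while (i > 0) {
                pthread_mutex_unlock(&m[--i]);
            }
            return ret;
        }
    }
    return 0;
}

/* Release all n mutexes in reverse order. */
static void unlock_all_mutexes(pthread_mutex_t *m, size_t n)
{
    while (n > 0) {
        pthread_mutex_unlock(&m[--n]);
    }
}
```

The fcntl path costs one system call no matter how big the database is, while the mutex path is cheap per lock but linear in the number of mutexes. That is where the crossover comes from.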
> > simo: Does LDB use transactions? I thought it did?
> It does by default on every write operation.
So until someone does the work to support transactions, and make them fast,
it wouldn't see any improvements. I strongly encourage Volker's initial
Get something in there, and working... then improve it.