[PATCH] TDB traverse lock changes for massive AD DC perf improvement
Stefan Metzmacher
metze at samba.org
Wed Apr 5 13:40:38 UTC 2017
Hi Andrew,
>>> Please review. If reviewed, I'll push with a patch that adds new
>>> performance tests that I'm keen to get in.
>>
>> I'm wondering about all the readonly checks in
>> _tdb_transaction_prepare_commit(),
>> we already handle that in _tdb_transaction_start().
>>
>> I'm a bit nervous about the solaris10 problem.
>
> I am too. I only got game to formally propose it when Jeremy
> essentially proclaimed it dead :-)
I discussed this with Volker and we think we have an understanding
of what the Solaris problem might be.
The pattern with the traverse_read and prepare_commit interaction is
the following:
1. transaction_start got the allrecord lock with F_RDLCK.
2. the traverse_read code walks the database in a sequence like this
(per chain):
2.1 chainlock(chainX, F_RDLCK)
2.2 recordlock(chainX.record1, F_RDLCK)
2.3 chainunlock(chainX, F_RDLCK)
2.4 callback(chainX.record1)
2.5 chainlock(chainX, F_RDLCK)
2.6 recordunlock(chainX.record1, F_RDLCK)
2.7 recordlock(chainX.record2, F_RDLCK)
2.8 chainunlock(chainX, F_RDLCK)
2.9 callback(chainX.record2)
2.10 chainlock(chainX, F_RDLCK)
2.11 recordunlock(chainX.record2, F_RDLCK)
2.12 chainunlock(chainX, F_RDLCK)
2.13 goto next chain
So it always has one record locked in F_RDLCK mode and tries to
get the next one before it releases the current one.
3. prepare_commit tries to upgrade the allrecord lock to F_WRLCK.
If that happens during step 2.4, the chainlock in step 2.5
may deadlock against the pending allrecord lock upgrade.
On Linux step 2.5 succeeds, so the locking can make progress,
but on Solaris it might fail because the kernel
wants to satisfy the 1st lock requester before the 2nd one.
I think the first step is a standalone test that does this:
process1: F_RDLCK for ofs=0 len=2
process2: F_RDLCK for ofs=0 len=1
process1: upgrade ofs=0 len=2 to F_WRLCK (in blocking mode)
process2: F_RDLCK for ofs=1 len=1
process2: unlock ofs=0 len=2
process1: should continue at that point
Once we have such a test we can run it on several Solaris, FreeBSD,
Linux or other systems.
Then we can decide whether we want a configure and/or runtime check for
this, and only avoid the transaction F_RDLCK lock in traverse_read if the
kernel behaves as expected.
Can you write such a standalone test?
metze