[WIP][PATCH] Safe LDB locking for better replication
Stefan Metzmacher
metze at samba.org
Fri Jun 9 07:55:58 UTC 2017
Am 09.06.2017 um 07:02 schrieb Andrew Bartlett:
> I think this patch is almost finished (but I've thought that for a long
> time).
>
> Here is my patch to do whole-DB locking and TDB locking during the
> search. I have it under test right now, which will almost certainly
> blow up, but I want to post it here for comment in the meantime.
>
> If correct, it should fix a number of 'missing objectclass' replication
> errors and flapping tests in autobuild.
>
> I still need to make the tests die (rather than hang) when used without
> the fixed TDB. No doubt I've missed other things too, but here is is.
I guess the wrong dsdb_replace() for the index mod_msg is one of the
most important bugs, which we can push and backport now?
Regarding the db lock around the high level search, I'd have a better
feeling if we would make sure that any modifications would give an error
while a search is running.
Similar to tdb_traverse_read() where the
callbacks are not allowed to do any modifications.
I'm not sure it helps to track this on the ldb_context, as we can
have multiple ldb_contexts within one process which make use of the same
low level tdb_context.
I'm also not sure if we have to get the read locks on all partitions,
wouldn't it be enough to get the lock just on the main sam.ldb file?
As every search and transaction would always go through this first,
it should be enough to have that as a guard against modifications
and it will reduce the processing and syscall overhead.
Finally given that the code in lib/ldb/common/ldb.c will be
the only place that does the LOCK_DB and UNLOCK_DB calls,
I'd say it's better to add new read_lock() and read_unlock() hooks to
the module stack instead of using an extended operation, that would also
simplify the code a lot.
We could then have the coordination checks which make sure that the
following constraints are enforced:
1. Only one top level search can run at a time! So the
top level callbacks are not allowed to do any other searches.
This guarantees that we just read_lock()/unlock() for one single
search and don't get into a possible recursion/nesting hell again.
2. Once a top level read_lock() is done transaction_start() would fail
3. If a transaction is already started we skip the read_lock() call to
the tdb level, but still make sure only one top level search can run
at a time.
One thing to watch out for are the modify triggers in the rootdse
module, which require the usage of the global event context and
operate async using IRPC. We need to make sure that we don't
have a tdb_transaction open while these are running.
metze
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: OpenPGP digital signature
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20170609/4be6699d/signature.sig>
More information about the samba-technical
mailing list