Read corruption possible during ldb_search
metze at samba.org
Fri Apr 28 08:46:15 UTC 2017
Am 28.04.2017 um 06:15 schrieb Andrew Bartlett:
> On Wed, 2017-04-26 at 08:32 +0200, Stefan Metzmacher wrote:
>> Am 26.04.2017 um 05:23 schrieb Andrew Bartlett:
>>> On Wed, 2017-04-26 at 10:03 +1200, Garming Sam wrote:
>>>> As far as I know, all the known deadlocks and data corruption
>>>> fixed in this patchset.
>>>> What I think Andrew was trying to say was that there was no
>>>> of my patch ([PATCH 07/17] ldb_tdb: Ensure we correctly decrement
>>>> ltdb->read_lock_count) which fixes the original read consistency
>>>> we were attempting to resolve. Instead he wrote a cmocka test to
>>>> that separately (prior to that we only detected it through
>>>> deadlocks during make test).
>>> Thanks. That was the case last night, sadly in more autobuilds I'm
>>> still seeing deadlock detected so I'm continuing to investigate.
>>> ldb: ltdb:
>>> am.ldb): tdb_transaction_prepare_commit: failed to upgrade hash
>>> locks: Locking error
>>> ldb: ltdb:
>>> am.ldb): tdb_transaction_cancel: no transaction
>>> ldb: dsdb_set_schema() failed: 51:Busy: Failure during
>>> tdb_transaction_prepare_commit(): Locking error -> Busy
>> I also got this:
>> [1997(12157)/2099 at 2h5m31s]
>> WARNING: The "lsa over netlogon" option is deprecated
>> baseDN: DC=samba,DC=example,DC=com
>> tdb_transaction_prepare_commit: failed to upgrade hash locks: Locking
>> tdb_transaction_cancel: no transaction
>> dsdb_set_schema() failed: 51:Busy: Failure during
>> tdb_transaction_prepare_commit(): Locking error -> Busy
>> REASON: Exception: Exception: Traceback (most recent call last):
>> line 176, in test_attributeSchema_object
>> LdbError: (1, 'ldb_wait from
>> ../source4/dsdb/samdb/ldb_modules/util.c:369 with LDB_WAIT_ALL:
>> Operations error (1)')
>> FAILED (0 failures, 1 errors and 0 unexpected successes in 0
>>> I think we have this pattern:
>>> (1) start search (allrecord read lock)
>>> (1) transaction start (transaction read lock)
>> This gets the transaction write lock
> OK, I'll try again, perhaps:
> (1) start transaction (transaction write lock)
> (1) (all-record read lock)
> (2) start search (all-record read lock)
> (1) traverse (record write lock - blocked)
A traverse never gets record write locks
only tdb_do_delete() uses tdb_write_lock_record().
tdb_[un]lock_record() are a noop during a transaction
because we have the allrecord lock.
tdb_write_lock_record() will always fail during a transaction
because we only have the allrecord read lock.
> (2) start transaction (transaction write lock - deadlock)
This would already fail the tdb_have_extra_locks() check,
before trying the transaction write lock.
> Clearly I'll need to catch one of the failing autobuilds and check the
> lslocks, and/or reproduce in another cmocka test.
Remember the bug we're after happens during prepare_commit(),
I'm wondering if 3 processes need to be involved for the bug to happen.
> The about might happen when the dsdb_get_schema() code writes out a new
> @ATTRIBUTES entry. A workaround might be to only write that out from
> the schema_update_now hook and on DB load, not in each and every
> process that notices the schema has changed, often during a search.
Yes, that would be better I guess, or we also store the value from
schemaInfo into a @SCHEMAINFO object, so that we can skip flushing
the loaded schema if it's already the same.
I guess dsdb_schema_set_indices_and_attributes()
would be the place to check that.
dsdb_schema_set_indices_and_attributes() should also
use a transaction for the update in order to consistently
update all objects at the same time.
And even if this schema change helps we need to fix the original
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 836 bytes
Desc: OpenPGP digital signature
More information about the samba-technical