Read corruption possible during ldb_search

Stefan Metzmacher metze at samba.org
Fri Apr 28 08:46:15 UTC 2017


Am 28.04.2017 um 06:15 schrieb Andrew Bartlett:
> On Wed, 2017-04-26 at 08:32 +0200, Stefan Metzmacher wrote:
>> Am 26.04.2017 um 05:23 schrieb Andrew Bartlett:
>>> On Wed, 2017-04-26 at 10:03 +1200, Garming Sam wrote:
>>>> As far as I know, all the known deadlocks and data corruption
>>>> should
>>>> be
>>>> fixed in this patchset.
>>>>
>>>> What I think Andrew was trying to say was that there was no
>>>> direct
>>>> test
>>>> of my patch ([PATCH 07/17] ldb_tdb: Ensure we correctly decrement
>>>> ltdb->read_lock_count) which fixes the original read consistency
>>>> issue
>>>> we were attempting to resolve. Instead he wrote a cmocka test to
>>>> show
>>>> that separately (prior to that we only detected it through
>>>> spurious
>>>> deadlocks during make test).
>>>
>>> Thanks.  That was the case last night, sadly in more autobuilds I'm
>>> still seeing deadlock detected so I'm continuing to investigate. 
>>>
>>> ldb: ltdb:
>>> tdb(/home/ubuntu/autobuild/b27129/samba/bin/ab/fl2008r2dc/private/s
>>> am.ldb): tdb_transaction_prepare_commit: failed to upgrade hash
>>> locks: Locking error
>>>
>>> ldb: ltdb:
>>> tdb(/home/ubuntu/autobuild/b27129/samba/bin/ab/fl2008r2dc/private/s
>>> am.ldb): tdb_transaction_cancel: no transaction
>>>
>>> ldb: dsdb_set_schema() failed: 51:Busy: Failure during
>>> tdb_transaction_prepare_commit(): Locking error -> Busy
>>
>> I also got this:
>>
>> [1997(12157)/2099 at 2h5m31s]
>> samba4.urgent_replication.python(ad_dc_ntvfs)(ad_dc_ntvfs:local)
>> WARNING: The "lsa over netlogon" option is deprecated
>> baseDN: DC=samba,DC=example,DC=com
>>
>> ltdb:
>> tdb(/memdisk/metze/W/b51706/samba/bin/ab/ad_dc_ntvfs/private/sam.ldb)
>> :
>> tdb_transaction_prepare_commit: failed to upgrade hash locks: Locking
>> error
>>
>> ltdb:
>> tdb(/memdisk/metze/W/b51706/samba/bin/ab/ad_dc_ntvfs/private/sam.ldb)
>> :
>> tdb_transaction_cancel: no transaction
>>
>> dsdb_set_schema() failed: 51:Busy: Failure during
>> tdb_transaction_prepare_commit(): Locking error -> Busy
>> UNEXPECTED(error):
>> samba4.urgent_replication.python(ad_dc_ntvfs).__main__.UrgentReplicat
>> ionTests.test_attributeSchema_object(ad_dc_ntvfs:local)
>> REASON: Exception: Exception: Traceback (most recent call last):
>>   File
>> "/memdisk/metze/W/b51706/samba/source4/dsdb/tests/python/urgent_repli
>> cation.py",
>> line 176, in test_attributeSchema_object
>>     self.ldb.modify(m)
>> LdbError: (1, 'ldb_wait from
>> ../source4/dsdb/samdb/ldb_modules/util.c:369 with LDB_WAIT_ALL:
>> Operations error (1)')
>>
>> FAILED (0 failures, 1 errors and 0 unexpected successes in 0
>> testsuites)
>>
>>> I think we have this pattern:
>>>
>>> (1) start search (allrecord read lock)
>>> (1) transaction start (transaction read lock)
>>
>> This gets the transaction write lock
> 
> OK, I'll try again, perhaps:
> 
> (1) start transaction (transaction write lock)
> (1)                   (all-record read lock)
> (2) start search      (all-record read lock)
> (1) traverse          (record write lock - blocked)

A traverse never gets record write locks
only tdb_do_delete() uses tdb_write_lock_record().

tdb_[un]lock_record() are a noop during a transaction
because we have the allrecord lock.

tdb_write_lock_record() will always fail during a transaction
because we only have the allrecord read lock.

> (2) start transaction (transaction write lock - deadlock)

This would already fail the tdb_have_extra_locks() check,
before trying the transaction write lock.

> Clearly I'll need to catch one of the failing autobuilds and check the
> lslocks, and/or reproduce in another cmocka test. 

Remember the bug we're after happens during prepare_commit(),
I'm wondering if 3 processes need to be involved for the bug to happen.

> The about might happen when the dsdb_get_schema() code writes out a new
> @ATTRIBUTES entry.  A workaround might be to only write that out from
> the schema_update_now hook and on DB load, not in each and every
> process that notices the schema has changed, often during a search.

Yes, that would be better I guess, or we also store the value from
schemaInfo into a @SCHEMAINFO object, so that we can skip flushing
the loaded schema if it's already the same.

I guess dsdb_schema_set_indices_and_attributes()
would be the place to check that.
dsdb_schema_set_indices_and_attributes() should also
use a transaction for the update in order to consistently
update all objects at the same time.

And even if this schema change helps we need to fix the original
bug...

metze

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: OpenPGP digital signature
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20170428/950f1b5a/signature.sig>


More information about the samba-technical mailing list