Read corruption possible during ldb_search

Stefan Metzmacher metze at samba.org
Wed Apr 26 06:32:18 UTC 2017


On 26.04.2017 at 05:23, Andrew Bartlett wrote:
> On Wed, 2017-04-26 at 10:03 +1200, Garming Sam wrote:
>> As far as I know, all the known deadlocks and data corruption should be
>> fixed in this patchset.
>>
>> What I think Andrew was trying to say was that there was no direct test
>> of my patch ([PATCH 07/17] ldb_tdb: Ensure we correctly decrement
>> ltdb->read_lock_count) which fixes the original read consistency issue
>> we were attempting to resolve. Instead he wrote a cmocka test to show
>> that separately (prior to that we only detected it through spurious
>> deadlocks during make test).
> 
> Thanks.  That was the case last night. Sadly, in more autobuilds I'm
> still seeing "deadlock detected", so I'm continuing to investigate.
> 
> ldb: ltdb: tdb(/home/ubuntu/autobuild/b27129/samba/bin/ab/fl2008r2dc/private/sam.ldb): tdb_transaction_prepare_commit: failed to upgrade hash locks: Locking error
> 
> ldb: ltdb: tdb(/home/ubuntu/autobuild/b27129/samba/bin/ab/fl2008r2dc/private/sam.ldb): tdb_transaction_cancel: no transaction
> 
> ldb: dsdb_set_schema() failed: 51:Busy: Failure during tdb_transaction_prepare_commit(): Locking error -> Busy

I also got this:

[1997(12157)/2099 at 2h5m31s] samba4.urgent_replication.python(ad_dc_ntvfs)(ad_dc_ntvfs:local)
WARNING: The "lsa over netlogon" option is deprecated
baseDN: DC=samba,DC=example,DC=com

ltdb: tdb(/memdisk/metze/W/b51706/samba/bin/ab/ad_dc_ntvfs/private/sam.ldb): tdb_transaction_prepare_commit: failed to upgrade hash locks: Locking error

ltdb: tdb(/memdisk/metze/W/b51706/samba/bin/ab/ad_dc_ntvfs/private/sam.ldb): tdb_transaction_cancel: no transaction

dsdb_set_schema() failed: 51:Busy: Failure during tdb_transaction_prepare_commit(): Locking error -> Busy
UNEXPECTED(error): samba4.urgent_replication.python(ad_dc_ntvfs).__main__.UrgentReplicationTests.test_attributeSchema_object(ad_dc_ntvfs:local)
REASON: Exception: Exception: Traceback (most recent call last):
  File "/memdisk/metze/W/b51706/samba/source4/dsdb/tests/python/urgent_replication.py", line 176, in test_attributeSchema_object
    self.ldb.modify(m)
LdbError: (1, 'ldb_wait from ../source4/dsdb/samdb/ldb_modules/util.c:369 with LDB_WAIT_ALL: Operations error (1)')

FAILED (0 failures, 1 errors and 0 unexpected successes in 0 testsuites)

> I think we have this pattern:
> 
> (1) start search (allrecord read lock)
> (1) transaction start (transaction read lock)

This takes the transaction write lock, not a read lock.

> (2) transaction start (transaction read lock)

And this blocks.

> (1) traverse (requesting transaction write lock, then write locks)
> (2) some operation (likely attempting to write new schema index list to
> @INDEXLIST)

This can't happen, because the transaction start in (2) blocks...

> (2) transaction prepare commit (requesting all-record lock held by (1))
> -> DEADLOCK detected, because the 
> 
> I'll write another cmocka test when I get a moment to prove it.  I
> think the fix is to not request the transaction write lock in the
> traverse.

While debugging a gencache_stabilize() problem, I also thought about
just getting the transaction read lock during a traverse.

But first we have to understand the above problem.
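
As a rough way to reason about the two readings of the sequence above,
here is a toy wait-for graph in Python. It is only a sketch: it does not
model tdb's real fcntl-based locking, and find_deadlock and the graph
layout are made up for illustration. Each entry says "this opener is
blocked waiting on that opener"; a cycle means a deadlock.

```python
def find_deadlock(wait_for):
    """Return the openers forming a cycle in a wait-for graph, or None.

    wait_for maps each blocked opener to the single opener it waits on.
    This is a hypothetical model, not how tdb detects deadlocks.
    """
    for start in wait_for:
        seen = []
        node = start
        # Follow the chain of "waits on" edges until it ends or repeats.
        while node in wait_for and node not in seen:
            seen.append(node)
            node = wait_for[node]
        if node in seen:
            return seen[seen.index(node):]
    return None


# Sequence as proposed in the quoted text: both openers are inside a
# transaction; (1) waits for the transaction write lock during traverse,
# (2) waits for the all-record lock held by (1) in prepare commit.
proposed = {1: 2, 2: 1}

# If transaction start already takes the transaction write lock, (2)
# blocks in transaction start and (1) is not waiting on anything.
corrected = {2: 1}

print(find_deadlock(proposed))
print(find_deadlock(corrected))
```

In this model the proposed sequence forms a cycle, while the corrected
one is a plain wait with no cycle, which is why the reported deadlock
still needs another explanation.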

metze
