Read corruption possible during ldb_search

Andrew Bartlett abartlet at samba.org
Fri Apr 28 04:15:18 UTC 2017


On Wed, 2017-04-26 at 08:32 +0200, Stefan Metzmacher wrote:
> Am 26.04.2017 um 05:23 schrieb Andrew Bartlett:
> > On Wed, 2017-04-26 at 10:03 +1200, Garming Sam wrote:
> > > As far as I know, all the known deadlocks and data corruption
> > > should
> > > be
> > > fixed in this patchset.
> > > 
> > > What I think Andrew was trying to say was that there was no
> > > direct
> > > test
> > > of my patch ([PATCH 07/17] ldb_tdb: Ensure we correctly decrement
> > > ltdb->read_lock_count) which fixes the original read consistency
> > > issue
> > > we were attempting to resolve. Instead he wrote a cmocka test to
> > > show
> > > that separately (prior to that we only detected it through
> > > spurious
> > > deadlocks during make test).
> > 
> > Thanks.  That was the case last night, sadly in more autobuilds I'm
> > still seeing deadlock detected so I'm continuing to investigate. 
> > 
> > ldb: ltdb:
> > tdb(/home/ubuntu/autobuild/b27129/samba/bin/ab/fl2008r2dc/private/s
> > am.ldb): tdb_transaction_prepare_commit: failed to upgrade hash
> > locks: Locking error
> > 
> > ldb: ltdb:
> > tdb(/home/ubuntu/autobuild/b27129/samba/bin/ab/fl2008r2dc/private/s
> > am.ldb): tdb_transaction_cancel: no transaction
> > 
> > ldb: dsdb_set_schema() failed: 51:Busy: Failure during
> > tdb_transaction_prepare_commit(): Locking error -> Busy
> 
> I also got this:
> 
> [1997(12157)/2099 at 2h5m31s]
> samba4.urgent_replication.python(ad_dc_ntvfs)(ad_dc_ntvfs:local)
> WARNING: The "lsa over netlogon" option is deprecated
> baseDN: DC=samba,DC=example,DC=com
> 
> ltdb:
> tdb(/memdisk/metze/W/b51706/samba/bin/ab/ad_dc_ntvfs/private/sam.ldb)
> :
> tdb_transaction_prepare_commit: failed to upgrade hash locks: Locking
> error
> 
> ltdb:
> tdb(/memdisk/metze/W/b51706/samba/bin/ab/ad_dc_ntvfs/private/sam.ldb)
> :
> tdb_transaction_cancel: no transaction
> 
> dsdb_set_schema() failed: 51:Busy: Failure during
> tdb_transaction_prepare_commit(): Locking error -> Busy
> UNEXPECTED(error):
> samba4.urgent_replication.python(ad_dc_ntvfs).__main__.UrgentReplicat
> ionTests.test_attributeSchema_object(ad_dc_ntvfs:local)
> REASON: Exception: Exception: Traceback (most recent call last):
>   File
> "/memdisk/metze/W/b51706/samba/source4/dsdb/tests/python/urgent_repli
> cation.py",
> line 176, in test_attributeSchema_object
>     self.ldb.modify(m)
> LdbError: (1, 'ldb_wait from
> ../source4/dsdb/samdb/ldb_modules/util.c:369 with LDB_WAIT_ALL:
> Operations error (1)')
> 
> FAILED (0 failures, 1 errors and 0 unexpected successes in 0
> testsuites)
> 
> > I think we have this pattern:
> > 
> > (1) start search (allrecord read lock)
> > (1) transaction start (transaction read lock)
> 
> This gets the transaction write lock

OK, I'll try again, perhaps:

(1) start transaction (transaction write lock)
(1)                   (all-record read lock)
(2) start search      (all-record read lock)
(1) traverse          (record write lock - blocked)
(2) start transaction (transaction write lock - deadlock)

Clearly I'll need to catch one of the failing autobuilds and check the
lslocks, and/or reproduce in another cmocka test. 

The about might happen when the dsdb_get_schema() code writes out a new
@ATTRIBUTES entry.  A workaround might be to only write that out from
the schema_update_now hook and on DB load, not in each and every
process that notices the schema has changed, often during a search.

> But first we have to understand the above problem.

Indeed.

Andrew Bartlett
-- 
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba




More information about the samba-technical mailing list