[PATCH] Re: Proposed ldb 1.1.30 and tdb 1.3.13 (improve AD DC search performance, make multi-process)

Tue Apr 25 11:18:02 UTC 2017

Am 11.04.2017 um 22:23 schrieb Andrew Bartlett:
> On Tue, 2017-04-11 at 21:40 +0200, Stefan Metzmacher wrote:
>> Hi Andrew,
>>
>>>> Here's an update on current master.
>>>>
>>>> I've added the run-fcntl-deadlock as test.
>>>>
>>>> I still want to run the standalone make test of tdb and ldb on
>>>> solaris,
>>>> before we push this and have a closer look at the ldb changes.
>>>
>>> Thanks!
>>>
>>> Finally, this is may also be a 'data corruption' issue:  
>>>
>>> My theory is that because there is no read lock held, the index
>>> might
>>> refer to different records compared with the data.  Sometimes that
>>> will
>>> fall back to a traverse (we do that on errors), but sometimes that
>>> will
>>> mean we don't return all the records, as we essentially check
>>> twice,
>>> both in the index and in the final search filter.  
>>>
>>> If a search took a long time, a modify could happen between those
>>> two
>>> points, and while the modify would be atomic, the search would not
>>> be
>>> atomic.
>>
>> I don't understand what you're trying to say...
>>
>> Do you mean we need to call ltdb_lock_read() in a wider window?
>> E.g. when starting the ldb_search() from the top level module stack
>> until end of that search? Currently only searches are only atomic
>> within ltdb_search(), but not ldb_search().
> 
> They are not even that atomic!  The bug that started all this was that
> ltdb_search() does not actually hold a lock (after the first read).  
> 
> I've not considered the implications further up the stack at this
> point...
> 
>>> On the flip side, with this patch reads will block all writes for
>>> longer.
>>
>> And here...?
> 
> If we actually hold a lock in ltdb_search(), then we will block writes
> more often.  That is, what we fix in '[PATCH 07/17] ldb_tdb: Ensure we
> correctly decrement ltdb->read_lock_count'.

>>> BTW, A test for the LDB changes is in the cmocka thread.  Because
>>> LDB
>>> is buggy it passes right now, but fails if we do have the ldb fix
>>> but
>>> don't have the tdb fix. 
>>
>> And here...
>>
>> Be explicit please and refer to specific commits and tests.
> 
> https://lists.samba.org/archive/samba-technical/2017-April/119857.html
> has 
> http://lists.samba.org/pipermail/samba-technical/attachments/20170407/4
> a60faea/0002-ldb-Add-test-for-transaction-deadlock-detected-when-.bin
> 
> I'll merge the other cmocka tests shortly, then hope to work with
> Andreas to get this on merged.  
> 
> When we Garming tried to fix the refcounting ([PATCH 07/17] ldb_tdb:
> Ensure we correctly decrement ltdb->read_lock_count) and so hold the
> lock in ltdb_search(), make test failed with deadlock detected.  This
> test re-creates this situation manually, for easy validation. 

I'm totally lost sorry...

My understanding of your arguments in this thread is that
you try to tell me that there's still a bug even with the patches applied.

And I don't yet understand why...

metze

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: OpenPGP digital signature
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20170425/08a5a865/signature.sig>