BUG# 12754 [WIP] Avoid replication lockup by using USN from the start of the DRS cycle

Fri Apr 21 21:24:33 UTC 2017

I wrote this up with Garming earlier this week based on his analysis of
our flapping tests over Easter. 

If we use the USN of an object at the time we fetch the full object to
calculate the up-to-dateness vector, we risk ignoring objects that
should appear later in the replication cycle.

This can happen if objects A, B and C have these USNs:

 A 100
 B 200
 C 300

but during replication of 3 pages of results, B is modified, getting
USN 400.

Then we send:
 A 100
 B 400

(and ignore)
 C 300

This is because the server sets an uptodateness vector of 400 at B, and
the client sends it back, causing the server to ignore C at 300, even
though the USN check alone would have sent it.
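
To make the sequence above concrete, here is a toy model in Python. It
is only a sketch, not the Samba code: the function name
replicate_buggy, the single integer "watermark" standing in for the
uptodateness vector, the one-object-per-page layout and the "writes"
argument are all assumptions made for illustration.

```python
def replicate_buggy(objects, writes):
    """Toy model of one replication cycle, one object per page.

    objects: dict of name -> current USN (mutated by concurrent writes).
    writes:  dict of page index -> (name, new_usn), applied just before
             that page is built. Hypothetical, for illustration only.
    """
    # The list of objects to send is fixed when the cycle starts,
    # ordered by the USN each object had at that moment.
    order = sorted(objects, key=objects.get)
    watermark = 0  # stands in for the vector the client sends back
    sent = []
    for page, name in enumerate(order):
        # A concurrent modification lands between pages.
        if page in writes:
            target, new_usn = writes[page]
            objects[target] = new_usn
        usn_now = objects[name]
        if usn_now <= watermark:
            # The server believes the client already has this change.
            continue
        sent.append((name, usn_now))
        # The problem: the watermark follows the USN read at fetch time.
        watermark = usn_now
    return sent, watermark


sent, watermark = replicate_buggy(
    {"A": 100, "B": 200, "C": 300}, {1: ("B", 400)}
)
print(sent, watermark)
```

With B bumped to 400 just before the second page, the model sends
A 100 and B 400, ends with a watermark of 400, and never sends C.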

The patch instead only sends an uptodateness vector matching the USN
seen at the time the cycle starts. This means we re-send the object
later for the higher USN.
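
As a rough illustration of that idea (again a toy model, not the patch
itself), the Python sketch below advances the watermark using the USN
each object had when the cycle started. The names replicate_fixed and
pending_next_cycle, the integer "watermark" and the "writes" argument
are hypothetical.

```python
def replicate_fixed(objects, writes):
    """Toy replication cycle that tracks USNs from the cycle start.

    objects: dict of name -> current USN (mutated by concurrent writes).
    writes:  dict of page index -> (name, new_usn), applied just before
             that page is built. Hypothetical, for illustration only.
    """
    # Snapshot every USN once, when the cycle starts.
    start_usn = dict(objects)
    order = sorted(start_usn, key=start_usn.get)
    watermark = 0
    sent = []
    for page, name in enumerate(order):
        if page in writes:
            target, new_usn = writes[page]
            objects[target] = new_usn
        if start_usn[name] <= watermark:
            continue
        # The object is still sent in its current state...
        sent.append((name, objects[name]))
        # ...but the watermark only moves to the start-of-cycle USN.
        watermark = start_usn[name]
    return sent, watermark


def pending_next_cycle(objects, watermark):
    """Objects a later cycle would still send, given the watermark."""
    return sorted(n for n, usn in objects.items() if usn > watermark)


objects = {"A": 100, "B": 200, "C": 300}
sent, watermark = replicate_fixed(objects, {1: ("B", 400)})
print(sent, watermark, pending_next_cycle(objects, watermark))
```

In this model C is no longer skipped, the cycle ends with a watermark
of 300, and B (now at 400) is picked up again by a later cycle. That
is a redundant send, but nothing is lost.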

We need a test for this (shouldn't take long to write).  We think this
causes the WRITE_FAULT errors during DRS tests.  It would also be safer
with the ldb locking patches and without nested event loops for ldb
searches (which would make the search for GUIDs more atomic).

Andrew Bartlett
-- 
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba
-------------- next part --------------
A non-text attachment was scrubbed...
Name: samba.git-ca0077ee7326eea6bed6eb2f313ceb68922241f4.patch
Type: text/x-patch
Size: 8208 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20170422/a97fd659/samba.git-ca0077ee7326eea6bed6eb2f313ceb68922241f4-0001.bin>