[TEST][PATCH] Replication errors with Samba4

Andrew Bartlett abartlet at samba.org
Tue Jul 31 02:43:32 MDT 2012


On Tue, 2012-07-31 at 10:37 +0200, Stefan (metze) Metzmacher wrote:
> Am 31.07.2012 08:37, schrieb Andrew Bartlett:
> > On Tue, 2012-07-31 at 16:09 +1000, Andrew Bartlett wrote:
> >> On Tue, 2012-07-31 at 08:04 +0200, Stefan (metze) Metzmacher wrote:
> >>> Hi Andrew,
> >>>
> >>>>> In my repl-devel branch I have a series of patches to better test our
> >>>>> replication and conflict resolution handling.
> >>>>>
> >>>>> https://git.samba.org/?p=abartlet/samba.git/.git;a=shortlog;h=refs/heads/repl-devel
> >>>>>
> >>>>> Currently we have a number of issues in this area.  The test I added
> >>>>> there shows that we do not consistently handle the conflict resolution.
> >>>>> This is particularly the case with conflicting renames.
> >>>>>
> >>>>> The attempts at modification of the replication code I've included try
> >>>>> to handle some of this, but it still doesn't work.  
> >>>>>
> >>>>> However, this code remains dizzyingly complex, and I wondered if,
> >>>>> particularly as I now have a reasonable testsuite, you might be able to
> >>>>> assist me in making this more reliable?
> >>>>
> >>>> I've found some of the issues here, but I still can't make the conflict
> >>>> handling reliable.  I've put in the test simply asserting that one or
> >>>> other record becomes a conflict, until we can get back to this.  It
> >>>> would be very helpful to me if you could look at this area, as this
> >>>> should be deterministic :-(.
> >>>>
> >>>> Still, at least it no longer stops or crashes.
> >>>
> >>> Does it randomly fail in make test (if so, what's the test name?)
> >>> or do you see the strange behavior in normal operation?
> >>
> >> What happens is that the additional tests I added in
> >> samba4.drs.replica_sync.python fail randomly.  
> >>
> >> To get the rest of the patch into master (and to ensure we have any
> >> coverage of this codepath at all), I've modified the tests to accept
> >> that one DN or the other is made into a conflict, but not to assert on
> >> which one in particular is the conflict.   This is in autobuild now. 
> >>
> >> On that branch, it is clear that it's random because if you run it
> >> twice, the line number (corresponding to unit tests) of the assertions
> >> changes. 
> >>
> >> Once these are in master, I'll update that branch with just the stricter
> >> test. 
> > 
> > I've updated the branch.  To reproduce, just run:
> > 
> > make test TESTS=samba4.drs.replica_sync.python 
> 
> I guess it's related to the fact that the conflict resolution also depends
> on the invocationId. The timestamps are in 1 sec intervals, in the protocol!

Ouch!  Does that mean I would cause damage with this patch:
https://git.samba.org/?p=abartlet/samba.git/.git;a=commitdiff;h=862b26518a0629f6112fb7e6270c0b98ef71a855

(or would the NDR layer just remove the partial seconds anyway?)

It seems better to always work with NTTIME - if it's not harmful I'll
just change the commit message to clarify. 
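
To illustrate the precision concern, here is a minimal sketch of the
arithmetic only. It assumes the wire timestamp keeps whole seconds, as
described above; whether the NDR layer actually drops the fraction is
the open question and is not answered here. The helper name and the
sample values are hypothetical.

```python
# NTTIME counts 100-nanosecond ticks since 1601-01-01 (UTC).
TICKS_PER_SECOND = 10_000_000


def nttime_to_whole_seconds(nttime: int) -> int:
    """Drop the sub-second part, as a 1-second-resolution field would."""
    return nttime // TICKS_PER_SECOND


# Two hypothetical changes made 0.3 s apart, inside the same second.
base = 130_000_000_000_000_000      # arbitrary NTTIME on a whole-second boundary
change_on_dc1 = base + 2_000_000    # base + 0.2 s
change_on_dc2 = base + 5_000_000    # base + 0.5 s

# As full NTTIME values the two changes are ordered, but once reduced
# to whole seconds they compare equal, so something else must break the tie.
same_second = (
    nttime_to_whole_seconds(change_on_dc1)
    == nttime_to_whole_seconds(change_on_dc2)
)
```

In this sketch `same_second` is true even though `change_on_dc1` is
strictly earlier than `change_on_dc2`.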

> I think you should find out the invocationId and define the dc with the
> lower invocationId as dc1 and the other as dc2.

I can just put some sleep into the tests to make the timestamps differ,
if that's what is going on.
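
A minimal sketch of the invocationId suggestion, for illustration only.
It assumes the winner of a conflict is chosen by comparing version,
then the whole-second timestamp, then invocationId, so that two changes
in the same second fall through to the invocationId. The `Stamp` type,
both function names and the GUID ordering (Python's `uuid.UUID`
ordering) are assumptions and may not match what the server does.

```python
import uuid
from typing import Dict, List, NamedTuple


class Stamp(NamedTuple):
    """Hypothetical per-change metadata, compared field by field in order."""
    version: int
    time_seconds: int
    invocation_id: uuid.UUID


def winner(a: Stamp, b: Stamp) -> Stamp:
    # Tuple comparison: version first, then time, then invocationId.
    return max(a, b)


def order_dcs(invocation_ids: Dict[str, uuid.UUID]) -> List[str]:
    """Return DC names sorted by invocationId, lowest first (dc1, dc2, ...)."""
    return sorted(invocation_ids, key=lambda name: invocation_ids[name])


# Two hypothetical DCs whose changes land in the same second.
id_low = uuid.UUID("00000000-0000-0000-0000-000000000001")
id_high = uuid.UUID("00000000-0000-0000-0000-000000000002")

stamp_low = Stamp(version=2, time_seconds=100, invocation_id=id_low)
stamp_high = Stamp(version=2, time_seconds=100, invocation_id=id_high)

# The result does not depend on argument order, so a test that names
# dc1/dc2 by invocationId can assert on a specific winner.
chosen = winner(stamp_low, stamp_high)
dc_names = order_dcs({"serverA": id_high, "serverB": id_low})
```

With the DCs named this way, a test can assert which record becomes
the conflict instead of accepting either outcome.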

(I've stopped my autobuild, which includes the next beta because it was
due today, pending resolving this)

Andrew Bartlett

-- 
Andrew Bartlett                                http://samba.org/~abartlet/
Authentication Developer, Samba Team           http://samba.org


