A plan

Fri Jan 12 22:33:51 UTC 2018

On 12.01.2018 at 23:25, Andrew Bartlett via samba-technical wrote:
> On Fri, 2018-01-12 at 20:32 +0100, Stefan Metzmacher wrote:
>> On 12.01.2018 at 19:49, Andrew Bartlett wrote:
>>> G'Day All.
>>>
>>> I ran some tests overnight as promised.  
>>>
>>> The first thing to say is that we (sadly) need to drop Douglas'
>>> visualisation patches.  There are some unhandled Python errors in the
>>> error cases, which show up only at the end of a full run (because the
>>> DB has junk in it).
>>>
>>> Then I think we need to run tests on less than this full branch.  
>>>
>>> I'll try:
>>>  - master plus the flapping additions
>>>  - metze's branch minus Douglas' patches
>>
>> I fixed Douglas' patches and your talloc patches.
>>
>>>  - asn's branch with the flapping additions (but not whoami)
>>
>> I think these can wait.
>>
>>> Historically we have always got into a muddle when we combine
>>> everybody's patches into one push.  It feels like it would save time
>>> but actually takes longer, because it assumes that all the patches
>>> work.  For example, I've put in good, tested code that failed, but it
>>> should have just failed its own autobuild, not held up yours.
>>>
>>> For master, I think some builds with just the flapping tests marked
>>> would be good, then put that in.  Then do the rest by topic, owned by
>>> the author.
>>>
>>> In the medium term, Jamie (one of my new developers at Catalyst) is
>>> working to untangle our testsuite interdependencies.  The aim here is
>>> to find sets of tests that:
>>>  - are reliable
>>>  - do not depend on each other
>>>  - consume < 4GB of RAM
>>>  - take less than 1 hour
>>>
>>> (And then to split these into parallel test environments)
>>>
>>> At Catalyst, running cloud builds for test is quite normal, often
>>> before posting and generally before pushing.  But I've noticed that,
>>> even for me, the closer I get to the release deadline, the less
>>> likely I am to wait for a full 5-hour build for the absolute final
>>> patch.  I'm more likely to do what I did with the talloc patch: trust
>>> earlier tests on different code and the newly written tests and aim at
>>> autobuild.
>>>
>>> What I would like to get to is a norm where when posting patches for
>>> review, we post them to (say) gitlab by habit, and by the time they are
>>> reviewed a clear 'passed/failed' flag is shown so we don't waste time
>>> on patches that won't pass.  
>>
>> It would be nice to have that.
>>
>>> In the meantime I'll run our 5-hour testsuite a few more times in hope
>>> of getting the data on what can safely land for 4.8.
>>
>> Please use my latest autobuild branch.
> 
> 3 runs of your autobuild branch and 2 of 'no-catalyst-for-4.8' just
> failed with:
> 
> [1533(9721)/2234 at 2h13m38s] samba4.nbt.dgram(ad_dc_ntvfs)
> netlogon reply from 127.0.0.33:138
> netlogon reply from 127.0.0.33:138
> netlogon reply from 127.0.0.33:138
> netlogon reply from 127.0.0.33:138
> UNEXPECTED(failure): samba4.nbt.dgram.netlogon2(ad_dc_ntvfs)
> REASON: Exception: Exception: ../source4/torture/nbt/dgram.c:396:
> response->data.samlogon.data.nt5_ex.command was 21 (0x15), expected 19
> (0x13): Got incorrect netlogon response command
> 
> or
> 
> [1533(9721)/2234 at 2h12m36s] samba4.nbt.dgram(ad_dc_ntvfs)
> smbtorture 4.9.0pre1-DEVELOPERBUILD
> Using seed 1515794453
> UNEXPECTED(failure): samba4.nbt.dgram.netlogon(ad_dc_ntvfs)
> REASON: Exception: Exception: ../source4/torture/nbt/dgram.c:148:
> Expression `response != ((void *)0)' failed: Failed to receive a
> netlogon reply packet

We also got something like this 2 times.

>> It just failed with some really rare flapping tests, e.g.
>> samba.nbt.dgram. We also saw some pam_winbind failures,
>> while setting up the ad_member env.
> 
> Two of the tests I did of Andreas's patch set (with the flapping
> patches on top), and one on the catalyst-for-4-8 branch failed with:
> 
> [64(1161)/2230 at 49m50s] samba.tests.pam_winbind(local)(ad_member)
> ERROR: Testsuite[samba.tests.pam_winbind(local)(ad_member)]
> REASON: unable to set up environment ad_member - exiting
> 
> I think we should revert the change to make ad_member use ad_dc.  I'll
> test master with such a revert (and the flappy tests changes).

We also got this a few times.

>> I'll try a few more times with the whole branch, then I'll
>> start pushing just the first chunks.
> 
> I'll put the autobuild results all in this link so you don't have to
> guess from my summary, you can check the full details:
> 
>  https://seafile.catalyst.net.nz/d/92adffd354044ffaa3c4/
> 
> I remain at your disposal to work on builds for this (in between
> spending the sunny weekend with the family).  

Ok, I'm now trying just the talloc/tevent/tdb + flapping changes.

metze
