samba_dnsupdate timeouts (was Re: [PATCH] python indent bugfix in dns_hub.py)

Stefan Metzmacher metze at samba.org
Thu Feb 7 14:54:34 UTC 2019


On 07.02.19 at 10:37, Stefan Metzmacher via samba-technical wrote:
> Hi Tim,
> 
>> I just wanted to say I think dns_hub has been a good addition to the
>> selftest framework. Anything that allows us to test DNS more
>> realistically is a good thing, and is worth a few teething problems.
> 
> Exactly, we should make more use of it and remove the need for the
> RESOLV_WRAPPER_HOSTS file.
> 
>> It at least highlighted the real problem, which was that we were
>> starting to hit the CI runner limits, and that could've dragged on for
>> months with CI failing intermittently for no obvious reason. For the
>> record, the CI limit seems to be around 8 DCs, although obviously this
>> varies somewhat depending on the process model overhead. Tearing down
>> testenvs once we're done with them seems like a good idea in the long run.
> 
> BTW: there's at least one additional problem I noticed while cleaning up
> my patchset to wait for the dns_update_cache file to be filled.
> 
> On an RODC samba_dnsupdate calls DsrUpdateReadOnlyServerDnsRecords via
> IRPC to the local winbindd, which calls
> DsrUpdateReadOnlyServerDnsRecords via netlogon to the RWDC.
> The netlogon server calls dnsupdate_RODC via IRPC to the dnsupdate
> task on the RWDC, which calls samba_dnsupdate with a temporary config
> on behalf of the RODC.
> 
> Currently samba_dnsupdate (on the RODC) constantly recreates its irpc
> handle (and the messaging context and the dgram socket). This causes
> problems when winbindd tries to send back the result to samba_dnsupdate,
> as winbindd's messaging context caches connected dgram sockets per
> target pid for 1 second. As the target (samba_dnsupdate) constantly
> recreates its socket, winbindd very likely hits ECONNREFUSED when
> the socket is recreated multiple times within 1 second.
> As a result samba_dnsupdate hits a 10 second irpc timeout, so the
> whole samba_dnsupdate hits the 20 second timeout on the RODC.
> 
> The solution to this is to cache the irpc handle in samba_dnsupdate.
> 
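To illustrate the failure mode and the fix described above, here is a minimal
sketch. It is not Samba's messaging code: it assumes Linux semantics for
Unix-domain datagram sockets, and the helper names (bind_receiver,
send_two_replies) are hypothetical. The sender keeps one connected socket,
much like a per-pid cache, while the receiver either keeps its socket or
recreates it after every message.

```python
import errno
import os
import socket
import tempfile


def bind_receiver(path):
    """Bind a new datagram socket at `path`, replacing any previous one."""
    if os.path.exists(path):
        os.unlink(path)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.settimeout(1.0)  # never block forever if a datagram goes missing
    sock.bind(path)
    return sock


def send_two_replies(recreate_receiver):
    """Send two datagrams over one cached, connected sender socket.

    Returns the outcome of each send: "delivered" or an errno name.
    """
    outcomes = []
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "receiver.sock")
        receiver = bind_receiver(path)

        # The sender connects once and reuses the socket (the "cache").
        sender = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        sender.connect(path)
        try:
            for _ in range(2):
                try:
                    sender.send(b"reply")
                    receiver.recv(64)
                    outcomes.append("delivered")
                except OSError as exc:
                    outcomes.append(
                        errno.errorcode.get(exc.errno, str(exc.errno)))
                if recreate_receiver:
                    # Mimics a receiver that recreates its socket per call;
                    # the sender still holds a connection to the old one.
                    receiver.close()
                    receiver = bind_receiver(path)
        finally:
            sender.close()
            receiver.close()
    return outcomes
```

On Linux, send_two_replies(False) should report two deliveries, while
send_two_replies(True) should report ECONNREFUSED for the second send, because
the cached connection still points at the closed socket. Keeping a single
receiving endpoint alive, which is what caching the irpc handle amounts to,
avoids the stale connection.
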
>> I'll raise a bug and backport the autobuild change to 4.10.
> 
> I'm currently testing the attached additional patches here:
> https://gitlab.com/samba-team/devel/samba/pipelines/46503335
> and
> https://gitlab.com/samba-team/devel/samba/pipelines/46503395

The results look good, everything passes, but the time reduction
provided by
https://git.samba.org/?p=metze/samba/wip.git;a=commitdiff;h=9f808d4e1e5d
is only ~25 mins.

samba_ad_dc_ntvfs takes 201 mins:
https://gitlab.com/samba-team/devel/samba/-/jobs/157703665
and samba only 27 mins:
https://gitlab.com/samba-team/devel/samba/-/jobs/157703659

Before we had samba with 227 mins:
https://gitlab.com/samba-team/devel/samba/-/jobs/157703448

A real reduction is provided by this commit:
https://git.samba.org/?p=metze/samba/wip.git;a=commitdiff;h=173e69dd4f5e
so that we only run ad_dc_ntvfs in the new environment.

Now we have samba with 83 mins:
https://gitlab.com/samba-team/devel/samba/-/jobs/157804397
and samba_ad_dc_ntvfs with a failure at 635/636 tests after 94 mins:
https://gitlab.com/samba-team/devel/samba/-/jobs/157804407

The failure is reproducible:

[635(4196)/636 at 1h21m26s]
samba4.blackbox.dbcheck(ad_dc_ntvfs)(ad_dc_ntvfs:local)
UNEXPECTED(failure):
samba4.blackbox.dbcheck(ad_dc_ntvfs).dbcheck(ad_dc_ntvfs:local)
REASON: Exception: Exception: WARNING: The "lsa over netlogon" option is
deprecated
WARNING: The "server schannel" option is deprecated
Checking 10192 objects
NOTE: old (due to rename or delete) DN string component for
defaultObjectCategory in object
CN=schemaInfo-Class-1549544055-NEW,CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com
-
CN=schemaInfo-Class-1549544055,CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com
Not fixing old string component
NOTE: old (due to rename or delete) DN string component for
lastKnownParent in object CN=Servers

FAILED (1 failures, 0 errors and 0 unexpected successes in 0 testsuites)

A summary with detailed information can be found in:
  ./bin/ab/summary

Or in a private autobuild:

[635(4196)/636 at 1h41m17s]
samba4.blackbox.dbcheck(ad_dc_ntvfs)(ad_dc_ntvfs:local)
UNEXPECTED(failure):
samba4.blackbox.dbcheck(ad_dc_ntvfs).dbcheck(ad_dc_ntvfs:local)
REASON: Exception: Exception: WARNING: The "lsa over netlogon" option is
deprecated
WARNING: The "server schannel" option is deprecated
Checking 10196 objects
NOTE: old (due to rename or delete) DN string component for
defaultObjectCategory in object
CN=schemaInfo-Class-1549542279-NEW,CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com
-
CN=schemaInfo-Class-1549542279,CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com
Not fixing old string component
NOTE: old (due to rename or delete) DN string component for
lastKnownParent in object CN=NTDS Settings

FAILED (1 failures, 0 errors and 0 unexpected successes in 0 testsuites)

I'm currently running a private autobuild with
autobuild-private.sh samba-ad-dc-ntvfs --nocleanup
and will try to debug it there.

Any ideas why this fails when we remove rodc, vampire_dc and promoted_dc?

metze
