Massively excessive DNS lookups in ads_XXXX code.

Jeremy Allison jra at samba.org
Wed Jul 22 22:29:45 UTC 2020


Hi all,

Inside ADS_STRUCT we have:

        struct {
                char *realm;
                char *workgroup;
                char *ldap_server;
                bool gc;     /* Is this a global catalog server? */
                bool no_fallback; /* Bail if the ldap_server is not available */
        } server;

and also:

        /* info derived from the servers config */
        struct {
                uint32_t flags; /* cldap flags identifying the services. */
                char *realm;
                char *bind_path;
                char *ldap_server_name;
                char *server_site_name;
                char *client_site_name;
                time_t current_time;
                char *schema_path;
                char *config_path;
                int ldap_page_size;
        } config;

Inside server we have:

	server.ldap_server

which I'm assuming is the DNS name of
the server we should be talking to.

But we *also* have:

	config.ldap_server_name

which I'm assuming is the DNS name
of the LDAP server we're told to use
by a configuration parameter.

When we're looking for DC's we get
a list of possible DC IP addresses
and then do:

cldap_ping_list()
	which calls ads_try_connect()

for all the IP addresses we found in
turn.

Inside ads_try_connect() we call

ads_cldap_netlogon_5()

and if it's successful (i.e. it
passes muster as a DC we should
be talking to) then in ads_try_connect we do:

 292         /* Fill in the ads->config values */
 293
 294         SAFE_FREE(ads->config.realm);
 295         SAFE_FREE(ads->config.bind_path);
 296         SAFE_FREE(ads->config.ldap_server_name);
 297         SAFE_FREE(ads->config.server_site_name);
 298         SAFE_FREE(ads->config.client_site_name);
 299         SAFE_FREE(ads->server.workgroup);

...

307         ads->config.ldap_server_name   = SMB_STRDUP(cldap_reply.pdc_dns_name);
308         ads->config.realm              = SMB_STRDUP(cldap_reply.dns_domain);

...
323         ads->server.workgroup          = SMB_STRDUP(cldap_reply.domain_name);
324
325         ads->ldap.port = gc ? LDAP_GC_PORT : LDAP_PORT;
326         ads->ldap.ss = *ss;

Note how we've carefully set the values of

ads->config.ldap_server_name
ads->ldap.port
ads->ldap.ss

from the CLDAP ping - *everything* we need to
then do an immediate successful connect to
the server that retuned the CLDAP ping.

But note - we *NEVER* set the value of

ads->server.ldap_server.

This is unfortunate, as what it means is ads->server.ldap_server
is left as NULL even after a successful CLDAP ping.

Problem is, ads->server.ldap_server is the value
that is checked inside:

ads_connect()

which is what finally does (another) DNS -> IP
lookup and tries to connect.

Inside ads_connect() we have:

612         if (ads->server.ldap_server) {
613                 bool ok = false;
614                 struct sockaddr_storage ss;
615
616                 ok = resolve_name(ads->server.ldap_server, &ss, 0x20, true);
617                 if (!ok) {
618                         DEBUG(5,("ads_connect: unable to resolve name %s\n",
619                                  ads->server.ldap_server));
620                         status = ADS_ERROR_NT(NT_STATUS_NOT_FOUND);
621                         goto out;
622                 }
623                 ok = ads_try_connect(ads, ads->server.gc, &ss);

....

(another note - checking carefully ads->server.gc
is *never* set to true anywhere, so I think we
can remove it and remove this bool parameter
to ads_try_connect()).

Note we don't first check if we already have
a perfectly good ads->ldap.ss field and have
already looked up the ads->server.ldap_server -> ads->ldap.ss
but ignore it and do it again.

If you look carefully, the *only* place

ads->server.ldap_server

is *actually* set is in source3/libads/ads_struct.c:

130 /*
131   initialise a ADS_STRUCT, ready for some ads_ ops
132 */
133 ADS_STRUCT *ads_init(const char *realm,
134                      const char *workgroup,
135                      const char *ldap_server,
136                      enum ads_sasl_state_e sasl_state)
137 {
138         ADS_STRUCT *ads;
139         int wrap_flags;
140
141         ads = SMB_XMALLOC_P(ADS_STRUCT);
142         ZERO_STRUCTP(ads);
143
144         ads->server.realm = realm? SMB_STRDUP(realm) : NULL;
145         ads->server.workgroup = workgroup ? SMB_STRDUP(workgroup) : NULL;
146         ads->server.ldap_server = ldap_server? SMB_STRDUP(ldap_server) : NULL;

...

So what this means is after the CLDAP ping,
ads->server.ldap_server == NULL

and inside ads_connect(), because ads->server.ldap_server == NULL
we call:

643         ntstatus = ads_find_dc(ads);
644         if (NT_STATUS_IS_OK(ntstatus)) {
645                 goto got_connection;
646         }
647
648         status = ADS_ERROR_NT(ntstatus);
649         goto out;
650
651 got_connection:
652
653         print_sockaddr(addr, sizeof(addr), &ads->ldap.ss);
654         DEBUG(3,("Successfully contacted LDAP server %s\n", addr));

which does *ANOTHER* round of SRV -> server name -> DNS
lookups. This is *REALLY* expensive !!!!!!

We *finally* end up in ads_connect():

688         /* Otherwise setup the TCP LDAP session */
689
690         ads->ldap.ld = ldap_open_with_timeout(ads->config.ldap_server_name,
691                                               &ads->ldap.ss,
692                                               ads->ldap.port, lp_ldap_timeout());
693         if (ads->ldap.ld == NULL) {
694                 status = ADS_ERROR(LDAP_OPERATIONS_ERROR);
695                 goto out;
696         }
697         DEBUG(3,("Connected to LDAP server %s\n", ads->config.ldap_server_name));

Hurrah - we've finally connected...

We've gotten away with this working, as most DNS
setups are sane and work quickly.

However I'm working on an issue where the DNS
is broken, and takes *far* too long to perform
these multi-lookups.

Correct me if I'm wrong - but what we need to
do here is:

(a). Remove ads->server.gc and remove the 'bool gc'
parameter from ads_try_connect().

(b). Inside ads_try_connect() add the lines:

diff --git a/source3/libads/ldap.c b/source3/libads/ldap.c
index d431156912f..1b969b75371 100755
--- a/source3/libads/ldap.c
+++ b/source3/libads/ldap.c
@@ -297,6 +297,7 @@ static bool ads_try_connect(ADS_STRUCT *ads, bool gc,
        SAFE_FREE(ads->config.server_site_name);
        SAFE_FREE(ads->config.client_site_name);
        SAFE_FREE(ads->server.workgroup);
+       SAFE_FREE(ads->server.ldap_server);
 
        if (!check_cldap_reply_required_flags(cldap_reply.server_type,
                                              ads->config.flags)) {
@@ -321,6 +322,7 @@ static bool ads_try_connect(ADS_STRUCT *ads, bool gc,
                        SMB_STRDUP(cldap_reply.client_site);
        }
        ads->server.workgroup          = SMB_STRDUP(cldap_reply.domain_name);
+       ads->server.ldap_server = SMB_STRDUP(cldap_reply.pdc_dns_name);

(c). Inside ads_connect() add the lines:

@@ -609,6 +611,14 @@ ADS_STATUS ads_connect(ADS_STRUCT *ads)
                TALLOC_FREE(s);
        }
 
+       if (!is_zero_addr(&ads->ldap.ss)) {
+               /* We already know who to talk to. */
+               book ok = ads_try_connect(ads, &ads->ldap.ss);
+               if (ok) {
+                       goto got_connection;
+               }
+       }
+
        if (ads->server.ldap_server) {
                bool ok = false;
                struct sockaddr_storage ss;

Do you concur ?

I think doing this will lead to an *ORDER OF MAGNITUDE*
speedup in ads_XXXXX() connections (used by winbindd
as well as the 'net' tools) in flaky DNS situations ?

I'm going to test this in CI and on the customer site,
but I just wanted other eyes to look at this to make
sure my analysis was correct.

Cheers,

Jeremy.



More information about the samba-technical mailing list