Massively excessive DNS lookups in ads_XXXX code.
Jeremy Allison
jra at samba.org
Wed Jul 22 22:29:45 UTC 2020
Hi all,
Inside ADS_STRUCT we have:
struct {
char *realm;
char *workgroup;
char *ldap_server;
bool gc; /* Is this a global catalog server? */
bool no_fallback; /* Bail if the ldap_server is not available */
} server;
and also:
/* info derived from the servers config */
struct {
uint32_t flags; /* cldap flags identifying the services. */
char *realm;
char *bind_path;
char *ldap_server_name;
char *server_site_name;
char *client_site_name;
time_t current_time;
char *schema_path;
char *config_path;
int ldap_page_size;
} config;
Inside server we have:
server.ldap_server
which I'm assuming is the DNS name of
the server we should be talking to.
But we *also* have:
config.ldap_server_name
which I'm assuming is the DNS name
of the LDAP server we're told to use
by a configuration parameter.
When we're looking for DC's we get
a list of possible DC IP addresses
and then do:
cldap_ping_list()
which calls ads_try_connect()
for all the IP addresses we found in
turn.
Inside ads_try_connect() we call
ads_cldap_netlogon_5()
and if it's successful (i.e. it
passes muster as a DC we should
be talking to) then in ads_try_connect we do:
292 /* Fill in the ads->config values */
293
294 SAFE_FREE(ads->config.realm);
295 SAFE_FREE(ads->config.bind_path);
296 SAFE_FREE(ads->config.ldap_server_name);
297 SAFE_FREE(ads->config.server_site_name);
298 SAFE_FREE(ads->config.client_site_name);
299 SAFE_FREE(ads->server.workgroup);
...
307 ads->config.ldap_server_name = SMB_STRDUP(cldap_reply.pdc_dns_name);
308 ads->config.realm = SMB_STRDUP(cldap_reply.dns_domain);
...
323 ads->server.workgroup = SMB_STRDUP(cldap_reply.domain_name);
324
325 ads->ldap.port = gc ? LDAP_GC_PORT : LDAP_PORT;
326 ads->ldap.ss = *ss;
Note how we've carefully set the values of
ads->config.ldap_server_name
ads->ldap.port
ads->ldap.ss
from the CLDAP ping - *everything* we need to
then do an immediate successful connect to
the server that retuned the CLDAP ping.
But note - we *NEVER* set the value of
ads->server.ldap_server.
This is unfortunate, as what it means is ads->server.ldap_server
is left as NULL even after a successful CLDAP ping.
Problem is, ads->server.ldap_server is the value
that is checked inside:
ads_connect()
which is what finally does (another) DNS -> IP
lookup and tries to connect.
Inside ads_connect() we have:
612 if (ads->server.ldap_server) {
613 bool ok = false;
614 struct sockaddr_storage ss;
615
616 ok = resolve_name(ads->server.ldap_server, &ss, 0x20, true);
617 if (!ok) {
618 DEBUG(5,("ads_connect: unable to resolve name %s\n",
619 ads->server.ldap_server));
620 status = ADS_ERROR_NT(NT_STATUS_NOT_FOUND);
621 goto out;
622 }
623 ok = ads_try_connect(ads, ads->server.gc, &ss);
....
(another note - checking carefully ads->server.gc
is *never* set to true anywhere, so I think we
can remove it and remove this bool parameter
to ads_try_connect()).
Note we don't first check if we already have
a perfectly good ads->ldap.ss field and have
already looked up the ads->server.ldap_server -> ads->ldap.ss
but ignore it and do it again.
If you look carefully, the *only* place
ads->server.ldap_server
is *actually* set is in source3/libads/ads_struct.c:
130 /*
131 initialise a ADS_STRUCT, ready for some ads_ ops
132 */
133 ADS_STRUCT *ads_init(const char *realm,
134 const char *workgroup,
135 const char *ldap_server,
136 enum ads_sasl_state_e sasl_state)
137 {
138 ADS_STRUCT *ads;
139 int wrap_flags;
140
141 ads = SMB_XMALLOC_P(ADS_STRUCT);
142 ZERO_STRUCTP(ads);
143
144 ads->server.realm = realm? SMB_STRDUP(realm) : NULL;
145 ads->server.workgroup = workgroup ? SMB_STRDUP(workgroup) : NULL;
146 ads->server.ldap_server = ldap_server? SMB_STRDUP(ldap_server) : NULL;
...
So what this means is after the CLDAP ping,
ads->server.ldap_server == NULL
and inside ads_connect(), because ads->server.ldap_server == NULL
we call:
643 ntstatus = ads_find_dc(ads);
644 if (NT_STATUS_IS_OK(ntstatus)) {
645 goto got_connection;
646 }
647
648 status = ADS_ERROR_NT(ntstatus);
649 goto out;
650
651 got_connection:
652
653 print_sockaddr(addr, sizeof(addr), &ads->ldap.ss);
654 DEBUG(3,("Successfully contacted LDAP server %s\n", addr));
which does *ANOTHER* round of SRV -> server name -> DNS
lookups. This is *REALLY* expensive !!!!!!
We *finally* end up in ads_connect():
688 /* Otherwise setup the TCP LDAP session */
689
690 ads->ldap.ld = ldap_open_with_timeout(ads->config.ldap_server_name,
691 &ads->ldap.ss,
692 ads->ldap.port, lp_ldap_timeout());
693 if (ads->ldap.ld == NULL) {
694 status = ADS_ERROR(LDAP_OPERATIONS_ERROR);
695 goto out;
696 }
697 DEBUG(3,("Connected to LDAP server %s\n", ads->config.ldap_server_name));
Hurrah - we've finally connected...
We've gotten away with this working, as most DNS
setups are sane and work quickly.
However I'm working on an issue where the DNS
is broken, and takes *far* too long to perform
these multi-lookups.
Correct me if I'm wrong - but what we need to
do here is:
(a). Remove ads->server.gc and remove the 'bool gc'
parameter from ads_try_connect().
(b). Inside ads_try_connect() add the lines:
diff --git a/source3/libads/ldap.c b/source3/libads/ldap.c
index d431156912f..1b969b75371 100755
--- a/source3/libads/ldap.c
+++ b/source3/libads/ldap.c
@@ -297,6 +297,7 @@ static bool ads_try_connect(ADS_STRUCT *ads, bool gc,
SAFE_FREE(ads->config.server_site_name);
SAFE_FREE(ads->config.client_site_name);
SAFE_FREE(ads->server.workgroup);
+ SAFE_FREE(ads->server.ldap_server);
if (!check_cldap_reply_required_flags(cldap_reply.server_type,
ads->config.flags)) {
@@ -321,6 +322,7 @@ static bool ads_try_connect(ADS_STRUCT *ads, bool gc,
SMB_STRDUP(cldap_reply.client_site);
}
ads->server.workgroup = SMB_STRDUP(cldap_reply.domain_name);
+ ads->server.ldap_server = SMB_STRDUP(cldap_reply.pdc_dns_name);
(c). Inside ads_connect() add the lines:
@@ -609,6 +611,14 @@ ADS_STATUS ads_connect(ADS_STRUCT *ads)
TALLOC_FREE(s);
}
+ if (!is_zero_addr(&ads->ldap.ss)) {
+ /* We already know who to talk to. */
+ book ok = ads_try_connect(ads, &ads->ldap.ss);
+ if (ok) {
+ goto got_connection;
+ }
+ }
+
if (ads->server.ldap_server) {
bool ok = false;
struct sockaddr_storage ss;
Do you concur ?
I think doing this will lead to an *ORDER OF MAGNITUDE*
speedup in ads_XXXXX() connections (used by winbindd
as well as the 'net' tools) in flaky DNS situations ?
I'm going to test this in CI and on the customer site,
but I just wanted other eyes to look at this to make
sure my analysis was correct.
Cheers,
Jeremy.
More information about the samba-technical
mailing list