[PATCH] finddcs_ldap: make sure we process all the addresses

Wed May 20 02:54:21 MDT 2015

Thanks Alex for chasing this one down and fixing it. I am sure the same 
problem will exist elsewhere, should we do a search for similar code 
scenarios?

David
------------------------------------------------------------------------
Dr David Holder CEng FIET MIEEE

Erion Ltd, An Cala, Inverkirkaig, Lochinver, Sutherland, IV27 4LR, UK

Reception: +44 (0)1422 207000

Direct Dial: +44 (0)131 2026317

http://www.erion.co.uk

Registered in England and Wales. Registered Number 3521142
Registered Office: Newtons, 49 Coniscliffe Road, Darlington, County 
Durham, DL3 7EH, UK
VAT Number: GB 698 3633 78

On 20/05/2015 09:43, Stefan (metze) Metzmacher wrote:
> Hi Alexander,
>
>> this bug was found by Red Hat's QE few years ago but we never were able
>> to find out why things failed apart from a vague notion of 'it is IPv6
>> related'. In last few weeks its head raised again and I think I finally
>> found the reason.
>>
>> When we do CLDAP-based resolution of the AD DCs, one can specify exact
>> server to probe or rely on SRV records from the DNS. If SRV records
>> return few DCs, we are supposed to try them one by one, until we get
>> successful repsonse. We start by running (essentially) getaddrinfo() and
>> then creating a UDP socket to send the CLDAP request. We then switch to
>> the next AD DC to try if returned result from the current request is not
>> what we wanted.
>>
>> However, if CLDAP request wasn't sent in the first place, the whole
>> tevent request is cancelled. As result, no attempt is made to look at
>> the other AD DCs.
>>
>> This may happen, for example, if one of AD DCs is running on IPv6
>> address and the machine which is initiating CLDAP ping has no IPv6
>> connectivity. In this case we would get 'network unreachable' or a
>> similar error and would fail to communicate. A similar behavior could be
>> due to a firewall configuration, for example.
>>
>> The solution is to make sure we are switching to the next address in
>> case of failure. I've discussed this with Simo, Guenther, and David
>> Holder, and the general consensus is to do just that for network-related
>> issues but I also think we should be skipping to the next address even
>> in the case of memory failures.
>>
>> Attached patch should fix the issue.
>> -- / Alexander Bokovoy
>>
>>
>> 0001-s4-libcli-finddcs_cldap-continue-processing-CLDAP-un.patch
>>
>>
>>  From f5e10e64f7426fa063c495cc3310b0a1b01179a3 Mon Sep 17 00:00:00 2001
>> From: Alexander Bokovoy <ab at samba.org>
>> Date: Wed, 20 May 2015 11:17:38 +0300
>> Subject: [PATCH] s4: libcli/finddcs_cldap: continue processing CLDAP until all
>>   addresses are used
>>
>> This is a subtle bug that causes CLDAP pings to fail if SRV records
>> discovered cannot be resolved or connection to them cannot be
>> established. The code that fires up CLDAP ping will silently cancel
>> the whole tevent request without going to the next server in the queue.
>>
>> This may happen, for example, when connection to IPv6 addresses couldn't
>> be established, or when IPv4 address is not online or blocked by
>> firewall.
>>
>> Signed-off-by: Alexander Bokovoy <ab at samba.org>
>> ---
>>   source4/libcli/finddcs_cldap.c | 10 ++++++++++
>>   1 file changed, 10 insertions(+)
>>
>> diff --git a/source4/libcli/finddcs_cldap.c b/source4/libcli/finddcs_cldap.c
>> index ce0e1c7..02513f6 100644
>> --- a/source4/libcli/finddcs_cldap.c
>> +++ b/source4/libcli/finddcs_cldap.c
>> @@ -264,17 +264,23 @@ static void finddcs_cldap_next_server(struct finddcs_cldap_state *state)
>>   		status = map_nt_error_from_unix_common(errno);
>>   	}
>>   	if (tevent_req_nterror(state->req, status)) {
>> +		state->srv_address_index++;
>> +		finddcs_cldap_next_server(state);
>>   		return;
>>   	}
> I guess this should be:
>
>   	if (!NT_STATUS_IS_OK(status)) {
> +		state->srv_address_index++;
> +		finddcs_cldap_next_server(state);
>    		return;
>    	}
>
> Otherwise tevent_req_nterror() or tevent_req_nomem() already finish the
> request
> and notify the callers callback which calls finddcs_cldap_recv() which
> destroys
> 'state'.
>
> We may want to record 'status' in the state so that
> finddcs_cldap_next_server() can report it if there're no more addresses
> to try.
>
> metze
>