How to detect a client-closed connection during a write from our LDAP server?

Fri Oct 14 14:03:47 UTC 2022

On 14.10.22 at 15:52, Tom Talpey wrote:
> On 10/14/2022 9:45 AM, Stefan Metzmacher wrote:
>> Hi Tom,
>>
>>>> It means RCV_SHUTDOWN gets set as well as TCP_CLOSE_WAIT, but
>>>> sk->sk_err is not changed to indicate an error.
>>>
>>> This is correct, because the TCP connection is in "half-closed" state.
>>> The peer has closed, but the outgoing stream is still open. The TCP
>>> protocol has supported this since forever.
>>>
>>> This is not a transitory state. The connection can remain in it forever.
>>> The peer is now in FIN_WAIT_2 and will send no further data. It's
>>> waiting for our FIN, and in turn the local socket is waiting for a
>>> close() call to do so. But pretty much any other socket operation
>>> can still be performed.
>>
>> Thanks for the explanation!
>>
>>>> It means if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN)) doesn't
>>>> hit as we only have RCV_SHUTDOWN and sk_stream_wait_memory returns -EAGAIN.
>>>
>>> Probably because the peer has stopped reading the socket. FIN_WAIT_2 is
>>> a super-problematic state, because the only way to exit it is to receive
>>> a FIN or RST, which we're evidently not sending. Most implementations
>>> run a timer as failsafe, but it's always rather long (minutes).
>>
>> Yes, we need 'socket options' with TCP_KEEPCNT, TCP_KEEPIDLE, TCP_KEEPINTVL and/or TCP_USER_TIMEOUT
>> and/or a user space timer in order to have lower timeouts.
> 
> That won't help. The peer is there, and the connection is up.
> The keepalive will succeed! Even if it failed, it's not prompt,
> and reducing the KEEPINTVL is a very bad idea. Servers should not
> be pinging their clients in any event.
> 
> What peer is doing this? Most Windows clients will perform an
> abortive close, but this one is doing it gracefully. The
> server should deal with either, of course, so I'm mostly just
> curious.

I guess the client is gone, or it is waiting for our FIN,ACK
but no longer acks the data in our send queue, which we most likely try
to send out before sending the FIN,ACK.
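
For the "no longer acks the data" case, a minimal sketch (Python, Linux only)
of what setting TCP_USER_TIMEOUT on an accepted socket could look like. The
helper name set_user_timeout and the 30 second value are made up for
illustration and are not taken from the Samba code. As far as I understand
tcp(7), the option bounds how long transmitted data may stay unacknowledged
before the kernel fails the connection with ETIMEDOUT. Whether that is the
right knob for this report is an assumption, as the problem has not been
reproduced.

```python
import socket

def set_user_timeout(sock, timeout_ms):
    """Ask the kernel to fail the connection if sent data stays
    unacknowledged for longer than timeout_ms (Linux TCP_USER_TIMEOUT).

    Hypothetical helper for illustration only.
    """
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, timeout_ms)

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    set_user_timeout(s, 30000)  # 30 seconds, an arbitrary example value
    # Read the value back to confirm the kernel accepted it.
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT))
    s.close()
```

Once the timeout fires, a blocked or later send() should return an error
instead of -EAGAIN, which gives the server a reason to close the socket.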

But I only have the information from the public mails and I haven't
tried to reproduce it.

 From https://lists.samba.org/archive/samba/2022-September/241873.html:
 > As clients we have some NetAPP-FAS running which doing the auth. via LDAP. On NetApp timeouts for LDAP are set to 3sec per default.
 >
 > Some queries seem to need more time to answer so the client tries to close the connection but the (samba-)server-part leaves the socket open in CLOSE_WAIT.
 >
 > In some of such cases the corresponding process (ldap-worker) runs forever(?) with 100% cpu. A strace shows the ldap-worker pushing some info (the answer?)
 > to the socket. If one let it go the server slows down gradually while more and more connections stay in CLOSE_WAIT.
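
To illustrate the half-closed state discussed above from user space, here is
a minimal sketch (Python) of how a server could check for the peer's FIN
before it keeps pushing a response. The function name peer_has_closed is
hypothetical and this is not how the Samba LDAP server does it. It relies on
recv() returning zero bytes once the peer has shut down its sending side and
all pending data has been read.

```python
import socket

def peer_has_closed(sock):
    """Return True if the peer has closed its sending side (FIN seen)
    and there is no unread data left. Hypothetical helper.
    """
    try:
        # Peek without consuming and without blocking.
        data = sock.recv(1, socket.MSG_PEEK | socket.MSG_DONTWAIT)
    except (BlockingIOError, InterruptedError):
        return False   # nothing to read yet, peer has not closed
    except ConnectionError:
        return True    # reset or similar hard error
    return data == b""  # zero bytes means end of stream

if __name__ == "__main__":
    # socketpair() stands in for a client/server connection here.
    server, client = socket.socketpair()
    print(peer_has_closed(server))    # client still open
    client.shutdown(socket.SHUT_WR)   # client sends its "FIN"
    print(peer_has_closed(server))    # server now sees end of stream
    server.sendall(b"x")              # writing is still allowed
    print(client.recv(1))
```

Note that the check only reports the close once earlier request data has
been consumed. If unread requests are still queued, poll() with POLLRDHUP
on Linux may be the better fit, as it reports the shutdown independently of
pending data.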

metze