How to detect a client-closed connection during a write from our LDAP server?
Tom Talpey
tom at talpey.com
Fri Oct 14 13:20:26 UTC 2022
On 10/13/2022 11:23 AM, Stefan Metzmacher via samba-technical wrote:
> Hi Andrew,
>
>>> Ah, we register tstream_bsd_fde_handler() with
>>> TEVENT_FD_READ|TEVENT_FD_WRITE
>>> and have this logic:
>>>
>>> static void tstream_bsd_fde_handler(struct tevent_context *ev,
>>> struct tevent_fd *fde,
>>> uint16_t flags,
>>> void *private_data)
>>> {
>>> struct tstream_bsd *bsds =
>>> talloc_get_type_abort(private_data,
>>> struct tstream_bsd);
>>>
>>> if (flags & TEVENT_FD_WRITE) {
>>> bsds->writeable_handler(bsds->writeable_private);
>>> return;
>>> }
>>> if (flags & TEVENT_FD_READ) {
>>> if (!bsds->readable_handler) {
>>> if (bsds->writeable_handler) {
>>>
>>> =============> here we have the loop
>>>
>>
>> Awesome debugging!
>
>
> I created https://bugzilla.samba.org/show_bug.cgi?id=15202 for the problem.
>
> And I understand what happens in the kernel...
>
> tcp_fin() has this:
> struct tcp_sock *tp = tcp_sk(sk);
>
> inet_csk_schedule_ack(sk);
>
> sk->sk_shutdown |= RCV_SHUTDOWN;
> sock_set_flag(sk, SOCK_DONE);
>
> switch (sk->sk_state) {
> case TCP_SYN_RECV:
> case TCP_ESTABLISHED:
> /* Move to CLOSE_WAIT */
> tcp_set_state(sk, TCP_CLOSE_WAIT);
> inet_csk_enter_pingpong_mode(sk);
> break;
>
>
> It means RCV_SHUTDOWN gets set as well as TCP_CLOSE_WAIT, but
> sk->sk_err is not changed to indicate an error.
This is correct, because the TCP connection is in "half-closed" state.
The peer has closed, but the outgoing stream is still open. The TCP
protocol has supported this since forever.
This is not a transitory state. The connection can remain in it forever.
The peer is now in FIN_WAIT_2 and will send no further data. It's
waiting for our FIN, and in turn the local socket is waiting for a
close() call to do so. But pretty much any other socket operation
can still be performed.
> tcp_sendmsg_locked has this:
> ...
> err = -EPIPE;
> if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
> goto do_error;
>
> while (msg_data_left(msg)) {
> int copy = 0;
>
> skb = tcp_write_queue_tail(sk);
> if (skb)
> copy = size_goal - skb->len;
>
> if (copy <= 0 || !tcp_skb_can_collapse_to(skb)) {
> bool first_skb;
>
> new_segment:
> if (!sk_stream_memory_free(sk))
> goto wait_for_space;
>
> ...
>
> wait_for_space:
> set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
> if (copied)
> tcp_push(sk, flags & ~MSG_MORE, mss_now,
> TCP_NAGLE_PUSH, size_goal);
>
> err = sk_stream_wait_memory(sk, &timeo);
> if (err != 0)
> goto do_error;
>
> It means if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN)) doesn't
> hit as we only have RCV_SHUTDOWN and sk_stream_wait_memory returns -EAGAIN.
Probably because the peer has stopped reading the socket. FIN_WAIT_2 is
a super-problematic state, because the only way to exit it is to receive
a FIN or RST, which we're evidently not sending. Most implementations
run a timer as failsafe, but it's always rather long (minutes).
> And tcp_poll has this:
>
> if (sk->sk_shutdown & RCV_SHUTDOWN)
> mask |= EPOLLIN | EPOLLRDNORM | EPOLLRDHUP;
>
> So we'll get EPOLLIN | EPOLLRDNORM | EPOLLRDHUP triggering TEVENT_FD_READ
> and writev/sendmsg keep getting EAGAIN.
>
> metze
I think the code needs to detect the half-close and give up. It's not
going to happen promptly any other way.
I may have missed some other message - is a fix proposed?
Tom.
>
>
>>> bsds->writeable_handler(bsds-
>>>> writeable_private);
>>> return;
>>> }
>>> TEVENT_FD_NOT_READABLE(bsds->fde);
>>> return;
>>> }
>>> bsds->readable_handler(bsds->readable_private);
>>> return;
>>> }
>>> }
>>>
>>> We call the writeable handler (which is non-blocking) when we get
>>> TEVENT_FD_READ
>>> because we assume it will get an error if TEVENT_FD_READ was
>>> generated by
>>> EPOLLERR. I think moving TEVENT_FD_NOT_READABLE(bsds->fde); before
>>> if (bsds->writeable_handler) would make sure we only try that path
>>> once
>>> and not many times in a busy loop.
>>
>> How would we then, in this case, detect that the socket is closed?
>>
>> While we would start processing other connections, wouldn't we change
>> to a not-busy wait never closing the socket as there is still data to
>> send but never getting to read the EOF?
>>
>> How can we proceed to closing the socket?
>>
>>> And in ldapsrv_call_writev_start() we may want to use
>>> tevent_req_set_endtime() on
>>> the subreq of tstream_writev_queue_send(), so that
>>> ldapsrv_call_writev_done()
>>> will get ETIMEDOUT from tstream_writev_queue_recv() and call
>>> ldapsrv_terminate_connection().
>>
>> Do we have a good precedent for what the send timeout should be?
>>
>> I can't see any good examples elsewhere in the code sadly.
>>
>> One option is to reuse the search timeout, so that a query could at
>> most take 240sec, eg 2x 120sec, once for the search itself and once to
>> send over the network. This would give only one knob to change (which
>> is good and bad).
>>
>> In the meantime I'm going to try and mock this up in a self-contained
>> cmocka test.
>>
>> Andrew Bartlett
>
>
>
More information about the samba-technical
mailing list