[Samba] Odd Samba 4 ("4.2.0pre1-GIT-b505111"; actually only using client) behaviour #2 - "accept: Software caused connection abort".

Thu Dec 12 08:47:18 MST 2013

On Thu, Dec 12, 2013 at 4:11 AM, Volker Lendecke
<Volker.Lendecke at sernet.de> wrote:
> On Wed, Dec 11, 2013 at 04:36:44PM -0800, Richard Sharpe wrote:
>> > # (dig one of the ECONNABORTED messages out; they're all of the same form)
>> > ...
>> > 224:    pollsys(0x08088D48, 8, 0xFEFFDF58, 0x00000000) (sleeping...)
>> > 224:            fd=39 ev=POLLIN|POLLHUP rev=0
>> > 224:            fd=38 ev=POLLIN|POLLHUP rev=0
>> > 224:            fd=34 ev=POLLIN|POLLHUP rev=0
>> > 224:            fd=36 ev=POLLIN|POLLHUP rev=0
>> > 224:            fd=37 ev=POLLIN|POLLHUP rev=0
>> > 224:            fd=35 ev=POLLIN|POLLHUP rev=0
>> > 224:            fd=33 ev=POLLIN|POLLHUP rev=0
>> > 224:            fd=6  ev=POLLIN|POLLHUP rev=0
>> > 224:            timeout: 49.964000000 sec
>> > 224:    pollsys(0x08088D48, 8, 0xFEFFDF58, 0x00000000)  = 1
>> > 224:            fd=39 ev=POLLIN|POLLHUP rev=0
>> > 224:            fd=38 ev=POLLIN|POLLHUP rev=0
>> > 224:            fd=34 ev=POLLIN|POLLHUP rev=0
>> > 224:            fd=36 ev=POLLIN|POLLHUP rev=0
>> > 224:            fd=37 ev=POLLIN|POLLHUP rev=POLLIN
>> > 224:            fd=35 ev=POLLIN|POLLHUP rev=0
>> > 224:            fd=33 ev=POLLIN|POLLHUP rev=0
>> > 224:            fd=6  ev=POLLIN|POLLHUP rev=0
>> > 224:            timeout: 49.964000000 sec
>> > 224:    accept(37, 0xFEFFDE0C, 0xFEFFDDF8, SOV_DEFAULT) Err#130 ECONNABORTED
>> > ...
>> > #
>> >
>> > So it's *always* happening on a socket listening on port 445, never on 139
>> > (or any other port) and it's happening on *every* socket listening on port
>> > 445 (3 interfaces, 3 sockets listening on that port).  That makes it
>> > unlikely to be a resource issue (at client or server end) or it would be
>> > unlikely to be that port specific, so perhaps some protocol corner-case?
>> > Or, of course, just Solaris being moronic (wouldn't be unheard of ...)?
>> >
>> > I don't know whether that helps anyone at all?
>> >
>> > Cheers, and season's greetings (and all that ...),
>>
>> We've seen this on FreeBSD. We increased the listen backlog to be the
>> same as the system max and the messages went away.
>
> Is the system doing this in a 100% busy loop, or is this
> happening only under overload? Is smbd just not fast enough
> accepting connections? Did the client do a RST while waiting
> for smbd to do the accept()?

From browsing the code, we think it might be a slight bug in the
FreeBSD TCP code: it ACKed the SYNs up to 128, but then at some later
point in the listen processing it discovered that the number of
outstanding connections was too large and dropped the difference
between the application backlog and the system max, or something like
that.

> What does the system expect from an application as a proper
> reply to this error message?

As far as I know there is nothing you can do at that point, because the
socket for that connection is already gone. The listening socket itself
is still usable, so the current informational message is about the best
that can be done.
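
For illustration only, here is a minimal Python sketch of that handling (not smbd's actual code, which is C). It assumes a blocking listening socket; the helper name accept_one and the logger name are hypothetical. ECONNABORTED from accept() is logged and treated as "no connection this time", so the caller simply goes back to polling.

```python
import errno
import logging
import socket

log = logging.getLogger("listener")  # hypothetical logger name


def accept_one(listener):
    """Accept one connection, or return None if it was aborted first.

    ECONNABORTED means the pending connection went away before accept()
    could hand it over. Nothing is left to clean up, and the listening
    socket stays usable, so we log and let the caller poll again.
    """
    try:
        conn, addr = listener.accept()
    except ConnectionAbortedError as exc:  # OSError with errno ECONNABORTED
        log.info("accept: %s", exc.strerror)
        return None
    return conn, addr
```

Any other OSError is deliberately left to propagate, since only the aborted-connection case is known to be harmless here.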

> On FreeBSD, in listen(2) I read about kern.ipc.somaxconn,
> which on 9.2 seems to default to 128. Is that the one we are
> supposed to read?

Yes, I think so. We just hard-coded it to 128 because we did not think
we were going to change that value.
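
As a rough sketch of the alternative to hard-coding, the backlog could be taken from the system limit at startup. The Python below is an illustration under assumptions, not what we run: it shells out to `sysctl -n kern.ipc.somaxconn` (the FreeBSD name discussed above) and falls back to the compile-time socket.SOMAXCONN wherever that sysctl is missing. The helper names system_backlog_limit and open_listener are hypothetical.

```python
import socket
import subprocess


def system_backlog_limit(default=socket.SOMAXCONN):
    """Best-effort read of the kernel's listen-queue limit.

    Assumes a FreeBSD-style sysctl named kern.ipc.somaxconn; on systems
    without it (or without a sysctl binary) the default is returned.
    """
    try:
        out = subprocess.run(
            ["sysctl", "-n", "kern.ipc.somaxconn"],
            capture_output=True, text=True, check=True, timeout=5,
        ).stdout
        return int(out.strip())
    except (OSError, subprocess.SubprocessError, ValueError):
        return default


def open_listener(host, port):
    """Open a TCP listener whose backlog matches the system limit."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(system_backlog_limit())
    return sock
```

The kernel caps the backlog at its own limit anyway, so asking for exactly that value mainly avoids the application asking for less than the system allows.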

-- 
Regards,
Richard Sharpe
(何以解憂？唯有杜康。--曹操)