[Samba] Odd Samba 4 ("4.2.0pre1-GIT-b505111"; actually only using client) behaviour #2 - "accept: Software caused connection abort".

Thu Dec 12 12:18:44 MST 2013

Volker Lendecke wrote:
> Is the system doing this in a 100% busy loop, or is this happening only under overload?

	Nope - system (server) consistently 98.9% idle.
	Lots of files opened via Samba (typically around the 100-200 mark or so) but not a massive amount of activity with any of them.
	Clients are generally idle as well, as is the network connection between them (typically runs at a few percent utilisation).

> Is smbd just not fast enough accepting connections?

	Nope - not being hit particularly hard so there shouldn't be any issues there.

> Did the client do a RST while waiting for smbd to do the accept()?

	That I can't tell.
	I could try to capture some traffic to see, but unfortunately it's typically quite difficult to catch this problem while it's occurring.

> What does the system expect from an application as a proper reply to this error message?

	As Richard Sharpe said, not a lot that can be done except log it - the connection has gone away at this point.

	My concerns are:
	1.  It could well be indicative of some actual problem in the code, which it would be beneficial to address.
	2.  It typically causes a very large number of messages to be logged, so at the very least could these be logged only at a debug level higher than zero?  It's apparently a perfectly recoverable error, so perhaps it doesn't need to be logged all the time.
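
To illustrate point 2, here is a rough Python sketch of an accept path that treats ECONNABORTED as recoverable and reports it only at debug level. It is not smbd's actual C code; the function name `accept_once` and the logger name are made up for the example.

```python
import errno
import logging

# Hypothetical logger name, used only for this illustration.
log = logging.getLogger("accept_loop_sketch")


def accept_once(listener):
    """Try one accept(); return (conn, addr), or None if the peer already went away."""
    try:
        return listener.accept()
    except ConnectionAbortedError as exc:
        # errno ECONNABORTED: the connection was dropped before accept()
        # could hand it over. There is nothing to clean up, so record it
        # quietly at debug level and let the caller go back to polling.
        name = errno.errorcode.get(exc.errno, exc.errno)
        log.debug("accept: %s (%s)", exc.strerror, name)
        return None
```

A caller would simply skip a `None` result and return to its poll loop, so the message appears only when debug logging is enabled.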

I'll see whether I can capture some network traffic when the problem is occurring to see whether it is actually a client RST or whether it's a problem (of whatever sort) on the server itself.
Then at least we'll know whether something has triggered the connection closing (that the server didn't expect) or whether it's the server itself being silly about dropping a perfectly valid connection.
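
If catching it live proves too hard, one way to test the client-RST theory might be to provoke a reset deliberately and watch what the server logs. The Python sketch below is only an assumption about how to do that on a Unix-like client: it connects and then closes with SO_LINGER set to a zero timeout, which normally makes the close abort the connection with an RST instead of a FIN. The function name `connect_and_reset` is hypothetical, and whether the server's accept() then reports ECONNABORTED depends on the server OS and on timing.

```python
import socket
import struct


def connect_and_reset(host, port):
    """Connect to (host, port), then abort the connection instead of closing it cleanly."""
    s = socket.create_connection((host, port), timeout=5)
    # struct linger { l_onoff = 1, l_linger = 0 } (two C ints on Unix-like
    # systems): close() discards unsent data and aborts the connection,
    # which on common TCP stacks sends an RST rather than a FIN.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    s.close()
```

Running something like `connect_and_reset("server", 445)` in a loop against a test server (the host name here is a placeholder) should show whether resets from the client side are enough to produce the message.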

Many thanks, and regards,

Tris.

-------- Original Message --------
On Wed, Dec 11, 2013 at 04:36:44PM -0800, Richard Sharpe wrote:
> > # (dig one of the ECONNABORTED messages out; they're all of the same form) ...
> > 224:    pollsys(0x08088D48, 8, 0xFEFFDF58, 0x00000000) (sleeping...)
> > 224:            fd=39 ev=POLLIN|POLLHUP rev=0
> > 224:            fd=38 ev=POLLIN|POLLHUP rev=0
> > 224:            fd=34 ev=POLLIN|POLLHUP rev=0
> > 224:            fd=36 ev=POLLIN|POLLHUP rev=0
> > 224:            fd=37 ev=POLLIN|POLLHUP rev=0
> > 224:            fd=35 ev=POLLIN|POLLHUP rev=0
> > 224:            fd=33 ev=POLLIN|POLLHUP rev=0
> > 224:            fd=6  ev=POLLIN|POLLHUP rev=0
> > 224:            timeout: 49.964000000 sec
> > 224:    pollsys(0x08088D48, 8, 0xFEFFDF58, 0x00000000)  = 1
> > 224:            fd=39 ev=POLLIN|POLLHUP rev=0
> > 224:            fd=38 ev=POLLIN|POLLHUP rev=0
> > 224:            fd=34 ev=POLLIN|POLLHUP rev=0
> > 224:            fd=36 ev=POLLIN|POLLHUP rev=0
> > 224:            fd=37 ev=POLLIN|POLLHUP rev=POLLIN
> > 224:            fd=35 ev=POLLIN|POLLHUP rev=0
> > 224:            fd=33 ev=POLLIN|POLLHUP rev=0
> > 224:            fd=6  ev=POLLIN|POLLHUP rev=0
> > 224:            timeout: 49.964000000 sec
> > 224:    accept(37, 0xFEFFDE0C, 0xFEFFDDF8, SOV_DEFAULT) Err#130 ECONNABORTED
> > ...
> > #
> >
> > So it's *always* happening on a socket listening on port 445, never
> > on 139 (or any other port) and it's happening on *every* socket
> > listening on port 445 (3 interfaces, 3 sockets listening on that
> > port).  That makes it unlikely to be a resource issue (at client or
> > server end) or it would be unlikely to be that port specific, so
> > perhaps some protocol corner-case?
> > Or, of course, just Solaris being moronic (wouldn't be unheard of ...)?
> >
> > I don't know whether that helps anyone at all?
> >
> > Cheers, and season's greetings (and all that ...),
> 
> We've seen this on FreeBSD. We increased the listen backlog to be the 
> same as the system max and the messages went away.

Is the system doing this in a 100% busy loop, or is this happening only under overload? Is smbd just not fast enough accepting connections? Did the client do a RST while waiting for smbd to do the accept()?

What does the system expect from an application as a proper reply to this error message?

On FreeBSD, in listen(2) I read about kern.ipc.somaxconn, which on 9.2 seems to default to 128. Is that the one we are supposed to read?
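
For illustration, raising the backlog to the system maximum could look roughly like the Python sketch below. It is an assumption-laden example, not what smbd does: `open_listener` is a hypothetical helper, and `socket.SOMAXCONN` is the constant from the system headers, which is not necessarily the same as the runtime value of kern.ipc.somaxconn.

```python
import socket


def open_listener(host, port, backlog=socket.SOMAXCONN):
    """Return a TCP socket listening on (host, port) with the requested backlog."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    # Ask for the largest backlog the headers advertise; kernels typically
    # cap the request at their own runtime limit.
    srv.listen(backlog)
    return srv
```

Reading the runtime limit itself (for example via sysctl on FreeBSD) would be a separate, platform-specific step that this sketch does not attempt.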

Thanks,

Volker