[Samba] Odd Samba 4 ("4.2.0pre1-GIT-b505111"; actually only using client) behaviour #2 - "accept: Software caused connection abort".

Jeremy Allison jra at samba.org
Thu Aug 29 13:52:19 MDT 2013


On Thu, Aug 29, 2013 at 10:10:38AM +0100, Tris Mabbs wrote:
> Hiya Andrew,
> 
> Many thanks for the typically helpful and comprehensive reply :-)
> 
> > I think that's probably the right track :-)
> >
> > The code here is triggered when poll() indicates that the socket is
> > readable. This socket should only be readable when a new connection is
> > being made, and accept() should succeed.
> > ...
> > So, my only conclusion is that your box momentarily does not have the
> > resources to accept the connection, and because there isn't the sleep()
> > in the source3 code, it prints this in a loop until the resources
> > become available.
> 
> Absolutely, and on any normal Unix implementation I'd agree entirely.  That
> sort of "poll()"/"accept()"/... code is perfectly normal and exactly what
> you'd expect - I've written plenty of very similar code myself over the
> years ...
> However this is "Solaris" :-(
> 
> Caught in the act:
> 
> ...
> 16327:     pollsys(0x0809B4D0, 8, 0xFEFFDF18, 0x00000000)  = 1
> 16327:          fd=39 ev=POLLIN|POLLHUP rev=0
> 16327:          fd=38 ev=POLLIN|POLLHUP rev=0
> 16327:          fd=34 ev=POLLIN|POLLHUP rev=0
> 16327:          fd=36 ev=POLLIN|POLLHUP rev=0
> 16327:          fd=37 ev=POLLIN|POLLHUP rev=POLLIN
> 16327:          fd=35 ev=POLLIN|POLLHUP rev=0
> 16327:          fd=33 ev=POLLIN|POLLHUP rev=0
> 16327:          fd=6  ev=POLLIN|POLLHUP rev=0
> 16327:          timeout: 59.999000000 sec
> 16327:     accept(37, 0xFEFFDDCC, 0xFEFFDDB8, SOV_DEFAULT) = 41
> 16327:          AF_INET  name = X.X.X.X  port = 28986
> 16327:     forkx(0)                        = 26942
> 16327:     lwp_sigmask(SIG_SETMASK, 0x00011080, 0x00000000, 0x00000000, 0x00000000) = 0xFFBFFEFF [0xFFFFFFFF]
> 16327:     close(41)                       = 0
> 16327:     pollsys(0x0809B4D0, 8, 0xFEFFDF18, 0x00000000)  = 1
> 16327:          fd=39 ev=POLLIN|POLLHUP rev=0
> 16327:          fd=38 ev=POLLIN|POLLHUP rev=0
> 16327:          fd=34 ev=POLLIN|POLLHUP rev=0
> 16327:          fd=36 ev=POLLIN|POLLHUP rev=0
> 16327:          fd=35 ev=POLLIN|POLLHUP rev=POLLIN
> 16327:          fd=33 ev=POLLIN|POLLHUP rev=0
> 16327:          fd=6  ev=POLLIN|POLLHUP rev=0
> 16327:          fd=37 ev=POLLIN|POLLHUP rev=0
> 16327:          timeout: 44.696000000 sec
> 16327:     accept(35, 0xFEFFDDCC, 0xFEFFDDB8, SOV_DEFAULT) Err#130 ECONNABORTED
> ...
> 
> So there's nothing odd about the "poll()".  Typically Solaris will flag
> POLLERR in "revents" if it's out of resources, and POLLHUP if the remote end
> closed the connection before it was fully established (remote NAKed, or
> ignored, the connection SYN; terminally low on resources at t'other end of
> the socket; ...).  Neither is happening here which would suggest things are
> proceeding as normal for the connection establishment.
> 
> The server darn' well shouldn't be out of any resources either.  In terms of
> physical resources, at the point that occurred the CPUs were at 99.9% idle,
> there was 15GB of free RAM (so not out of kernel memory then ...) and only a
> total of about 400 sockets (TCP, Unix, ...) in use across the entire system,
> as reported by "netstat -na | wc -l" - well below peak levels seen on this
> system.
> 
> So it's going to be that hypothetical Solaris specific
> SO_DONT_RANDOMLY_ABORT_CONNECTIONS socket() option, isn't it :-)
> 
> So could I request please, that in the source3 code, either:
> 	a. The same "sleep()" is added as in the source4 code; -and/or-
> 	b. If errno == ECONNABORTED then only log the error if the debug
> level is (substantially?) higher than zero.

So your problem is the debug statement being triggered repeatedly?

Adding a sleep is (IMHO) the wrong thing to do. Once the accept()
has failed the 'POLLIN' event should not be triggered repeatedly
on the polled socket. Your truss trace doesn't show enough. Does
a subsequent pollsys() keep returning fd=35 ev=POLLIN|POLLHUP rev=POLLIN
after the:

 accept(35, 0xFEFFDDCC, 0xFEFFDDB8, SOV_DEFAULT) Err#130 ECONNABORTED

?
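
For illustration, here is a minimal sketch of the check being asked about.
It is not the source3 code: the helper names accept_errno_is_transient()
and count_aborted_wakeups() are hypothetical, and it assumes a single
listening socket. It polls the socket a bounded number of times and counts
how often POLLIN is reported but accept() then fails with ECONNABORTED. If
that count keeps climbing with no new client connecting, poll() is
re-reporting the same aborted connection.

```c
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: accept() errors after which simply going back
 * to poll() is reasonable, with no sleep and no loud logging. */
static bool accept_errno_is_transient(int err)
{
    switch (err) {
    case ECONNABORTED:  /* peer went away before we accepted it */
    case EINTR:         /* interrupted by a signal */
    case EAGAIN:        /* nothing left in the accept queue */
#if defined(EWOULDBLOCK) && EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:
#endif
        return true;
    default:
        return false;
    }
}

/* Hypothetical helper: poll listen_fd up to max_rounds times and
 * return how many POLLIN wakeups ended in accept() == ECONNABORTED.
 * Returns -1 on a non-transient accept() error. */
static int count_aborted_wakeups(int listen_fd, int max_rounds,
                                 int timeout_ms)
{
    int aborted = 0;

    for (int i = 0; i < max_rounds; i++) {
        struct pollfd pfd = { .fd = listen_fd,
                              .events = POLLIN | POLLHUP };

        if (poll(&pfd, 1, timeout_ms) <= 0) {
            break;      /* timeout or poll() error: stop watching */
        }
        if (!(pfd.revents & POLLIN)) {
            continue;   /* woken for something other than POLLIN */
        }

        int fd = accept(listen_fd, NULL, NULL);
        if (fd >= 0) {
            close(fd);  /* a real connection; not what we are counting */
            continue;
        }
        if (errno == ECONNABORTED) {
            aborted++;
            fprintf(stderr, "round %d: POLLIN then ECONNABORTED\n", i);
        } else if (!accept_errno_is_transient(errno)) {
            return -1;
        }
    }
    return aborted;
}
```

A return value of 1 for one aborted client would match the expected
behaviour. A value that tracks max_rounds would mean the POLLIN event is
not being cleared after the failed accept().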

Jeremy.

