[Samba] Odd Samba 4 ("4.2.0pre1-GIT-b505111"; actually only using client) behaviour #2 - "accept: Software caused connection abort".

Wed Dec 11 17:36:44 MST 2013

On Wed, Dec 11, 2013 at 12:00 PM, Tris Mabbs
<TM-Samba201302 at firstgrade.co.uk> wrote:
> So I *finally* got around to looking into this in a bit more detail.
>
> For brevity I've chopped off the original message; it was basically that I'm
> getting hundreds (and hundreds, and ...) of "accept: Software caused
> connection abort" messages logged.  I questioned whether perhaps this should
> only be logged at higher debug levels; there was also the suggestion made of
> putting a "sleep (x);" in the error path, but the message does not
> repeatedly come out (the next "pollsys(...)" will typically not error, so
> there aren't repeated consecutive errors from the same socket) so that might
> not be appropriate.
> I said I'd look into it a bit more to see what was common about the errors
> (e.g., always on the same interface? ...).
>
> OK.
>
> So I've now looked into it a bit further, as I had an "smbd" process (PID
> 224) doing this basically all day.  So (summarised a bit):
>
> # truss -f -vall -p 224 >& /var/tmp/smbd.truss &
> [1] 12345
> # (wait quite a while ...)
> # kill %1
> Killed: truss ...
> # grep ECONNAB /var/tmp/smbd.truss | wc -l
>      141
> # (checked on which descriptors this was happening; always 33, 35 or 37)
> # grep ECONNAB /var/tmp/smbd.truss | grep -v '(3[357],' |wc -l
>     0
> # pfiles 224
> ...
>   33: S_IFSOCK mode:0666 dev:557,0 ino:56426 uid:0 gid:0 size:0
>       O_RDWR|O_NONBLOCK
>         SOCK_STREAM
>         SO_REUSEADDR,SO_KEEPALIVE,SO_SNDBUF(32768),SO_RCVBUF(33232)
>         sockname: AF_INET X.X.X.X  port: 445
>         congestion control: newreno
> ...
>   35: S_IFSOCK mode:0666 dev:557,0 ino:58053 uid:0 gid:0 size:0
>       O_RDWR|O_NONBLOCK
>         SOCK_STREAM
>         SO_REUSEADDR,SO_KEEPALIVE,SO_SNDBUF(32768),SO_RCVBUF(33232)
>         sockname: AF_INET X.X.X.Y  port: 445
>         congestion control: newreno
> ...
>   37: S_IFSOCK mode:0666 dev:557,0 ino:34369 uid:0 gid:0 size:0
>       O_RDWR|O_NONBLOCK
>         SOCK_STREAM
>         SO_REUSEADDR,SO_KEEPALIVE,SO_SNDBUF(32768),SO_RCVBUF(33232)
>         sockname: AF_INET X.X.X.Z  port: 445
>         congestion control: newreno
> ...
> # (dig one of the ECONNABORTED messages out; they're all of the same form)
> ...
> 224:    pollsys(0x08088D48, 8, 0xFEFFDF58, 0x00000000) (sleeping...)
> 224:            fd=39 ev=POLLIN|POLLHUP rev=0
> 224:            fd=38 ev=POLLIN|POLLHUP rev=0
> 224:            fd=34 ev=POLLIN|POLLHUP rev=0
> 224:            fd=36 ev=POLLIN|POLLHUP rev=0
> 224:            fd=37 ev=POLLIN|POLLHUP rev=0
> 224:            fd=35 ev=POLLIN|POLLHUP rev=0
> 224:            fd=33 ev=POLLIN|POLLHUP rev=0
> 224:            fd=6  ev=POLLIN|POLLHUP rev=0
> 224:            timeout: 49.964000000 sec
> 224:    pollsys(0x08088D48, 8, 0xFEFFDF58, 0x00000000)  = 1
> 224:            fd=39 ev=POLLIN|POLLHUP rev=0
> 224:            fd=38 ev=POLLIN|POLLHUP rev=0
> 224:            fd=34 ev=POLLIN|POLLHUP rev=0
> 224:            fd=36 ev=POLLIN|POLLHUP rev=0
> 224:            fd=37 ev=POLLIN|POLLHUP rev=POLLIN
> 224:            fd=35 ev=POLLIN|POLLHUP rev=0
> 224:            fd=33 ev=POLLIN|POLLHUP rev=0
> 224:            fd=6  ev=POLLIN|POLLHUP rev=0
> 224:            timeout: 49.964000000 sec
> 224:    accept(37, 0xFEFFDE0C, 0xFEFFDDF8, SOV_DEFAULT) Err#130 ECONNABORTED
> ...
> #
>
> So it's *always* happening on a socket listening on port 445, never on 139
> (or any other port) and it's happening on *every* socket listening on port
> 445 (3 interfaces, 3 sockets listening on that port).  That makes it
> unlikely to be a resource issue (at client or server end) or it would be
> unlikely to be that port specific, so perhaps some protocol corner-case?
> Or, of course, just Solaris being moronic (wouldn't be unheard of ...)?
>
> I don't know whether that helps anyone at all?
>
> Cheers, and season's greetings (and all that ...),

We've seen this on FreeBSD. We increased the listen backlog to match
the system maximum, and the messages went away.
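
For illustration only, here is a minimal Python sketch of the two ideas
in this thread: passing a larger backlog to listen(), and treating
ECONNABORTED from accept() as a transient condition rather than
something to log loudly. This is not Samba's code. The helper names
(make_listener, accept_once, accept_ready) and the set of "transient"
errnos are assumptions made for the example, and socket.SOMAXCONN is
only the constant Python was built with; the kernel may apply a
different cap (for example via a tunable such as kern.ipc.somaxconn on
FreeBSD).

```python
import errno
import select
import socket

# Hypothetical choice for this sketch: errors from accept() that mean
# "nothing to hand back right now", not "the listener is broken".
TRANSIENT_ACCEPT_ERRNOS = {
    errno.ECONNABORTED,  # peer gave up before we called accept()
    errno.EAGAIN,        # non-blocking listener, queue is empty
    errno.EWOULDBLOCK,
    errno.EINTR,
}


def make_listener(host="127.0.0.1", port=0, backlog=socket.SOMAXCONN):
    """Create a non-blocking TCP listener with an explicit backlog."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    # The kernel may silently reduce this value to its own maximum.
    sock.listen(backlog)
    sock.setblocking(False)
    return sock


def accept_once(listener):
    """Return (conn, addr), or None if the error was transient."""
    try:
        return listener.accept()
    except OSError as exc:
        if exc.errno in TRANSIENT_ACCEPT_ERRNOS:
            return None  # carry on; the next poll will usually succeed
        raise


def accept_ready(listener, timeout):
    """Wait up to `timeout` seconds for a connection, then try accept."""
    readable, _, _ = select.select([listener], [], [], timeout)
    if not readable:
        return None
    return accept_once(listener)
```

A caller would loop on accept_ready() and simply continue when it
returns None, which matches the observation above that the next poll
typically does not error.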

-- 
Regards,
Richard Sharpe
(何以解憂？唯有杜康。--曹操)