[Samba] Odd Samba 4 ("4.2.0pre1-GIT-b505111"; actually only using client) behaviour #2 - "accept: Software caused connection abort".

Tris Mabbs TM-Samba201302 at Firstgrade.Co.UK
Tue Aug 13 05:54:29 MDT 2013


                Caught it in the act!  As predicted, almost immediately
after sending my original message (!) .

 

                However unfortunately, it's not particularly useful .

 

...

16327:     write(7, "\0", 1)                    = 1

16327:     setcontext(0xFEFFD690)

16327:     waitid(P_ALL, 0, 0xFEFFDD70, WEXITED|WTRAPPED|WNOHANG) = 0

16327:           siginfo: SIGCLD CLD_EXITED pid=26937 status=0x0000

16327:     waitid(P_ALL, 0, 0xFEFFDD70, WEXITED|WTRAPPED|WNOHANG) = 0

16327:           siginfo: SIG#0

16327:     pollsys(0x0809B4D0, 8, 0xFEFFDF18, 0x00000000)  = 1

16327:          fd=39 ev=POLLIN|POLLHUP rev=0

16327:          fd=38 ev=POLLIN|POLLHUP rev=0

16327:          fd=34 ev=POLLIN|POLLHUP rev=0

16327:          fd=36 ev=POLLIN|POLLHUP rev=0

16327:          fd=37 ev=POLLIN|POLLHUP rev=0

16327:          fd=35 ev=POLLIN|POLLHUP rev=0

16327:          fd=33 ev=POLLIN|POLLHUP rev=0

16327:          fd=6  ev=POLLIN|POLLHUP rev=POLLIN

16327:          timeout: 59.999000000 sec

16327:     read(6, "\0", 16)                    = 1

16327:     pollsys(0x0809B4D0, 8, 0xFEFFDF18, 0x00000000) (sleeping...)

16327:          fd=39 ev=POLLIN|POLLHUP rev=0

16327:          fd=38 ev=POLLIN|POLLHUP rev=0

16327:          fd=34 ev=POLLIN|POLLHUP rev=0

16327:          fd=36 ev=POLLIN|POLLHUP rev=0

16327:          fd=37 ev=POLLIN|POLLHUP rev=0

16327:          fd=35 ev=POLLIN|POLLHUP rev=0

16327:          fd=33 ev=POLLIN|POLLHUP rev=0

16327:          fd=6  ev=POLLIN|POLLHUP rev=0

16327:          timeout: 59.999000000 sec

16327:     pollsys(0x0809B4D0, 8, 0xFEFFDF18, 0x00000000)  = 1

16327:          fd=39 ev=POLLIN|POLLHUP rev=0

16327:          fd=38 ev=POLLIN|POLLHUP rev=0

16327:          fd=34 ev=POLLIN|POLLHUP rev=0

16327:          fd=36 ev=POLLIN|POLLHUP rev=0

16327:          fd=37 ev=POLLIN|POLLHUP rev=POLLIN

16327:          fd=35 ev=POLLIN|POLLHUP rev=0

16327:          fd=33 ev=POLLIN|POLLHUP rev=0

16327:          fd=6  ev=POLLIN|POLLHUP rev=0

16327:          timeout: 59.999000000 sec

16327:     accept(37, 0xFEFFDDCC, 0xFEFFDDB8, SOV_DEFAULT) = 41

16327:          AF_INET  name = X.X.X.X  port = 28986

16327:     forkx(0)                        = 26942

16327:     lwp_sigmask(SIG_SETMASK, 0x00011080, 0x00000000, 0x00000000,
0x00000000) = 0xFFBFFEFF [0xFFFFFFFF]

16327:     close(41)                       = 0

16327:     pollsys(0x0809B4D0, 8, 0xFEFFDF18, 0x00000000)  = 1

16327:          fd=39 ev=POLLIN|POLLHUP rev=0

16327:          fd=38 ev=POLLIN|POLLHUP rev=0

16327:          fd=34 ev=POLLIN|POLLHUP rev=0

16327:          fd=36 ev=POLLIN|POLLHUP rev=0

16327:          fd=35 ev=POLLIN|POLLHUP rev=POLLIN

16327:          fd=33 ev=POLLIN|POLLHUP rev=0

16327:          fd=6  ev=POLLIN|POLLHUP rev=0

16327:          fd=37 ev=POLLIN|POLLHUP rev=0

16327:          timeout: 44.696000000 sec

16327:     accept(35, 0xFEFFDDCC, 0xFEFFDDB8, SOV_DEFAULT) Err#130
ECONNABORTED

16327:     fstat(40, 0xFEFFD1A0)                = 0

16327:         d=0x08880000 i=22079 m=0020000 l=0  u=0     g=0
rdev=0x02480248

16327:          at = Aug 12 17:30:31 BST 2013  [ 1376325031.000000000 ]

16327:          mt = Aug 12 17:30:31 BST 2013  [ 1376325031.000000000 ]

16327:          ct = Aug 12 17:30:31 BST 2013  [ 1376325031.000000000 ]

16327:         bsz=8192  blks=0     fs=BADVFS

16327:     time()                               = 1376345330

16327:     getpid()                        = 16327 [1]

16327:     putmsg(40, 0xFEFFDB80, 0xFEFFDB90, 0)      = 0

16327:          ctl:  maxlen=24   len=24   buf=0xFEFFD230: "98 0
tFE\0\010\0"..

16327:       98 0 tFE\0\010\0 XD2FFFE \CF iFE HFA\b\b1B\0\0\0

16327:          dat:  maxlen=1280 len=151  buf=0xFEFFD250: " A u g   1 2
2"..

16327:        A u g   1 2   2 3 : 0 8 : 5 0   s m b d [ 1 6 3 2 7 ] :   [ I
D

16327:          7 0 2 9 1 1   F A C I L I T Y _ A N D _ P R I O R I T Y ]
[

16327:        2 0 1 3 / 0 8 / 1 2   2 3 : 0 8 : 5 0 . 0 4 3 5 8 2 ,     0 ]


16327:        . . / s o u r c e 3 / s m b d / s e r v e r . c : 5 5 6 ( s m
b

16327:        d _ a c c e p t _ c o n n e c t i o n )\n\n\0

...

 

                However hopefully that might be of use to someone -
unfortunately it does just look like a Solaris specific nasty.  There should
be no reason for the abort; everything is directly connected on uncongested
gigabit Ethernet, so it's unlikely to be TCP timeout related and I can't see
a client aborting the connection before even it's established (that might
happen if the clients, or possibly the server were desperately short of IP
resources but that isn't going to be the case either).  It doesn't appear to
be related to the number of connections either, so I can't immediately think
of any sensible reason for the connection to abort.

 

                What might also be interesting to investigate (though
completely unrelated) is why there's an "fstat(40, 0xFEFFD1A0)" when writing
the message - descriptor 40 is the socket for the "syslog()" connection, so
performing an "fstat()" on it possibly isn't particularly useful (except
mebbee to obtain the block size, but that's probably not going to be
relevant on a socket anyway) .  That's doubtless why the stat structure
contains "fs=BADVFS".  Not any sort of problem, but curious minds wonder why
the code is bothering to "fstat()" the logging socket .

 

                Cheers,

 

                Tris.

 

From: Tris Mabbs [mailto:TM-Samba201302 at Firstgrade.Co.UK] 
Sent: 12 August 2013 23:08
To: 'samba at lists.samba.org'
Subject: Odd Samba 4 ("4.2.0pre1-GIT-b505111"; actually only using client)
behaviour #2 - "accept: Software caused connection abort".

 

                Good day oh technical ones .

 

                I was running Samba 4 (client only, not using it as a DC so
effectively running Samba 3 code from the Samba 4 tree) and, other than a
little "Gotcha!" regarding decoding Kerberos PACs, it was all working
perfectly.

                Then recently I had to upgrade, to "4.2.0pre1-GIT-b505111"
(I had to upgrade the OS on the server running Samba - 'twas "OpenSolaris"
and is now "Solaris 11.1") so I recompiled it all up and installed afresh
(so no ".tdb"s from the previous installation or anything).

 

                But here's a funny thing (#2).  The log file gets absolutely
ridiculous numbers of messages thus:

 

Aug 12 22:45:01 Gateway smbd[16327]: [ID 702911 daemon.error] [2013/08/12
22:45:01.731562,  0] ../source3/smbd/server.c:556(smbd_accept_connection)

Aug 12 22:45:01 Gateway smbd[16327]: [ID 702911 daemon.error]   accept:
Software caused connection abort

Aug 12 22:45:03 Gateway smbd[16327]: [ID 702911 daemon.error] [2013/08/12
22:45:03.556423,  0] ../source3/smbd/server.c:556(smbd_accept_connection)

Aug 12 22:45:03 Gateway smbd[16327]: [ID 702911 daemon.error]   accept:
Software caused connection abort

Aug 12 22:45:03 Gateway smbd[16327]: [ID 702911 daemon.error] [2013/08/12
22:45:03.556688,  0] ../source3/smbd/server.c:556(smbd_accept_connection)

Aug 12 22:45:03 Gateway smbd[16327]: [ID 702911 daemon.error]   accept:
Software caused connection abort

 

                And so on.  These will come in spurts; there won't be any
such messages for several minutes then a whole load will come along all at
once.  Rather like busses .

 

                It doesn't seem to be affecting the operation so far as any
client is concerned.  Or rather it evidently will be having some effect, but
it's not a noticeable one.

 

                However it is really irritating having the system log
getting filled up with all these messages!

 

                Murphy's law, of course, states that trying to catch one of
these messages being created, so I can include a suitable system call trace
in this message, will be impossible - there will be no such messages logged
until the instant I click "Send" (at which point probably about half-a-dozen
will be logged all at once).  That does indeed seem to be the case - I've
now been trying to persuade one of these, normally very regularly occurring,
messages to be logged for about 20 minutes and still, stubbornly, nothing
continues to happen.

                I will catch "smbd" in the act at some point though, and
when I do I'll follow-up with a system call trace to show exactly what is
happening when this message gets triggered.  It will, of course, be
something bizarrely Solaris specific (you didn't set the
"SO_DONT_RANDOMLY_ABORT_CONNECTIONS" socket() option, did you?  Tsk tsk tsk
.).

 

                Cheers,

 

                Tris.



More information about the samba mailing list