Stale smbd processes (was: DOS: Clients can freeze other clients smbd)

Nicolas Williams Nicolas.Williams at
Tue Sep 7 12:21:50 GMT 1999

On Sat, Sep 04, 1999 at 12:29:44AM +0200, Mattias.Gronlund wrote:
> Nicolas Williams wrote:
> > 
> > On Fri, Sep 03, 1999 at 09:08:04AM +0200, Mattias Gronlund wrote:
> > > Nicolas Williams wrote:
> > With the 'keepalive' smb.conf parameter. I don't know how well it works
> > by itself however.
> > 
> > NOTE: This is NOT the SO_KEEPALIVE option. You cannot rely on
> >       SO_KEEPALIVE achieving what you want.
> >
> Ok, the keepalive parameter makes smbd transmit SMB keepalive
> packets at a specified interval. The problem is that smbd uses
> blocking recv()s that have to time out via SO_KEEPALIVE, which
> from what I understand defaults to two hours
> (TCP/IP Illustrated, vol. 1, chap. 23).

We're using a Solaris 2.6 server.
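
To make the distinction concrete: SO_KEEPALIVE is a per-socket TCP option,
separate from smbd's 'keepalive' parameter. Here is a minimal Python sketch
of turning it on. The function name make_keepalive_socket is made up for
illustration, this is not how smbd sets its socket options, and the
per-socket idle time (TCP_KEEPIDLE) is only applied where the platform
exposes it. Elsewhere the system-wide default applies, which the text quoted
above puts at two hours.

```python
import socket


def make_keepalive_socket(idle_seconds=None):
    """Create a TCP socket with SO_KEEPALIVE enabled (illustrative helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask the TCP stack to probe an idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Assumption: shorten the idle time only if this platform offers a
    # per-socket knob; otherwise the system-wide default stays in force.
    if idle_seconds is not None and hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
    return sock
```

Even with this set, a dead peer is only noticed after the idle time plus the
probe retries, which is why it can't be relied on to clear stale smbds quickly.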

> > > Is the script tested with multiuser Windows NT systems?
> > 


> Oh, I meant whether the script was tested on a Unix box with Samba where
> the clients were multiuser Windows NT systems. If I understand it right,
> the same user may have more than one connection to the server from
> that type of client.

The key is in the parameters passed to the script. Here are my root
preexec and root postexec settings:

root preexec  = /somepath/samba/libexec/chkStaleSession preexec  %d %I %h %S %U
root postexec = /somepath/samba/libexec/chkStaleSession postexec %d %I %h %S %U

chkStaleSession requires the following arguments:

$1 - action (preexec or postexec)
$2 - smbd PID
$3... - the remaining tokens, which together identify a share connection

I'm assuming that NT won't try more than one share connection with all
of those parameters. I.e., if I've got a share mapped to, say, I: and
then map the same share as the same user to, say, J:, then I assume that
NT won't open a new share connection.

If this assumption is wrong then you could end up with a situation where
smbd keeps getting itself killed when it runs chkStaleSession. Still,
this can be avoided in chkStaleSession.

chkStaleSession basically creates and deletes PID/lock files named by
its concatenated arguments (skipping $1 and $2). This is what it does:
if the given lock file already exists, contains a PID, that PID belongs
to a different smbd, and that smbd is still running, then it stomps the
stale smbd; otherwise it creates or overwrites the lock file and stores
the given PID in it.
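
The logic just described can be sketched roughly as follows. This is a
Python illustration, not the actual chkStaleSession script. The names
chk_stale_session and lock_path, the lock_dir argument, and the use of
SIGTERM are all assumptions. So are two behaviours not spelled out above:
the new PID is recorded after a stale smbd is stomped, and postexec removes
the lock file only if it still names the calling smbd.

```python
import os
import signal
import tempfile


def lock_path(lock_dir, tokens):
    # Name the lock file by the concatenated tokens (everything after the
    # action and the PID). Replacing "/" is an illustrative choice that
    # keeps the file inside lock_dir.
    return os.path.join(lock_dir, "_".join(tokens).replace("/", "_"))


def read_pid(path):
    # Return the PID stored in the lock file, or None if missing/garbled.
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return None


def is_running(pid, kill=os.kill):
    # Signal 0 checks for existence without delivering a signal.
    try:
        kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def chk_stale_session(action, pid, tokens, lock_dir, kill=os.kill):
    path = lock_path(lock_dir, tokens)
    old = read_pid(path)
    if action == "postexec":
        # Assumption: only remove the lock if it still names this smbd.
        if old == pid:
            os.remove(path)
        return None
    stomped = None
    if old is not None and old != pid and is_running(old, kill):
        kill(old, signal.SIGTERM)  # stomp the stale smbd
        stomped = old
    with open(path, "w") as f:
        f.write(str(pid))  # record the current smbd
    return stomped
```

The kill parameter exists only so the logic can be exercised without
signalling real processes. The postexec check is one way to keep a stomped
smbd's own postexec from deleting the lock file of the smbd that replaced it.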


> > NT clients reconnect after a 45 second timeout.
> Good, then there is never any need to wait for more than, say, 50 seconds
> before timing the server out if expected data hasn't arrived.
> This actually means that we could clean up a lot of timeout code and
> always use read_socket_with_timeout() instead of read_socket_data().
> We should also remove the "blocking" code in read_socket_with_timeout()
> and force the timeout to no more than 50 seconds.

Good idea.
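
To illustrate the shape of that change, here is a minimal Python sketch of a
read that never blocks past a capped timeout. It is not Samba's
read_socket_with_timeout(); the names read_with_timeout and MAX_TIMEOUT are
invented for the example, and the 50 second cap is simply the figure
proposed above.

```python
import select
import socket
import time

MAX_TIMEOUT = 50.0  # seconds; just above the 45 second NT reconnect timeout


def read_with_timeout(sock, nbytes, timeout=MAX_TIMEOUT):
    """Read exactly nbytes, giving up once the capped timeout has elapsed."""
    deadline = time.monotonic() + min(timeout, MAX_TIMEOUT)
    chunks = []
    remaining = nbytes
    while remaining > 0:
        wait = deadline - time.monotonic()
        if wait <= 0:
            raise TimeoutError("no data within the allowed time")
        # Wait for readability instead of blocking in recv() indefinitely.
        readable, _, _ = select.select([sock], [], [], wait)
        if not readable:
            raise TimeoutError("no data within the allowed time")
        data = sock.recv(remaining)
        if not data:
            raise ConnectionError("peer closed the connection")
        chunks.append(data)
        remaining -= len(data)
    return b"".join(chunks)
```

A caller that gets TimeoutError can treat the connection as dead and exit,
rather than sitting in a blocking read until the TCP stack gives up.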

> > Samba needs a way to deal with these stale smbd processes. I'm still not
> > exactly clear on what goes on that causes Samba to block on a socket,
> > that ought to be dead, waiting for input; I've not spent enough time
> > tracing the packets or the smbd processes so my analysis is partly based
> > on guessing (I had no idea about the FIN-FIN/ACK bug when I sent my very
> > first e-mail about this to the list); it could even be that there's a
> > bug in the way the NT clients abandon the old connection (i.e., maybe
> > they don't explicitly close it) or maybe there's a bug in NT's TCP/IP
> > stack that causes TCP shutdown to not be reliable.
> I have investigated it; it has always been receive_smb() calling
> read_socket_data(), as of the 2.0.5a source.

This doesn't tell us what goes wrong. I'm rather busy, but if I can I
may set up a test and sniff the wire to see what's wrong...

Once I figured out what was wrong as far as Samba was concerned it was
not hard to come up with a workaround. Once I had a workaround I lost
any curiosity about what was causing the problem in the first place. Now
I'm curious again.

Technically, TCP ought to complete a connection shutdown despite
short-term packet loss. What we're seeing indicates that this is not
happening. This is why the other day I theorized that the
FIN-FIN/ACK bug may be to blame, but it could be other things. Whatever
it is, we ought to find it and get Sun and/or Microsoft to fix it.

> Your system may have a lower keepalive timer in the TCP stack. I timed our
> Solaris 2.5.1 server and it took 2 hours to recover.

Like I said, we're using Solaris 2.6. I know it's got many TCP/IP stack
changes with respect to 2.5.1 (like the routing code; 2.6 implements
VLSM/CIDR, for example, whereas 2.5.1 does not).

> > We've only got experience with Samba running on Solaris, so the above
> > might only apply to Solaris. I wonder what others' experiences on other
> > platforms have been.

> It looks like there is a mismatch between Solaris and M$...

I guess so, at least until someone tells us it happens under some other
server OS as well...

It's easy to test, folks: connect to some [non-Solaris] Samba server from
an NT workstation, open a file on that share with some editor, yank
either the client's or the server's network connection, try to save,
wait 1 minute, put the network back, and continue trying to save the file.

If, after you put the network back, your editor hangs for a long time,
then you have the same problem we're talking about in this thread.

> /Mattias

Nicolas Williams	(x5220, Stamford, CT)
Stamford SysAdmin

