Stale smbd processes (was: DOS: Clients can freeze other clients smbd)

Mattias.Gronlund Mattias.Gronlund at sa.erisoft.se
Fri Sep 3 22:29:44 GMT 1999


Nicolas Williams wrote:
> 
> On Fri, Sep 03, 1999 at 09:08:04AM +0200, Mattias Gronlund wrote:
> > Nicolas Williams wrote:
> With the 'keepalive' smb.conf parameter. I don't know how well it works
> by itself however.
> 
> NOTE: This is NOT the SO_KEEPALIVE option. You cannot rely on
>       SO_KEEPALIVE achieving what you want.
>

OK, the keepalive parameter makes smbd transmit SMB keepalive
packets at a specified interval. The problem is that smbd uses
blocking recv() calls that can only time out through SO_KEEPALIVE,
which from what I understand defaults to two hours
(TCP/IP Illustrated, Vol. 1, Chapter 23).
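
For reference, a minimal sketch of what SO_KEEPALIVE amounts to at the
socket level, assuming a POSIX sockets API. The helper names
enable_keepalive() and keepalive_enabled() are made up for illustration
and are not Samba functions. The option only switches probing on; how
long an idle connection waits before the first probe is a TCP stack
setting, and RFC 1122 requires that default to be no less than two hours.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: turn on TCP keepalive probing for fd.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
int enable_keepalive(int fd)
{
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on);
}

/* Hypothetical helper: report whether keepalive is enabled on fd.
 * Returns 1 if enabled, 0 if not, -1 on error. */
int keepalive_enabled(int fd)
{
    int val = 0;
    socklen_t len = sizeof val;

    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

A recv() blocked on such a socket only fails once the stack's own probes
have gone unanswered, so the length of the wait is decided by the TCP
stack and not by smbd.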
 
> > Is the script tested with multiuser Windows NT systems?
> 
> The script runs on the Samba server. I quoted the URL for my original
> e-mail about this to the list; that e-mail includes the root
> preexec/postexec commands I used and the script in question (I should
> post a newer version of it). Read that. Basically, the script is called
> with a number of arguments (% token substitutions) about the share
> connection and it creates a pid/lock file named after those arguments.
> These pid/lock files are used to detect stale smbd processes; it works
> because the script's arguments identify the session/share connection in
> question accurately.
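
The script itself is not reproduced in this message, so the following is
only a rough sketch of the pid/lock-file idea in C, under loud
assumptions: the file naming scheme, the directory, and the helper names
lock_path(), read_lock_pid() and pid_alive() are all hypothetical. The
only real interface relied on is kill(pid, 0), which checks for a
process without sending a signal.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Build a lock-file name from values identifying one share connection.
 * The "dir/user.machine.share.pid" layout is an assumption.
 * Returns the path length, or -1 if it does not fit in out. */
int lock_path(char *out, size_t outlen, const char *dir,
              const char *user, const char *machine, const char *share)
{
    int n = snprintf(out, outlen, "%s/%s.%s.%s.pid",
                     dir, user, machine, share);
    return (n < 0 || (size_t)n >= outlen) ? -1 : n;
}

/* Read the pid recorded in a lock file; -1 if missing or unparsable. */
long read_lock_pid(const char *path)
{
    FILE *f = fopen(path, "r");
    long pid = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &pid) != 1)
        pid = -1;
    fclose(f);
    return pid;
}

/* Return 1 if a process with this pid exists, 0 otherwise. */
int pid_alive(pid_t pid)
{
    if (pid <= 0)
        return 0;
    if (kill(pid, 0) == 0)
        return 1;
    return errno == EPERM; /* exists, but owned by someone else */
}
```

Under this reading, a new connection whose arguments map to an existing
lock file recording a different, still-living pid would mark that older
smbd as a candidate stale process.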

Oh, I meant whether the script was tested on a Unix box running Samba
where the clients were multiuser Windows NT systems. If I understand it
right, the same user may have more than one connection to the server
from that type of client.

> > Your analysis sounds right, but I think that because we think that the NT-
> > client is aggressive in reconnecting (we should check the exact timeout) we
> > could just say that it is just hopeless to wait for longer than that in recv.
 
> NT clients reconnect after a 45 second timeout.

Good, then there is never any need to wait for more than, say, 50 seconds
before the server times out when expected data hasn't arrived.

This actually means that we could clean up a lot of timeout code and
always use read_socket_with_timeout() instead of read_socket_data().
We should also remove the "blocking" code in read_socket_with_timeout()
and force the timeout to be no more than 50 seconds.
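
A minimal sketch of that idea, assuming POSIX select(). This is not
Samba's read_socket_with_timeout(); the names read_capped(),
clamp_timeout_ms() and MAX_WAIT_MS are hypothetical, and the 50-second
cap is simply the figure suggested above.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed cap: a little above the 45 s NT reconnect timeout. */
#define MAX_WAIT_MS 50000L

/* Map "block forever" (<= 0) and oversized requests to the cap. */
long clamp_timeout_ms(long requested_ms)
{
    if (requested_ms <= 0 || requested_ms > MAX_WAIT_MS)
        return MAX_WAIT_MS;
    return requested_ms;
}

/* Wait until fd is readable for at most the clamped timeout, then
 * read once. Returns bytes read, 0 on EOF, or -1 with errno set
 * (ETIMEDOUT if nothing arrived in time). */
ssize_t read_capped(int fd, void *buf, size_t len, long timeout_ms)
{
    long ms = clamp_timeout_ms(timeout_ms);
    fd_set readfds;
    struct timeval tv;
    int ready;

    for (;;) {
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        tv.tv_sec = ms / 1000;
        tv.tv_usec = (ms % 1000) * 1000;

        ready = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (ready > 0)
            break;                /* data (or EOF) is available */
        if (ready == 0) {
            errno = ETIMEDOUT;    /* nothing arrived in time */
            return -1;
        }
        if (errno != EINTR)
            return -1;            /* real select() failure */
        /* Interrupted by a signal: this sketch restarts the full wait. */
    }
    return read(fd, buf, len);
}
```

With no unbounded path left, a caller that used to block indefinitely
would get an error after at most 50 seconds and could exit.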

> Samba needs a way to deal with these stale smbd processes. I'm still not
> exactly clear on what goes on that causes Samba to block on a socket,
> that ought to be dead, waiting for input; I've not spent enough time
> tracing the packets or the smbd processes so my analysis is partly based
> on guessing (I had no idea about the FIN-FIN/ACK bug when I sent my very
> first e-mail about this to the list); it could even be that there's a
> bug in the way the NT clients abandon the old connection (i.e., maybe
> they don't explicitly close it) or maybe there's a bug in NT's TCP/IP
> stack that causes TCP shutdown to not be reliable.

I have investigated it; in every case it has been receive_smb() calling
read_socket_data(), as of the 2.0.5a source.

> Whatever the case I do know this: the solution I use works.
> 
> > At the moment I have a fix in only one place that times out after 5 minutes,
> > which seems to keep us from getting hordes of smbd processes for a user...
> 
> Yeah. Five minutes works. When I found this problem I determined, during
> testing, that with no workarounds the system recovers within 12-15
> minutes (i.e., the stale smbd processes take that long to figure out
> that they ought to exit).

Your system may have a lower keepalive timer in the TCP stack; I timed our
Solaris 2.5.1 server and it took 2 hours to recover.

> We've only got experience with Samba running on Solaris, so the above
> might only apply to Solaris. I wonder what others' experiences on other
> platforms have been.

It looks like there is a mismatch between Solaris and M$...

/Mattias

