Stale smbd processes (was: DOS: Clients can freeze other clients smbd)

Mattias Gronlund Mattias.Gronlund at sa.erisoft.se
Fri Sep 3 07:08:04 GMT 1999


Nicolas Williams wrote:

> Basically, NT SMB clients are very aggressive in reconnecting to SMB
> servers in the face of network timeouts all the while smbd may fail to
> notice that the TCP socket in question is dead.

Yes, we use a broken 3Com ATM/ELAN network with reproducible
packet loss...

> This shouldn't happen as TCP should recover from any loss of FIN or
> FIN/ACK packets. However I remember a Solaris bug where Solaris would
> respond with two packets (ACK, then FIN/ACK) to FIN packets that have
> more than 0 bytes of data; though this is technically legal behaviour,
> some stacks respond by resending the FIN + data packet (as opposed to
> just a FIN), thus entering an infinite retransmission loop and the
> socket never closes.

OK, we use Sun too, but I think we have a patch installed that fixes
this very bug...

> Meantime, the NT clients that reconnect cause a new smbd process to be
> spawned for them. When the reconnecting client attempts to obtain locks
> it already had, smbd blocks while waiting for the stale smbd to give up
> the locks in question, and since the stale smbd is waiting for the
> client (which considers the connection closed) to respond to oplock
> break requests (or what have you) you get a DEADLOCK.
>
> There's two ways around this problem:
>
>  - set the 'keepalive' parameter to a sufficiently small value
>
>  - run a script via 'root preexec' and 'root postexec' to check for and
>    kill smbd process[es] that were serving the same client.
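
For reference, a rough smb.conf sketch of those two suggestions. The
value 30 and the script path are only placeholders I made up for
illustration (the script itself is hypothetical and not shown); as far
as I know 'keepalive' is a global parameter given in seconds, and %m,
%I and %d expand to the client's NetBIOS name, the client's IP address
and the PID of the current smbd.

```ini
[global]
   ; send a keepalive packet to each client every 30 seconds
   ; (placeholder value, tune for your network)
   keepalive = 30

[someshare]
   path = /export/someshare
   ; hypothetical script: looks for other smbd processes that serve
   ; the same client (%m / %I) and kills them, sparing our own PID (%d)
   root preexec = /usr/local/sbin/kill-stale-smbd %m %I %d
```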

But how do you set the keepalive parameter per connection?

Is the script tested with multiuser Windows NT systems?

> I use both approaches (they are not mutually exclusive).

>
> My analysis of the problem may not be correct or very accurate and I'm
> writing from memory here, but the solution works fine for me.

Your analysis sounds right. Since the NT client seems to be aggressive in
reconnecting (we should check the exact timeout), we could simply say that
it is hopeless to wait any longer than that in recv.

If the PC does not always reconnect after a short timeout, we should
instead process oplock requests while waiting and give up if someone
needs one of our locks. But I think we will find a timeout and not have
to do any more advanced oplock handling...

At the moment I have a fix in only one place, which times out after 5 minutes;
that seems to keep us from getting hordes of smbds for a user...
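
To illustrate the idea (this is only a minimal sketch, not the actual
change in smbd): wait for the socket to become readable with select()
and give up once the timeout expires, instead of blocking in recv
forever. The name recv_with_timeout is made up for this example; the
5-minute value would simply be passed in as 300 seconds.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not taken from the Samba source.
 * Waits up to timeout_secs for data on fd, then reads it with recv().
 * Returns the number of bytes read (0 means the peer closed the
 * connection), or -1 on error. On timeout it returns -1 with errno
 * set to ETIMEDOUT, so the caller can decide to exit the stale smbd.
 */
ssize_t recv_with_timeout(int fd, void *buf, size_t len, int timeout_secs)
{
    fd_set readfds;
    struct timeval tv;
    int ready;

    do {
        /* select() may modify both arguments, so set them up each time.
         * Note: after a signal (EINTR) the full timeout starts again. */
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        tv.tv_sec = timeout_secs;
        tv.tv_usec = 0;
        ready = select(fd + 1, &readfds, NULL, NULL, &tv);
    } while (ready == -1 && errno == EINTR);

    if (ready == -1)
        return -1;              /* select() itself failed */

    if (ready == 0) {
        errno = ETIMEDOUT;      /* nothing arrived within the timeout */
        return -1;
    }

    return recv(fd, buf, len, 0);
}
```

A caller would use something like recv_with_timeout(fd, buf, sizeof(buf), 300)
and treat ETIMEDOUT as "the client is gone", releasing its locks and exiting.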

/Mattias


