Samba, NT, and transient network failures

Wed Jan 27 15:30:30 GMT 1999

Ahhhh!

So the SMB protocol does have a keepalive/ping type feature.

Blame me for not looking more closely.

I'll check this out, though even setting keepalive to 5 minutes (the
default in Samba 1.9.18pl10 is no SMB keepalives) would still result in
an average hang time of 2.5 minutes. I think there should be no problem
setting it to a shorter interval though.
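
To spell out the arithmetic: if the network failure is equally likely
to happen at any point between two keepalives, the average wait until
the next keepalive goes out is half the interval. A rough sketch (the
function name is made up for illustration, and it ignores however long
smbd then waits before giving up on a reply):

```python
def expected_hang_seconds(keepalive_interval_s: float) -> float:
    """Mean wait until the next keepalive is sent, assuming the failure
    time is uniformly distributed within one keepalive interval."""
    return keepalive_interval_s / 2.0


if __name__ == "__main__":
    # 300 s is the 5-minute setting discussed above; the shorter values
    # are arbitrary examples.
    for interval in (300, 60, 30):
        hang = expected_hang_seconds(interval)
        print(f"keepalive = {interval:>3} s -> average hang about {hang:.0f} s")
```

So a shorter keepalive shrinks the average hang in direct proportion,
at the cost of a little more traffic per idle connection.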

Note that the docs don't specify how much time must pass without a reply
to a keepalive call before the server decides that the given connection
is dead. This doesn't matter if network connectivity has been restored
by the time smbd sends the keepalive: the NT client kernel will then
answer the server's keepalive with a RST TCP packet, causing the server
to close the socket and smbd to notice right away. Methinks.

The fact that SMB does provide the server with a way to check on the
client is excellent news, however. It would be cool if Samba had a signal
handler for some signal which would cause smbd to check on its
connected clients and close any apparently dead connections.
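
Something along these lines is what I have in mind. This is only a
sketch in Python, not Samba code: the names (request_check, reap_dead),
the choice of SIGUSR1 and the one-byte probe are all invented for
illustration, and a local socketpair stands in for real client
connections. On a real TCP connection the first write after the peer
vanishes may well succeed, with the failure only showing up once a RST
comes back or a timeout expires.

```python
import os
import signal
import socket
import time

check_requested = False  # set by the signal handler, consumed by the main loop


def request_check(signum, frame):
    """Signal handler: only record the request; the real work happens
    later, outside the handler."""
    global check_requested
    check_requested = True


def reap_dead(clients, probe=b"\x00"):
    """Send a probe to every client socket; close and drop those whose
    send fails, and return the ones that accepted the probe."""
    alive = []
    for sock in clients:
        try:
            sock.send(probe)
            alive.append(sock)
        except OSError:  # e.g. EPIPE or ECONNRESET: the peer is gone
            sock.close()
    return alive


if __name__ == "__main__":
    signal.signal(signal.SIGUSR1, request_check)

    live_srv, live_cli = socket.socketpair()
    dead_srv, dead_cli = socket.socketpair()
    dead_cli.close()  # simulate a client that has gone away
    clients = [live_srv, dead_srv]

    os.kill(os.getpid(), signal.SIGUSR1)  # ask ourselves for a check

    # Give the handler a moment to run, then act on the request.
    deadline = time.monotonic() + 1.0
    while not check_requested and time.monotonic() < deadline:
        time.sleep(0.01)
    if check_requested:
        check_requested = False
        clients = reap_dead(clients)
    print(f"{len(clients)} client(s) still connected")
```

The handler only sets a flag so that nothing non-trivial runs in signal
context; the main loop does the probing when it next gets around to it.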

Thanks Frank!

Nico

PS: As for AMD, it ought to be multi-threaded. That it isn't is a
    problem for those who use it. ;) :) :)

    Also, one shouldn't export NFS-mounts, whether made by automount or
    not. IMNSHO. :)

On Thu, Jan 28, 1999 at 12:21:43AM +1100, Frank Varnavas wrote:
> Hi Nick,
> 
> In reverse order of interest:
> 
> > We did think to set the SO_KEEPALIVE option in the smb.conf thinking
> > that it might speed up the process of getting old smbd processes to
> > elicit a RST response to TCP keepalives, but, going by W. Richard
> > Stevens' TCP/IP and Unix books, it seems that the default timeout
> > values associated with SO_KEEPALIVE are too large to be helpful with
> > this problem.
> 
> You are right on all counts, but there is also a 'keepalive =
> time_in_sec' protocol-level keepalive in the smb.conf.  I believe the
> default value is 5 minutes. At one time, when the SIGPIPE code was
> broken, using this would kill the process, so we called it the
> 'keepdead' option.  Now it seems to do what it should.
> 
> > Thanks for the info about AMD. We're still using the standard Solaris
> > automounter, so we're not affected. The Solaris automounter is
> > [mostly] multi-threaded, so it doesn't hang just because a mount
> > request is hanging... And only some of us geeks re-export NFS-mounted
> > partitions from our workstations :)
> 
> The hang I was describing is not in AMD, it's in any process (like smbd)
> that does a getcwd() in an AMD-mounted directory.  Since this covered
> all of the user home dirs that were exported by Samba you can imagine
> the fun and confusion when they would all hang at once because somebody
> who was exporting a home dir from their desktop would power off their
> desktop and go home.
> 
> Good luck,
> Frank V
>