Samba, NT, and transient network failures

Tue Jan 26 13:32:14 GMT 1999

Hi Nick,

Glad to see you made the right decision regarding Samba vs Syntax.

I've seen this problem before.  In extreme cases it can lead to Samba
server failure.  The root cause is that the NT redirector has a timeout
of approximately 45 seconds on SMB requests.  If the timeout is
exceeded, the redirector logs an event, closes the connection, and
opens a new connection.  I did not test increasing the timeout value,
since it could not be set on a per-share basis (which would have been
analogous to hard vs soft mounts on NFS).

I almost always saw this error as a result of NFS problems or network
delays causing smbd to hang for long enough to trigger the timeout.
The most extreme cases occurred when I was using a particular Samba
server as a gateway for AMD-mounted homedirs from other machines.  If
one of those NFS servers hung, the smbd processes with requests pending
on its hard mounts would hang as well.  These processes could not be
terminated.  New smbd processes would continue to be spawned (and
subsequently hang) until the NFS problem was corrected.  Once it was
corrected, the hung processes would die by themselves upon receiving an
error sending the reply (or a keepalive) to the client.
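
A rough sketch of one way to spot a hung mount before the clients do,
assuming Python is available on the Samba server.  The helper name
probe_path and the 10 second default are made up for illustration and
are not part of Samba or AMD.  The stat is done in a child process so
that a hung hard mount blocks the child rather than the caller:

```python
import subprocess
import sys

# Hypothetical helper, not part of Samba: the child does nothing but
# stat the path given as its first argument.
_CHILD = "import os, sys; os.stat(sys.argv[1])"


def probe_path(path, timeout=10.0):
    """Return 'ok', 'error', or 'hung' for the given path."""
    child = subprocess.Popen(
        [sys.executable, "-c", _CHILD, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        status = child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # A child blocked on a hard NFS mount may not act on the kill
        # until the server comes back, so do not wait for it again.
        child.kill()
        return "hung"
    return "ok" if status == 0 else "error"
```

Keeping the probe timeout well under the redirector's 45 seconds means
a "hung" result shows up before the NT clients start dropping their
connections.  Note that each probe of a dead hard mount can leave one
more stuck process behind, so it should not be run in a tight loop.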

I had one other problem as well with AMD and Solaris that probably does
not affect you.  If the AMD in use was not compiled to add device IDs
to the mount entries in the mnttab, getcwd() calls in an AMD-mounted
directory would hang if ANY AMD-mounted server hung.

If you are seeing this problem often enough to warrant this workaround,
I think more debugging is in order to determine why it is happening.
AFAIK, error recovery for files on bounced shares is an application
responsibility, and not all apps are so well written.

Good luck,
Frank V