[Samba] Occasional problem with hanging SMB mounts.

Wed Jul 3 13:03:02 GMT 2002

On Wed, 3 Jul 2002, Kris Kelley wrote:

> Every so often, one of the servers, LINUX-ONE, logs a couple of
> Samba-related errors.  The following is an example:
> 
>    Jun 30 06:30:45 mx-two kernel: smb_trans2_request: result=-104,
>       setting invalid
>    Jun 30 06:30:45 mx-two kernel: smb_retry: successful, new pid=553,
>       generation=25

Those are not really errors ...

The first says that smbfs detected, when trying to send, that the TCP
connection to the server was gone. This is normal, SMB servers like to
drop idle connections. The second says that smbmount reconnected to the
server and everything is ok.

> issues.  Sometimes when one of these "disconnects" occurs, the
> connection isn't always reestablished when it should be.  When this
> happens, the mount hangs, and all processes trying to access the mount
> are blocked, resulting in a high load average.  Anywhere from two to
> fifteen minutes after I've discovered the problem, it clears up on its
> own, and I see messages like these in the syslog:

>    Jul  1 13:52:57 mx-two kernel: smb_get_length: recv error = 110

-110 is "Connection timed out" (/usr/include/asm/errno.h), which is
interesting: I don't recall anyone reporting that before, though I may
have forgotten.

The current smbfs version is completely single threaded on one mount and
while one process is sending (and receiving) no one else can do anything.
This is old code from 2.1.something (or 2.0?) when all of the kernel was
like that.

What has probably happened is that one request attempted to send
something. It fails, but apparently a -110 failure takes a lot longer to
show up than a -104. Because of the single-thread issue nothing else
happens while that request is waiting, so processes pile up and you get
high load.

When the request finally fails all the queued up requests get through,
only to find the tcp socket closed (-5 = I/O error), until smbmount again
manages to reconnect.
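
For reference, here is a small sketch that turns the negative result codes
from these kernel messages into errno names and messages. It is only an
illustration: describe() is a name made up for this example, and the
numeric values are platform specific (on Linux/x86 they are 104, 110 and
5), so the loop looks the codes up by name.

```python
import errno
import os


def describe(kernel_result):
    """Turn a kernel-style result such as -110 into 'NAME: message'."""
    code = abs(kernel_result)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


# The three conditions seen in this thread: connection gone on send,
# connection timed out, and I/O error on the closed socket.
for name in ("ECONNRESET", "ETIMEDOUT", "EIO"):
    code = getattr(errno, name)
    print(-code, describe(-code))
```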

The long delay points to another problem with the current smbfs socket
code: it lets the network decide how long a timeout is. Patches for this
exist for various 2.4 and 2.2 kernels; they set the timeout for any
operation to 30 seconds (user configurable). I plan to get that into
2.4.20.
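
As a user-space analogy only (this is not the kernel patch, and
recv_with_timeout() is a name invented for the example), the difference is
roughly that between a plain blocking recv and one with an explicit limit:

```python
import socket


def recv_with_timeout(sock, nbytes, timeout):
    """Receive up to nbytes, giving up after `timeout` seconds.

    Without settimeout() a blocking recv waits for as long as the
    network stack decides; with it, the caller bounds the wait.
    Returns None if nothing arrived in time.
    """
    sock.settimeout(timeout)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        return None
```

With timeout=30 this would mirror the 30 second limit mentioned above: the
caller gets control back after at most that long and can decide to
reconnect.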

There is a more advanced version that should always let people interrupt
processes that are sleeping while accessing smbfs, and that is no longer
single threaded and thus faster with multiple accesses ... for 2.5,
eventually.

> Why does this happen?  Is this a known issue with Samba 2.2.4?

Yes, it is a known issue, but with the kernel smbfs code. It has nothing
to do with Samba.

> Also, I have yet to see any of these events on the other linux server,
> LINUX-TWO.  I believe this is because LINUX-TWO has processes running on
> it that hit these mounts every five seconds, and so there is never an
> opportunity for the underlying network connections to become inactive.
> Does this make sense?  If so, all I have to do to work around the
> problem on LINUX-ONE is set up a script that periodically pings the
> mounts, perhaps running an "ls" command, every so often, correct?  If

Yes. There will eventually be similar code inside smbfs to do whatever it
needs to keep the connection up while mounted.

> that is true, I need to know what the inactivity time-out limit is, so
> this script doesn't have to run more often than necessary.

It's a server side setting. I believe NT (and win2k?) uses:

HKLM\System\CurrentControlSet\Services\LanmanServer\Parameters\autodisconnect
-1 to 65535 minutes

I think the default is something like 10 minutes, so running your script
every 5 minutes sounds good.
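
A minimal sketch of such a keepalive script, assuming that listing the top
directory of the mount (like your "ls") is enough to generate traffic on
the connection. The names poke() and keep_alive() are made up for this
example, and the default interval is the 5 minutes suggested above.

```python
import os
import time


def poke(mount_point):
    """List the mount's top directory; return the entry count or None."""
    try:
        return len(os.listdir(mount_point))
    except OSError as exc:
        # For example an I/O error while smbmount is still reconnecting.
        print(f"{mount_point}: {exc}")
        return None


def keep_alive(mount_point, interval=300, rounds=None):
    """Poke the mount every `interval` seconds; rounds=None means forever."""
    done = 0
    while rounds is None or done < rounds:
        poke(mount_point)
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
    return done
```

Calling keep_alive() with the path of the mount (for instance a
hypothetical /mnt/share) runs until it is killed. Note that if the mount is
already hung, this script will block on it just like any other process
accessing it.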

/Urban