[cifs-protocol] cifs client timeouts and hard/soft mounts

Sat Dec 4 09:55:08 MST 2010

On Sat, 4 Dec 2010 08:46:53 -0600
Shirish Pargaonkar <shirishpargaonkar at gmail.com> wrote:

> 
> Jeff, I am not sure.  Basically I am coming from here:
> 
> I have a bug open where, when an SMB server is slow to respond to a
> cifs client and the client reconnects, data gets corrupted on the
> server. If left alone, responses from the server eventually make it
> through (without any intervention) and the tests pass.
> 

I have a very similar bug open and that's what prompted me to go down
this road. You may want to test the patchset I proposed for cifs
against your reproducer.

> If an SMB server is unresponsive, how do we know it will respond to
> a reconnect or a reconnect will help? 
>
> I do not know enough about
> SMB servers to describe an unresponsive server, i.e. how and when
> it came to be unresponsive, how it handles the transport layer then,
> whether it corrects itself or how to correct it, how it handles the
> underlying physical file system, etc.

A reconnect may not help. The problem we have today, however, is that
the Linux CIFS client is too cavalier with reconnects. It reconnects
the socket any time a call has taken longer than an arbitrary
timeout. It tries to compensate by varying the timeout with the
type of call, but I think that's a broken model that fails in many
situations.

It's impossible to predict how long it'll take the server to service a
particular call, as we can never be sure what the load on the
server and underlying storage is. A QPathInfo call may take just as
long as a write past EOF if the storage is being hammered.
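
To illustrate why the per-call model misfires, here's a minimal sketch.
The call names are just labels and the timeout values are made up for
the example; they are not taken from the cifs code.

```python
# Hypothetical per-call timeouts in seconds (illustrative values only,
# not the ones the cifs client actually uses).
CALL_TIMEOUTS = {"QPathInfo": 15.0, "Read": 45.0, "WritePastEOF": 180.0}

def reconnect_by_call_timeout(call_type, elapsed):
    """Per-call model: reconnect once a call outlives its fixed timeout."""
    return elapsed > CALL_TIMEOUTS[call_type]

# The storage is being hammered, so both calls take 60s to complete.
# The server is alive in both cases, yet only one triggers a reconnect.
print(reconnect_by_call_timeout("QPathInfo", 60.0))      # True
print(reconnect_by_call_timeout("WritePastEOF", 60.0))   # False
```

The decision depends on a guess about how long each call type "should"
take, not on whether the server is actually still there.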

The scheme I'm proposing makes the assumption that even when the server
is loaded, it'll still be able to respond to an echo. That may also
fail in certain situations, but empirical evidence has shown that
it's generally true. This scheme won't fix every failure scenario,
but it should help in the vast majority of situations where the server
is simply being slow to respond to a particular call.
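
Roughly, the decision could look like the sketch below. The function
name, the echo interval, and the number of missed intervals are all
assumptions for the sake of the example, not what the patchset
necessarily does.

```python
def server_unresponsive(now, last_response, echo_interval=60.0, missed=2):
    """Echo-based model (sketch, hypothetical names and defaults).

    Assumes any reply from the server, including an echo reply,
    refreshes last_response. The socket is only considered dead after
    `missed` echo intervals pass with no reply at all.
    """
    return now - last_response > echo_interval * missed

# A write has been outstanding for 300s, but an echo reply arrived 20s
# ago: the server is slow, not dead, so keep waiting.
print(server_unresponsive(now=300.0, last_response=280.0))   # False

# Nothing at all has come back from the server for 150s: reconnect.
print(server_unresponsive(now=300.0, last_response=150.0))   # True
```

How long any individual call has been outstanding never enters into it;
only the server's overall silence does.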

I'm not opposed to what you're proposing, but it seems like a more
radical step than what I have proposed. We'd need to understand what
recourse the user would have in practice and what the behavior will be
in various failure scenarios. Leaving the processes hung and logging a
message when the server isn't responding isn't going to be very helpful
if there's nothing that can be done about it.

-- 
Jeff Layton <jlayton at samba.org>