[cifs-protocol] [REG: 110120160951867] Requesting clarification of CIFS client timeout behavior

Wed Dec 1 14:31:54 MST 2010

On Wed, 01 Dec 2010 14:44:44 -0600
"Christopher R. Hertel" <crh at samba.org> wrote:

> Jeff Layton wrote:
> :
> > Yes, this is probably stretching the definition of protocol
> > clarification, but I figured it wouldn't hurt to ask... :)
> 
> Not at all.
> 
> Keep in mind that I worked with Microsoft to get these docs out, so I know
> how important such details are to them, as well as third party implementers.
> 
> The interesting thing about your questions is that they touch on very
> obscure boundaries between old LANMAN behavior, NT behavior, Windows
> behavior, and actual protocol.  Perfect storm.  I love this stuff.
> 
> The more I think about it, the more I believe that the Echo is sent to
> determine whether the physical connection is still up.  If it's not, then
> there is no sense in sending an SMB_COM_NT_CANCEL anyway, since the other
> end would likely never receive it.
> 
> As I mentioned, NT and OS/2 were able to support a single logical SMB
> Session over multiple connections (think of a client with three dial-up
> modems connecting to a server with three or more modems).  I think that the
> idea was to use the Echo to test a specific link, and shut down the
> connection bound to that link if it was down.
> 
> The client closes the entire session only if the server is non-responsive.
> 
> ...but that's guess-work based upon my memory.  The real answer is in the
> Windows source and I don't have access to that any more (thank goodness!).
> 

Perfect storm indeed, especially since MS-CIFS also says:

3.2.7.1 Handling a Transport Disconnect:

When the transport indicates a disconnection, the client MUST walk
through the PIDMIDList and return an error for each outstanding command
to the calling application. All resources associated with the
connection MUST be freed. Finally, the connection MUST be freed.

...so I guess you'd have to stretch "connection" in that case to mean
the entire bonded connection group...(Blech!)
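
The disconnect handling quoted from 3.2.7.1 can be sketched roughly as
follows. This is only an illustration of the three MUSTs in that
paragraph, not how any real client does it: the PendingCommand and
Connection types, the "resources" list, and the "CONNECTION_LOST"
error string are all made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class PendingCommand:
    """Hypothetical record for one outstanding request."""
    pid: int
    mid: int
    error: Optional[str] = None


@dataclass
class Connection:
    """Hypothetical per-connection state, keyed by (PID, MID)."""
    pid_mid_list: Dict[Tuple[int, int], PendingCommand] = field(default_factory=dict)
    resources: List[str] = field(default_factory=list)
    freed: bool = False


def handle_transport_disconnect(conn: Connection) -> List[PendingCommand]:
    """Fail every outstanding command, then release the connection."""
    failed = []
    # Step 1: walk the PIDMIDList and return an error for each command.
    for cmd in list(conn.pid_mid_list.values()):
        cmd.error = "CONNECTION_LOST"  # placeholder error, an assumption
        failed.append(cmd)
    conn.pid_mid_list.clear()
    # Step 2: free all resources associated with the connection.
    conn.resources.clear()
    # Step 3: finally, free the connection itself.
    conn.freed = True
    return failed
```

Note that nothing here says what "connection" covers when several
transports are bound to one session; the sketch simply treats it as a
single object.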

In any case, this may all be a matter of opinion since the spec doesn't
really spell it out. It is of concern however -- it can take a VERY long
time for some reads or writes to complete.

Consider, for instance, a small write that is a long way past EOF on a
server with NTFS under the hood. My understanding is that NTFS will
zero-fill the file, and on slow storage that can take a *really* long
time (far longer than the default 45-second SESSTIMEOUT).

It would seem to make far more sense to simply apply a timeout to the
socket as a whole. IOW, only perform a reconnect if the server doesn't
respond to echoes within a reasonable amount of time (whatever
"reasonable" is).

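A minimal sketch of that per-socket timeout idea, to make the
distinction concrete. The class name, method names, and the interval
values in the usage note are all assumptions for illustration; this is
not what Windows or any existing client is known to do. The point is
that any response from the server (including an echo reply) resets the
clock, so a single slow write does not trigger a reconnect as long as
echoes keep getting answered.

```python
class EchoLivenessMonitor:
    """Tracks server liveness for one socket using echo replies."""

    def __init__(self, echo_interval, unresponsive_after, now=0.0):
        self.echo_interval = echo_interval            # seconds between echoes
        self.unresponsive_after = unresponsive_after  # the "reasonable" limit
        self.last_response = now
        self.last_echo_sent = now

    def on_response(self, now):
        # Any reply from the server counts, not just echo replies.
        self.last_response = now

    def on_echo_sent(self, now):
        self.last_echo_sent = now

    def should_send_echo(self, now):
        # Probe only when the socket has been quiet and no echo was
        # sent within the last interval.
        quiet = now - self.last_response >= self.echo_interval
        due = now - self.last_echo_sent >= self.echo_interval
        return quiet and due

    def should_reconnect(self, now):
        # Reconnect only when the server as a whole has gone silent.
        return now - self.last_response >= self.unresponsive_after
```

With, say, echo_interval=10 and unresponsive_after=30, a write that
takes minutes to complete would never cause a reconnect by itself; only
30 seconds of total silence on the socket would.
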
That said, since Windows is the reference platform here, I'm quite
interested in what it does in this situation...

-- 
Jeff Layton <jlayton at samba.org>