[PATCH] CIFS: Decrease reconnection delay when switching nics

Tom Talpey ttalpey at microsoft.com
Thu Feb 28 06:01:20 MST 2013


> -----Original Message-----
> From: samba-technical-bounces at lists.samba.org [mailto:samba-technical-
> bounces at lists.samba.org] On Behalf Of Stefan (metze) Metzmacher
> Sent: Wednesday, February 27, 2013 7:16 PM
> To: Jeff Layton
> Cc: Steve French; Dave Chiluk; samba-technical at lists.samba.org; linux-
> kernel at vger.kernel.org; linux-cifs at vger.kernel.org
> Subject: Re: [PATCH] CIFS: Decrease reconnection delay when switching nics
> 
> Am 27.02.2013 17:34, schrieb Jeff Layton:
> > On Wed, 27 Feb 2013 12:06:14 +0100
> > "Stefan (metze) Metzmacher" <metze at samba.org> wrote:
> >
> >> Hi Dave,
> >>
> >>> When messages are currently in queue awaiting a response, decrease
> >>> the amount of time before attempting cifs_reconnect to
> >>> SMB_MAX_RTT = 10 seconds. The wait time before attempting to
> >>> reconnect is currently 2*SMB_ECHO_INTERVAL (120 seconds) since the
> >>> last response was received.  This does not take into account the
> >>> fact that messages waiting for a response should be serviced within
> >>> a reasonable round trip time.
> >>
> >> Wouldn't that mean that the client will disconnect a good connection,
> >> if the server doesn't respond within 10 seconds?
> >> Reads and Writes can take longer than 10 seconds...
> >>
> >
> > Where does this magic value of 10s come from? Note that a slow server
> > can take *minutes* to respond to writes that are long past the EOF.
> >
> >>> This fixes the issue where user moves from wired to wireless or vice
> >>> versa causing the mount to hang for 120 seconds, when it could
> >>> reconnect considerably faster.  After this fix it will take
> >>> SMB_MAX_RTT (10 seconds) from the last time the user attempted to
> >>> access the volume or SMB_MAX_RTT after the last echo.  The worst
> >>> case of the latter scenario is 2*SMB_ECHO_INTERVAL + SMB_MAX_RTT +
> >>> a small scheduling delay (about 130 seconds).
> >>> Statistically speaking it would normally reconnect sooner.  However
> >>> in the best case where the user changes nics, and immediately tries
> >>> to access the cifs share it will take SMB_MAX_RTT=10 seconds.
> >>
> >> I think it would be better to detect the broken connection by using
> >> an AF_NETLINK socket listening for RTM_DELADDR messages?
> >>
> >> metze
> >>
> >
> > Ick -- that sounds horrid ;)
> 
> This is what winbindd uses to detect that the source IP of outgoing connections
> is gone. I don't know much about the kernel; there might be a better way from
> within the kernel to detect this. But this is exactly the right thing to do to
> fail over to another interface, as it fires as soon as the IP is removed,
> without messing with a timeout value.
> 
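
For readers unfamiliar with the netlink approach described above, here is a
minimal userspace sketch in Python (Linux only). It is not winbindd's code and
not something the kernel CIFS client could use directly. The function names
are made up for illustration, and the numeric constants are copied from
<linux/rtnetlink.h> because Python's socket module does not export them.

```python
import socket
import struct

# Values from <linux/rtnetlink.h>; not exported by Python's socket module.
RTM_NEWADDR = 20
RTM_DELADDR = 21
RTMGRP_IPV4_IFADDR = 0x10
RTMGRP_IPV6_IFADDR = 0x100

# struct nlmsghdr: nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid
NLMSG_HDR = struct.Struct("=IHHII")


def open_addr_monitor():
    """Open a netlink socket subscribed to IPv4/IPv6 address changes."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    # Address is (pid, groups); pid 0 lets the kernel pick one for us.
    sock.bind((0, RTMGRP_IPV4_IFADDR | RTMGRP_IPV6_IFADDR))
    return sock


def message_types(buf):
    """Yield the nlmsg_type of every netlink message packed into buf."""
    offset = 0
    while offset + NLMSG_HDR.size <= len(buf):
        length, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(buf, offset)
        if length < NLMSG_HDR.size or offset + length > len(buf):
            break  # truncated or malformed message, stop parsing
        yield msg_type
        offset += (length + 3) & ~3  # netlink messages are 4-byte aligned


def address_removed(buf):
    """Return True if buf contains at least one RTM_DELADDR message."""
    return RTM_DELADDR in message_types(buf)
```

A monitoring loop would call open_addr_monitor() once and then pass each
sock.recv(65535) result to address_removed(). A real implementation would also
parse the ifaddrmsg payload to check whether the removed address is the one the
connection is bound to; that part is omitted here.
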
> Another optimization would be to use tcp keepalives (I think there 10
> seconds would be ok), I think that's what Windows SMB3 clients are using.

Yes, they do. See MS-SMB2 behavior note 144 attached to section 3.2.5.14.9.

10 seconds seems a fairly rapid keepalive interval. The TCP stack probably
won't allow it to be less than the maximum retransmit, for instance.
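
To make the knobs concrete, here is a rough userspace sketch in Python of
enabling keepalives with a 10 second idle time and interval on Linux. The
function names and the default of 3 probes are my own assumptions for
illustration. The kernel CIFS client would set this up differently, and the
detection-time formula is only an approximation of how the Linux stack behaves.

```python
import socket


def enable_keepalive(sock, idle=10, interval=10, probes=3):
    """Turn on TCP keepalive using the Linux-specific timing options."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idleness before the first probe is sent.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    # Seconds between successive unanswered probes.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    # Number of unanswered probes before the connection is dropped.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


def detection_time(idle, interval, probes):
    """Approximate seconds before a silent peer is declared dead."""
    return idle + interval * probes
```

With these assumed values, detection_time(10, 10, 3) gives 40 seconds, which is
still well under the 120 second hang described in the patch, but keepalives only
fire on an otherwise idle connection, so they do not replace a timeout on
outstanding requests.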

Tom.

