[linux-cifs-client] [patch] Increase send time out on a socket long enough inorder to eliminate any timeouts on large sends

Thu Jul 23 11:00:25 MDT 2009

On Thu, Jul 23, 2009 at 10:34 AM, Jeff Layton<jlayton at redhat.com> wrote:
> On Thu, 23 Jul 2009 09:51:32 -0500
> Shirish Pargaonkar <shirishpargaonkar at gmail.com> wrote:
>
>> On Thu, Jul 23, 2009 at 7:05 AM, Jeff Layton<jlayton at redhat.com> wrote:
>> > On Wed, 22 Jul 2009 20:14:38 -0500
>> > Shirish Pargaonkar <shirishpargaonkar at gmail.com> wrote:
>> >
>> >> In spite of a set of data integrity patches in cifs last year, errors
>> >> still persist that are caused by timeouts, which result in incomplete
>> >> data being sent and hence in data integrity errors.
>> >>
>> >> The proposed socket send timeout is large enough to eliminate that possibility.
>> >
>> > On what evidence do you base the above statement? Who's to say that 30s
>> > is long enough if someone has a high-latency enough connection?
>> >
>> >> Tests with this patch eliminated data integrity errors over an 80-hour
>> >> test run; without it, the errors manifest within a matter of hours.
>> >>
>> >
>> > Also, can you give some details about these data integrity errors? Were
>> > writes failing? If so, were they not reported at fsync or close?
>>
>> The errors logged by the cifs client looked like the log excerpt that
>> follows. This is what I had seen last year when the patches were developed.
>> The entire write could not be sent because of a socket timeout; another
>> thread's send fills in the rest of the 56K write, so the second 56K write
>> gets no response and the client logs 'No response for cmd'.
>> The longer timeout seems to be long enough for the server to receive the
>> entire smbwrite (56K).
>>
>> May 12 05:17:09 voyBCSsles11-rc3 kernel:  CIFS VFS: server not responding
>> May 12 05:17:09 voyBCSsles11-rc3 kernel:  CIFS VFS: No response for cmd 50 mid 20646
>> May 12 05:17:09 voyBCSsles11-rc3 kernel:  CIFS VFS: No response to cmd 47 mid 20647
>> May 12 05:17:09 voyBCSsles11-rc3 kernel:  CIFS VFS: Write2 ret -11, wrote 0
>> May 12 05:17:11 voyBCSsles11-rc3 kernel:  CIFS VFS: Write2 ret -9, wrote 0
>> May 12 05:17:39 voyBCSsles11-rc3 kernel:  CIFS VFS: server not responding
>> May 12 05:17:39 voyBCSsles11-rc3 kernel:  CIFS VFS: No response for cmd 50 mid 21347
>> May 12 05:17:39 voyBCSsles11-rc3 kernel:  CIFS VFS: No response to cmd 47 mid 21348
>> May 12 05:17:39 voyBCSsles11-rc3 kernel:  CIFS VFS: Write2 ret -11, wrote 0
>> May 12 05:17:39 voyBCSsles11-rc3 kernel:  CIFS VFS: Write2 ret -9, wrote 0
>> May 12 05:18:09 voyBCSsles11-rc3 kernel:  CIFS VFS: server not responding
>> May 12 05:18:09 voyBCSsles11-rc3 kernel:  CIFS VFS: No response to cmd 46 mid 24859
>> May 12 05:18:09 voyBCSsles11-rc3 kernel:  CIFS VFS: Send error in read = -11
>> May 12 05:18:09 voyBCSsles11-rc3 kernel:  CIFS VFS: No response for cmd 50 mid 24858
>>
>>
>
> It sounds like the original bug was never fixed then, only made less
> likely by changing the timing. This patch looks like it just does the
> same thing.

The first step was to change the socket from non-blocking to blocking
to prevent interleaved sends.
A longer send timeout gives the send enough time to complete
instead of returning prematurely.
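
For illustration only, here is a minimal userspace sketch of putting a send
timeout on a blocking socket. It is not the cifs patch itself (the kernel
client does not go through setsockopt()); the helper name set_send_timeout()
is hypothetical, and the 30-second value used in the note after the code is
only the figure mentioned earlier in this thread.

```c
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Hypothetical helper: set the send timeout on a blocking socket.
 * Returns 0 on success, -1 on error (errno is set by setsockopt()).
 * Once the timeout is set, a blocking send() that cannot make progress
 * for this long returns early instead of waiting indefinitely.
 */
int set_send_timeout(int fd, long seconds)
{
    struct timeval tv;

    tv.tv_sec = seconds;
    tv.tv_usec = 0;

    return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
}
```

A caller would use something like set_send_timeout(fd, 30) right after
connecting. A send() that times out can still return a short count, so the
caller has to check how many bytes actually went out.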

I cannot think of a way to abort a partially sent request to the server, and
I do not know whether it is possible to be sure that the entire 56K buffer is
available before dispatching a send on a (test-induced) stressed socket.

>
> Rather than papering over the bug by increasing the timeout, I think a
> patch is needed that fixes the actual bug. That is, you need to make it
> impossible for these sorts of interleaved sends to occur.
>
> --
> Jeff Layton <jlayton at redhat.com>
>
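
As a rough userspace illustration of the idea of making interleaved sends
impossible, the sketch below holds one lock for the whole request and loops
until every byte has been handed to the socket. This is an assumption about
what such a fix could look like, not the cifs code: send_lock and
send_request_locked() are hypothetical names, and pthreads stands in for
whatever locking the kernel client would use.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical lock serializing all senders on one connection. */
static pthread_mutex_t send_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical helper: send one whole request while holding send_lock,
 * so no other thread can put bytes on the wire in the middle of it.
 * Returns 0 if all len bytes were sent, -1 otherwise. On -1 the stream
 * may contain a partial request, so the caller should treat the
 * connection as unusable rather than let the next request follow it.
 */
int send_request_locked(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;
    int ret = 0;

    pthread_mutex_lock(&send_lock);
    while (left > 0) {
        ssize_t n = send(fd, p, left, MSG_NOSIGNAL);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before sending: retry */
            ret = -1;       /* timeout or hard error */
            break;
        }
        p += n;             /* short send: advance and keep going */
        left -= (size_t)n;
    }
    pthread_mutex_unlock(&send_lock);

    return ret;
}
```

The lock alone does not help if a timed-out send leaves half a request on the
wire; that is why the sketch reports failure and leaves it to the caller to
drop the connection instead of sending the next request behind the fragment.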