[linux-cifs-client][patch] prevent data integrity problems when socket starts returning EAGAINs

Shirish Pargaonkar shirishpargaonkar at gmail.com
Thu Oct 2 13:54:36 GMT 2008


On Thu, Oct 2, 2008 at 8:29 AM, Jeff Layton <jlayton at redhat.com> wrote:
> On Sat, 27 Sep 2008 20:42:41 -0500
> "Shirish Pargaonkar" <shirishpargaonkar at gmail.com> wrote:
>
>> One more attempt.  There was a bug in prior one, lenght += is incorrect.
>>
>> Regards,
>>
>> Shirish
>>
>
> Tested this patch out today. I see some warnings that ought to be
> cleaned up before we take it:
>
> fs/cifs/transport.c: In function 'smb_send_blocking':
> fs/cifs/transport.c:267: warning: suggest explicit braces to avoid ambiguous 'else'
> fs/cifs/transport.c: In function 'smb_send2_blocking':
> fs/cifs/transport.c:413: warning: suggest explicit braces to avoid ambiguous 'else'
> fs/cifs/transport.c: At top level:
> fs/cifs/transport.c:280: warning: 'smb_send2' defined but not used
>
> I'm also not sure I understand the problem you're solving here. How do
> the writes get mixed up like this? It looks like after smb_send2 gives
> up, it returns -EAGAIN to the caller. How does this become the data
> corruption you mention?
>
> After this patch smb_send and smb_send2 are no longer used. It might be
> best to modify smb_send and smb_send2 to do what their blocking
> variants do. There's no real need to bloat out the module with code that
> isn't used.
>
>
> Thanks,
> --
> Jeff Layton <jlayton at redhat.com>
>

Jeff,

Thanks.  Regading warnings, the one about braces, I had fixed it but
scripts/checkpatch.pl did not like the braces.  I decided to go with
checkpatch.pl but I can change the patch so that compiler does not warn
and ignore warnings by checkpatch.pl.

About overwriting smb_send and smb_send2 with smb_send_blocking
and smb_send2_blocking respectively, I do not have any problems.
I think some of the teams are still doing testing and some have yet to do,
so once all the tests pass, I can overwrite those functions if that is what
we agree upon.

Regarding corruption, this is how it happens.
Suppose we have a 56K write andx with say mid 1000, another 56K write andx
request with say mid 1001.
Say the mid 1000 request could send only 32K out of 56K and it
encountered EAGAIN,
so the smd_send2 returned with error and removed the mid entry in the list.
Now server has 32K of the mid 1000 request.
So now cifs attempts to send mid 1001 request and it succeeds but part of that
request (24K) becomes part of the request with mid 1000 at SMB server as socket
does not distinguish between these two cifs requets.
So the request data has got mixed up and corruption has occurred at
the SMB server.

I could see smb headers in the file contents in the middle of the data.
What also happens is SMB server sends response to mid 1000, now that that
request is complete albeit corrupt but cifs client has already removed
the mid entry
in the list so it does not recognize the response and logs 'unknown response.
Then the mid request 1001 does not get serviced since it is not a request that
SMB server can recognize as its initial part (including SMB header) has become
part of the other request (mid 1000), so cifs client times out and
logs response
not received and reconnects thinking server is not responding.

So cifs client receiving EAGAIN in the middle of a 56K (or any size) write andx
request causes the data corruption.

Regards,

Shirish


More information about the linux-cifs-client mailing list