[Samba] io_uring cause data corruption

A L mail at lechevalier.se
Thu Apr 30 08:25:49 UTC 2020



On 2020-04-30 09:08, A L via samba wrote:
>
> On 2020-04-29 00:40, Jeremy Allison via samba wrote:
>> On Mon, Apr 27, 2020 at 11:21:35PM +0200, A L wrote:
>>> I set up the following test case:
>>> * Linux 5.7-rc3 (with the patch from previous mail)
>>> * samba-4.12.1
>>> * gcc-9.3.0
>>> * liburing-0.6
>>> * glibc-2.30-r8
>>>
>>> =================================
>>> Test 1)
>>> Copy 10 10GB files.
>>> 1) ddrescue -s 10G -v -f /dev/urandom 0.bin
>>> 2) for((i=1;i<=10;i+=1)); do cp --reflink=always 0.bin $i.bin; done
>>> 3) sha256sum *.bin > sha256sum.txt
>>> 4) Windows 10, file explorer, copy the 10 files to a local disk D:\test\
>>> 5) Verify local files in D:\test with sha256sum
>>> 6) sha256sum was correct.
>>> 7) redid step 4 and 5. Now sha256sum was wrong, but all 10 files had
>>> the same (but wrong) csum!
>>>
>>>
>>> =================================
>>> Test 2)
>>> Copy 1000 10MB files.
>>> 1) ddrescue -s 10M -v -f /dev/urandom 0.bin
>>> 2) for((i=1;i<=1000;i+=1)); do cp --reflink=always 0.bin $i.bin; done
>>> 3) sha256sum *.bin > sha256sum.txt
>>> 4) Windows 10, file explorer, copy all 1000 files to a local disk D:\test\
>>> 5) Verify local files in D:\test with sha256sum
>> I just tried to reproduce this using
>> Samba master on Ubuntu 19.10 kernel 5.3.0-51-generic
>> liburing-dev:0.4-2.
>>
>> I only tried with 100 files, and fetched
>> them using smbclient "mget", and the results
>> were always the same - identical sha256sum
>> hashes on all files.
>>
>> We're going to need more info to track this
>> down in your environment I'm afraid.
>>
> I'll do some more testing with locally mounted samba using the cifs 
> module and also the smbclient tool.
>
> Does mget use multiple concurrent threads? I notice that when I copy 
> from Windows explorer (and using FastCopy), I have about 10 I/O 
> threads with smbd, according to iotop.
>
So I did some more tests. smbclient mget does not copy files the same way
Windows Explorer does. When copying in Windows Explorer, multiple
concurrent threads are used to transfer the files. With smbclient mget
there is no corruption, either locally or over the network from another
Linux machine.
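
A rough sketch of how the share could be read with several concurrent
readers from a Linux client, assuming it is mounted locally (for example
with the cifs module) and that sha256sum.txt from step 3 is available.
The names verify_concurrently and sha256_of are made up for illustration,
the default of 10 workers only mirrors the roughly 10 I/O threads seen in
iotop, and this does not claim to reproduce the I/O pattern of Windows
Explorer.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_concurrently(sum_file, directory, workers=10):
    """Hash the files listed in a sha256sum-style file using a thread pool.

    Returns the sorted names whose digest differs from the expected one.
    """
    expected = {}
    for line in Path(sum_file).read_text().splitlines():
        if not line.strip():
            continue
        # sha256sum writes "<digest>  <name>" or "<digest> *<name>".
        digest, name = line.split(maxsplit=1)
        expected[name.lstrip("*")] = digest.lower()

    names = sorted(expected)
    paths = [Path(directory) / name for name in names]
    # Several files are read at the same time, one per worker thread.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        actual = list(pool.map(sha256_of, paths))
    return [name for name, got in zip(names, actual) if got != expected[name]]
```

Calling verify_concurrently("sha256sum.txt", "/mnt/share") would return
the names of any files whose checksum does not match; the mount point
here is a placeholder.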

I analysed the difference between a correct file and a corrupt file.
At offset 0x7A0000 the corrupt file started to contain only zero bytes.
At offset 0x800000 the zeroes ended and correct data continued, so the
zeroed span is 0x60000 bytes (384 KiB) long.
To me it sounds like some wrong memory is copied somehow.
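
A minimal sketch of how such a comparison could be automated instead of
done by eye in a hex editor. It compares the two files in fixed-size
blocks and reports each contiguous differing range along with whether
the corrupt copy is all zeroes there. The function name differing_blocks
and the 4096-byte block size are assumptions chosen for illustration.
Block granularity is used because with random source data a byte-wise
diff would split a zeroed region wherever the original happens to
contain a zero byte.

```python
def differing_blocks(good_path, bad_path, block_size=4096):
    """Return (start, end, bad_is_all_zero) for each differing block range.

    start and end are byte offsets, end is exclusive. Adjacent differing
    blocks are merged into a single range.
    """
    ranges = []  # each entry is [start, end, bad_is_all_zero]
    offset = 0
    with open(good_path, "rb") as good, open(bad_path, "rb") as bad:
        while True:
            a = good.read(block_size)
            b = bad.read(block_size)
            if not a and not b:
                break
            length = max(len(a), len(b))
            if a != b:
                # True only if the corrupt block is non-empty and all 0x00.
                zero = len(b) > 0 and b.count(0) == len(b)
                if ranges and ranges[-1][1] == offset:
                    # Extend the previous range; it stays "zero" only if
                    # every block in it is zero-filled.
                    ranges[-1][1] = offset + length
                    ranges[-1][2] = ranges[-1][2] and zero
                else:
                    ranges.append([offset, offset + length, zero])
            offset += length
    return [tuple(r) for r in ranges]
```

For the files described above, a result of a single range from
0x7A0000 to 0x800000 flagged as all zero would match what the hex
editor showed.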

These two files show the difference as seen in a hex editor.
https://paste.tnonline.net/files/MO1FJvDOG6E8_smb_1
https://paste.tnonline.net/files/Rglite4KWmU8_smb_2

I will redo the tests with different Windows clients and see if that 
shows different results.



