[Samba] io_uring cause data corruption

A L mail at lechevalier.se
Mon Apr 27 21:21:35 UTC 2020


On 2020-04-27 18:45, Jeremy Allison via samba wrote:
> On Mon, Apr 27, 2020 at 10:27:17AM +0200, A L via samba wrote:
>> On 2020-04-26 19:46, Jeremy Allison via samba wrote:
>>> On Sun, Apr 26, 2020 at 11:51:42AM +0200, A L via samba wrote:
>>>> * Connected from a Windows 10 computer over 1G ethernet.
>>>> * Copy data using Windows Explorer and FastCopy(1) from the Samba
>>>>   share to a local disk.
>>>> * Verify the sha-256 sum on the files.
>>>>
>>>> From what I can see there is data corruption on many of the files.
>>>> Sha-256 does not match. I copied the same files many times and the
>>>> data corruption occurs within minutes. The total data set is about
>>>> 800GB.
>>> Can you do checksums on file fragments so we can discover at what 
>>> offset (if non-zero) the corruption occurs. 
>> Yes, I will check this. I saw a patch on the kernel mailing list 
>> about possible corruption during re-scheduling. I wonder if this 
>> is the problem I am hitting. I'll run some more tests with this 
>> patch. https://www.spinics.net/lists/io-uring/msg01706.html 
> Oh, that might explain it. I won't do further work until you can 
> confirm the Samba corruptions happen with this kernel patch also.
Hello again,

I set up the following test case:
* Linux 5.7-rc3 (with the patch from previous mail)
* samba-4.12.1
* gcc-9.3.0
* liburing-0.6
* glibc-2.30-r8

=================================
Test 1)
Copy 10 10GB files.
1) ddrescue -s 10G -v -f /dev/urandom 0.bin
2) for((i=1;i<=10;i+=1)); do cp --reflink=always 0.bin $i.bin; done
3) sha256sum *.bin > sha256sum.txt
4) Windows 10, file explorer, copy the 10 files to a local disk D:\test\
5) Verify local files in D:\test with sha256sum
6) All sha256sums were correct.
7) Repeated steps 4 and 5. This time the sha256sums were wrong, but all 
10 files had the same (wrong) csum!


=================================
Test 2)
Copy 1000 10MB files.
1) ddrescue -s 10M -v -f /dev/urandom 0.bin
2) for((i=1;i<=1000;i+=1)); do cp --reflink=always 0.bin $i.bin; done
3) sha256sum *.bin > sha256sum.txt
4) Windows 10, file explorer, copy all 1000 files to a local disk D:\test\
5) Verify local files in D:\test with sha256sum

The results are very surprising!

Correct sha256sum is:
c5ce0d7596c26b18a11eb0609abcd1ba5a4fc12cedcf5ce011a4bf1e227347ae

This is how the files verified:
=======================
D:\TEST\sha256sum.exe *.bin
7bff13155c38b825def06e269fae3543395f7273104aff8eb7c8b488419d09fe *0.bin
2eae29bc03989b7594550837dacacdebedc8757bec6b889e39e8289492653881 *1.bin
7bff13155c38b825def06e269fae3543395f7273104aff8eb7c8b488419d09fe *10.bin
7bff13155c38b825def06e269fae3543395f7273104aff8eb7c8b488419d09fe *100.bin
7bff13155c38b825def06e269fae3543395f7273104aff8eb7c8b488419d09fe *1000.bin
7bff13155c38b825def06e269fae3543395f7273104aff8eb7c8b488419d09fe *101.bin
7bff13155c38b825def06e269fae3543395f7273104aff8eb7c8b488419d09fe *102.bin
...
...
7bff13155c38b825def06e269fae3543395f7273104aff8eb7c8b488419d09fe *153.bin
7bff13155c38b825def06e269fae3543395f7273104aff8eb7c8b488419d09fe *154.bin
2eae29bc03989b7594550837dacacdebedc8757bec6b889e39e8289492653881 *155.bin
2eae29bc03989b7594550837dacacdebedc8757bec6b889e39e8289492653881 *156.bin
2eae29bc03989b7594550837dacacdebedc8757bec6b889e39e8289492653881 *157.bin
2eae29bc03989b7594550837dacacdebedc8757bec6b889e39e8289492653881 *158.bin
2eae29bc03989b7594550837dacacdebedc8757bec6b889e39e8289492653881 *159.bin
2eae29bc03989b7594550837dacacdebedc8757bec6b889e39e8289492653881 *16.bin
2eae29bc03989b7594550837dacacdebedc8757bec6b889e39e8289492653881 *160.bin
...
The csum changed here and continued for roughly 200 files until
2eae29bc03989b7594550837dacacdebedc8757bec6b889e39e8289492653881 *308.bin
2eae29bc03989b7594550837dacacdebedc8757bec6b889e39e8289492653881 *309.bin
ae2601d8dcd1ef592a92907843a673703e6161173164b1094a52508ff65ab60a *31.bin
0c627526f677704d7beec0b56dedb89a1118b78e481d3f012fbc01f923211838 *310.bin
ae2601d8dcd1ef592a92907843a673703e6161173164b1094a52508ff65ab60a *311.bin
ae2601d8dcd1ef592a92907843a673703e6161173164b1094a52508ff65ab60a *312.bin
0d3d00122af0d486b2a9e1231239c0a77c034564957f069e310186ef8a7ba4aa *313.bin
de84b1fae759a25b0679f73da68747fd8183635d1a6d39d1b28b35d306837fd2 *314.bin
e681cdbc8bf557047967edfe2de71d62753af58c8eba422dfb1a7c6220b58f7b *315.bin
fd517535b5d7115ef7f76480b7f121f957eaba07baee4e58c47f0c2dd3c8614c *316.bin
ae2601d8dcd1ef592a92907843a673703e6161173164b1094a52508ff65ab60a *317.bin
c5ce0d7596c26b18a11eb0609abcd1ba5a4fc12cedcf5ce011a4bf1e227347ae *318.bin
ae2601d8dcd1ef592a92907843a673703e6161173164b1094a52508ff65ab60a *319.bin
7bff13155c38b825def06e269fae3543395f7273104aff8eb7c8b488419d09fe *32.bin
ae2601d8dcd1ef592a92907843a673703e6161173164b1094a52508ff65ab60a *320.bin
ae2601d8dcd1ef592a92907843a673703e6161173164b1094a52508ff65ab60a *321.bin
0c627526f677704d7beec0b56dedb89a1118b78e481d3f012fbc01f923211838 *322.bin
c5ce0d7596c26b18a11eb0609abcd1ba5a4fc12cedcf5ce011a4bf1e227347ae *323.bin
ae2601d8dcd1ef592a92907843a673703e6161173164b1094a52508ff65ab60a *324.bin
ae2601d8dcd1ef592a92907843a673703e6161173164b1094a52508ff65ab60a *325.bin
c5ce0d7596c26b18a11eb0609abcd1ba5a4fc12cedcf5ce011a4bf1e227347ae *326.bin
2eae29bc03989b7594550837dacacdebedc8757bec6b889e39e8289492653881 *327.bin
6720f9dc964c0fa2125366338f921a6772b2b0751ef194b12779989ef25a9be8 *328.bin
ec8c312be8d7c20bb39e20007d53e6ac49022df56877ec00f9dc757f74deaa7d *329.bin
7bff13155c38b825def06e269fae3543395f7273104aff8eb7c8b488419d09fe *33.bin
7bff13155c38b825def06e269fae3543395f7273104aff8eb7c8b488419d09fe *330.bin
7bff13155c38b825def06e269fae3543395f7273104aff8eb7c8b488419d09fe *331.bin
7bff13155c38b825def06e269fae3543395f7273104aff8eb7c8b488419d09fe *332.bin
...
continued with the same (wrong) csums till the end.


RESULT: 23 out of 1001 files had correct csum.
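
A tally like the one above does not have to be counted by hand. The following is a minimal Python sketch, not what was used for this test (the listing above came from sha256sum.exe): it hashes every *.bin file in a directory, groups the files by digest, and counts how many match the expected value. The names `sha256_of`, `tally` and the script name `tally.py` are made up for illustration.

```python
import hashlib
import sys
from collections import Counter
from pathlib import Path


def sha256_of(path, bufsize=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(bufsize), b""):
            digest.update(block)
    return digest.hexdigest()


def tally(directory, expected, pattern="*.bin"):
    """Return (files matching `expected`, total files, Counter of digests)."""
    counts = Counter(sha256_of(p) for p in sorted(Path(directory).glob(pattern)))
    total = sum(counts.values())
    return counts[expected], total, counts


# Hypothetical usage: python tally.py D:\test <expected-sha256>
if __name__ == "__main__" and len(sys.argv) == 3:
    correct, total, counts = tally(sys.argv[1], sys.argv[2])
    print(f"{correct} out of {total} files had the correct csum")
    # List the digests from most to least common to show how they cluster.
    for digest, n in counts.most_common():
        print(f"{n:5d}  {digest}")
```

Grouping by digest makes it easy to see how many distinct wrong checksums there are and how many files share each one.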

=================================
Test 3)
1) Remove io_uring from vfs objects in smb.conf and restart Samba.
2) Copy all original files from Test 2
3) All 1001 files' csums are now correct.
=================================
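
For the per-fragment checksums requested earlier in the thread, something along these lines might help locate the offsets at which a copied file differs from the original. This is only a sketch under assumptions: the 1 MiB chunk size is an arbitrary choice, `chunk_digests` and `differing_offsets` are hypothetical helper names, and both the original and the copied file must be readable from the machine where it runs.

```python
import hashlib
import sys


def chunk_digests(path, chunk_size=1 << 20):
    """Return a list of (offset, sha256 hex digest), one entry per chunk."""
    result = []
    offset = 0
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            result.append((offset, hashlib.sha256(chunk).hexdigest()))
            offset += len(chunk)
    return result


def differing_offsets(good, bad):
    """Return sorted chunk offsets whose digests differ or exist on one side only."""
    good_map, bad_map = dict(good), dict(bad)
    return sorted(
        off
        for off in good_map.keys() | bad_map.keys()
        if good_map.get(off) != bad_map.get(off)
    )


# Hypothetical usage: python chunkdiff.py original.bin copy.bin
if __name__ == "__main__" and len(sys.argv) == 3:
    good = chunk_digests(sys.argv[1])
    bad = chunk_digests(sys.argv[2])
    for off in differing_offsets(good, bad):
        print(f"mismatch in chunk starting at byte offset {off}")
```

A smaller chunk size narrows down the offset more precisely at the cost of a longer listing. If the two files live on different machines, the output of `chunk_digests` could instead be written to a text file on each side and compared afterwards.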



smb.conf:
###########################
[global]
     log level = 1
     workgroup = WORKGROUP
     netbios name = SAMBA
     server string = Samba Server
     server role = standalone server
     hosts allow = 192.168.0. 127.
     interfaces = lan
     max protocol = SMB3_11

     log file = /var/log/samba/%I.log
     max log size = 10240

     security = user
     passdb backend = tdbsam
     wins support = yes
     dns proxy = yes

[usb-backup]
     comment = USB Backup - Media files
     path = /media/usb-backup
     writeable = no
     browseable = yes
     read only = yes
     create mask = 0664
     directory mask = 0775
     guest only = Yes
     guest ok = Yes
     force user = nasuser
     force group = nas
     store dos attributes = yes
     ea support = no
     acl group control = no
     inherit owner = Yes
     vfs objects = btrfs, io_uring
###########################

The Samba log file does not contain much. These are the entries logged during the test:

[2020/04/27 23:12:54.432587,  1] ../../source3/param/loadparm.c:2512(lp_idmap_range)
   idmap range not specified for domain '*'
[2020/04/27 23:13:02.227075,  1] ../../source3/param/loadparm.c:2512(lp_idmap_range)
   idmap range not specified for domain '*'
[2020/04/27 23:13:02.882414,  1] ../../source3/param/loadparm.c:2512(lp_idmap_range)
   idmap range not specified for domain '*'



Regards,
Anders


