[Samba] io_uring cause data corruption

A L mail at lechevalier.se
Sat May 2 09:34:41 UTC 2020


On 2020-05-01 22:04, Jeremy Allison via samba wrote:
> On Fri, May 01, 2020 at 09:27:58PM +0200, A L via samba wrote:
>>> Jeremy.
>> I did not use a command line, but rather File Explorer
>> 1) In File explorer, go to \\SAMBA\share_io_uring\
>> 2) Select folder "test2-ro" and choose copy
>> 3) Paste to a local drive
>>
>> The number of corrupted files varies, but the corruption usually shows up
>> within the first 10 copied files. You can download a copy of the source
>> test files from
>> https://mirrors.tnonline.net/?dir=samba including all checksums.
>>
>> The file https://mirrors.tnonline.net/samba/test2-ro_no_io_uring.7z contains
>> 1000 copies of the source file
>> The file https://mirrors.tnonline.net/samba/test2-ro_with_io_uring.7z
>> contains the same files as they ended up after copying to the Windows
>> client. Here you can see where the holes (bunch of zeroed data) in the files
>> are created at different offsets.
>>
>> I hope this helps!
> OK, I just did this locally with 100 10MB duplicate files.
>
> Completely correct SHA256 output for every file on both
> Windows and Linux after the copy.
>
> Note that these are *separate* files on ext4, not created
> with --reflink (ext4 doesn't support it).
I forgot to mention that when I copied the files to an SSD, I did not
use any reflinks or hard links, just plain copies.
> The Windows client first tries to do the FSCTL_OFFLOAD_READ
> (which our btrfs module does support, but not the default
> ext4 module) and then falls back to regular SMB2 async
> read, ramping up from 32k initially to 1MB reads.
Is this the "vfs module = btrfs"? This was also disabled in the latest tests.
> I'm starting to think the problem is in your btrfs
> filesystem, not the Samba uring module.
>
> Before I spend more time on this I'd like you to
> create a standard ext4 filesystem, create all the
> files with cp *without* using reflink (which ext4
> doesn't support) so we know there's no COW shenanigans
> and then copy from *that* filesystem to the Windows client
> using the iouring module.
>
> I'm not a betting man, but if I was I'd bet the problem
> disappears :-).
>
So, I found a 160GB USB disk that I formatted to ext4 and exported it as:
########
[share_io_uring_ext4]
     comment = ext4-test io_uring
     path = /mnt/ext4
     browseable = yes
     read only = yes
     guest only = Yes
     guest ok = Yes
     vfs objects = io_uring

[share_no_io_uring_ext4]
     comment = ext4-test no_io_uring
     path = /mnt/ext4
     browseable = yes
     read only = yes
     guest only = Yes
     guest ok = Yes
     vfs objects =
########
I also tried various combinations of io_uring settings.

io_uring:sqpoll = true/false
io_uring:num_entries = 2, 4, 8, 64, 128 (default), 256

Between each test I run "echo 3 > /proc/sys/vm/drop_caches" and restart
the Samba service.
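
For anyone repeating the tests, the setting combinations listed above could
be enumerated with a small script. This is only a sketch: the helper names
(settings_matrix, share_section) are hypothetical, the share definition is
copied from the excerpt above, and it does not imply that every combination
was actually run.

```python
from itertools import product

# Values taken from the list above; 128 is noted there as the default.
SQPOLL_VALUES = (True, False)
NUM_ENTRIES_VALUES = (2, 4, 8, 64, 128, 256)


def settings_matrix():
    """Return every (sqpoll, num_entries) pair from the lists above."""
    return list(product(SQPOLL_VALUES, NUM_ENTRIES_VALUES))


def share_section(sqpoll, num_entries, path="/mnt/ext4"):
    """Render the io_uring test share with one pair of settings applied."""
    lines = [
        "[share_io_uring_ext4]",
        f"     path = {path}",
        "     read only = yes",
        "     guest only = Yes",
        "     guest ok = Yes",
        "     vfs objects = io_uring",
        f"     io_uring:sqpoll = {'true' if sqpoll else 'false'}",
        f"     io_uring:num_entries = {num_entries}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Print one share section per combination, separated by blank lines.
    for sqpoll, entries in settings_matrix():
        print(share_section(sqpoll, entries))
        print()
```

Each printed section would replace the share definition before the caches
are dropped and the service is restarted for the next run.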

One difference I noticed is that there is now only one [io_wqe_worker-0] 
io thread active when copying from this USB disk. This is probably 
because it is so slow, about 25MB/s maximum read speed. This seems to 
have an impact on where the corruptions happen in the files. All files 
have a corrupt/zeroed block at the end of the file, 99% of the time at 
the same place, as seen here 
https://paste.tnonline.net/files/Zdq7YKOfzogc_ext4_copy_io_uring.png
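
The position of such zeroed blocks can also be located without a hex viewer
by comparing a copied file against its source. The following is a minimal
sketch, not the method used for the screenshot: the function name
find_zeroed_runs and the 4096-byte block size are assumptions, so the
reported offsets are only accurate to that block granularity.

```python
def find_zeroed_runs(source_path, copy_path, block_size=4096):
    """Return (offset, length) runs where the copy is all zeros but the source is not."""
    runs = []
    run_start = None
    offset = 0
    with open(source_path, "rb") as src, open(copy_path, "rb") as dst:
        while True:
            good = src.read(block_size)
            got = dst.read(block_size)
            if not good and not got:
                break
            # A block counts as zeroed only if it differs from the source
            # and consists entirely of zero bytes.
            zeroed = bool(got) and got != good and got.count(0) == len(got)
            if zeroed and run_start is None:
                run_start = offset
            elif not zeroed and run_start is not None:
                runs.append((run_start, offset - run_start))
                run_start = None
            offset += max(len(good), len(got))
    if run_start is not None:
        runs.append((run_start, offset - run_start))
    return runs
```

Running it over every copied file and collecting the offsets would show
whether the zeroed block really sits at the same place each time.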

So I think we have now at least ruled out Btrfs as the culprit behind
the io_uring corruption.
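
That conclusion rests on comparing every copied file with its source, which
is easy to script. Below is a minimal sketch using SHA256 as mentioned
earlier in the thread; the function names and the flat directory layout
(all test files directly inside one folder) are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_files(source_dir, copy_dir):
    """Return names of source files whose copy is missing or has a different hash."""
    bad = []
    for src in sorted(Path(source_dir).iterdir()):
        if not src.is_file():
            continue
        dst = Path(copy_dir) / src.name
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            bad.append(src.name)
    return bad
```

An empty result for the share without io_uring and a non-empty one for the
io_uring share, both served from the same ext4 disk, would point at the
module rather than the filesystem.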

You mentioned a wire trace of the SMB traffic. Can you give me some
examples of how to capture one?

Perhaps I should open a bug on the Bugzilla tracker and continue this 
discussion there?

Regards,
Anders


