[Samba] io_uring cause data corruption
A L
mail at lechevalier.se
Sat May 2 09:34:41 UTC 2020
On 2020-05-01 22:04, Jeremy Allison via samba wrote:
> On Fri, May 01, 2020 at 09:27:58PM +0200, A L via samba wrote:
>>> Jeremy.
>> I did not use a command line, but rather File Explorer
>> 1) In File explorer, go to \\SAMBA\share_io_uring\
>> 2) Select folder "test2-ro" and choose copy
>> 3) Paste to a local drive
>>
>> The number of corrupted files varies, but corruption usually appears within
>> the first 10 copied files. You can download a copy of the source test files from
>> https://mirrors.tnonline.net/?dir=samba including all checksums.
>>
>> The file https://mirrors.tnonline.net/samba/test2-ro_no_io_uring.7z contains
>> 1000 copies of the source file
>> The file https://mirrors.tnonline.net/samba/test2-ro_with_io_uring.7z
>> contains the same files as they ended up after copying to the Windows
>> client. Here you can see where the holes (bunch of zeroed data) in the files
>> are created at different offsets.
>>
>> I hope this helps!
> OK, I just did this locally with 100 10MB duplicate files.
>
> Completely correct SHA256 output for every file on both
> Windows and Linux after the copy.
>
> Note that these are *separate* files on ext4, not created
> with --reflink (ext4 doesn't support it).
I forgot to mention that when I copied the files to an SSD, I did not
use any reflinks or hard links, just plain copies.
> The Windows client first tries to do the FSCTL_OFFLOAD_READ
> (which our btrfs module does support, but not the default
> ext4 module) and then falls back to regular SMB2 async
> read, ramping up from 32k initially to 1MB reads.
Is this the "vfs module = btrfs"? That was also disabled in the latest tests.
> I'm starting to think the problem is in your btrfs
> filesystem, not the Samba uring module.
>
> Before I spend more time on this I'd like you to
> create a standard ext4 filesystem, create all the
> files with cp *without* using reflink (which ext4
> doesn't support) so we know there's no COW shenanigans
> and then copy from *that* filesystem to the Windows client
> using the iouring module.
>
> I'm not a betting man, but if I was I'd bet the problem
> disappears :-).
>
So, I found a 160GB USB disk that I formatted to ext4 and exported it as:
########
[share_io_uring_ext4]
comment = ext4-test io_uring
path = /mnt/ext4
browseable = yes
read only = yes
guest only = Yes
guest ok = Yes
vfs objects = io_uring
[share_no_io_uring_ext4]
comment = ext4-test no_io_uring
path = /mnt/ext4
browseable = yes
read only = yes
guest only = Yes
guest ok = Yes
vfs objects =
########
I also tried various combinations of io_uring settings.
io_uring:sqpoll = true/false
io_uring:num_entries = 2, 4, 8, 64, 128 (default), 256
Between each test I run "echo 3 > /proc/sys/vm/drop_caches" and restart
the Samba service.
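
Since every file in the test folder is a copy of the same source file, the
result of each run can be checked by comparing SHA256 sums against one
reference file. Below is a minimal sketch of such a check, not the exact
procedure used for these tests. The function names and the 1MB read size
are my own choices for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(reference, copied_dir):
    """Return sorted names of files in copied_dir whose hash differs
    from the hash of the reference file (hypothetical helper)."""
    expected = sha256_of(reference)
    return sorted(
        p.name
        for p in Path(copied_dir).iterdir()
        if p.is_file() and sha256_of(p) != expected
    )
```

Running find_mismatches() on the folder pasted to the Windows client (or on
a copy of it moved back to Linux) should give an empty list for a clean run.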
One difference I noticed is that there is now only one [io_wqe_worker-0]
io thread active when copying from this USB disk. This is probably
because the disk is so slow, about 25MB/s maximum read speed. This seems to
affect where the corruption happens in the files. All files
have a corrupt/zeroed block at the end of the file, 99% of the time at
the same place, as seen here:
https://paste.tnonline.net/files/Zdq7YKOfzogc_ext4_copy_io_uring.png
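
The offsets of the zeroed blocks can also be listed without a hex viewer by
comparing a copied file against the source block by block. This is only a
rough sketch. The function name and the 4096-byte block size are assumptions
for illustration, and the real corruption does not have to align to that
size.

```python
def zeroed_ranges(source_path, copy_path, block_size=4096):
    """Return (start, end) byte ranges where the copy contains only
    zeros but the source does not. Adjacent blocks are merged."""
    ranges = []
    offset = 0
    with open(source_path, "rb") as src, open(copy_path, "rb") as dst:
        while True:
            a = src.read(block_size)
            b = dst.read(block_size)
            if not a and not b:
                break
            # A block counts as zeroed if it differs from the source
            # and every byte in the copy is 0.
            if b and a != b and b.count(0) == len(b):
                if ranges and ranges[-1][1] == offset:
                    ranges[-1] = (ranges[-1][0], offset + len(b))
                else:
                    ranges.append((offset, offset + len(b)))
            offset += max(len(a), len(b))
    return ranges
```

For a file with a single zeroed block at the end, this returns one range
whose end equals the file size.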
So I think we have now at least ruled out the Btrfs filesystem as the
culprit for the io_uring corruption.
You mentioned a wire trace of the SMB traffic. Can you provide some
examples of how to capture one?
Perhaps I should open a bug on the Bugzilla tracker and continue this
discussion there?
Regards,
Anders