Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3

Wed May 6 17:03:44 UTC 2020

On Wed, May 06, 2020 at 04:08:03PM +0200, Stefan Metzmacher wrote:
> Am 06.05.20 um 14:41 schrieb Anoop C S:
> > On Wed, 2020-05-06 at 12:33 +0200, Stefan Metzmacher wrote:
> >> Hi Anoop,
> >>
> >>> I could reproduce the difference in SHA256 checksum after copying a
> >>> directory with 100 copies of test file(provided by reporter) from
> >>> io_uring VFS module enabled share using Windows explorer(right-
> >>> click-
> >>>> copy/paste). Only 5 out of 100 files had correct checksum after
> >>>> copy
> >>> operation :-/
> >>
> >> Great! Can you please try to collect level 1 log files with
> >> the patch https://bugzilla.samba.org/attachment.cgi?id=15955
> >> applied?
> > 
> > I have attached three log files.
> > log.io_uring.smbd -- Copy using Windows explorer
> > log.io_uring-mget.smd -- Copy using smbclient
> > log.io_uring-powershell.smd -- Copy using `Copy-Item`
> 
> Thanks! All of them show short reads like:
> 
> > [2020/05/06 17:27:28.130248,  1] ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)
> >   vfs_io_uring_finish_req: pread ofs=0 (0x0) len=32768 (0x8000) nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=19536 dir/1.bin fnum 1607026405
> > [2020/05/06 17:27:28.131049,  1] ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)
> >   vfs_io_uring_finish_req: pread ofs=9969664 (0x982000) len=30336 (0x7680) nread=30336 (0x30336) eof=10000000 (0x989680) blks=4096 blocks=19536 dir/1.bin fnum 1607026405
> > [2020/05/06 17:27:28.133679,  1] ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)
> >   vfs_io_uring_finish_req: pread ofs=61440 (0xf000) len=32768 (0x8000) nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=19536 dir/1.bin fnum 1607026405
> > [2020/05/06 17:27:28.140184,  0] ../../source3/modules/vfs_io_uring.c:88(vfs_io_uring_finish_req)
> >   vfs_io_uring_finish_req: Invalid pread ofs=0 (0x0) len=1048576 (0x100000) nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=19536 dir/1.bin fnum 1607026405
> 
> It seems the first request is at ofs=0 len=32768 (0x8000) and we get
> 32768 back.
> A follow up request with ofs=0 len=1048576 (0x100000) only gets the
> first 32768 bytes which are already in the page cache.
> 
> I can easily reproduce this with the Ubuntu 5.4 kernel once I add
> state->ur.sqe.rw_flags |= RWF_NOWAIT; to vfs_io_uring_pread_send()
> and use this.
> 
> echo 1 > /proc/sys/vm/drop_caches
> head -c 1024 /root/samba-test/ff1.dat | hexdump -C
> 00000000  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
> |................|
> *
> 00000400
> smbclient //172.31.9.167/uringff -Uroot%test -c "get ff1.dat"
> 
> results in this log entries:
> > [2020/05/06 06:51:57.069990,  0] ../../source3/modules/vfs_io_uring.c:89(vfs_io_uring_finish_req)
> >   vfs_io_uring_finish_req: Invalid pread ofs=0 (0x0) len=8388608 (0x800000) nread=16384 (0x4000) eof=8388608 (0x800000) blks=4096 blocks=16384 ff1.dat fnum 840153065
> > [2020/05/06 06:51:57.076882,  1] ../../source3/modules/vfs_io_uring.c:104(vfs_io_uring_finish_req)
> >   vfs_io_uring_finish_req: pread ofs=16384 (0x4000) len=8372224 (0x7fc000) nread=8372224 (0x7fc000) eof=8388608 (0x800000) blks=4096 blocks=16384 ff1.dat fnum 840153065
> 
> smbclient is just smart enough to recover itself from the short read.
> But the windows client isn't.

Well we pay attention to the amount of data returned
and only increment the next read request by the amount
actually returned.

I'm amazed that the Windows client doesn't seem to
check this !

> The attached test against liburing (git://git.kernel.dk/liburing) should
> be able to demonstrate the problem. It can also be found in
> https://github.com/metze-samba/liburing/tree/implicit-rwf-nowaithttps://github.com/metze-samba/liburing/commit/eb06dcfde747e46bd08bedf9def2e6cb536c39e3
> 
> 
> I added the sqe->rw_flags = RWF_NOWAIT; line in order to demonstrate it
> against the Ubuntu 5.3 and 5.4 kernels. They both seem to have the bug.
> 
> Can someone run the unmodified test/implicit-rwf_nowait against
> a newer kernel?

Aha. I wondered about the short read issue when this
was first reported but I could never catch it in the
act.

If the Windows client doesn't check and the kernel
returns short reads I guess we'll have to add logic
similar to tstream_readv_send()/tstream_writev_send()
that ensure all bytes requested/send actually go through
the interface and from/into the kernel unless a read
returns 0 (EOF) or a write returns an error.

What a pain though :-(. SMB2+ server implementors
really need to take note that Windows clients will corrupt
files if they get a short read/write return.

The fact that early kernels don't return short
reads on io_uring but later kernels do makes it
even worse :-(.

There's even an SMB2 protocol field in SMB2_READ:

"MinimumCount (4 bytes): The minimum number of bytes to be read for this operation to be
successful. If fewer than the minimum number of bytes are read by the server, the server
MUST return an error rather than the bytes read."

We correctly return EOF if the amount read from
the kernel is less than SMB2_READ.MinimumCount
so I'm guessing they're not using it or looking
at it (or setting it to zero).

MinimumCount is supposed to allow the client to cope with
this. Anoop, do you have wireshark traces so we can
see what the Windows clients are setting here ?

Jeremy.