[PATCH 5/5] vfs_default: add copy_file_range support for copy chunk

Björn Jacke bj at sernet.de
Sat Jan 21 01:52:56 UTC 2017


Hi David,

thanks for the feedback.

On 2017-01-20 at 14:55 +0100 David Disseldorp sent off:
> On Fri, 20 Jan 2017 00:01:43 +0100, Björn Jacke wrote:
> 
> > we try to use the copy_file_range syscall first, which can ideally be a
> > zero-copy operation. We fall back to userspace read/write if copy_file_range
> > is not available.
> 
> Thanks for working on this. A couple of comments on this patch:
> - Please use the off_in/off_out syscall parameters instead of seeking.

this is what I wanted to do in the first place. I hit a weird bug when doing a
second copy chunc call on the same file twice. The reason is that the
off_in/off_out parameters are realative to the current file offset and not
absolute. Don't ask me what this is good for. When we call the copy chunk on
the same file twice, the file offset is not 0 and we end up with a file
offset+off_in larger that the file. The lseek to 0 is needed to sanitize this
thing again. The man page isn't telling this very clear but if you knnow it and
if you then read the man page you will spot the point :-)

> - I'd prefer to see the copy_file_range() loop split into a separate
>   copy_chunk_send_fn hook, with a fallback to vfswrap_copy_chunk_send(),
>   for the following reasons:
>   + this would avoid the 8M I/O buffer allocation in the fast-path.
>   + mid-function #ifdef pollution would be avoided.

I agree, I'll rework this.


>   + FSCTL Duplicate Extents To File doesn't permit server-side
>     read/write fallback. Blocking this in the copy_file_range()
>     parent function would be cleaner.

I think the intention of callers of FSCTL Duplicate Extents is to copy data
fast. NTFS is extent based and does not need a fallback to read/write. It just
always works as expected here. If we don't offer a r/w fallback and fail, then
this would just lead to clients having to do the r/w over the wire, which we
definetely don't want.  Also copy_file_range() doesn't guarantee any specific
technique of copying. It tries to do a file copy as fast and efficient as
possible. It tries reflinks on fs layer first, if reflinks are not supported,
it tries using r/w with sendfile and so on. I think the kerne guys even plan to
support cross device copying of data with copy_file_range. The fallback path is
always there, we can't turn it off with the generic copy_file_range call on our
side.

>   + I/O sizes could be tuned independently of the read/write loop.


Björn
-- 
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
  ☎ +49-551-370000-0, ℻ +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen



More information about the samba-technical mailing list