[PATCH 5/5] vfs_default: add copy_file_range support for copy chunk

David Disseldorp ddiss at samba.org
Sun Jan 22 13:09:38 UTC 2017


Hi Björn,

On Sat, 21 Jan 2017 02:52:56 +0100, Björn Jacke wrote:

> Hi David,
> 
> thanks for the feedback.
> 
> On 2017-01-20 at 14:55 +0100 David Disseldorp sent off:
> > On Fri, 20 Jan 2017 00:01:43 +0100, Björn Jacke wrote:
> >   
> > > we try to use the copy_file_range syscall first, which can ideally be a
> > > zero-copy operation. We fall back to userspace read/write if copy_file_range
> > > is not available.  
> > 
> > Thanks for working on this. A couple of comments on this patch:
> > - Please use the off_in/off_out syscall parameters instead of seeking.  
> 
> this is what I wanted to do in the first place. I hit a weird bug when doing a
> second copy chunc call on the same file twice. The reason is that the
> off_in/off_out parameters are realative to the current file offset and not
> absolute. Don't ask me what this is good for. When we call the copy chunk on
> the same file twice, the file offset is not 0 and we end up with a file
> offset+off_in larger that the file. The lseek to 0 is needed to sanitize this
> thing again. The man page isn't telling this very clear but if you knnow it and
> if you then read the man page you will spot the point :-)

That sounds like a horribly broken API. Do you have a standalone test
for this? Looking at the kernel VFS implementation[1], that doesn't
appear to be the case - what does happen to the off_in/off_out args is
that on success they're incremented by the amount copied, so maybe your
previous implementation was incorrectly updating the offsets twice, i.e.
once in kernel and again in your copy loop?

> >   + FSCTL Duplicate Extents To File doesn't permit server-side
> >     read/write fallback. Blocking this in the copy_file_range()
> >     parent function would be cleaner.  
> 
> I think the intention of callers of FSCTL Duplicate Extents is to copy data
> fast. NTFS is extent based and does not need a fallback to read/write. It just
> always works as expected here.

FSCTL Duplicate Extents To File is currently ReFS specific[2].

> If we don't offer a r/w fallback and fail, then
> this would just lead to clients having to do the r/w over the wire, which we
> definetely don't want.

I'll have to do some more tracing, but I think FSCTL Duplicate Extents
To File is only attempted if the FILE_SUPPORTS_BLOCK_REFCOUNTING flag is
present in the server's FS capabilities. If it's not available then the
client (again, IIRC) falls back to using a different copy offload
mechanism.
The MS-FSA documentation[3] explicitly states:
  The purpose of this operation is to make it look like a copy of a
  region from the source stream to the target stream has occurred when
  in reality no data is actually copied. This operation modifies the
  target stream's extent list such that, the same clusters are pointed
  to by both the source and target streams' extent lists for the region
  being copied.
So I don't think falling back to read/write would be the right thing to
do for this specific FSCTL.

> Also copy_file_range() doesn't guarantee any specific
> technique of copying. It tries to do a file copy as fast and efficient as
> possible. It tries reflinks on fs layer first, if reflinks are not supported,
> it tries using r/w with sendfile and so on. I think the kerne guys even plan to
> support cross device copying of data with copy_file_range. The fallback path is
> always there, we can't turn it off with the generic copy_file_range call on our
> side.

You're right. I think we'll have to use the raw BTRFS_IOC_CLONE_RANGE
+ XFS_IOC_CLONE_RANGE ioctls to handle this.

Cheers, David

1. copy_file_range() kernel VFS source
   https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/read_write.c?id=c497f8d17246720afe680ea1a8fa6e48e75af852#n1583

2. [MS-FSCC] FSCTL_DUPLICATE_EXTENTS_TO_FILE
   https://msdn.microsoft.com/en-us/library/mt182620.aspx#Appendix_A_Target_14

3. [MS-FSA] FSCTL_DUPLICATE_EXTENTS_TO_FILE
   https://msdn.microsoft.com/en-us/library/mt182614.aspx



More information about the samba-technical mailing list