[LSF/MM/BPF TOPIC] Enhancing Linux Copy Performance and Function and improving backup scenarios
smfrench at gmail.com
Sat Feb 1 19:54:46 UTC 2020
On Wed, Jan 29, 2020 at 7:54 PM Darrick J. Wong <darrick.wong at oracle.com> wrote:
> On Wed, Jan 22, 2020 at 05:13:53PM -0600, Steve French wrote:
> > As discussed last year:
> > Current Linux copy tools have various problems compared to other
> > platforms - small I/O sizes (and most don't allow it to be
> > configured), lack of parallel I/O for multi-file copies, inability to
> > reduce metadata updates by setting file size first, lack of cross
> ...and yet weirdly we tell everyone on xfs not to do that or to use
> fallocate, so that delayed speculative allocation can do its thing.
> We also tell them not to create deep directory trees because xfs isn't
Delayed speculative allocation may help XFS, but changing the file
size thousands of times during a single file copy can be a disaster
for network and cluster file systems (due to the excessive cost it
adds to metadata sync time), so there are file systems where setting
the file size first can help.
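
To make "setting the file size first" concrete, here is a rough sketch
in Python. The function name and the 4 MiB default chunk size are my
own placeholders, not taken from any existing tool, and as noted above
this ordering may be the wrong choice on XFS, where it can get in the
way of speculative allocation.

```python
import os


def copy_with_size_set_first(src_path, dst_path, chunk_size=4 * 1024 * 1024):
    """Copy src to dst, setting the destination size once before writing.

    Hypothetical helper for illustration only. Returns the bytes copied.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = os.fstat(src.fileno()).st_size
        # One size change up front, instead of the file growing on
        # every appended write.
        os.ftruncate(dst.fileno(), size)
        copied = 0
        while copied < size:
            buf = src.read(min(chunk_size, size - copied))
            if not buf:
                break  # source shrank while we were copying
            dst.write(buf)
            copied += len(buf)
        dst.flush()
        if copied != size:
            # Trim the destination if the source turned out shorter.
            os.ftruncate(dst.fileno(), copied)
    return copied
```

Whether this helps depends on the file system, which is part of why
the tool would ideally get a hint from the kernel rather than guess.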
> > And copy tools rely less on
> > the kernel file system (vs. code in the user space tool) in Linux than
> > would be expected, in order to determine which optimizations to use.
> What kernel interfaces would we expect userspace to use to figure out
> the confusing mess of optimizations? :)
copy_file_range and clone_file_range are a good start ... few tools
use them ...
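
As a rough illustration of what using copy_file_range from a tool
could look like, here is a sketch in Python (os.copy_file_range is
available on Linux from Python 3.8). The function name, the 16 MiB
chunk size, and the policy of falling back to read/write on any
OSError are my own assumptions, not what any particular copy tool
does.

```python
import os


def copy_file(src_path, dst_path, chunk_size=16 * 1024 * 1024):
    """Copy a file, preferring copy_file_range(2) where it works.

    Hypothetical helper for illustration only. Returns the bytes copied.
    """
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        dst_fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            use_kernel_copy = hasattr(os, "copy_file_range")
            total = 0
            while True:
                if use_kernel_copy:
                    try:
                        # Lets the file system do the copy (server-side
                        # copy, reflink, etc.) when it knows how.
                        n = os.copy_file_range(src_fd, dst_fd, chunk_size)
                    except OSError:
                        # Not supported here (or it failed): carry on
                        # from the current offsets with plain I/O. A
                        # real I/O error will be raised again below.
                        use_kernel_copy = False
                        continue
                else:
                    buf = os.read(src_fd, chunk_size)
                    n = len(buf)
                    view = memoryview(buf)
                    while view:
                        # os.write may write less than requested.
                        written = os.write(dst_fd, view)
                        view = view[written:]
                if n == 0:
                    break  # end of file
                total += n
            return total
        finally:
            os.close(dst_fd)
    finally:
        os.close(src_fd)
```

Both paths advance the same file descriptor offsets, so a fallback
partway through the file continues where the kernel copy stopped.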
> There's a whole bunch of xfs ioctls like dioinfo and the like that we
> ought to push to statx too. Is that an example of what you mean?
That is a good example. The next step is getting tools to use these,
even if there are some file system dependent cases.
> > But some progress has been made since last year's summit, with new
> > copy tools being released and improvements to some of the kernel file
> > systems, and also some additional feedback on lwn and on the mailing
> > lists. In addition these discussions have prompted additional
> > feedback on how to improve file backup/restore scenarios (e.g. for
> > mounts to the cloud from local Linux systems) which require preserving
> > more timestamps, ACLs and metadata, and preserving them efficiently.
> I suppose it would be useful to think a little more about cross-device
> fs copies considering that the "devices" can be VM block devs backed by
> files on a filesystem that supports reflink. I have no idea how you
> manage that sanely though.
I trust XFS, Btrfs, SMB3, cluster file systems, etc. to solve this better
than the block level can (better locking, leases/delegation, state
management, etc.).
More information about the samba-technical