[LSF/MM TOPIC] Enhancing Copy Tools for Linux FS

Andreas Dilger adilger at dilger.ca
Fri Feb 8 22:37:45 UTC 2019

On Feb 8, 2019, at 8:19 AM, Steve French <smfrench at gmail.com> wrote:
> Current Linux copy tools have various problems compared to other
> platforms - small I/O sizes (and not even configurable for most),

Hmm, this comment puzzles me, since "cp" already uses the st_blksize
returned by stat() for the file as the I/O size.  Not sure if tar/rsync
do the same, but if they don't already use st_blksize they should.
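
To illustrate what I mean by using st_blksize as the I/O size, here is
a minimal sketch in Python.  It is not taken from the cp source; the
function name copy_with_st_blksize and the 64 KiB fallback are
assumptions made up for this example.

```python
import os

def copy_with_st_blksize(src, dst, fallback=64 * 1024):
    """Copy src to dst, sizing each read/write by the source's st_blksize.

    Hypothetical helper for illustration; the fallback size is an
    arbitrary assumption for platforms that do not report st_blksize.
    """
    with open(src, "rb", buffering=0) as fin, \
         open(dst, "wb", buffering=0) as fout:
        # st_blksize is the filesystem's preferred I/O size for this file.
        blksize = getattr(os.fstat(fin.fileno()), "st_blksize", 0) or fallback
        buf = bytearray(blksize)
        view = memoryview(buf)
        total = 0
        while True:
            n = fin.readinto(buf)
            if not n:
                break
            # Unbuffered writes may be short, so loop until all n bytes land.
            off = 0
            while off < n:
                off += fout.write(view[off:n])
            total += n
    return blksize, total
```

A filesystem that wants larger transfers (a network or cluster
filesystem, say) can report a larger st_blksize and a tool written
this way picks it up without any tuning option.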

> lack of parallel I/O for multi-file copies, inability to reduce metadata
> updates by setting file size first, lack of cross mount (to the same
> file system) copy optimizations, limited ability to handle the wide
> variety of server side copy (and copy offload) mechanisms and various
> error handling problems.   And copy tools rely less on the kernel file
> system (vs. code in the user space tool) in Linux than would be
> expected, in order to determine which optimizations to use.

The rest of these issues are definitely a concern.  It is worthwhile to
point out MPIFileUtils (https://github.com/hpc/mpifileutils) that already
solves a lot of these problems.  As the name suggests, it currently uses
MPI to run in parallel across multiple nodes, but it should be possible
to add a wrapper for the MPI calls in the library with fork()+exec() or
so and run multi-threaded on one node for parallel copy/find/sync/etc.
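
As a rough sketch of the single-node case (this is not MPIFileUtils
code, and it uses Python's standard thread pool rather than MPI or
fork()+exec()), a parallel multi-file copy could look like the
following.  The name parallel_copy and the default of 4 workers are
assumptions for illustration only.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor

def parallel_copy(sources, dest_dir, workers=4):
    """Copy every file in sources into dest_dir using a pool of threads.

    Hypothetical helper; files are placed by basename, so sources with
    the same basename would overwrite each other.
    """
    os.makedirs(dest_dir, exist_ok=True)

    def copy_one(src):
        dst = os.path.join(dest_dir, os.path.basename(src))
        shutil.copyfile(src, dst)
        return dst

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order; an exception raised in a
        # worker is re-raised here when its result is retrieved.
        return list(pool.map(copy_one, sources))
```

Threads are adequate for a sketch like this because the work is
dominated by blocking I/O; a real tool would also need to walk
directories, preserve metadata, and split very large files into
chunks.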

IMHO, it makes sense to try and optimize a single set of tools, rather
than adding yet another set of tools that are not widely used.  There
is also "mutil" (https://github.com/pkolano/mutil), a set of patches
for GNU cp and md5sum, but it is less widely used than MPIFileUtils.

That said, most users are going to have GNU coreutils installed, so the
best option is to add improvements directly into those tools if possible,
with the caveat that you will get a headache reading that code, and they
may object to including parallel extensions due to portability concerns.

> Would like to discuss some of the observations about copy tools and
> how we can move forward on improving the performance of common copy
> operations.

Unfortunately, I'm unable to attend LSF/MM this year, or this would
definitely be a topic of interest to me.

Cheers, Andreas
