[LSF/MM TOPIC] Enhancing Copy Tools for Linux FS

Andreas Dilger adilger at dilger.ca
Mon Feb 11 08:32:11 UTC 2019


On Feb 8, 2019, at 4:56 PM, Steve French <smfrench at gmail.com> wrote:
> 
> On Fri, Feb 8, 2019 at 5:03 PM Steve French <smfrench at gmail.com> wrote:
>> 
>> On Fri, Feb 8, 2019 at 4:37 PM Andreas Dilger <adilger at dilger.ca> wrote:
>>> 
>>> On Feb 8, 2019, at 8:19 AM, Steve French <smfrench at gmail.com> wrote:
>>>> 
>>>> Current Linux copy tools have various problems compared to other
>>>> platforms - small I/O sizes (and not even configurable for most),
>>> 
>>> Hmm, this comment puzzles me, since "cp" already uses the st_blksize
>>> returned for the file as the IO size?  Not sure if tar/rsync do
>>> the same, but if they don't already use st_blksize they should.
> 
> I did some experiments changing the block size returned from 1K to 64K to 1MB
> and see no difference in the copy size used by cp (it was always 128K in all
> the cases when caching is disabled)

Strange.  I just re-tested this on Lustre, in case something had changed in
GNU coreutils that I didn't notice, and it worked fine for me with both
cp 8.4 on RHEL and cp 8.26 on Ubuntu (versions as reported by "cp --version"):

$ dd if=/dev/urandom of=/tmp/foo bs=1M count=12
$ strace -v cp /tmp/foo /testfs/tmp
:
open("/tmp/foo", O_RDONLY)              = 3
fstat(3, {... st_blksize=4096, st_blocks=24576, st_size=12582912, ...}) = 0
open("/testfs/tmp/foo", O_WRONLY|O_CREAT|O_EXCL, 0664) = 4
fstat(4, { ... st_blksize=4194304, st_blocks=0, st_size=0, ...}) = 0
read(3, "h\230#`\2\223\273\3423W\24\222:\2113w\327"..., 4194304) = 4194304
write(4, "h\230#`\2\223\273\3423W\24\222:\2113w\327"..., 4194304) = 4194304
:

Note that the "st_blksize=4194304" returned by Lustre for the target file
matches the read and write buffer size used by "cp".  The same is true when
the Lustre file is the source rather than the target, so cp probably picks
the maximum of the two:

open("/testfs/tmp/foo", O_RDONLY)     = 3
fstat(3, {... st_blksize=4194304, st_blocks=24576, st_size=12582912 ...}) = 0
open("/tmp/bar", O_WRONLY|O_TRUNC)      = 4
fstat(4, {... st_blksize=4096, st_blocks=0, st_size=0 ...}) = 0
read(3, "h\230#`\2\223\273\3423W\24\222:\2113w\327"..., 4194304) = 4194304
write(4, "h\230#`\2\223\273\3423W\24\222:\2113w\327"..., 4194304) = 4194304
:
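
To make the "maximum of both" guess concrete, here is a minimal sketch in
Python.  It is only an illustration of the behaviour seen in the traces, not
the actual coreutils code, and the function names larger_blksize() and
preferred_io_size() are made up for this example:

```python
import os


def larger_blksize(src_blksize, dst_blksize):
    """Return the larger of the two st_blksize hints."""
    return max(src_blksize, dst_blksize)


def preferred_io_size(src_fd, dst_fd):
    """fstat() both open files and pick the larger st_blksize.

    Assumption: this mirrors what the traces suggest cp does; it is not
    taken from the cp source.
    """
    src_blksize = os.fstat(src_fd).st_blksize
    dst_blksize = os.fstat(dst_fd).st_blksize
    return larger_blksize(src_blksize, dst_blksize)
```

With the values from the two traces, larger_blksize(4096, 4194304) and
larger_blksize(4194304, 4096) both give 4194304, which is the size passed to
read() and write() in each case.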

Running the same command with /tmp as both source and target uses a smaller
32768-byte buffer (not the "st_blksize=4096" reported for the target), and
correspondingly more read/write calls:

$ strace -v cp /tmp/foo /tmp/baz
:
open("/tmp/baz", O_WRONLY|O_CREAT|O_EXCL, 0664) = 4
fstat(4, {... st_blksize=4096, st_blocks=0, st_size=0, ...}) = 0
read(3, "h\230#`\2\223\273\3423W\24\222:\2113w\327"..., 32768) = 32768
write(4, "h\230#`\2\223\273\3423W\24\222:\2113w\327"..., 32768) = 32768
:

In this case, cp probably has some minimum buffer size it uses to avoid the
poor performance of using 4KB blocks.
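
A rough sketch of that idea in Python, assuming a 32768-byte floor only
because that is the read size in the /tmp trace.  The names
MIN_BUFFER_BYTES, effective_buffer_size(), calls_needed() and copy_file() are
hypothetical, and this is not how cp itself is written:

```python
import os

# Assumed floor, chosen to match the 32768-byte reads in the /tmp trace.
MIN_BUFFER_BYTES = 32 * 1024


def effective_buffer_size(src_blksize, dst_blksize, floor=MIN_BUFFER_BYTES):
    """Larger of the two st_blksize hints, but never below the floor."""
    return max(src_blksize, dst_blksize, floor)


def calls_needed(file_size, buffer_size):
    """Number of full-or-partial reads needed to cover file_size bytes."""
    return -(-file_size // buffer_size)  # ceiling division


def copy_file(src_path, dst_path, floor=MIN_BUFFER_BYTES):
    """Copy src_path to dst_path; return the number of reads that got data."""
    reads = 0
    # buffering=0 gives raw file objects, so each read()/write() below maps
    # to one system call of the chosen size.
    with open(src_path, "rb", buffering=0) as src, \
         open(dst_path, "wb", buffering=0) as dst:
        size = effective_buffer_size(
            os.fstat(src.fileno()).st_blksize,
            os.fstat(dst.fileno()).st_blksize,
            floor,
        )
        while True:
            chunk = src.read(size)
            if not chunk:
                break
            reads += 1
            view = memoryview(chunk)
            while len(view) > 0:
                # A raw write may be short, so loop until the chunk is out.
                written = dst.write(view)
                view = view[written:]
    return reads
```

For the 12582912-byte test file, calls_needed() gives 384 reads with a
32768-byte buffer and 3 reads with the 4194304-byte buffer that Lustre asks
for.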

Cheers, Andreas




