Parallel data transfer

Nikolaus Rath Nikolaus at rath.org
Tue Aug 3 19:31:17 MDT 2010


Hello,

I want to copy lots of small files from a network file system. Reading
a file always takes at least one network round trip. This makes any
program that tries to copy the files one after another terribly slow.

I searched the archives and already found that rsync is not (and
will not be for quite some time) able to work in parallel, but that
people have had success by starting several rsync instances which work
on different parts of the tree by using exclusion patterns.

Therefore I wanted to write a little wrapper script that starts x rsync
instances with the same set of parameters, but adds an additional filter
rule that makes sure that the instances work on different files.
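To make the idea concrete, here is a minimal sketch of such a wrapper in Python. The names build_commands and run_parallel are made up for illustration, and the per-instance rules are passed in as opaque strings because finding the right rules is exactly the open question. The sketch assumes the extra rules can be given through rsync's --filter option and places them before the user's own arguments. Since rsync acts on the first matching rule, that placement gives the wrapper's rules precedence over any user rules, which may or may not be what is wanted.

```python
import subprocess


def build_commands(user_args, rules_per_instance):
    """Return one rsync argument list per instance.

    user_args:          the parameters shared by all instances
                        (options, source, destination).
    rules_per_instance: one list of filter-rule strings per instance;
                        the rule strings themselves are placeholders here.
    """
    commands = []
    for rules in rules_per_instance:
        # Extra rules go first, so they are seen before the user's rules.
        extra = [f"--filter={rule}" for rule in rules]
        commands.append(["rsync", *extra, *user_args])
    return commands


def run_parallel(commands):
    """Start all instances at once, then wait; return their exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]
```

A call like build_commands(["-a", "/mnt/nfs/", "/backup/"], [rules_a, rules_b]) would then yield two command lines that differ only in their added filter rules (the paths are hypothetical).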

However, I can't quite figure out how to write the filter rules. Actually,
I'm starting to get the impression that it's impossible to get what I
want. Hopefully someone will be able to prove me wrong :-).

Suppose I just have x=2. Then I would like one instance to work on all
files ending with [a-m] and the other instance on everything else. All I
could come up with, e.g. for the first instance, was this:

# Include all directories
include */
# Exclude files ending with wrong suffix
exclude,! *[a-m]

But this messes up any additional exclude option that may have been
passed by the user and was supposed to work on directories. Is there any
way to exclude all non-directories that do not end with [a-m]?
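The split I have in mind can be checked outside of rsync with a small Python sketch. It uses fnmatchcase from the standard library as a stand-in for rsync's pattern matching, which is only an approximation: the assumption here is that a pattern without a slash is compared against the last path component, so the sketch matches on the base name. The function names and example paths are invented for illustration.

```python
import posixpath
from fnmatch import fnmatchcase

# Same pattern as in the rules above: names ending in a letter from a to m.
PATTERN = "*[a-m]"


def instance_for(path):
    """Return 0 if the file name ends with [a-m], otherwise 1."""
    name = posixpath.basename(path)
    return 0 if fnmatchcase(name, PATTERN) else 1


def split(paths):
    """Assign every path to exactly one of the two instances."""
    buckets = ([], [])
    for path in paths:
        buckets[instance_for(path)].append(path)
    return buckets


if __name__ == "__main__":
    first, second = split(["src/main.c", "src/main.h", "README", "docs/notes.txt"])
    print(first)   # files for the instance handling *[a-m]
    print(second)  # files for the instance handling everything else
```

Every file lands in exactly one bucket, which is the property the filter rules need to have. How evenly the work is divided depends entirely on the file names; a tree full of .txt files would end up almost completely in the second instance.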


Also, I am not sure how to protect the excluded files from being
deleted if --delete-excluded was specified. Will

protect,! *[a-m]

do what I want? The rsync manpage has just one sentence about this:
"protect, P specifies a pattern for protecting files from deletion."
But does this work even against --delete-excluded? Or do I need the
"hide" rule prefix? It seems that rsync(1) lacks a
"HIDE/SHOW/PROTECT/RISK PATTERN RULES" section similar to the sections
about include/exclude and merge.



Or am I completely on the wrong track and there is a much easier way to
copy lots of files in parallel?
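One alternative I can imagine, as an untested sketch: build the list of files up front, divide it into x chunks, and hand each chunk to its own rsync instance through --files-from, so that no filter rules are needed for the split at all. The helper names and the paths in the usage note are hypothetical. The sketch assumes the listed paths are relative to the source directory, and it leaves open how this interacts with the deletion options.

```python
import tempfile


def round_robin(paths, n):
    """Deal the paths out to n chunks, one path at a time."""
    return [paths[i::n] for i in range(n)]


def files_from_commands(paths, n, src, dest, user_args=()):
    """Write one list file per chunk and return one rsync argv per chunk."""
    commands = []
    for chunk in round_robin(sorted(paths), n):
        # delete=False keeps the list file around for rsync to read;
        # removing it afterwards is left to the caller.
        listing = tempfile.NamedTemporaryFile("w", suffix=".lst", delete=False)
        with listing:
            listing.write("".join(path + "\n" for path in chunk))
        commands.append(
            ["rsync", *user_args, f"--files-from={listing.name}", src, dest]
        )
    return commands
```

For example, files_from_commands(all_paths, 4, "/mnt/nfs/", "/backup/", ["-a"]) would give four command lines that can be started side by side. Dealing the paths out round-robin should keep the chunks about the same size no matter what the file names look like.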


Thanks for reading all this,


   -Nikolaus

-- 
 »Time flies like an arrow, fruit flies like a Banana.«

  PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C

