User controlled i/o block size?

Tue Apr 12 18:54:36 UTC 2016

On Mon, Apr 11, 2016 at 7:05 PM, Kevin Korb <kmk at sanitarium.net> wrote:
> You didn't say if you were networking or what features of rsync you
> are using but if you aren't networking and aren't doing anything fancy
> you are probably better off with cp -au which is essentially the same
> as rsync -au except faster.

I was curious whether "cp -au" was really as robust as rsync.

No, it isn't.  My test:

Create a folder with numerous files in it (a dozen in my case).  Have
one of them be 9GB (or anything relatively big).

cp -au <src-folder> <dest-folder>

Watch the destination folder, and when you see the 9GB file growing,
kill "cp -au" (I just pressed Control-C).

Restart "cp -au".

I ended up with a truncated copy of the 9GB file (roughly 3GB).
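A likely explanation for the truncated file: GNU cp's -u only copies when the source is newer than the destination (or the destination is missing).  The partial copy has a fresh modification time, so the restarted cp treats it as up to date and skips it.  rsync's default "quick check" instead skips a file only when size and modification time both match, so a truncated file would be recopied.  The Python sketch below models just those two decision rules.  The FileInfo type, the function names, and the timestamps are hypothetical illustrations, not code from either tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInfo:
    size: int     # bytes
    mtime: float  # modification time, seconds since the epoch


def cp_update_would_copy(src: FileInfo, dst: Optional[FileInfo]) -> bool:
    """Approximates `cp -u`: copy only if dst is missing or older than src."""
    return dst is None or src.mtime > dst.mtime


def rsync_quick_check_would_copy(src: FileInfo, dst: Optional[FileInfo]) -> bool:
    """Approximates rsync's default quick check: skip only if size AND mtime match."""
    return dst is None or src.size != dst.size or src.mtime != dst.mtime


GB = 1024 ** 3
# Hypothetical times: the 9GB source was last modified long ago, while the
# ~3GB partial copy left by the interrupted cp carries a newer mtime.
src = FileInfo(size=9 * GB, mtime=1_460_000_000.0)
partial = FileInfo(size=3 * GB, mtime=1_460_487_276.0)

print(cp_update_would_copy(src, partial))          # False: partial copy is kept
print(rsync_quick_check_would_copy(src, partial))  # True: file gets recopied
```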

The copy I did yesterday was about 1200 files, almost all of them
about 1.5GB, so making the copy took several hours.

Using rsync, I can kill the copy at any time (by desire or system
issue) and just restart it.

Using the simple "rsync -avp --progress" command, I end up recopying
the file that was in progress when rsync was aborted, but a 1.5GB file
only takes 10 or 15 seconds to copy, so that is minimal wasted effort
for a copy process that runs for hours.
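Part of why restarting rsync is safe: by default (without --inplace), rsync writes each file under a temporary name in the destination directory and renames it into place only after the transfer completes.  An interrupted run therefore should not leave a truncated file under the real name.  Here is a minimal Python sketch of that write-then-rename pattern, assuming a local copy on a POSIX filesystem.  copy_atomically is a hypothetical helper, not how rsync is actually implemented.

```python
import os
import shutil
import tempfile


def copy_atomically(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path via a temp file plus rename, so dst_path
    never holds a partially written file (hypothetical helper)."""
    dst_dir = os.path.dirname(os.path.abspath(dst_path))
    # Keep the temp file in the destination directory: os.replace cannot
    # rename across filesystems.
    fd, tmp_path = tempfile.mkstemp(prefix=".partial-", dir=dst_dir)
    try:
        with os.fdopen(fd, "wb") as out, open(src_path, "rb") as inp:
            shutil.copyfileobj(inp, out, 1024 * 1024)  # 1 MiB chunks
            out.flush()
            os.fsync(out.fileno())  # get the data to disk before the rename
        shutil.copystat(src_path, tmp_path)  # copy mode and mtime, as -a does
        os.replace(tmp_path, dst_path)       # atomic rename on POSIX
    except BaseException:
        # Covers errors and Ctrl-C (KeyboardInterrupt): drop the temp file
        # and leave any existing dst_path untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the process is killed outright (SIGKILL or power loss), the hidden temp file may be left behind.  The destination name, however, should never point at a partial file.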

FYI: In my job I work with 100GB+ read-only datasets all the time.
The tools are all designed to segment the data into 1.5GB files.
One advantage is that if a file becomes corrupt, only that segment
file has to be replaced.  All the large files are validated via MD5
hash (or SHA-256, etc.).  I keep a minimum of two copies of every
dataset.  Yesterday I was making a third copy of several of the
datasets, so I had almost 2TB of data to copy.
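For anyone curious what per-segment validation can look like, here is a small Python sketch.  It splits a file into fixed-size segments, records a SHA-256 digest for each, and later reports which segments are missing or fail their check, so only those need replacing.  The segment naming scheme, the manifest dict, and all function names are hypothetical.  The real segmenting tools aren't described here, so this is only an illustration.  The demo uses 10-byte segments instead of ~1.5GB so it runs instantly.

```python
import hashlib
import os
import tempfile
from typing import Dict, List


def write_segments(src_path: str, out_dir: str, segment_size: int,
                   chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Split src_path into numbered segment files of at most segment_size
    bytes; return a manifest mapping segment name -> SHA-256 hex digest."""
    manifest: Dict[str, str] = {}
    index = 0
    with open(src_path, "rb") as src:
        while True:
            buf = src.read(min(chunk_size, segment_size))
            if not buf:
                break  # no data left, so no empty trailing segment
            name = f"segment.{index:03d}"  # hypothetical naming scheme
            digest = hashlib.sha256()
            written = 0
            with open(os.path.join(out_dir, name), "wb") as out:
                while buf:
                    out.write(buf)
                    digest.update(buf)
                    written += len(buf)
                    remaining = segment_size - written
                    # Stop at the segment boundary; otherwise read the next chunk.
                    buf = src.read(min(chunk_size, remaining)) if remaining > 0 else b""
            manifest[name] = digest.hexdigest()
            index += 1
    return manifest


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-GB segments never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def find_bad_segments(seg_dir: str, manifest: Dict[str, str]) -> List[str]:
    """Return the names of segments that are missing or fail their hash check."""
    bad = []
    for name, expected in sorted(manifest.items()):
        path = os.path.join(seg_dir, name)
        if not os.path.exists(path) or sha256_of(path) != expected:
            bad.append(name)
    return bad


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        src = os.path.join(work, "dataset.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(25))
        seg_dir = os.path.join(work, "segments")
        os.mkdir(seg_dir)
        manifest = write_segments(src, seg_dir, segment_size=10)
        print(sorted(manifest))  # ['segment.000', 'segment.001', 'segment.002']
        print(find_bad_segments(seg_dir, manifest))  # []
```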

Thanks
Greg
--
Greg Freemyer
www.IntelligentAvatar.net