Rsync 4TB datafiles...?

Fri Mar 24 13:33:18 GMT 2006

Paul Slootman wrote:
> On Wed 22 Mar 2006, Linus Hicks wrote:
>> Paul Slootman wrote:
>>> I'd recommend doing --inplace, as chances are that data won't move
>>> within a file with oracle data files (so it's not useful to try to find
>>> moved data), and copying the 4TB to temp. files every time could become
>>> a big timewaster. Also the -t option could be handy, not all files
>>> change all the time IIRC.
>> The above remark about not being "useful to try to find moved data" 
>> provoked an idea. But my understanding of --inplace is apparently different 
>> from yours. I thought --inplace only meant that the destination file would 
>> be directly overwritten, not that it would turn off any of the 
>> optimizations for finding moved data.
> 
> I go on what's in the manpage:
> 
>  --inplace
>     This causes rsync not to create a new copy of the file and then move  it
>     into  place.   Instead  rsync  will overwrite the existing file, meaning
>     that the rsync algorithm can't accomplish the  full  amount  of  network
>     reduction  it  might  be able to otherwise (since it does not yet try to
>     sort data matches).  One exception to this is if you combine the  option
>     with --backup, since rsync is smart enough to use the backup file as the
>     basis file for the transfer.

Well, it would be nice if the manpage were more explicit about how the "rsync
algorithm" differs under --inplace, because from my experience I would guess
that it does still try to find moved data. What I have seen in my testing on a
1 Gbit network is that a large file (I don't remember the exact size, but it
was between 1 GB and 4 GB) took about seven minutes to rsync with no
destination file. With a destination file that had just a few blocks changed,
it took a little longer, and with a lot of blocks changed it took far longer,
around four hours.
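
To make the question concrete, here is a toy sketch in Python of block
matching with and without an in-place restriction. It is not rsync's code. The
names delta() and literal_bytes() are made up for illustration, blocks are
compared by their raw bytes rather than by rolling and strong checksums, and
the in-place rule (a basis block is only usable if it sits at or after the
current write position) is my assumed reading of the manpage wording, not
something confirmed in this thread.

```python
def delta(basis: bytes, new: bytes, block: int, inplace: bool = False):
    """Toy delta: a list of ("copy", basis_offset) and ("literal", data) ops."""
    # Index the basis file by block content.
    # (Simplification: real rsync indexes per-block checksums, not raw bytes.)
    table = {}
    for off in range(0, len(basis) - block + 1, block):
        table.setdefault(basis[off:off + block], []).append(off)

    ops = []
    literal = bytearray()
    pos = 0  # offset in `new`, which is also the destination write position
    while pos < len(new):
        window = new[pos:pos + block]
        candidates = table.get(window, []) if len(window) == block else []
        if inplace:
            # ASSUMED rule: basis data before the write position has already
            # been overwritten, so only blocks at or after it are usable.
            candidates = [off for off in candidates if off >= pos]
        if candidates:
            if literal:
                ops.append(("literal", bytes(literal)))
                literal.clear()
            ops.append(("copy", candidates[0]))
            pos += block
        else:
            # No usable match here: send one byte and slide the window by one.
            literal.append(new[pos])
            pos += 1
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


def literal_bytes(ops):
    """Number of bytes that would have to be sent as literal data."""
    return sum(len(data) for kind, data in ops if kind == "literal")


if __name__ == "__main__":
    basis = b"AAAABBBBCCCCDDDD"
    shifted = b"XX" + basis  # two bytes inserted, so every block moves later
    print(literal_bytes(delta(basis, shifted, 4)))                # 2
    print(literal_bytes(delta(basis, shifted, 4, inplace=True)))  # 18
```

In this toy model the search for moved data still runs under the in-place
rule, but matches that point backwards into the already-written part of the
file are thrown away, so an insertion near the start turns the rest of the
file into literal data. Data that only moves towards the start of the file
would still match.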

Linus