Rsync 4TB datafiles...?
lihicks at gpi.com
Fri Mar 24 13:33:18 GMT 2006
Paul Slootman wrote:
> On Wed 22 Mar 2006, Linus Hicks wrote:
>> Paul Slootman wrote:
>>> I'd recommend doing --inplace, as chances are that data won't move
>>> within a file with oracle data files (so it's not useful to try to find
>>> moved data), and copying the 4TB to temp. files every time could become
>>> a big timewaster. Also the -t option could be handy, not all files
>>> change all the time IIRC.
>> The above remark about not being "useful to try to find moved data"
>> provoked an idea. But my understanding of --inplace is apparently different
>> from yours. I thought --inplace only meant that the destination file would
>> be directly overwritten, not that it would turn off any of the
>> optimizations for finding moved data.
> I go on what's in the manpage:
> This causes rsync not to create a new copy of the file and then move it
> into place. Instead rsync will overwrite the existing file, meaning
> that the rsync algorithm can't accomplish the full amount of network
> reduction it might be able to otherwise (since it does not yet try to
> sort data matches). One exception to this is if you combine the option
> with --backup, since rsync is smart enough to use the backup file as the
> basis file for the transfer.
Well, it would be nice if the manpage were more explicit about what difference
there is in the "rsync algorithm", because from my experience, I would guess
that it does try to find moved data. What I have seen during my testing on a
1 Gbit network is that a large file (I don't remember the exact details, but it
was between 1 GB and 4 GB) took some seven minutes to rsync with no destination
file. When there was a destination file with just a few blocks changed, it took
a little longer, and with a lot of blocks changed, it took a lot longer,
something like four hours.
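
To make "finding moved data" concrete, here is a toy sketch of block matching in
the spirit of the rsync algorithm. It is not rsync's actual code: the names
BLOCK, weak_sum and find_matches are made up for illustration, the block size is
tiny, MD5 merely stands in for the strong checksum, and the weak checksum is
recomputed for every window instead of being rolled incrementally as a real
implementation would do.

```python
# Toy illustration of rsync-style block matching; NOT rsync's actual code.
# BLOCK, weak_sum and find_matches are hypothetical names for this sketch.
import hashlib

BLOCK = 4  # tiny block size, for demonstration only


def weak_sum(data: bytes) -> int:
    """Adler-32-like weak checksum of one block."""
    a = sum(data) & 0xFFFF
    b = sum((len(data) - i) * x for i, x in enumerate(data)) & 0xFFFF
    return (b << 16) | a


def find_matches(old: bytes, new: bytes):
    """Return (offset_in_new, offset_in_old) for each old block found in new."""
    # Index the receiver's (old) file by the weak checksum of each block.
    table = {}
    for off in range(0, len(old) - BLOCK + 1, BLOCK):
        block = old[off:off + BLOCK]
        entry = (off, hashlib.md5(block).digest())
        table.setdefault(weak_sum(block), []).append(entry)

    matches = []
    pos = 0
    while pos + BLOCK <= len(new):
        window = new[pos:pos + BLOCK]
        hit = None
        # A weak match is only a candidate; confirm it with the strong checksum.
        for old_off, strong in table.get(weak_sum(window), []):
            if hashlib.md5(window).digest() == strong:
                hit = old_off
                break
        if hit is not None:
            matches.append((pos, hit))
            pos += BLOCK  # skip past the matched block
        else:
            pos += 1      # slide the window by one byte and try again
    return matches


# Two bytes inserted at the front: every old block is still found, shifted by 2.
print(find_matches(b"AAAABBBBCCCC", b"xxAAAABBBBCCCC"))
# prints [(2, 0), (6, 4), (10, 8)]
```

The point of the sketch is that the search on the sending side looks at every
byte offset, so it can locate a block that has moved. When nothing matches, the
window advances one byte at a time, so a file with many changed blocks costs far
more checksum work than one with few. Whether that alone explains a four-hour
run is only a guess on my part.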