rsync in-place (was Re: rsync 1tb+ each day)

Wed Feb 5 13:55:44 EST 2003

jw schultz wrote:
> 
> I was thinking more in terms of no block relocation at all.
> Checksums only match if at the same offset.  The receiver simply
> discards (or never gets) info about blocks that are
> unchanged.  It would just lseek and write with a possible
> truncate at the end.

This would seem to help a lot on larger database files. Why take a
700-byte block of data from a source file and try to find a matching
block by scanning block checksums at every offset in an 8G
destination datafile, and then do it again for every 700 bytes? (I
read the rsync technical paper -- but I might be confused.)

In the case of Oracle datafiles, the only place a meaningful/syncable
delta will occur is at the same offset. Yes, this is a special case --
but it has the potential to really help when rsyncing Oracle datafiles
during a hot backup or when syncing from a snapshot to nearstore storage.
This approach should be faster than the -W option for very large Oracle
datafiles (which often have only a small number of changed blocks). It
should also be faster than deleting the destination files and resending
them (-W), as has been suggested.
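
To make the idea concrete, here is a rough sketch of the same-offset
scheme in Python. It is only an illustration, not how rsync is
implemented: the function names, the 700-byte BLOCK_SIZE and the use of
MD5 (picked only because it is in the standard library) are all my own
assumptions, and it assumes the destination file already exists.

```python
import hashlib

BLOCK_SIZE = 700  # assumption: the block size used as an example above


def block_digests(path, block_size=BLOCK_SIZE):
    """Receiver side: one digest per block, indexed by block number."""
    digests = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            digests.append(hashlib.md5(block).digest())
    return digests


def changed_blocks(src_path, dest_digests, block_size=BLOCK_SIZE):
    """Sender side: yield (offset, data) for blocks that differ at the
    same offset. No search at other offsets is attempted."""
    with open(src_path, "rb") as f:
        index = 0
        while True:
            block = f.read(block_size)
            if not block:
                break
            is_new = index >= len(dest_digests)
            if is_new or hashlib.md5(block).digest() != dest_digests[index]:
                yield index * block_size, block
            index += 1


def apply_in_place(dest_path, updates, final_size):
    """Receiver side: seek and write each changed block, then truncate
    to the source length (the "lseek and write" step quoted above)."""
    with open(dest_path, "r+b") as f:
        for offset, data in updates:
            f.seek(offset)
            f.write(data)
        f.truncate(final_size)
```

Usage would be something like
apply_in_place(dst, changed_blocks(src, block_digests(dst)),
os.path.getsize(src)). The sender does one digest comparison per block
instead of a search at every byte offset, and unchanged blocks are never
rewritten on the receiver.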

> > You can imagine a smarter algorithm that does non-sequential writes to
> > the output so as to avoid writing over blocks that will be needed
> > later.  Alternatively, if you assume some amount of temporary storage,
> > then it might be possible to still produce output as a stream.
> 
> I really doubt it is worthwhile doing to rsync.  This
> principally applies to block oriented files such as devices
> and database files.  For the most part rsync handles these
> fine.

Agreed.

The original post still raises an interesting issue -- it should not be
faster to remove destination files before running rsync. That runs
counter to one of the main purposes of rsync -- efficiently detecting
and sending only the deltas.

eric